targeted

Targeted Inference

Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>).

README

---
output:
  github_document:
    toc: true
    toc_depth: 2
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r  include=FALSE }
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  cache = FALSE,
  out.width = "70%"
)
```

<!-- badges: start -->
[![r-cmd-check](https://github.com/kkholst/targeted/actions/workflows/r_check.yaml/badge.svg?branch=main)](https://github.com/kkholst/targeted/actions/workflows/r_check.yaml)
[![codecov](https://codecov.io/github/kkholst/targeted/branch/main/graph/badge.svg)](https://app.codecov.io/gh/kkholst/targeted/tree/main)
[![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/license/apache-2-0)
[![cran](https://www.r-pkg.org/badges/version-last-release/targeted)](https://cranlogs.r-pkg.org/downloads/total/last-month/targeted)
<!-- badges: end -->


# Targeted Learning in R (`targeted`) <a href="https://kkholst.github.io/targeted/"><img src="man/figures/logo.png" align="right" height="250" style="float:right; height:250px;"  alt="targeted website" /></a>

Various methods for targeted learning and semiparametric inference including
augmented inverse probability weighted (AIPW) estimators for missing data and
causal inference (Bang and Robins (2005)
<doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional
average treatment effects (CATE) (van der Laan (2006)
<doi:10.2202/1557-4679.1008>), estimators for risk differences and relative
risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption
lean inference for generalized linear model parameters (Vansteelandt et al.
(2022) <doi:10.1111/rssb.12504>).


# Installation

You can install the released version of targeted from [CRAN](<https://CRAN.R-project.org>) with:

```{r  install, eval=FALSE }
install.packages("targeted")
```

And the development version from [GitHub](<https://github.com/kkholst/targeted>) with:

```{r  eval=FALSE }
remotes::install_github("kkholst/targeted", ref="dev")
```

Computations such as cross-validation are parallelized via the `{future}`
package. To enable parallel computations and progress bars, run the following
code:
```{r future, eval=FALSE}
future::plan("multisession")
progressr::handlers(global=TRUE)
```

# Examples

To illustrate some of the functionality of the `targeted` package we simulate
some data from the following model
$$Y = \exp\{-(W_1 - 1)^2 - (W_2 - 1)^2\} - 2\exp\{-(W_1+1)^2 - (W_2+1)^2\}A + \epsilon$$
with independent measurement error $\epsilon\sim\mathcal{N}(0,1/2)$, treatment variable $A \sim \text{Bernoulli}(\text{expit}\{-1+W_1\})$ and
independent covariates $W_1, W_2\sim\mathcal{N}(0,1)$.
```{r simdata}
library("targeted")

simdata <- function(n, ...) {
   w1 <- rnorm(n) # covariates
   w2 <- rnorm(n) # ...
   a <- rbinom(n, 1, plogis(-1 + w1)) # treatment indicator
   y <-  exp(- (w1 - 1)**2 - (w2 - 1)**2) - # continuous response
     2 * exp(- (w1 + 1)**2 - (w2 + 1)**2) * a + # additional effect in treated
     rnorm(n, sd=0.5**.5)
   data.frame(y, a, w1, w2)
}
set.seed(1)
d <- simdata(5e3)

head(d)
```

```{r simplot}
wnew <- seq(-3,3, length.out=200)
dnew <- expand.grid(w1 = wnew, w2 = wnew, a = 1)
y <- with(dnew,
          exp(- (w1 - 1)**2 - (w2 - 1)**2) -
          2 * exp(- (w1 + 1)**2 - (w2 + 1)**2)*a
          )
image(wnew, wnew, matrix(y, ncol=length(wnew)),
      col=viridisLite::viridis(64),
      main=expression(paste("E(Y|",W[1],",",W[2],")")),
      xlab=expression(W[1]), ylab=expression(W[2]))
```



## Nuisance (prediction) models

Methods for targeted and semiparametric inference rely on fitting nuisance
models to observed data when estimating the target parameter of interest. The
`{targeted}` package implements the [R6 reference class](https://r6.r-lib.org/)
`learner` to harmonize common statistical and machine learning models for
use as nuisance models across the various implemented estimators, such as the
`targeted::cate` function. Commonly used models are constructed as `learner`
class objects through the `learner_*` functions.


As an example, we can specify a linear regression model with interaction terms
between treatment and the two covariates $W_1$ and $W_2$
```{r learner1}
lr <- learner_glm(y ~ (w1 + w2)*a, family = gaussian)
lr
```

To fit the model to the data we use the `estimate` method
```{r lrest}
lr$estimate(d)
lr$fit
```

Predictions, $E(Y\mid A, W_1, W_2)$, can be obtained with the `predict` method
```{r lrpred}
head(d) |> lr$predict()
```

```{r lrplot}
pr <- matrix(lr$predict(dnew), ncol=length(wnew))
image(wnew, wnew, pr, col=viridisLite::viridis(64),
      main=expression(paste("E(Y|",W[1],",",W[2],")")),
      xlab=expression(W[1]), ylab=expression(W[2]))
```


Similarly, a Random Forest can be specified with
```{r lrrf}
lr_rf <- learner_grf(y ~ w1 + w2 + a, num.trees = 500)
```

Lists of models can also be constructed for different hyper-parameters with the
`learner_expand_grid` function.

### Cross-validation

To assess the model generalization error we can perform $k$-fold cross-validation with the `cv` method
```{r lrcv, cache=TRUE}
mod <- list(glm = lr, rf = lr_rf)
cv(mod, data = d, rep = 2, nfolds = 5) |> summary()
```
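
The mechanics of $k$-fold cross-validation can be sketched in base R. This is only an illustration of the general idea, not the implementation behind `cv`: the helper `kfold_rmse`, the toy data, and the choice of root mean squared error as the loss are assumptions made for this example.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
toy <- data.frame(x = x, y = sin(2 * x) + rnorm(n, sd = 0.5))

# Hypothetical helper: k-fold cross-validated RMSE of a linear model
kfold_rmse <- function(formula, data, nfolds = 5) {
  # Randomly assign each observation to one of `nfolds` folds
  fold <- sample(rep(seq_len(nfolds), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  sq_err <- numeric(nrow(data))
  for (k in seq_len(nfolds)) {
    test <- fold == k
    # Fit on all folds except fold k, predict on the held-out fold k
    fit <- lm(formula, data = data[!test, , drop = FALSE])
    pred <- predict(fit, newdata = data[test, , drop = FALSE])
    sq_err[test] <- (data[[response]][test] - pred)^2
  }
  sqrt(mean(sq_err))
}

rmse_linear <- kfold_rmse(y ~ x, toy)
rmse_cubic <- kfold_rmse(y ~ poly(x, 3), toy)
c(linear = rmse_linear, cubic = rmse_cubic)
```

Every observation is predicted exactly once by a model that did not see it during fitting, so the resulting error estimates the generalization error rather than the in-sample fit.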


### Ensembles (Super-Learner)

An ensemble learner (super-learner) can easily be constructed from lists of `learner` objects
```{r lrsl, cache=TRUE}
sl <- learner_sl(mod, nfolds = 10)
sl$estimate(d)
sl
```


```{r lrsl_plot, cache=TRUE}
pr <- matrix(sl$predict(dnew), ncol=length(wnew))
image(wnew, wnew, pr, col=viridisLite::viridis(64),
      main=expression(paste("E(Y|",W[1],",",W[2],")")),
      xlab=expression(W[1]), ylab=expression(W[2]))
```
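
To give some intuition for how an ensemble combines its candidates, the following base R sketch stacks two linear models using a convex weight chosen to minimize the cross-validated mean squared error. This is a generic stacking illustration under simplifying assumptions (two candidates, a single weight, toy data) and does not necessarily reflect how `learner_sl` computes its weights. The helper `cv_predict` is hypothetical.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
toy <- data.frame(x = x, y = x + x^2 + rnorm(n))

# Hypothetical helper: out-of-fold predictions for one candidate model
cv_predict <- function(formula, data, fold) {
  pred <- numeric(nrow(data))
  for (k in unique(fold)) {
    test <- fold == k
    fit <- lm(formula, data = data[!test, , drop = FALSE])
    pred[test] <- predict(fit, newdata = data[test, , drop = FALSE])
  }
  pred
}

fold <- sample(rep(1:10, length.out = n))
p_lin <- cv_predict(y ~ x, toy, fold)        # candidate 1: linear in x
p_quad <- cv_predict(y ~ I(x^2), toy, fold)  # candidate 2: quadratic term only

# Cross-validated MSE of the convex combination with weight w on candidate 1
cv_mse <- function(w) mean((toy$y - (w * p_lin + (1 - w) * p_quad))^2)
w_hat <- optimize(cv_mse, interval = c(0, 1))$minimum

c(weight_linear = w_hat, mse_linear = cv_mse(1),
  mse_quadratic = cv_mse(0), mse_stacked = cv_mse(w_hat))
```

Because the weights 0 and 1 are both feasible, the stacked cross-validated error can be no worse than that of the best single candidate, up to the tolerance of the optimizer.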


## Average Treatment Effects

In the following we are interested in estimating the target parameter $\psi_a(P)
= E_P[Y(a)]$, where $Y(a)$ is the *potential outcome* we would have observed if
treatment $a$ had been administered, possibly contrary to the actual treatment
that was observed, i.e., $Y = Y(A)$.
To assess the treatment effect we can then consider the *average treatment
effect* (ATE)
$$E_P[Y(1)]-E_P[Y(0)],$$
or some other contrast of interest $g(\psi_1(P), \psi_0(P))$.
Under the following assumptions

1) Stable Unit Treatment Values Assumption (the treatment of a specific subject is not affecting the potential outcome of other subjects)
2) Positivity, $P(A=a\mid W)>\epsilon$ for some $\epsilon>0$, each treatment level $a$, and baseline covariates $W$
3) No unmeasured confounders, $Y(a)\perp \!\!\! \perp A|W$

the target parameter can be identified from the observed data distribution as
$$E(E[Y|W,A=a]) = E(E[Y(a)|W]) = E[Y(a)]$$
or
$$E[Y I(A=a)/P(A=a|W)] = E[Y(a)].$$

This suggests estimators based on outcome regression ($g$-computation) or
inverse probability weighting. More generally, under the above assumptions we can
construct a *one-step* estimator from the *Efficient Influence Function*
combining these two
$$E\left[\frac{I(A=a)}{\Pi_a(W)}(Y-Q(W,A)) + Q(W,a)\right].$$
In practice, this requires plugin estimates of both the outcome model,
$Q(W,A) := E(Y\mid A, W)$, and of the treatment propensity model
$\Pi_a(W) := P(A=a\mid W)$. The corresponding estimator is consistent even if just one of the two nuisance models is correctly specified.
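
The three estimators can be compared in a few lines of base R. The sketch below is not the implementation used by the package: it simulates data with standard normal covariates as in the `simdata` code, plugs in a linear outcome model (misspecified for this data) and a logistic propensity model (correctly specified), and computes the $g$-computation, inverse probability weighted, and one-step (AIPW) estimates of the ATE. The reference value $-\tfrac{2}{3}e^{-2/3}\approx -0.342$ is obtained from Gaussian integrals under that assumption. All object names are chosen for this example only.

```r
set.seed(1)
n <- 1e4
w1 <- rnorm(n)
w2 <- rnorm(n)
a <- rbinom(n, 1, plogis(-1 + w1))
y <- exp(-(w1 - 1)^2 - (w2 - 1)^2) -
  2 * exp(-(w1 + 1)^2 - (w2 + 1)^2) * a + rnorm(n, sd = sqrt(0.5))
dat <- data.frame(y, a, w1, w2)

# Plugin nuisance estimates
qfit <- lm(y ~ (w1 + w2) * a, data = dat)                # outcome model Q(W, A)
pfit <- glm(a ~ w1 + w2, family = binomial, data = dat)  # propensity model
q1 <- predict(qfit, newdata = transform(dat, a = 1))     # Q(W, 1)
q0 <- predict(qfit, newdata = transform(dat, a = 0))     # Q(W, 0)
ps <- fitted(pfit)                                       # P(A = 1 | W)

# g-computation: average the outcome-model predictions
gcomp <- mean(q1) - mean(q0)
# Inverse probability weighting
ipw <- mean(dat$a * dat$y / ps) - mean((1 - dat$a) * dat$y / (1 - ps))
# One-step (AIPW) estimator: weighted residual plus outcome-model prediction
phi1 <- dat$a / ps * (dat$y - q1) + q1
phi0 <- (1 - dat$a) / (1 - ps) * (dat$y - q0) + q0
aipw <- mean(phi1 - phi0)
aipw_se <- sd(phi1 - phi0) / sqrt(n)  # standard error from the influence function

truth <- -2 / 3 * exp(-2 / 3)
round(c(truth = truth, gcomp = gcomp, ipw = ipw, aipw = aipw, aipw_se = aipw_se), 3)
```

Since the propensity model is correctly specified here, the AIPW estimate is expected to be close to the reference value even though the linear outcome model is not, which illustrates the robustness property described above.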

First we specify the propensity model
```{r propensity}
prmod <- learner_glm(a ~ w1 + w2, family=binomial)
```

We will reuse one of the outcome models from the previous section, and use the
`cate` function to estimate the treatment effect
```{r ate, cache=TRUE}
a <- cate(response.model = lr_rf, propensity.model = prmod, data = d, nfolds = 5)
a
```
In the output we get estimates of both the mean potential outcomes and the
difference of those, the average treatment effect, given as the term `(Intercept)`.

```{r}
summary(a)
```

Here we use the `nfolds=5` argument to apply 5-fold *cross-fitting*, which ensures
that the estimates converge weakly to a Gaussian distribution even though
the estimated influence function based on plugin estimates from the Random
Forest does not necessarily lie in a $P$-Donsker class.
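
The idea of cross-fitting can be sketched in base R as follows. This is an illustration under simplifying assumptions and not the code path used by `cate`: the nuisance models are a polynomial linear regression and a logistic regression rather than a Random Forest, and the helper `crossfit_ate` is hypothetical. In each fold the nuisance models are fitted on the remaining folds, and the influence-function terms are evaluated only on the held-out observations.

```r
set.seed(1)
n <- 5000
w1 <- rnorm(n)
w2 <- rnorm(n)
a <- rbinom(n, 1, plogis(-1 + w1))
y <- exp(-(w1 - 1)^2 - (w2 - 1)^2) -
  2 * exp(-(w1 + 1)^2 - (w2 + 1)^2) * a + rnorm(n, sd = sqrt(0.5))
dat <- data.frame(y, a, w1, w2)

# Hypothetical helper: cross-fitted one-step (AIPW) estimate of the ATE
crossfit_ate <- function(data, nfolds = 5) {
  fold <- sample(rep(seq_len(nfolds), length.out = nrow(data)))
  phi <- numeric(nrow(data))  # held-out influence-function contributions
  for (k in seq_len(nfolds)) {
    train <- data[fold != k, ]
    test <- data[fold == k, ]
    # Nuisance models are fitted on the training folds only
    qfit <- lm(y ~ (poly(w1, 2) + poly(w2, 2)) * a, data = train)
    pfit <- glm(a ~ w1 + w2, family = binomial, data = train)
    # ... and evaluated on the held-out fold
    q1 <- predict(qfit, newdata = transform(test, a = 1))
    q0 <- predict(qfit, newdata = transform(test, a = 0))
    ps <- predict(pfit, newdata = test, type = "response")
    phi[fold == k] <- (test$a / ps * (test$y - q1) + q1) -
      ((1 - test$a) / (1 - ps) * (test$y - q0) + q0)
  }
  c(estimate = mean(phi), std.err = sd(phi) / sqrt(length(phi)))
}

est <- crossfit_ate(dat, nfolds = 5)
est
```

Because no observation contributes to the nuisance fits used to evaluate its own influence-function term, the dependence between the fitted nuisance models and the evaluation data that motivates the Donsker condition is removed.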

# Project organization

We use the `dev` branch for development and the `main` branch for stable
releases. All releases follow [semantic versioning](https://semver.org/), are
[tagged](https://github.com/kkholst/targeted/tags) and notable
changes are reported in
[NEWS.md](https://github.com/kkholst/targeted/blob/main/NEWS.md).

# I Have a Question / I Want to Report a Bug

If you want to ask questions, require help or clarification, or report a
bug, we recommend either contacting a maintainer directly or doing the
following:

- Open an [Issue](https://github.com/kkholst/targeted/issues).
- Provide as much context as you can about what you’re running into.
- Provide project and platform versions, depending on what seems
  relevant.

We will then take care of the issue as soon as possible.


# Contributing to `targeted`

All types of contributions are encouraged and valued. See
[CONTRIBUTING.md](https://github.com/kkholst/targeted/blob/main/CONTRIBUTING.md)
for details about how to contribute code to this project.

Versions across snapshots

| Version | Repository | File | Size |
|---------|------------|------|------|
| 0.7.1 | 2026-04-09 windows/windows R-4.5 | targeted_0.7.1.zip | 2.4 MiB |
