targeted
Targeted Inference
Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>).
README
---
output:
github_document:
toc: true
toc_depth: 2
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r include=FALSE }
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
cache = FALSE,
out.width = "70%"
)
```
<!-- badges: start -->
[](https://github.com/kkholst/targeted/actions/workflows/r_check.yaml)
[](https://app.codecov.io/gh/kkholst/targeted/tree/main)
[](https://opensource.org/license/apache-2-0)
[](https://cranlogs.r-pkg.org/downloads/total/last-month/targeted)
<!-- badges: end -->
# Targeted Learning in R (`targeted`) <a href="https://kkholst.github.io/targeted/"><img src="man/figures/logo.png" align="right" height="250" style="float:right; height:250px;" alt="targeted website" /></a>
Various methods for targeted learning and semiparametric inference including
augmented inverse probability weighted (AIPW) estimators for missing data and
causal inference (Bang and Robins (2005)
<doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional
average treatment effects (CATE) (van der Laan (2006)
<doi:10.2202/1557-4679.1008>), estimators for risk differences and relative
risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption
lean inference for generalized linear model parameters (Vansteelandt et al.
(2022) <doi:10.1111/rssb.12504>).
# Installation
You can install the released version of targeted from [CRAN](<https://CRAN.R-project.org>) with:
```{r install, eval=FALSE }
install.packages("targeted")
```
And the development version from [GitHub](<https://github.com/kkholst/targeted>) with:
```{r eval=FALSE }
remotes::install_github("kkholst/targeted", ref="dev")
```
Computations such as cross-validation are parallelized via the `{future}`
package. To enable parallel computations and progress-bars the following code
can be executed
```{r future, eval=FALSE}
future::plan("multisession")
progressr::handlers(global=TRUE)
```
# Examples
To illustrate some of the functionality of the `targeted` package we simulate
some data from the following model
$$Y = \exp\{-(W_1 - 1)^2 - (W_2 - 1)^2)\} - 2\exp\{-(W_1+1)^2 - (W_2+1)^2\}A + \epsilon$$
with independent measurement error $\epsilon\sim\mathcal{N}(0,1)$, treatment variable $A \sim Bernoulli(\text{expit}\{-1+W_1\})$ and
independent covariates $W_1, W_2\sim\mathcal{N}(0,1/2)$.
```{r simdata}
library("targeted")
simdata <- function(n, ...) {
w1 <- rnorm(n) # covariates
w2 <- rnorm(n) # ...
a <- rbinom(n, 1, plogis(-1 + w1)) # treatment indicator
y <- exp(- (w1 - 1)**2 - (w2 - 1)**2) - # continuous response
2 * exp(- (w1 + 1)**2 - (w2 + 1)**2) * a + # additional effect in treated
rnorm(n, sd=0.5**.5)
data.frame(y, a, w1, w2)
}
set.seed(1)
d <- simdata(5e3)
head(d)
```
```{r simplot}
wnew <- seq(-3,3, length.out=200)
dnew <- expand.grid(w1 = wnew, w2 = wnew, a = 1)
y <- with(dnew,
exp(- (w1 - 1)**2 - (w2 - 1)**2) -
2 * exp(- (w1 + 1)**2 - (w2 + 1)**2)*a
)
image(wnew, wnew, matrix(y, ncol=length(wnew)),
col=viridisLite::viridis(64),
main=expression(paste("E(Y|",W[1],",",W[2],")")),
xlab=expression(W[1]), ylab=expression(W[2]))
```
## Nuisance (prediction) models
Methods for targeted and semiparametric inference rely on fitting nuisance
models to observed data when estimating the target parameter of interest. The
`{targeted}` package implements the [R6 reference class](https://r6.r-lib.org/)
`learner` to harmonize common statistical and machine learning models for the
usage as nuisance models across the various implemented estimators, such as the
`targeted:cate` function. Commonly used models are constructed as `learner`
class objects through the `learner_*` functions.
As an example, we can specify a linear regression model with an interaction term
between treatment and the two covariates $W_1$ and $W_2$
```{r learner1}
lr <- learner_glm(y ~ (w1 + w2)*a, family = gaussian)
lr
```
To fit the model to the data we use the `estimate` method
```{r lrest}
lr$estimate(d)
lr$fit
```
Predictions, $E(Y\mid W_1, W_2)$, can be performed with the `predict` method
```{r lrpred}
head(d) |> lr$predict()
```
```{r lrplot}
pr <- matrix(lr$predict(dnew), ncol=length(wnew))
image(wnew, wnew, pr, col=viridisLite::viridis(64),
main=expression(paste("E(Y|",W[1],",",W[2],")")),
xlab=expression(W[1]), ylab=expression(W[2]))
```
Similarly, a Random Forest can be specified with
```{r lrrf}
lr_rf <- learner_grf(y ~ w1 + w2 + a, num.trees = 500)
```
Lists of models can also be constructed for different hyper-parameters with the
`learner_expand_grid` function.
### Cross-validation
To assess the model generalization error we can perform $k$-fold cross-validation with the `cv` method
```{r lrcv, cache=TRUE}
mod <- list(glm = lr, rf = lr_rf)
cv(mod, data = d, rep = 2, nfolds = 5) |> summary()
```
### Ensembles (Super-Learner)
An ensemble learner (super-learner) can easily be constructed from lists of `learner` objects
```{r lrsl, cache=TRUE}
sl <- learner_sl(mod, nfolds = 10)
sl$estimate(d)
sl
```
```{r lrsl_plot, cache=TRUE}
pr <- matrix(sl$predict(dnew), ncol=length(wnew))
image(wnew, wnew, pr, col=viridisLite::viridis(64),
main=expression(paste("E(Y|",W[1],",",W[2],")")),
xlab=expression(W[1]), ylab=expression(W[2]))
```
## Average Treatment Effects
In the following we are interested in estimating the target parameter $\psi_a(P)
= E_P[Y(a)]$, where $Y(a)$ is the *potential outcome* we would have observed if
treatment $a$ had been administered, possibly contrary to the actual treatment
that was observed, i.e., $Y = Y(A)$.
To assess the treatment effect we can then the consider the *average treatment
effect* (ATE)
$$E_P[Y(1)]-E_P[Y(0)],$$
or some other contrast of interest $g(\psi_1(P), \psi_0(P))$.
Under the following assumptions
1) Stable Unit Treatment Values Assumption (the treatment of a specific subject is not affecting the potential outcome of other subjects)
2) Positivity, $P(A\mid W)>\epsilon$ for some $\epsilon>0$ and baseline covariates $W$
3) No unmeasured confounders, $Y(a)\perp \!\!\! \perp A|W$
then the target parameter can be identified from the observed data distribution as
$$E(E[Y|W,A=a]) = E(E[Y(a)|W]) = E[Y(a)]$$
or
$$E[Y I(A=a)/P(A=a|W)] = E[Y(a)].$$
This suggests estimators based on outcome regression ($g$-computation) or
inverse probability weighting. More generally, under the above assumption we can
constructor a *one-step* estimator from the *Efficient Influence Function*
combining these two $$ E\left[\frac{I(A=a)}{\Pi_a(W)}(Y-Q(W,A)) + Q(W,a)\right].$$\
In practice, this requires plugin estimates of both the outcome model,
$Q(W,A) := E(Y\mid A, W)$, and of the treatment propensity model
$\Pi_a(W) := P(A=a\mid W)$. The corresponding estimator is consistent even if just one of the two nuisance models is correctly specified.
First we specify the propensity model
```{r propensity}
prmod <- learner_glm(a ~ w1 + w2, family=binomial)
```
We will reuse one of the outcome models from the previous section, and use the
`cate` function to estimate the treatment effect
```{r ate, cache=TRUE}
a <- cate(response.model = lr_rf, propensity.model = prmod, data = d, nfolds = 5)
a
```
In the output we get estimates of both the mean potential outcomes and the
difference of those, the average treatment effect, given as the term `(Intercept)`.
```{r}
summary(a)
```
Here we use the `nfolds=5` argument to use 5-fold *cross-fitting* to guarantee
that the estimates converges weakly to a Gaussian distribution even though
that the estimated influence function based on plugin estimates from the Random
Forest does not necessarily lie in a $P$-Donsker class.
# Project organization
We use the `dev` branch for development and the `main` branch for stable
releases. All releases follow [semantic versioning](https://semver.org/), are
[tagged](https://github.com/kkholst/targeted/tags) and notable
changes are reported in
[NEWS.md](https://github.com/kkholst/targeted/blob/main/NEWS.md).
# I Have a Question / I Want to Report a Bug
If you want to ask questions, require help or clarification, or report a
bug, we recommend to either contact a maintainer directly or the
following:
- Open an [Issue](https://github.com/kkholst/targeted/issues).
- Provide as much context as you can about what you’re running into.
- Provide project and platform versions, depending on what seems
relevant.
We will then take care of the issue as soon as possible.
# Contributing to `targeted`
All types of contributions are encouraged and valued. See the
[CONTRIBUTING.md](https://github.com/kkholst/targeted/blob/main/CONTRIBUTING.md)
for details about how to contribute code to this project.
Versions across snapshots
| Version | Repository | File | Size |
|---|---|---|---|
0.7.1 |
2026-04-09 windows/windows R-4.5 | targeted_0.7.1.zip |
2.4 MiB |