Overview
cfperformance provides methods for estimating model performance measures (MSE, AUC, calibration) under hypothetical/counterfactual interventions. These methods are essential when:
- A prediction model will be deployed in settings where treatment policies differ from the training setting
- Predictions are meant to support decisions about treatment initiation
- You want to assess model performance after transporting from a source (e.g., RCT) to a target population
Based on Boyer, Dahabreh & Steingrimsson (2025). “Estimating and evaluating counterfactual prediction models.” Statistics in Medicine, 44(23-24), e70287. doi:10.1002/sim.70287
Installation
# Install development version from GitHub
# install.packages("devtools")
devtools::install_github("boyercb/cfperformance")
Quick Start
library(cfperformance)
# Load example data
data(cvd_sim)
# Estimate counterfactual MSE under no treatment
result <- cf_mse(
  predictions = cvd_sim$risk_score,
  outcomes = cvd_sim$event,
  treatment = cvd_sim$treatment,
  covariates = cvd_sim[, c("age", "bp", "chol")],
  treatment_level = 0,  # Evaluate under no treatment
  estimator = "dr"      # Doubly robust estimator
)
result
Key Features
- MSE/Brier Score: Loss-based performance under counterfactual intervention
- AUC: Discrimination ability under counterfactual intervention
- Calibration: Reliability of risk predictions under counterfactual intervention
- Multiple Estimators: Naive, Conditional Loss/Outcome Model, IPW, and Doubly Robust
- Inference: Bootstrap and influence function-based standard errors
- Cross-validation: Counterfactual-aware model selection with cf_cv() and cf_compare()
- Transportability: Evaluate model performance when transporting from a source population (e.g., RCT) to a target population
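As a rough illustration of how the four estimator types listed above differ, the following base R sketch computes each one for the MSE under "no treatment" on simulated data. It does not call cfperformance and is not the package's implementation: the data-generating process, the variable names, the choice to compute the naive estimate among the untreated, and the linear working model for the loss are all assumptions made for this example.

```r
# Hypothetical simulation: one covariate, confounded treatment, binary outcome.
set.seed(1)
n <- 5000
x <- rnorm(n)
a <- rbinom(n, 1, plogis(0.5 * x))        # treatment is more likely at high x
y0 <- rbinom(n, 1, plogis(-1 + x))        # potential outcome under no treatment
y1 <- rbinom(n, 1, plogis(-1.8 + x))      # potential outcome under treatment
y <- ifelse(a == 1, y1, y0)               # observed outcome
pred <- plogis(-1 + 0.8 * x)              # risk score being evaluated
loss <- (y - pred)^2
untreated <- a == 0

# Nuisance models (assumed working models, fitted on the full sample)
ps_fit <- glm(a ~ x, family = binomial())
p0 <- 1 - fitted(ps_fit)                  # estimated P(A = 0 | X)
loss_fit <- lm(loss ~ x, subset = untreated)
h0 <- predict(loss_fit, newdata = data.frame(x = x))  # estimated E[loss | X, A = 0]

est <- c(
  naive = mean(loss[untreated]),                # ignores confounding
  cl    = mean(h0),                             # conditional loss model
  ipw   = mean(untreated / p0 * loss),          # inverse probability weighting
  dr    = mean(h0 + untreated / p0 * (loss - h0))  # doubly robust
)

# Only available in a simulation: the loss against the true untreated outcome
truth <- mean((y0 - pred)^2)
print(round(c(est, truth = truth), 4))
```

Because the propensity model is correctly specified in this simulation, the IPW and doubly robust estimates should land close to the simulated truth even though the linear loss model is only an approximation.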
Counterfactual Performance Estimation
Estimate how a prediction model would perform under a hypothetical treatment policy (e.g., if everyone received or avoided treatment):
library(cfperformance)
data(cvd_sim)
# Compare estimators for counterfactual MSE
estimators <- c("naive", "cl", "ipw", "dr")
sapply(estimators, function(est) {
  cf_mse(
    predictions = cvd_sim$risk_score,
    outcomes = cvd_sim$event,
    treatment = cvd_sim$treatment,
    covariates = cvd_sim[, c("age", "bp", "chol")],
    treatment_level = 0,
    estimator = est
  )$estimate
})
# Estimate counterfactual AUC
cf_auc(
  predictions = cvd_sim$risk_score,
  outcomes = cvd_sim$event,
  treatment = cvd_sim$treatment,
  covariates = cvd_sim[, c("age", "bp", "chol")],
  treatment_level = 0,
  estimator = "dr"
)
# Cross-validation for model selection
cf_compare(
  models = list(
    "Simple" = event ~ age,
    "Full" = event ~ age + bp + chol
  ),
  data = cvd_sim,
  treatment = "treatment",
  treatment_level = 0,
  metric = "mse",
  K = 5
)
Transportability Analysis
The package also implements transportability estimators from Steingrimsson et al. (2022) and Voter et al. (2025) for evaluating prediction model performance when transporting from a source population (typically an RCT) to a target population:
# Load transportability example data
data(transport_sim)
# Estimate MSE in target population
tr_mse(
  predictions = transport_sim$risk_score,
  outcomes = transport_sim$event,
  treatment = transport_sim$treatment,
  source = transport_sim$source,  # 1 = source/RCT, 0 = target
  covariates = transport_sim[, c("age", "biomarker", "smoking")],
  treatment_level = 0,
  analysis = "transport",
  estimator = "dr"
)
# Estimate AUC in the target population
tr_auc(
  predictions = transport_sim$risk_score,
  outcomes = transport_sim$event,
  treatment = transport_sim$treatment,
  source = transport_sim$source,
  covariates = transport_sim[, c("age", "biomarker", "smoking")],
  treatment_level = 0,
  analysis = "transport",
  estimator = "dr"
)
# Estimate calibration in the target population
tr_calibration(
  predictions = transport_sim$risk_score,
  outcomes = transport_sim$event,
  treatment = transport_sim$treatment,
  source = transport_sim$source,
  covariates = transport_sim[, c("age", "biomarker", "smoking")],
  treatment_level = 0,
  analysis = "transport",
  estimator = "ipw"
)
See vignette("transportability", package = "cfperformance") for details.
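To give a feel for what a weighting estimator does in the transport setting, here is a minimal base R sketch of inverse-odds weighting for the target-population MSE under no treatment. It is an illustration under stated assumptions, not the code behind tr_mse(): the simulated data, the variable names, and the logistic working models are hypothetical, and outcomes and treatment are treated as observed only in the source sample.

```r
# Hypothetical simulation: source is an RCT, target has shifted covariates.
set.seed(2)
n <- 6000
s <- rbinom(n, 1, 0.5)                          # 1 = source (RCT), 0 = target
x <- rnorm(n, mean = ifelse(s == 1, 0, 0.7))    # covariate shift between populations
a <- ifelse(s == 1, rbinom(n, 1, 0.5), NA)      # randomized in the source only
y0 <- rbinom(n, 1, plogis(-1 + x))              # potential outcome under no treatment
y1 <- rbinom(n, 1, plogis(-1.8 + x))            # potential outcome under treatment
y <- ifelse(s == 1, ifelse(a == 1, y1, y0), NA) # outcomes observed in the source only
pred <- plogis(-1 + 0.8 * x)                    # risk score being evaluated
loss <- (y - pred)^2

# Working models: source membership, and treatment within the source
s_fit <- glm(s ~ x, family = binomial())
p_s <- fitted(s_fit)                            # estimated P(S = 1 | X)
a_fit <- glm(a ~ x, family = binomial(), subset = s == 1)
p_a0 <- 1 - predict(a_fit, newdata = data.frame(x = x), type = "response")

# Untreated source rows carry the information; weight them by the inverse
# odds of source membership times the inverse probability of no treatment
use <- s == 1 & a == 0                          # FALSE (not NA) for target rows
w <- (1 - p_s[use]) / (p_s[use] * p_a0[use])
tr_ipw <- sum(w * loss[use]) / sum(s == 0)

naive_source <- mean(loss[use])                 # ignores the covariate shift
truth <- mean((y0[s == 0] - pred[s == 0])^2)    # known only in a simulation
print(round(c(naive_source = naive_source, tr_ipw = tr_ipw, truth = truth), 4))
```

The weights shift the untreated RCT participants toward the covariate distribution of the target population, which is why only source rows contribute loss values while the denominator counts target rows.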
Machine Learning Integration
The package supports flexible ML methods for nuisance model estimation with automatic cross-fitting for valid inference:
# Use a random forest for the propensity model and gradient boosting for the outcome model
cf_mse(
  predictions = cvd_sim$risk_score,
  outcomes = cvd_sim$event,
  treatment = cvd_sim$treatment,
  covariates = cvd_sim[, c("age", "bp", "chol")],
  treatment_level = 0,
  estimator = "dr",
  propensity_model = ml_learner("ranger", num.trees = 500),
  outcome_model = ml_learner("xgboost", nrounds = 100),
  cross_fit = TRUE
)
Supported learners:
- ranger - Fast random forests
- xgboost - Gradient boosting
- grf - Generalized random forests (honest estimation)
- glmnet - Elastic net with cross-validated λ
- superlearner - Ensemble learning
- custom - User-supplied fit/predict functions
See vignette("ml-integration", package = "cfperformance") for details.
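For readers unfamiliar with cross-fitting, the idea can be sketched in base R as follows: nuisance models are fitted on K-1 folds and used to predict only on the held-out fold, so no observation's nuisance prediction comes from a model that saw it. This is a sketch under assumptions, not what cross_fit = TRUE does internally: glm() and lm() stand in for a flexible learner, and the simulated data, fold count, and standard error formula are choices made for this example.

```r
# Hypothetical simulation, same structure as a confounded observational study
set.seed(3)
n <- 4000
x <- rnorm(n)
a <- rbinom(n, 1, plogis(0.5 * x))
y <- rbinom(n, 1, plogis(-1 + x - 0.8 * a))
pred <- plogis(-1 + 0.8 * x)                    # risk score being evaluated
dat <- data.frame(x = x, a = a, loss = (y - pred)^2)

K <- 5
fold <- sample(rep(seq_len(K), length.out = n)) # random fold assignment
p0 <- numeric(n)                                # out-of-fold P(A = 0 | X)
h0 <- numeric(n)                                # out-of-fold E[loss | X, A = 0]

for (k in seq_len(K)) {
  train <- dat[fold != k, ]
  test <- dat[fold == k, ]
  # Fit nuisance models without the held-out fold
  ps_fit <- glm(a ~ x, family = binomial(), data = train)
  loss_fit <- lm(loss ~ x, data = train[train$a == 0, ])
  # Predict only for the held-out fold
  p0[fold == k] <- 1 - predict(ps_fit, newdata = test, type = "response")
  h0[fold == k] <- predict(loss_fit, newdata = test)
}

# Doubly robust contributions built from out-of-fold nuisance predictions
phi <- h0 + (dat$a == 0) / p0 * (dat$loss - h0)
dr_cf <- mean(phi)
se_cf <- sd(phi) / sqrt(n)                      # influence-function-style standard error
print(round(c(estimate = dr_cf, se = se_cf), 4))
```

Separating fitting from prediction in this way is what lets flexible learners be plugged into the doubly robust estimator without their overfitting leaking into the standard error.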
Documentation
See vignette("introduction", package = "cfperformance") for a comprehensive introduction.
Citation
If you use this package in your research, please cite:
Boyer CB, Dahabreh IJ, Steingrimsson JA. Estimating and evaluating counterfactual prediction models. Statistics in Medicine. 2025; 44(23-24):e70287. doi: 10.1002/sim.70287
For transportability methods, also cite:
Steingrimsson JA, Gatsonis C, Li B, Dahabreh IJ. Transporting a Prediction Model for Use in a New Target Population. American Journal of Epidemiology. 2022; 192(2):296-304. doi: 10.1093/aje/kwac128
Voter SR, et al. Counterfactual prediction from machine learning models: transportability and joint analysis for model development and evaluation using multi-source data. Diagnostic and Prognostic Research. 2025; 9(4). doi: 10.1186/s41512-025-00201-y
@article{boyer2025estimating,
title={Estimating and Evaluating Counterfactual Prediction Models},
author={Boyer, Christopher B. and Dahabreh, Issa J. and Steingrimsson, Jon A.},
journal={Statistics in Medicine},
volume={44},
number={23-24},
pages={e70287},
year={2025},
doi={10.1002/sim.70287}
}
@article{10.1093/aje/kwac128,
title = {Transporting a Prediction Model for Use in a New Target Population},
author = {Steingrimsson, Jon A. and Gatsonis, Constantine and Li, Bing and Dahabreh, Issa J.},
journal = {American Journal of Epidemiology},
volume = {192},
number = {2},
pages = {296-304},
year = {2022},
doi = {10.1093/aje/kwac128}
}
@article{voter2025transportability,
title={Counterfactual prediction from machine learning models: transportability and joint analysis for model development and evaluation using multi-source data},
author = {Voter, Sarah C. and Dahabreh, Issa J. and Boyer, Christopher B. and Rahbar, Habib and Kontos, Despina and Steingrimsson, Jon A.},
journal={Diagnostic and Prognostic Research},
volume={9},
number={4},
year={2025},
doi={10.1186/s41512-025-00201-y}
}