Estimates the calibration curve of a prediction model under a hypothetical intervention where treatment is set to a specific level.
Usage
cf_calibration(
  predictions,
  outcomes,
  treatment,
  covariates,
  treatment_level = 0,
  estimator = c("dr", "ipw", "om"),
  propensity_model = NULL,
  outcome_model = NULL,
  smoother = c("loess", "binned"),
  n_bins = 10,
  span = 0.75,
  se_method = c("none", "bootstrap"),
  n_boot = 200,
  conf_level = 0.95,
  ps_trim = NULL,
  ...
)
Arguments
- predictions
Numeric vector of model predictions.
- outcomes
Numeric vector of observed outcomes.
- treatment
Numeric vector of treatment indicators (0/1).
- covariates
A matrix or data frame of baseline covariates (confounders).
- treatment_level
The counterfactual treatment level (default: 0).
- estimator
Character string specifying the estimator:
"naive": Naive estimator (biased)"cl": Conditional loss estimator"ipw": Inverse probability weighting estimator"dr": Doubly robust estimator (default)
- propensity_model
Optional fitted propensity score model. If NULL, a logistic regression model is fit using the covariates.
- outcome_model
Optional fitted outcome model. If NULL, a regression model is fit using the covariates among the treated and untreated. For binary outcomes, this should be a model for E[Y|X,A] (binomial family). For continuous outcomes, this should be a model for E[L|X,A] (gaussian family).
- smoother
Smoothing method for the calibration curve:
"loess": Local polynomial regression (default)"binned": Binned calibration
- n_bins
Number of bins for binned calibration (default: 10).
- span
Span parameter for LOESS smoothing (default: 0.75).
- se_method
Method for standard error estimation:
"none": No standard errors (default, fastest)"bootstrap": Bootstrap standard errors
- n_boot
Number of bootstrap replications (default: 200).
- conf_level
Confidence level for intervals (default: 0.95).
- ps_trim
Propensity score trimming specification. Controls how extreme propensity scores are handled. Can be:
NULL (default): Uses absolute bounds c(0.01, 0.99)
"none": No trimming applied
"quantile": Quantile-based trimming with default c(0.01, 0.99)
"absolute": Explicit absolute bounds with default c(0.01, 0.99)
A numeric vector of length 2: c(lower, upper) absolute bounds
A single numeric: Symmetric bounds c(x, 1 - x)
A list with method ("absolute"/"quantile"/"none") and bounds
See the sketch after this argument list for example specifications.
- ...
Additional arguments passed to internal functions.
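For illustration, the trimming and propensity-model arguments above can be supplied as follows. This is a minimal sketch reusing the simulated data from the Examples section; the use of a fitted glm for propensity_model is an assumption based on the argument description, not a verified requirement of the implementation.
# Simulated data as in the Examples section
set.seed(123)
n <- 500
x <- rnorm(n)
a <- rbinom(n, 1, plogis(-0.5 + 0.5 * x))
y <- rbinom(n, 1, plogis(-1 + x - 0.5 * a))
pred <- plogis(-1 + 0.8 * x)
# Pre-fit propensity score model (assumed to be acceptable as a fitted glm)
ps_fit <- glm(a ~ x, family = binomial())
result_trim <- cf_calibration(
  predictions = pred, outcomes = y, treatment = a,
  covariates = data.frame(x = x), treatment_level = 0,
  estimator = "dr",
  propensity_model = ps_fit,
  ps_trim = c(0.05, 0.95)  # explicit absolute bounds
)
# Other ps_trim specifications listed above:
# ps_trim = 0.05                                              # symmetric bounds c(0.05, 0.95)
# ps_trim = list(method = "quantile", bounds = c(0.01, 0.99)) # quantile-based trimming
# ps_trim = "none"                                            # no trimming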
Value
An object of class c("cf_calibration", "cf_performance") containing:
- predicted
Vector of predicted probabilities
- observed
Vector of smoothed observed probabilities
- smoother
Smoothing method used
- ici
Integrated calibration index
- e50
Median absolute calibration error
- e90
90th percentile absolute calibration error
- emax
Maximum absolute calibration error
- se
List of standard errors (if se_method = "bootstrap")
- ci_lower
List of lower CI bounds (if se_method = "bootstrap")
- ci_upper
List of upper CI bounds (if se_method = "bootstrap")
- boot_curves
Bootstrap calibration curves for CI bands (if se_method = "bootstrap")
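Individual components can be pulled out of the returned object with standard list access. A minimal sketch, using the result_dr object created in the Examples below and assuming plain $ extraction:
result_dr$ici              # integrated calibration index
result_dr$e50              # median absolute calibration error
result_dr$emax             # maximum absolute calibration error
head(result_dr$predicted)  # grid of predicted probabilities
head(result_dr$observed)   # smoothed counterfactual observed probabilities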
Details
The counterfactual calibration curve estimates the relationship between predicted risk and observed risk under the counterfactual intervention.
The function implements three estimators:
IPW Estimator: Weights observations by the inverse probability of receiving the counterfactual treatment. Requires a correctly specified propensity score model.
Outcome Model (OM) Estimator: Uses the fitted outcome model \(E[Y | X, A=a]\) to estimate calibration over all observations. Requires a correctly specified outcome model.
Doubly Robust (DR) Estimator: Combines OM and IPW approaches. Consistent if either the propensity or outcome model is correctly specified.
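As a sketch of the weighting involved (notation as in the argument descriptions above; the precise implementation may differ): for counterfactual level \(a\), the IPW estimator smooths the outcomes of subjects with \(A_i = a\) against their predictions using weights \(W_i = 1 / \hat{\Pr}(A_i = a \mid X_i)\), while the DR estimator smooths pseudo-outcomes of the form
\[ \tilde{Y}_i = \hat{E}[Y \mid X_i, A = a] + \frac{I(A_i = a)}{\hat{\Pr}(A_i = a \mid X_i)}\left\{Y_i - \hat{E}[Y \mid X_i, A = a]\right\} \]
over all observations.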
References
Boyer, C. B., Dahabreh, I. J., & Steingrimsson, J. A. (2025). "Estimating and evaluating counterfactual prediction models." Statistics in Medicine, 44(23-24), e70287. doi:10.1002/sim.70287
Steingrimsson, J. A., Gatsonis, C., Li, B., & Dahabreh, I. J. (2023). "Transporting a prediction model for use in a new target population." American Journal of Epidemiology, 192(2), 296-304.
Examples
# Generate example data
set.seed(123)
n <- 500
x <- rnorm(n)
a <- rbinom(n, 1, plogis(-0.5 + 0.5 * x))
y <- rbinom(n, 1, plogis(-1 + x - 0.5 * a))
pred <- plogis(-1 + 0.8 * x)
# Estimate counterfactual calibration curve with different estimators
result_ipw <- cf_calibration(
predictions = pred,
outcomes = y,
treatment = a,
covariates = data.frame(x = x),
treatment_level = 0,
estimator = "ipw"
)
result_dr <- cf_calibration(
predictions = pred,
outcomes = y,
treatment = a,
covariates = data.frame(x = x),
treatment_level = 0,
estimator = "dr"
)
print(result_dr)
#>
#> Counterfactual CALIBRATION Estimation
#> ----------------------------------------
#> Estimator: dr
#> Treatment level: 0
#> N observations: 500
#>
#> Calibration Metrics:
#> ICI (Integrated Calibration Index): 0.0444
#> E50 (Median absolute error): 0.0184
#> E90 (90th percentile error): 0.0996
#> Emax (Maximum error): 0.3784
#>
# plot(result_dr) # If ggplot2 is available
# With bootstrap confidence bands
# result_boot <- cf_calibration(
# predictions = pred, outcomes = y, treatment = a,
# covariates = data.frame(x = x), treatment_level = 0,
# se_method = "bootstrap", n_boot = 200
# )
# plot(result_boot) # Shows confidence bands
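# A further sketch, using only the documented arguments, that swaps in the
# binned smoother in place of the default LOESS fit
result_binned <- cf_calibration(
  predictions = pred, outcomes = y, treatment = a,
  covariates = data.frame(x = x), treatment_level = 0,
  estimator = "dr", smoother = "binned", n_bins = 20
)
result_binned$ici  # compare with the LOESS-based ICI above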
