Skip to contents

Estimates the calibration curve of a prediction model under a hypothetical intervention where treatment is set to a specific level.

Usage

cf_calibration(
  predictions,
  outcomes,
  treatment,
  covariates,
  treatment_level = 0,
  estimator = c("dr", "ipw", "om"),
  propensity_model = NULL,
  outcome_model = NULL,
  smoother = c("loess", "binned"),
  n_bins = 10,
  span = 0.75,
  se_method = c("none", "bootstrap"),
  n_boot = 200,
  conf_level = 0.95,
  ps_trim = NULL,
  ...
)

Arguments

predictions

Numeric vector of model predictions.

outcomes

Numeric vector of observed outcomes.

treatment

Numeric vector of treatment indicators (0/1).

covariates

A matrix or data frame of baseline covariates (confounders).

treatment_level

The counterfactual treatment level (default: 0).

estimator

Character string specifying the estimator:

  • "naive": Naive estimator (biased)

  • "cl": Conditional loss estimator

  • "ipw": Inverse probability weighting estimator

  • "dr": Doubly robust estimator (default)

propensity_model

Optional fitted propensity score model. If NULL, a logistic regression model is fit using the covariates.

outcome_model

Optional fitted outcome model. If NULL, a regression model is fit using the covariates among treated/untreated. For binary outcomes, this should be a model for E[Y|X,A] (binomial family). For continuous outcomes, this should be a model for E[L|X,A] (gaussian family).

smoother

Smoothing method for the calibration curve:

  • "loess": Local polynomial regression (default)

  • "binned": Binned calibration

n_bins

Number of bins for binned calibration (default: 10).

span

Span parameter for LOESS smoothing (default: 0.75).

se_method

Method for standard error estimation:

  • "none": No standard errors (default, fastest)

  • "bootstrap": Bootstrap standard errors

n_boot

Number of bootstrap replications (default: 200).

conf_level

Confidence level for intervals (default: 0.95).

ps_trim

Propensity score trimming specification. Controls how extreme propensity scores are handled. Can be:

  • NULL (default): Uses absolute bounds c(0.01, 0.99)

  • "none": No trimming applied

  • "quantile": Quantile-based trimming with default c(0.01, 0.99)

  • "absolute": Explicit absolute bounds with default c(0.01, 0.99)

  • A numeric vector of length 2: c(lower, upper) absolute bounds

  • A single numeric: Symmetric bounds c(x, 1-x)

  • A list with method ("absolute"/"quantile"/"none") and bounds

...

Additional arguments passed to internal functions.

Value

An object of class c("cf_calibration", "cf_performance") containing:

predicted

Vector of predicted probabilities

observed

Vector of smoothed observed probabilities

smoother

Smoothing method used

ici

Integrated calibration index

e50

Median absolute calibration error

e90

90th percentile absolute calibration error

emax

Maximum absolute calibration error

se

List of standard errors (if se_method = "bootstrap")

ci_lower

List of lower CI bounds (if se_method = "bootstrap")

ci_upper

List of upper CI bounds (if se_method = "bootstrap")

boot_curves

Bootstrap calibration curves for CI bands (if se_method = "bootstrap")

Details

The counterfactual calibration curve estimates the relationship between predicted risk and observed risk under the counterfactual intervention.

The function implements three estimators:

IPW Estimator: Weights observations by the inverse probability of receiving the counterfactual treatment. Requires a correctly specified propensity score model.

Outcome Model (OM) Estimator: Uses the fitted outcome model \(E[Y | X, A=a]\) to estimate calibration over all observations. Requires a correctly specified outcome model.

Doubly Robust (DR) Estimator: Combines OM and IPW approaches. Consistent if either the propensity or outcome model is correctly specified.

References

Boyer, C. B., Dahabreh, I. J., & Steingrimsson, J. A. (2025). "Estimating and evaluating counterfactual prediction models." Statistics in Medicine, 44(23-24), e70287. doi:10.1002/sim.70287

Steingrimsson, J. A., Gatsonis, C., Li, B., & Dahabreh, I. J. (2023). "Transporting a prediction model for use in a new target population." American Journal of Epidemiology, 192(2), 296-304.

Examples

# Generate example data
set.seed(123)
n <- 500
x <- rnorm(n)
a <- rbinom(n, 1, plogis(-0.5 + 0.5 * x))
y <- rbinom(n, 1, plogis(-1 + x - 0.5 * a))
pred <- plogis(-1 + 0.8 * x)

# Estimate counterfactual calibration curve with different estimators
result_ipw <- cf_calibration(
  predictions = pred,
  outcomes = y,
  treatment = a,
  covariates = data.frame(x = x),
  treatment_level = 0,
  estimator = "ipw"
)

result_dr <- cf_calibration(
  predictions = pred,
  outcomes = y,
  treatment = a,
  covariates = data.frame(x = x),
  treatment_level = 0,
  estimator = "dr"
)
print(result_dr)
#> 
#> Counterfactual CALIBRATION Estimation
#> ---------------------------------------- 
#> Estimator: dr 
#> Treatment level: 0 
#> N observations: 500 
#> 
#> Calibration Metrics:
#>   ICI (Integrated Calibration Index): 0.0444 
#>   E50 (Median absolute error): 0.0184 
#>   E90 (90th percentile error): 0.0996 
#>   Emax (Maximum error): 0.3784 
#> 
# plot(result_dr)  # If ggplot2 is available

# With bootstrap confidence bands
# result_boot <- cf_calibration(
#   predictions = pred, outcomes = y, treatment = a,
#   covariates = data.frame(x = x), treatment_level = 0,
#   se_method = "bootstrap", n_boot = 200
# )
# plot(result_boot)  # Shows confidence bands