Creates a learner specification that can be passed to the propensity_model or outcome_model arguments of cf_mse(), cf_auc(), cf_calibration(), and their transportability variants. When an ml_learner specification is supplied, cross-fitting is used automatically so that inference remains valid.

Usage

ml_learner(
  method = c("ranger", "xgboost", "grf", "glmnet", "superlearner", "custom"),
  ...,
  fit_fun = NULL,
  predict_fun = NULL
)

Arguments

method

Character string specifying the learner type: one of "ranger", "xgboost", "grf", "glmnet", "superlearner", or "custom".

...

Additional arguments passed to the fitting function.

fit_fun

For method = "custom", a function with signature function(formula, data, family, ...) that fits a model to data and returns the fitted model object.

predict_fun

For method = "custom", a function with signature function(object, newdata, ...) that takes the fitted model object and returns predicted probabilities for newdata.

Value

An object of class ml_learner containing the learner specification.
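A minimal sketch of constructing a specification; the class of the returned object is as documented above, though its internal structure is not specified here:

```r
# Build a ranger specification; extra arguments are stored and passed
# to the fitting function at cross-fitting time.
spec <- ml_learner("ranger", num.trees = 1000, min.node.size = 5)
class(spec)  # "ml_learner"
```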

Details

Supported Learners

ranger: Fast random forest implementation. Key arguments:

  • num.trees: Number of trees (default: 500)

  • mtry: Number of variables to sample at each split

  • min.node.size: Minimum node size

xgboost: Gradient boosting. Key arguments:

  • nrounds: Number of boosting rounds (default: 100)

  • max_depth: Maximum tree depth (default: 6)

  • eta: Learning rate (default: 0.3)

grf: Generalized random forests with built-in honesty. Key arguments:

  • num.trees: Number of trees (default: 2000)

  • honesty: Whether to use honest estimation (default: TRUE)

glmnet: Elastic net regularization with cross-validation. Key arguments:

  • alpha: Elastic net mixing parameter (0 = ridge, 1 = lasso; default: 1)

  • nfolds: Number of CV folds for lambda selection (default: 10)

superlearner: Ensemble of multiple learners. Key arguments:

  • SL.library: Vector of learner names (default: c("SL.glm", "SL.ranger"))
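As an illustration, each built-in learner is selected the same way, with the key arguments above passed through ...; the values shown are the documented defaults (or, for alpha, an in-between mixing value):

```r
# Hedged sketches of specifications for each built-in learner.
rf_spec  <- ml_learner("ranger", num.trees = 500, min.node.size = 10)
xgb_spec <- ml_learner("xgboost", nrounds = 100, max_depth = 6, eta = 0.3)
grf_spec <- ml_learner("grf", num.trees = 2000, honesty = TRUE)
net_spec <- ml_learner("glmnet", alpha = 0.5, nfolds = 10)  # blend ridge/lasso
sl_spec  <- ml_learner("superlearner",
                       SL.library = c("SL.glm", "SL.ranger"))
```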

Examples

if (FALSE) { # \dontrun{
# Random forest for propensity score
cf_mse(
  Y = outcome, A = treatment, predictions = preds, data = df,
  propensity_formula = treatment ~ x1 + x2,
  propensity_model = ml_learner("ranger", num.trees = 500),
  cross_fit = TRUE
)

# XGBoost with custom parameters
cf_mse(
  Y = outcome, A = treatment, predictions = preds, data = df,
  propensity_formula = treatment ~ x1 + x2,
  propensity_model = ml_learner("xgboost", nrounds = 200, max_depth = 4),
  cross_fit = TRUE
)

# Custom learner
my_fit <- function(formula, data, family, ...) {
  # Fit a logistic regression; the family argument is ignored here
  # because the propensity model is always binary.
  glm(formula, data = data, family = binomial())
}
my_predict <- function(object, newdata, ...) {
  predict(object, newdata = newdata, type = "response")
}
cf_mse(
  ...,
  propensity_model = ml_learner("custom", fit_fun = my_fit,
                                 predict_fun = my_predict)
)
} # }