Creates a learner specification that can be passed to the propensity_model
or outcome_model arguments of cf_mse(), cf_auc(), cf_calibration(),
and their transportability variants. When an ml_learner specification is
provided, cross-fitting is applied automatically so that inference remains valid.
Usage
ml_learner(
method = c("ranger", "xgboost", "grf", "glmnet", "superlearner", "custom"),
...,
fit_fun = NULL,
predict_fun = NULL
)
Arguments
- method
Character string specifying the learner type:
- "ranger": Random forest via ranger::ranger()
- "xgboost": Gradient boosting via xgboost::xgboost()
- "grf": Generalized random forest via grf::regression_forest() or grf::probability_forest()
- "glmnet": Regularized regression via glmnet::cv.glmnet()
- "superlearner": Ensemble via SuperLearner::SuperLearner()
- "custom": User-supplied fit and predict functions
- ...
Additional arguments passed to the fitting function.
- fit_fun
For method = "custom", a function with signature function(formula, data, family, ...) that returns a fitted model object.
- predict_fun
For method = "custom", a function with signature function(object, newdata, ...) that returns predicted probabilities.
Details
Supported Learners
ranger: Fast random forest implementation. Key arguments:
- num.trees: Number of trees (default: 500)
- mtry: Number of variables to sample at each split
- min.node.size: Minimum node size
xgboost: Gradient boosting. Key arguments:
- nrounds: Number of boosting rounds (default: 100)
- max_depth: Maximum tree depth (default: 6)
- eta: Learning rate (default: 0.3)
grf: Generalized random forests with built-in honesty. Key arguments:
- num.trees: Number of trees (default: 2000)
- honesty: Whether to use honest estimation (default: TRUE)
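As an illustration (using only the arguments documented above), a grf specification might look like:

```r
# Honest generalized random forest with the documented defaults made explicit;
# extra arguments are forwarded to grf's fitting function
ml_learner("grf", num.trees = 2000, honesty = TRUE)
```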
glmnet: Elastic net regularization with cross-validation. Key arguments:
- alpha: Elastic net mixing parameter (0 = ridge, 1 = lasso; default: 1)
- nfolds: Number of CV folds for lambda selection (default: 10)
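For example (a sketch using the arguments documented above), a cross-validated lasso specification might be:

```r
# Lasso (alpha = 1) with 20-fold CV for lambda selection;
# set alpha = 0 for ridge or an intermediate value for elastic net
ml_learner("glmnet", alpha = 1, nfolds = 20)
```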
superlearner: Ensemble of multiple learners. Key arguments:
- SL.library: Character vector of learner names (default: c("SL.glm", "SL.ranger"))
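As a sketch, an ensemble specification could extend the default library; SL.glmnet is a standard SuperLearner wrapper and is used here as an illustrative assumption:

```r
# Ensemble over a GLM, a random forest, and a lasso
ml_learner("superlearner",
           SL.library = c("SL.glm", "SL.ranger", "SL.glmnet"))
```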
Examples
if (FALSE) { # \dontrun{
# Random forest for propensity score
cf_mse(
  Y = outcome, A = treatment, predictions = preds, data = df,
  propensity_formula = treatment ~ x1 + x2,
  propensity_model = ml_learner("ranger", num.trees = 500),
  cross_fit = TRUE
)

# XGBoost with custom parameters
cf_mse(
  Y = outcome, A = treatment, predictions = preds, data = df,
  propensity_formula = treatment ~ x1 + x2,
  propensity_model = ml_learner("xgboost", nrounds = 200, max_depth = 4),
  cross_fit = TRUE
)

# Custom learner
my_fit <- function(formula, data, family, ...) {
  glm(formula, data = data, family = binomial())
}
my_predict <- function(object, newdata, ...) {
  predict(object, newdata = newdata, type = "response")
}
cf_mse(
  ...,
  propensity_model = ml_learner("custom", fit_fun = my_fit,
                                predict_fun = my_predict)
)
} # }
