Causal Mediation Analysis

cmest implements six causal mediation analysis approaches: the regression-based approach by Valeri et al. (2013) and VanderWeele et al. (2014), the weighting-based approach by VanderWeele et al. (2014), the inverse odds ratio weighting approach by Tchetgen Tchetgen (2013), the natural effect model by Vansteelandt et al. (2012), the marginal structural model by VanderWeele et al. (2017), and the g-formula approach by Robins (1986).

cmest(
  data = NULL,
  model = "rb",
  full = TRUE,
  casecontrol = FALSE,
  yrare = NULL,
  yprevalence = NULL,
  estimation = "imputation",
  inference = "bootstrap",
  outcome = NULL,
  event = NULL,
  exposure = NULL,
  mediator = NULL,
  EMint = NULL,
  basec = NULL,
  postc = NULL,
  yreg = NULL,
  mreg = NULL,
  wmnomreg = NULL,
  wmdenomreg = NULL,
  ereg = NULL,
  postcreg = NULL,
  astar = 0,
  a = 1,
  mval = NULL,
  yval = NULL,
  basecval = NULL,
  nboot = 200,
  boot.ci.type = "per",
  nRep = 5,
  multimp = FALSE,
  args_mice = NULL
)

# S3 method for cmest
print(x, ...)

# S3 method for cmest
summary(object, ...)

# S3 method for summary.cmest
print(x, digits = 4, ...)

Arguments

data	dataset
model	causal mediation analysis approach. `rb`, `wb`, `iorw`, `ne`, `msm` and `gformula` are implemented. Default is `rb`. See `Details`.
full	a logical value. If `TRUE`, output a full list of causal effects; if `FALSE`, output a reduced list of causal effects. Default is `TRUE`.
casecontrol	a logical value. `TRUE` indicates a case-control study in which the first level of the outcome is treated as the control and the second level of the outcome is treated as the case. Default is `FALSE`.
yrare	a logical value (used when `casecontrol` is `TRUE`). `TRUE` indicates the case is rare.
yprevalence	the prevalence of the case (used when `casecontrol` is `TRUE`).
estimation	method for estimating causal effects. `paramfunc` and `imputation` are implemented (the first 4 letters are enough). Default is `imputation`. See `Details`.
inference	method for estimating standard errors of causal effects. `delta` and `bootstrap` are implemented (the first 4 letters are enough). Default is `bootstrap`. See `Details`.
outcome	variable name of the outcome.
event	variable name of the event (used when `yreg` is `coxph`, `aft_exp`, or `aft_weibull`).
exposure	variable name of the exposure.
mediator	a vector of variable name(s) of the mediator(s).
EMint	a logical value indicating the existence of exposure-mediator interaction in `yreg` (used when `model` is not `iorw`). `TRUE` when there is exposure-mediator interaction in `yreg`. If `TRUE` and character `yreg` is provided, the outcome regression includes exposure-mediator interaction(s) between the exposure and each of the mediator(s).
basec	a vector of variable name(s) of the exposure-outcome confounder(s), exposure-mediator confounder(s) and mediator-outcome confounder(s) not affected by the exposure
postc	a vector of variable name(s) of the mediator-outcome confounder(s) affected by the exposure following the temporal order
yreg	outcome regression model. See `Details`.
mreg	a list specifying a regression model for each variable in `mediator` (used when `model` is `rb`, `msm` or `gformula`). The order of regression models must follow the order of variables in `mediator`. See `Details`.
wmnomreg	a list specifying a regression model for calculating the numerators of weights with respect to each variable in `mediator` (used when `model` is `msm`). The order of regression models must follow the order of variables in `mediator`. See `Details`.
wmdenomreg	a list specifying a regression model for calculating the denominators of weights with respect to each variable in `mediator` (used when `model` is `msm`). The order of regression models must follow the order of variables in `mediator`. See `Details`.
ereg	exposure regression model for calculating weights with respect to the exposure (used when `model` is `wb` or `msm` with a non-empty `basec` or when `model` is `iorw`). See `Details`.
postcreg	a list specifying a regression model for each variable in `postc` (used when `model` is `gformula`). The order of regression models must follow the order of variables in `postc`. See `Details`.
astar	the control value for the exposure. Default is `0`.
a	the active value for the exposure. Default is `1`.
mval	a list specifying a value for each variable in `mediator` at which the variable is controlled (used when `model` is `rb`, `wb`, `ne`, `msm` or `gformula`).
yval	the value of the outcome at which causal effects on the risk/odds ratio scale are estimated (used when the outcome is categorical).
basecval	a list specifying a conditional value for each variable in `basec` conditional on which causal effects are estimated (used when `estimation` is `paramfunc`). The order of conditional values must follow the order of variables in `basec`. If `NULL`, mean values of variable(s) in `basec` are used.
nboot	the number of bootstrap resamples (used when `inference` is `bootstrap`). Default is `200`.
boot.ci.type	the type of bootstrap confidence interval. If `per`, percentile bootstrap confidence intervals are estimated; if `bca`, bias-corrected and accelerated (BCa) bootstrap confidence intervals are estimated. Default is `per`.
nRep	number of replications or hypothetical values of the exposure to sample for each observation unit (used when `model` is `ne`). Default is `5`.
multimp	a logical value (used when `data` contains missing values). If `TRUE`, conduct multiple imputations using the mice function. Default is `FALSE`.
args_mice	a list of additional arguments passed to the mice function. See mice for details.
x	an object of class `cmest`
object	an object of class `cmest`
digits	minimal number of significant digits. See print.default.

Value

An object of class cmest is returned:

call

the function call,

data

the dataset,

methods

a list of methods used which may include model, full, casecontrol, yprevalence, yrare, estimation, inference, nboot, boot.ci.type and nRep,

variables

a list of variables used which may include outcome, event, exposure, mediator, EMint, basec and postc,

reg.input

a list of regressions input,

multimp

a list of arguments used for multiple imputation,

ref

reference values used which may include a, astar, mval, yval and basecval,

reg.output

a list of regressions output. If multimp is TRUE, reg.output contains the regressions fitted on each of the imputed datasets,

effect.pe

point estimates of causal effects,

effect.se

standard errors of causal effects,

effect.ci.low

the lower limits of 95% confidence intervals of causal effects,

effect.ci.high

the upper limits of 95% confidence intervals of causal effects,

effect.pval

p-values of causal effects,

...

Details

Regressions

Each regression in yreg, mreg, wmnomreg, wmdenomreg, ereg and postcreg can be specified by a user-defined regression object or the character name of the regression.

The Character Name of A Regression

linear: linear regression fitted by glm with family = gaussian()
logistic: logistic regression fitted by glm with family = binomial()
loglinear: log-linear regression fitted by glm with family = poisson() for a binary response
poisson: Poisson regression fitted by glm with family = poisson() for a count response
quasipoisson: quasipoisson regression fitted by glm with family = quasipoisson()
negbin: negative binomial regression fitted by glm.nb
multinomial: multinomial regression fitted by multinom
ordinal: ordered logistic regression fitted by polr
coxph: Cox proportional hazards model fitted by coxph
aft_exp: accelerated failure time model fitted by survreg with dist = "exponential"
aft_weibull: accelerated failure time model fitted by survreg with dist = "weibull"

coxph, aft_exp and aft_weibull are currently not implemented for mreg, wmnomreg, wmdenomreg, ereg and postcreg.
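
As a rough guide to what some of the character names stand for, the sketch below fits the base R calls that the descriptions above name for linear, logistic and poisson. The simulated data frame d and all of its column names are made up for illustration, and the exact calls made inside cmest may differ.

```r
# Simulated data; d, y_cont, y_bin, y_count, a, m and c1 are hypothetical names.
set.seed(1)
n <- 200
d <- data.frame(a = rbinom(n, 1, 0.5), c1 = rnorm(n))
d$m <- rnorm(n, mean = 0.5 * d$a + 0.3 * d$c1)
d$y_cont  <- rnorm(n, mean = 1 + 0.4 * d$a + 0.6 * d$m + 0.2 * d$c1)
d$y_bin   <- rbinom(n, 1, plogis(-1 + 0.5 * d$a + 0.5 * d$m))
d$y_count <- rpois(n, exp(0.2 + 0.3 * d$a + 0.2 * d$m))

# "linear": glm with a gaussian family
fit_linear <- glm(y_cont ~ a + m + c1, family = gaussian(), data = d)
# "logistic": glm with a binomial family (logit link by default)
fit_logistic <- glm(y_bin ~ a + m + c1, family = binomial(), data = d)
# "poisson": glm with a poisson family for a count response
fit_poisson <- glm(y_count ~ a + m + c1, family = poisson(), data = d)

# Show the family used by each fit
sapply(list(linear = fit_linear, logistic = fit_logistic, poisson = fit_poisson),
       function(fit) family(fit)$family)
```

Objects such as fit_linear are also the kind of object that can be supplied directly as a user-defined regression.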

The User-defined Regression Object

A user-defined regression object can be fitted by lm, glm, glm.nb, gam, multinom, polr, coxph and survreg. Objects fitted by coxph and survreg are currently not supported for mreg, wmnomreg, wmdenomreg, ereg and postcreg.

The cmest function calculates weights for regressions when weighting is required. If a user-defined regression object is fitted with prior weights, the final weights for this regression object are constructed by multiplying the prior weights and the weights calculated inside the cmest function.
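
The weight multiplication described above can be pictured with the following base R sketch. It only illustrates the arithmetic: the data, the column names prior_w, internal_w and final_w, and the uniformly drawn "internal" weights are all hypothetical stand-ins, not the weights cmest actually computes.

```r
set.seed(2)
n <- 100
d <- data.frame(a = rbinom(n, 1, 0.5))
d$y <- rnorm(n, mean = 1 + 0.5 * d$a)
d$prior_w <- runif(n, 0.5, 2)   # hypothetical prior (e.g. sampling) weights

# A user-defined regression object fitted with prior weights
fit <- glm(y ~ a, family = gaussian(), data = d, weights = prior_w)

# Stand-in for weights computed during the analysis (made up here)
d$internal_w <- runif(n, 0.5, 1.5)

# Final weights: prior weights multiplied by the internally computed weights
d$final_w <- unname(weights(fit, type = "prior")) * d$internal_w

# Refit the same model with the final weights
refit <- glm(y ~ a, family = gaussian(), data = d, weights = final_w)
coef(refit)
```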

Causal Mediation Analysis Approaches

Let Y denote outcome, A denote exposure, M=(M_1,...,M_k)^T denote mediator, C denote basec, L=(L_1,...,L_s)^T denote postc.

rb: the regression-based approach by Valeri et al. (2013) and VanderWeele et al. (2014). yreg and mreg are required. If specified as a user-defined regression object, yreg should regress Y on A, M and C and mreg[p] should regress M_p on A and C for p=1,...,k.
wb: the weighting-based approach by VanderWeele et al. (2014). yreg is required. When basec is not empty, ereg is also required and A must be categorical. If specified as a user-defined regression object, yreg should regress Y on A, M and C and ereg should regress A on C.
iorw: the inverse odds ratio weighting approach by Tchetgen Tchetgen (2013). yreg and ereg are required and A must be categorical. If specified as a user-defined regression object, yreg should regress Y on A and C and ereg should regress A on M and C.
ne: the natural effect model by Vansteelandt et al. (2012). yreg is required. If specified as a user-defined regression object, yreg should regress Y on A, M and C. The variables in the formula of yreg must follow the order of A, M and C, i.e., the first variable must point to the exposure, the variable(s) right after the exposure must point to the mediator(s), e.g., Y ~ A + M_1 + M_2 + A*M_1 + C.
msm: the marginal structural model by VanderWeele et al. (2017). yreg, mreg, wmnomreg and wmdenomreg are required and all mediators must be categorical. When basec is not empty, ereg is also required and A must be categorical. If specified as a user-defined regression object, yreg should regress Y on A and M; mreg[p] should regress M_p on A for p=1,...,k; wmnomreg[p] should regress M_p on A, M_1, ..., M_{p-1} for p=1,...,k; wmdenomreg[p] should regress M_p on A, M_1, ..., M_{p-1}, C and L for p=1,...,k; and ereg should regress A on C.
gformula: the g-formula approach by Robins (1986). yreg, mreg are required. postcreg is also required when postc is not empty. If specified as a user-defined regression object, yreg should regress Y on A, M, C and L, mreg[p] should regress M_p on A, C and L for p=1,...,k, postcreg[q] should regress L_q on A and C for q=1,...,s.

When postc is not empty, only msm and gformula can be used.

When there are mediator-mediator interactions in yreg, only wb, iorw, ne and msm can be used.

Estimation Methods

paramfunc: closed-form parameter function estimation (only available when model = "rb" and length(mediator) = 1). The point estimate of each causal effect is obtained by a closed-form formula of regression coefficients. Effects conditional on basecval are estimated.
imputation: direct counterfactual imputation estimation. The point estimate of each causal effect is obtained by imputing counterfactuals directly.
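
The idea behind direct counterfactual imputation can be sketched in base R for a single binary mediator and a continuous outcome. This is a simplified illustration under assumed models and simulated data (all object names are hypothetical); it is not the code cmest runs.

```r
set.seed(3)
n <- 2000
d <- data.frame(a = rbinom(n, 1, 0.5), c1 = rnorm(n))
d$m <- rbinom(n, 1, plogis(-0.5 + 1.0 * d$a + 0.3 * d$c1))
d$y <- rnorm(n, mean = 1 + 0.5 * d$a + 0.8 * d$m + 0.2 * d$c1)

mfit <- glm(m ~ a + c1, family = binomial(), data = d)   # mediator model
yfit <- glm(y ~ a + m + c1, family = gaussian(), data = d)  # outcome model

# Mean counterfactual outcome with exposure set to a_y and the mediator
# drawn from its fitted distribution under exposure a_m.
mean_y <- function(a_y, a_m) {
  p_m <- predict(mfit, newdata = transform(d, a = a_m), type = "response")
  m_sim <- rbinom(nrow(d), 1, p_m)
  mean(predict(yfit, newdata = transform(d, a = a_y, m = m_sim),
               type = "response"))
}

a <- 1
astar <- 0
y_a_ma         <- mean_y(a, a)          # Y(a, M(a))
y_a_mastar     <- mean_y(a, astar)      # Y(a, M(astar))
y_astar_mastar <- mean_y(astar, astar)  # Y(astar, M(astar))

pnde <- y_a_mastar - y_astar_mastar
tnie <- y_a_ma - y_a_mastar
te   <- y_a_ma - y_astar_mastar
c(pnde = pnde, tnie = tnie, te = te)
```

Because the counterfactual means are computed once and reused, te equals pnde + tnie by construction in this sketch.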

To use paramfunc, yreg and mreg must be specified by the character name of the regression. yreg can be chosen from linear, logistic, loglinear, poisson, quasipoisson, negbin, coxph, aft_exp and aft_weibull. mreg can be chosen from linear, logistic and multinomial.

To use paramfunc with yreg = "logistic" or yreg = "coxph", the outcome must be rare.
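
For intuition, the closed-form expressions for a linear outcome regression with exposure-mediator interaction and a linear mediator regression, as given in Valeri et al. (2013), can be evaluated by hand. The sketch below uses simulated data and hypothetical names (d, c1, mval, cval) and conditions on the mean of the single confounder; it is an illustration of the formulas, not the cmest implementation.

```r
set.seed(4)
n <- 500
d <- data.frame(a = rbinom(n, 1, 0.5), c1 = rnorm(n))
d$m <- rnorm(n, mean = 0.2 + 0.7 * d$a + 0.3 * d$c1)
d$y <- rnorm(n, mean = 1 + 0.5 * d$a + 0.8 * d$m + 0.4 * d$a * d$m + 0.2 * d$c1)

mfit <- lm(m ~ a + c1, data = d)            # E[M | a, c]
yfit <- lm(y ~ a + m + a:m + c1, data = d)  # E[Y | a, m, c]
beta  <- coef(mfit)
theta <- coef(yfit)

a <- 1
astar <- 0
mval <- 0             # mediator value used for the controlled direct effect
cval <- mean(d$c1)    # conditional value of the confounder

# Fitted mediator mean under the control exposure level
mu_astar <- beta[["(Intercept)"]] + beta[["a"]] * astar + beta[["c1"]] * cval

cde  <- (theta[["a"]] + theta[["a:m"]] * mval) * (a - astar)
pnde <- (theta[["a"]] + theta[["a:m"]] * mu_astar) * (a - astar)
tnie <- (theta[["m"]] + theta[["a:m"]] * a) * beta[["a"]] * (a - astar)
te   <- pnde + tnie
pm   <- tnie / te
c(cde = cde, pnde = pnde, tnie = tnie, te = te, pm = pm)
```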

Inference Methods

delta: delta method (only available when estimation = "paramfunc"). The standard errors of causal effects are obtained by the delta method. The confidence intervals of causal effects are obtained by normal distribution approximation.
bootstrap: bootstrapping. The standard errors of causal effects are obtained by the standard deviations of bootstrapped results. The confidence intervals of causal effects are obtained by percentiles of bootstrapped results when boot.ci.type = "per", or as bias-corrected and accelerated (BCa) intervals when boot.ci.type = "bca".
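
The two inference methods can be compared on a toy example. The sketch below estimates a product-of-coefficients indirect effect from two linear models, then computes a bootstrap standard error with a percentile interval and a first-order delta-method standard error with a normal-approximation interval. The data, the helper nie() and the assumption that the two models' estimates are uncorrelated are all illustrative choices, not a description of the internals of cmest.

```r
set.seed(5)
n <- 300
d <- data.frame(a = rbinom(n, 1, 0.5))
d$m <- rnorm(n, mean = 0.7 * d$a)
d$y <- rnorm(n, mean = 0.5 * d$a + 0.8 * d$m)

# Indirect effect of a one-unit exposure change (no interaction): beta_a * theta_m
nie <- function(dat) {
  b <- coef(lm(m ~ a, data = dat))[["a"]]
  t <- coef(lm(y ~ a + m, data = dat))[["m"]]
  b * t
}

est <- nie(d)

# Bootstrap: resample rows with replacement and re-estimate
nboot <- 200
boots <- replicate(nboot, nie(d[sample(nrow(d), replace = TRUE), ]))
se_boot <- sd(boots)
ci_boot <- quantile(boots, c(0.025, 0.975))

# Delta method: first-order variance of a product of two estimates,
# treating the estimates from the two models as uncorrelated
mfit <- lm(m ~ a, data = d)
yfit <- lm(y ~ a + m, data = d)
b1 <- coef(mfit)[["a"]]
t2 <- coef(yfit)[["m"]]
se_delta <- sqrt(t2^2 * vcov(mfit)["a", "a"] + b1^2 * vcov(yfit)["m", "m"])
ci_delta <- est + c(-1, 1) * qnorm(0.975) * se_delta

rbind(bootstrap = c(se = se_boot, low = ci_boot[[1]], high = ci_boot[[2]]),
      delta     = c(se = se_delta, low = ci_delta[1], high = ci_delta[2]))
```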

Estimated Causal Effects

For a continuous outcome, the causal effects on the difference scale are estimated. For a categorical, count or survival outcome, the causal effects on the ratio scale are estimated. The interpretation of the ratio depends on the type of the outcome and it can be risk ratio for a categorical outcome, rate ratio for a count outcome, hazard ratio for a survival outcome fitted by coxph, mean survival ratio for a survival outcome fitted by survreg, etc.

Continuous Outcome

When model = "rb", "wb", "ne", "msm" or "gformula" with an empty postc and EMint is TRUE, cde (controlled direct effect), pnde (pure natural direct effect), tnde (total natural direct effect), pnie (pure natural indirect effect), tnie (total natural indirect effect), te (total effect), intref (reference interaction), intmed (mediated interaction), cde(prop) (proportion cde), intref(prop) (proportion intref), intmed(prop) (proportion intmed), pnie(prop) (proportion pnie), pm (proportion mediated), int (proportion attributable to interaction) and pe (proportion eliminated) are estimated.

When model = "rb", "wb", "ne", "msm" or "gformula" with an empty postc and EMint is FALSE, cde, pnde, tnde, pnie, tnie, te and pm are estimated.

When postc is not empty, pnde, tnde, pnie, tnie, intref, intmed, intref(prop), intmed(prop), pnie(prop), pm, int and pe are replaced by their randomized analogues rpnde, rtnde, rpnie, rtnie, rintref, rintmed, rintref(prop), rintmed(prop), rpnie(prop), rpm, rint and rpe.

When model = "iorw", te, pnde, tnie and pm are estimated.
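
For orientation, the quantities above are linked by the four-way decomposition of VanderWeele (2014). The identities below are stated from that paper as commonly presented, not taken from the cmest source, so treat them as a reading aid:

```latex
\begin{aligned}
te   &= cde + intref + intmed + pnie, \\
pnde &= cde + intref, \qquad tnde = cde + intref + intmed, \\
tnie &= pnie + intmed, \qquad te = pnde + tnie = tnde + pnie, \\
pm   &= \frac{tnie}{te}, \qquad
int   = \frac{intref + intmed}{te}, \qquad
pe    = \frac{te - cde}{te}.
\end{aligned}
```

Each proportion such as cde(prop) is the corresponding component divided by te.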

Categorical, Count or Survival Outcome

When model = "rb", "wb", "ne", "msm" or "gformula" with an empty postc and EMint is TRUE, Rcde (cde ratio), Rpnde (pnde ratio), Rtnde (tnde ratio), Rpnie (pnie ratio), Rtnie (tnie ratio), Rte (te ratio), ERcde (excess ratio due to cde), ERintref (excess ratio due to intref), ERintmed (excess ratio due to intmed), ERpnie (excess ratio due to pnie), ERcde(prop) (proportion ERcde), ERintref(prop) (proportion ERintref), ERintmed(prop) (proportion ERintmed), ERpnie(prop) (proportion ERpnie), pm, int and pe are estimated.

When model = "rb", "wb", "ne", "msm" or "gformula" with an empty postc and EMint is FALSE, Rcde, Rpnde, Rtnde, Rpnie, Rtnie, Rte and pm are estimated.

When model = "msm" or "gformula" with a non-empty postc, Rpnde, Rtnde, Rpnie, Rtnie, ERintref, ERintmed, ERpnie, ERintref(prop), ERintmed(prop), ERpnie(prop), pm, int and pe are replaced by their randomized analogues rRpnde, rRtnde, rRpnie, rRtnie, rERintref, rERintmed, rERpnie, rERintref(prop), rERintmed(prop), rERpnie(prop), rpm, rint and rpe.

When model = "iorw", Rte, Rpnde, Rtnie and pm are estimated.
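
On the ratio scale, the analogous relationships from VanderWeele (2014) are multiplicative for the natural effects and additive for the excess ratios. The identities below are a hedged summary of that literature, offered as an assumption about how the listed quantities relate rather than a statement of what cmest computes internally:

```latex
\begin{aligned}
Rte     &= Rpnde \times Rtnie = Rtnde \times Rpnie, \\
Rte - 1 &= ERcde + ERintref + ERintmed + ERpnie, \\
pm      &= \frac{Rpnde\,(Rtnie - 1)}{Rte - 1}, \qquad
int      = \frac{ERintref + ERintmed}{Rte - 1}, \qquad
pe       = \frac{ERintref + ERintmed + ERpnie}{Rte - 1}.
\end{aligned}
```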

Methods (by generic)

print(cmest): Print the results of cmest nicely
summary(cmest): Summarize the results of cmest nicely

Functions

print(summary.cmest): Print the summary of cmest nicely

References

Valeri L, VanderWeele TJ (2013). Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychological Methods. 18(2): 137 - 150.

VanderWeele TJ, Vansteelandt S (2014). Mediation analysis with multiple mediators. Epidemiologic Methods. 2(1): 95 - 115.

Tchetgen Tchetgen EJ (2013). Inverse odds ratio-weighted estimation for causal mediation analysis. Statistics in Medicine. 32: 4567 - 4580.

Nguyen QC, Osypuk TL, Schmidt NM, Glymour MM, Tchetgen Tchetgen EJ (2015). Practical guidance for conducting mediation analysis with multiple mediators using inverse odds ratio weighting. American Journal of Epidemiology. 181(5): 349 - 356.

VanderWeele TJ, Tchetgen Tchetgen EJ (2017). Mediation analysis with time varying exposures and mediators. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 79(3): 917 - 938.

Robins JM (1986). A new approach to causal inference in mortality studies with a sustained exposure period-Application to control of the healthy worker survivor effect. Mathematical Modelling. 7: 1393 - 1512.

Vansteelandt S, Bekaert M, Lange T (2012). Imputation Strategies for the Estimation of Natural Direct and Indirect Effects. Epidemiologic Methods. 1(1): 131 - 158.

Steen J, Loeys T, Moerkerke B, Vansteelandt S (2017). Medflex: an R package for flexible mediation analysis using natural effect models. Journal of Statistical Software. 76(11).

VanderWeele TJ (2014). A unification of mediation and interaction: a 4-way decomposition. Epidemiology. 25(5): 749 - 761.

Imai K, Keele L, Tingley D (2010). A general approach to causal mediation analysis. Psychological Methods. 15(4): 309 - 334.

Schomaker M, Heumann C (2018). Bootstrap inference when using multiple imputation. Statistics in Medicine. 37(14): 2252 - 2266.

Efron B (1987). Better Bootstrap Confidence Intervals. Journal of the American Statistical Association. 82(397): 171 - 185.

Examples


if (FALSE) {
library(CMAverse)

# single-mediator case with rb, no exposure-mediator interaction
exp1 <- cmest(data = cma2020, model = "rb", outcome = "contY", 
exposure = "A", mediator = "M2", basec = c("C1", "C2"), 
EMint = FALSE, mreg = list("multinomial"), yreg = "linear", 
astar = 0, a = 1, mval = list("M2_0"), estimation = "paramfunc", 
inference = "delta")
summary(exp1)

# single-mediator case with rb, with exposure-mediator interaction
exp2 <- cmest(data = cma2020, model = "rb", outcome = "contY", 
exposure = "A", mediator = "M2", basec = c("C1", "C2"), 
EMint = TRUE, mreg = list("multinomial"), yreg = "linear", 
astar = 0, a = 1, mval = list("M2_0"), estimation = "paramfunc", 
inference = "delta")
summary(exp2)

# multiple-mediator case with rb
# 10 bootstrap resamples are used for illustration
exp3 <- cmest(data = cma2020, model = "rb", outcome = "contY", 
exposure = "A", mediator = c("M1", "M2"), basec = c("C1", "C2"), 
EMint = TRUE, mreg = list("logistic", "multinomial"), 
yreg = "linear", astar = 0, a = 1, mval = list(0, "M2_0"), 
estimation = "imputation", inference = "bootstrap", nboot = 10,
boot.ci.type = "bca")

# multiple-mediator case with ne
exp4 <- cmest(data = cma2020, model = "ne", outcome = "contY", EMint = TRUE,
exposure = "A", mediator = c("M1", "M2"), basec = c("C1", "C2"), 
yreg = glm(contY ~ A + M1 + M2 + A*M1 + A*M2 + C1 + C2, family = gaussian, data = cma2020), 
astar = 0, a = 1, mval = list(0, "M2_0"), estimation = "imputation", 
inference = "bootstrap", nboot = 10)

# case control study with msm
exp5 <- cmest(data = cma2020, model = "msm", casecontrol = TRUE, 
yrare = TRUE, outcome = "binY", exposure = "A", 
mediator = c("M1", "M2"), EMint = TRUE, basec = c("C1", "C2"), yreg = "logistic", 
ereg = "logistic", mreg = list(glm(M1 ~ A, family = binomial, 
data = cma2020), nnet::multinom(M2 ~ A, data = cma2020, trace = FALSE)), 
wmnomreg = list(glm(M1 ~ A, family = binomial, data = cma2020), 
nnet::multinom(M2 ~ A + M1, data = cma2020, trace = FALSE)),
wmdenomreg = list(glm(M1 ~ A + C1 + C2, family = binomial, data = cma2020), 
nnet::multinom(M2 ~ A + M1 + C1 + C2, data = cma2020, trace = FALSE)), astar = 0, a = 1, 
mval = list(0, "M2_0"), estimation = "imputation", 
inference = "bootstrap", nboot = 10)
}