cmest implements six causal mediation analysis approaches: the regression-based approach by Valeri et al. (2013) and VanderWeele et al. (2014), the weighting-based approach by VanderWeele et al. (2014), the inverse odds ratio weighting approach by Tchetgen Tchetgen (2013), the natural effect model by Vansteelandt et al. (2012), the marginal structural model by VanderWeele et al. (2017), and the g-formula approach by Robins (1986).

cmest(
  data = NULL,
  model = "rb",
  full = TRUE,
  casecontrol = FALSE,
  yrare = NULL,
  yprevalence = NULL,
  estimation = "imputation",
  inference = "bootstrap",
  outcome = NULL,
  event = NULL,
  exposure = NULL,
  mediator = NULL,
  EMint = NULL,
  basec = NULL,
  postc = NULL,
  yreg = NULL,
  mreg = NULL,
  wmnomreg = NULL,
  wmdenomreg = NULL,
  ereg = NULL,
  postcreg = NULL,
  astar = 0,
  a = 1,
  mval = NULL,
  yval = NULL,
  basecval = NULL,
  nboot = 200,
  boot.ci.type = "per",
  nRep = 5,
  multimp = FALSE,
  args_mice = NULL
)

# S3 method for cmest
print(x, ...)

# S3 method for cmest
summary(object, ...)

# S3 method for summary.cmest
print(x, digits = 4, ...)

Arguments

data

dataset

model

causal mediation analysis approach. rb, wb, iorw, ne, msm and gformula are implemented. Default is rb. See Details.

full

a logical value. If TRUE, output a full list of causal effects; if FALSE, output a reduced list of causal effects. Default is TRUE.

casecontrol

a logical value. TRUE indicates a case-control study in which the first level of the outcome is treated as the control and the second level of the outcome is treated as the case. Default is FALSE.

yrare

a logical value (used when casecontrol is TRUE). TRUE indicates the case is rare.

yprevalence

the prevalence of the case (used when casecontrol is TRUE).

estimation

method for estimating causal effects. paramfunc and imputation are implemented (the first 4 letters are enough). Default is imputation. See Details.

inference

method for estimating standard errors of causal effects. delta and bootstrap are implemented (the first 4 letters are enough). Default is bootstrap. See Details.

outcome

variable name of the outcome.

event

variable name of the event (used when yreg is coxph, aft_exp, or aft_weibull).

exposure

variable name of the exposure.

mediator

a vector of variable name(s) of the mediator(s).

EMint

a logical value indicating the existence of exposure-mediator interaction in yreg (used when model is not iorw). TRUE when there is exposure-mediator interaction in yreg. If TRUE and character yreg is provided, the outcome regression includes exposure-mediator interaction(s) between the exposure and each of the mediator(s).

basec

a vector of variable name(s) of the exposure-outcome confounder(s), exposure-mediator confounder(s) and mediator-outcome confounder(s) not affected by the exposure

postc

a vector of variable name(s) of the mediator-outcome confounder(s) affected by the exposure following the temporal order

yreg

outcome regression model. See Details.

mreg

a list specifying a regression model for each variable in mediator (used when model is rb, msm or gformula). The order of regression models must follow the order of variables in mediator. See Details.

wmnomreg

a list specifying a regression model for calculating the numerators of weights with respect to each variable in mediator (used when model is msm). The order of regression models must follow the order of variables in mediator. See Details.

wmdenomreg

a list specifying a regression model for calculating the denominators of weights with respect to each variable in mediator (used when model is msm). The order of regression models must follow the order of variables in mediator. See Details.

ereg

exposure regression model for calculating weights with respect to the exposure (used when model is wb or msm with a non-empty basec or when model is iorw). See Details.

postcreg

a list specifying a regression model for each variable in postc (used when model is gformula). The order of regression models must follow the order of variables in postc. See Details.

astar

the control value for the exposure. Default is 0.

a

the active value for the exposure. Default is 1.

mval

a list specifying a value for each variable in mediator at which the variable is controlled (used when model is rb, wb, ne, msm or gformula).

yval

the value of the outcome at which causal effects on the risk/odds ratio scale are estimated (used when the outcome is categorical).

basecval

a list specifying a conditional value for each variable in basec conditional on which causal effects are estimated (used when estimation is paramfunc). The order of conditional values must follow the order of variables in basec. If NULL, mean values of variable(s) in basec are used.

nboot

the number of bootstrap resamples (used when inference is bootstrap). Default is 200.

boot.ci.type

the type of bootstrap confidence interval. If per, percentile bootstrap confidence intervals are estimated; if bca, bias-corrected and accelerated (BCa) bootstrap confidence intervals are estimated. Default is per.

nRep

number of replications or hypothetical values of the exposure to sample for each observation unit (used when model is ne). Default is 5.

multimp

a logical value (used when data contains missing values). If TRUE, conduct multiple imputations using the mice function. Default is FALSE.

args_mice

a list of additional arguments passed to the mice function. See mice for details.

x

an object of class cmest

object

an object of class cmest

digits

minimal number of significant digits. See print.default.

Value

An object of class cmest is returned:

call

the function call,

data

the dataset,

methods

a list of methods used which may include model, full, casecontrol, yprevalence, yrare, estimation, inference, nboot, boot.ci.type and nRep,

variables

a list of variables used which may include outcome, event, exposure, mediator, EMint, basec and postc,

reg.input

a list of regressions input,

multimp

a list of arguments used for multiple imputation,

ref

reference values used which may include a, astar, mval, yval and basecval,

reg.output

a list of regressions output. If multimp is TRUE, reg.output contains the regressions fitted on each of the imputed datasets,

effect.pe

point estimates of causal effects,

effect.se

standard errors of causal effects,

effect.ci.low

the lower limits of 95% confidence intervals of causal effects,

effect.ci.high

the upper limits of 95% confidence intervals of causal effects,

effect.pval

p-values of causal effects,

...

Details

Regressions

Each regression in yreg, mreg, wmnomreg, wmdenomreg, ereg and postcreg can be specified by a user-defined regression object or the character name of the regression.

The Character Name of A Regression

  • linear: linear regression fitted by glm with family = gaussian()

  • logistic: logistic regression fitted by glm with family = binomial()

  • loglinear: log linear regression fitted by glm with family = poisson() for a binary response

  • poisson: Poisson regression fitted by glm with family = poisson() for a count response

  • quasipoisson: quasipoisson regression fitted by glm with family = quasipoisson()

  • negbin: negative binomial regression fitted by glm.nb

  • multinomial: multinomial regression fitted by multinom

  • ordinal: ordered logistic regression fitted by polr

  • coxph: Cox proportional hazards model fitted by coxph

  • aft_exp: accelerated failure time model fitted by survreg with dist = "exponential"

  • aft_weibull: accelerated failure time model fitted by survreg with dist = "weibull"

coxph, aft_exp and aft_weibull are currently not implemented for mreg, wmnomreg, wmdenomreg, ereg and postcreg.
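As a rough illustration of what the glm-based character names correspond to, the sketch below fits the equivalent stand-alone models on simulated data. The data frame dat and its columns are hypothetical, and the calls are not taken from cmest's internal code.

```r
# Simulated data; all variable names are illustrative assumptions
set.seed(1)
n <- 200
dat <- data.frame(A = rbinom(n, 1, 0.5), C = rnorm(n))
dat$contY <- 1 + 0.5 * dat$A + 0.3 * dat$C + rnorm(n)
dat$binY  <- rbinom(n, 1, plogis(-1 + 0.8 * dat$A + 0.3 * dat$C))
dat$cntY  <- rpois(n, exp(0.2 + 0.4 * dat$A))

fits <- list(
  linear       = glm(contY ~ A + C, family = gaussian(), data = dat),
  logistic     = glm(binY ~ A + C, family = binomial(), data = dat),
  # loglinear: a Poisson family (log link) applied to a binary response
  loglinear    = glm(binY ~ A + C, family = poisson(), data = dat),
  poisson      = glm(cntY ~ A + C, family = poisson(), data = dat),
  quasipoisson = glm(cntY ~ A + C, family = quasipoisson(), data = dat)
)

# Family and link function used by each fit
fam_link <- sapply(fits, function(f) paste(family(f)$family, family(f)$link))
print(fam_link)
```

The poisson and quasipoisson fits share the same coefficients and differ only in the estimated dispersion, which affects standard errors.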

The User-defined Regression Object

A user-defined regression object can be fitted by lm, glm, glm.nb, gam, multinom, polr, coxph and survreg. Objects fitted by coxph and survreg are currently not supported for mreg, wmnomreg, wmdenomreg, ereg and postcreg.

The cmest function calculates weights for regressions when weighting is required. If a user-defined regression object is fitted with prior weights, the final weights for this regression object are constructed by multiplying the prior weights and the weights calculated inside the cmest function.
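The multiplication of prior and computed weights can be sketched as follows. This is a minimal illustration that assumes inverse-probability-of-exposure weights from a logistic exposure regression; the data, the prior weights w_prior and the weight formula are assumptions, and the weights cmest actually computes depend on the chosen model.

```r
# Simulated data; names and weight construction are illustrative assumptions
set.seed(2)
n <- 300
dat <- data.frame(C = rnorm(n))
dat$A <- rbinom(n, 1, plogis(0.5 * dat$C))
dat$Y <- 1 + 0.7 * dat$A + 0.5 * dat$C + rnorm(n)
w_prior <- runif(n, 0.5, 1.5)          # hypothetical prior (e.g. sampling) weights

# Exposure regression A ~ C, used to build inverse probability weights
ereg <- glm(A ~ C, family = binomial(), data = dat)
p1 <- fitted(ereg)                      # P(A = 1 | C)
p_obs <- ifelse(dat$A == 1, p1, 1 - p1) # probability of the observed exposure
w_exposure <- 1 / p_obs

# Final weights: prior weights multiplied by the computed weights
w_final <- w_prior * w_exposure
yreg <- glm(Y ~ A, family = gaussian(), data = dat, weights = w_final)
```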

Causal Mediation Analysis Approaches

Let Y denote outcome, A denote exposure, M=(M_1,...,M_k)^T denote mediator, C denote basec, L=(L_1,...,L_s)^T denote postc.

  • rb: the regression-based approach by Valeri et al. (2013) and VanderWeele et al. (2014). yreg and mreg are required. If specified as a user-defined regression object, yreg should regress Y on A, M and C and mreg[p] should regress M_p on A and C for p=1,...,k.

  • wb: the weighting-based approach by VanderWeele et al. (2014). yreg is required. When basec is not empty, ereg is also required and A must be categorical. If specified as a user-defined regression object, yreg should regress Y on A, M and C and ereg should regress A on C.

  • iorw: the inverse odds ratio weighting approach by Tchetgen Tchetgen (2013). yreg and ereg are required and A must be categorical. If specified as a user-defined regression object, yreg should regress Y on A and C and ereg should regress A on M and C.

  • ne: the natural effect model by Vansteelandt et al. (2012). yreg is required. If specified as a user-defined regression object, yreg should regress Y on A, M and C. The variables in the formula of yreg must follow the order of A, M and C, i.e., the first variable must point to the exposure, the variable(s) right after the exposure must point to the mediator(s), e.g., Y ~ A + M_1 + M_2 + A*M_1 + C.

  • msm: the marginal structural model by VanderWeele et al. (2017). yreg, mreg, wmnomreg and wmdenomreg are required and all mediators must be categorical. When basec is not empty, ereg is also required and A must be categorical. If specified as a user-defined regression object, yreg should regress Y on A and M; mreg[p] should regress M_p on A for p=1,...,k; wmnomreg[p] should regress M_p on A, M_1, ..., M_{p-1} for p=1,...,k; wmdenomreg[p] should regress M_p on A, M_1, ..., M_{p-1}, C and L for p=1,...,k; and ereg should regress A on C.

  • gformula: the g-formula approach by Robins (1986). yreg and mreg are required. postcreg is also required when postc is not empty. If specified as a user-defined regression object, yreg should regress Y on A, M, C and L; mreg[p] should regress M_p on A, C and L for p=1,...,k; and postcreg[q] should regress L_q on A and C for q=1,...,s.

When postc is not empty, only msm and gformula can be used.

When there are mediator-mediator interactions in yreg, only wb, iorw, ne and msm can be used.
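For the rb approach, user-defined regression objects with the shape described above could be prepared along these lines. The simulated data frame and variable names are hypothetical; the fitted objects would then be supplied as yreg and as the elements of the mreg list, in the same order as mediator.

```r
# Simulated data; all variable names are illustrative assumptions
set.seed(3)
n <- 250
dat <- data.frame(C = rnorm(n), A = rbinom(n, 1, 0.5))
dat$M1 <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$A + 0.3 * dat$C))
dat$M2 <- 0.5 * dat$A + 0.2 * dat$C + rnorm(n)
dat$Y  <- 1 + 0.6 * dat$A + 0.4 * dat$M1 + 0.3 * dat$M2 +
  0.2 * dat$A * dat$M1 + 0.5 * dat$C + rnorm(n)

# yreg: regress Y on A, M and C (here with an A-by-M1 interaction)
yreg_obj <- glm(Y ~ A + M1 + M2 + A:M1 + C, family = gaussian(), data = dat)

# mreg[p]: regress M_p on A and C, ordered as mediator = c("M1", "M2")
mreg_objs <- list(
  glm(M1 ~ A + C, family = binomial(), data = dat),
  glm(M2 ~ A + C, family = gaussian(), data = dat)
)

# Check that the responses follow the intended mediator order
mreg_responses <- sapply(mreg_objs, function(f) all.vars(formula(f))[1])
print(mreg_responses)
```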

Estimation Methods

  • paramfunc: closed-form parameter function estimation (only available when model = "rb" and length(mediator) = 1). The point estimate of each causal effect is obtained by a closed-form formula of regression coefficients. Effects conditional on basecval are estimated.

  • imputation: direct counterfactual imputation estimation. The point estimate of each causal effect is obtained by imputing counterfactuals directly.

To use paramfunc, yreg and mreg must be specified by the character name of the regression. yreg can be chosen from linear, logistic, loglinear, poisson, quasipoisson, negbin, coxph, aft_exp and aft_weibull. mreg can be chosen from linear, logistic and multinomial.

To use paramfunc with yreg = "logistic" or yreg = "coxph", the outcome must be rare.
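The difference between the two estimation methods can be illustrated for a linear outcome model and a linear mediator model with exposure-mediator interaction. The closed-form expressions below follow Valeri and VanderWeele (2013); the data and names are hypothetical, and the imputation step is a simplified sketch (it plugs in predicted mediator means, which suffices only because the outcome model is linear in M) rather than cmest's own routine.

```r
# Simulated data; all variable names are illustrative assumptions
set.seed(4)
n <- 500
dat <- data.frame(C = rnorm(n), A = rbinom(n, 1, 0.5))
dat$M <- 0.3 + 0.6 * dat$A + 0.4 * dat$C + rnorm(n)
dat$Y <- 1 + 0.5 * dat$A + 0.7 * dat$M + 0.2 * dat$A * dat$M +
  0.3 * dat$C + rnorm(n)

yfit <- lm(Y ~ A + M + A:M + C, data = dat)
mfit <- lm(M ~ A + C, data = dat)
th <- coef(yfit)
be <- coef(mfit)
a <- 1; astar <- 0
cval <- mean(dat$C)   # condition on the mean of C

# Closed-form parameter functions
m_astar <- be["(Intercept)"] + be["A"] * astar + be["C"] * cval
pnde_cf <- unname((th["A"] + th["A:M"] * m_astar) * (a - astar))
tnie_cf <- unname((th["M"] * be["A"] + th["A:M"] * be["A"] * a) * (a - astar))

# Counterfactual imputation: predict M under a_m, then Y under a_y
ey <- function(a_y, a_m) {
  m_hat <- predict(mfit, newdata = transform(dat, A = a_m))
  mean(predict(yfit, newdata = transform(dat, A = a_y, M = m_hat)))
}
pnde_imp <- ey(a, astar) - ey(astar, astar)
tnie_imp <- ey(a, a) - ey(a, astar)

print(c(pnde_cf = pnde_cf, pnde_imp = pnde_imp,
        tnie_cf = tnie_cf, tnie_imp = tnie_imp))
```

In this linear setting the two routes agree; with nonlinear models they generally differ, since the closed-form effects are conditional on the chosen covariate values.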

Inference Methods

  • delta: delta method (only available when estimation = "paramfunc"). The standard errors of causal effects are obtained by the delta method. The confidence intervals of causal effects are obtained by normal distribution approximation.

  • bootstrap: bootstrapping. The standard errors of causal effects are obtained as the standard deviations of the bootstrapped results. The confidence intervals of causal effects are obtained from percentiles of the bootstrapped results, or as BCa intervals when boot.ci.type is bca.
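The bootstrap standard error and percentile interval can be sketched in base R as follows. The estimator est is a hypothetical stand-in (a regression coefficient) for a causal effect estimate; this is not cmest's internal implementation.

```r
# Simulated data; est() is a hypothetical stand-in for an effect estimator
set.seed(5)
n <- 200
dat <- data.frame(A = rbinom(n, 1, 0.5))
dat$Y <- 1 + 0.5 * dat$A + rnorm(n)

est <- function(d) unname(coef(lm(Y ~ A, data = d))["A"])

nboot <- 200
# Resample rows with replacement and re-estimate the effect each time
boots <- replicate(nboot, {
  idx <- sample(nrow(dat), replace = TRUE)
  est(dat[idx, , drop = FALSE])
})

se_boot <- sd(boots)                         # bootstrap standard error
ci_per  <- quantile(boots, c(0.025, 0.975))  # 95% percentile interval
print(c(estimate = est(dat), se = se_boot, ci_per))
```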

Estimated Causal Effects

For a continuous outcome, the causal effects on the difference scale are estimated. For a categorical, count or survival outcome, the causal effects on the ratio scale are estimated. The interpretation of the ratio depends on the type of the outcome and it can be risk ratio for a categorical outcome, rate ratio for a count outcome, hazard ratio for a survival outcome fitted by coxph, mean survival ratio for a survival outcome fitted by survreg, etc.

Continuous Outcome

When model = "rb", "wb", "ne", "msm" or "gformula" with an empty postc and EMint is TRUE, cde (controlled direct effect), pnde (pure natural direct effect), tnde (total natural direct effect), pnie (pure natural indirect effect), tnie (total natural indirect effect), te (total effect), intref (reference interaction), intmed (mediated interaction), cde(prop) (proportion cde), intref(prop) (proportion intref), intmed(prop) (proportion intmed), pnie(prop) (proportion pnie), pm (proportion mediated), int (proportion attributable to interaction) and pe (proportion eliminated) are estimated.

When model = "rb", "wb", "ne", "msm" or "gformula" with an empty postc and EMint is FALSE, cde, pnde, tnde, pnie, tnie, te and pm are estimated.

When postc is not empty, pnde, tnde, pnie, tnie, intref, intmed, intref(prop), intmed(prop), pnie(prop), pm, int and pe are replaced by their randomized analogues rpnde, rtnde, rpnie, rtnie, rintref, rintmed, rintref(prop), rintmed(prop), rpnie(prop), rpm, rint and rpe.

When model = "iorw", te, pnde, tnie and pm are estimated.
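The relationships among the difference-scale effects can be checked numerically. The sketch below uses the four-way decomposition of VanderWeele (2014) for a linear outcome model and a linear mediator model; the coefficient values and reference values are hypothetical.

```r
# Hypothetical regression coefficients
th1 <- 0.5; th2 <- 0.7; th3 <- 0.2   # yreg: A, M and A-by-M interaction
be0 <- 0.3; be1 <- 0.6; be2 <- 0.4   # mreg: intercept, A and C
a <- 1; astar <- 0; mval <- 0; cval <- 0

m_astar <- be0 + be1 * astar + be2 * cval   # E[M | A = astar, C = cval]

# Four components
cde    <- (th1 + th3 * mval) * (a - astar)
intref <- th3 * (m_astar - mval) * (a - astar)
intmed <- th3 * be1 * (a - astar)^2
pnie   <- (th2 * be1 + th3 * be1 * astar) * (a - astar)

# Derived effects
pnde <- cde + intref
tnie <- pnie + intmed
tnde <- cde + intref + intmed
te   <- cde + intref + intmed + pnie

# Proportions
props <- c(cde = cde, intref = intref, intmed = intmed, pnie = pnie) / te
pm  <- tnie / te                       # proportion mediated
int <- (intref + intmed) / te          # proportion attributable to interaction
pe  <- (intref + intmed + pnie) / te   # proportion eliminated

print(round(c(te = te, pnde = pnde, tnie = tnie, pm = pm, int = int, pe = pe), 4))
```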

Categorical, Count or Survival Outcome

When model = "rb", "wb", "ne", "msm" or "gformula" with an empty postc and EMint is TRUE, Rcde (cde ratio), Rpnde (pnde ratio), Rtnde (tnde ratio), Rpnie (pnie ratio), Rtnie (tnie ratio), Rte (te ratio), ERcde (excess ratio due to cde), ERintref (excess ratio due to intref), ERintmed (excess ratio due to intmed), ERpnie (excess ratio due to pnie), ERcde(prop) (proportion ERcde), ERintref(prop) (proportion ERintref), ERintmed(prop) (proportion ERintmed), ERpnie(prop) (proportion ERpnie), pm, int and pe are estimated.

When model = "rb", "wb", "ne", "msm" or "gformula" with an empty postc and EMint is FALSE, Rcde, Rpnde, Rtnde, Rpnie, Rtnie, Rte and pm are estimated.

When model = "msm" or "gformula" with a non-empty postc, Rpnde, Rtnde, Rpnie, Rtnie, ERintref, ERintmed, ERpnie, ERintref(prop), ERintmed(prop), ERpnie(prop), pm, int and pe are replaced by their randomized analogues rRpnde, rRtnde, rRpnie, rRtnie, rERintref, rERintmed, rERpnie, rERintref(prop), rERintmed(prop), rERpnie(prop), rpm, rint and rpe.

When model = "iorw", Rte, Rpnde, Rtnie and pm are estimated.
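On the ratio scale, the effects multiply rather than add. The sketch below starts from hypothetical counterfactual risks and assumes the conventional ratio-scale definition of the proportion mediated, Rpnde * (Rtnie - 1) / (Rte - 1); it is an illustration of the relationships, not cmest's code.

```r
# Hypothetical counterfactual risks E[Y(a_y, M(a_m))]
p00 <- 0.10   # Y(astar, M(astar))
p10 <- 0.15   # Y(a,     M(astar))
p01 <- 0.12   # Y(astar, M(a))
p11 <- 0.20   # Y(a,     M(a))

Rpnde <- p10 / p00
Rtnde <- p11 / p01
Rpnie <- p01 / p00
Rtnie <- p11 / p10
Rte   <- p11 / p00

# Excess ratio components that need no controlled (fixed-mediator) risk
ERpnie   <- Rpnie - 1
ERintmed <- Rtnie * Rpnde - Rpnde - Rpnie + 1

# Proportion mediated on the ratio scale (assumed conventional definition)
pm <- Rpnde * (Rtnie - 1) / (Rte - 1)

print(round(c(Rpnde = Rpnde, Rtnie = Rtnie, Rte = Rte, pm = pm), 4))
```

The total effect ratio factors as Rpnde times Rtnie, and equivalently as Rtnde times Rpnie.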

References

Valeri L, VanderWeele TJ (2013). Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychological Methods. 18(2): 137 - 150.

VanderWeele TJ, Vansteelandt S (2014). Mediation analysis with multiple mediators. Epidemiologic Methods. 2(1): 95 - 115.

Tchetgen Tchetgen EJ (2013). Inverse odds ratio-weighted estimation for causal mediation analysis. Statistics in Medicine. 32: 4567 - 4580.

Nguyen QC, Osypuk TL, Schmidt NM, Glymour MM, Tchetgen Tchetgen EJ (2015). Practical guidance for conducting mediation analysis with multiple mediators using inverse odds ratio weighting. American Journal of Epidemiology. 181(5): 349 - 356.

VanderWeele TJ, Tchetgen Tchetgen EJ (2017). Mediation analysis with time varying exposures and mediators. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 79(3): 917 - 938.

Robins JM (1986). A new approach to causal inference in mortality studies with a sustained exposure period-Application to control of the healthy worker survivor effect. Mathematical Modelling. 7: 1393 - 1512.

Vansteelandt S, Bekaert M, Lange T (2012). Imputation Strategies for the Estimation of Natural Direct and Indirect Effects. Epidemiologic Methods. 1(1): 131 - 158.

Steen J, Loeys T, Moerkerke B, Vansteelandt S (2017). Medflex: an R package for flexible mediation analysis using natural effect models. Journal of Statistical Software. 76(11).

VanderWeele TJ (2014). A unification of mediation and interaction: a 4-way decomposition. Epidemiology. 25(5): 749 - 761.

Imai K, Keele L, Tingley D (2010). A general approach to causal mediation analysis. Psychological Methods. 15(4): 309 - 334.

Schomaker M, Heumann C (2018). Bootstrap inference when using multiple imputation. Statistics in Medicine. 37(14): 2252 - 2266.

Efron B (1987). Better Bootstrap Confidence Intervals. Journal of the American Statistical Association. 82(397): 171 - 185.

Examples

if (FALSE) {
library(CMAverse)

# single-mediator case with rb, no exposure-mediator interaction
exp1 <- cmest(data = cma2020, model = "rb", outcome = "contY", exposure = "A",
              mediator = "M2", basec = c("C1", "C2"), EMint = FALSE,
              mreg = list("multinomial"), yreg = "linear",
              astar = 0, a = 1, mval = list("M2_0"),
              estimation = "paramfunc", inference = "delta")
summary(exp1)

# single-mediator case with rb
exp2 <- cmest(data = cma2020, model = "rb", outcome = "contY", exposure = "A",
              mediator = "M2", basec = c("C1", "C2"), EMint = TRUE,
              mreg = list("multinomial"), yreg = "linear",
              astar = 0, a = 1, mval = list("M2_0"),
              estimation = "paramfunc", inference = "delta")
summary(exp2)

# multiple-mediator case with rb
# 10 boots are used for illustration
exp3 <- cmest(data = cma2020, model = "rb", outcome = "contY", exposure = "A",
              mediator = c("M1", "M2"), basec = c("C1", "C2"), EMint = TRUE,
              mreg = list("logistic", "multinomial"), yreg = "linear",
              astar = 0, a = 1, mval = list(0, "M2_0"),
              estimation = "imputation", inference = "bootstrap",
              nboot = 10, boot.ci.type = "bca")

# multiple-mediator case with ne
exp4 <- cmest(data = cma2020, model = "ne", outcome = "contY", exposure = "A",
              mediator = c("M1", "M2"), basec = c("C1", "C2"),
              yreg = glm(contY ~ A + M1 + M2 + A*M1 + A*M2 + C1 + C2,
                         family = gaussian, data = cma2020),
              astar = 0, a = 1, mval = list(0, "M2_0"),
              estimation = "imputation", inference = "bootstrap", nboot = 10)

# case control study with msm
exp5 <- cmest(data = cma2020, model = "msm", casecontrol = TRUE, yrare = TRUE,
              outcome = "binY", exposure = "A", mediator = c("M1", "M2"),
              EMint = TRUE, basec = c("C1", "C2"),
              yreg = "logistic", ereg = "logistic",
              mreg = list(glm(M1 ~ A, family = binomial, data = cma2020),
                          nnet::multinom(M2 ~ A, data = cma2020, trace = FALSE)),
              wmnomreg = list(glm(M1 ~ A, family = binomial, data = cma2020),
                              nnet::multinom(M2 ~ A + M1, data = cma2020, trace = FALSE)),
              wmdenomreg = list(glm(M1 ~ A + C1 + C2, family = binomial, data = cma2020),
                                nnet::multinom(M2 ~ A + M1 + C1 + C2, data = cma2020,
                                               trace = FALSE)),
              astar = 0, a = 1, mval = list(0, "M2_0"),
              estimation = "imputation", inference = "bootstrap", nboot = 10)
}