cmest is used to implement six causal mediation analysis approaches: the regression-based approach by Valeri et al. (2013) and VanderWeele et al. (2014), the weighting-based approach by VanderWeele et al. (2014), the inverse odds ratio weighting approach by Tchetgen Tchetgen (2013), the natural effect model by Vansteelandt et al. (2012), the marginal structural model by VanderWeele et al. (2017), and the g-formula approach by Robins (1986).
cmest(
  data = NULL,
  model = "rb",
  full = TRUE,
  casecontrol = FALSE,
  yrare = NULL,
  yprevalence = NULL,
  estimation = "imputation",
  inference = "bootstrap",
  outcome = NULL,
  event = NULL,
  exposure = NULL,
  mediator = NULL,
  EMint = NULL,
  basec = NULL,
  postc = NULL,
  yreg = NULL,
  mreg = NULL,
  wmnomreg = NULL,
  wmdenomreg = NULL,
  ereg = NULL,
  postcreg = NULL,
  astar = 0,
  a = 1,
  mval = NULL,
  yval = NULL,
  basecval = NULL,
  nboot = 200,
  boot.ci.type = "per",
  nRep = 5,
  multimp = FALSE,
  args_mice = NULL
)

# S3 method for cmest
print(x, ...)

# S3 method for cmest
summary(object, ...)

# S3 method for summary.cmest
print(x, digits = 4, ...)
data | dataset |
---|---|
model | causal mediation analysis approach. |
full | a logical value. If |
casecontrol | a logical value. |
yrare | a logical value (used when |
yprevalence | the prevalence of the case (used when |
estimation | method for estimating causal effects. |
inference | method for estimating standard errors of causal effects. |
outcome | variable name of the outcome. |
event | variable name of the event (used when |
exposure | variable name of the exposure. |
mediator | a vector of variable name(s) of the mediator(s). |
EMint | a logical value indicating the existence of exposure-mediator interaction in |
basec | a vector of variable name(s) of the exposure-outcome confounder(s), exposure-mediator confounder(s) and mediator-outcome confounder(s) not affected by the exposure |
postc | a vector of variable name(s) of the mediator-outcome confounder(s) affected by the exposure following the temporal order |
yreg | outcome regression model. See |
mreg | a list specifying a regression model for each variable in |
wmnomreg | a list specifying a regression model for calculating the numerators of weights with respect to each variable in |
wmdenomreg | a list specifying a regression model for calculating the denominators of weights with respect to each variable in |
ereg | exposure regression model for calculating weights with respect to the exposure (used when |
postcreg | a list specifying a regression model for each variable in |
astar | the control value for the exposure. Default is |
a | the active value for the exposure. Default is |
mval | a list specifying a value for each variable in |
yval | the value of the outcome at which causal effects on the risk/odds ratio scale are estimated (used when the outcome is categorical). |
basecval | a list specifying a conditional value for each variable in |
nboot | the number of bootstrap replicates (used when |
boot.ci.type | the type of bootstrap confidence interval. If |
nRep | number of replications or hypothetical values of the exposure to sample for each observation unit (used when |
multimp | a logical value (used when |
args_mice | a list of additional arguments passed to the mice function. See mice for details. |
x | an object of class |
object | an object of class |
digits | minimal number of significant digits. See print.default. |
An object of class cmest is returned:
the function call,
the dataset,
a list of methods used which may include model, full, casecontrol, yprevalence, yrare, estimation, inference, nboot, boot.ci.type and nRep,
a list of variables used which may include outcome, event, exposure, mediator, EMint, basec and postc,
a list of regressions input,
a list of arguments used for multiple imputation,
reference values used which may include a, astar, mval, yval and basecval,
a list of regressions output. If multimp is TRUE, reg.output contains the regressions fitted on each of the imputed datasets,
point estimates of causal effects,
standard errors of causal effects,
the lower limits of 95% confidence intervals of causal effects,
the upper limits of 95% confidence intervals of causal effects,
p-values of causal effects.
Regressions
Each regression in yreg, mreg, wmnomreg, wmdenomreg, ereg and postcreg can be specified by a user-defined regression object or the character name of the regression.
The Character Name of A Regression
linear: linear regression fitted by glm with family = gaussian()
logistic: logistic regression fitted by glm with family = binomial()
loglinear: log linear regression fitted by glm with family = poisson() for a binary response
poisson: Poisson regression fitted by glm with family = poisson() for a count response
quasipoisson: quasi-Poisson regression fitted by glm with family = quasipoisson()
negbin: negative binomial regression fitted by glm.nb
multinomial: multinomial regression fitted by multinom
ordinal: ordered logistic regression fitted by polr
coxph: Cox proportional hazards model fitted by coxph
aft_exp: accelerated failure time model fitted by survreg with dist = "exponential"
aft_weibull: accelerated failure time model fitted by survreg with dist = "weibull"
coxph, aft_exp and aft_weibull are currently not implemented for mreg, wmnomreg, wmdenomreg, ereg and postcreg.
The User-defined Regression Object
A user-defined regression object can be fitted by lm, glm, glm.nb, gam, multinom, polr, coxph and survreg. Objects fitted by coxph and survreg are currently not supported for mreg, wmnomreg, wmdenomreg, ereg and postcreg.
The cmest function calculates weights for regressions when weighting is required. If a user-defined regression object is fitted with prior weights, the final weights for this regression object are constructed by multiplying the prior weights and the weights calculated inside the cmest function.
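The weight-multiplication rule can be illustrated in base R. This is only a sketch under stated assumptions: the data are simulated, and `calc_w` is a hypothetical stand-in (inverse probability of the observed exposure) for whatever weights are calculated internally; it does not reproduce the package's own code.

```r
# Simulated data; every name below is illustrative.
set.seed(1)
n <- 200
dat <- data.frame(C = rnorm(n))
dat$A <- rbinom(n, 1, plogis(0.5 * dat$C))
dat$Y <- 1 + 0.8 * dat$A + 0.5 * dat$C + rnorm(n)

# Prior weights chosen by the user, e.g. sampling weights.
dat$prior_w <- runif(n, 0.5, 2)
fit <- glm(Y ~ A, family = gaussian(), data = dat, weights = prior_w)

# Hypothetical calculated weights: inverse probability of the observed
# exposure given C.
ps <- fitted(glm(A ~ C, family = binomial(), data = dat))
calc_w <- ifelse(dat$A == 1, 1 / ps, 1 / (1 - ps))

# Final weights: element-wise product of prior and calculated weights.
dat$final_w <- as.numeric(weights(fit, type = "prior")) * calc_w
refit <- glm(Y ~ A, family = gaussian(), data = dat, weights = final_w)
coef(refit)
```

Units with a large prior weight and a large calculated weight dominate the refit, which is worth checking when the prior weights are already highly variable.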
Causal Mediation Analysis Approaches
Let Y denote outcome, A denote exposure, M=(M_1,...,M_k)^T denote mediator, C denote basec, L=(L_1,...,L_s)^T denote postc.
rb: the regression-based approach by Valeri et al. (2013) and VanderWeele et al. (2014). yreg and mreg are required. If specified as a user-defined regression object, yreg should regress Y on A, M and C and mreg[p] should regress M_p on A and C for p=1,...,k.
wb: the weighting-based approach by VanderWeele et al. (2014). yreg is required. When basec is not empty, ereg is also required and A must be categorical. If specified as a user-defined regression object, yreg should regress Y on A, M and C and ereg should regress A on C.
iorw: the inverse odds ratio weighting approach by Tchetgen Tchetgen (2013). yreg and ereg are required and A must be categorical. If specified as a user-defined regression object, yreg should regress Y on A and C and ereg should regress A on M and C.
ne: the natural effect model by Vansteelandt et al. (2012). yreg is required. If specified as a user-defined regression object, yreg should regress Y on A, M and C. The variables in the formula of yreg must follow the order of A, M and C, i.e., the first variable must point to the exposure, the variable(s) right after the exposure must point to the mediator(s), e.g., Y ~ A + M_1 + M_2 + A*M_1 + C.
msm: the marginal structural model by VanderWeele et al. (2017). yreg, mreg, wmnomreg and wmdenomreg are required and all mediators must be categorical. When basec is not empty, ereg is also required and A must be categorical. If specified as a user-defined regression object, yreg should regress Y on A and M; mreg[p] should regress M_p on A for p=1,...,k; wmnomreg[p] should regress M_p on A, M_1, ..., M_{p-1} for p=1,...,k; wmdenomreg[p] should regress M_p on A, M_1, ..., M_{p-1}, C and L for p=1,...,k; and ereg should regress A on C.
gformula: the g-formula approach by Robins (1986). yreg and mreg are required. postcreg is also required when postc is not empty. If specified as a user-defined regression object, yreg should regress Y on A, M, C and L; mreg[p] should regress M_p on A, C and L for p=1,...,k; and postcreg[q] should regress L_q on A and C for q=1,...,s.
When postc is not empty, only msm and gformula can be used.
When there are mediator-mediator interactions in yreg, only wb, iorw, ne and msm can be used.
Estimation Methods
paramfunc: closed-form parameter function estimation (only available when model = "rb" and length(mediator) = 1). The point estimate of each causal effect is obtained by a closed-form formula of regression coefficients. Effects conditional on basecval are estimated.
imputation: direct counterfactual imputation estimation. The point estimate of each causal effect is obtained by imputing counterfactuals directly.
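The idea behind direct counterfactual imputation can be sketched in base R for one continuous mediator and a continuous outcome. The data, the model formulas and the helper names `draw_m` and `mean_y` are assumptions made for illustration; this is not the package's internal implementation.

```r
set.seed(2)
n <- 2000
dat <- data.frame(C = rnorm(n), A = rbinom(n, 1, 0.5))
dat$M <- 0.8 * dat$A + 0.3 * dat$C + rnorm(n)
dat$Y <- 1 * dat$A + 0.5 * dat$M + 0.2 * dat$A * dat$M + 0.4 * dat$C + rnorm(n)

mfit <- lm(M ~ A + C, data = dat)          # mediator regression
yfit <- lm(Y ~ A + M + A:M + C, data = dat) # outcome regression

astar <- 0
a <- 1

# Draw a counterfactual mediator value for every unit under exposure a_m.
draw_m <- function(a_m) {
  mu <- predict(mfit, newdata = transform(dat, A = a_m))
  rnorm(nrow(dat), mean = mu, sd = sigma(mfit))
}

# Average imputed outcome E[Y(a_y, M(a_m))] given mediator draws m.
mean_y <- function(a_y, m) {
  mean(predict(yfit, newdata = transform(dat, A = a_y, M = m)))
}

m_astar <- draw_m(astar)
m_a <- draw_m(a)

pnde <- mean_y(a, m_astar) - mean_y(astar, m_astar)
tnie <- mean_y(a, m_a) - mean_y(a, m_astar)
te <- mean_y(a, m_a) - mean_y(astar, m_astar)
c(pnde = pnde, tnie = tnie, te = te)
```

Because the same mediator draws are reused across contrasts, te equals pnde + tnie exactly in this sketch.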
To use paramfunc, yreg and mreg must be specified by the character name of the regression. yreg can be chosen from linear, logistic, loglinear, poisson, quasipoisson, negbin, coxph, aft_exp and aft_weibull. mreg can be chosen from linear, logistic and multinomial.
To use paramfunc with yreg = "logistic" or yreg = "coxph", the outcome must be rare.
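For a linear outcome regression with an exposure-mediator interaction and a linear mediator regression, the closed-form expressions given by Valeri and VanderWeele (2013) can be written out in a few lines of base R. The sketch below uses simulated data and a single baseline covariate; `mval` and `cval` are illustrative stand-ins for the mediator reference value and the conditioning value of the covariate.

```r
set.seed(3)
n <- 2000
dat <- data.frame(C = rnorm(n), A = rbinom(n, 1, 0.5))
dat$M <- 0.8 * dat$A + 0.3 * dat$C + rnorm(n)
dat$Y <- 1 * dat$A + 0.5 * dat$M + 0.2 * dat$A * dat$M + 0.4 * dat$C + rnorm(n)

theta <- coef(lm(Y ~ A + M + A:M + C, data = dat)) # outcome coefficients
beta <- coef(lm(M ~ A + C, data = dat))            # mediator coefficients

astar <- 0
a <- 1
mval <- 0 # mediator value at which cde is evaluated
cval <- 0 # covariate value the effects are conditional on

# Expected mediator under the control exposure, given C = cval.
m_astar <- beta["(Intercept)"] + beta["A"] * astar + beta["C"] * cval

cde <- (theta["A"] + theta["A:M"] * mval) * (a - astar)
pnde <- (theta["A"] + theta["A:M"] * m_astar) * (a - astar)
tnie <- (theta["M"] + theta["A:M"] * a) * beta["A"] * (a - astar)

effects <- c(cde = unname(cde), pnde = unname(pnde), tnie = unname(tnie),
             te = unname(pnde + tnie))
effects
```

With the simulated coefficients above, the true values at cval = 0 are cde = 1, pnde = 1 and tnie = 0.56, so the estimates should land near those numbers.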
Inference Methods
delta: delta method (only available when estimation = "paramfunc"). The standard errors of causal effects are obtained by the delta method. The confidence intervals of causal effects are obtained by normal distribution approximation.
bootstrap: bootstrapping. The standard errors of causal effects are obtained by the standard deviations of bootstrapped results. The confidence intervals of causal effects are obtained by percentiles of bootstrapped results.
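The bootstrap recipe described above (standard error as the standard deviation of the bootstrapped estimates, confidence limits as percentiles) can be sketched in base R. The statistic `tnie_hat` and the simulated data are assumptions for illustration only; any effect estimator could be substituted.

```r
set.seed(4)
n <- 500
dat <- data.frame(C = rnorm(n), A = rbinom(n, 1, 0.5))
dat$M <- 0.8 * dat$A + 0.3 * dat$C + rnorm(n)
dat$Y <- 1 * dat$A + 0.5 * dat$M + 0.2 * dat$A * dat$M + 0.4 * dat$C + rnorm(n)

# Illustrative statistic: total natural indirect effect under linear models.
tnie_hat <- function(d, astar = 0, a = 1) {
  theta <- coef(lm(Y ~ A + M + A:M + C, data = d))
  beta <- coef(lm(M ~ A + C, data = d))
  unname((theta["M"] + theta["A:M"] * a) * beta["A"] * (a - astar))
}

nboot <- 200
boot_est <- replicate(nboot, {
  idx <- sample(nrow(dat), replace = TRUE) # resample rows with replacement
  tnie_hat(dat[idx, ])
})

est <- tnie_hat(dat)
se <- sd(boot_est)                         # bootstrap standard error
ci <- quantile(boot_est, c(0.025, 0.975))  # percentile 95% interval
list(estimate = est, se = se, ci = ci)
```

A small nboot keeps the sketch fast; percentile limits are generally more stable with more replicates.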
Estimated Causal Effects
For a continuous outcome, the causal effects on the difference scale are estimated. For a categorical, count or survival outcome, the causal effects on the ratio scale are estimated. The interpretation of the ratio depends on the type of the outcome and it can be risk ratio for a categorical outcome, rate ratio for a count outcome, hazard ratio for a survival outcome fitted by coxph, mean survival ratio for a survival outcome fitted by survreg, etc.
Continuous Outcome
When model = "rb", "wb", "ne", "msm" or "gformula" with an empty postc and EMint is TRUE, cde (controlled direct effect), pnde (pure natural direct effect), tnde (total natural direct effect), pnie (pure natural indirect effect), tnie (total natural indirect effect), te (total effect), intref (reference interaction), intmed (mediated interaction), cde(prop) (proportion cde), intref(prop) (proportion intref), intmed(prop) (proportion intmed), pnie(prop) (proportion pnie), pm (proportion mediated), int (proportion attributable to interaction) and pe (proportion eliminated) are estimated.
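On the difference scale these quantities are linked by the four-way decomposition of VanderWeele (2014): te = cde + intref + intmed + pnie. A small base R helper, written here as a sketch with the hypothetical name `four_way`, shows how the remaining effects and proportions follow from the four components; the input numbers are made up.

```r
# Derive the difference-scale effects from the four components of the
# four-way decomposition. All inputs are on the same (difference) scale.
four_way <- function(cde, intref, intmed, pnie) {
  te <- cde + intref + intmed + pnie
  c(
    cde = cde,
    pnde = cde + intref,            # pure natural direct effect
    tnde = cde + intref + intmed,   # total natural direct effect
    pnie = pnie,
    tnie = pnie + intmed,           # total natural indirect effect
    te = te,
    intref = intref,
    intmed = intmed,
    "cde(prop)" = cde / te,
    "intref(prop)" = intref / te,
    "intmed(prop)" = intmed / te,
    "pnie(prop)" = pnie / te,
    pm = (pnie + intmed) / te,          # proportion mediated
    int = (intref + intmed) / te,       # proportion attributable to interaction
    pe = (intref + intmed + pnie) / te  # proportion eliminated
  )
}

res <- four_way(cde = 1, intref = 0.2, intmed = 0.3, pnie = 0.5)
round(res, 3)
```

The four proportions cde(prop), intref(prop), intmed(prop) and pnie(prop) sum to one by construction.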
When model = "rb", "wb", "ne", "msm" or "gformula" with an empty postc and EMint is FALSE, cde, pnde, tnde, pnie, tnie, te and pm are estimated.
When postc is not empty, pnde, tnde, pnie, tnie, intref, intmed, intref(prop), intmed(prop), pnie(prop), pm, int and pe are replaced by their randomized analogues rpnde, rtnde, rpnie, rtnie, rintref, rintmed, rintref(prop), rintmed(prop), rpnie(prop), rpm, rint and rpe.
When model = "iorw", te, pnde, tnie and pm are estimated.
Categorical, Count or Survival Outcome
When model = "rb", "wb", "ne", "msm" or "gformula" with an empty postc and EMint is TRUE, Rcde (cde ratio), Rpnde (pnde ratio), Rtnde (tnde ratio), Rpnie (pnie ratio), Rtnie (tnie ratio), Rte (te ratio), ERcde (excess ratio due to cde), ERintref (excess ratio due to intref), ERintmed (excess ratio due to intmed), ERpnie (excess ratio due to pnie), ERcde(prop) (proportion ERcde), ERintref(prop) (proportion ERintref), ERintmed(prop) (proportion ERintmed), ERpnie(prop) (proportion ERpnie), pm, int and pe are estimated.
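On the ratio scale, the decomposition in VanderWeele (2014) splits the excess ratio Rte - 1 into the four excess-ratio components. The base R sketch below, using the hypothetical helper `ratio_summary` and made-up inputs, shows how Rpnde, Rtnie, Rte and the proportion measures relate under that decomposition; it is an illustration of the identities, not the package's code.

```r
# Relate ratio-scale effects to the four excess-ratio components.
ratio_summary <- function(ERcde, ERintref, ERintmed, ERpnie) {
  excess <- ERcde + ERintref + ERintmed + ERpnie # equals Rte - 1
  Rte <- 1 + excess
  Rpnde <- 1 + ERcde + ERintref                  # direct part of the excess
  Rtnie <- Rte / Rpnde                           # since Rte = Rpnde * Rtnie
  c(
    Rpnde = Rpnde,
    Rtnie = Rtnie,
    Rte = Rte,
    "ERcde(prop)" = ERcde / excess,
    "ERintref(prop)" = ERintref / excess,
    "ERintmed(prop)" = ERintmed / excess,
    "ERpnie(prop)" = ERpnie / excess,
    pm = Rpnde * (Rtnie - 1) / (Rte - 1),        # proportion mediated
    int = (ERintref + ERintmed) / excess,
    pe = (ERintref + ERintmed + ERpnie) / excess
  )
}

rr <- ratio_summary(ERcde = 0.5, ERintref = 0.1, ERintmed = 0.2, ERpnie = 0.2)
round(rr, 3)
```

Note that pm computed as Rpnde * (Rtnie - 1) / (Rte - 1) coincides with (ERpnie + ERintmed) / (Rte - 1), because Rte - Rpnde is the mediated part of the excess ratio.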
When model = "rb", "wb", "ne", "msm" or "gformula" with an empty postc and EMint is FALSE, Rcde, Rpnde, Rtnde, Rpnie, Rtnie, Rte and pm are estimated.
When model = "msm" or "gformula" with a non-empty postc, Rpnde, Rtnde, Rpnie, Rtnie, ERintref, ERintmed, ERpnie, ERintref(prop), ERintmed(prop), ERpnie(prop), pm, int and pe are replaced by their randomized analogues rRpnde, rRtnde, rRpnie, rRtnie, rERintref, rERintmed, rERpnie, rERintref(prop), rERintmed(prop), rERpnie(prop), rpm, rint and rpe.
When model = "iorw", Rte, Rpnde, Rtnie and pm are estimated.
print(cmest): Print the results of cmest nicely
summary(cmest): Summarize the results of cmest nicely
print(summary.cmest): Print the summary of cmest nicely
Valeri L, VanderWeele TJ (2013). Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychological Methods. 18(2): 137 - 150.
VanderWeele TJ, Vansteelandt S (2014). Mediation analysis with multiple mediators. Epidemiologic Methods. 2(1): 95 - 115.
Tchetgen Tchetgen EJ (2013). Inverse odds ratio-weighted estimation for causal mediation analysis. Statistics in Medicine. 32: 4567 - 4580.
Nguyen QC, Osypuk TL, Schmidt NM, Glymour MM, Tchetgen Tchetgen EJ (2015). Practical guidance for conducting mediation analysis with multiple mediators using inverse odds ratio weighting. American Journal of Epidemiology. 181(5): 349 - 356.
VanderWeele TJ, Tchetgen Tchetgen EJ (2017). Mediation analysis with time varying exposures and mediators. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 79(3): 917 - 938.
Robins JM (1986). A new approach to causal inference in mortality studies with a sustained exposure period-Application to control of the healthy worker survivor effect. Mathematical Modelling. 7: 1393 - 1512.
Vansteelandt S, Bekaert M, Lange T (2012). Imputation Strategies for the Estimation of Natural Direct and Indirect Effects. Epidemiologic Methods. 1(1): 131 - 158.
Steen J, Loeys T, Moerkerke B, Vansteelandt S (2017). Medflex: an R package for flexible mediation analysis using natural effect models. Journal of Statistical Software. 76(11).
VanderWeele TJ (2014). A unification of mediation and interaction: a 4-way decomposition. Epidemiology. 25(5): 749 - 61.
Imai K, Keele L, Tingley D (2010). A general approach to causal mediation analysis. Psychological Methods. 15(4): 309 - 334.
Schomaker M, Heumann C (2018). Bootstrap inference when using multiple imputation. Statistics in Medicine. 37(14): 2252 - 2266.
Efron B (1987). Better Bootstrap Confidence Intervals. Journal of the American Statistical Association. 82(397): 171 - 185.
if (FALSE) {
library(CMAverse)

# single-mediator case with rb, no exposure-mediator interaction
exp1 <- cmest(data = cma2020, model = "rb", outcome = "contY", exposure = "A",
              mediator = "M2", basec = c("C1", "C2"), EMint = FALSE,
              mreg = list("multinomial"), yreg = "linear",
              astar = 0, a = 1, mval = list("M2_0"),
              estimation = "paramfunc", inference = "delta")
summary(exp1)

# single-mediator case with rb
exp2 <- cmest(data = cma2020, model = "rb", outcome = "contY", exposure = "A",
              mediator = "M2", basec = c("C1", "C2"), EMint = TRUE,
              mreg = list("multinomial"), yreg = "linear",
              astar = 0, a = 1, mval = list("M2_0"),
              estimation = "paramfunc", inference = "delta")
summary(exp2)

# multiple-mediator case with rb
# 10 boots are used for illustration
exp3 <- cmest(data = cma2020, model = "rb", outcome = "contY", exposure = "A",
              mediator = c("M1", "M2"), basec = c("C1", "C2"), EMint = TRUE,
              mreg = list("logistic", "multinomial"), yreg = "linear",
              astar = 0, a = 1, mval = list(0, "M2_0"),
              estimation = "imputation", inference = "bootstrap",
              nboot = 10, boot.ci.type = "bca")

# multiple-mediator case with ne
exp4 <- cmest(data = cma2020, model = "ne", outcome = "contY", EMint = TRUE,
              exposure = "A", mediator = c("M1", "M2"), basec = c("C1", "C2"),
              yreg = glm(contY ~ A + M1 + M2 + A*M1 + A*M2 + C1 + C2,
                         family = gaussian, data = cma2020),
              astar = 0, a = 1, mval = list(0, "M2_0"),
              estimation = "imputation", inference = "bootstrap", nboot = 10)

# case control study with msm
exp5 <- cmest(data = cma2020, model = "msm", casecontrol = TRUE, yrare = TRUE,
              outcome = "binY", exposure = "A", mediator = c("M1", "M2"),
              EMint = TRUE, basec = c("C1", "C2"),
              yreg = "logistic", ereg = "logistic",
              mreg = list(glm(M1 ~ A, family = binomial, data = cma2020),
                          nnet::multinom(M2 ~ A, data = cma2020, trace = FALSE)),
              wmnomreg = list(glm(M1 ~ A, family = binomial, data = cma2020),
                              nnet::multinom(M2 ~ A + M1, data = cma2020,
                                             trace = FALSE)),
              wmdenomreg = list(glm(M1 ~ A + C1 + C2, family = binomial,
                                    data = cma2020),
                                nnet::multinom(M2 ~ A + M1 + C1 + C2,
                                               data = cma2020, trace = FALSE)),
              astar = 0, a = 1, mval = list(0, "M2_0"),
              estimation = "imputation", inference = "bootstrap", nboot = 10)
}