vignettes/multiple_imputation.Rmd
multiple_imputation.Rmd
This example demonstrates how to estimate causal effects with
multiple imputations by using cmest
when the dataset
contains missing values. For this purpose, we simulate some data
containing a continuous baseline confounder \(C_1\), a binary baseline confounder \(C_2\), a binary exposure \(A\), a binary mediator \(M\) and a binary outcome \(Y\). The true regression models for \(A\), \(M\)
and \(Y\) are: \[logit(E(A|C_1,C_2))=0.2+0.5C_1+0.1C_2\]
\[logit(E(M|A,C_1,C_2))=1+2A+1.5C_1+0.8C_2\]
\[logit(E(Y|A,M,C_1,C_2)))=-3-0.4A-1.2M+0.5AM+0.3C_1-0.6C_2\]
To create the dataset with missing values, we randomly delete 10% of values in the dataset.
set.seed(1)
expit <- function(x) exp(x)/(1+exp(x))
n <- 10000
C1 <- rnorm(n, mean = 1, sd = 0.1)
C2 <- rbinom(n, 1, 0.6)
A <- rbinom(n, 1, expit(0.2 + 0.5*C1 + 0.1*C2))
M <- rbinom(n, 1, expit(1 + 2*A + 1.5*C1 + 0.8*C2))
Y <- rbinom(n, 1, expit(-3 - 0.4*A - 1.2*M + 0.5*A*M + 0.3*C1 - 0.6*C2))
data_noNA <- data.frame(A, M, Y, C1, C2)
missing <- sample(1:(5*n), n*0.1, replace = FALSE)
C1[missing[which(missing <= n)]] <- NA
C2[missing[which((missing > n)*(missing <= 2*n) == 1)] - n] <- NA
A[missing[which((missing > 2*n)*(missing <= 3*n) == 1)] - 2*n] <- NA
M[missing[which((missing > 3*n)*(missing <= 4*n) == 1)] - 3*n] <- NA
Y[missing[which((missing > 4*n)*(missing <= 5*n) == 1)] - 4*n] <- NA
data <- data.frame(A, M, Y, C1, C2)
The DAG for this scientific setting is:
## Registered S3 methods overwritten by 'lme4':
## method from
## cooks.distance.influence.merMod car
## influence.merMod car
## dfbeta.influence.merMod car
## dfbetas.influence.merMod car
cmdag(outcome = "Y", exposure = "A", mediator = "M",
basec = c("C1", "C2"), postc = NULL, node = TRUE, text_col = "white")
To conduct multiple imputations, we set the multimp
argument to be TRUE
. The regression-based approach is used
for illustration. The multiple imputation results:
res_multimp <- cmest(data = data, model = "rb", outcome = "Y", exposure = "A",
mediator = "M", basec = c("C1", "C2"), EMint = TRUE,
mreg = list("logistic"), yreg = "logistic",
astar = 0, a = 1, mval = list(1),
estimation = "paramfunc", inference = "delta",
multimp = TRUE, args_mice = list(m = 10))
summary(res_multimp)
## Causal Mediation Analysis
##
## # Regressions with imputed dataset 1
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[1L]]$yreg)$data, weights = getCall(x$reg.output[[1L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5560 -0.2416 -0.1755 -0.1615 3.0524
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.5624 0.7669 -4.645 3.40e-06 ***
## A 0.6413 0.5279 1.215 0.22442
## M -1.0737 0.3738 -2.873 0.00407 **
## C1 0.9419 0.6960 1.353 0.17594
## C2 -0.8335 0.1440 -5.788 7.11e-09 ***
## A:M -0.3402 0.5538 -0.614 0.53897
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2030.4 on 9999 degrees of freedom
## Residual deviance: 1973.2 on 9994 degrees of freedom
## AIC: 1985.2
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[1L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[1L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2820 0.1137 0.1633 0.2434 0.5946
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.03518 0.64316 -0.055 0.956376
## A 1.66962 0.14384 11.607 < 2e-16 ***
## C1 2.48028 0.65225 3.803 0.000143 ***
## C2 0.94592 0.13553 6.980 2.96e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2286.6 on 9999 degrees of freedom
## Residual deviance: 2061.3 on 9996 degrees of freedom
## AIC: 2069.3
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 2
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[2L]]$yreg)$data, weights = getCall(x$reg.output[[2L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5540 -0.2437 -0.1701 -0.1590 3.0527
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.3513 0.7716 -4.343 1.40e-05 ***
## A 0.6633 0.5285 1.255 0.20950
## M -1.0654 0.3738 -2.850 0.00437 **
## C1 0.7414 0.7017 1.057 0.29065
## C2 -0.8794 0.1458 -6.031 1.63e-09 ***
## A:M -0.3846 0.5545 -0.694 0.48788
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2007.3 on 9999 degrees of freedom
## Residual deviance: 1947.3 on 9994 degrees of freedom
## AIC: 1959.3
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[2L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[2L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2765 0.1127 0.1633 0.2463 0.5698
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2522 0.6441 0.392 0.695407
## A 1.7041 0.1453 11.732 < 2e-16 ***
## C1 2.1925 0.6522 3.362 0.000775 ***
## C2 0.9310 0.1358 6.855 7.14e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2271.9 on 9999 degrees of freedom
## Residual deviance: 2048.6 on 9996 degrees of freedom
## AIC: 2056.6
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 3
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[3L]]$yreg)$data, weights = getCall(x$reg.output[[3L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5978 -0.2455 -0.1721 -0.1611 3.0518
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.3476 0.7664 -4.368 1.25e-05 ***
## A 0.8351 0.5116 1.632 0.10261
## M -1.0600 0.3736 -2.837 0.00455 **
## C1 0.7329 0.6948 1.055 0.29148
## C2 -0.8783 0.1443 -6.085 1.17e-09 ***
## A:M -0.5339 0.5383 -0.992 0.32128
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2038.1 on 9999 degrees of freedom
## Residual deviance: 1974.0 on 9994 degrees of freedom
## AIC: 1986
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[3L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[3L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2771 0.1135 0.1611 0.2478 0.5803
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.1175 0.6424 0.183 0.85487
## A 1.7076 0.1451 11.765 < 2e-16 ***
## C1 2.3329 0.6511 3.583 0.00034 ***
## C2 0.9053 0.1351 6.699 2.1e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2279.3 on 9999 degrees of freedom
## Residual deviance: 2054.9 on 9996 degrees of freedom
## AIC: 2062.9
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 4
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[4L]]$yreg)$data, weights = getCall(x$reg.output[[4L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5665 -0.2407 -0.1740 -0.1611 3.0521
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.3821 0.7635 -4.430 9.42e-06 ***
## A 0.5992 0.5182 1.156 0.247530
## M -1.1938 0.3590 -3.325 0.000884 ***
## C1 0.8605 0.6992 1.231 0.218436
## C2 -0.8218 0.1448 -5.677 1.37e-08 ***
## A:M -0.3001 0.5450 -0.551 0.581866
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2015.0 on 9999 degrees of freedom
## Residual deviance: 1956.5 on 9994 degrees of freedom
## AIC: 1968.5
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[4L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[4L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2968 0.1106 0.1591 0.2458 0.5925
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.0239 0.6455 0.037 0.970466
## A 1.7537 0.1466 11.965 < 2e-16 ***
## C1 2.4044 0.6539 3.677 0.000236 ***
## C2 0.9374 0.1359 6.897 5.32e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2271.9 on 9999 degrees of freedom
## Residual deviance: 2038.1 on 9996 degrees of freedom
## AIC: 2046.1
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 5
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[5L]]$yreg)$data, weights = getCall(x$reg.output[[5L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5573 -0.2458 -0.1727 -0.1606 3.0612
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.2152 0.7541 -4.264 2.01e-05 ***
## A 0.4723 0.5092 0.928 0.353602
## M -1.2817 0.3462 -3.703 0.000213 ***
## C1 0.7980 0.6944 1.149 0.250480
## C2 -0.8747 0.1445 -6.053 1.42e-09 ***
## A:M -0.1467 0.5364 -0.273 0.784508
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2038.1 on 9999 degrees of freedom
## Residual deviance: 1972.8 on 9994 degrees of freedom
## AIC: 1984.8
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[5L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[5L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2892 0.1105 0.1631 0.2451 0.5819
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.1842 0.6422 0.287 0.774268
## A 1.7317 0.1457 11.886 < 2e-16 ***
## C1 2.2255 0.6499 3.424 0.000617 ***
## C2 0.9759 0.1358 7.184 6.75e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2286.6 on 9999 degrees of freedom
## Residual deviance: 2052.2 on 9996 degrees of freedom
## AIC: 2060.2
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 6
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[6L]]$yreg)$data, weights = getCall(x$reg.output[[6L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5425 -0.2478 -0.1734 -0.1649 3.0346
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.0935 0.7581 -4.081 4.49e-05 ***
## A 0.5823 0.5175 1.125 0.2605
## M -1.1526 0.3583 -3.217 0.0013 **
## C1 0.5570 0.6944 0.802 0.4225
## C2 -0.8334 0.1438 -5.796 6.77e-09 ***
## A:M -0.2852 0.5439 -0.524 0.6000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2038.1 on 9999 degrees of freedom
## Residual deviance: 1980.0 on 9994 degrees of freedom
## AIC: 1992
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[6L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[6L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2831 0.1116 0.1635 0.2465 0.5820
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.1784 0.6435 0.277 0.781575
## A 1.7256 0.1449 11.906 < 2e-16 ***
## C1 2.2335 0.6511 3.430 0.000603 ***
## C2 0.9586 0.1354 7.081 1.44e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2294.0 on 9999 degrees of freedom
## Residual deviance: 2061.5 on 9996 degrees of freedom
## AIC: 2069.5
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 7
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[7L]]$yreg)$data, weights = getCall(x$reg.output[[7L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5639 -0.2415 -0.1744 -0.1606 3.0539
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.5496 0.7698 -4.611 4.01e-06 ***
## A 0.6739 0.5284 1.275 0.20215
## M -1.0735 0.3738 -2.872 0.00408 **
## C1 0.9293 0.6985 1.330 0.18336
## C2 -0.8402 0.1445 -5.815 6.07e-09 ***
## A:M -0.3776 0.5543 -0.681 0.49573
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2022.7 on 9999 degrees of freedom
## Residual deviance: 1964.8 on 9994 degrees of freedom
## AIC: 1976.8
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[7L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[7L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2951 0.1118 0.1615 0.2428 0.5994
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.08708 0.64471 -0.135 0.892561
## A 1.70024 0.14529 11.702 < 2e-16 ***
## C1 2.53093 0.65396 3.870 0.000109 ***
## C2 0.95308 0.13616 7.000 2.56e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2271.9 on 9999 degrees of freedom
## Residual deviance: 2043.1 on 9996 degrees of freedom
## AIC: 2051.1
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 8
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[8L]]$yreg)$data, weights = getCall(x$reg.output[[8L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5742 -0.2415 -0.1753 -0.1619 3.0350
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.5985 0.7640 -4.710 2.48e-06 ***
## A 0.7253 0.5288 1.372 0.1702
## M -1.0158 0.3725 -2.727 0.0064 **
## C1 0.9601 0.6929 1.385 0.1659
## C2 -0.8179 0.1434 -5.703 1.18e-08 ***
## A:M -0.4815 0.5539 -0.869 0.3847
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2038.1 on 9999 degrees of freedom
## Residual deviance: 1982.5 on 9994 degrees of freedom
## AIC: 1994.5
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[8L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[8L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2971 0.1107 0.1584 0.2465 0.5975
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.02998 0.64194 -0.047 0.962749
## A 1.75748 0.14646 11.999 < 2e-16 ***
## C1 2.45664 0.65105 3.773 0.000161 ***
## C2 0.92943 0.13550 6.859 6.91e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2279.3 on 9999 degrees of freedom
## Residual deviance: 2043.4 on 9996 degrees of freedom
## AIC: 2051.4
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 9
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[9L]]$yreg)$data, weights = getCall(x$reg.output[[9L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5625 -0.2385 -0.1746 -0.1606 3.0559
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.5868 0.7729 -4.641 3.47e-06 ***
## A 0.6867 0.5282 1.300 0.19357
## M -1.0834 0.3741 -2.896 0.00378 **
## C1 0.9452 0.7020 1.346 0.17818
## C2 -0.8130 0.1452 -5.600 2.15e-08 ***
## A:M -0.3861 0.5546 -0.696 0.48624
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1999.6 on 9999 degrees of freedom
## Residual deviance: 1944.1 on 9994 degrees of freedom
## AIC: 1956.1
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[9L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[9L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2818 0.1126 0.1620 0.2472 0.5857
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.09774 0.64135 0.152 0.878874
## A 1.71875 0.14492 11.860 < 2e-16 ***
## C1 2.33198 0.64966 3.590 0.000331 ***
## C2 0.93071 0.13509 6.889 5.6e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2294.0 on 9999 degrees of freedom
## Residual deviance: 2063.6 on 9996 degrees of freedom
## AIC: 2071.6
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 10
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[10L]]$yreg)$data, weights = getCall(x$reg.output[[10L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6006 -0.2466 -0.1696 -0.1596 3.0471
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.2053 0.7595 -4.220 2.44e-05 ***
## A 0.7631 0.5012 1.523 0.12786
## M -1.1229 0.3580 -3.137 0.00171 **
## C1 0.6827 0.6961 0.981 0.32674
## C2 -0.8995 0.1449 -6.210 5.31e-10 ***
## A:M -0.4937 0.5281 -0.935 0.34987
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2038.1 on 9999 degrees of freedom
## Residual deviance: 1970.7 on 9994 degrees of freedom
## AIC: 1982.7
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[10L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[10L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.3026 0.1094 0.1600 0.2428 0.5976
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.003158 0.643227 0.005 0.996083
## A 1.745766 0.146506 11.916 < 2e-16 ***
## C1 2.407423 0.652168 3.691 0.000223 ***
## C2 0.981818 0.136344 7.201 5.98e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2279.3 on 9999 degrees of freedom
## Residual deviance: 2039.5 on 9996 degrees of freedom
## AIC: 2047.5
##
## Number of Fisher Scoring iterations: 7
##
##
## # Effect decomposition on the odds ratio scale via the regression-based approach with multiple imputation
##
## Closed-form parameter function estimation with
## delta method standard errors, confidence intervals and p-values
##
## Estimate Std.error 95% CIL 95% CIU P.val
## Rcde 1.33818 0.22617 0.96084 1.864 0.0848 .
## Rpnde 1.41766 0.23702 1.02154 1.967 0.0368 *
## Rtnde 1.35411 0.22314 0.98036 1.870 0.0658 .
## Rpnie 0.92797 0.03740 0.85750 1.004 0.0636 .
## Rtnie 0.88638 0.05341 0.78763 0.998 0.0453 *
## Rte 1.25658 0.19851 0.92198 1.713 0.1482
## ERcde 0.30838 0.20145 -0.08645 0.703 0.1258
## ERintref 0.10948 0.11563 -0.11715 0.336 0.3437
## ERintmed -0.08911 0.09429 -0.27392 0.096 0.3446
## ERpnie -0.07199 0.03736 -0.14522 0.001 0.0540 .
## ERcde(prop) 1.20207 0.26051 0.69148 1.713 3.94e-06 ***
## ERintref(prop) 0.43032 0.50804 -0.56542 1.426 0.3970
## ERintmed(prop) -0.35041 0.41505 -1.16389 0.463 0.3985
## ERpnie(prop) -0.28198 0.26120 -0.79394 0.230 0.2803
## pm -0.63239 0.53431 -1.67962 0.415 0.2366
## int 0.07991 0.09394 -0.10421 0.264 0.3950
## pe -0.20207 0.26051 -0.71267 0.309 0.4379
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Rcde: controlled direct effect odds ratio; Rpnde: pure natural direct effect odds ratio; Rtnde: total natural direct effect odds ratio; Rpnie: pure natural indirect effect odds ratio; Rtnie: total natural indirect effect odds ratio; Rte: total effect odds ratio; ERcde: excess relative risk due to controlled direct effect; ERintref: excess relative risk due to reference interaction; ERintmed: excess relative risk due to mediated interaction; ERpnie: excess relative risk due to pure natural indirect effect; ERcde(prop): proportion ERcde; ERintref(prop): proportion ERintref; ERintmed(prop): proportion ERintmed; ERpnie(prop): proportion ERpnie; pm: overall proportion mediated; int: overall proportion attributable to interaction; pe: overall proportion eliminated)
##
## Relevant variable values:
## $a
## [1] 1
##
## $astar
## [1] 0
##
## $yval
## [1] "1"
##
## $mval
## $mval[[1]]
## [1] 1
##
##
## $basecval
## $basecval[[1]]
## [1] 0.9993387
##
## $basecval[[2]]
## [1] 0.6060235
Compare the multiple imputation results with true results:
res_noNA <- cmest(data = data_noNA, model = "rb", outcome = "Y", exposure = "A",
mediator = "M", basec = c("C1", "C2"), EMint = TRUE,
mreg = list("logistic"), yreg = "logistic",
astar = 0, a = 1, mval = list(1),
estimation = "paramfunc", inference = "delta")
summary(res_noNA)
## Causal Mediation Analysis
##
## # Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output$yreg)$data, weights = getCall(x$reg.output$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5681 -0.2414 -0.1741 -0.1604 3.0521
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.5731 0.7713 -4.633 3.61e-06 ***
## A 0.7084 0.5283 1.341 0.17990
## M -1.0471 0.3736 -2.803 0.00507 **
## C1 0.9337 0.6973 1.339 0.18054
## C2 -0.8415 0.1443 -5.829 5.56e-09 ***
## A:M -0.4222 0.5541 -0.762 0.44607
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2022.7 on 9999 degrees of freedom
## Residual deviance: 1965.0 on 9994 degrees of freedom
## AIC: 1977
##
## Number of Fisher Scoring iterations: 7
##
##
## # Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output$mreg[[1L]])$data,
## weights = getCall(x$reg.output$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2578 0.1145 0.1647 0.2519 0.5563
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.4353 0.6410 0.679 0.49712
## A 1.7076 0.1443 11.837 < 2e-16 ***
## C1 1.9982 0.6474 3.086 0.00203 **
## C2 0.9024 0.1345 6.709 1.96e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2294.0 on 9999 degrees of freedom
## Residual deviance: 2072.5 on 9996 degrees of freedom
## AIC: 2080.5
##
## Number of Fisher Scoring iterations: 7
##
##
## # Effect decomposition on the odds ratio scale via the regression-based approach
##
## Closed-form parameter function estimation with
## delta method standard errors, confidence intervals and p-values
##
## Estimate Std.error 95% CIL 95% CIU P.val
## Rcde 1.33138 0.22278 0.95912 1.848 0.0872 .
## Rpnde 1.41988 0.23708 1.02358 1.970 0.0358 *
## Rtnde 1.34928 0.22056 0.97940 1.859 0.0669 .
## Rpnie 0.93338 0.03576 0.86587 1.006 0.0719 .
## Rtnie 0.88697 0.05295 0.78902 0.997 0.0445 *
## Rte 1.25939 0.19804 0.92535 1.714 0.1425
## ERcde 0.30416 0.20019 -0.08820 0.697 0.1287
## ERintref 0.11571 0.11472 -0.10914 0.341 0.3131
## ERintmed -0.09387 0.09323 -0.27659 0.089 0.3140
## ERpnie -0.06662 0.03576 -0.13670 0.003 0.0624 .
## ERcde(prop) 1.17262 0.23540 0.71124 1.634 6.32e-07 ***
## ERintref(prop) 0.44610 0.49051 -0.51528 1.407 0.3631
## ERintmed(prop) -0.36189 0.39881 -1.14354 0.420 0.3642
## ERpnie(prop) -0.25684 0.23474 -0.71691 0.203 0.2739
## pm -0.61872 0.50289 -1.60437 0.367 0.2186
## int 0.08421 0.09264 -0.09736 0.266 0.3633
## pe -0.17262 0.23540 -0.63400 0.289 0.4634
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Rcde: controlled direct effect odds ratio; Rpnde: pure natural direct effect odds ratio; Rtnde: total natural direct effect odds ratio; Rpnie: pure natural indirect effect odds ratio; Rtnie: total natural indirect effect odds ratio; Rte: total effect odds ratio; ERcde: excess relative risk due to controlled direct effect; ERintref: excess relative risk due to reference interaction; ERintmed: excess relative risk due to mediated interaction; ERpnie: excess relative risk due to pure natural indirect effect; ERcde(prop): proportion ERcde; ERintref(prop): proportion ERintref; ERintmed(prop): proportion ERintmed; ERpnie(prop): proportion ERpnie; pm: overall proportion mediated; int: overall proportion attributable to interaction; pe: overall proportion eliminated)
##
## Relevant variable values:
## $a
## [1] 1
##
## $astar
## [1] 0
##
## $yval
## [1] "1"
##
## $mval
## $mval[[1]]
## [1] 1
##
##
## $basecval
## $basecval[[1]]
## [1] 0.9993463
##
## $basecval[[2]]
## [1] 0.6061