vignettes/multiple_imputation.Rmd
multiple_imputation.Rmd
This example demonstrates how to estimate causal effects with
multiple imputations by using cmest
when the dataset
contains missing values. For this purpose, we simulate some data
containing a continuous baseline confounder \(C_1\), a binary baseline confounder \(C_2\), a binary exposure \(A\), a binary mediator \(M\) and a binary outcome \(Y\). The true regression models for \(A\), \(M\)
and \(Y\) are: \[logit(E(A|C_1,C_2))=0.2+0.5C_1+0.1C_2\]
\[logit(E(M|A,C_1,C_2))=1+2A+1.5C_1+0.8C_2\]
\[logit(E(Y|A,M,C_1,C_2)))=-3-0.4A-1.2M+0.5AM+0.3C_1-0.6C_2\]
To create the dataset with missing values, we randomly delete 10% of values in the dataset.
set.seed(1)
expit <- function(x) exp(x)/(1+exp(x))
n <- 10000
C1 <- rnorm(n, mean = 1, sd = 0.1)
C2 <- rbinom(n, 1, 0.6)
A <- rbinom(n, 1, expit(0.2 + 0.5*C1 + 0.1*C2))
M <- rbinom(n, 1, expit(1 + 2*A + 1.5*C1 + 0.8*C2))
Y <- rbinom(n, 1, expit(-3 - 0.4*A - 1.2*M + 0.5*A*M + 0.3*C1 - 0.6*C2))
data_noNA <- data.frame(A, M, Y, C1, C2)
missing <- sample(1:(5*n), n*0.1, replace = FALSE)
C1[missing[which(missing <= n)]] <- NA
C2[missing[which((missing > n)*(missing <= 2*n) == 1)] - n] <- NA
A[missing[which((missing > 2*n)*(missing <= 3*n) == 1)] - 2*n] <- NA
M[missing[which((missing > 3*n)*(missing <= 4*n) == 1)] - 3*n] <- NA
Y[missing[which((missing > 4*n)*(missing <= 5*n) == 1)] - 4*n] <- NA
data <- data.frame(A, M, Y, C1, C2)
The DAG for this scientific setting is:
## Registered S3 methods overwritten by 'lme4':
## method from
## cooks.distance.influence.merMod car
## influence.merMod car
## dfbeta.influence.merMod car
## dfbetas.influence.merMod car
cmdag(outcome = "Y", exposure = "A", mediator = "M",
basec = c("C1", "C2"), postc = NULL, node = TRUE, text_col = "white")
To conduct multiple imputations, we set the multimp
argument to be TRUE
. The regression-based approach is used
for illustration. The multiple imputation results:
res_multimp <- cmest(data = data, model = "rb", outcome = "Y", exposure = "A",
mediator = "M", basec = c("C1", "C2"), EMint = TRUE,
mreg = list("logistic"), yreg = "logistic",
astar = 0, a = 1, mval = list(1),
estimation = "paramfunc", inference = "delta",
multimp = TRUE, args_mice = list(m = 10))
summary(res_multimp)
## Causal Mediation Analysis
##
## # Regressions with imputed dataset 1
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[1L]]$yreg)$data, weights = getCall(x$reg.output[[1L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6544 -0.2446 -0.1733 -0.1571 3.0818
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.6155 0.7563 -4.781 1.75e-06 ***
## A 0.8454 0.4866 1.737 0.082324 .
## M -1.1828 0.3592 -3.293 0.000993 ***
## C1 1.1130 0.6888 1.616 0.106133
## C2 -0.9115 0.1442 -6.322 2.58e-10 ***
## A:M -0.5107 0.5149 -0.992 0.321299
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2061.1 on 9999 degrees of freedom
## Residual deviance: 1986.3 on 9994 degrees of freedom
## AIC: 1998.3
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[1L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[1L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2753 0.1143 0.1637 0.2465 0.5893
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.04186 0.63872 0.066 0.947748
## A 1.68676 0.14358 11.748 < 2e-16 ***
## C1 2.39526 0.64720 3.701 0.000215 ***
## C2 0.92552 0.13464 6.874 6.24e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2301.4 on 9999 degrees of freedom
## Residual deviance: 2075.6 on 9996 degrees of freedom
## AIC: 2083.6
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 2
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[2L]]$yreg)$data, weights = getCall(x$reg.output[[2L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5705 -0.2402 -0.1732 -0.1601 3.0503
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.5202 0.7726 -4.556 5.21e-06 ***
## A 0.7264 0.5289 1.374 0.16959
## M -1.0590 0.3737 -2.834 0.00459 **
## C1 0.8821 0.6995 1.261 0.20727
## C2 -0.8319 0.1452 -5.728 1.01e-08 ***
## A:M -0.4478 0.5548 -0.807 0.41961
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2007.3 on 9999 degrees of freedom
## Residual deviance: 1950.4 on 9994 degrees of freedom
## AIC: 1962.4
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[2L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[2L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2907 0.1102 0.1616 0.2451 0.5747
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2246 0.6457 0.348 0.727979
## A 1.7384 0.1467 11.852 < 2e-16 ***
## C1 2.2056 0.6532 3.377 0.000734 ***
## C2 0.9561 0.1366 6.998 2.59e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2264.5 on 9999 degrees of freedom
## Residual deviance: 2034.0 on 9996 degrees of freedom
## AIC: 2042
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 3
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[3L]]$yreg)$data, weights = getCall(x$reg.output[[3L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6001 -0.2442 -0.1722 -0.1596 3.0502
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.4724 0.7663 -4.532 5.85e-06 ***
## A 0.8326 0.5111 1.629 0.10332
## M -1.0319 0.3731 -2.766 0.00568 **
## C1 0.8459 0.6935 1.220 0.22259
## C2 -0.8750 0.1443 -6.063 1.34e-09 ***
## A:M -0.5568 0.5375 -1.036 0.30027
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2038.1 on 9999 degrees of freedom
## Residual deviance: 1974.7 on 9994 degrees of freedom
## AIC: 1986.7
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[3L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[3L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2757 0.1151 0.1629 0.2471 0.5941
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.0388 0.6391 -0.061 0.951589
## A 1.6807 0.1436 11.706 < 2e-16 ***
## C1 2.4881 0.6482 3.838 0.000124 ***
## C2 0.9087 0.1344 6.762 1.36e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2301.4 on 9999 degrees of freedom
## Residual deviance: 2077.1 on 9996 degrees of freedom
## AIC: 2085.1
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 4
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[4L]]$yreg)$data, weights = getCall(x$reg.output[[4L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5675 -0.2413 -0.1751 -0.1606 3.0619
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.6023 0.7713 -4.671 3.00e-06 ***
## A 0.6911 0.5286 1.307 0.19111
## M -1.0792 0.3742 -2.884 0.00392 **
## C1 0.9686 0.6978 1.388 0.16515
## C2 -0.8408 0.1445 -5.817 5.98e-09 ***
## A:M -0.3692 0.5548 -0.665 0.50575
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2022.7 on 9999 degrees of freedom
## Residual deviance: 1964.0 on 9994 degrees of freedom
## AIC: 1976
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[4L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[4L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2808 0.1111 0.1632 0.2465 0.5690
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.3146 0.6450 0.488 0.62577
## A 1.7271 0.1458 11.845 < 2e-16 ***
## C1 2.1044 0.6523 3.226 0.00125 **
## C2 0.9591 0.1360 7.054 1.74e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2279.3 on 9999 degrees of freedom
## Residual deviance: 2048.9 on 9996 degrees of freedom
## AIC: 2056.9
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 5
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[5L]]$yreg)$data, weights = getCall(x$reg.output[[5L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5604 -0.2433 -0.1709 -0.1579 3.0599
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.5033 0.7704 -4.547 5.44e-06 ***
## A 0.6653 0.5285 1.259 0.20812
## M -1.0524 0.3737 -2.816 0.00486 **
## C1 0.8870 0.6969 1.273 0.20308
## C2 -0.8892 0.1456 -6.105 1.03e-09 ***
## A:M -0.3842 0.5545 -0.693 0.48840
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2015.0 on 9999 degrees of freedom
## Residual deviance: 1953.6 on 9994 degrees of freedom
## AIC: 1965.6
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[5L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[5L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2809 0.1115 0.1641 0.2454 0.5732
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2656 0.6424 0.413 0.679244
## A 1.7167 0.1452 11.827 < 2e-16 ***
## C1 2.1534 0.6495 3.315 0.000916 ***
## C2 0.9635 0.1360 7.086 1.38e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2279.3 on 9999 degrees of freedom
## Residual deviance: 2050.7 on 9996 degrees of freedom
## AIC: 2058.7
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 6
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[6L]]$yreg)$data, weights = getCall(x$reg.output[[6L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5671 -0.2411 -0.1711 -0.1576 3.0601
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.4978 0.7662 -4.565 4.99e-06 ***
## A 0.6005 0.5175 1.160 0.24591
## M -1.1364 0.3582 -3.172 0.00151 **
## C1 0.9577 0.7003 1.368 0.17145
## C2 -0.8730 0.1455 -5.999 1.98e-09 ***
## A:M -0.3284 0.5441 -0.604 0.54612
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2007.3 on 9999 degrees of freedom
## Residual deviance: 1945.8 on 9994 degrees of freedom
## AIC: 1957.8
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[6L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[6L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2673 0.1146 0.1624 0.2531 0.5829
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.1661 0.6337 0.262 0.793244
## A 1.7269 0.1437 12.015 < 2e-16 ***
## C1 2.2465 0.6415 3.502 0.000461 ***
## C2 0.9027 0.1333 6.775 1.25e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2330.8 on 9999 degrees of freedom
## Residual deviance: 2097.8 on 9996 degrees of freedom
## AIC: 2105.8
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 7
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[7L]]$yreg)$data, weights = getCall(x$reg.output[[7L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5652 -0.2426 -0.1738 -0.1608 3.0640
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.4733 0.7707 -4.507 6.58e-06 ***
## A 0.6897 0.5285 1.305 0.19189
## M -1.1095 0.3750 -2.958 0.00309 **
## C1 0.8568 0.6994 1.225 0.22061
## C2 -0.8486 0.1449 -5.855 4.77e-09 ***
## A:M -0.3552 0.5551 -0.640 0.52227
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2015.0 on 9999 degrees of freedom
## Residual deviance: 1955.7 on 9994 degrees of freedom
## AIC: 1967.7
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[7L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[7L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2870 0.1124 0.1613 0.2440 0.5882
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.03183 0.64602 0.049 0.960703
## A 1.70552 0.14538 11.731 < 2e-16 ***
## C1 2.41571 0.65432 3.692 0.000223 ***
## C2 0.93227 0.13603 6.853 7.22e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2264.5 on 9999 degrees of freedom
## Residual deviance: 2039.8 on 9996 degrees of freedom
## AIC: 2047.8
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 8
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[8L]]$yreg)$data, weights = getCall(x$reg.output[[8L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5849 -0.2491 -0.1701 -0.1614 3.0541
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.0819 0.7609 -4.050 5.12e-05 ***
## A 0.7038 0.5002 1.407 0.159435
## M -1.1883 0.3593 -3.307 0.000942 ***
## C1 0.5817 0.6971 0.834 0.404029
## C2 -0.8908 0.1449 -6.147 7.89e-10 ***
## A:M -0.3894 0.5279 -0.738 0.460721
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2030.4 on 9999 degrees of freedom
## Residual deviance: 1963.7 on 9994 degrees of freedom
## AIC: 1975.7
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[8L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[8L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2753 0.1142 0.1640 0.2446 0.5847
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.06976 0.64465 0.108 0.913822
## A 1.67172 0.14392 11.615 < 2e-16 ***
## C1 2.37882 0.65253 3.646 0.000267 ***
## C2 0.93134 0.13539 6.879 6.03e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2279.3 on 9999 degrees of freedom
## Residual deviance: 2058.5 on 9996 degrees of freedom
## AIC: 2066.5
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 9
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[9L]]$yreg)$data, weights = getCall(x$reg.output[[9L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5696 -0.2417 -0.1702 -0.1580 3.0560
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.4477 0.7728 -4.461 8.14e-06 ***
## A 0.6996 0.5294 1.322 0.18631
## M -1.0701 0.3737 -2.863 0.00419 **
## C1 0.8415 0.7009 1.201 0.22994
## C2 -0.8754 0.1461 -5.993 2.07e-09 ***
## A:M -0.4294 0.5554 -0.773 0.43942
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1999.6 on 9999 degrees of freedom
## Residual deviance: 1939.4 on 9994 degrees of freedom
## AIC: 1951.4
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[9L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[9L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2908 0.1099 0.1594 0.2463 0.5717
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2425 0.6477 0.374 0.708141
## A 1.7544 0.1476 11.889 < 2e-16 ***
## C1 2.1962 0.6549 3.354 0.000798 ***
## C2 0.9334 0.1367 6.830 8.51e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2249.6 on 9999 degrees of freedom
## Residual deviance: 2021.0 on 9996 degrees of freedom
## AIC: 2029
##
## Number of Fisher Scoring iterations: 7
##
##
## # Regressions with imputed dataset 10
##
## ## Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output[[10L]]$yreg)$data, weights = getCall(x$reg.output[[10L]]$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5648 -0.2429 -0.1716 -0.1598 3.0392
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.4182 0.7682 -4.450 8.59e-06 ***
## A 0.6923 0.5289 1.309 0.19056
## M -1.0344 0.3727 -2.775 0.00552 **
## C1 0.8076 0.6967 1.159 0.24639
## C2 -0.8595 0.1447 -5.940 2.85e-09 ***
## A:M -0.4509 0.5541 -0.814 0.41578
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2022.7 on 9999 degrees of freedom
## Residual deviance: 1964.4 on 9994 degrees of freedom
## AIC: 1976.4
##
## Number of Fisher Scoring iterations: 7
##
##
## ## Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output[[10L]]$mreg[[1L]])$data,
## weights = getCall(x$reg.output[[10L]]$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2811 0.1121 0.1615 0.2459 0.5713
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.2112 0.6472 0.326 0.744143
## A 1.7095 0.1461 11.700 < 2e-16 ***
## C1 2.2442 0.6551 3.426 0.000613 ***
## C2 0.9230 0.1362 6.779 1.21e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2257.0 on 9999 degrees of freedom
## Residual deviance: 2035.2 on 9996 degrees of freedom
## AIC: 2043.2
##
## Number of Fisher Scoring iterations: 7
##
##
## # Effect decomposition on the odds ratio scale via the regression-based approach with multiple imputation
##
## Closed-form parameter function estimation with
## delta method standard errors, confidence intervals and p-values
##
## Estimate Std.error 95% CIL 95% CIU P.val
## Rcde 1.33974 0.22906 0.95827 1.873 0.0871 .
## Rpnde 1.43121 0.24439 1.02413 2.000 0.0358 *
## Rtnde 1.35828 0.22703 0.97885 1.885 0.0669 .
## Rpnie 0.92992 0.03662 0.86084 1.005 0.0651 .
## Rtnie 0.88254 0.05397 0.78285 0.995 0.0410 *
## Rte 1.26309 0.20237 0.92269 1.729 0.1449
## ERcde 0.31081 0.20477 -0.09053 0.712 0.1291
## ERintref 0.12117 0.11823 -0.11056 0.353 0.3055
## ERintmed -0.09834 0.09609 -0.28667 0.090 0.3061
## ERpnie -0.07006 0.03661 -0.14182 0.002 0.0556 .
## ERcde(prop) 1.18229 0.24912 0.69402 1.671 2.08e-06 ***
## ERintref(prop) 0.46292 0.50976 -0.53619 1.462 0.3638
## ERintmed(prop) -0.37585 0.41497 -1.18916 0.437 0.3651
## ERpnie(prop) -0.26937 0.25241 -0.76409 0.225 0.2859
## pm -0.64522 0.53551 -1.69480 0.404 0.2283
## int 0.08708 0.09587 -0.10083 0.275 0.3638
## pe -0.18229 0.24912 -0.67057 0.306 0.4643
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Rcde: controlled direct effect odds ratio; Rpnde: pure natural direct effect odds ratio; Rtnde: total natural direct effect odds ratio; Rpnie: pure natural indirect effect odds ratio; Rtnie: total natural indirect effect odds ratio; Rte: total effect odds ratio; ERcde: excess relative risk due to controlled direct effect; ERintref: excess relative risk due to reference interaction; ERintmed: excess relative risk due to mediated interaction; ERpnie: excess relative risk due to pure natural indirect effect; ERcde(prop): proportion ERcde; ERintref(prop): proportion ERintref; ERintmed(prop): proportion ERintmed; ERpnie(prop): proportion ERpnie; pm: overall proportion mediated; int: overall proportion attributable to interaction; pe: overall proportion eliminated)
##
## Relevant variable values:
## $a
## [1] 1
##
## $astar
## [1] 0
##
## $yval
## [1] "1"
##
## $mval
## $mval[[1]]
## [1] 1
##
##
## $basecval
## $basecval[[1]]
## [1] 0.9993387
##
## $basecval[[2]]
## [1] 0.6060235
Compare the multiple imputation results with true results:
res_noNA <- cmest(data = data_noNA, model = "rb", outcome = "Y", exposure = "A",
mediator = "M", basec = c("C1", "C2"), EMint = TRUE,
mreg = list("logistic"), yreg = "logistic",
astar = 0, a = 1, mval = list(1),
estimation = "paramfunc", inference = "delta")
summary(res_noNA)
## Causal Mediation Analysis
##
## # Outcome regression:
##
## Call:
## glm(formula = Y ~ A + M + A * M + C1 + C2, family = binomial(),
## data = getCall(x$reg.output$yreg)$data, weights = getCall(x$reg.output$yreg)$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5681 -0.2414 -0.1741 -0.1604 3.0521
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.5731 0.7713 -4.633 3.61e-06 ***
## A 0.7084 0.5283 1.341 0.17990
## M -1.0471 0.3736 -2.803 0.00507 **
## C1 0.9337 0.6973 1.339 0.18054
## C2 -0.8415 0.1443 -5.829 5.56e-09 ***
## A:M -0.4222 0.5541 -0.762 0.44607
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2022.7 on 9999 degrees of freedom
## Residual deviance: 1965.0 on 9994 degrees of freedom
## AIC: 1977
##
## Number of Fisher Scoring iterations: 7
##
##
## # Mediator regressions:
##
## Call:
## glm(formula = M ~ A + C1 + C2, family = binomial(), data = getCall(x$reg.output$mreg[[1L]])$data,
## weights = getCall(x$reg.output$mreg[[1L]])$weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2578 0.1145 0.1647 0.2519 0.5563
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.4353 0.6410 0.679 0.49712
## A 1.7076 0.1443 11.837 < 2e-16 ***
## C1 1.9982 0.6474 3.086 0.00203 **
## C2 0.9024 0.1345 6.709 1.96e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2294.0 on 9999 degrees of freedom
## Residual deviance: 2072.5 on 9996 degrees of freedom
## AIC: 2080.5
##
## Number of Fisher Scoring iterations: 7
##
##
## # Effect decomposition on the odds ratio scale via the regression-based approach
##
## Closed-form parameter function estimation with
## delta method standard errors, confidence intervals and p-values
##
## Estimate Std.error 95% CIL 95% CIU P.val
## Rcde 1.33138 0.22278 0.95912 1.848 0.0872 .
## Rpnde 1.41988 0.23708 1.02358 1.970 0.0358 *
## Rtnde 1.34928 0.22056 0.97940 1.859 0.0669 .
## Rpnie 0.93338 0.03576 0.86587 1.006 0.0719 .
## Rtnie 0.88697 0.05295 0.78902 0.997 0.0445 *
## Rte 1.25939 0.19804 0.92535 1.714 0.1425
## ERcde 0.30416 0.20019 -0.08820 0.697 0.1287
## ERintref 0.11571 0.11472 -0.10914 0.341 0.3131
## ERintmed -0.09387 0.09323 -0.27659 0.089 0.3140
## ERpnie -0.06662 0.03576 -0.13670 0.003 0.0624 .
## ERcde(prop) 1.17262 0.23540 0.71124 1.634 6.32e-07 ***
## ERintref(prop) 0.44610 0.49051 -0.51528 1.407 0.3631
## ERintmed(prop) -0.36189 0.39881 -1.14354 0.420 0.3642
## ERpnie(prop) -0.25684 0.23474 -0.71691 0.203 0.2739
## pm -0.61872 0.50289 -1.60437 0.367 0.2186
## int 0.08421 0.09264 -0.09736 0.266 0.3633
## pe -0.17262 0.23540 -0.63400 0.289 0.4634
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Rcde: controlled direct effect odds ratio; Rpnde: pure natural direct effect odds ratio; Rtnde: total natural direct effect odds ratio; Rpnie: pure natural indirect effect odds ratio; Rtnie: total natural indirect effect odds ratio; Rte: total effect odds ratio; ERcde: excess relative risk due to controlled direct effect; ERintref: excess relative risk due to reference interaction; ERintmed: excess relative risk due to mediated interaction; ERpnie: excess relative risk due to pure natural indirect effect; ERcde(prop): proportion ERcde; ERintref(prop): proportion ERintref; ERintmed(prop): proportion ERintmed; ERpnie(prop): proportion ERpnie; pm: overall proportion mediated; int: overall proportion attributable to interaction; pe: overall proportion eliminated)
##
## Relevant variable values:
## $a
## [1] 1
##
## $astar
## [1] 0
##
## $yval
## [1] "1"
##
## $mval
## $mval[[1]]
## [1] 1
##
##
## $basecval
## $basecval[[1]]
## [1] 0.9993463
##
## $basecval[[2]]
## [1] 0.6061