
miceadds's Introduction

miceadds

Some Additional Multiple Imputation Functions, Especially for 'mice'

If you use miceadds and have suggestions for improvement or have found bugs, please email me at [email protected]. Please always provide a minimal dataset needed to demonstrate the problem, minimal runnable code that reproduces the issue on that dataset, and all necessary information about the libraries used, the R version, and the operating system it is run on, ideally via sessionInfo().

Manual

The manual can be found at https://alexanderrobitzsch.github.io/miceadds/

CRAN version miceadds 3.17-44 (2024-01-08)


The official version of miceadds is hosted on CRAN. The CRAN version can be installed from within R using:

utils::install.packages("miceadds")

GitHub version miceadds 3.18-2 (2024-01-10)


The version hosted here is the development version of miceadds. The GitHub version can be installed from within R using:

devtools::install_github("alexanderrobitzsch/miceadds")


miceadds's Issues

`visitSequence` in `mice 3.0` changes to character

The visitSequence argument in mice 3.0 will change from numeric to character, but miceadds 2.11-80 still seems to rely on it being numeric:

devtools::install_github(repo = "stefvanbuuren/mice", ref = "dev")
library(miceadds)
library(Amelia)
data(nhanes,package="mice")
set.seed(566)
a.out <- Amelia::amelia(x =  nhanes , m=10)
a.mids <- miceadds::datlist2mids( a.out$imputations )

Error in 0 * imp1$visitSequence : non-numeric argument to binary operator

Could you fix this?
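For reference, a hedged sketch of the mapping between the two representations (not the miceadds internals): a character visitSequence can be converted back to the numeric column positions that older code expects via match(). The objects dat and vs below are illustrative.

dat <- mice::nhanes
vs  <- c("bmi", "hyp", "chl")        # character visitSequence (mice >= 3.0)
vs_num <- match(vs, colnames(dat))   # numeric visitSequence (mice < 3.0): 2 3 4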

warning message when using datalist2mids

When using datalist2mids I get this warning, and I don't know what it is about.

Warning message in data.matrix(x):
"NAs introduced by coercion"
Warning message in data.matrix(x):
"NAs introduced by coercion"
Warning message in data.matrix(x):
"NAs introduced by coercion"
Warning message in data.matrix(x):
"NAs introduced by coercion"

Any hints?

mice.impute.bygroup - subscript out of bounds

Hi

I ran the examples and they work well. When I tried my own data, I got this message:

Error in x[, group_vname] : subscript out of bounds

Thank you very much for your advice.

Sandro
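A hedged diagnostic sketch: the subscript-out-of-bounds error in x[, group_vname] usually means the grouping variable is not available as a column of the predictor data, so it is worth checking that it exists in the data and is kept in the predictor matrix. The data set and the name "myGroup" below are stand-ins.

dat <- mice::nhanes                  # stand-in data set for illustration
dat$myGroup <- rep(1:5, 5)           # hypothetical grouping variable
group_vname <- "myGroup"
group_vname %in% colnames(dat)       # must be TRUE
pred <- mice::make.predictorMatrix(dat)
pred[, group_vname]                  # should stay non-zero for variables imputed by group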

datlist2mids fails after subsetting if factors don't still contain all levels

I encountered the following error when trying to subset a mids object that contains factor variables. If some of the factor levels are missing after subsetting, then the resulting datlist object cannot be converted back to a mids object. This is particularly a problem if the subsetting variable is itself a factor. Is there a way around this?

library(miceadds)
#> Loading required package: mice
#> Loading required package: lattice
#> 
#> Attaching package: 'mice'
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
#> * miceadds 3.7-6 (2019-12-15 13:38:43)
data(nhanes2)
nhanes2_imp  <- mice(nhanes2, seed = 12345, printFlag = FALSE)
nhanes2_list <- mids2datlist(nhanes2_imp)
nhanes2_sub  <- subset_datlist(nhanes2_list, subset = nhanes2$age == "20-39")
datlist2mids(nhanes2_sub)
#> Error in Ops.factor(left, right): level sets of factors are different

Created on 2020-01-15 by the reprex package (v0.3.0)
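A hedged workaround sketch, assuming the dropped factor levels are indeed the cause: re-assert the original level sets on every factor column of every subsetted data set before converting back to a mids object.

full_levels <- lapply(nhanes2, levels)   # level sets from the original data (NULL for non-factors)
nhanes2_sub2 <- lapply(nhanes2_sub, function(d) {
    for (v in names(d)) {
        if (is.factor(d[[v]])) d[[v]] <- factor(d[[v]], levels = full_levels[[v]])
    }
    d
})
datlist2mids(nhanes2_sub2)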

Interaction of new `where` argument with `miceadds`

Dear Alexander,

I am currently working on a new argument to the mice function, called where, which allows the user to specify which cells to impute. There is a develop branch at my account (https://github.com/stefvanbuuren/mice/tree/where/R).

One of the consequences is that I need to extend all univariate imputation functions with a wy argument. I've been careful not to break any existing code, but I've found that the functions mice.impute.2lonly.pmm and mice.impute.2lonly.norm became dysfunctional. The problem is that wy cannot be passed if there is a multilevel structure, and needs to be broken down into classes, just like you did with ry. I have fixed both functions, but the same behaviour may also show up in similar functions you incorporated into miceadds. Would you mind taking a look at whether this is indeed the case, and seeing whether my solution could also be used in miceadds?

I have not been able to include the wy argument into mice.impute.2l.pan since I would need to overwrite (part of) the data in y1 before entering pan. Any ideas how to fix that?

Best wishes, Stef.
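For readers following along, a hedged sketch of the idea (not the actual mice or miceadds code): in the 2lonly.* functions the row-level response indicator ry is collapsed to one value per cluster before the level-2 model is fitted, and the new wy indicator would need an analogous aggregation. The toy vectors below are purely illustrative.

ry   <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)    # row-level observed indicator
wy   <- c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)  # row-level "impute here" indicator
clus <- c(1, 1, 2, 2, 3, 3)                        # hypothetical cluster ids
ry_cluster <- as.logical(tapply(ry, clus, all))    # one value per cluster, like the existing ry handling
wy_cluster <- as.logical(tapply(wy, clus, any))    # analogous collapse for the new wy argument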

post hoc test for mi.anova

Hello,

thank you for the mi.anova function, it makes it very easy to calculate an ANOVA with pooled imputed data sets. However, I was wondering if there is a possibility to perform a post hoc test with all imputed data sets. So far, I have only figured out how to do a post hoc test when I select one of the imputed data sets.

Thanks a lot!
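A hedged workaround sketch, since mi.anova itself does not appear to offer post hoc tests: run the contrast of interest in every imputed data set and pool it with Rubin's rules, for example through a dummy-coded regression fitted with with() and pooled via mice::pool(). The data and variables below are stand-ins, not the poster's data.

imp <- mice::mice(mice::nhanes2, printFlag = FALSE, seed = 1)  # stand-in imputed data
fit <- with(imp, lm(bmi ~ age))       # 'age' plays the role of the ANOVA factor
summary(mice::pool(fit))              # pooled coefficients = contrasts against the reference level
# relevel the factor and refit to obtain the remaining pairwise contrasts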

Question on using mice.impute.imputeR.lmFun

Hello,

I would like to try out forward step-wise regression as a model-building strategy for the imputation models. From the mice.impute.imputeR.lmFun documentation, I understand that I can use the imputeR::stepForR() method as a univariate imputation in the mice algorithm.

If I follow the example reported in the documentation on the mice.impute.imputeR.lmFun, I would set up the code like this:

# Load packages
library(mice)
library(miceadds)
library(ridge)
library(imputeR)

# Load data
data(nhanes, package = "mice")
dat <- nhanes

# Make hyp a factor
dat$hyp <- as.factor(dat$hyp)

# Define general imputation method 
method <- c(
    age = "",
    bmi = "norm",
    hyp = "imputeR.cFun",
    chl = "imputeR.lmFun" # use one of the imputeR methods for cont. vars
)

# Define specifics of imputeR methods
Fun <- list(
    hyp = imputeR::ridgeC,
    chl = imputeR::stepForR # use forward step-wise method from imputeR
)

# Usual run of mice with an extra "Fun" argument
imp <- mice::mice(dat, method = method, maxit = 10, m = 4, Fun = Fun)

My understanding is that:

  • The mice algorithm will estimate the forward stepwise regression to predict "chl" on a bootstrap version of the observed values on "chl" at every iteration.
  • The predictors that the forward stepwise algorithm will scan are the ones defined by the predictor matrix provided in the mice call (the default one in this case).
  • At every iteration, the step-wise algorithm might select different predictors for the imputation model of "chl".

Do I understand this correctly?

not able to impute using ml.lmer in v 3.3?

I'm trying to impute missing values in level 1 and level 2 variables using the ml.lmer method, which was working in miceadds 3.2, but it keeps throwing an error since yesterday's update.

Error message :
Error in check.method(method = method, data = data, where = where, blocks = blocks, : The following functions were not found: mice.impute.ml.lmer, mice.impute.ml.lmer

How can I resolve this?
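A hedged checklist sketch: mice() looks up imputation methods by name (mice.impute.<method>) on the search path, so the first things to verify are that the installed miceadds version actually provides mice.impute.ml.lmer and that the package is attached before mice() is called.

library(miceadds)
packageVersion("miceadds")
exists("mice.impute.ml.lmer")   # should be TRUE; if not, reinstall or update miceadds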

subscript out of bounds

df<-structure(list(Zone = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 2L, 
               3L, 3L), .Label = c("A", "B", "C"), class = "factor"), Y = c(2L, 
               3L, 5L, 4L, 3L, 5L, 7L, 8L, 10L), Cat1 = structure(c(1L, 1L, 
               1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Fact1", "Fact2"), class = "factor"), 
               Cat2 = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("Level1", 
               "Level2", "Level3"), class = "factor")), class = "data.frame", row.names = c(NA,-9L))
Fitglm<-glm(Y~Cat1*Cat2,df,family="gaussian")
summary(Fitglm)
Fitglm.cluster<-glm.cluster(df,Y~Cat1*Cat2,cluster="Zone",family="gaussian")
summary(Fitglm.cluster)
length(Fitglm.cluster$glm_res$coefficients)
Fitglm.cluster$glm_res$coefficients
length(Fitglm.cluster$vcov)
Fitglm.cluster$vcov

This reproduces the error. I think it comes down to the coefficient vector containing NAs (length 6) while the variance-covariance matrix does not (4x4). The summary of Fitglm shows this, whilst summary of Fitglm.cluster crashes.
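A hedged diagnostic sketch for the rank-deficiency explanation above: listing the aliased coefficients of the plain glm fit shows which terms are collinear and produce the NA entries, so they can be dropped from the formula before calling glm.cluster().

coef(Fitglm)[is.na(coef(Fitglm))]   # the NA (aliased) coefficients
alias(Fitglm)                       # shows which terms they are aliased with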

glm.cluster reliant on/generates NULL wgt__ value

I have glm.cluster embedded in a larger function. When I run the function (see below) in a new R session, I get the following error:
Error in eval(extras, data, env) : object 'wgt_' not found

When I run glm.cluster(...) with the parameters saved as variables but without assigning the output, it automatically saves wgt__ as a value in the session environment. This does not return an error and I can then run the overarching function without errors.

It seems to hinge on whether or not wgt__ is saved in the environment - as soon as I remove this from the environment, the error message returns. I can also run the function in a new session by assigning
wgt__ <- NULL before the function call.

I've included example code and some fake data which generates the error.

R Version: 3.6.0
Mac OSX 10.14.5

Minimal working example

library(tidyverse)
library(miceadds)

example_data <- read_csv("example_data.csv")

cluster_glm <- function(data, formula, cluster, type) {
  mod <- glm.cluster(formula = formula,
                     data = data,
                     cluster = cluster,
                     weights = NULL,
                     family = if (type == "logit") {
                       binomial(link = "logit")
                     } else {
                       gaussian
                     }) %>%
    summary(.) %>%
    as.data.frame(.)
  return(mod)
}

mod_basic_vote <- cluster_glm(formula = "Y ~ X1 + X2 + X3",
                              data = example_data,
                              type = "logit",
                              cluster = "pid")

wgt__ <- NULL

mod_basic_vote <- cluster_glm(formula = "Y ~ X1 + X2 + X3",
                              data = example_data,
                              type = "logit",
                              cluster = "pid")
mwe.zip

Error when using 2lonly.function with polyreg

Thanks for miceadds package. I'm using miceadds 3.5-14 with mice 3.6.0 in R 3.6.1.

When I try to use 2lonly.function with polyreg, I get the error missing values in pred not allowed. This doesn't happen if only using 2lonly.function with logreg.

Example is given below. Please let me know if I'm doing something incorrectly.

library(mice)
library(miceadds)

set.seed(1)

n.subjects <- 20
n.obs.per.subject <- 4

# x1 varies within subject (e.g. age)
# x2, x3, x4 and x5 are constant within subject

d <- data.frame(subject = rep(1:n.subjects, each = n.obs.per.subject),
                x1 = round(rep(c(10, 15, 20, 25), n.subjects) +
                           rnorm(n.subjects*n.obs.per.subject, 0, 0.5), 2),
                x2 = rep(sample(0:1, n.subjects, replace=TRUE), each = n.obs.per.subject),
                x3 = factor(rep(sample(0:2, n.subjects, replace=TRUE),
                                each = n.obs.per.subject)),
                x4 = round(rep(rnorm(n.subjects), each = n.obs.per.subject), 2),
                x5 = round(rep(rnorm(n.subjects), each = n.obs.per.subject), 2),
                y = round(rnorm(n.subjects*n.obs.per.subject), 2))


d[d$subject %in% 1:3, "x2"] <- NA
d[d$subject %in% 1:5, "x3"] <- NA
d[d$subject %in% 4:7, "x4"] <- NA
d[d$subject %in% 6:9, "x5"] <- NA
d[d$subject %in% 9:12, "y"] <- NA
d[c(1:2, 17:18), "y"] <- NA

methA <- c(subject = "",
           x1 = "",
           x2 = "2lonly.function",
           x3 = "2lonly.function",
           x4 = "2lonly.pmm",
           x5 = "2lonly.pmm",
           y = "2l.pan")

predMat <- matrix(1, nrow = ncol(d), ncol = ncol(d))
colnames(predMat) <- rownames(predMat) <- names(d)
predMat[, "subject"] <- -2
predMat["y", "x1"] <- 4
diag(predMat) <- 0

predMat
##         subject x1 x2 x3 x4 x5 y
## subject       0  1  1  1  1  1 1
## x1           -2  0  1  1  1  1 1
## x2           -2  1  0  1  1  1 1
## x3           -2  1  1  0  1  1 1
## x4           -2  1  1  1  0  1 1
## x5           -2  1  1  1  1  0 1
## y            -2  4  1  1  1  1 0

miceA <- mice(data = d,
              method = methA,
              predictorMatrix = predMat,
              imputationFunction = list(x2 = "logreg", x3 = "polyreg"),
              cluster_var = list(x2 = "subject", x3 = "subject"),
              seed = 1)

## iter imp variable
##  1   1  x2  x3  x4  x5  y
##  1   2  x2  x3  x4  x5  y
## Error in pan::pan(y1, subj, pred, xcol, zcol, prior, seed = s1, iter = paniter) :
##  missing values in pred not allowed

# We also get the error if predMat["y", "x1"] is 1, 2 or 3.

# Different seed values in the call to mice() may get through different numbers of imputed data sets
# before producing the error.

# For comparison, use "polyreg" for x3 instead of "2lonly.function"
methB <- c(subject = "",
           x1 = "",
           x2 = "2lonly.function",
           x3 = "polyreg",
           x4 = "2lonly.pmm",
           x5 = "2lonly.pmm",
           y = "2l.pan")

miceB <- mice(data = d,
              method = methB,
              predictorMatrix = predMat,
              imputationFunction = list(x2 = "logreg"),
              cluster_var = list(x2 = "subject"))

# No error

summary() subscript out of bounds

I have run a glm.cluster model with the quasipoisson family. When I try to get a summary I get this message:

Error in [<-(*tmp*, , "z value", value = csmod[, "Estimate"]/csmod[, :
subscript out of bounds

I can extract the information I need from the data structure, but it would be good to have formatted output from R.
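A hedged workaround sketch for getting a formatted table while summary() fails: the pieces are already stored in the glm.cluster object (the same components used in the reprex a few issues above), so the coefficient table can be assembled by hand. The model below is a stand-in, not the poster's model.

fit <- glm.cluster(data = mtcars, formula = mpg ~ wt, cluster = "carb",
                   family = "quasipoisson")       # stand-in quasipoisson model
est <- coef(fit$glm_res)
se  <- sqrt(diag(fit$vcov))
tab <- cbind(Estimate = est,
             `Std. Error` = se,
             `z value` = est / se,
             `Pr(>|z|)` = 2 * stats::pnorm(-abs(est / se)))
round(tab, 4)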

R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] multiwayvcov_1.2.3 miceadds_2.14-26 mice_3.3.0 lattice_0.20-35

loaded via a namespace (and not attached):
[1] lavaan_0.6-3 zoo_1.8-3 tidyselect_0.2.4 mitools_2.3
[5] purrr_0.2.5 splines_3.5.0 stats4_3.5.0 yaml_2.2.0
[9] mgcv_1.8-24 pan_1.6 survival_2.42-6 rlang_0.2.2
[13] jomo_2.6-4 nloptr_1.0.4 pillar_1.3.0 glue_1.3.0
[17] lavaan.survey_1.1.3.1 bindrcpp_0.2.2 bindr_0.1.1 mirt_1.29
[21] GPArotation_2014.11-1 CDM_6.6-5 mvtnorm_1.0-8 coda_0.19-1
[25] permute_0.9-4 dcurver_0.9.1 sirt_2.7-50 parallel_3.5.0
[29] broom_0.5.0 Rcpp_0.12.18 backports_1.1.2 vegan_2.5-2
[33] lme4_1.1-18-1 Deriv_3.8.5 polycor_0.7-9 mnormt_1.5-5
[37] dplyr_0.7.6 survey_3.33-2 grid_3.5.0 tools_3.5.0
[41] sandwich_2.5-0 magrittr_1.5 tibble_1.4.2 cluster_2.0.7-1
[45] crayon_1.3.4 tidyr_0.8.1 TAM_2.13-15 pbivnorm_0.6.0
[49] pkgconfig_2.0.2 MASS_7.3-50 Matrix_1.2-14 assertthat_0.2.0
[53] minqa_1.2.4 rstudioapi_0.7 boot_1.3-20 mitml_0.3-6
[57] R6_2.2.2 rpart_4.1-13 sfsmisc_1.1-2 nnet_7.3-12
[61] nlme_3.1-137 compiler_3.5.0

Error when combining correlations

There seems to be an error when running the code on pages 101 to 102 of the miceadds manual. Here is the code:

library(mice)
data(nhanes, package="mice")
set.seed(9090)
# nhanes data in one chain
imp.mi <- miceadds::mice.1chain( nhanes, burnin=5, iter=20, Nimp=4,
                                 method=rep("norm", 4 ) )
# correlation coefficients of variables 4, 2 and 3 (indexed in nhanes data)
res <- miceadds::micombine.cor(mi.res=imp.mi, variables=c(4,2,3) )
## variable1 variable2 r rse fisher_r fisher_rse fmi t p
## 1 chl bmi 0.2458 0.2236 0.2510 0.2540 0.3246 0.9879 0.3232
## 2 chl hyp 0.2286 0.2152 0.2327 0.2413 0.2377 0.9643 0.3349
## 3 bmi hyp -0.0084 0.2198 -0.0084 0.2351 0.1904 -0.0358 0.9714
## lower95 upper95
## 1 -0.2421 0.6345
## 2 -0.2358 0.6080
## 3 -0.4376 0.4239
# extract matrix with correlations and its standard errors
attr(res, "r_matrix")
attr(res, "rse_matrix")
# inference for covariance
res2 <- miceadds::micombine.cov(mi.res=imp.mi, variables=c(4,2,3) )
# inference can also be conducted for non-imputed data
res3 <- miceadds::micombine.cov(mi.res=nhanes, variables=c(4,2,3) )
#############################################################################
# EXAMPLE 2: nhanes data | comparing different correlation coefficients
#############################################################################
library(psych)
library(mitools)
# imputing data
imp1 <- mice::mice( nhanes, method=rep("norm", 4 ) )
summary(imp1)
#*** Pearson correlation
res1 <- miceadds::micombine.cor(mi.res=imp1, variables=c(4,2) )
#*** Spearman rank correlation
res2 <- miceadds::micombine.cor(mi.res=imp1, variables=c(4,2), method="spearman")
#*** Kendalls tau
# test of computation of tau for first imputed dataset
dat1 <- mice::complete(imp1, action=1)
tau1 <- psych::corr.test(x=dat1[,c(4,2)], method="kendall")
tau1$r[1,2] # estimate
tau1$se # standard error
# results of Kendalls tau for all imputed datasets
res3 <- with( data=imp1,
              expr=psych::corr.test( x=cbind( chl, bmi ), method="kendall") )
# extract estimates
betas <- lapply( res3$analyses, FUN=function(ll){ ll$r[1,2] } )
# extract variances
vars <- lapply( res3$analyses, FUN=function(ll){ ll$se^2 } )
# Rubin inference
tau_comb <- mitools::MIcombine( betas, vars )
summary(tau_comb)

The problem seems to be with the code tau_comb <- mitools::MIcombine( betas, vars ), which generates the error Error in (1 + 1/m) * evar/vbar : non-conformable arrays

While this might be a mitools (rather than a miceadds) issue, it seems noteworthy that the code from the package manual doesn't run to completion.
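A hedged guess at a fix: psych::corr.test() appears to return $se as a matrix (like $r), so vars holds 2x2 matrices while betas holds scalars, which would explain the non-conformable arrays; extracting the single standard error before pooling may be all that is needed.

betas2 <- lapply(res3$analyses, function(ll) ll$r[1, 2])
vars2  <- lapply(res3$analyses, function(ll) ll$se[1, 2]^2)  # assumes $se has the same shape as $r
tau_comb <- mitools::MIcombine(betas2, vars2)
summary(tau_comb)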

Possible error in example for ml.lmer documentation

When specifying the random slope for x1, it will cross-classify by id2 and id3 with the current code. To make it nested (as in a three-level model), I believe it needs to be id2:id3.

Current:

#----- specify random slopes
random_slopes <- list()
#** random slopes for variable x1
random_slopes[["x1"]] <- list( "id2"=c("x2"), "id3"=c("y1") )
# if no random slopes should be specified, the corresponding entry can be left empty
# and only a random intercept is used in the imputation model

Suggested:

#----- specify random slopes
random_slopes <- list()
#** random slopes for variable x1
random_slopes[["x1"]] <- list( "id2:id3"=c("x2"), "id3"=c("y1") )
# if no random slopes should be specified, the corresponding entry can be left empty
# and only a random intercept is used in the imputation model

Cannot execute "subset_datlist(expr_subset = ())"

Hello, I really enjoy data analysis with your fantastic packages.
I've been trying to subset a mids object by an expression.

imp: a mids object produced by parlMICE; SBP_ER: a variable of the original data.frame indicating systolic blood pressure in the ER department

imp2 <- subset_datlist(imp, subset = (SBP_ER > 100))

But, an error

Error in subset_datlist(imp, subset = (SBP_ER > 100)) : object 'SBP_ER' not found

was shown.

I tried the sample code from your documentation,

data(data.ma02)
datlist1a <- data.ma02
datlist5a <- miceadds::subset_datlist( datlist1a , expr_subset = migrant == 1 )

however, it also failed.

Error in miceadds::subset_datlist(datlist1a, expr_subset = migrant == : object 'migrant' not found

What should I do next?
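A hedged workaround sketch, assuming SBP_ER is fully observed in the original data: build the logical subset vector outside the call and pass it through the subset argument (as in the nhanes2 example a few issues above), instead of relying on expr_subset.

datlist <- miceadds::mids2datlist(imp)
keep    <- mice::complete(imp, 1)$SBP_ER > 100   # logical vector built from a completed data set
imp2    <- miceadds::datlist2mids(miceadds::subset_datlist(datlist, subset = keep))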

Check on 2.10-11 fails

I can install version 2.10-11 (2018-01-22) via devtools::install_github("alexanderrobitzsch/miceadds"), but if I try Check or Clean and Rebuild in RStudio, I get a C++-related error. Any idea whether I am doing something wrong?

* installing *source* package ‘miceadds’ ...
** libs
clang++ -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG ...
RcppExports.cpp:99:54: error: address of overloaded function 'ma_pmm6_C' does not match required type 'void *()'
{"ma_pmm6_C",                         (DL_FUNC) &ma_pmm6_C,                         6},
                                                 ^~~~~~~~~
RcppExports.cpp:90:17: note: candidate function has different number of parameters (expected 0 but has 6)
RcppExport SEXP ma_pmm6_C(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP);
            ^
RcppExports.cpp:52:21: note: candidate function has different number of parameters (expected 0 but has 6)
Rcpp::NumericVector ma_pmm6_C(Rcpp::NumericVector y, Rcpp::NumericVector ry01,  Rcpp::NumericMatrix x, double ridge, Rcpp::NumericVector coefu1, Rcpp::NumericVector donorsample);
                ^
1 error generated.
make: *** [RcppExports.o] Error 1
ERROR: compilation failed for package ‘miceadds’

Memory limit error with weighted imputation

Hi, when applying weighted imputation functions to larger data sets I run into the following error: Error: vector memory exhausted (limit reached?). Digging around a little, the issue seems to be with the line WW <- diag(weights.obs) in the function miceadds:::mice_imputation_weighted_norm_draw. The object WW doesn't appear to be referenced elsewhere and perhaps can be safely removed?

Reproducible example

library(mice)
library(miceadds)
data(data.ma01)
set.seed(977)

# expand data set
dat <- as.matrix(data.ma01)
dat <- dat[sample(1:nrow(dat), 50000, replace = TRUE),]

# empty imputation
imp0 <- mice::mice(dat, maxit=0)

# redefine imputation methods
meth <- imp0$method
meth[meth=="pmm"] <- "weighted.pmm"
meth[c("paredu","books","migrant")] <- "weighted.norm"
# redefine predictor matrix
pm <- imp0$predictorMatrix
pm[,1:3] <- 0
# do imputation
imp <- mice(dat, predictorMatrix=pm, method=meth,
              imputationWeights=unname(dat[,"studwgt"]), m=2, maxit=1)

iter imp variable
  1   1  math
Error: vector memory exhausted (limit reached?)


# suspected culprit
> miceadds:::mice_imputation_weighted_norm_draw

function (yobs, xobs, ry, y, x, weights.obs, ridge = 1e-05, sample_pars = TRUE, 
    ...) 
{
    n <- length(yobs)
    WW <- diag(weights.obs) # WW is not referenced elsewhere
    weights_obs_sqrt <- sqrt(weights.obs)
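For scale, a rough back-of-the-envelope check supporting the diagnosis above: diag(weights.obs) allocates a dense n x n matrix of doubles, which for data on the order of the 50,000-row expanded example is far beyond typical memory limits.

n <- 50000                  # roughly the number of observed rows in the expanded example
n^2 * 8 / 1024^3            # ~18.6 GB for the dense diagonal matrix alone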

`mice.impute.ml.lmer` on large three-level dataset: `"binary"` logistic model returns error, 'hangs' when adding random slopes or interactions

I am currently trying to impute a three-level dataset with 87 columns and 71,756 rows. The variables comprise 4 identifier columns, 15 continuous outcome variables without missing entries, and 68 predictors and covariates with missing entries:

  • On level 1 (lowest, represents an individual) there are 16 ordinal and 20 dichotomous variables,
  • on level 2 there are 28 continuous variables, and
  • on level 3 (top) there are 4 ordinal variables.

I've been following Simon Grund's example for modeling three-level data using mice with the mice.impute.ml.lmer-function. Naturally, I had to make some adaptations to the example model to fit my data:

  1. I tried setting model to "binary" to run a logistic mixed effects model for the dichotomous variables ("pmm" for the ordinal, "continuous" for the continuous).
  2. I tried adding random slopes and interaction effects.
  3. mice.impute.2lonly.pmm was used instead of mice.impute.2lonly.norm for the top level imputation.
  4. I added a post processing to a level 2 variable where I set upper and lower boundaries.

However, when running mice with some variables modeled as "binary" (without random slopes or interactions), I get the following warning:

Warning message in commonArgs(par, fn, control, environment()):
“maxfun < 10 * length(par)^2 is not recommended.”

Execution of mice hangs at this point.

I ran a test with mice (1 iteration), this time with all dichotomous variables as "pmm", and this time the function completed the run. However, after adding variables to random_slopes it seemingly gets stuck (running indefinitely) on the imputation of the first three variables. My assumption is that this is due to the relatively large dataset, making the process computationally very demanding.

I am wondering what exactly causes this error message, and if there are ways to avoid it. Also, I would like to know if there are ways to improve computational efficiency of such a large model.

I am not very familiar with mice, but I have some thoughts regarding how the data is imputed:
I am planning to use the imputed data for a structural equation model I've built, where all the variables are grouped into indicators of latent constructs. It therefore seems natural that the indicator variables that belong to the same construct are imputed together.

  • In mice there is an argument called blocks which allows for multivariate imputation of the variables grouped together as list elements. However, creating blocks containing variables from different levels led to the error message that no top level was defined in the predictorMatrix (i.e. no block set to -2). As an alternative, it seems the formulas argument can be used in place of a predictor matrix. This option seems ideal, as it allows user-defined formulas for each block. Also, if I understand the whole process correctly, the predictorMatrix is only passed on to mice.impute.2lonly.pmm and not mice.impute.ml.lmer. The question then is whether the formulas argument can be used to define three-level models using lme4 syntax, and whether these user-defined models in formulas are passed on to mice.impute.ml.lmer. As a more general question, why can't mice.impute.ml.lmer be used for imputation at the top level? (At least, it didn't work when I tried.)
  • Then there is also an argument group_index in mice.impute.ml.lmer, used to pass group identifiers to mice.impute.bygroup. From reading the documentation I am still unsure what this function actually does, as I can find little information on it. It seems to be designed for grouping variables together within a level, but not for grouping variables across different levels, correct? What would distinguish mice.impute.bygroup from creating blocks, and what would be the difference between doing this and calling models in mice.impute.ml.lmer?
  • As for computational efficiency, I have no idea whether grouping variables together would help. I could really use some advice on this part.

pooling error with nested imputations

Hello! I was trying out nested multiple imputation in miceadds. I was able to get the imputation to run, but when I run the pooling function I get an error. This example is taken directly from the mice.nmi documentation:

library(BIFIEsurvey)
data(data.timss2, package="BIFIEsurvey" )
datlist <- data.timss2
# remove first four variables
M <- length(datlist)
for (ll in 1:M){
  datlist[[ll]] <- datlist[[ll]][, -c(1:4) ]
}

#***************
# (1) nested multiple imputation using mice
imp1 <- miceadds::mice.nmi(datlist,  m=3, maxit=2 )
summary(imp1)

#***************
# (2) first linear regression: ASMMAT ~ migrant + female
res1 <- with( imp1, stats::lm(ASMMAT ~ migrant + female ) ) # fit
pres1 <- miceadds::pool.mids.nmi(res1)  # pooling

The error that appears after running the pooling function is:

> Error in dimnames(qhat) <- `*vtmp*` : 
> length of 'dimnames' [3] must match that of 'dims' [2]

I've tried using the github version as well as the CRAN version with the same results.

Thanks for any help you can provide.

Best,
Francis

different variable imputed values using same predictor variable - 2l.pmm

I would expect the imputed values of x to be the same if the same predictor variables were used, regardless of whether other variables are imputed or not, but that is not the case, as reproduced here:

library(data.table)
library(robustlmm)
library(mice)
library(miceadds)
library(magrittr)
library(dplyr)
library(tidyr)

set.seed(1)

# Data ------------------------------------

dt1 <- data.table(id = rep(1:10, each=3), 
                  group = rep(1:2, each=15),
                  time = rep(1:3, 10),
                  sex = rep(sample(c("F","M"),10,replace=T), each=3),
                  x = rnorm(30),
                  y = rnorm(30),
                  z = rnorm(30))

setDT(dt1)[id %in% sample(1:10,4) & time == 2, `:=` (x = NA, y = NA)][
           id %in% sample(1:10,4) & time == 3, `:=` (x = NA, y = NA)]


dt2 <- dt1 %>% group_by(id) %>% fill(y) %>% ungroup %>% as.data.table


# MI 1 ------------------------------------
pm1 <- make.predictorMatrix(dt1)
pm1['x',c('y','z')] <- 0
pm1[c('x','y'), 'id'] <- -2
imp1 <- mice(dt1, pred = pm1, meth = "2l.pmm", seed = 1, m = 2, print = F, maxit = 20)
# boundary (singular) fit: see ?isSingular - don't know how to interpret this (doesn't occur with my real data)
View(complete(imp1, 'long'))
                        
# MI 2 ------------------------------------
pm2 <- make.predictorMatrix(dt2)
pm2['x',c('y','z')] <- 0
pm2['x', 'id'] <- -2
imp2 <- mice(dt2, pred = pm2, meth = "2l.pmm", seed = 1, m = 2, print = F, maxit = 20, remove.constant = F)
# imp2$loggedEvents reports sex as constant (don't know why), so I include remove.constant=F to keep that variable (doesn't occur with my real data)
View(complete(imp2, 'long'))

In imp1:

  • group, time and sex are used to predict x
  • group, time, sex, x and z are used to predict y

In imp2:

  • group, time and sex are used to predict x
  • y is complete so no imputation is performed for this variable

Given so, why are the results different for the imputed data on x?
Is it the expected behavior?

Thank you!

PS: I posted this same question on Stack Overflow before I remembered to post it here. Should I delete that post to avoid cross-posting, or simply add a link to this issue there?

Error in weighted predictive mean matching method

Hi, I believe there is an error in the miceadds:::mice.impute.weighted.pmm function. Specifically, when locating matches, the Rcpp function miceadds_rcpp_weighted_pmm_match calculates the distance between the predicted value for each imputation, z, and the set of observed y values, yobs, rather than the jittered observed y values, yhatobs.

i.e., on line 90

ds[oo] = std::abs(z-yobs[oo]);

should be

ds[oo] = std::abs(z-yhatobs[oo]);

In the current implementation, it looks like yhatobs was intended to be used, but isn't. As a result, the variability in the distribution of imputed values is underestimated -- particularly for variables with relatively few unique values. The attached zip file has a working example in R illustrating the issue and slightly modified Rcpp functions that correct it.

Hope this helps!

miceadds_working_example.zip

micombine.cor with Fisher transform?

Hi there,

Just a quick question: since Rubin's rules assume normality, pooling correlations first requires a transformation towards normality using Fisher Z transformation (Schafer, 1997)), and a subsequent backtransformation after pooling. Is this already incorporated in the micombine.cor function, or should I do this manually?

Thanks in advance!

Best regards,
Eefje Poppelaars

warning message when using datalist2mids

When using datalist2mids I get this warning:

library(data.table)
library(miceadds)
library(Amelia)
library(readr)
file = 'https://raw.githubusercontent.com/sdaza/lambda/master/notebooks/chile.csv'

data = data.table(read_csv(file))

vars = c('water', 'sewage', 'elec')
i = which(names(data) %in% vars)
bounds = cbind(i, rep(0, length(i)), rep(100, length(i)))

imp1 = amelia(as.data.frame(data), 
              bounds = bounds, 
              m=5, # only five!
              ts = 'year', cs = 'ctry', 
              splinetime = 4, intercs = TRUE,  
              p2s = 0, 
              logs=c('igdp_pc', 'iurban', 'ilit', 'itfr','Ex', 'water', 'sewage', 'elec'), 
              lags = c('igdp_pc', 'Ex', 'iurban', 'itfr', 'ilit'), 
              leads = c('Ex', 'igdp_pc', 'iurban', 'itfr', 'ilit'),
              empri = .01*nrow(data))

imputations = datalist2mids(imp1$imputations)

Warning message in data.matrix(x):
"NAs introduced by coercion"
Warning message in data.matrix(x):
"NAs introduced by coercion"
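A hedged diagnostic sketch: the warning comes from data.matrix() being called on the imputed data during the conversion, so it typically indicates non-numeric columns (for example a character country identifier) being coerced; listing the column classes of one imputed data set shows which variables are affected.

cls <- sapply(imp1$imputations[[1]], function(x) class(x)[1])  # column classes of one imputed data set
cls[!cls %in% c("numeric", "integer")]                         # the non-numeric columns triggering the coercion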

glm.cluster yielding wrong p-values

Hi,

there seems to be an issue with the p-values glm.cluster is yielding. Maybe some dependencies have changed?

Reproducible example:
Calculating the following model with glm.cluster yields strange p-values.

library(miceadds)
summary(FIT <- glm.cluster(mpg ~ wt + disp + wt*disp, cluster="carb", data=mtcars))
#                Estimate  Std. Error    t value      Pr(>|t|)
# (Intercept) 44.08199770 1.960767641  22.482010 6.225590e-112
# wt          -6.49567966 0.635946018 -10.214200  1.712866e-24
# disp        -0.05635816 0.009436760  -5.972194  2.340843e-09
# wt:disp      0.01170542 0.001965829   5.954445  2.609566e-09

The right p-values, however, calculated by hand, should be:

# p-values by hand
2 * pt(-abs(summary(FIT)[, 3]), df=FIT$glm_res$df.residual)
#  (Intercept)           wt         disp      wt:disp 
# 1.846712e-19 6.015931e-11 1.972266e-06 2.068825e-06 

These would also resemble lfe::felm:

library(lfe)
summary(felm(mpg ~ wt + disp + wt*disp | 0 | 0 | carb, data=mtcars))$coe
#                Estimate Cluster s.e.    t value     Pr(>|t|)
# (Intercept) 44.08199770  1.960767641  22.482010 1.846712e-19
# wt          -6.49567966  0.635946018 -10.214200 6.015931e-11
# disp        -0.05635816  0.009436760  -5.972194 1.972266e-06
# wt:disp      0.01170542  0.001965829   5.954445 2.068825e-06

It would be great if you could take a look at this issue,

cheers

PS: see also this discussion on Stack Overflow

groupwise imputation with ml.lmer?

Hello,

I was wondering whether the bygroup-function can be combined with the ml.lmer-function defined in miceadds. I want to perform groupwise imputation by means of three-level models.

Thank you in advance!

Kind regards,
Sophie Stallasch

mi.anova intercept?

Hello again,

My apologies for a second question today. I was wondering whether there is any way to get the intercept output from the mi.anova function, similar to what the car::Anova function (for type 3) provides?

Thank you in advance!

Best regards,
Eefje Poppelaars
