alexanderrobitzsch / miceadds
Some Additional Multiple Imputation Functions, Especially for 'mice'.
Home Page: https://alexanderrobitzsch.github.io/miceadds/
modify function visitSequence.determine to output character vector instead of numeric vector due to changes in mice (>= 3.0)
I am currently trying to impute a three-level dataset with 87 columns and 71,756 rows. The variables comprise 4 identifier columns, 15 continuous outcome variables without missing entries, and 68 predictors and covariates with missing entries.
I've been following Simon Grund's example for modeling three-level data using mice with the mice.impute.ml.lmer function. Naturally, I had to make some adaptations to the example model to fit my data:
model was set to "binary" to run a logistic mixed-effects model for the dichotomous variables ("pmm" for the ordinal, "continuous" for the continuous).
mice.impute.2lonly.pmm was used instead of mice.impute.2lonly.norm for the top-level imputation.
However, when running mice with some variables modeled as "binary" (without random slopes or interactions), I get the following warning:
Warning message in commonArgs(par, fn, control, environment()):
“maxfun < 10 * length(par)^2 is not recommended.”
Execution of mice hangs at this point.
I ran a test with mice (1 iteration), this time with all dichotomous variables as "pmm", and this time the function completed the run. However, after adding variables to random_slopes, it seemingly gets stuck (running indefinitely) on the imputation of the first three variables. My assumption is that this is due to the relatively large dataset, which makes the process computationally very demanding.
I am wondering what exactly causes this error message, and if there are ways to avoid it. Also, I would like to know if there are ways to improve computational efficiency of such a large model.
I am not very familiar with mice, but I have some thoughts regarding how the data is imputed:
I am planning to use the imputed data for a structural equation model I've built, where all the variables are grouped into indicators of latent constructs. It therefore seems natural that the indicator variables that belong to the same construct are imputed together.
In mice there is an argument called blocks which allows for multivariate imputation of the variables grouped together as list elements. However, creating blocks containing variables from different levels produced an error message saying that no top level was defined in the predictorMatrix (i.e. no block set to -2). As an alternative, it seems the formulas argument can be used in place of a predictor matrix. This option seems ideal, as it allows user-defined formulas for each block. Also, if I understand the whole process correctly, the predictorMatrix is only passed on to mice.impute.2lonly.pmm and not mice.impute.ml.lmer. The question then is whether the formulas argument can be used to define three-level models using lme4 syntax, and whether these user-defined models in formulas can be passed on to mice.impute.ml.lmer. As a more general question, why can't mice.impute.ml.lmer be used for imputation at the top level? (At least, it didn't work when I tried.)
group_index in mice.impute.ml.lmer is used to pass group identifiers to mice.impute.bygroup. From reading the documentation I am still unsure what this function actually does, as I can find little information on it. It seems to be designed for grouping variables together by level, but not for grouping variables across different levels, correct? What would distinguish mice.impute.bygroup from creating blocks, and how would this differ from calling models in mice.impute.ml.lmer?
Hello, I really enjoy data analysis with your fantastic packages.
I've been trying to subset a mids object by an expression.
imp: a mids object produced by parlMICE, SBP_ER: a variable of original data.frame indicating systolic blood pressure in the ER department
imp2 <- subset_datlist(imp, subset = (SBP_ER > 100))
But, an error
Error in subset_datlist(imp, subset = (SBP_ER > 100)) : object 'SBP_ER' not found
was shown.
I also tried the sample code from your documentation,
data(data.ma02)
datlist1a <- data.ma02
datlist5a <- miceadds::subset_datlist( datlist1a , expr_subset = migrant == 1 )
however, it also failed.
Error in miceadds::subset_datlist(datlist1a, expr_subset = migrant == : object 'migrant' not found
What should I do next?
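Both error messages say that the variable in the subsetting condition could not be found when the condition was evaluated. A minimal base R sketch of one workaround is to evaluate the condition inside each completed data set. The list below is a hypothetical stand-in for a list of imputed data sets, and this is not the package's own subsetting method.

```r
# Hypothetical stand-in for a list of completed (imputed) data sets
datlist <- list(
  data.frame(id = 1:4, SBP_ER = c(90, 110, 130, 95)),
  data.frame(id = 1:4, SBP_ER = c(90, 110, 130, 105))
)

# Evaluate the condition inside each data frame, so SBP_ER is looked up there
datlist_sub <- lapply(datlist, function(d) d[d$SBP_ER > 100, , drop = FALSE])

# Number of rows kept in each data set
n_kept <- sapply(datlist_sub, nrow)
n_kept
```

If the subsetting variable is itself imputed, the retained rows can differ between data sets, as they do in this toy input.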
I have run a glm.cluster from the quasipoisson family. When I try to get a summary I get this message:
Error in `[<-`(`*tmp*`, , "z value", value = csmod[, "Estimate"]/csmod[, :
subscript out of bounds
I can extract the information I need from the data structure, but it would be good to have formatted output from R.
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] multiwayvcov_1.2.3 miceadds_2.14-26 mice_3.3.0 lattice_0.20-35
loaded via a namespace (and not attached):
[1] lavaan_0.6-3 zoo_1.8-3 tidyselect_0.2.4 mitools_2.3
[5] purrr_0.2.5 splines_3.5.0 stats4_3.5.0 yaml_2.2.0
[9] mgcv_1.8-24 pan_1.6 survival_2.42-6 rlang_0.2.2
[13] jomo_2.6-4 nloptr_1.0.4 pillar_1.3.0 glue_1.3.0
[17] lavaan.survey_1.1.3.1 bindrcpp_0.2.2 bindr_0.1.1 mirt_1.29
[21] GPArotation_2014.11-1 CDM_6.6-5 mvtnorm_1.0-8 coda_0.19-1
[25] permute_0.9-4 dcurver_0.9.1 sirt_2.7-50 parallel_3.5.0
[29] broom_0.5.0 Rcpp_0.12.18 backports_1.1.2 vegan_2.5-2
[33] lme4_1.1-18-1 Deriv_3.8.5 polycor_0.7-9 mnormt_1.5-5
[37] dplyr_0.7.6 survey_3.33-2 grid_3.5.0 tools_3.5.0
[41] sandwich_2.5-0 magrittr_1.5 tibble_1.4.2 cluster_2.0.7-1
[45] crayon_1.3.4 tidyr_0.8.1 TAM_2.13-15 pbivnorm_0.6.0
[49] pkgconfig_2.0.2 MASS_7.3-50 Matrix_1.2-14 assertthat_0.2.0
[53] minqa_1.2.4 rstudioapi_0.7 boot_1.3-20 mitml_0.3-6
[57] R6_2.2.2 rpart_4.1-13 sfsmisc_1.1-2 nnet_7.3-12
[61] nlme_3.1-137 compiler_3.5.0
Hello again,
My apologies for a second question today. I was wondering whether there is any way to get the intercept output from the mi.anova function, in a similar way as the car::Anova function (for type 3) also gives?
Thank you in advance!
Best regards,
Eefje Poppelaars
When specifying the random slope for x1, it will cross-classify by id2 and id3 with the current code. To make it nested (as in a three-level model), I believe it needs to be id2:id3.
Current:
#----- specify random slopes
random_slopes <- list()
#** random slopes for variable x1
random_slopes[["x1"]] <- list( "id2"=c("x2"), "id3"=c("y1") )
# if no random slopes should be specified, the corresponding entry can be left empty
# and only a random intercept is used in the imputation model
Suggested:
#----- specify random slopes
random_slopes <- list()
#** random slopes for variable x1
random_slopes[["x1"]] <- list( "id2:id3"=c("x2"), "id3"=c("y1") )
# if no random slopes should be specified, the corresponding entry can be left empty
# and only a random intercept is used in the imputation model
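The difference between the crossed and the nested specification only matters when level-2 labels are reused across level-3 units. A small base R sketch with hypothetical identifiers (the values of id2 and id3 below are made up for illustration) shows how the interaction creates one group per id2-within-id3 combination:

```r
# Hypothetical identifiers: the level-2 labels (id2) restart at 1 within
# each level-3 unit (id3)
d <- data.frame(
  id3 = rep(c("A", "B"), each = 4),
  id2 = rep(c(1, 2, 1, 2), each = 2)
)

# Using id2 on its own treats unit 1 in "A" and unit 1 in "B" as one group
n_crossed <- nlevels(factor(d$id2))

# The interaction yields one group per id2-within-id3 combination
n_nested <- nlevels(interaction(d$id2, d$id3, drop = TRUE))

c(crossed = n_crossed, nested = n_nested)
```

If id2 is already unique across all level-3 units, both specifications define the same groups.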
There seems to be an error when running the code on pages 101 to 102 of the miceadds manual. Here is the code:
library(mice)
data(nhanes, package="mice")
set.seed(9090)
# nhanes data in one chain
imp.mi <- miceadds::mice.1chain( nhanes, burnin=5, iter=20, Nimp=4,
method=rep("norm", 4 ) )
# correlation coefficients of variables 4, 2 and 3 (indexed in nhanes data)
res <- miceadds::micombine.cor(mi.res=imp.mi, variables=c(4,2,3) )
## variable1 variable2 r rse fisher_r fisher_rse fmi t p
## 1 chl bmi 0.2458 0.2236 0.2510 0.2540 0.3246 0.9879 0.3232
## 2 chl hyp 0.2286 0.2152 0.2327 0.2413 0.2377 0.9643 0.3349
## 3 bmi hyp -0.0084 0.2198 -0.0084 0.2351 0.1904 -0.0358 0.9714
## lower95 upper95
## 1 -0.2421 0.6345
## 2 -0.2358 0.6080
## 3 -0.4376 0.4239
# extract matrix with correlations and its standard errors
attr(res, "r_matrix")
attr(res, "rse_matrix")
# inference for covariance
res2 <- miceadds::micombine.cov(mi.res=imp.mi, variables=c(4,2,3) )
# inference can also be conducted for non-imputed data
res3 <- miceadds::micombine.cov(mi.res=nhanes, variables=c(4,2,3) )
#############################################################################
# EXAMPLE 2: nhanes data | comparing different correlation coefficients
#############################################################################
library(psych)
library(mitools)
# imputing data
imp1 <- mice::mice( nhanes, method=rep("norm", 4 ) )
summary(imp1)
#*** Pearson correlation
res1 <- miceadds::micombine.cor(mi.res=imp1, variables=c(4,2) )
#*** Spearman rank correlation
res2 <- miceadds::micombine.cor(mi.res=imp1, variables=c(4,2), method="spearman")
#*** Kendalls tau
# test of computation of tau for first imputed dataset
dat1 <- mice::complete(imp1, action=1)
tau1 <- psych::corr.test(x=dat1[,c(4,2)], method="kendall")
tau1$r[1,2] # estimate
tau1$se # standard error
# results of Kendalls tau for all imputed datasets
res3 <- with( data=imp1,
expr=psych::corr.test( x=cbind( chl, bmi ), method="kendall") )
# extract estimates
betas <- lapply( res3$analyses, FUN=function(ll){ ll$r[1,2] } )
# extract variances
vars <- lapply( res3$analyses, FUN=function(ll){ ll$se^2 } )
# Rubin inference
tau_comb <- mitools::MIcombine( betas, vars )
summary(tau_comb)
The problem seems to be with the code tau_comb <- mitools::MIcombine( betas, vars ), which generates the error Error in (1 + 1/m) * evar/vbar : non-conformable arrays.
While this might be a mitools (rather than a miceadds) issue, it seems noteworthy that the code from the package manual doesn't run to completion.
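One possible cause, stated here as an assumption, is that each element of vars is not a single number (for example a matrix of standard errors), so it does not conform with the scalar estimates. As a sketch, Rubin's rules for a scalar quantity can be written in base R as follows. The function name pool_scalar is hypothetical, and this is not the mitools implementation.

```r
# Rubin's rules for a scalar estimate, assuming one estimate and one
# sampling variance per imputed data set (pool_scalar is a hypothetical name)
pool_scalar <- function(estimates, variances) {
  q <- as.numeric(unlist(estimates))
  u <- as.numeric(unlist(variances))
  stopifnot(length(q) == length(u), length(q) > 1)
  m <- length(q)
  qbar <- mean(q)                    # pooled estimate
  ubar <- mean(u)                    # within-imputation variance
  b <- stats::var(q)                 # between-imputation variance
  total <- ubar + (1 + 1 / m) * b    # total variance
  c(estimate = qbar, se = sqrt(total), within = ubar, between = b)
}

# Illustrative inputs only, not results from the nhanes example
res_pool <- pool_scalar(list(1, 2, 3), list(0.5, 0.5, 0.5))
res_pool
```

The length check fails early if a variance entry holds more than one value, which makes this kind of mismatch easier to spot.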
Thanks for the miceadds package. I'm using miceadds 3.5-14 with mice 3.6.0 in R 3.6.1.
When I try to use 2lonly.function with polyreg, I get the error missing values in pred not allowed. This doesn't happen if only using 2lonly.function with logreg.
Example is given below. Please let me know if I'm doing something incorrectly.
library(mice)
library(miceadds)
set.seed(1)
n.subjects <- 20
n.obs.per.subject <- 4
# x1 varies within subject (e.g. age)
# x2, x3, x4 and x5 are constant within subject
d <- data.frame(subject = rep(1:n.subjects, each = n.obs.per.subject),
x1 = round(rep(c(10, 15, 20, 25), n.subjects) +
rnorm(n.subjects*n.obs.per.subject, 0, 0.5), 2),
x2 = rep(sample(0:1, n.subjects, replace=TRUE), each = n.obs.per.subject),
x3 = factor(rep(sample(0:2, n.subjects, replace=TRUE),
each = n.obs.per.subject)),
x4 = round(rep(rnorm(n.subjects), each = n.obs.per.subject), 2),
x5 = round(rep(rnorm(n.subjects), each = n.obs.per.subject), 2),
y = round(rnorm(n.subjects*n.obs.per.subject), 2))
d[d$subject %in% 1:3, "x2"] <- NA
d[d$subject %in% 1:5, "x3"] <- NA
d[d$subject %in% 4:7, "x4"] <- NA
d[d$subject %in% 6:9, "x5"] <- NA
d[d$subject %in% 9:12, "y"] <- NA
d[c(1:2, 17:18), "y"] <- NA
methA <- c(subject = "",
x1 = "",
x2 = "2lonly.function",
x3 = "2lonly.function",
x4 = "2lonly.pmm",
x5 = "2lonly.pmm",
y = "2l.pan")
predMat <- matrix(1, nrow = ncol(d), ncol = ncol(d))
colnames(predMat) <- rownames(predMat) <- names(d)
predMat[, "subject"] <- -2
predMat["y", "x1"] <- 4
diag(predMat) <- 0
predMat
## subject x1 x2 x3 x4 x5 y
## subject 0 1 1 1 1 1 1
## x1 -2 0 1 1 1 1 1
## x2 -2 1 0 1 1 1 1
## x3 -2 1 1 0 1 1 1
## x4 -2 1 1 1 0 1 1
## x5 -2 1 1 1 1 0 1
## y -2 4 1 1 1 1 0
miceA <- mice(data = d,
method = methA,
predictorMatrix = predMat,
imputationFunction = list(x2 = "logreg", x3 = "polyreg"),
cluster_var = list(x2 = "subject", x3 = "subject"),
seed = 1)
## iter imp variable
## 1 1 x2 x3 x4 x5 y
## 1 2 x2 x3 x4 x5 y
## Error in pan::pan(y1, subj, pred, xcol, zcol, prior, seed = s1, iter = paniter) :
## missing values in pred not allowed
# We also get the error if predMat["y", "x1"] is 1, 2 or 3.
# Different seed values in the call to mice() may get through different numbers of imputed data sets
# before producing the error.
# For comparison, use "polyreg" for x3 instead of "2lonly.function"
methB <- c(subject = "",
x1 = "",
x2 = "2lonly.function",
x3 = "polyreg",
x4 = "2lonly.pmm",
x5 = "2lonly.pmm",
y = "2l.pan")
miceB <- mice(data = d,
method = methB,
predictorMatrix = predMat,
imputationFunction = list(x2 = "logreg"),
cluster_var = list(x2 = "subject"))
# No error
Hi there,
Just a quick question: since Rubin's rules assume normality, pooling correlations first requires a transformation towards normality using the Fisher Z transformation (Schafer, 1997), and a subsequent backtransformation after pooling. Is this already incorporated in the micombine.cor function, or should I do this manually?
Thanks in advance!
Best regards,
Eefje Poppelaars
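The micombine.cor output quoted elsewhere on this page has fisher_r and fisher_rse columns, which suggests that a Fisher transformation is part of the function's output, although that should be confirmed in the package documentation. For reference, the manual procedure described in the question can be sketched in base R. The function name pool_cor_fisher and the inputs are hypothetical, and 1/(n - 3) is the usual large-sample variance of a Fisher-transformed Pearson correlation.

```r
# Pool correlations from m imputed data sets on the Fisher z scale
# (pool_cor_fisher is a hypothetical name, not a miceadds function)
pool_cor_fisher <- function(r, n) {
  z <- atanh(r)                            # Fisher z transformation
  u <- rep(1 / (n - 3), length(r))         # sampling variance of each z
  m <- length(r)
  zbar <- mean(z)
  total <- mean(u) + (1 + 1 / m) * stats::var(z)   # Rubin's rules
  c(r = tanh(zbar), fisher_r = zbar, fisher_rse = sqrt(total))
}

# Illustrative inputs only
res_cor <- pool_cor_fisher(r = c(0.3, 0.3, 0.3), n = 25)
res_cor
```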
df<-structure(list(Zone = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 2L,
3L, 3L), .Label = c("A", "B", "C"), class = "factor"), Y = c(2L,
3L, 5L, 4L, 3L, 5L, 7L, 8L, 10L), Cat1 = structure(c(1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Fact1", "Fact2"), class = "factor"),
Cat2 = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("Level1",
"Level2", "Level3"), class = "factor")), class = "data.frame", row.names = c(NA,-9L))
Fitglm<-glm(Y~Cat1*Cat2,df,family="gaussian")
summary(Fitglm)
Fitglm.cluster<-glm.cluster(df,Y~Cat1*Cat2,cluster="Zone",family="gaussian")
summary(Fitglm.cluster)
length(Fitglm.cluster$glm_res$coefficients)
Fitglm.cluster$glm_res$coefficients
length(Fitglm.cluster$vcov)
Fitglm.cluster$vcov
This reproduces the error. I think it is because the coefficient vector contains NAs (length 6) but the variance-covariance matrix doesn't (4x4). The summary from Fitglm shows this, whilst Fitglm.cluster "crashes" out.
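The dimension mismatch can be seen with stats::glm alone. The sketch below rebuilds the example data with explicit factors and aligns the coefficient vector with the covariance matrix by dropping the aliased terms. It illustrates the problem described above and is not the code used inside glm.cluster.

```r
# Rebuild the example data from the report with explicit factors
df <- data.frame(
  Zone = factor(c("A", "B", "A", "A", "A", "B", "B", "C", "C")),
  Y = c(2, 3, 5, 4, 3, 5, 7, 8, 10),
  Cat1 = factor(rep(c("Fact1", "Fact2"), times = c(5, 4))),
  Cat2 = factor(rep(c("Level1", "Level2", "Level3"), times = c(2, 4, 3)))
)

fit <- stats::glm(Y ~ Cat1 * Cat2, data = df, family = gaussian)

beta <- stats::coef(fit)          # all 6 terms, NA for the aliased ones
keep <- !is.na(beta)              # terms that could be estimated
V <- summary(fit)$cov.scaled      # covariance of the estimable terms only

# Match by name so estimates and standard errors stay aligned
se <- sqrt(diag(V))[names(beta)[keep]]
tab <- cbind(Estimate = beta[keep], `Std. Error` = se,
             `t value` = beta[keep] / se)
tab
```

Indexing by name avoids relying on the order in which the estimable terms are stored.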
Dear Alexander,
I am currently working on a new argument to the mice function, called where, which allows the user to specify which cells to impute. There is a develop branch at my account (https://github.com/stefvanbuuren/mice/tree/where/R).
One of the consequences is that I need to extend all univariate imputation functions with a wy argument. I've been careful not to break any existing code, but I've found that functions mice.impute.2lonly.pmm and mice.impute.2lonly.norm became dysfunctional. The problem is that wy cannot be passed if there is a multilevel structure, and needs to be broken down into classes, just like you did with ry. I have fixed both functions, but the same behaviour may also show up in similar functions you incorporated into miceadds. Would you mind taking a look at whether this is indeed the case, and see whether my solution could also be used in miceadds?
I have not been able to include the wy argument into mice.impute.2l.pan since I would need to overwrite (part of) the data in y1 before entering pan. Any ideas how to fix that?
Best wishes, Stef.
Hello,
thank you for the mi.anova function, it makes it very easy to calculate an ANOVA with pooled imputed data sets. However, I was wondering if there is a possibility to perform a post hoc test with all imputed data sets. So far, I have only figured out how to do a post hoc test when I select one of the imputed data sets.
Thanks a lot!
I encountered the following error when trying to subset a mids object that contains factor levels. If some of the factor levels are missing after subsetting, then the resulting datlist object cannot be converted back to a mids object. This is particularly a problem if the subsetting variable is itself a factor. Is there a way around this?
library(miceadds)
#> Loading required package: mice
#> Loading required package: lattice
#>
#> Attaching package: 'mice'
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
#> * miceadds 3.7-6 (2019-12-15 13:38:43)
data(nhanes2)
nhanes2_imp <- mice(nhanes2, seed = 12345, printFlag = FALSE)
nhanes2_list <- mids2datlist(nhanes2_imp)
nhanes2_sub <- subset_datlist(nhanes2_list, subset = nhanes2$age == "20-39")
datlist2mids(nhanes2_sub)
#> Error in Ops.factor(left, right): level sets of factors are different
Created on 2020-01-15 by the reprex package (v0.3.0)
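The message itself comes from base R, which refuses to compare two factors whose level sets differ. Whether that is exactly what happens inside datlist2mids is an assumption. The sketch below reproduces the message with toy factors and shows that re-declaring the subset with the original levels makes the comparison work again.

```r
# Toy factor with three age groups, and a subset that lost two levels
full <- factor(c("20-39", "40-59", "60-99", "20-39"))
sub <- droplevels(full[full == "20-39"])

# Comparing factors with different level sets raises an error in base R
msg <- tryCatch({
  full[1] == sub[1]
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Re-declare the subset with the original levels so the level sets match
sub_fixed <- factor(sub, levels = levels(full))
same <- full[1] == sub_fixed[1]
same
```

Keeping the full set of levels after subsetting, rather than dropping unused ones, is one way to avoid the mismatch.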
The visitSequence argument in mice 3.0 will change from numerical to character, but miceadds 2.11-80 seems to use the numerical nature:
devtools::install_github(repo = "stefvanbuuren/mice", ref = "dev")
library(miceadds)
library(Amelia)
data(nhanes,package="mice")
set.seed(566)
a.out <- Amelia::amelia(x = nhanes , m=10)
a.mids <- miceadds::datlist2mids( a.out$imputations )
Error in 0 * imp1$visitSequence : non-numeric argument to binary operator
Could you fix this?
I have glm.cluster embedded in a larger function. When I run the function (see below) in a new R session, I get the following error:
Error in eval(extras, data, env) : object 'wgt_' not found
When I run glm.cluster(...) with the parameters saved as variables but without assigning the output, it automatically saves wgt__ as a value in the session environment. This does not return an error and I can then run the overarching function without errors.
It seems to hinge on whether or not wgt__ is saved in the environment - as soon as I remove this from the environment, the error message returns. I can also run the function in a new session by assigning
wgt__ <- NULL
before the function call.
I've included example code and some fake data which generates the error.
R Version: 3.6.0
Mac OSX 10.14.5
`library(tidyverse)
library(miceadds)
example_data <- read_csv("example_data.csv")
cluster_glm <- function(data, formula, cluster, type) {
mod <- glm.cluster(formula = formula,
data = data,
cluster = cluster,
weights = NULL,
family = if(type == "logit") {
binomial(link="logit")
} else {
gaussian
}
) %>%
summary(.) %>%
as.data.frame(.)
return(mod)
}
mod_basic_vote <- cluster_glm(formula = "Y ~ X1 + X2 + X3",
data = example_data,
type = "logit",
cluster = "pid")
wgt__ <- NULL
mod_basic_vote <- cluster_glm(formula = "Y ~ X1 + X2 + X3",
data = example_data,
type = "logit",
cluster = "pid")`
mwe.zip
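An error of this form is typical of how stats::glm looks up its weights argument: first in data, then in the environment of the formula, and not in the function that calls glm. Whether this is what happens inside glm.cluster is an assumption. The base R sketch below reproduces the pattern with hypothetical wrapper functions and shows that storing the weights as a column of the data avoids it.

```r
set.seed(1)
d <- data.frame(x = rnorm(20))
d$y <- 1 + d$x + rnorm(20)
f <- y ~ x  # formula created outside the wrappers

# Weights exist only inside the wrapper, so glm cannot find them
fit_local_weights <- function(data, formula) {
  w_local__ <- rep(1, nrow(data))
  stats::glm(formula, data = data, weights = w_local__)
}

# Weights are stored as a column of the data, where glm looks first
fit_column_weights <- function(data, formula) {
  data$w_col__ <- rep(1, nrow(data))
  stats::glm(formula, data = data, weights = w_col__)
}

res_local <- tryCatch(fit_local_weights(d, f),
                      error = function(e) conditionMessage(e))
res_column <- fit_column_weights(d, f)
res_local
```

Defining the variable in the global environment, as in the workaround above, has the same effect because the formula's environment can then see it.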
I'm trying to impute missing values in Level 1 and Level 2 variables using the ml.lmer command which was working in miceadds v(3.2), but keeps throwing an error since the update yesterday.
Error message :
Error in check.method(method = method, data = data, where = where, blocks = blocks, : The following functions were not found: mice.impute.ml.lmer, mice.impute.ml.lmer
How can I resolve this?
Hi, when applying weighted imputation functions to larger data sets I run into the following error: Error: vector memory exhausted (limit reached?). Digging around a little, the issue seems to be with the line WW <- diag(weights.obs) in the function miceadds:::mice_imputation_weighted_norm_draw. The object WW doesn't appear to be referenced elsewhere and perhaps can be safely removed?
Reproducible example
library(mice)
library(miceadds)
data(data.ma01)
set.seed(977)
# expand data set
dat <- as.matrix(data.ma01)
dat <- dat[sample(1:nrow(dat), 50000, replace = TRUE),]
# empty imputation
imp0 <- mice::mice(dat, maxit=0)
# redefine imputation methods
meth <- imp0$method
meth[meth=="pmm"] <- "weighted.pmm"
meth[c("paredu","books","migrant")] <- "weighted.norm"
# redefine predictor matrix
pm <- imp0$predictorMatrix
pm[,1:3] <- 0
# do imputation
imp <- mice(dat, predictorMatrix=pm, method=meth,
imputationWeights=unname(dat[,"studwgt"]), m=2, maxit=1)
iter imp variable
1 1 math
Error: vector memory exhausted (limit reached?)
# suspected culprit
> miceadds:::mice_imputation_weighted_norm_draw
function (yobs, xobs, ry, y, x, weights.obs, ridge = 1e-05, sample_pars = TRUE,
...)
{
n <- length(yobs)
WW <- diag(weights.obs) # WW is not referenced elsewhere
weights_obs_sqrt <- sqrt(weights.obs)
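For scale, a dense diagonal matrix for 50,000 observations holds 2.5e9 doubles, which is about 20 GB. A weighted cross-product does not need that matrix at all: scaling the rows by the square root of the weights gives the same result. The sketch below checks this identity in base R on small simulated inputs. It is an illustration, not the package code.

```r
set.seed(1)
n <- 200
x <- cbind(1, rnorm(n))   # design matrix with an intercept column
w <- runif(n)             # observation weights

# Dense version: builds an n x n diagonal matrix
xtwx_dense <- t(x) %*% diag(w) %*% x

# Light version: scale each row of x by sqrt(w), then take the cross-product
xtwx_light <- crossprod(x * sqrt(w))

all.equal(xtwx_dense, xtwx_light)
```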
Hello,
I would like to try out forward step-wise regression as a model-building strategy for the imputation models. From the mice.impute.imputeR.lmFun documentation, I understand that I can use the imputeR::stepForR() method as a univariate imputation in the mice algorithm.
If I follow the example reported in the documentation of mice.impute.imputeR.lmFun, I would set up the code like this:
# Load packages
library(mice)
library(miceadds)
library(ridge)
library(imputeR)
# Load data
data(nhanes, package = "mice")
dat <- nhanes
# Make hyp a factor
dat$hyp <- as.factor(dat$hyp)
# Define general imputation method
method <- c(
age = "",
bmi = "norm",
hyp = "imputeR.cFun",
chl = "imputeR.lmFun" # use one of the imputeR methods for cont. vars
)
# Define specifics of imputeR methods
Fun <- list(
hyp = imputeR::ridgeC,
chl = imputeR::stepForR # use forward step-wise method from imputeR
)
# Usual run of mice with an extra "Fun" argument
imp <- mice::mice(dat, method = method, maxit = 10, m = 4, Fun = Fun)
My understanding is that:
Do I understand this correctly?
I can install version 2.10-11 (2018-01-22) by devtools::install_github("alexanderrobitzsch/miceadds"), but if I try to check in RStudio or Clean and Rebuild, I get a cpp related error. Any idea whether I am doing something wrong?
* installing *source* package ‘miceadds’ ...
** libs
clang++ -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG - RcppExports.cpp:99:54: error: address of overloaded function 'ma_pmm6_C' does not match required type 'void *()'
{"ma_pmm6_C", (DL_FUNC) &ma_pmm6_C, 6},
^~~~~~~~~
RcppExports.cpp:90:17: note: candidate function has different number of parameters (expected 0 but has 6)
RcppExport SEXP ma_pmm6_C(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP);
^
RcppExports.cpp:52:21: note: candidate function has different number of parameters (expected 0 but has 6)
Rcpp::NumericVector ma_pmm6_C(Rcpp::NumericVector y, Rcpp::NumericVector ry01, Rcpp::NumericMatrix x, double ridge, Rcpp::NumericVector coefu1, Rcpp::NumericVector donorsample);
^
1 error generated.
make: *** [RcppExports.o] Error 1
ERROR: compilation failed for package ‘miceadds’
Hi, I believe there is an error in the miceadds:::mice.impute.weighted.pmm function. Specifically, when locating matches, the Rcpp function miceadds_rcpp_weighted_pmm_match calculates the distance between the predicted value for each imputation, z, and the set of observed y values, yobs, rather than the jittered observed y values, yhatobs.
i.e., on line 90
ds[oo] = std::abs(z-yobs[oo]);
should be
ds[oo] = std::abs(z-yhatobs[oo]);
In the current implementation, it looks like yhatobs was intended to be used, but isn't. As a result, the variability in the distribution of imputed values is underestimated -- particularly for variables with relatively few unique values. The attached zip file has a working example in R illustrating the issue and slightly modified Rcpp functions that correct it.
Hope this helps!
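The matching step proposed above can be sketched in base R for a single missing case. This follows the textbook form of predictive mean matching, where distances are taken between predicted values and the imputed value is an observed y from a nearby donor. The function name pmm_match_one and the donors argument are hypothetical, and yhatobs is taken here to hold the predicted values for the observed cases, which is an assumption about its meaning in the package.

```r
# Match one predicted value z against the predictions for the observed cases
# and return the observed y of a randomly chosen nearby donor
pmm_match_one <- function(z, yhatobs, yobs, donors = 3) {
  d <- abs(z - yhatobs)                              # distance on the prediction scale
  candidates <- order(d)[seq_len(min(donors, length(d)))]
  pick <- candidates[sample.int(length(candidates), 1)]
  yobs[pick]                                         # impute an observed value
}

# With a single donor the closest prediction (2) decides, so 20 is returned
imp_value <- pmm_match_one(z = 2.1, yhatobs = c(1, 2, 3),
                           yobs = c(10, 20, 30), donors = 1)
imp_value
```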
Hi
I ran the examples. It works well. When I tried mine... it got this message:
utilError in x[, group_vname] : subscript out of bounds
Thank you very much for your advice.
Sandro
I would expect the imputed values of x to be the same if the same predictor variables were used, regardless of whether other variables are imputed or not, but that is not the case, as reproduced here:
library(data.table)
library(robustlmm)
library(mice)
library(miceadds)
library(magrittr)
library(dplyr)
library(tidyr)
set.seed(1)
# Data ------------------------------------
dt1 <- data.table(id = rep(1:10, each=3),
group = rep(1:2, each=15),
time = rep(1:3, 10),
sex = rep(sample(c("F","M"),10,replace=T), each=3),
x = rnorm(30),
y = rnorm(30),
z = rnorm(30))
setDT(dt1)[id %in% sample(1:10,4) & time == 2, `:=` (x = NA, y = NA)][
id %in% sample(1:10,4) & time == 3, `:=` (x = NA, y = NA)]
dt2 <- dt1 %>% group_by(id) %>% fill(y) %>% ungroup %>% as.data.table
# MI 1 ------------------------------------
pm1 <- make.predictorMatrix(dt1)
pm1['x',c('y','z')] <- 0
pm1[c('x','y'), 'id'] <- -2
imp1 <- mice(dt1, pred = pm1, meth = "2l.pmm", seed = 1, m = 2, print = F, maxit = 20)
# boundary (singular) fit: see ?isSingular - don't know how to interpret this (don't occur with my real data)
View(complete(imp1, 'long'))
# MI 2 ------------------------------------
pm2 <- make.predictorMatrix(dt2)
pm2['x',c('y','z')] <- 0
pm2['x', 'id'] <- -2
imp2 <- mice(dt2, pred = pm2, meth = "2l.pmm", seed = 1, m = 2, print = F, maxit = 20, remove.constant = F)
# imp2$loggedEvents report sex as constant (don't know why) so I include remove.constant=F to keep that variable (don't occur with my real data)
View(complete(imp2, 'long'))
In imp1:
group, time and sex are used to predict x
group, time, sex, x and z are used to predict y
In imp2:
group, time and sex are used to predict x
y is complete so no imputation is performed for this variable
Given so, why are the results different for the imputed data on x?
Is it the expected behavior?
Thank you!
PS: I've posted this same question in StackOverflow (before I remember posting it here). Should I delete that post to avoid crossed posts or simply add there the link to here?
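One plausible contributor, offered as an assumption rather than a confirmed diagnosis, is the shared random number stream: with the same seed, any random draws spent on imputing y change which random numbers are left for later updates of x. A base R sketch of that effect:

```r
# Run A: draws for another variable are taken before the draws for x
set.seed(1)
y_draw <- rnorm(3)      # stands in for random draws spent on imputing y
x_after_y <- rnorm(3)   # x then uses the next numbers in the stream

# Run B: same seed, but x is the only variable that draws
set.seed(1)
x_only <- rnorm(3)

identical(x_after_y, x_only)
```

The two runs use the same seed and the same model for x, yet the draws for x differ because the stream position differs.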
When using datalist2mids I get this warning:
library(data.table)
library(miceadds)
library(Amelia)
library(readr)
file = 'https://raw.githubusercontent.com/sdaza/lambda/master/notebooks/chile.csv'
data = data.table(read_csv(file))
vars = c('water', 'sewage', 'elec')
i = which(names(data) %in% vars)
bounds = cbind(i, rep(0, length(i)), rep(100, length(i)))
imp1 = amelia(as.data.frame(data),
bounds = bounds,
m=5, # only five!
ts = 'year', cs = 'ctry',
splinetime = 4, intercs = TRUE,
p2s = 0,
logs=c('igdp_pc', 'iurban', 'ilit', 'itfr','Ex', 'water', 'sewage', 'elec'),
lags = c('igdp_pc', 'Ex', 'iurban', 'itfr', 'ilit'),
leads = c('Ex', 'igdp_pc', 'iurban', 'itfr', 'ilit'),
empri = .01*nrow(data))
imputations = datalist2mids(imp1$imputations)
Warning message in data.matrix(x):
“NAs introduced by coercion”
Warning message in data.matrix(x):
“NAs introduced by coercion”
Hello! I was trying out nested multiple imputation in miceadds. I was able to get the imputation to run, but when I run the pooling function, I get an error. This example is taken directly from the example in the mice.nmi function:
library(BIFIEsurvey)
data(data.timss2, package="BIFIEsurvey" )
datlist <- data.timss2
# remove first four variables
M <- length(datlist)
for (ll in 1:M){
datlist[[ll]] <- datlist[[ll]][, -c(1:4) ]
}
#***************
# (1) nested multiple imputation using mice
imp1 <- miceadds::mice.nmi(datlist, m=3, maxit=2 )
summary(imp1)
#***************
# (2) first linear regression: ASMMAT ~ migrant + female
res1 <- with( imp1, stats::lm(ASMMAT ~ migrant + female ) ) # fit
pres1 <- miceadds::pool.mids.nmi(res1) # pooling
The error that appears after running the pooling function is:
> Error in dimnames(qhat) <- `*vtmp*` :
> length of 'dimnames' [3] must match that of 'dims' [2]
I've tried using the github version as well as the CRAN version with the same results.
Thanks for any help you can provide.
Best,
Francis
When using datalist2mids I get this warning, and I don't know what it is about.
Warning message in data.matrix(x):
“NAs introduced by coercion”
Warning message in data.matrix(x):
“NAs introduced by coercion”
Warning message in data.matrix(x):
“NAs introduced by coercion”
Warning message in data.matrix(x):
“NAs introduced by coercion”
Any hints?
As per the issue title... I'm wondering if this is possible in the current version of this package, or if there would be potential utility to a feature like this?
Hello,
I was wondering whether the bygroup-function can be combined with the ml.lmer-function defined in miceadds. I want to perform groupwise imputation by means of three-level models.
Thank you in advance!
Kind regards,
Sophie Stallasch
Hi,
there seems to be an issue with the p-values glm.cluster is yielding. Maybe some dependencies have changed?
Reproducible example:
Calculating the following model with glm.cluster yields strange p-values.
library(miceadds)
summary(FIT <- glm.cluster(mpg ~ wt + disp + wt*disp, cluster="carb", data=mtcars))
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 44.08199770 1.960767641 22.482010 6.225590e-112
# wt -6.49567966 0.635946018 -10.214200 1.712866e-24
# disp -0.05635816 0.009436760 -5.972194 2.340843e-09
# wt:disp 0.01170542 0.001965829 5.954445 2.609566e-09
The right p-values, however, calculated by hand should be:
# p-values by hand
2 * pt(-abs(summary(FIT)[, 3]), df=FIT$glm_res$df.residual)
# (Intercept) wt disp wt:disp
# 1.846712e-19 6.015931e-11 1.972266e-06 2.068825e-06
These would also resemble lfe::felm:
library(lfe)
summary(felm(mpg ~ wt + disp + wt*disp | 0 | 0 | carb, data=mtcars))$coe
# Estimate Cluster s.e. t value Pr(>|t|)
# (Intercept) 44.08199770 1.960767641 22.482010 1.846712e-19
# wt -6.49567966 0.635946018 -10.214200 6.015931e-11
# disp -0.05635816 0.009436760 -5.972194 1.972266e-06
# wt:disp 0.01170542 0.001965829 5.954445 2.068825e-06
It would be great if you could take a look at this issue,
cheers
PS: see also this discussion on Stack Overflow
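The gap between the two sets of p-values is what one would expect if the summary used a standard normal reference distribution while the hand calculation uses a t distribution with the residual degrees of freedom. That explanation is an assumption about the summary method, not something confirmed here. The base R sketch below compares the two for the intercept's t value quoted above.

```r
t_value <- 22.482010   # intercept t value quoted in the output above
df_resid <- 32 - 4     # mtcars has 32 rows, the model has 4 coefficients

# Two-sided p-value under a standard normal reference distribution
p_normal <- 2 * stats::pnorm(-abs(t_value))

# Two-sided p-value under a t distribution with the residual df
p_t <- 2 * stats::pt(-abs(t_value), df = df_resid)

c(normal = p_normal, t = p_t)
```

With only 28 residual degrees of freedom the t-based p-value is many orders of magnitude larger than the normal-based one.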