florianhartig / DHARMa
Diagnostics for HierArchical Regression Models
Home Page: http://florianhartig.github.io/DHARMa/
A problem in the Vignette example
Create a res object from a correct model (replace obs with sim), so that users can see what patterns in the plots are reasonable / normal.
fittedModel$family$simulate() will work for standard families, but not for the extended families.
This means DHARMa currently cannot simulate from any of the extended families, and can only simulate from the top random level.
Perspective for this request: In general, I am reluctant to implement simulate functions in DHARMa, for reasons explained in https://github.com/florianhartig/DHARMa/wiki/Adding-new-R-packages-to-DHARMA. However, once a simulate function is available in mgcv, it would be very easy to integrate all its capabilities in DHARMa. Feel free to request a simulate function from the package developers.
Interim solution for users who want to use DHARMa with mgcv: take your fitted model, write a simulate function for this model structure yourself, and then use createDHARMa (see help); this will allow most options of the package to be run. The vignette has some further comments / examples on creating custom simulation functions and reading them into DHARMa.
Currently, the parametric overdispersion test is restricted to glmer -> check if this can be generalized to glm
To solve:
"DHARMa::testOverdispersionParametric currently only works for GLMMs in lme4. For Poisson GLMs, you can use AER::dispersiontest, otherwise use the non-parametric tests of DHARMa to test dispersion."
A long-standing problem that I want to solve with the next release is the handling of the weights. The problem is the following:
Depending on the family, weights are either included in the simulations or not. In some cases they can't be included, because weights on the likelihood of a Poisson don't have a corresponding data-generating model. In some cases (like lmer), it seems to me that they were simply forgotten?
As DHARMa relies on the fact that the simulate() function simulates from the assumed likelihood, it is quite essential to know whether weights are included in simulate() or not (and ideally, they would be, as far as possible). Another problem is that models don't consistently warn when weights are not included in the simulation.
I think the current situation is the following
Example
library(DHARMa)
library(lme4)
library(glmmTMB)
library(spaMM)
testData = createData(sampleSize = 200, overdispersion = 0, family = gaussian())
# lm weights are considered in simulate()
fit <- lm(observedResponse ~ Environment1 , weights = Environment1 +1,
data = testData)
simulateResiduals(fittedModel = fit, plot = T)
# glm gaussian still the same
fit <- glm(observedResponse ~ Environment1 , weights = Environment1 +1,
data = testData)
simulateResiduals(fittedModel = fit, plot = T)
# glm behaves nicely: throws a warning that simulate() ignores weights for poisson
fit <- glm(ceiling(exp(observedResponse)) ~ Environment1 , weights = Environment1 +1,
data = testData, family = "poisson")
simulateResiduals(fittedModel = fit, plot = T)
# lmer behaves badly: doesn't include weights even for gaussian, and doesn't warn either
fit <- lmer(observedResponse ~ Environment1 + (1|group), weights = Environment1 +1,
data = testData)
simulateResiduals(fittedModel = fit, plot = T)
# glmer warns
fit <- glmer(ceiling(exp(observedResponse)) ~ Environment1 + (1|group), weights = Environment1 +1,
data = testData, family = "poisson")
simulateResiduals(fittedModel = fit, plot = T)
# glmmTMB does not warn
fit <- glmmTMB(ceiling(exp(observedResponse)) ~ Environment1 + (1|group), weights = Environment1 +1,
data = testData, family = "poisson")
simulateResiduals(fittedModel = fit, plot = T)
# spaMM does not warn, but seems to be simulating with correct (heteroskedastic) variance. How exactly it does this is unclear.
fit <- HLfit(ceiling(exp(observedResponse)) ~ Environment1 + (1|group), prior.weights = Environment1 +1,
data = testData, family=poisson())
summary(fit)
simulateResiduals(fittedModel = fit, plot = T)
From a user: I’m using the DHARMa package, and I obtained this residuals plot
I did the tests explained in the vignette:
· testUniformity: significant
· testOverdispersionParametric: not significant
· testZeroInflation: not significant
· adding overdispersion correction: no change
· adding a quadratic term for both regressors: no change
Can you give me a lead on what's going wrong?
Users may find some plotting functions of the package useful even if their model does not require simulations to calculate residuals (e.g. lm, or glm with gamma).
Currently, the only way to use DHARMa with these models is to simulate. We should provide a direct numerical calculation of the quantiles, e.g. a convertResiduals function, for speed-up.
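A minimal sketch of what such a direct calculation could look like for a Gaussian lm (convertResiduals itself does not exist yet; the idea is to evaluate the fitted CDF at each observation instead of simulating):
# direct (non-simulation) quantile residuals for a Gaussian lm
fit <- lm(dist ~ speed, data = cars)
res <- pnorm(cars$dist, mean = fitted(fit), sd = sigma(fit))
hist(res)  # approximately uniform if the model is adequate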
For parametric bootstrap / simulated LRTs.
It is confusing for the user to have so many functions - better to have one function with sensible default values.
A user requested support for http://glmmadmb.r-forge.r-project.org/
Perspective for this request (updated): I have decided not to make any further efforts to achieve this because of bbolker/glmmadmb#12. glmmADMB users are advised to switch to glmmTMB, which is now supported by DHARMa.
Interim solution for users that, for some reason, urgently have to use DHARMa with this package: take your fitted model, write a simulate function for this model structure yourself, and then use createDHARMa (see help and the sketch below); this will allow most options of the package to be run. See further comments on support of new packages in here. Note that the vignette has some further comments / examples on creating custom simulation functions and reading them into DHARMa.
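A minimal sketch of the createDHARMa route, assuming a fitted model fit, its observed response vector observed, and a user-written simulation routine mySimulateFunction (all three names are placeholders for illustration):
# simulate 250 response vectors with the user-written function (hypothetical)
sims <- replicate(250, mySimulateFunction(fit))
# hand observations, simulations and predictions to DHARMa
res <- createDHARMa(simulatedResponse = sims,
                    observedResponse = observed,
                    fittedPredictedResponse = predict(fit, type = "response"),
                    integerResponse = TRUE)
plot(res)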
When plotting residuals vs. predicted values in mixed models, a common question is whether predictions should be calculated including the RE terms (conditional).
The answer is no, at least if the REs are also included (re-simulated) when calculating the residuals!
The reason is that predictions with REs can produce artifacts (see also #42). To understand this, note first that the question about conditional / unconditional arises on both the x and y axis, i.e. we can calculate both residuals and predictions with / without REs.
So, let's see what happens if we do that - consider this example, where we compare residuals plotted against the unconditional predictions (no REs, DHARMa standard), and the conditional predictions (predictions include REs, lme4 predict standard):
library(DHARMa)
library(lme4)
# creating test data according to standard Poisson GLMM assumptions
testData = createData(sampleSize = 200, randomEffectVariance = 1.5, family = poisson(), numGroups = 20)
# fitting GLMM
fittedModel <- glmer(observedResponse ~ Environment1 + (1|group), family = "poisson", data = testData)
# res ~ pred, with conditional and unconditional predictions
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
par(mfrow = c(2,2))
plotResiduals(simulationOutput$fittedPredictedResponse, simulationOutput$scaledResiduals , main = "Unconditional Residuals\n Unconditional Predictions\n DHARMa standard", rank = T)
plotResiduals(predict(fittedModel), simulationOutput$scaledResiduals, main = "Unconditional Residuals\n Conditional Predictions\n", rank = T)
simulationOutput <- simulateResiduals(fittedModel = fittedModel, re.form = NULL)
plotResiduals(simulationOutput$fittedPredictedResponse, simulationOutput$scaledResiduals , main = "Conditional Residuals\n Unconditional Predictions\n", rank = T)
plotResiduals(predict(fittedModel), simulationOutput$scaledResiduals, main = "Conditional Residuals\n Conditional Predictions\n", rank = T)
The effect goes away when the grouping factor is specified as a fixed effect:
# alternative with grouping factor as fixed effect, this removes the pattern with group
fittedModel <- glm(observedResponse ~ Environment1 + as.factor(group), family = "poisson", data = testData)
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
plot(simulationOutput, rank = T)
I should add more explanation on this in the vignette, especially as the predict() function in R includes the REs by default, so users will be tempted to plot residuals against predictions that include REs.
There seem to be a few related questions on Cross Validated that are still unanswered.
When the variability in predicted values is very low, smooth.spline can crash, which crashes the plot
Error in smooth.spline(pred, res, df = 10) :
'tol' must be strictly positive and finite
The best solution is probably to catch this error, e.g. as sketched below.
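A minimal sketch of such a guard (the helper name addSpline is made up; DHARMa's actual plotting code may differ):
# wrap the spline fit so a zero-variance predictor does not crash the plot
addSpline <- function(pred, res) {
  fit <- tryCatch(smooth.spline(pred, res, df = 10), error = function(e) NULL)
  if (!is.null(fit)) lines(fit, col = "red")
  else warning("smooth.spline failed (possibly near-zero variance in predictions); smoother omitted")
}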
The output of the fitted function in mgcv::gam is not named, in contrast to lm / glm - this minor difference creates a problem with the simulate functions, see the demonstration here:
https://github.com/florianhartig/DHARMa/blob/master/Code/DHARMaPackageSupport/mgcv/GAMProblems.md
Since DHARMa 0.1.3, DHARMa overrides the fitted function with a fitted.gam method in https://github.com/florianhartig/DHARMa/blob/master/DHARMa/R/compatibility.R
I'm creating this ticket to keep in mind that this is a hack that is not optimal - ideally, this would be fixed in mgcv, and once it is, this override should be removed again.
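The override is roughly of this shape (a sketch of the idea only, not the exact code in compatibility.R):
# sketch: make fitted() on gam objects return a named vector, as lm / glm do
fitted.gam <- function(object, ...) {
  out <- object$fitted.values
  names(out) <- as.character(seq_along(out))
  out
}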
It would be nice to transform x in a way such that its distribution is more or less flat.
Add an option to bin / group DHARMa residuals, in particular in plotSimulatedResiduals; see the sketch below.
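A minimal sketch of the flattening idea via a rank transform (using the fields of a DHARMa object as in the examples above; the plotting is simplified compared to plotSimulatedResiduals):
library(DHARMa)
library(lme4)
testData <- createData(sampleSize = 200, family = poisson())
fit <- glmer(observedResponse ~ Environment1 + (1|group), family = "poisson", data = testData)
sim <- simulateResiduals(fit)
# rank-transform the predictions so that their distribution is flat on [0,1]
predFlat <- rank(sim$fittedPredictedResponse, ties.method = "average") / length(sim$fittedPredictedResponse)
plot(predFlat, sim$scaledResiduals, xlab = "rank(prediction)", ylab = "scaled residuals")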
A question from a user: [...] I have seen that the package can be used to test for spatial autocorrelation, both by testing the distribution of coordinates against random and against real coordinates.
In the case of real coordinates, which reference system should be used? I am using point data in the projected coordinate system ED50/32N (EPSG: 23032).
check if glmer.nb works with DHARMa
Should be possible in principle, see lme4/lme4#284
Goes back to #47 - looks as if the simulated overdispersion test with glmer.nb and refit = T doesn't perform as expected
library(DHARMa)
library(lme4)
testData = createData(sampleSize = 300, overdispersion = 5, randomEffectVariance = 0, family = poisson())
fittedModel <- glmer.nb(observedResponse ~ Environment1 + (1|group) , data = testData)
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
plot(simulationOutput, rank = T)
# works
testOverdispersion(simulationOutput)
# works, but produces overdispersion! -> new ticket
simulationOutput2 <- simulateResiduals(fittedModel = fittedModel, refit = T)
testOverdispersion(simulationOutput2)
# doesn't work
testOverdispersionParametric(fittedModel)
For complex models, it can happen that applying simulateResiduals(..., refit = T) results in a range of refitted models that all have the same parameters.
Symptoms of this phenomenon are
The reason in each case is that the simulation creates no variance.
I suspect that all these cases originate from lme4 failing to converge, or from the model already counting as converged according to the optimization criteria (as the model is updated, the optimizer likely starts from the old values, so the current model may already be good enough to trigger the convergence tolerance).
Proposed interim solution: throw warning for this problem
Proposed long-term solution: find the convergence problem
Thanks to Luis Cayuela Delgado for bringing this problem to my attention.
I just noted that currently, fitted binomial models with a factor as response (as opposed to 0/1) are not properly handled.
To avoid a convergence failure in single models leading to a failure of the whole refit procedure, see #17; a minimal guard is sketched below.
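A minimal sketch of such a guard around lme4::refit (safeRefit is a made-up name; the actual handling inside DHARMa may differ):
library(lme4)
# skip simulations whose refit fails, instead of aborting the whole procedure
safeRefit <- function(model, newResponse) {
  tryCatch(refit(model, newResponse), error = function(e) NULL)
}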
The vignette talks about how to see if a model might be zero-inflated, but unless I missed it, there doesn't seem to be a recommended course of action to fix the model in the case of fixed-effect Poisson regression.
Searching around the web I found the zeroinfl() function in the pscl package, but it doesn't work with DHARMa, so I can fit a zeroinfl() model, but I cannot plot its residuals to see how they changed. What do you guys use when you have a zero-inflated non-mixed-effect Poisson model?
I understand the point you made in the vignette about zero inflation being hard to distinguish from overdispersion. Because of the nature of the data, it is mechanistically plausible that it will be zero-inflated. The data are counts of complications in a manually curated registry of surgery patients.
Is it wrong to remedy zero-inflation by the same means as remedying overdispersion in general, i.e. using a negative-binomial model instead? In that case I will try glm.nb(); it sounds from the closed issues like you already added support for it.
Depending on the answer to the above question, this issue could be classified either as a documentation request or a feature request (to add support for zeroinfl() or some other zero-inflated Poisson model object).
Thank you very much for your time and for creating this very helpful package.
I'm afraid it might not be easy to give reproducible code for this, but I'll give as much information as I can. I receive no warnings from the model itself, but simulateResiduals(binary.glmer, n=500, refit=T) gives the following error:
Error in pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GQmat, compDev = compDev, :
Downdated VtV is not positive definite
In addition: There were 12 warnings (use warnings() to see them)
Any suggestions would be appreciated. I always get this error with this model, but I've run similar models on similar data structures that worked alright.
Model:
binary.glmer <- glmer(Status ~ Line + (1|Day), data=ParBin.data, family="binomial", nAGQ=25)
Data:
> print(head(ParBin.data))
Line TreeID Day Status
1 Zoar 3.1 1 1
2 Zoar 3.3 1 0
3 Cropper 4.1 1 1
4 Cropper 4.1 1 1
5 Cropper 4.2 1 1
6 Cropper 4.2 1 1
> str(ParBin.data)
'data.frame': 306 obs. of 4 variables:
$ Line : Factor w/ 7 levels "BC3F3","Cropper",..: 7 7 2 2 2 2 2 2 2 2 ...
$ TreeID: Factor w/ 37 levels "1.1","3.1","3.3",..: 2 3 5 5 6 6 6 6 8 9 ...
$ Day : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
$ Status: int 1 0 1 1 1 1 1 0 0 1 ...
sessionInfo():
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] DHARMa_0.1.3 lsmeans_2.25 estimability_1.2 lmtest_0.9-34 zoo_1.7-14
[6] lme4_1.1-12 Matrix_1.2-8 EnvStats_2.2.1 boot_1.3-18 aod_1.3
[11] multcompView_0.1-7 multcomp_1.4-6 TH.data_1.0-8 MASS_7.3-45 survival_2.40-1
[16] mvtnorm_1.0-5 ggedit_0.0.2 miniUI_0.1.1 dplyr_0.5.0 colourpicker_0.3
[21] plyr_1.8.4 plotly_4.5.6 ggplot2_2.2.1 scales_0.4.1 gridExtra_2.2.1
[26] reshape2_1.4.2 shinyBS_0.61 shinyAce_0.2.1 shiny_1.0.0
loaded via a namespace (and not attached):
[1] purrr_0.2.2 splines_3.3.2 lattice_0.20-34 colorspace_1.3-2 htmltools_0.3.5 viridisLite_0.1.3
[7] base64enc_0.1-3 nloptr_1.0.4 DBI_0.5-1 foreach_1.4.3 stringr_1.1.0 munsell_0.4.3
[13] gtable_0.2.0 htmlwidgets_0.8 coda_0.19-1 codetools_0.2-15 labeling_0.3 httpuv_1.3.3
[19] Rcpp_0.12.9 xtable_1.8-2 jsonlite_1.2 mime_0.5 digest_0.6.12 stringi_1.1.2
[25] gap_1.1-16 tools_3.3.2 sandwich_2.3-4 magrittr_1.5 lazyeval_0.2.0 tibble_1.2
[31] tidyr_0.6.1 qrnn_1.1.3 iterators_1.0.8 assertthat_0.1 minqa_1.2.4 httr_1.2.1
[37] rstudioapi_0.6 R6_2.2.0 nlme_3.1-130
Based on #35, the request to add support for pscl::zeroinfl into DHARMa
Status of this (updated):
In general, pscl::zeroinfl could be added to DHARMa, provided that the package developers implement a simulate function, see https://github.com/florianhartig/DHARMa/wiki/Adding-new-R-packages-to-DHARMA
However, and most importantly from the user perspective, with the addition of glmmTMB (#16), all models that could be fit with pscl::zeroinfl can also be fit with glmmTMB, which is now supported by DHARMa.
I have therefore decided to close this issue for now. Feel free to reopen if you think that pscl::zeroinfl is important for whatever reason, or if pscl::zeroinfl implements a simulate function.
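For reference, the glmmTMB counterpart of a fixed-effects zero-inflated Poisson looks roughly like this (a sketch using the Salamanders data shipped with glmmTMB):
library(glmmTMB)
library(DHARMa)
# zero-inflated Poisson without random effects, via ziformula
m <- glmmTMB(count ~ mined, ziformula = ~1, family = poisson, data = Salamanders)
res <- simulateResiduals(m)
plot(res)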
add checks for k/n provision via weights
A question from a user
I have a question regarding testTemporalAutocorrelation(). Here’s my code:
simulationOutput <- simulateResiduals(fittedModel = fittedmodel, n = 250, refit = T)
simulationOutput2 <- simulateResiduals(fittedModel = fittedmodel, n = 250)
testTemporalAutocorrelation(simulationOutput = simulationOutput, time = data2$date) # DW = 1.7327, p-value = 0.04495
testTemporalAutocorrelation(simulationOutput = simulationOutput2, time = data2$date) # DW = 2.1114, p-value = 0.7601
What does it mean if, when testing for temporal autocorrelation, the p-value is significant when I use the refit simulation and non-significant when I don't?
Test example similar to http://stats.stackexchange.com/questions/264508/choosing-between-glmer-models-supported-by-data-and-good-gof-vs-better-binned in DHARMa
A question from a user, also related to an earlier question on my blog. My answer is largely copied from there.
Some of the papers on simulated quantile residuals transform the simulated residuals (possibly originating from Poisson, NB, binomial, ...) to a normal distribution. I presume this is because of the larger familiarity of most users with normal residuals.
My statistical home is built more in Bayesian territory, and here it's customary to transform to uniformity (Bayesian p-values), so in fact the idea of transforming to normality for better interpretation never occurred to me until people started asking me about it.
I mainly considered the transformation to normality because statistical tests that a user might want to apply to the residuals may require normality. As an example, I was concerned about the testTemporalAutocorrelation() function in DHARMa, which employs the Durbin-Watson test. I therefore did implement a normal transformation as well; you can access it via ?residuals.DHARMa. However, I have subsequently found that for testing, the uniform residuals work perfectly fine. In any case, it's a fact that we could display the residuals as uniform, normal, or with any other distribution (beta, gamma, ...).
My reasons to stay with the uniform residuals are the following:
I find it more logical and parsimonious. The idea of the uniform residuals is not really that we have a uniform residual distribution, but rather to express the residual of any distribution in terms of its position on its (e)CDF. That makes more sense to me than re-converting the CDF values to an arbitrary third distribution that is not the actual expected residual distribution. In fact, the whole point of DHARMa is to transform to a common standard.
My strong feeling is that our eye is better at detecting inhomogeneities in uniformity than deviations from normality, so I would conjecture that uniform residuals are the better diagnostic for a human.
Moreover, uniform residuals are computationally more stable. The main issue with normal residuals is that, with a limited number of simulations, it is quite likely to have all simulations lower or higher than the observed value, which leads to 0 or 1 in the current uniform residuals, but would map to -Inf / Inf when transforming to normal. This could probably be fudged away, e.g. by adding some kernel density to the ecdf estimation, but who knows what that would do to the test properties, so in the end I decided that it's safer to perform all tests on the uniform residuals (and the DW test seemed to perform fine on the uniform residuals).
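For illustration, the normal transformation discussed above amounts to pushing the uniform residuals through qnorm, with the 0/1 outliers clamped to stay finite (a sketch assuming a DHARMa object sim from simulateResiduals; the clamping constant is arbitrary):
# map uniform residuals to the normal scale, clamping 0/1 to finite values
u <- pmin(pmax(sim$scaledResiduals, 1e-3), 1 - 1e-3)
normRes <- qnorm(u)
qqnorm(normRes)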
Request, see discussion here: https://twitter.com/marcnicklas/status/910423954464092160
Perspective for this request: afaik, this package does not yet include a simulate function, which is a minimum requirement for inclusion in DHARMa. I am reluctant to implement such a function myself, except for essential packages, because it is difficult for me to guarantee compatibility with future updates. The respective model package is a better place for such a function. Feel free to request a simulate function from the package developers. Once such a function is available, I am more than willing to include the package in DHARMa.
Interim solution for users who want to use DHARMa with this package: take your fitted model, write a simulate function for this model structure yourself, and then use createDHARMa (see help); this will allow most options of the package to be run.
To speed up the plots
From a user:
I'm having issues installing DHARMa.
I'm using Rstudio.
> install.packages("DHARMa")
Installing package into ‘C:/Users/LocalAdmin/Documents/R/win-library/3.1’
(as ‘lib’ is unspecified)
package ‘DHARMa’ is available as a source package but not as a binary
Warning in install.packages :
package ‘DHARMa’ is not available (as a binary package for R version 3.1.3)
package doesn't show up on the packages list, and cannot be loaded by
library(DHARMa)
If I try to install via the .gz file (using the GUI) I get this error:
ERROR: dependencies 'gap', 'qrnn', 'lmtest', 'sfsmisc', 'doParallel', 'foreach' are not available for package 'DHARMa'
* removing 'C:/Users/LocalAdmin/Documents/R/win-library/3.1/DHARMa'
Warning in install.packages :
running command '"C:/PROGRA~1/R/R-31~1.3/bin/x64/R" CMD INSTALL -l "C:\Users\LocalAdmin\Documents\R\win-library\3.1" "C:/Users/LocalAdmin/Documents/Academic/Statistisics/DHARMa_0.1.3.tar.gz"' had status 1
Warning in install.packages :
installation of package ‘C:/Users/LocalAdmin/Documents/Academic/Statistisics/DHARMa_0.1.3.tar.gz’ had non-zero exit status
if I try to install via the zip file, I get this error:
Warning in install.packages :
cannot open compressed file 'florianhartig-DHARMa-v0.1.3-0-g1a82759/DESCRIPTION', probable reason 'No such file or directory'
Error in install.packages : cannot open the connection
I then tried installing devtools and installing the development release, which also did not work:
> devtools::install_github(repo = "DHARMa", username = "florianhartig", subdir = "DHARMa", dependencies = T)
Downloading GitHub repo florianhartig/DHARMa@master
from URL https://api.github.com/repos/florianhartig/DHARMa/zipball/master
Installing DHARMa
Error in `_digest`(c(list(repos, type), lapply(`_additional`, function(x) eval(x[[2L]], :
object 'digest_impl' not found
In addition: Warning message:
Username parameter is deprecated. Please use florianhartig/DHARMa
When I try to use glm.nb, I get the message:
Warning message:
In simulateResiduals(fittedModel = fittedModel, n = 250): DHARMa: fittedModel not in class of supported models. Absolutely no guarantee that this will work!
Do you think the results are still ok?
From a user: It's working fine most of the time, but sometimes I get this error when using plotSimulatedResiduals:
Error in qrnn::qrnn.fit(x = as.matrix(pred), y = as.matrix(res), n.hidden = 4, :
zero variance column(s) in "x"
as mentioned in http://glmm.wikidot.com/faq
Remove doParallel and foreach to reduce dependencies.
Check the overdispersion tests in
https://github.com/florianhartig/DHARMa/blob/master/Code/SimulationExperiments/PowerOverdispersion.md
Try to find a better test statistic for the DHARMa nonparametric test.
A user requested support for https://github.com/glmmTMB/glmmTMB
glmmTMB is supported by DHARMa since https://github.com/florianhartig/DHARMa/releases/tag/v0.1.6.2.
glmmTMB is fully supported (fixing the re.form problem and some other things, see below) since DHARMa 0.2.7 and glmmTMB 1.0
A remaining limitation is the lack of support in glmmTMB to condition simulations on fitted REs, see glmmTMB/glmmTMB#888
A simple example is
library(glmmTMB)
library(DHARMa)
m <- glmmTMB(count ~ mined, family = poisson, data = Salamanders)
summary(m)
res = simulateResiduals(m)
plot(res, asFactor = T)
More examples in the vignette or in https://github.com/florianhartig/DHARMa/tree/master/Code/DHARMaPackageSupport/glmmTMB
Discuss alternatives and related packages / functions in the vignette
Candidates
boot::glm.diag.plots
http://ugrad.stat.ubc.ca/R/library/statmod/html/qresiduals.html
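As an illustration of the second candidate, statmod's randomized quantile residuals on a toy Poisson GLM (data made up for illustration):
library(statmod)
d <- data.frame(x = runif(100))
d$count <- rpois(100, exp(1 + d$x))
fit <- glm(count ~ x, family = poisson, data = d)
qqnorm(qresiduals(fit))  # randomized quantile residuals on the normal scale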
Hi Florian,
Here is a reproducible example to show you the errors I've been getting when I try to create a simulationOutput with a gamma GLMM. Perhaps it is something specific to my models, but I'm getting the same error message from several different models.
"Warning message:
In rgamma(nsim * length(ftd), shape = shape, rate = shape/ftd) :
NAs produced"
library(lme4)
library(DHARMa)
pop <- as.character(c("BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "MA", "MA",
"MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "NU", "NU", "NU", "NU", "NU", "NU", "NU", "NU", "NU", "SA",
"SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA",
"SA", "SA", "SA", "SA", "SA"))
season <- as.character(c( "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "fall", "spring", "fall", "fall", "spring", "fall",
"spring", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "spring", "fall", "spring", "spring", "fall",
"spring", "spring", "fall", "fall", "fall", "fall", "fall", "fall", "fall", "spring", "fall", "fall", "fall", "spring", "fall", "spring", "fall",
"spring", "spring", "fall", "fall", "spring", "fall", "spring", "spring", "fall", "fall", "fall", "fall", "spring", "fall", "fall", "spring", "spring",
"fall", "fall", "spring", "fall", "fall", "spring"))
id <- "2 2 4 4 7 7 9 9 10 10 84367 84367 84367 84368 84368 84368 84368 84368 84368 84369 84369 33073 33073 33073 33073 33073 33073 33073 33073 33073 80149 80149 80149 80150 80150 80150 57140 57141 126674 126677 126678 126680 137152 137152 137157 115925 115925 115925 115925 115925 115925 115925 115925 115926 115926 115926 115926 115926 115926 115927 115928 115929 115929 115929 115930 115930 115930 115930 115931 115931 115931 115932 115932 115932"
id <- strsplit(id, " ")
id <- as.numeric(unlist(id))
distance <- "0.2970136 0.2813103 0.2409127 0.2461686 0.3392629 0.3246902 0.2938654 0.4403401 0.3935010 0.8161045 0.4622339 0.5448272 0.4347536 0.3623991 0.5014513
0.3961407 0.4285523 0.5033465 0.3668231 0.4008644 0.3642039 0.5428035 0.6348236 0.5461090 0.5763835 0.4907923 0.4349144 0.4891743 0.6423068 0.4663140
0.5226629 0.4855906 0.5868346 0.6429156 0.6363822 0.7002516 2.8778679 1.9055360 3.5048864 2.0234082 1.9940036 2.4991125 2.0742525 2.4859194 2.2326559
0.5232152 0.4835573 0.4421921 0.6048358 1.0315084 0.4935351 0.5886613 0.4821023 0.9571798 0.5721407 0.5219413 0.4243556 0.6960064 0.4713459 0.6254402
0.4887114 1.0324105 0.5536996 0.5539310 0.9808605 0.4164348 0.4658780 0.4707927 0.3722258 0.3805717 0.4608752 0.5829011 0.5095774 0.4262177"
distance <- strsplit(distance, " ")
distance <- as.numeric(unlist(distance))
distdata <- data.frame(pop = pop, season = season, id = id, distance = distance)
dist.glmm <- glmer(distance ~ pop * season + (1|id),
data=distdata, family = Gamma, control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
simulationOutput <- simulateResiduals(fittedModel = dist.glmm)
plotSimulatedResiduals(simulationOutput = simulationOutput)
Are you interested in the problem below with DHARMa testOverdispersionParametric()?
The plot generated by DHARMa was attached (image not reproduced here).
Regards
Desmond
> testOverdispersionParametric( fitLmer ) # doesn't work
Chisq test for overdispersion in GLMMs
data: Error in cat("data: ", x$data.name, "\n", sep = "") :
argument 2 (type 'language') cannot be handled by 'cat'
>
> class(fitLmer)
[1] "glmerMod"
attr(,"package")
[1] "lme4"
>
> fitLmer
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: Negative Binomial(7.6604) ( log )
Formula: value ~ 1 + tissue + PLATFORM + (1 | metabId) + (1 | metabId:tissue) + (1 | specimenId) + (1 | specimenPlatform)
Data: dfData
AIC BIC logLik deviance df.resid
868136.3 868258.9 -434053.1 868106.3 26286
Random effects:
Groups Name Std.Dev.
metabId:tissue (Intercept) 1.0322
metabId (Intercept) 2.1001
specimenPlatform (Intercept) 0.1092
specimenId (Intercept) 0.1192
Number of obs: 26301, groups: metabId:tissue, 2201; metabId, 660; specimenPlatform, 240; specimenId, 60
Fixed Effects:
(Intercept) tissueBrain tissueHeart tissueKidney tissueOuter Medulla PLATFORMLC/MS Neg PLATFORMLC/MS Polar PLATFORMLC/MS Pos PLATFORMLC/MS Pos Early
12.96533 -0.01186 0.43224 1.16883 1.09679 1.34259 0.62587 2.95287 2.80882
PLATFORMLC/MS Pos Late
3.13469
convergence code 0; 1 optimizer warnings; 0 lme4 warnings
>
> oSimRes <- NA; system.time( oSimRes <- simulateResiduals( fitLmer, n=1000) )
user system elapsed
14.06 1.33 15.40
>
> try(plotSimulatedResiduals( oSimRes ))
>
>
> testOverdispersion( oSimRes )
DHARMa nonparametric overdispersion test via IQR of scaled residuals against IQR expected under uniform
data: oSimRes
dispersion = 1.0662, p-value < 2.2e-16
alternative hypothesis: overdispersion
Warning message:
In testOverdispersion(oSimRes) :
You have called the non-parametric test for overdispersion based on the scaled residuals. Simulations show that this test is less powerful for detecting overdispersion than the default uniform test on the scaled residuals, and a lot less powerful than a parametric overdispersion test, or the non-parametric test on re-simulated residuals. The test you called is only implemented for testing / development purposes, there is no scenario where it would be preferred. See vignette for details.
>
When re-simulating residuals for fixed data and integer responses, p-values may spread within a certain range, due to the randomization procedure that smooths out the integer values.
This phenomenon was discussed in #37.
Particularly in low-data situations, this can result in p-values being quite variable. In this case, it might be desirable to have an option to obtain an aggregate p-value from multiple simulations.
The questions are
a) how best to do this (one candidate is sketched below)
b) what the properties of the resulting p-values are in terms of their distribution / type I error / power
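One candidate for (a), purely as a sketch: combine the p-values from repeated randomizations with Stouffer's Z method. Note that this assumes independent tests, which repeated randomizations on the same data are not, so the properties asked about in (b) would still have to be checked by simulation.
# combine p-values from repeated randomizations via Stouffer's Z (sketch)
aggregatePValue <- function(p) {
  z <- qnorm(p)
  pnorm(sum(z) / sqrt(length(z)))
}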
Add an option to calculate aggregate residuals, e.g. for binning binomial responses; see the sketch below.
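A sketch of what the interface could look like, modeled on the recalculateResiduals function that later DHARMa versions provide (assuming a DHARMa object simulationOutput and a grouping variable testData$group):
# aggregate observations and simulations per group, then recompute residuals
simGrouped <- recalculateResiduals(simulationOutput, group = testData$group)
plot(simGrouped)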