Comments (4)
Hello Hugo,
just as a general point: we don't expect the GLM to have the same dispersion as the GLMM (because the RE absorbs dispersion), so unless you have simulated these data, fitting the GLM alone wouldn't tell us for sure that the GLMM should also show overdispersion. In this case, however, it is clear that the default dispersion test fails to flag the overdispersion that exists.
The reason is that dispersion tests based on unconditional simulations can be far less powerful than tests based on conditional simulations (the issue gets worse the stronger the random effect), so you should prefer to test with conditional residuals in the case of strong REs. The help therefore contains the warning:
Important: for either refit = T or F, the results of type = "DHARMa" dispersion test will differ depending on whether simulations are done conditional (= conditional on fitted random effects) or unconditional (= REs are re-simulated). How to change between conditional or unconditional simulations is discussed in simulateResiduals. The general default in DHARMa is to use unconditional simulations, because this has advantages in other situations, but dispersion tests for models with strong REs specifically may increase substantially in power / sensitivity when switching to conditional simulations. I therefore recommend checking dispersion with conditional simulations if supported by the used regression package.
The way to do this in lme4 would be to run
res2 <- simulateResiduals(MpoissonRE, re.form = NULL)
DHARMa::testDispersion(res2)
which would give you the result you expect. Alternatively, you can also run the analytical test
DHARMa::testDispersion(MpoissonRE, type = "PearsonChisq")
but this can have other problems in the case of a large number of RE groups.
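For intuition, the analytical PearsonChisq approach boils down to comparing the sum of squared Pearson residuals against a chi-square distribution with the residual degrees of freedom. Here is a minimal self-contained sketch of that idea; the data, formula, and object names are invented for illustration (the thread's MpoissonRE is not available), and this mimics rather than reproduces DHARMa's internal code:

```r
# Sketch of an analytical Pearson chi-square dispersion test for a GLMM.
# All data and names are illustrative, not the thread's actual model.
library(lme4)

set.seed(1)
d <- data.frame(x = rnorm(200), group = factor(rep(1:20, each = 10)))
d$count <- rpois(200, exp(0.5 + d$x + rnorm(200)))  # obs-level noise -> overdispersion
m <- glmer(count ~ x + (1 | group), family = poisson, data = d)

prs  <- residuals(m, type = "pearson")
stat <- sum(prs^2)
rdf  <- df.residual(m)                              # residual degrees of freedom
disp <- stat / rdf                                  # dispersion estimate; ~1 if model is correct
pval <- pchisq(stat, df = rdf, lower.tail = FALSE)  # one-sided test for overdispersion
```

The caveat mentioned above applies here as well: with many RE groups, the effective residual degrees of freedom are not well defined, which is one reason the analytical test can misbehave.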
None of this requires refit, and I also wouldn't recommend using refit in this case, for reasons of speed.
I appreciate that this approach is not 100% intuitive, and the user shouldn't have to read the help in detail before using a function to get reasonable results. I have thought about switching the way residuals are simulated in the background for the dispersion tests, but then the dispersion tests would have a different default from the rest of the package, and conditional residuals are not available for all R packages (in particular, glmmTMB doesn't support them), so if I switched automatically, the function would behave differently for different R packages.
Hi,
Thank you for your answer,
The main problem, I think, is that when you run
DHARMa::testDispersion(MpoissonRE)
the function testDispersion internally calls simulateResiduals(MpoissonRE), while for GLMMs it should call
testDispersion(simulateResiduals(MpoissonRE, re.form = NULL, refit = T))
If this is not possible, as for glmmTMB, then testDispersion should return the classical PearsonChisq dispersion, with a warning about the change of default compared to other packages.
However, I still think that re.form = NULL is not the solution:
DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL), type = 'DHARMa')
# dispersion = 1.8607, p-value < 2.2e-16
DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL), type = 'PearsonChisq')
# dispersion = 2650.7, df = 337, p-value < 2.2e-16
DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL, refit = T), type = 'DHARMa')
# dispersion = 2731.3, p-value < 2.2e-16
For the description of the argument refit, I don't fully understand the sentence
"if FALSE, new data will be simulated and scaled residuals will be created by comparing observed data with new data."
Overdispersion has to be detected by comparing the observed residual variance to the assumed variance. The assumed variance cannot be estimated from the fitted model; it has to come from the parametric bootstrap, even though that is somewhat slower. To speed up the computation, we can lower the number of simulations, for example
DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL, refit = T, n = 10), type = 'DHARMa')
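To make the parametric-bootstrap idea concrete, here is a hand-rolled sketch of what refit = T does conceptually: refit the model to data simulated from itself and compare the observed Pearson statistic to its bootstrap distribution. The data and names are invented for illustration, and DHARMa's actual implementation differs in its details:

```r
# Hand-rolled parametric bootstrap dispersion test, mimicking the idea
# behind refit = TRUE. Data and names are illustrative, not the thread's model.
library(lme4)

set.seed(1)
d <- data.frame(x = rnorm(200), group = factor(rep(1:20, each = 10)))
d$count <- rpois(200, exp(0.5 + d$x + rnorm(200)))  # overdispersed by construction
m <- glmer(count ~ x + (1 | group), family = poisson, data = d)

obsStat <- sum(residuals(m, type = "pearson")^2)
simStat <- replicate(20, {                 # few simulations for speed, as in the thread
  ySim <- simulate(m)[[1]]                 # new response drawn from the fitted model
  mSim <- refit(m, ySim)                   # refit the model to the simulated data
  sum(residuals(mSim, type = "pearson")^2)
})
pval <- mean(simStat >= obsStat)           # one-sided bootstrap p-value
```

Because every iteration refits the full GLMM, the cost scales with the number of simulations, which is why lowering n (or avoiding refit altogether) helps so much with speed.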
or we can get some rest and be patient 🙂
Cheers,
Thanks anyway for developing this package.
hugo
Hi Hugo,
yes, testDispersion simulates the residuals via the default settings, but if you want to modify this, you can do so as in
res2 <- simulateResiduals(MpoissonRE, re.form = NULL)
DHARMa::testDispersion(res2)
which I would recommend anyway. Do you think that this is inconvenient or confusing?
If I understand you correctly, your second problem is that the two tests give different dispersion parameters? Differences are not necessarily unexpected, as they are based on different test statistics. Maybe the help is not entirely clear in DHARMa, but there are basically 2 fundamentally different options in DHARMa:
- Default: comparison of the SD of simulated vs. observed residuals
- Comparison of Pearson Chi2, which can be done either analytically via df (option PearsonChisq) or via refit = T. The advantage of refit is that the Pearson residuals are not exactly Chi2 distributed under a mixed model, which biases the analytical test towards underdispersion.
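To illustrate why the default statistic can land on a different scale than a Pearson-based one, here is a conceptual toy version of a simulation-based variance-ratio test, using a plain Poisson GLM so it stays self-contained. All data and names are invented, and this only mimics the flavor of DHARMa's default test, not its actual computation:

```r
# Toy simulation-based dispersion statistic: ratio of observed to
# simulated residual variance. Illustrative only, not DHARMa's code.
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rpois(n, exp(0.5 + x + rnorm(n)))            # extra noise -> overdispersed
m <- glm(y ~ x, family = poisson)

mu     <- fitted(m)
obsVar <- var(y - mu)                             # spread of observed raw residuals
simVar <- replicate(250, var(rpois(n, mu) - mu))  # spread expected under the model
disp   <- obsVar / mean(simVar)                   # > 1 indicates overdispersion
pval   <- mean(simVar >= obsVar)                  # one-sided simulation p-value
```

A variance ratio like this and a Pearson chi-square statistic are both sensitive to overdispersion, but they are different quantities, so their numerical values need not agree even when both tests reject.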
So, in that sense it is no surprise that your options 2 and 3 are very similar, and 1 is different. Of 2 and 3, I would consider 3 to be more exact.
Whether 1 is a problem is debatable to me. In our tests (I had an MSc thesis about this recently), the default tests with re.form = NULL have excellent type I error rates and power; in fact, there doesn't seem to be an advantage to PearsonChisq. The only issue may be (we haven't looked at this in detail) that the test statistic is not really proportional to what you would expect as a dispersion parameter. If you plot the residuals, they do indeed look more widely dispersed than 1.8. So, is the 1.8 the problem you see, i.e. would you like the value to be higher?
Hi
Yes, I think that recommending not to use the default value of a package you develop is problematic and confusing. Most people will not read the recommendations, and many will draw wrong conclusions.
In addition, as mentioned in 'Overdispersion and Diagnostics (Stat 544, Lecture 11)' https://app.box.com/s/ejcig9z4shtt3wodq6nfb2ojignzps11
"A test for φ = 1?
There is no reliable test for φ = 1.
A test is not important anyway, because what really matters is how far away φ is from one, not how much evidence you can accumulate against the null hypothesis H0 : φ = 1."
(For a similar discussion about normality in linear models, see https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless)
So the effect size of the deviation from the assumption of a model is more useful than a p-value.
You say that the first approach is accurate. Maybe that is true for the p-value, I don't know, but it is clearly not true for the effect size, and some people use only effect sizes in this context.
The first approach yields an estimate that is thousands of times lower than the other approaches.
Some consider that in models with φ < 2, overdispersion is negligible. This threshold is way too conservative, but people using it would still interpret the p-values of this heavily overdispersed model.
Cheers, and good luck with the future