
Comments (4)

florianhartig commented on August 16, 2024

Hello Hugo,

just as a general point: we don't expect the GLM to have the same dispersion as the GLMM (because the RE absorbs dispersion), so unless you have simulated this data, we wouldn't know for sure from fitting the GLM that the GLMM should also show overdispersion. In this case, however, it is clear that the default dispersion test fails to flag the overdispersion that exists.

The reason is that dispersion tests based on unconditional simulations can be far less powerful than tests on conditional simulations (basically, the issue grows with the strength of the random effects), so you should prefer testing with conditional residuals in the case of strong REs. In the help, I therefore have the warning:

Important: for either refit = T or F, the results of type = "DHARMa" dispersion test will differ depending on whether simulations are done conditional (= conditional on fitted random effects) or unconditional (= REs are re-simulated). How to change between conditional or unconditional simulations is discussed in simulateResiduals. The general default in DHARMa is to use unconditional simulations, because this has advantages in other situations, but dispersion tests for models with strong REs specifically may increase substantially in power / sensitivity when switching to conditional simulations. I therefore recommend checking dispersion with conditional simulations if supported by the used regression package.

The way to do this in lme4 would be to run

res2 <- simulateResiduals(MpoissonRE, re.form = NULL)
DHARMa::testDispersion(res2)

which would give you the result you expect. Alternatively, you can also run the analytical test

DHARMa::testDispersion(MpoissonRE, type = "PearsonChisq")

but this can have certain other problems in case of a large number of RE groups.

None of this requires refit, and I also wouldn't recommend using refit in this case, for reasons of speed.

I appreciate that this approach is not 100% intuitive, and the user shouldn't have to read the help in detail before using a function to get reasonable results. I have thought about switching the way residuals are simulated in the background for the dispersion tests, but then the dispersion tests would have a different default than the rest of the package, and conditional residuals are not available for all R packages (in particular, glmmTMB doesn't support them), so if I switched automatically, the function would behave differently for different R packages.
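For readers without the original data, the difference between unconditional and conditional simulations can be reproduced on simulated data. This is an illustrative sketch, not code from the issue: the simulation setup and the model name `m` are my own, and the data are deliberately generated with observation-level noise so that the Poisson GLMM is overdispersed.

```r
library(lme4)
library(DHARMa)

# Illustrative setup: Poisson counts with a grouping factor plus
# extra observation-level noise, so the fitted GLMM is overdispersed
set.seed(1)
n <- 500
g <- factor(rep(1:50, each = 10))
eta <- 0.5 + rnorm(50)[g] + rnorm(n, sd = 0.8)  # extra noise => overdispersion
y <- rpois(n, exp(eta))
m <- glmer(y ~ 1 + (1 | g), family = poisson)

# Unconditional simulations (DHARMa default): may miss the overdispersion
testDispersion(simulateResiduals(m))

# Conditional simulations (re.form = NULL): more sensitive with strong REs
testDispersion(simulateResiduals(m, re.form = NULL))
```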


HugoMH commented on August 16, 2024

Hi,

Thank you for your answer,

The main problem I think is that when you run

DHARMa::testDispersion(MpoissonRE)

The function testDispersion internally calls simulateResiduals(MpoissonRE), while for GLMMs it should call testDispersion(simulateResiduals(MpoissonRE, re.form = NULL, refit = T)) (note the refit = T; see below).

If this is not possible, as for glmmTMB, then testDispersion should return the classical Pearson chi-squared dispersion, with a warning about the change of default compared to other packages.

However, I still think that re.form = NULL alone is not the solution:

DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL           ), type = 'DHARMa')
# dispersion = 1.8607, p-value < 2.2e-16

DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL           ), type = 'PearsonChisq')
# dispersion = 2650.7, df = 337, p-value < 2.2e-16

DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL, refit = T), type = 'DHARMa')
# dispersion = 2731.3, p-value < 2.2e-16

For the description of the argument refit, I don't fully understand the sentence
"if FALSE, new data will be simulated and scaled residuals will be created by comparing observed data with new data."

Overdispersion has to be detected by comparing the observed residual variance to the assumed variance. The assumed variance cannot be estimated from the fitted model; it has to come from the parametric bootstrap, even though that is somewhat slower. To speed up computation, we can lower the number of simulations, for example

DHARMa::testDispersion(DHARMa::simulateResiduals(MpoissonRE, re.form = NULL, refit = T, n = 10), type = 'DHARMa')

or we can take a break and be patient 🙂

Cheers,
Thanks anyway for developing this package.

hugo


florianhartig commented on August 16, 2024

Hi Hugo,

yes, testDispersion simulates the residuals via the default, but if you want to modify this, you can do so as in

res2 <- simulateResiduals(MpoissonRE, re.form = NULL)
DHARMa::testDispersion(res2)

which I would recommend anyway. Do you think that this is inconvenient or confusing?

If I understand you correctly, your second problem is that the two tests give different dispersion parameters? Differences are not necessarily unexpected, as the tests are based on different test statistics. Maybe the help is not entirely clear, but there are basically two fundamentally different options in DHARMa:

  1. Default: comparison of SD of simulated / observed residuals

  2. Comparison of Pearson Chi², which can be done either analytically via the degrees of freedom (option PearsonChisq) or via refit = T. The advantage of refit is that Pearson residuals are not exactly Chi² distributed under a mixed model, which biases the analytical test towards underdispersion.

So, in that sense it is no surprise that your options 2 and 3 are very similar, while 1 differs. Of 2 and 3, one would think 3 is more exact.

Whether 1 is a problem is debatable to me. In our tests (there was an MSc thesis about this recently), the default tests with re.form = NULL have excellent type I error rates and power. In fact, there doesn't seem to be an advantage to PearsonChisq. The only issue may be (we haven't looked at this in detail) that the test statistic is not really proportional to what you would expect as a dispersion parameter. If you plot the residuals, they do indeed look more widely dispersed than 1.8. So, is the 1.8 the problem you see, i.e. would you like the value to be higher?


HugoMH commented on August 16, 2024

Hi

Yes, I think that recommending against the default value of a package you develop is problematic and confusing. Most people will not read the recommendations, and many will draw wrong conclusions.

In addition, as mentioned in 'Overdispersion and Diagnostics (Stat 544, Lecture 11)' https://app.box.com/s/ejcig9z4shtt3wodq6nfb2ojignzps11

"A test for φ = 1?
There is no reliable test for φ = 1.
A test is not important anyway, because what really matters is how far away φ is from one, not how much evidence you can accumulate against the null hypothesis H0 : φ = 1."

(For a similar discussion about normality in linear models, see https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless)

So the effect size of the deviation from the assumption of a model is more useful than a p-value.
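The Pearson dispersion estimate φ̂ quoted above can be computed directly from a fitted model, giving the effect size without any hypothesis test. A minimal base-R sketch (the function name `phi_hat` and the simulated data are illustrative, and this only applies cleanly to a plain GLM, since Pearson residuals are not exactly chi-squared under a GLMM):

```r
# Pearson estimate of the dispersion parameter phi (the effect size),
# computed directly from a fitted model rather than via a test
phi_hat <- function(m) {
  sum(residuals(m, type = "pearson")^2) / df.residual(m)
}

# Illustrative data: Poisson-looking counts with extra
# (negative-binomial) noise, i.e. genuinely overdispersed
set.seed(1)
d <- data.frame(x = rnorm(200))
d$count <- rnbinom(200, mu = exp(1 + d$x), size = 1)

m <- glm(count ~ x, family = poisson, data = d)
phi_hat(m)  # substantially > 1, flagging overdispersion
```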

You say that the first approach is accurate. Maybe that holds for the p-value, I don't know, but it clearly does not for the effect size, and some people use only effect sizes in this context.

The first approach yields an estimate that is thousands of times lower than the other approaches.

Some consider overdispersion negligible in models with φ < 2. That threshold is far too conservative, but people using it would still trust the p-values of this heavily overdispersed model.

Cheers and good luck for the future

