florianhartig / dharma Goto Github PK

Diagnostics for HierArchical Regession Models

Home Page: http://florianhartig.github.io/DHARMa/

R 71.73% Rez 28.27%

cran glmm residual regression regression-diagnostics

dharma's Introduction

DHARMa - Residual Diagnostics for HierARchical Models

The 'DHARMa' package uses a simulation-based approach to create readily interpretable scaled (quantile) residuals for fitted (generalized) linear mixed models. Currently supported are linear and generalized linear (mixed) models from 'lme4' (classes 'lmerMod', 'glmerMod'), 'glmmTMB' 'GLMMadaptive' and 'spaMM', generalized additive models ('gam' from 'mgcv'), 'glm' (including 'negbin' from 'MASS', but excluding quasi-distributions) and 'lm' model classes. Moreover, externally created simulations, e.g. posterior predictive simulations from Bayesian software such as 'JAGS', 'STAN', or 'BUGS' can be processed as well. The resulting residuals are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear regression. The package also provides a number of plot and test functions for typical model misspecification problems, such as over/underdispersion, zero-inflation, and residual spatial and temporal autocorrelation.

Installing DHARMa

From CRAN

DHARMa is on CRAN, and for most users, installing from CRAN will be the best option. To install the latest CRAN release, just run

install.packages("DHARMa")

To get an overview about its functionality once the package is installed, run

library(DHARMa)
?DHARMa
vignette("DHARMa", package="DHARMa")

The vignette, which can also be read online here, provides many exampless about how to use the package function for the supported regression models. To cite the package, run

citation("DHARMa")

To fit a model (from any package supported by DHARMa), run

testData = createData(sampleSize = 200, family = poisson())
m1 <- glm(observedResponse ~ Environment1, 
                     family = "poisson", data = testData)

res <- simulateResiduals(m1, plot = T)

and read to help of ?simulateResiduals and the vignette to understand what you can do with the object res. If you want to ask questions about DHARMa, or report a bug, please use the DHARMa GH issue page.

Development release

New features in DHARMa will typically on GitHub 1-2 months before they are on CRAN. If you want to install the current (development) version from this repository, run

devtools::install_github(repo = "florianhartig/DHARMa", subdir = "DHARMa", 
dependencies = T, build_vignettes = T)

Below the status of the automatic tests via GitHub Actions

Development branches / older releases

To install a specific (older) release, or a particular branch, decide for the version number that you want to install in https://github.com/florianhartig/DHARMa/releases (version numbering corresponds to CRAN, but there may be smaller releases that were not pushed to CRAN), or branch and run

devtools::install_github(repo = "florianhartig/DHARMa", subdir = "DHARMa", 
ref = "v0.0.2.1", dependencies = T, build_vignettes = T)

with the appropriate version number / branch as argument to ref.

Contributing to DHARMa

Contributions to DHARMa are very welcome! There are several ways in which you can contribute:

A simple but nevertheless important way to contribute is to suggest problems / new features in DHARMa, and post them in our issue tracker. A good issue should at least have a clear reproducible example. If possible, it could also already contain an analysis of the problem, and / or ideas for a fix. Likewise, feel free to comment on issues existing issues, e.g. by adding examples or suggesting sollutions.
If you want to propose a solution an existing problem, for simple things (typos, etc.), the easiest would be to just create a PR that I can directly merge. For more complicated changes, however, I would suggest that it is more effective to first discuss the approach at the thread of the issue.

When working on these issues, note that there is extensive code for tests / development purposes outsite the core package in the folde ./code/ on GH. You may find useful information there, and in case you have code intended for development to contribute, you may also create a PR intended for this section.

Also, there are a few technical hints about DHARMA development on the DHARMa GH wiki.

Code of conduct

The development of DHARMA and all its surrounding activities is based on the values of scientific integrity, free software and knowledge, and mutual respect, indepdent of background or world view.

Acknowledgements

A question by Catalina Gutiérrez Chacón provided me with the motivation write the first version of DHARMa. Thanks for useful suggestions to improve DHARMa by Jochen Fründ, Tomer J. Czaczkes, Luis Cayuela Delgado, Alexandre Courtiol, Jim Thorson, Lukas Lohse, jmniehaus, justintimm and many other people that made comments on GitHub, Crossvalidated or via email.

dharma's People

Contributors

Stargazers

Watchers

dharma's Issues

Add message for low number of samples in plot

Add a warning message that quantile lines are expected to vary when there are only very few points.

Should tell the user to either use #38 or #41 once this is implemented

Install problems

From a user:

I'm having issues installing DHARMa.

I'm using Rstudio.

> install.packages("DHARMa")
Installing package into ‘C:/Users/LocalAdmin/Documents/R/win-library/3.1’
(as ‘lib’ is unspecified)

   package ‘DHARMa’ is available as a source package but not as a binary

Warning in install.packages :
  package ‘DHARMa’ is not available (as a binary package for R version 3.1.3)



package doesn't show up on the packages list, and cannot be loaded by
library(DHARMa)

If I try to install via the .gz file (using the GUI) I get this error:

ERROR: dependencies 'gap', 'qrnn', 'lmtest', 'sfsmisc', 'doParallel', 'foreach' are not available for package 'DHARMa'
* removing 'C:/Users/LocalAdmin/Documents/R/win-library/3.1/DHARMa'
Warning in install.packages :
  running command '"C:/PROGRA~1/R/R-31~1.3/bin/x64/R" CMD INSTALL -l "C:\Users\LocalAdmin\Documents\R\win-library\3.1" "C:/Users/LocalAdmin/Documents/Academic/Statistisics/DHARMa_0.1.3.tar.gz"' had status 1
Warning in install.packages :
  installation of package ‘C:/Users/LocalAdmin/Documents/Academic/Statistisics/DHARMa_0.1.3.tar.gz’ had non-zero exit status

if I try to install via the zip file, I get this error:


Warning in install.packages :
  cannot open compressed file 'florianhartig-DHARMa-v0.1.3-0-g1a82759/DESCRIPTION', probable reason 'No such file or directory'
Error in install.packages : cannot open the connection

I then tried installing devtools and installing the development release, which also did not work:

> devtools::install_github(repo = "DHARMa", username = "florianhartig", subdir = "DHARMa", dependencies = T)
Downloading GitHub repo florianhartig/DHARMa@master
from URL https://api.github.com/repos/florianhartig/DHARMa/zipball/master
Installing DHARMa
Error in `_digest`(c(list(repos, type), lapply(`_additional`, function(x) eval(x[[2L]],  : 
  object 'digest_impl' not found

In addition: Warning message:
Username parameter is deprecated. Please use florianhartig/DHARMa

Warning from simulateResiduals when using DHARMa with lme4 glmer gamma family

Hi Florian,

Here is a reproducible example to show you the errors I've been getting when I try to use simulationOutput with a gamma glmm. Perhaps it is something specific to my models, but I'm getting the same error message from several different models.

“Warning message:
In rgamma(nsim * length(ftd), shape = shape, rate = shape/ftd) :
NAs produced”

pop <- as.character(c("BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "BF", "MA", "MA",
"MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", "NU", "NU", "NU", "NU", "NU", "NU", "NU", "NU", "NU", "SA",
"SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA", "SA",
"SA", "SA", "SA", "SA", "SA"))

season <- as.character(c( "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "fall", "spring", "fall", "fall", "spring", "fall",
"spring", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "fall", "spring", "spring", "fall", "spring", "spring", "fall",
"spring", "spring", "fall", "fall", "fall", "fall", "fall", "fall", "fall", "spring", "fall", "fall", "fall", "spring", "fall", "spring", "fall",
"spring", "spring", "fall", "fall", "spring", "fall", "spring", "spring", "fall", "fall", "fall", "fall", "spring", "fall", "fall", "spring", "spring",
"fall", "fall", "spring", "fall", "fall", "spring"))

id <- "2 2 4 4 7 7 9 9 10 10 84367 84367 84367 84368 84368 84368 84368 84368 84368 84369 84369 33073 33073 33073 33073 33073 33073 33073 33073 33073 80149 80149 80149 80150 80150 80150 57140 57141 126674 126677 126678 126680 137152 137152 137157 115925 115925 115925 115925 115925 115925 115925 115925 115926 115926 115926 115926 115926 115926 115927 115928 115929 115929 115929 115930 115930 115930 115930 115931 115931 115931 115932 115932 115932"
id <- strsplit(id, " ")
id <- as.numeric(unlist(id))

distance <- "0.2970136 0.2813103 0.2409127 0.2461686 0.3392629 0.3246902 0.2938654 0.4403401 0.3935010 0.8161045 0.4622339 0.5448272 0.4347536 0.3623991 0.5014513
0.3961407 0.4285523 0.5033465 0.3668231 0.4008644 0.3642039 0.5428035 0.6348236 0.5461090 0.5763835 0.4907923 0.4349144 0.4891743 0.6423068 0.4663140
0.5226629 0.4855906 0.5868346 0.6429156 0.6363822 0.7002516 2.8778679 1.9055360 3.5048864 2.0234082 1.9940036 2.4991125 2.0742525 2.4859194 2.2326559
0.5232152 0.4835573 0.4421921 0.6048358 1.0315084 0.4935351 0.5886613 0.4821023 0.9571798 0.5721407 0.5219413 0.4243556 0.6960064 0.4713459 0.6254402
0.4887114 1.0324105 0.5536996 0.5539310 0.9808605 0.4164348 0.4658780 0.4707927 0.3722258 0.3805717 0.4608752 0.5829011 0.5095774 0.4262177"
distance <- strsplit(distance, " ")
distance <- as.numeric(unlist(distance))

distdata <- data.frame(pop = pop, season = season, id = id, distance = distance)

dist.glmm <- glmer(distance ~ pop * season + (1|id),
data=distdata, family = Gamma, control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))

simulationOutput <- simulateResiduals(fittedModel = dist.glmm)

plotSimulatedResiduals(simulationOutput = simulationOutput)

Aggregation of multiple simulations to stabilize p-values

When re-simulating residuals for fixed data and integer responses, p-values may spread within a certain range, due to the randomization procedure to smoothen out the integer values.

This phenomenon was discussed in #37

Particularly in low-data situation, this can result in p-values being quite variable. In this case, it might be desirable to have an option to obtain an aggregate p-value from multiple simulations.

Question is

a) how to best do this

b) what the properties of the resulting p-values are in terms of their distribution / type I error / power

Check overdispersion benchmarks

Check the overdispersion tests in
https://github.com/florianhartig/DHARMa/blob/master/Code/SimulationExperiments/PowerOverdispersion.md
Try if you can find a better test statistic for the Dharma nonparametric test

Add aggregation option

Add option to calculate aggregate residuals, e.g. for binning binomial responses

Add null simulation function

To create a res object with a correct model (replace obs with sim), so that users can see what patterns in the plots are reasonable / normal

Add support for pscl::zeroinfl

Based on #35, the request to add support for pscl::zeroinfl into DHARMa

Status of this (updated):

In general, pscl::zeroinfl could be added to DHARMa, provided that the package developers implement a simulate function, see https://github.com/florianhartig/DHARMa/wiki/Adding-new-R-packages-to-DHARMA
However, and most important from the user perspective, with the addition of glmmTMB #16 , all possible models that could be fit with pscl::zeroinfl could also be fit with glmmTMB, which is now supported by DHARMa

I have therefore decided to close this issue for now. Feel free to reopen if you think that pscl::zeroinfl is important for whatever reason, or if pscl::zeroinfl implements a simulate function

Implement catch statements for refit

To avoid that a failure of convergence for single models leads to a failure of the whole refit procedure, see #17

Formally testing for GLMM heteroskedasticity in DHARMa

Generalize parametric overdispersion tests

Currently, the parametric overdispersion test is restricted to glmer -> check if this can be generalized to glm

to solve
"DHARMa::testOverdispersionParametric currently only works for GLMMs in lme4. For Poisson GLMs, you can use AER::dispersiontest, otherwise use the non-parametric tests of DHARMa to test dispersion."

Zero-inflated fixed-effect Poisson models... how to plot/test residuals?

The vignette talks about how to see if a model might be zero-inflated, but unless I missed it, there doesn't seem to be a recommended course of action to fix the model in the case of fixed-effect Poisson regression.

Searching around the web I found the zeroinfl() function in the pscl package, but it doesn't work with DHARMa so I can fit a zeroinfl() model, but I cannot plot its residuals to see how they changed. What do you guys use when you have a zero-inflated non mixed-effect Poisson model?

I understand the point you made in the vigniette about zero inflation being hard to distinguish from overdispersion. Because of the nature of the data, it is mechanistically plausible that it will be zero inflated. The data are counts of complications in a manually curated registry of surgery patients.

Is it wrong to remedy zero-inflation by the same means as remedying overdispersion in general i.e. using a negative-binomial model instead? In that case I will try glm.nb() it sounds from the close issued like you already added support for it.

Depending on the answer to the above question, this issue could be classified either as a documentation request or a feature request (to add support for zeroinfl() or some other zero-inflated Poisson model object).

Thank you very much for your time and for creating this very helpful package.

move to native R parallel functions

remove

doParallel,
foreach,

to reduce dependencies

Why are DHARMa residuals transformed to uniform / or: consider other (in particular normal) transformations?

A question from a user, also related to an earlier question on my blog. My answer is largely copied from there.

Some of the papers on simulated quantile residuals transform the simulated residuals (possibly originating from Poisson, NB, binomial, ...) to a normal distribution. I presume this is because of the larger familiarity of most users with normal residuals.

My statistical home is built more in Bayesian territory, and here it's customary to transform to uniformity (Bayesian p-values), so in fact the idea of transforming to normality for better interpretation never occurred to me until people started asking me about it.

I mainly considered the transformation to normality because statistical tests that a user might want to apply on the residuals may require normality. As an example, I was concerned about the testTemporalAutocorrelation() function in DHARMa, which employs the Durbin-Watson test. I therefore implemented did implement also a normal transformation, you can access it via ?residuals.DHARMa. However, I have subsequently found that for testing, the uniform residuals work perfectly fine. In any case, it's a fact that we could display the residuals uniform, normal, or also with any other distribution (beta, gamma, ...).

My reasons to stay with the uniform residuals are the following:

I find it more logical and parsimonious. The idea of the uniform residuals is not really that we have a uniform residual distributions, but rather to express the residual of any distribution in terms of its position on its (e)CDF. That makes more sense to me that to re-convert the CDF values to an arbitrary third distribution that is not the actual expected residual distribution. In fact, the whole point of DHARMa is to transform to a common standard.
My strong feeling is that our eye is better at detecting inhomogeneities from uniformity that deviations from normality, so I would conjecture that uniform residuals are better diagnostics for a human.
Moreover, uniform residuals are computationally more stable. The main issue with normal residual is that, with a limited number of simulations, it is quite likely to have all simulations lower or higher than the observed value, which leads to 0 or 1 in the current uniform residuals, but would map to -Inf / Inf when transforming to normal. This could probably be fudged away, e.g. by adding some kernel density to the ecdf estimation, but who knows what that would do to the test properties, so in the end I decided that it’s safer to perform all tests on the uniform residuals (and the DB test seemed to perform fine on the uniform residuals).

Error for simulateResiduals() when refit=T

I'm afraid it might not be easy to give reproducible code for this, but I'll give as much information as I can. I receive no warnings in the model itself, but simulateResiduals(binary.glmer, n=500, refit=T) gives the following error:

Error in pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GQmat, compDev = compDev,  : 
  Downdated VtV is not positive definite
In addition: There were 12 warnings (use warnings() to see them)

Any suggestions would be appreciated. I always get this warning with this model, but I've run similar models on similar data structures that worked alright.

Model:
binary.glmer <- glmer(Status ~ Line + (1|Day), data=ParBin.data, family="binomial", nAGQ=25)

Data:

> print(head(ParBin.data))
     Line TreeID Day Status
1    Zoar    3.1   1      1
2    Zoar    3.3   1      0
3 Cropper    4.1   1      1
4 Cropper    4.1   1      1
5 Cropper    4.2   1      1
6 Cropper    4.2   1      1

> str(ParBin.data)
'data.frame':	306 obs. of  4 variables:
 $ Line  : Factor w/ 7 levels "BC3F3","Cropper",..: 7 7 2 2 2 2 2 2 2 2 ...
 $ TreeID: Factor w/ 37 levels "1.1","3.1","3.3",..: 2 3 5 5 6 6 6 6 8 9 ...
 $ Day   : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ Status: int  1 0 1 1 1 1 1 0 0 1 ...

sessionInfo():

R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DHARMa_0.1.3       lsmeans_2.25       estimability_1.2   lmtest_0.9-34      zoo_1.7-14        
 [6] lme4_1.1-12        Matrix_1.2-8       EnvStats_2.2.1     boot_1.3-18        aod_1.3           
[11] multcompView_0.1-7 multcomp_1.4-6     TH.data_1.0-8      MASS_7.3-45        survival_2.40-1   
[16] mvtnorm_1.0-5      ggedit_0.0.2       miniUI_0.1.1       dplyr_0.5.0        colourpicker_0.3  
[21] plyr_1.8.4         plotly_4.5.6       ggplot2_2.2.1      scales_0.4.1       gridExtra_2.2.1   
[26] reshape2_1.4.2     shinyBS_0.61       shinyAce_0.2.1     shiny_1.0.0       

loaded via a namespace (and not attached):
 [1] purrr_0.2.2       splines_3.3.2     lattice_0.20-34   colorspace_1.3-2  htmltools_0.3.5   viridisLite_0.1.3
 [7] base64enc_0.1-3   nloptr_1.0.4      DBI_0.5-1         foreach_1.4.3     stringr_1.1.0     munsell_0.4.3    
[13] gtable_0.2.0      htmlwidgets_0.8   coda_0.19-1       codetools_0.2-15  labeling_0.3      httpuv_1.3.3     
[19] Rcpp_0.12.9       xtable_1.8-2      jsonlite_1.2      mime_0.5          digest_0.6.12     stringi_1.1.2    
[25] gap_1.1-16        tools_3.3.2       sandwich_2.3-4    magrittr_1.5      lazyeval_0.2.0    tibble_1.2       
[31] tidyr_0.6.1       qrnn_1.1.3        iterators_1.0.8   assertthat_0.1    minqa_1.2.4       httr_1.2.1       
[37] rstudioapi_0.6    R6_2.2.0          nlme_3.1-130

testOverdispersionParametric() not accepting glmer.nb

Are you interested in the below problem with DHARMa testOverdispersionParametric() ?

The plot generated by DHARMa is

Regards
Desmond

> testOverdispersionParametric( fitLmer ) # doesn't work

        Chisq test for overdispersion in GLMMs

data:  Error in cat("data:  ", x$data.name, "\n", sep = "") : 
  argument 2 (type 'language') cannot be handled by 'cat'
> 
> class(fitLmer)
[1] "glmerMod"
attr(,"package")
[1] "lme4"
> 
> fitLmer
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: Negative Binomial(7.6604)  ( log )
Formula: value ~ 1 + tissue + PLATFORM + (1 | metabId) + (1 | metabId:tissue) +      (1 | specimenId) + (1 | specimenPlatform)
   Data: dfData
      AIC       BIC    logLik  deviance  df.resid 
 868136.3  868258.9 -434053.1  868106.3     26286 
Random effects:
 Groups           Name        Std.Dev.
 metabId:tissue   (Intercept) 1.0322  
 metabId          (Intercept) 2.1001  
 specimenPlatform (Intercept) 0.1092  
 specimenId       (Intercept) 0.1192  
Number of obs: 26301, groups:  metabId:tissue, 2201; metabId, 660; specimenPlatform, 240; specimenId, 60
Fixed Effects:
            (Intercept)              tissueBrain              tissueHeart             tissueKidney      tissueOuter Medulla        PLATFORMLC/MS Neg      PLATFORMLC/MS Polar        PLATFORMLC/MS Pos  PLATFORMLC/MS Pos Early  
               12.96533                 -0.01186                  0.43224                  1.16883                  1.09679                  1.34259                  0.62587                  2.95287                  2.80882  
 PLATFORMLC/MS Pos Late  
                3.13469  
convergence code 0; 1 optimizer warnings; 0 lme4 warnings 
>




> oSimRes <- NA; system.time( oSimRes <- simulateResiduals( fitLmer, n=1000) )
   user  system elapsed 
  14.06    1.33   15.40 
> 
> try(plotSimulatedResiduals( oSimRes ))
> 
> 
> testOverdispersion( oSimRes )

        DHARMa nonparametric overdispersion test via IQR of scaled residuals against IQR expected under uniform

data:  oSimRes
dispersion = 1.0662, p-value < 2.2e-16
alternative hypothesis: overdispersion

Warning message:
In testOverdispersion(oSimRes) :
  You have called the non-parametric test for overdispersion based on the scaled residuals. Simulations show that this test is less powerful for detecting overdispersion than the default uniform test on the sclaed residuals, and a lot less powerful than a parametric overdispersion test, or the non-parametric test on re-simulated residuals. The test you called is only implemented for testing / development purposes, there is no scenario where it would be preferred. See vignette for details.
>

Check glmer.nb

check if glmer.nb works with DHARMa

Should be possible in principle, see lme4/lme4#284

Add support for ordinal / clmm2

Request see discussion here https://twitter.com/marcnicklas/status/910423954464092160

Perspective for this request: afaiks, this package does not yet include a simulate function, which is a minimum requirement for inclusion in DHARMa. I am reluctant to implement such a function myself, with the exception of essential packages, because it is difficult for me to guarantee compatibility with future updates. The respective model package is a better place for such a function. Feel free to request a simulate function from the package developers. Once such a function is available, I am more than willing to include the package to DHARMa.

See further comments on support of new packages in https://github.com/florianhartig/DHARMa/wiki/Adding-new-R-packages-to-DHARMA
The vignette has some further comments / examples on creating custom simulation functions and reading them into DHARMa

Switch from quantile regression to spline for large data

To speed up the plots

Binning / grouping

Add option to bin / group DHARMa residuals

simulateResiduals(..., refit = T) can fail (all re-fitted models the same)

For complex models, it can happen that applying simulateResiduals(..., refit = T) results in a range of refitted models that have all the same parameters.

Symptoms of this phenomenon are

Residuals calculated from refit = T look fine, although simulateResiduals(..., refit = F) shows residual problems
testOverdispersion is significant, althought other tests don't show overdispersion.

The reason in each cases is that there is no variance created by the simulation.

I suspect that all these cases originate from lme4 failing to converge, or potentially already being converged according to the optimization criteria (as the model is updated, the optimizer likely starts with the old values, so it could be that the current model is already good enough to trigger the convergence tolerance).

Proposed interim solution: throw warning for this problem

Proposed long-term solution: find the convergence problem

Thanks to Luis Cayuela Delgado for bringing this problem to my attention.

Selection on RE structure

Simulated overdispersion correct for glmer.nb?

Goes back to #47 - looks as if the simulated overdispersion test with glmer.nb and refit = T doesn't perform as expected

library(lme4)

testData = createData(sampleSize = 300, overdispersion = 5, randomEffectVariance = 0, family = poisson())
fittedModel <- glmer.nb(observedResponse ~ Environment1 + (1|group) , data = testData)
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
plot(simulationOutput, rank = T)

# works
testOverdispersion(simulationOutput)

# works, but produces overdispersion! -> new ticket
simulationOutput2 <- simulateResiduals(fittedModel = fittedModel, refit = T)
testOverdispersion(simulationOutput2)

# doesn't work
testOverdispersionParametric(fittedModel)

Interpretation of residual plots for Poisson GLMM - overprediction with possible underdispersion

From a user: I’m using the DHARMa package, and I obtained this residuals plot

I did the tests explained in the vignette :

· testUniformity : S
· testOverdispersionParametric : NS
· testZeroInflation : NS
· adding overdispersion correction : no change
· adding quadratic term for both regressor : no change

Can you help me giving a lead about what’s going wrong ?

Add travis-ci tests

https://travis-ci.org/

glm.nb from package MASS

When I try to use glm.nb, I get the message:

Warning message:
In simulateResiduals(fittedModel = fittedModel, n = 250): DHARMa: fittedModel not in class of supported models. Absolutely no guarantee that this will work!

Do you think the results are still ok?

Handling of models fitted with weights

A long-standing problem that I want to solve with the next release is the handling of the weights. The problem is the following:

Depending on the family, weights are either included in the simulations, or not. In some cases, they can't be included, because weights on the likelihood in a Poisson don't have a corresponding data-generating model. In some cases (like lmer), it seems to me that they were accidentally forgotten?

As DHARMa relies on the fact that the simulate() function simulates from the assumed likelihood, it is a quite essential information if weights are included in simulate or not (and ideally, they would be, as far as possible). Another problem is that models don't consistently warn if weights are not included in the simulation.

I think the current situation is the following

lm includes weights in simulations
glm includes weights for gaussian and binomial, but not for poisson
lme4 does not include weights in lmer, does not warn. Does not include in glmer, except binomial, but does warn.
glmmTMB - weights are multiplied with the log likelihood, except for the binomial. Simulations are only reliable for binomial? glmmTMB does not give a warning in the simulation #146
spaMM - weights are / can only be included for distribution with variable dispersion, and control the dispersion -> simulations should be reliable?

Example

library(DHARMa)
library(lme4)
library(glmmTMB)
library(spamMM)

testData = createData(sampleSize = 200, overdispersion = 0, family = gaussian())

# lm weights are considered in simulate()
fit <- lm(observedResponse ~ Environment1 , weights = Environment1 +1, 
            data = testData)
simulateResiduals(fittedModel = fit, plot = T)

# glm gaussian still the same
fit <- glm(observedResponse ~ Environment1 , weights = Environment1 +1, 
          data = testData)
simulateResiduals(fittedModel = fit, plot = T)

# glm behaves nice, throws a warning that simulate ignores weights for poisson
fit <- glm(ceiling(exp(observedResponse)) ~ Environment1 , weights = Environment1 +1, 
           data = testData, family = "poisson")
simulateResiduals(fittedModel = fit, plot = T)

# lmer behaves bad, doesn't include weights even for Gaussian, but doesn't warn either
fit <- lmer(observedResponse ~ Environment1 + (1|group), weights = Environment1 +1, 
            data = testData)
simulateResiduals(fittedModel = fit, plot = T)

# glmer warns
fit <- glmer(ceiling(exp(observedResponse)) ~ Environment1 + (1|group), weights = Environment1 +1, 
            data = testData, family = "poisson")
simulateResiduals(fittedModel = fit, plot = T)

# glmmTMB does not warn

fit <- glmmTMB(ceiling(exp(observedResponse)) ~ Environment1 + (1|group), weights = Environment1 +1, 
             data = testData, family = "poisson")
simulateResiduals(fittedModel = fit, plot = T)

# spaMM does not warn, but seems to be simulating with correct (heteroskedastic) variance. How exactly 
fit <- HLfit(ceiling(exp(observedResponse)) ~ Environment1 + (1|group), prior.weights = Environment1 +1, 
               data = testData, family=poisson())
summary(fit)
simulateResiduals(fittedModel = fit, plot = T)

Implement option to obtain likelihood distribution and parameters from simulations

for parametric bootstrapt / simulated LRTs

Add support for glmmADMB in DHARMa

A user requested support for http://glmmadmb.r-forge.r-project.org/

Perspective for this request (updated): I have decided to not make any further efforts to achieve this because of bbolker/glmmadmb#12 . glmmADMB are advised to switch to glmmTMB, which is now supported by DHARMA

Interim solution for users that, for some reason, have to urgently use DHARMa with this package: take your fitted model, create a simulate function for this model structure yourself, and then use createDHARMa (see help), this will allow most options of the package to be run. See further comments on support of new packages in here. Note that the vignette has some further comments / examples on creating custom simulation functions and reading them into DHARMa

Discuss alternatives R functions / packages

Discuss alternatives and related packages / functions in the vignette

Candidates

glm.diag.plots

http://ugrad.stat.ubc.ca/R/library/statmod/html/qresiduals.html

behavior of binomial k/n with weights

add checks for k/n provision via weights

merge testOverdispersion / testOverdispersionParametric / testZeroInflation

confusing for the user to have so many functions - better to have one function, that has sensible default values

Issue in simulate / fitted for mgcv::gam with binomial n/k

The output of the fitted function in mgcv::gam is not named, in contrast to lm / glm - this minor difference creates a problem with the simulate functions, see demonstration here

https://github.com/florianhartig/DHARMa/blob/master/Code/DHARMaPackageSupport/mgcv/GAMProblems.md

Since DHARMa 0.1.3, DHARMa overwrites the fitted function by fitted.gam in https://github.com/florianhartig/DHARMa/blob/master/DHARMa/R/compatibility.R

I'm creating this ticket to keep in mind that this is a hack that is not optimal - ideally, this would be fixed in mgcv, and once it is this should be removed again.

rename plots

in particular plotSimulatedResiduals

Add analytical quantile function for continous models without REs

Users may find some plotting function of the package useful, even if their model does not require simulations to calculate residuals (e.g. lm, or glm with gamma).

Currently, the only way to use DHARMa with these models is to simulate. Should provide a direct numerical calculation of the quantles, e.g. convertResiduals for speed-up.

Automatic scaling of x axis for residual plots

Would be nice to transform x in a way such that the distribution is more or less flat

Implement Vuong's test

as mentioned in http://glmm.wikidot.com/faq

Implement support for extended families and REs in mgcv

fittedModel$family$simulate() will work for standard families, but

not for the extended families of mgcv as described in https://stat.ethz.ch/R-manual/R-devel/library/mgcv/html/family.mgcv.html
not (I think) for any of the RE specifications

This means DHARMa can currently not simulate from any of the extended families, and only from the top random level.

Perspective for this request: In general, I am reluctant to implement simulate functions in DHARMa, for reasons explained in https://github.com/florianhartig/DHARMa/wiki/Adding-new-R-packages-to-DHARMA However, once a simulate function would be available in mgcv, it would be very easy to integrate all it's capabilities in DHARMa. Feel free to request a simulate function from the package developers.

Interim solution for users that want to use DHARMa with glmmTMB: take your fitted model, create a simulate function for this model structure yourself, and then use createDHARMa (see help), this will allow most options of the package to be run. The vignette has some further comments / examples on creating custom simulation functions and reading them into DHARMa

Vignette ## [1] "Parametric overdispersion test not implemented for this model type"

A problem in the Vignette exmaple

Different results with refit = T / refit = F

A question from a user

I have a question regarding testTemporalAutocorrelation(). Here’s my code:

simulationOutput <- simulateResiduals(fittedModel = fittedmodel, n = 250, refit = T)
simulationOutput2 <- simulateResiduals(fittedModel = fittedmodel, n = 250)
testTemporalAutocorrelation(simulationOutput = simulationOutput, time = data2$date) # DW = 1.7327, p-value = 0.04495
testTemporalAutocorrelation(simulationOutput = simulationOutput2, time = data2$date) # DW = 2.1114, p-value = 0.7601

What does it mean if when testing for temporal autocorrelation, p value is significant when I used refit simulation and non-significant when I don’t use refit?

Check compatibility with brms

https://cran.r-project.org/web/packages/brms/index.html

plotSimulatedResiduals spline can crash with low variability in x

When the variability in predicted values is very low, smooth.spline can crash, which crashes the plot

Error in smooth.spline(pred, res, df = 10) :
'tol' must be strictly positive and finite

best solution is probably to catch this error

Error in plotSimulatedResiduals for very few unique predictions

From a user: It's working fine most of the time but sometimes I get this error using:
plotSimulatedResiduals

Error in qrnn::qrnn.fit(x = as.matrix(pred), y = as.matrix(res), n.hidden = 4, :
zero variance column(s) in "x"

Parallel option in simulateResiduals

Projection for spatial autocorrelation

A question from a user: [...] I have seen that the package can be adopted to test for spatial autocorrelation, both by testing the distribution of coordinates against random and against real coordinates.
In the case of real coordinates, which reference system should be used? I am adopting point data based on a projected coordinates ED50/32N (EPSG: 23032).

add option to calculate power analysis from the parametric bootstrap

Include random effects when plotting residuals vs. predicted?

When plotting residuals vs. predicted in mixed models, a common question is if predictions should be calculated including the RE terms (conditional) .

The answer is no, at least if the REs are also included (re-simulated) when calculating the residuals!

The reason is that predictions with REs can produce artifacts (see, also #42). To understand this, note first that the question about conditional / unconditional appears both on the x and y axis, i.e. we can calculate residuals and predictions with / without REs.

So, let's see what happens if we do that - consider this example, where we compare residuals plotted against the unconditional predictions (no REs, DHARMa standard), and the conditional predictions (predictions include REs, lme4 predict standard):

    # creating test data according to standard poisson GLMM assumptions
testData = createData(sampleSize = 200, randomEffectVariance = 1.5, family = poisson(), numGroups = 20)

# fitting GLMM
fittedModel <- glmer(observedResponse ~ Environment1 + (1|group), family = "poisson", data = testData)

# res ~ pred, with conditional and unconditional predictions
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
par(mfrow = c(2,2))
plotResiduals(simulationOutput$fittedPredictedResponse, simulationOutput$scaledResiduals , main = "Unconditional Residuals\n Unconditional Predictions\n DHARMa standard", rank = T)
plotResiduals(predict(fittedModel), simulationOutput$scaledResiduals, main = "Unconditional Residuals\n Conditional Predictions\n", rank = T)

simulationOutput <- simulateResiduals(fittedModel = fittedModel, re.form = NULL)
plotResiduals(simulationOutput$fittedPredictedResponse, simulationOutput$scaledResiduals , main = "Conditional Residuals\n Unconditional Predictions\n", rank = T)
plotResiduals(predict(fittedModel), simulationOutput$scaledResiduals, main = "Conditional Residuals\n Conditional Predictions\n", rank = T)

The effect goes away when specifying the random effect as fixed

# alternative with grouping factor as fixed effect, this removes the pattern with group
fittedModel <- glm(observedResponse ~ Environment1 + as.factor(group), family = "poisson", data = testData)
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
plot(simulationOutput, rank = T)

I should add more explanation on this in the vignette, especially as the predict() function in R includes the REs by default, so users will be tempted to plot residuals against predictions with REs.

There seem to be a few related questions on CV that are still unanswered.

https://stats.stackexchange.com/questions/89991/use-predicted-values-with-or-without-random-part-to-plot-residuals-with-binnedpl

https://stats.stackexchange.com/questions/325898/strong-linear-pattern-in-residuals-of-random-intercept-model-simulated-unbalanc

Support for glmmTMB in DHARMa

A user requested support for https://github.com/glmmTMB/glmmTMB

Status of this request (updated)

glmmTMB is supported by DHARMA since https://github.com/florianhartig/DHARMa/releases/tag/v0.1.6.2.
glmmTMB is fully supported (fixing the re.form problem and some other things, see below) since DHARMa 0.2.7 and glmmTMB 1.0
A remaining limitation is the lack of support in glmmTMB to condition simulations on fitted REs, see glmmTMB/glmmTMB#888

A simple example is

m <- glmmTMB(count~ mined, family=poisson, data=Salamanders)
summary(m)

res = simulateResiduals(m)
plot(res, asFactor = T)

More examples in the vignette or in https://github.com/florianhartig/DHARMa/tree/master/Code/DHARMaPackageSupport/glmmTMB

Binomial with factor as response doesn't work

I just noted that currently, fitted binomial models with a factor as response (as opposed to 0/1) are not properly handled.