Code Monkey home page Code Monkey logo

ewp's Introduction

ewp

R-CMD-check Codecov test coverage

The goal of ewp is to provide a modelling interface for underdispersed count data based on the exponentially weighted Poisson (EWP) distribution described by Ridout & Besbeas (2004), allowing for nest-level covariates on the location parameter \lambda of the EWP. Currently only the three-parameter version of the distribution (EWP_3) is implemented.

Installation

You can install the development version of ewp directly from github like so (requires a working C++ compiler toolchain):

remotes::install_github('pboesu/ewp')

Example

The package contains a reconstructed version of the linnet dataset from Ridout & Besbeas (2004), which consists of 5414 clutch size records and is augmented with two synthetic covariates, one that is random noise, and one that is correlated with clutch size. The parameter estimates for the intercept-only model presented in Ridout & Besbeas (2004) can be reproduced like so:

library(ewp)
fit_null <- ewp_reg(eggs ~ 1, data = linnet)# this may take a few seconds
#> start values are: 
#> (Intercept)       beta1       beta2 
#>    1.546823    0.000000    0.000000 
#> initial  value 9530.456809 
#> iter   4 value 5324.797069
#> iter   8 value 5299.393228
#> final  value 5299.362098 
#> converged
#> 
#> Calculating Hessian. This may take a while.
summary(fit_null)
#> Deviance residuals:
#> 
#> lambda coefficients (ewp3 with log link):
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) 1.584650   0.003511   451.3   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> dispersion coefficients:
#>       Estimate Std. Error z value Pr(>|z|)    
#> beta1  1.46441    0.05588   26.21   <2e-16 ***
#> beta2  2.35681    0.05607   42.04   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Number of iterations in  optimization: 11 
#> Log-likelihood: -5299 on 3 Df

Note that the linear predictor on \lambda uses a log-link.

A model with nest-level covariates can be fitted by specifying a more complex model formula - as in the base R glm()

fit <- ewp_reg(eggs ~ cov1 + cov2, data = linnet)# this may take 5-10 seconds
#> start values are: 
#>  (Intercept)         cov1         cov2        beta1        beta2 
#>  1.204279467 -0.001259307  0.071872488  0.000000000  0.000000000 
#> initial  value 9434.616462 
#> iter   4 value 5054.429651
#> iter   8 value 4596.004461
#> iter  12 value 4440.611768
#> iter  16 value 4423.783976
#> iter  20 value 4420.081202
#> iter  20 value 4420.081199
#> iter  20 value 4420.081199
#> final  value 4420.081199 
#> converged
#> 
#> Calculating Hessian. This may take a while.
summary(fit)
#> Deviance residuals:
#> 
#> lambda coefficients (ewp3 with log link):
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  1.3335734  0.0048514 274.883   <2e-16 ***
#> cov1        -0.0007823  0.0014025  -0.558    0.577    
#> cov2         0.0532956  0.0009553  55.791   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> dispersion coefficients:
#>       Estimate Std. Error z value Pr(>|z|)    
#> beta1  1.89858    0.04404   43.11   <2e-16 ***
#> beta2  3.05108    0.08761   34.83   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Number of iterations in  optimization: 20 
#> Log-likelihood: -4420 on 5 Df

Simulation based residual diagnostics are indirectly available through the package DHARMa, by using the simulate.ewp method:

library(DHARMa)
#> Warning: package 'DHARMa' was built under R version 4.2.2
#> This is DHARMa 0.4.6. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa')
#simulate from fitted model
sims <- simulate(fit, nsim = 20)

#create a DHARMa abject
DH <- createDHARMa(simulatedResponse = as.matrix(sims),#simulated responses
                   observedResponse = linnet$eggs,#original response
                   fittedPredictedResponse = fit$fitted.values,#fitted values from ewp model
                   integerResponse = T)#tell DHARMa this is a discrete probability distribution

#plot diagnostics
plot(DH)
#> DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details

⚠️ Note that the maximum likelihood optimisation procedure is still experimental ⚠️

In particular:

  • At the moment the likelihood evaluation is optimised for small counts (\lambda << 20) and uses a hard upper limit of 30 for potential counts, this means the model is currently only suitable for datasets with expected counts up to 20-25, depending on the degree of underdispersion. A warning is issued if this criterion in not met when using ewp_reg(), but other functions may fail silently. - Fitting is very slow (think minutes, not seconds), especially when estimating the Hessian matrix for more than a couple of parameters! - Estimates may not be stable for models with many covariates and/or very large sample sizes (1000s). Centering and scaling continuous covariates seems to help on that front.

⚠️⚠️⚠️

ewp's People

Contributors

clarkejames944 avatar pboesu avatar

Watchers

 avatar  avatar

ewp's Issues

Use adaptive summation limits

The PMF calculation requires evaluation of an infinite sum of weights, which is currently approximate using a hard coded upper limit for the sum. ewp_reg currently issues a warning when that limit is approached, but lower level functions do not currently handle this defensively.
Ideally all relevant functions should adapt the summation limit conditional on lambda and beta2, with warnings issued for counts above some lambda to point out the increasing computational effort as the summation limit increases.

Scale error

When trying to scale a variable e.g. scale(min_first_egg) within a model, I get "Error in scale(min_first_egg): object 'min_first_egg' not found.

This didn't happen previously and doesn't happen with the same code in a GLM.

fit_min <- ewp_reg(max_num_eggs ~ scale(min_first_egg, scale = F), data = wilwa_trial)
summary(fit_min)

Model3_min<-glm(max_num_eggs~scale(min_first_egg, scale = F), family = quasipoisson, data = wilwa_trial)
summary(Model3_min)

The data are the same that I sent previously.

EWP.rdp is corrupt

Full error message:
Error in fetch(key) :
lazy-load database 'C:/Users/jerem/Dropbox/PC/Documents/R/win-library/4.1/ewp/help/ewp.rdb' is corrupt

This occurred whilst trying to run my histogram plotting function using the model outputs from ewp - particularly the null model. Data were wilwa_trial data.

May be related to other error:

Error: Expecting a single value: [extent=10].
3. stop(structure(list(message = "Expecting a single value: [extent=10].",
call = NULL, cppstack = NULL), class = c("Rcpp::not_compatible",
"C++Error", "error", "condition")))
2. dewp3_cpp(seq(1, 10), lambda = model$coefficients[[1]], beta1 = fit_null$coefficients[[2]],
beta2 = fit_null$coefficients[[3]])

  1. hist.ewp.null(bullf$max_num_eggs[bullf$year == 1960], fit_null)

Second error has been solved by providing a singular value to x

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.