
ssdtools

Lifecycle: Maturing · R-CMD-check · codecov · CRAN status · CRAN downloads

ssdtools is an R package to fit and plot Species Sensitivity Distributions (SSD).

SSDs are cumulative probability distributions fitted to toxicity concentrations for different species, as described by Posthuma et al. (2001). The ssdtools package uses maximum likelihood to fit distributions such as the gamma, log-Gumbel (identical to the inverse Weibull), log-logistic, log-normal and Weibull to censored and/or weighted data. Multiple distributions can be averaged using Akaike Information Criterion weights. Confidence intervals on hazard concentrations and proportions are produced by parametric bootstrapping.
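For intuition only, here is a minimal base-R sketch (not ssdtools' model-averaged machinery) of how a 5% hazard concentration (HC5) falls out of a single fitted log-normal: the maximum-likelihood fit to log-concentrations is just the mean and (n-divisor) standard deviation of the logs, and the HC5 is the 5th percentile of that fitted distribution. The concentrations below are hypothetical illustrative values.

```r
# Hypothetical toxicity concentrations (mg/L) for six species
conc <- c(2.1, 2.4, 4.1, 10, 15.6, 18.3)

# MLE of a log-normal: mean and sd of the log-concentrations
# (using the n divisor for the variance, as maximum likelihood does)
logc <- log(conc)
meanlog <- mean(logc)
sdlog <- sqrt(mean((logc - meanlog)^2))

# The HC5 is the 5th percentile of the fitted CDF: the concentration
# expected to affect 5% of species
hc5 <- qlnorm(0.05, meanlog = meanlog, sdlog = sdlog)
hc5  # roughly 1.5 for these values
```

ssdtools does considerably more than this (censoring, weighting, multiple distributions, model averaging, bootstrapped intervals), but the underlying idea is the same.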

Installation

To install the latest version from CRAN

install.packages("ssdtools")

To install the latest development version from GitHub

# install.packages("pak", repos = sprintf("https://r-lib.github.io/p/pak/stable/%s/%s/%s", .Platform$pkgType, R.Version()$os, R.Version()$arch))
pak::pak("bcgov/ssdtools")

Introduction

ssdtools provides a data set for several chemicals including Boron.

library(ssdtools)
ssddata::ccme_boron
#> # A tibble: 28 × 5
#>    Chemical Species                  Conc Group        Units
#>    <chr>    <chr>                   <dbl> <fct>        <chr>
#>  1 Boron    Oncorhynchus mykiss       2.1 Fish         mg/L 
#>  2 Boron    Ictalurus punctatus       2.4 Fish         mg/L 
#>  3 Boron    Micropterus salmoides     4.1 Fish         mg/L 
#>  4 Boron    Brachydanio rerio        10   Fish         mg/L 
#>  5 Boron    Carassius auratus        15.6 Fish         mg/L 
#>  6 Boron    Pimephales promelas      18.3 Fish         mg/L 
#>  7 Boron    Daphnia magna             6   Invertebrate mg/L 
#>  8 Boron    Opercularia bimarginata  10   Invertebrate mg/L 
#>  9 Boron    Ceriodaphnia dubia       13.4 Invertebrate mg/L 
#> 10 Boron    Entosiphon sulcatum      15   Invertebrate mg/L 
#> # ℹ 18 more rows

Distributions are fit using ssd_fit_dists()

fits <- ssd_fit_dists(ssddata::ccme_boron, dists = c("lnorm", "llogis"))

and can be quickly plotted using autoplot()

library(ggplot2)

theme_set(theme_bw())

autoplot(fits) +
  scale_colour_ssd()

The goodness of fit can be assessed using ssd_gof()

ssd_gof(fits)
#> # A tibble: 2 × 9
#>   dist      ad     ks    cvm   aic  aicc   bic delta weight
#>   <chr>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
#> 1 lnorm  0.507 0.107  0.0703  239.  240.  242.  0      0.73
#> 2 llogis 0.487 0.0994 0.0595  241.  241.  244.  1.99   0.27
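The weight column holds Akaike weights computed from the AICc differences (delta): the relative likelihood exp(-delta/2) of each distribution, normalized to sum to one. A base-R sketch using the delta values above reproduces the weights:

```r
# AICc differences from the best (lowest-AICc) distribution, as in the
# ssd_gof() output above: lnorm first, llogis second
delta <- c(0, 1.99)

# Akaike weights: relative likelihood of each distribution, normalized
rel_lik <- exp(-delta / 2)
weight <- rel_lik / sum(rel_lik)

round(weight, 2)  # 0.73 0.27, matching the weight column
```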

and the model-averaged 5% hazard concentration can be estimated by parametric bootstrapping using ssd_hc().

set.seed(99)
hc5 <- ssd_hc(fits, ci = TRUE, nboot = 100) # 100 bootstrap samples for speed
print(hc5)
#> # A tibble: 1 × 10
#>   dist    percent   est    se   lcl   ucl    wt method     nboot pboot
#>   <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>      <dbl> <dbl>
#> 1 average       5  1.65 0.599 0.923  3.34     1 parametric   100     1

To bootstrap in parallel, set the evaluation strategy with future::plan(). For example:

future::plan(future::multisession, workers = 2)
hc5 <- ssd_hc(fits, ci = TRUE, nboot = 100)

Model-averaged predictions, complete with confidence intervals, can also be estimated by parametric bootstrapping using the stats generic predict(). To perform the bootstrapping for each distribution in parallel, register the future backend and then select the evaluation strategy.

doFuture::registerDoFuture()
future::plan(future::multisession)

set.seed(99)
boron_pred <- predict(fits, ci = TRUE)

and plotted together with the original data using ssd_plot().

ssd_plot(ssddata::ccme_boron, boron_pred,
  shape = "Group", color = "Group", label = "Species",
  xlab = "Concentration (mg/L)", ribbon = TRUE
) +
  expand_limits(x = 3000) +
  scale_colour_ssd()

References

Posthuma, L., Suter II, G.W., and Traas, T.P. 2001. Species Sensitivity Distributions in Ecotoxicology. CRC Press.

Information

Get started with ssdtools at https://bcgov.github.io/ssdtools/articles/ssdtools.html.

A shiny app to allow non-R users to interface with ssdtools is available at https://github.com/bcgov/shinyssdtools.

For the latest changes visit NEWS.

The citation for the shiny app:

Dalgarno, S. 2021. shinyssdtools: A web application for fitting Species Sensitivity Distributions (SSDs). JOSS 6(57): 2848. https://joss.theoj.org/papers/10.21105/joss.02848.

The ssdtools package was developed as a result of earlier drafts of:

Schwarz, C., and Tillmanns, A. 2019. Improving Statistical Methods for Modeling Species Sensitivity Distributions. Province of British Columbia, Victoria, BC.

For recent developments in SSD modeling including a review of existing software see:

Fox, D.R., et al. 2021. Recent Developments in Species Sensitivity Distribution Modeling. Environ Toxicol Chem 40(2): 293–308. https://doi.org/10.1002/etc.4925.

The CCME data.csv data file is derived from a factsheet prepared by the Canadian Council of Ministers of the Environment. See the data-raw folder for more information.

Getting Help or Reporting an Issue

To report bugs/issues/feature requests, please file an issue.

How to Contribute

If you would like to contribute to the package, please see our CONTRIBUTING guidelines.

Code of Conduct

Please note that the ssdtools project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Licensing

Copyright 2023 Province of British Columbia, Environment and Climate Change Canada, and Australian Government Department of Climate Change, Energy, the Environment and Water

The documentation is released under the CC BY 4.0 License

The code is released under the Apache License 2.0

Contributors

atillmanns, cschwarz-stat-sfu-ca, hadley, ibarraespinosa, joethorley, nadinehussein, repo-mountie[bot], sarahly9, sebdalgarno, stephhazlitt


ssdtools's Issues

Unable to plot

Keep getting this error when trying to plot:

"Error in is.finite(x) :
default method not implemented for type 'language'"

This happens when using the script straight from the Shiny App with Boron as the example, as well as with my own data.
Any suggestions?

fit standard errors?

I reran all the commands to obtain the HC5 and could not reproduce the numbers reported by Joe and team.

> ssd_hc5(boron_dists)
    prop   est    se   lcl   ucl
   <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.0500  1.25 0.726 0.623  3.19

gompertz not fitting

Hi there, when I try to fit my data set using the R code it works, but I get errors when I attempt to plot or make predictions. I've attached the data set: Cu_lowdata.xlsx

This is the code and errors:

Cu_low_dists <- ssd_fit_dists(LowerCuMarData)
autoplot(Cu_low_dists)
Error in pgompertz(q = c(0.7, 1.3, 1.94), shape = 0.391386784692936, scale = 0.313425202466778) :
unused argument (q = c(0.7, 1.3, 1.94))
ssd_gof(Cu_low_dists)
Error in pgompertz(q = c(0.7, 1.3, 1.94), shape = 0.391386784692936, scale = 0.313425202466778) :
unused argument (q = c(0.7, 1.3, 1.94))

It works when I do it in the R Shiny app.

Is there a way to exclude some fits when using the R code, as you can in the Shiny app?
Thanks,
Jenni

Correction to initial values for gamma distribution

The gamma distribution with shape (a) and scale (s) has
"mean and variance E(X) = as and Var(X) = as^2" according to the help file for dgamma.

Solving for $s$ gives
$s = Var(X) / E(X)$,
so using the method of moments, the starting value is $s = var(x) / mean(x)$.

Solving for $a$ then gives
$a = E(X) / s$,
and the moment estimator is $a = mean(x) / s = mean(x) / (var(x) / mean(x)) = mean(x)^2 / var(x)$.

We currently have

if (dist$dist == "gamma") {
  dist$start <- list(
    scale = stats::var(x) / mean(x),
    shape = mean(x)^2 / stats::var(x)^2
  )
}

so it appears we have an extra square in the last line. This should read

if (dist$dist == "gamma") {
  dist$start <- list(
    scale = stats::var(x) / mean(x),
    shape = mean(x)^2 / stats::var(x)
  )
}

i.e. remove the ^2 from the var(x) in the starting value for the shape.

Fortunately, the fitdist() function appears to be robust to making the shape parameter too small, so there is no obvious user impact.
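The method-of-moments algebra can be checked numerically: for a gamma with known shape a and scale s, plugging the theoretical mean as and variance as^2 into the corrected starting-value formulas recovers a and s exactly (the values below are arbitrary):

```r
# True parameters of a hypothetical gamma distribution
shape <- 2
scale <- 3

# Theoretical moments: E(X) = a*s, Var(X) = a*s^2
m <- shape * scale    # mean
v <- shape * scale^2  # variance

# Method-of-moments starting values (without the extra square)
scale_start <- v / m    # var(x) / mean(x)
shape_start <- m^2 / v  # mean(x)^2 / var(x)

stopifnot(all.equal(scale_start, scale), all.equal(shape_start, shape))
```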

discrepancies in Table 1 of Carl and Angeline's report

Namely, the HC5s appear to be incorrectly assigned to distributions in Table 1.

For example

> ssd_hc5(boron_lnorm)
# A tibble: 1 x 5
   prob      est        se       lcl      ucl
  <dbl>    <dbl>     <dbl>     <dbl>    <dbl>
1  0.05 1.681175 0.6848542 0.8618846 3.574394

the ssdca package with lnorm produces an hc5 estimate of 1.68

and below

> ssd_hc5(boron_dists, average = F)
# A tibble: 6 x 7
      dist  prop      est        se       lcl      ucl weight
     <chr> <dbl>    <dbl>     <dbl>     <dbl>    <dbl>  <dbl>
1    lnorm  0.05 1.681175 0.6874052 0.8851202 3.611764  0.133
2     llog  0.05 1.562052 0.7760264 0.6636144 3.652784  0.049
3 gompertz  0.05 1.299302 0.4419386 0.9581149 2.788982  0.271
4  lgumbel  0.05 1.768918 0.5369794 1.1299498 3.231247  0.010
5    gamma  0.05 1.075854 0.8355977 0.2652210 3.476163  0.268
6  weibull  0.05 1.087170 0.7784698 0.3654033 3.292916  0.269

but in Table 1 1.68 is the hc5 value for log-logistic

note aicc values in Table 1 appear to be correctly assigned

> ssd_gof(boron_dists)
# A tibble: 6 x 7
      dist        ad         ks        cvm      aic     aicc      bic
     <chr>     <dbl>      <dbl>      <dbl>    <dbl>    <dbl>    <dbl>
1    lnorm 0.5070335 0.10651430 0.07033164 239.0284 239.5084 241.6928
2     llog 0.4870694 0.09934088 0.05950233 241.0149 241.4949 243.6793
3 gompertz 0.6019563 0.12018745 0.08221977 237.6112 238.0912 240.2756
4  lgumbel 0.8286390 0.15826649 0.13401800 244.1860 244.6660 246.8504
5    gamma 0.4409319 0.11691481 0.05550569 237.6303 238.1103 240.2947
6  weibull 0.4346273 0.11697580 0.05426727 237.6253 238.1053 240.2897

Gamma distribution reports NA for standard error estimate; warn user; solution is rescale data

For some extreme sets of data, the gamma distribution reports NA for the standard error. E.g.

library(ssdtools)
test.csv <- textConnection(
"dummy , Conc , Units
all , 0.1 , ug/L
all , 0.12 , ug/L
all , 0.24 , ug/L
all , 0.42 , ug/L
all , 0.67 , ug/L
all , 0.78 , ug/L
all , 120 , ug/L
all , 2030 , ug/L
all , 9033 , ug/L
all , 15000 , ug/L
all , 15779 , ug/L
all , 20000 , ug/L
all , 31000 , ug/L
all , 40000 , ug/L
all , 105650 , ug/L")
test <- read.csv(test.csv, header=TRUE, strip.white=TRUE, as.is=TRUE)

ssd_fit_dists(test, dists="gamma")

reports

Fitting of the distribution ' gamma ' by maximum likelihood
Parameters:
estimate Std. Error
scale 9.695693e+04 NaN
shape 1.642224e-01 0.04142257

Estimates of scale and shape are ok, but it was unable to compute a standard error for the scale parameter. This appears to be a scaling problem when inverting the Hessian. The standard error for the scale parameter is about 67000 and the SE for the shape is about 0.04, so the variance-covariance matrix will have terms on the order of 67000^2 and 0.04^2, which differ by a factor of 2.8 x 10^12 (!), so I suspect that it is a numerical issue.

If you rescale the data

factor=100
test2 <- test
test2$Conc <- test2$Conc/ factor
ssd_fit_dists(test2, dists="gamma")

all is well
Fitting of the distribution ' gamma ' by maximum likelihood
Parameters:
estimate Std. Error
scale 969.5359604 674.2705097
shape 0.1641717 0.0454135

The pdf for the gamma has a term x/s, so if you multiply x by a factor k and multiply s by the same factor k, the pdf is unchanged except for the change in the scale factor.

I would advise the user when any estimates have NA for the standard errors that a rescaling may be appropriate. We could make a specific message for each distribution, i.e. typically advise gamma users to divide by a factor k, or a generic message where the user needs to try multiplying or dividing the data by a factor until the message disappears.

Estimates are ok, so estimates of HCx are ok. Bootstrapping is used to compute the SE for HCx, so this should also work fine.

Carl
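The scaling argument can be sanity-checked with the method-of-moments estimators: dividing the data by k leaves the shape estimate mean(x)^2/var(x) unchanged (the k's cancel), while the scale estimate var(x)/mean(x) is divided by k. A base-R check on the data from this issue:

```r
# Concentrations from the example above (ug/L)
x <- c(0.1, 0.12, 0.24, 0.42, 0.67, 0.78, 120, 2030, 9033,
       15000, 15779, 20000, 31000, 40000, 105650)

k <- 100
x2 <- x / k  # rescaled data

# Method-of-moments estimators for the gamma distribution
mom <- function(x) c(scale = var(x) / mean(x), shape = mean(x)^2 / var(x))

# shape is invariant to rescaling; scale is divided by k
stopifnot(all.equal(mom(x)["shape"], mom(x2)["shape"]),
          all.equal(mom(x)["scale"] / k, mom(x2)["scale"]))
```

This is why rescaling changes the conditioning of the Hessian without changing the fitted distribution (up to units).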

information criterion for lgumbel is too low

# A tibble: 6 x 7
      dist        ad         ks        cvm      aic     aicc      bic
     <chr>     <dbl>      <dbl>      <dbl>    <dbl>    <dbl>    <dbl>
1    lnorm 0.5070335 0.10651430 0.07033164 239.0284 239.5084 241.6928
2     llog 0.4870694 0.09934088 0.05950233 241.0149 241.4949 243.6793
3 gompertz 0.6024097 0.12019409 0.08229579 237.6112 238.0912 240.2756
4  lgumbel 0.8287721 0.15825915 0.13402809 100.7338 101.2138 103.3982
5    gamma 0.4409319 0.11691481 0.05550569 237.6303 238.1103 240.2947
6  weibull 0.4346273 0.11697580 0.05426727 237.6253 238.1053 240.2897
