standartox's Introduction

Standartox

Standartox is a database and tool that facilitates the retrieval of ecotoxicological test data. It is based on the EPA ECOTOX database and on data from several other chemical databases, and it lets users filter and aggregate ecotoxicological test data easily. It can be accessed either via http://standartox.uni-landau.de or through this R package, standartox. Ecotoxicological test data are used in environmental risk assessment to calculate effect measures such as Toxic Units (TU) or Species Sensitivity Distributions (SSD) in order to assess the environmental toxicity of chemicals.

The project lives in two repositories:

Installation

install.packages('standartox')
# remotes::install_github('andschar/standartox') # development version

Functions

Standartox consists of the two functions stx_catalog() and stx_query(). The former allows you to retrieve a catalog of possible parameters that can be used as an input for stx_query(). The latter fetches toxicity values from the database.

stx_catalog()

The function returns a list of all possible arguments that can be used in stx_query().

require(standartox)
catal = stx_catalog()
names(catal)
##  [1] "casnr"              "cname"              "concentration_unit"
##  [4] "concentration_type" "chemical_role"      "chemical_class"    
##  [7] "taxa"               "trophic_lvl"        "habitat"           
## [10] "region"             "ecotox_grp"         "duration"          
## [13] "effect"             "endpoint"           "exposure"          
## [16] "vers"
catal$endpoint # access the parameter endpoint
variable       n  n_total  perc
NOEX      237616   609435    39
LOEX      192718   609435    32
XX50      179101   609435    30

stx_query()

The function allows you to retrieve filtered and aggregated toxicity data according to the parameters below.

parameter           example
casnr               50000, 94520, 94531
cname               2718, 4, 3
concentration_unit  ug/l, mg/kg, ppb
concentration_type  active ingredient, formulation, total
chemical_role       pesticide, herbicide, drug
chemical_class      amide, aromatic, organochlorine
taxa                species, Fusarium oxysporum, Apis mellifera
trophic_lvl         heterotroph, autotroph
habitat             freshwater, terrestrial, marine
region              europe, america_north, america_south
ecotox_grp          invertebrate, plant, fish
duration            24, 96
effect              mortality, population, biochemistry
endpoint            NOEX, LOEX, XX50
exposure            aquatic, environmental, diet
vers                20191212

You can type in parameters manually or subset the object returned by stx_catalog():

require(standartox)
cas = c(Copper2Sulfate = '7758-98-7',
        Permethrin = '52645-53-1',
        Imidacloprid = '138261-41-3')
# query
l = stx_query(cas = cas,
              endpoint = 'XX50',
              taxa = grep('Oncorhynchus', catal$taxa$variable, value = TRUE), # fish genus
              exposure = 'aquatic',
              duration = c(24, 120))
## Standartox query running..
## Parameters:
## casnr: 7758-98-7, 52645-53-1, 138261-41-3
## duration: 24, 120
## endpoint: XX50
## exposure: aquatic
## taxa: Oncorhynchus clarkii, Oncorhynchus mykiss, Oncorhynchus nerka, Oncorhy...[truncated]

Important parameter settings

  • CAS (cas =): can be given with or without hyphens, e.g. 7758-98-7 or 7758987
  • Endpoints (endpoint =): only one endpoint per query is allowed (NOEX, LOEX or XX50)
  • If you leave a parameter empty, Standartox will not filter for it
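
The two CAS notations mentioned above can be converted into each other mechanically. The following is a minimal sketch and not part of standartox: normalise_cas() and is_valid_cas() are hypothetical helper names introduced only for illustration, and the validation uses the standard CAS Registry Number check-digit rule. It may help to catch typos before a query is sent.

```r
# Hypothetical helpers (NOT part of standartox).
# Convert CAS numbers to the hyphenated form, e.g. 7758987 -> 7758-98-7.
normalise_cas = function(x) {
  digits = gsub('-', '', x, fixed = TRUE)
  # last digit = check digit, the two digits before it = middle block
  sub('^([0-9]+)([0-9]{2})([0-9])$', '\\1-\\2-\\3', digits)
}

# Verify the check digit: weight the remaining digits 1, 2, 3, ... from
# right to left, sum them up and compare the sum modulo 10.
is_valid_cas = function(x) {
  digits = gsub('-', '', x, fixed = TRUE)
  vapply(strsplit(digits, ''), function(d) {
    d = suppressWarnings(as.integer(d))
    n = length(d)
    if (n < 5 || anyNA(d)) return(FALSE)
    body = rev(d[-n])  # digits right to left, check digit excluded
    sum(body * seq_along(body)) %% 10 == d[n]
  }, logical(1))
}

cas = c('7758-98-7', '7758987', '52645-53-1', '138261-41-3')
normalise_cas(cas)
is_valid_cas(cas)
```

Both helpers are vectorised, so their result could be passed on as the cas argument of a query.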

Query result

Standartox returns a list object with five entries.

  • l$filtered and l$filtered_all contain the filtered Standartox data set (the former is a shorter, more concise version of the latter):
cas        cname           concentration  concentration_unit  effect     endpoint
7758-98-7  cupric sulfate         1100.0  ug/l                mortality  XX50
7758-98-7  cupric sulfate           18.9  ug/l                mortality  XX50
7758-98-7  cupric sulfate           46.4  ug/l                mortality  XX50
  • l$aggregated contains several aggregates of the Standartox data:

    • cname, cas - chemical identifiers
    • min - Minimum
    • tax_min - Most sensitive taxon
    • gmn - Geometric mean
    • amn - Arithmetic mean
    • sd - Standard Deviation of the arithmetic mean
    • max - Maximum
    • tax_max - Least sensitive taxon
    • n - Number of distinct taxa used for the aggregation
    • tax_all - Concatenated string of all taxa used for the aggregation
cname           cas          min           tax_min               gmn           max
cupric sulfate  7758-98-7    6.813740e+01  Oncorhynchus clarkii  1.330055e+02  263.6153
imidacloprid    138261-41-3  2.291000e+05  Oncorhynchus mykiss   2.291000e+05  229100.0000
permethrin      52645-53-1   1.896481e+00  Oncorhynchus gilae    4.505877e+00  17.0000
  • l$meta contains meta information on the request:
variable            value
accessed            2021-05-10 10:17:29
standartox_version  20210315
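
To make the aggregate columns concrete, here is a minimal sketch in base R that computes min, gmn, amn, sd and max for the three cupric sulfate concentrations shown in the filtered output. It is only an illustration: it uses just these three values and assumes a plain aggregation over all of them, so the numbers will not match the l$aggregated output of a real query, and how Standartox groups values before aggregating is not covered here.

```r
# Concentrations (ug/l) copied from the three cupric sulfate rows above.
conc = c(1100.0, 18.9, 46.4)

agg = data.frame(
  min = min(conc),
  gmn = exp(mean(log(conc))),  # geometric mean
  amn = mean(conc),            # arithmetic mean
  sd  = sd(conc),              # sample standard deviation
  max = max(conc)
)
agg
```

The geometric mean is far less affected by the single large value than the arithmetic mean, which is one reason it is commonly used for concentration data spanning several orders of magnitude.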

Example: Oncorhynchus

Say we want to retrieve the 20 most tested chemicals for the genus Oncorhynchus. We allow test durations between 48 and 120 hours and restrict the tests to active ingredients only. Since we are only interested in the half maximal effective concentration, we choose XX50 as our endpoint. As the aggregation method we choose the geometric mean.

require(standartox)
l2 = stx_query(concentration_type = 'active ingredient',
               endpoint = 'XX50',
               taxa = grep('Oncorhynchus', catal$taxa$variable, value = TRUE), # fish genus
               duration = c(48, 120))
## Standartox query running..
## Parameters:
## concentration_type: active ingredient
## duration: 48, 120
## endpoint: XX50
## taxa: Oncorhynchus clarkii, Oncorhynchus mykiss, Oncorhynchus nerka, Oncorhy...[truncated]

We subset the retrieved data to the 20 most tested chemicals and plot the result.

require(data.table)
dat = merge(l2$filtered, l2$aggregated, by = c('cas', 'cname'))
cas20 = l2$aggregated[ order(-n), cas ][1:20]
dat = dat[ cas %in% cas20 ]
require(ggplot2)
ggplot(dat, aes(y = reorder(cname, -gmn))) +
  geom_point(aes(x = concentration, col = 'All values'),
             pch = 1, alpha = 0.3) +
  geom_point(aes(x = gmn, col = 'Standartox value\n(Geometric mean)'),
             size = 3) +
  scale_x_log10(breaks = c(0.01, 0.1, 1, 10, 100, 1000, 10000),
                labels = c(0.01, 0.1, 1, 10, 100, 1000, 10000)) +
  scale_color_viridis_d(name = '') +
  labs(title = 'Oncorhynchus EC50 values',
       subtitle = '20 most tested chemicals',
       x = 'Concentration (ppb)') +
  theme_minimal() +
  theme(axis.title.y = element_blank())

Usage

Please use the API service thoughtfully: run stx_query() only once, and re-run it only when the parameters change or you want to query a new version. The following example shows how to store the queried data locally from within R.

run = FALSE # set to TRUE for the first run
if (run) {
  l2 = stx_query(concentration_type = 'active ingredient',
                 endpoint = 'XX50',
                 taxa = grep('Oncorhynchus', catal$taxa$variable, value = TRUE), # fish genus
                 duration = c(48, 120))
  saveRDS(l2, file.path('path/to/directory', 'data.rds'))
  
} else {
  l2 = readRDS(file.path('path/to/directory', 'data.rds'))
}

# put rest of the script here
# ...
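
The same idea can be wrapped in a small helper that checks whether a cached file already exists, so no flag has to be flipped by hand. This is a minimal sketch under stated assumptions: cached() and fake_query() are hypothetical names, not part of standartox, and fake_query() merely stands in for a real call to stx_query() so that the example runs offline.

```r
# Hypothetical helper (NOT part of standartox): run `query_fun` only if no
# cached result exists at `path`; otherwise read the stored result.
cached = function(path, query_fun, refresh = FALSE) {
  if (refresh || !file.exists(path)) {
    res = query_fun()
    saveRDS(res, path)
    return(res)
  }
  readRDS(path)
}

# Stand-in for a real query; it counts how often it is actually executed.
calls = 0
fake_query = function() {
  calls <<- calls + 1
  list(source = 'fake', n = 3L)
}

path = tempfile(fileext = '.rds')
first  = cached(path, fake_query)  # runs the query and stores the result
second = cached(path, fake_query)  # reads from disk, no second query
calls
```

Setting refresh = TRUE forces a new query, for example after changing parameters or when a new database version should be fetched.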

Article

The article on Standartox is published here.

Information

Contributors

Want to contribute?

Check out our contribution guide here.

Meta

  • Please report any issues, bugs or feature requests
  • License: MIT
  • Get citation information for the standartox package in R by running citation(package = 'standartox')

standartox's People

Contributors

andschar

standartox's Issues

Sketchy studies

An initial list of "sketchy" studies that might be considered in the future, though a separate study-quality analysis might be necessary.

stx_query() with CAS no. that returns no data on Ecotox web search returns large amounts of data.

It is possible that the stx_query() function is returning information when there should be a null result from the database.

Using a CAS from the help files

# cas from help files
cas_1 <- c("35554-44-0")

# query
l1 <- stx_query(cas = cas_1,
                exposure = 'aquatic',
                taxa = 'Daphnia magna')

# one result
l1$aggregated$cname

Using a CAS that returns nothing on the web database (1073-69-4) [4-Chlorophenylhydrazine]

# DOES NOT WORK
# this cas returns nothing on the database
# but stx_query does

cas_noDB <- c("1073-69-4")
badQuery <- stx_query(casnr = cas_noDB)

# returns a huge amount of stuff.
# 5664 names
badQuery$aggregated$cname
# none of them contain the compound
sum(grepl("Chlorophenylhydrazine", badQuery$aggregated$cname))

error in flag_outliers?

Hi Andreas,

I think it should be
fifelse(x < qnt[1] - H | x > qnt[2] + H, TRUE, FALSE) instead of
fifelse(x < qnt[1] - H | x > qnt[2], TRUE, FALSE)
in flag_outliers. Or is there a reason to use only the upper 75% quantile as upper threshold?
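
The difference between the two conditions can be illustrated with a small base-R sketch. This is not the package's actual code: flag_outliers_sketch() is a hypothetical function, it uses plain logical operators instead of fifelse(), and it assumes that qnt holds the 25 % and 75 % quantiles and that H is 1.5 times the interquartile range, which the issue does not state.

```r
# Sketch of an IQR-based outlier flag; NOT the package's implementation.
# Assumption: H = lim * IQR, qnt = 25 % and 75 % quantiles.
flag_outliers_sketch = function(x, lim = 1.5, symmetric = TRUE) {
  qnt = quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  H = lim * (qnt[2] - qnt[1])
  # symmetric = TRUE: upper fence is qnt[2] + H (the proposed condition)
  # symmetric = FALSE: upper fence is qnt[2] only (the reported condition)
  upper = if (symmetric) qnt[2] + H else qnt[2]
  x < qnt[1] - H | x > upper
}

x = c(1, 2, 3, 4, 5, 100)
flag_outliers_sketch(x)                     # flags only 100
flag_outliers_sketch(x, symmetric = FALSE)  # additionally flags 5
```

With the asymmetric condition, every value above the 75 % quantile is flagged, so roughly a quarter of any sample would count as an outlier.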

Access?

I would need to access chemical information via the R-package but I received an error message about its accessibility. Is it possible to repair this? Thanks

Feature Request - endpoint returns value from database, not search value.

Thanks for this awesome tool. I've noticed that the endpoint query argument allows only 3 categories but the ECOTOX database has many more. I see the value of limiting it to 3 options for input, but can the output be the actual results from ECOTOX field? e.g. specifying XX50 seems to include data with ECOTOX endpoints of LC50, LL50, EC50 etc.

Have I understood this correctly?

This means that post-query filtering on the details under the XX50 reality is not possible because the query is returning the search term, not the search result. - thanks for considering this!
