mrc-ide / epireview

Home Page: https://mrc-ide.github.io/epireview/

License: GNU General Public License v3.0

R 100.00%

epireview's Introduction

epireview

Please note that epireview is currently under active development. This means that the data format, software interface, and features are evolving and are likely to change. The version used in our publication on Marburg virus disease (preprint) is tagged as V0.1.0 and is available here. You can use this version and accompanying data (with appropriate citation), but if you plan to make extensive use of epireview, please first get in touch with us by email.

epireview is a tool to obtain the latest data, figures and tables from the Pathogen Epidemiology Review Group (PERG). This package also contains functions to update pathogen-specific databases with new data from peer-reviewed papers as they become available. Updates can be submitted via a pull request and will be checked by our team.

To install the latest version of epireview, use:

remotes::install_github('mrc-ide/epireview')

Quick start

To load pathogen-specific data, do

ebola <- epireview::load_epidata("ebola")

At the moment, the package hosts data for Ebola, Marburg and Lassa.

This will load a list consisting of four elements (articles, params, outbreaks, models).

To visualise parameter values,

params <- ebola[["params"]]
forest_plot_rt(params, col_by = "population_country", shape_by = "parameter_value_type")

Some other functions of interest are

forest_plot_r0(params)
forest_plot_serial_interval(params)
forest_plot_incubation_period(params)
forest_plot_infectious_period(params)

Project overview

The COVID-19 pandemic has highlighted the critical role that mathematical modelling can play in supporting evidence-based decision-making during outbreaks (e.g. to project the expected epidemic size, the required hospital capacity and assess the potential population-level impact of interventions). However, early in an epidemic, modelling efforts can be hampered and delayed by the lack of a centralised resource summarising existing model structures and input parameters for the disease of interest. Literature reviews are therefore often conducted during epidemics to identify plausible parameter ranges and/or existing mathematical model structures (e.g. Van Kerkhove et al. Scientific data 2015) and are mostly limited to individual parameters.

A group of ~20 volunteer researchers currently or formerly at Imperial College London with an interest in outbreaks are working together to systematically review the mathematical models and parameters for the nine World Health Organization (WHO) 2019 blue-print priority pathogens: Nairo virus (Crimean-Congo haemorrhagic fever), Ebola virus, Henipa virus, Lassa mammarenavirus, Marburg virus, Middle East respiratory syndrome coronavirus (MERS-CoV), Rift Valley fever virus, Severe Acute Respiratory Syndrome coronavirus (SARS-CoV-1), and Zika virus. These are pathogens, or strains thereof, for which there are no approved vaccines or treatments and hence where we anticipate mathematical modelling is likely to play a major role in supporting the epidemic response. We do not include SARS-CoV-2 because vaccines exist for this pathogen and the body of literature far exceeds the capacity of our team. For each pathogen, we will review published mathematical models, information on transmission, evolution, natural history and severity, as well as seroprevalence studies and reported sizes of previous outbreaks. The quality of each paper will also be assessed as part of the review. This series of systematic reviews is registered with PROSPERO: CRD42023393345

Pathogen database

One output of this project will be a database initially populated with all the information extracted that can be easily updated with new parameter estimates or information on additional pathogens as these become available.

The code in this repository provides functions to access the data for each pathogen. We will update the repository as we progress through this work. An expected timeline is provided in the table below.
We provide functionality to update existing databases with new data as new research (in line with the inclusion and exclusion criteria) becomes available.
Tables and figures for each pathogen can be updated once new data are added to the database.
Vignettes and the GitHub wiki contain all required information on the data.
We will add functionality to create/add new pathogens which are not currently included in the review.

Pathogen overview and timeline

| Pathogen | Titles & abstracts screened | Contact | Living review | Last literature review update | DOI |
|---|---|---|---|---|---|
| Marburg virus | 4,460 | [email protected], [email protected] | link | Mar 2023 | https://doi.org/10.1016/S1473-3099(23)00515-7 |
| Ebola virus | 14,690 | [email protected], [email protected] | | Jul 2023 | https://doi.org/10.1016/S1473-3099(24)00374-8 |
| Lassa mammarenavirus | 1,760 | [email protected], [email protected] | link | Aug 2023 | https://doi.org/10.1101/2024.03.23.24304596 |
| Henipa virus | 959 | [email protected] | | 2019 | |
| SARS-CoV-1 | 11,918 | [email protected], [email protected] | | Nov 2023 | https://doi.org/10.1101/2024.08.13.24311934 |
| Nairo virus (CCHF) | 1,967 | [email protected], [email protected] | | 2019 | |
| Zika virus | 4,518 | [email protected] | | Feb 2024 | |
| Rift Valley fever virus | 3,341 | [email protected] | | 2019 | |
| MERS-CoV | 10,382 | [email protected] | | 2019 | |
| Comprehensive paper comparing pathogens | 47,115 | [email protected] | | | |

If you are interested in adding any other pathogen to the database please feel free to contact us.


epireview's Issues

consider showing more contextual information on summary plots

user can colour by one or more of contextual fields, but could we show some context on the summary plots in a pleasing way?

Fix study labels for Lassa

The code produces NAs for some of the study labels.

lassa <- load_epidata('lassa')
params <- lassa[['params']]

params$article_label
[1] "Webb 1986"          "Webb 1986"          NA
[4] NA                   "Van Der Waals 1986" "Troup 1970"
[7] "Troup 1970"         NA                   NA

This seems to happen because the join is by article id but we have two article ids for the same paper (from double extraction)

Not sure if this is a Lassa specific issue (due to the fact that the double extraction was done prior to the current code base in priority pathogens).
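The failure mode described in this issue can be reproduced on made-up data. The sketch below is only an illustration: the column names `id`, `covidence_id` and `article_label` appear in outputs elsewhere in these issues, but the toy values and the idea of looking labels up through `covidence_id` are assumptions, not the package's actual fix.

```r
# Toy articles table: the same paper (covidence_id 101) was extracted twice,
# so it has two article ids, but only one of them carries a label.
articles <- data.frame(
  id = c("a1", "a2", "b1"),
  covidence_id = c(101L, 101L, 202L),
  article_label = c("Webb 1986", NA, "Troup 1970"),
  stringsAsFactors = FALSE
)
params <- data.frame(
  id = c("a1", "a2", "b1"),
  parameter_value = c(1.2, 1.3, 0.8),
  stringsAsFactors = FALSE
)

# Joining on article id reproduces the NA label for the second extraction.
by_id <- merge(params, articles, by = "id", all.x = TRUE)
labels_before <- by_id$article_label

# Hypothetical workaround: build one label per covidence_id, then look the
# label up through covidence_id rather than through the article id.
labelled <- articles[!is.na(articles$article_label), ]
label_lookup <- setNames(labelled$article_label, labelled$covidence_id)
by_id$article_label <- unname(label_lookup[as.character(by_id$covidence_id)])
by_id
```

Whether `covidence_id` is always a safe key for this lookup would need checking against the real tables.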

Is Chan and Nishiura serial interval duplicated in Ebola parameters?

Looking through the delay distribution entries in the Ebola data and there are entries with $article_label "Chan 2020 (1)" and "Chan 2020 (2)". However when comparing these rows/entries they are identical in every aspect other than the $parameter_data_id and $article_label. If so, is this an accidental duplication?

Reproducible example to show what I mean:

ebola_data <- epireview::load_epidata("ebola")
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning in load_epidata_raw(pathogen, "outbreak"): No data found for ebola
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning in epireview::load_epidata("ebola"): No outbreaks information found for
#> ebola
#> Data loaded for ebola

ebola_params <- ebola_data$params
chan_si_idx <- which(
  grepl(pattern = "Chan 2020 \\(", x = ebola_params$article_label) &
    grepl(pattern = "serial", x = ebola_params$parameter_type)
)
# what is the difference between Chan 2020 (1) and Chan 2020 (2) entries
ebola_params[chan_si_idx, ]
#> # A tibble: 2 × 77
#>   id      parameter_data_id covidence_id pathogen parameter_type parameter_value
#>   <chr>   <chr>                    <int> <chr>    <chr>                    <dbl>
#> 1 86e39e… 5c8d68c39d1c3b98…        15896 Ebola v… Human delay -…            15.3
#> 2 86e39e… e824649c690f81ba…        15896 Ebola v… Human delay -…            15.3
#> # ℹ 71 more variables: exponent <dbl>, parameter_unit <chr>,
#> #   parameter_lower_bound <dbl>, parameter_upper_bound <dbl>,
#> #   parameter_value_type <chr>, parameter_uncertainty_single_value <dbl>,
#> #   parameter_uncertainty_singe_type <chr>,
#> #   parameter_uncertainty_lower_value <dbl>,
#> #   parameter_uncertainty_upper_value <dbl>, parameter_uncertainty_type <chr>,
#> #   cfr_ifr_numerator <int>, cfr_ifr_denominator <int>, …

waldo::compare(ebola_params[chan_si_idx[1], ], ebola_params[chan_si_idx[2], ])
#> old vs new
#>                           parameter_data_id article_label
#> - old[1, ] 5c8d68c39d1c3b9870ecaaff0280d02e Chan 2020 (1)
#> + new[1, ] e824649c690f81ba50fae3d81254a9f2 Chan 2020 (2)
#> 
#> `old$parameter_data_id`: "5c8d68c39d1c3b9870ecaaff0280d02e"
#> `new$parameter_data_id`: "e824649c690f81ba50fae3d81254a9f2"
#> 
#> `old$article_label`: "Chan 2020 (1)"
#> `new$article_label`: "Chan 2020 (2)"

Created on 2024-06-14 with reprex v2.1.0
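The same kind of check can be run across a whole table rather than for one pair of rows. This is a minimal base R sketch on a made-up table; the choice of `parameter_data_id` and `article_label` as the columns to ignore follows the comparison above, and everything else is illustrative.

```r
# Toy parameter table: rows 1 and 2 differ only in their id and label columns.
toy_params <- data.frame(
  parameter_data_id = c("5c8d", "e824", "9f00"),
  article_label = c("Chan 2020 (1)", "Chan 2020 (2)", "Lau 2017"),
  covidence_id = c(15896L, 15896L, 4321L),
  parameter_type = rep("Human delay - serial interval", 3),
  parameter_value = c(15.3, 15.3, 12.0),
  stringsAsFactors = FALSE
)

# Columns that are expected to differ between otherwise identical entries.
id_cols <- c("parameter_data_id", "article_label")
content <- toy_params[, setdiff(names(toy_params), id_cols)]

# Flag every row whose content also appears in another row.
is_dup <- duplicated(content) | duplicated(content, fromLast = TRUE)
toy_params[is_dup, c("article_label", "parameter_value")]
```

Rows flagged this way are only candidates: two genuinely distinct estimates could still share every recorded field.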

Issue in Marburg database

Need to adjust the data entry for R effective in the Marburg data so that it is consistent with, and works with, the wider epireview code

What is "Mean sd" in Ebola parameters?

Looking through the Ebola data set there are several parameter entries that state $distribution_par2_type as "Mean sd". When I first read this I assumed it was the standard deviation of the mean. However, I've read two of the papers where this is reported Rosello et al. (2015) and Chan and Nishiura (2020) and in both cases, my interpretation of the results is they report the mean and standard deviation of the distribution.

Another reason for the confusion is that in the Lassa parameters some are reported as "Mean" and "Standard deviation" for $distribution_par1_type and $distribution_par2_type, respectively. If the Ebola data is the mean and standard deviation of the distribution it would be good to standardise this across pathogens.

Below I've pasted some reproducible examples showing in code which entries I'm mentioning.

Ebola data

ebola_data <- epireview::load_epidata("ebola")
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning in load_epidata_raw(pathogen, "outbreak"): No data found for ebola
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning in epireview::load_epidata("ebola"): No outbreaks information found for
#> ebola
#> Data loaded for ebola

ebola_params <- ebola_data$params
which(ebola_params$distribution_par1_type == "Mean sd")
#> integer(0)

which(ebola_params$distribution_par2_type == "Mean sd")
#>  [1] 364 365 366 367 368 371 374 375 376 377 378 381 382 383 384 385 388 389 390
#> [20] 391 461 468 952 953 956

ebola_params[which(ebola_params$distribution_par2_type == "Mean sd"), ]
#> # A tibble: 25 × 77
#>    id     parameter_data_id covidence_id pathogen parameter_type parameter_value
#>    <chr>  <chr>                    <int> <chr>    <chr>                    <dbl>
#>  1 a9b0b… e0d452d5aea72b33…         2594 Ebola v… Human delay -…           12.9 
#>  2 a9b0b… 9baf758dc2bc1fe3…         2594 Ebola v… Human delay -…            5.02
#>  3 a9b0b… c70a61d876605ae6…         2594 Ebola v… Human delay -…            9.47
#>  4 a9b0b… f0d612742167f390…         2594 Ebola v… Human delay -…            5.72
#>  5 a9b0b… 79b99cad7ce7f90f…         2594 Ebola v… Human delay -…            4.5 
#>  6 a9b0b… 4704e8616bdf9d2c…         2594 Ebola v… Human delay -…            0   
#>  7 a9b0b… d79a599fd2882bfa…         2594 Ebola v… Human delay -…           10.0 
#>  8 a9b0b… 4b9e9266adfda812…         2594 Ebola v… Human delay -…            0   
#>  9 a9b0b… 365cd27a1c10c648…         2594 Ebola v… Human delay -…            7.62
#> 10 a9b0b… 7cffbd19447c4390…         2594 Ebola v… Human delay -…            1.5 
#> # ℹ 15 more rows
#> # ℹ 71 more variables: exponent <dbl>, parameter_unit <chr>,
#> #   parameter_lower_bound <dbl>, parameter_upper_bound <dbl>,
#> #   parameter_value_type <chr>, parameter_uncertainty_single_value <dbl>,
#> #   parameter_uncertainty_singe_type <chr>,
#> #   parameter_uncertainty_lower_value <dbl>,
#> #   parameter_uncertainty_upper_value <dbl>, …

ebola_params[which(ebola_params$distribution_par2_type == "Mean sd"), ]$article_label
#>  [1] "Rosello 2015 (1)" "Rosello 2015 (1)" "Rosello 2015 (1)" "Rosello 2015 (1)"
#>  [5] "Rosello 2015 (1)" "Rosello 2015 (2)" "Rosello 2015 (3)" "Rosello 2015 (2)"
#>  [9] "Rosello 2015 (2)" "Rosello 2015 (2)" "Rosello 2015 (2)" "Rosello 2015 (4)"
#> [13] "Rosello 2015 (3)" "Rosello 2015 (3)" "Rosello 2015 (3)" "Rosello 2015 (3)"
#> [17] "Rosello 2015 (5)" "Rosello 2015 (4)" "Rosello 2015 (4)" "Rosello 2015 (4)"
#> [21] "Lau 2017 (a)"     "Lau 2017 (b)"     "Chan 2020"        "Chan 2020 (1)"   
#> [25] "Chan 2020 (2)"

Created on 2024-06-14 with reprex v2.1.0

Lassa data

lassa_data <- epireview::load_epidata("lassa")
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Data loaded for lassa

lassa_params <- lassa_data$params
which(!is.na(lassa_params$distribution_par1_type))
#> [1] 261 262

which(!is.na(lassa_params$distribution_par2_type))
#> [1] 261 262

lassa_params[which(!is.na(lassa_params$distribution_par1_type)), ]$distribution_par1_type
#> [1] "Mean" "Mean"

lassa_params[which(!is.na(lassa_params$distribution_par1_type)), ]$distribution_par2_type
#> [1] "Standard deviation" "Standard deviation"

Created on 2024-06-14 with reprex v2.1.0

Improve colour palette

No one who has looked at it likes the country palette. This needs to be fixed.
Sabine shared this useful resource: https://www.simplifiedsciencepublishing.com/resources/best-color-palettes-for-scientific-figures-and-data-visualizations

Add create_new_model_row function

Similar to new create_new_article_row function

template for what information should be supplied
validate against expected values / ranges

Fix vignettes

Vignettes are broken, they need to be fixed. They are currently sourcing R scripts, this is not necessary.

how to make mapping functions more generic

Lassa has started with a good set of mapping functions. However, it's still non-trivial to reuse for other pathogens

Are mapping functions always pathogen specific?
Can we split this up further / better into pre-processing and using the pre-processed information with ggplot2 + sf

reorder_studies function is dropping rows

When we feed parameters into a forest_plot, it automatically runs them through "reorder_studies" to plot them from highest to lowest. However, it also does this by country. If a parameter has NA for country, that row is currently dropped. We don't want this to happen.

Rather, we want it to include all rows by default, let users say whether they want to order by country, and still make sure that NAs aren't removed.

ebola_df <- load_epidata('ebola')
ebola_params <- ebola_df[["params"]]

forest_plot_infectious_period(ebola_params)

ebola_params <- ebola_params[ebola_params$parameter_type %in% c("Human delay - infectious period",
                                                   "Human delay - infectious period  (inverse parameter)"), ]
#So, seven params, but only six got plotted above, we're missing a 21.6 estimate

ebola_params <- reparam_gamma(ebola_params) |>
  invert_inverse_params() |>
  delays_to_days() |>
  param_pm_uncertainty()

#Now, enforce the bug, set Yang (2015) (the first one)
ebola_params$population_country[1] <- NA
#We have 25 rows...
forest_plot_infectious_period(ebola_params)
#Yang has gone missing!

# This is because of reorder_studies:
ebola_params <- reorder_studies(ebola_params)
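The behaviour requested in this issue could look roughly like the base R sketch below. `reorder_keep_na` is a hypothetical stand-in, not the package's `reorder_studies`, and the toy data are made up; the point is only that `order(..., na.last = TRUE)` sorts rows without discarding those with a missing country.

```r
# Hypothetical stand-in for reorder_studies(): sort by country, then by
# descending value, while keeping rows whose country is NA.
reorder_keep_na <- function(df, by_country = TRUE) {
  if (by_country) {
    ord <- order(df$population_country, -df$parameter_value, na.last = TRUE)
  } else {
    ord <- order(-df$parameter_value, na.last = TRUE)
  }
  df[ord, , drop = FALSE]
}

toy <- data.frame(
  article_label = c("Study A", "Study B", "Study C", "Study D"),
  population_country = c("Guinea", NA, "Liberia", "Guinea"),
  parameter_value = c(5.0, 21.6, 7.5, 9.0),
  stringsAsFactors = FALSE
)
reordered <- reorder_keep_na(toy)

# No rows are lost; the NA-country row is placed last.
nrow(reordered) == nrow(toy)
```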

Vignette on how to add data for article, outbreak, parameter and model data

Description on how to update article, outbreak, parameter & model data

git clone repo locally
examples of how to run functions
checks

This is only for adding papers to the pathogen databases. For adding entirely new pathogens this is not scalable...

epireview direct load creating extra rows?

parameters <- read_csv("../epireview/inst/extdata/lassa_parameters.csv")
tmp <- epireview::load_epidata("lassa")[["params"]]

tmp has 374 rows whereas parameters has 371. tmp also has more columns but that's expected.

Add all required data to articles data frame from load_epidata

If we load data for a pathogen and look at the articles data frame we only get a small subset of the article information

marburg_data <- epireview::load_epidata('marburg')

A call to marburg_articles <- epireview::load_epidata_raw(pathogen = "marburg", table = "article") is required to get the full articles information. This is an issue for the epiparameter::as_epidist() functionality and means that a load_epidata_raw call is required, plus matching of articles.

We should expose at least the columns required by epiparameter in the articles data frame returned by load_epidata. At a minimum "article_title", "doi" and "journal" are required.

fyi - @joshwlambert

Improve Warning messages for load_epidata(pathogen)

From running the code the following warning for loading Lassa data is generated:

Data loaded for lassa
Warning messages:
1: One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)

Add a create_new_outbreak_row function

Similar to new create_new_article_row function

template for what information should be supplied
validate against expected values / ranges
check against list of country names

Ebola SI incorrectly extracted

I think Jombart entry for SI for Ebola has been extracted with SD in the uncertainty field instead of as a separate entry.

mark_multiple_estimates should be made available to plotting functions

Currently multiple estimates from the same study are being distinguished via the mark_multiple_estimates option in load_epidata.
If data are loaded through this function and passed to forest_plot, these estimates will then appear on separate lines in the plot.
However, there is no option of distinguishing them in the forest_plot functions by themselves, or separately outside of load_epidata.
So if someone uses the forest plotting function directly, they have to reinvent the wheel by creating unique labels.

This needs to be changed.
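A standalone labelling step of the kind asked for here could be sketched in base R as follows. `mark_duplicates` is a hypothetical name, not an epireview function, and the sketch assumes the labels contain no missing values.

```r
# Hypothetical helper: append "(1)", "(2)", ... to labels that occur more than
# once, leaving labels that occur only once untouched.
mark_duplicates <- function(labels) {
  n <- ave(seq_along(labels), labels, FUN = length)      # group size per label
  idx <- ave(seq_along(labels), labels, FUN = seq_along) # position within group
  ifelse(n > 1, paste0(labels, " (", idx, ")"), labels)
}

marked <- mark_duplicates(c("Chan 2020", "Rosello 2015", "Chan 2020"))
marked
```

A plotting function could call such a helper on its label column before drawing, so that callers who bypass load_epidata still get distinct rows.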

Print hex-stickers for epireview

Provide DOI as column in parameter dataframe

From Berlin workshop:
When merging the article data df with the parameter data df we should include the doi column. This is really helpful for the user to be able to quickly find the paper the estimate is from in case they need more information and should be a very quick fix.

Instructions to add new private fork and how to merge back for new pathogens

Move load_epidata from prep_data_forest_plots.R to load_epi_data.R

load_epidata is currently in R/prep_data_forest_plots.R . Would it make more sense to move it to R/load_epidata.R?

accommodating changes in extraction of uncertainty/variability

We will need to change the way the pairs of means/SDs are stored in Marburg, Lassa, Ebola and SARS to match the way this is done in Zika, i.e. post-process the current databases for Marburg etc. so that users see something which looks like Zika.
This will make interfacing with external tools such as epiparameter much easier.

Make parameter type easier to filter

The parameter name column parameter_type is currently not very easy to filter because some of the names are fairly long which requires users to know the exact name, and introduces the potential for errors in specifying the names.

This issue is to request that the parameter types in the column parameter_type be made easier to filter by:

Shortening the name,
Replacing spaces with underscores,
Using lowercase names.

E.g. replace Human delay - Symptom Onset/Fever to Death with onset_to_death.
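The three mechanical steps listed above can be sketched with base R string functions. `simplify_param_type` is a hypothetical name, and stripping the "Human delay - " prefix is an extra assumption on top of the three steps.

```r
# Hypothetical normaliser: lower-case, drop the "Human delay - " prefix, and
# replace runs of non-alphanumeric characters with a single underscore.
simplify_param_type <- function(x) {
  x <- tolower(x)
  x <- sub("^human delay - ", "", x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x) # trim leading/trailing underscores
}

short_names <- simplify_param_type(c(
  "Human delay - serial interval",
  "Human delay - Symptom Onset/Fever to Death"
))
short_names
```

Purely mechanical rules give "symptom_onset_fever_to_death" rather than the shorter "onset_to_death" suggested in the issue, so a hand-written lookup table would still be needed for names of that kind.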

Data dictionary

Package should provide a clear data dictionary where all columns are explained clearly. I think this would be best provided through a vignette.

Get one "summary stat" from parameters that can be used downstream

The group has highlighted that users may want one "summary" number for a parameter type which they can then use in their models. For instance, Ebola serial interval.
The most obvious candidate for this summary is obviously the result of the meta-analysis, where we have done this. However, that raises some questions:

should we provide a function to do meta-analysis? I am slightly hesitant to do this because (a) we use the meta package, so that we will be providing a very thin wrapper on another package, and (b) we made lots of very careful choices while doing meta-analysis, and doing this right needs a lot of care.
how do we summarise parameters for which we did not do meta-analysis?

IMO, we should make available the results of meta-analysis where available and have a think about my second question.

user-friendly functions to get specific parameters

we want user-friendly functions to get to any parameter of interest.
e.g. serial_intervals('ebola') etc.
We could either have

load_epidata('ebola')$params |> serial_intervals()

Or,

serial_intervals('ebola')

The second option will use load_epidata.
Both of these are quite easy to implement technically, needs more thought about how to make it user-friendly.
Once we have these in place, forest_plot functions should be modified to use these instead of doing their own filtering.
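As a rough illustration of the first option, a helper that takes the params table could be as small as the sketch below. It runs on a made-up data frame, and matching on the phrase "serial interval" is an assumption about how the parameter is labelled.

```r
# Hypothetical helper: keep rows whose parameter_type mentions a serial interval.
serial_intervals <- function(params) {
  keep <- grepl("serial interval", params$parameter_type, ignore.case = TRUE)
  params[keep, , drop = FALSE]
}

toy_params <- data.frame(
  parameter_type = c("Human delay - serial interval",
                     "Human delay - incubation period",
                     NA),
  parameter_value = c(15.3, 9.1, 2.0),
  stringsAsFactors = FALSE
)
si <- serial_intervals(toy_params)
si
```

The second option would wrap the same filter around a call to load_epidata, so both forms could share one implementation.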

Summary function to help users prioritise a distribution to use

It could be helpful to have a simple function that helps users select a parameter from a returned set. From discussions, it seems like a function that wrapped the following column selection would be particularly helpful to inform this decision, or at least guide users:

# Get Lassa parameters
lassa_data <- epireview::load_epidata("lassa")
lassa_params <- lassa_data$params

# Extract delay
param_admission_outcome <- lassa_params |> dplyr::filter(parameter_type=="Human delay - admission to care>discharge/recovery")

# Summary columns
summary_columns <- c("article_label", "population_sample_size", "population_location", "population_group",
                     "population_sex", "parameter_value", "distribution_type")

summary_info <- param_admission_outcome[,summary_columns]
summary_info <- summary_info |> dplyr::arrange(desc(population_sample_size))
print(summary_info, n=Inf,width=Inf)

Persist notes from extraction process into epireview for each paper

convert to and from epidist object

Identify the minimal set of columns we want ported over when we convert to epidist.
have a think about whether the columns mean the same to us as to epidist and the epidist ingesting functions
translate;
as an example we could filter epidist object and try plotting it.

Case fatality ratio not rate

We currently label the CFR as "case fatality rate" but this should be updated everywhere to "case fatality ratio".

Colours for different columns when using forest_plot

At the moment the column "population_country" has a colour palette assigned so it's possible to plot parameters and colour by country
However other columns don't have assigned palettes so it's not currently possible to plot by other variables
It'd be great to be able to do this, and colour by other variables such as year or population size
Thanks!

Allow easily updating data

We need to implement functionality so that users can easily add their own data. The functionality rolled out with Marburg is currently broken, as a lot of development has happened since then. Before we try to fix it, we need to think about how we can make it user-friendly. Our tables have so many columns that it is not feasible for someone to specify all of them. We should identify a minimal set of mandatory columns that must be filled in by the user.

Add R-CMD-check

user-friendly data filters

We have implemented filtering (see filter_cols), but it is not very user-friendly because we expect the user to know the exact column name to filter on as well as the precise value.
Our column names are not easy to remember or guess, and where a precise value is required, e.g. when filtering on parameter, the user must know how the value has been specified by us (e.g. Basic Reproduction Number, when the user wants R0). How can we make this feature easy to use?
My first thought is to provide user-friendly names for parameter values e.g. have a function that maps R0, r0, rnaught etc to "Basic Reproduction Number", so that user can say filter_param_type("r0") and they get what they want. But this is just an initial thought, I haven't really thought this through.
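The alias idea floated here could be prototyped with a named character vector. The sketch below is hypothetical throughout: `param_aliases` and `resolve_param_type` are illustrative names, and the stored value "Basic Reproduction Number" is taken from the wording of this issue rather than checked against the data.

```r
# Hypothetical alias table mapping user shorthand to the stored value.
param_aliases <- c(
  r0 = "Basic Reproduction Number",
  rnaught = "Basic Reproduction Number"
)

# Resolve a user-supplied name; unknown names are passed through unchanged.
resolve_param_type <- function(x) {
  key <- tolower(x)
  if (key %in% names(param_aliases)) param_aliases[[key]] else x
}

resolved <- resolve_param_type("R0")
resolved
```

Passing unknown names through unchanged keeps exact-match filtering available for users who already know the stored value.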

General review of package and code by Paul and Mantra

As discussed in initial meeting with us, @sangeetabhatia03 and Rich

Data compatibility - Marburg

Due to the updates to the database, we need to ensure that the Marburg df (e.g. columns) are updated to reflect the data structure that we have just now.
We had a discussion about doing this at the end of the project, but then an excellent point was raised that we should do this sooner rather than later before this paper entirely drops out of our mind and while the people involved are all still at Imperial.

A `parameter_value_type` in Marburg is a `numeric`

The parameter_value_type for row 15 in the Marburg parameters table is a number. Is this a mistake or can the parameter_value_type column also contain numeric values?
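A quick way to look for other entries of this kind is to test which values of a character column parse as numbers. The vector below is made up for illustration and is not taken from the Marburg table.

```r
# Made-up example of a value-type column with one stray numeric entry.
value_type <- c("Mean", "Median", "15", NA, "Other")

# An entry "looks numeric" if as.numeric() can parse it; NAs stay FALSE.
looks_numeric <- !is.na(suppressWarnings(as.numeric(value_type)))
which(looks_numeric)
```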

create an article about data limitations

issues around "other"/na/missing, QA scores etc.

Add a quick start section

This issue is to request a "Quick start" section in the Readme to show users how to get started with accessing and extracting parameters from the database for use in downstream analyses. A small example of conversion to the {epiparameter} class <epidist> is shown here.

library(epireview)
#> Loading required package: epitrix
#> Loading required package: ggplot2
#> Loading required package: ggforce
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

data = load_epidata("ebola")
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning in load_epidata_raw(pathogen, "outbreak"): No data found for ebola
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning in load_epidata("ebola"): No outbreaks information found for ebola
#> Data loaded for ebola

names(data)
#> [1] "articles"  "params"    "models"    "outbreaks"
param_data = data$params

# get all onset to death distribution data
# select high quality with Gamma dist
param_data = filter(
  param_data,
  parameter_type == "Human delay - Symptom Onset/Fever to Death",
  distribution_type == "Gamma",
  article_qa_score > 50
) %>%
  head(1)

# transform to epidist
# NOTE which parameters are used for uncertainty
# this differs between studies/entries
onset_to_death_mean = pull(param_data, parameter_value)
onset_to_death_sd = pull(param_data, distribution_par2_value)

# assume a Gamma distribution?
dist_params = epiparameter::convert_summary_stats_to_params(
  x = "gamma",
  mean = onset_to_death_mean,
  sd = onset_to_death_sd
)

# select one with shape and scale
epidist_ebola_otd = epiparameter::epidist(
  "Ebola virus disease",
  "Ebolavirus",
  epi_dist = "onset_to_death",
  prob_distribution = "gamma",
  prob_distribution_params = unlist(dist_params)
)
#> Citation cannot be created as author, year, journal or title is missing

epidist_ebola_otd
#> Disease: Ebola virus disease
#> Pathogen: Ebolavirus
#> Epi Distribution: onset to death
#> Study: (????). "No citation."
#> Distribution: gamma
#> Parameters:
#>   shape: 4.549
#>   scale: 2.082

Created on 2024-03-27 with reprex v2.0.2
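For readers who want to see the arithmetic behind turning a mean and standard deviation into gamma parameters, the standard moment-matching formulas are shape = (mean / sd)^2 and scale = sd^2 / mean. The base R sketch below uses made-up inputs and a hypothetical helper name; it is not a description of what epiparameter does internally.

```r
# Moment matching for a gamma distribution:
#   mean = shape * scale,  sd = sqrt(shape) * scale
gamma_from_mean_sd <- function(mean, sd) {
  list(shape = (mean / sd)^2, scale = sd^2 / mean)
}

# Illustrative inputs only, not values from the database.
p <- gamma_from_mean_sd(mean = 10, sd = 5)

# Round trip: recover the mean and sd from the fitted parameters.
round_trip <- c(mean = p$shape * p$scale, sd = sqrt(p$shape) * p$scale)
round_trip
```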

Outcome type column for Marburg data

Currently the parameters for Marburg include the type of parameter as "Human delay - time symptom to outcome", but as far as I can tell there's no other columns that indicate what type of outcome. It'd be very useful to have this information, for instance the onset-death delay to estimate CFR.

On a related note, to make the database easier to filter and subset, I think it'd also be helpful to split the "parameter type" column into 2, one with "human delay" and the other one with the specific type of delay- at the moment I had to use grepl() to be able to subset by parameter type.

Thanks! @ruthmccabe

Add create_new_parameter_row function

Similar to new create_new_article_row function

template for what information should be supplied
validate against expected values / ranges

functions should print more informative messages

e.g. load_epidata should print information about number of studies, number of unique parameters etc.

rename parameter types to more intuitive names

e.g. change "Human delay - serial interval" to "serial interval" (I think the default can be human)

Dual license

We need to make sure we have a dual license that covers both the code and the data. To be discussed

how to deal with different schemas across pathogens

Currently the schema (in particular for parameters) differs across pathogens as the access database has evolved over time + been adapted to pathogens.

How do we take account of this in

parameter load function (linked to Issue 23)
analysis
compilation of parameter tables across multiple pathogens

Improve text of warning messages

Kelly and Ettie (and later Gina) have pointed out that the following warning text is confusing for the user:
Warning messages:
1: In delays_to_days(invert_inverse_params(reparam_gamma(df))) :
Not all delays are in days. Other units are:weeks
2: In delays_to_days(invert_inverse_params(reparam_gamma(df))) :
We will attempt to convert hours and weeks to days.

This needs to be fixed. Maybe we should tell the users what we did or didn't convert.
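One way to tell users what was and was not converted is to count both cases and report them in a single message. The sketch below is a hypothetical rewrite on toy data, not the package's `delays_to_days`; the `parameter_unit` and `parameter_value` column names appear in outputs elsewhere in these issues, while the set of supported units is an assumption.

```r
# Hypothetical variant that converts known units to days and reports counts.
delays_to_days_verbose <- function(df) {
  per_day <- c(days = 1, weeks = 7, hours = 1 / 24) # multipliers to days
  unit <- tolower(df$parameter_unit)
  known <- unit %in% names(per_day)
  converted <- sum(known & unit != "days")
  df$parameter_value[known] <-
    df$parameter_value[known] * unname(per_day[unit[known]])
  df$parameter_unit[known] <- "days"
  message(converted, " value(s) converted to days; ",
          sum(!known), " value(s) left in their original unit.")
  df
}

toy <- data.frame(
  parameter_value = c(2, 14, 48, 3),
  parameter_unit = c("Weeks", "Days", "Hours", "Months"),
  stringsAsFactors = FALSE
)
out <- delays_to_days_verbose(toy)
out
```

Leaving unrecognised units untouched and saying so explicitly avoids the ambiguity of "we will attempt to convert".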

mark / comment functions which are legacy and should move to priority_pathogens (eg meta-analysis)

Investigate different uncertainty intervals in forest_plots.

Some parameters have different types of uncertainty around their (different types of) mean. Currently these different types of uncertainty are broadly captured by different interval line types. It is not clear what they mean, and I am also doubtful that they are being interpreted correctly.

Use a few parameter entries from {epireview} in {epiparameter} for testing?

I am enhancing the as_epidist() functionality in {epiparameter}, which takes a row (or multiple rows for multi-row entries) from one of the {epireview} parameter tables and converts it into an <epidist> object. We are not taking {epireview} on as a package dependency in {epiparameter} as this would be quite heavy given the number of dependencies in {epireview}. Would it be okay to save a few entries (rows) of the {epireview} parameters for the purpose of testing as_epidist()? If possible I will update these periodically to ensure they are up-to-date with any changes made in {epireview}.

Complete pathogen_marburg vignette

Need to add:
