seabbs / gettbinr

An R package for accessing and summarising the World Health Organisation Tuberculosis data.

Home Page: https://www.samabbott.co.uk/getTBinR

License: GNU General Public License v3.0

R 3.37% Makefile 0.04% HTML 96.54% Dockerfile 0.02% TeX 0.03%

r package tuberculosis who data world-health-organization shiny eda binder-ready tb-incidence-rates

gettbinr's Introduction

getTBinR: Access and Summarise World Health Organization Tuberculosis Data

Quickly and easily import analysis ready Tuberculosis (TB) burden data, from the World Health Organization (WHO), into R. The aim of getTBinR is to allow researchers, and other interested individuals, to quickly and easily gain access to a detailed TB data set and to start using it to derive key insights. It provides a consistent set of tools that can be used to rapidly evaluate hypotheses on a widely used data set before they are explored further using more complex methods or more detailed data. These tools include: generic plotting and mapping functions; a data dictionary search tool; an interactive shiny dashboard; and an automated, country level, TB report. For newer R users, this package reduces the barrier to entry by handling data import, munging, and visualisation. All plotting and mapping functions are built with ggplot2 so can be readily extended. See here for the WHO data permissions. For help getting started see the Getting Started vignette and for a case study using the package see the Exploring Global Trends in Tuberculosis Incidence Rates vignette.

Installation

Install the CRAN version:

install.packages("getTBinR")

Alternatively install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("seabbs/getTBinR")

Quick start

Let's get started quickly by mapping and then plotting TB incidence rates in the United Kingdom. First map the most recently available global TB incidence rates (if the TB burden data and its data dictionary are not found locally, this will also download both and save them to R's temporary directory),

getTBinR::map_tb_burden(metric = "e_inc_100k")

Then compare TB incidence rates in the UK to TB incidence rates in other countries in the region,

getTBinR::plot_tb_burden_overview(metric = "e_inc_100k",
                                  countries = "United Kingdom",
                                  compare_to_region = TRUE)

In order to compare the changes in incidence rates over time, in the region, plot the annual percentage change,

getTBinR::plot_tb_burden_overview(metric = "e_inc_100k",
                                  countries = "United Kingdom",
                                  compare_to_region = TRUE,
                                  annual_change = TRUE)
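
As a rough illustration of what an annual percentage change represents, the sketch below computes it by hand in base R for a short, made-up series. The values and the helper name annual_pct_change are hypothetical, and the sketch assumes that change is measured relative to the previous year's value; the exact definition used internally by getTBinR may differ.

```r
# Hypothetical helper: percentage change relative to the previous value.
annual_pct_change <- function(x) {
  # diff(x) gives x[t] - x[t - 1]; divide by the previous value x[t - 1].
  # The first year has no previous value, so its change is NA.
  c(NA_real_, diff(x) / head(x, -1) * 100)
}

# Made-up incidence rates per 100,000 population (not WHO data).
rates <- data.frame(year = 2015:2018, rate = c(10, 9, 9, 8.1))
rates$pct_change <- annual_pct_change(rates$rate)
rates
```

A falling rate gives negative values, so a country with a steadily declining incidence rate shows a consistently negative annual change.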

Now plot TB incidence rates over time in the United Kingdom, compared to TB incidence rates in Europe and globally.

getTBinR::plot_tb_burden_summary(metric = "e_inc_num",
                                 metric_label = "e_inc_100k",
                                 countries = "United Kingdom",
                                 compare_all_regions = FALSE,
                                 compare_to_region = TRUE,
                                 compare_to_world = TRUE)

We can repeat the above plot but this time only for the UK - this allows us to get a clear picture of trends in TB incidence rates in the UK.

getTBinR::plot_tb_burden(metric = "e_inc_100k",
                         countries = "United Kingdom")

We might be interested in having some of this information in tabular form. We can either generate a short summary for the most recent year of available data with the following,

getTBinR::summarise_metric(metric = "e_inc_100k",
                           countries = "United Kingdom")
#> # A tibble: 1 x 6
#>   country         year metric        world_rank region_rank avg_change
#>   <chr>          <int> <chr>              <int>       <int> <chr>     
#> 1 United Kingdom  2018 8 (7.2 - 8.8)        165          33 -5.9%

Or a more detailed dataset as follows,

getTBinR::summarise_tb_burden(metric = "e_inc_num",
                              stat = "rate",
                              countries = "United Kingdom", 
                              compare_to_world = FALSE, 
                              compare_to_region = FALSE) 
#> # A tibble: 133 x 5
#>    area            year e_inc_num e_inc_num_lo e_inc_num_hi
#>    <fct>          <int>     <dbl>        <dbl>        <dbl>
#>  1 United Kingdom  2000      11.9         10.7         13.1
#>  2 United Kingdom  2001      11.5         10.3         12.7
#>  3 United Kingdom  2002      13.1         11.8         14.3
#>  4 United Kingdom  2003      13.4         12.1         14.8
#>  5 United Kingdom  2004      13.2         11.9         14.5
#>  6 United Kingdom  2005      15.3         13.8         16.6
#>  7 United Kingdom  2006      15.3         13.8         16.4
#>  8 United Kingdom  2007      14.6         13.2         16.1
#>  9 United Kingdom  2008      15.0         13.5         16.1
#> 10 United Kingdom  2009      14.5         13.1         15.9
#> # … with 123 more rows

Here e_inc_num is used rather than e_inc_100k as incidence rates are being estimated based on notified cases. This allows country level rates to be compared to regional (using compare_to_region = TRUE) and global (using compare_to_world = TRUE) rates.
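
To make the arithmetic concrete, the following base R sketch shows why counts are needed when pooling: a pooled rate is total cases divided by total population, which is generally not the same as averaging country-level rates. The numbers are invented and the column names simply mirror e_inc_num and e_pop_num from the package output; this illustrates the calculation only and is not the package's internal code.

```r
# Made-up counts for two hypothetical countries (not WHO data).
toy <- data.frame(
  country   = c("A", "B"),
  e_inc_num = c(100, 900),  # estimated number of incident cases
  e_pop_num = c(1e6, 3e6)   # population
)

# Country-level rates per 100,000 population.
toy$rate_100k <- toy$e_inc_num / toy$e_pop_num * 1e5

# Pooled rate: sum the counts first, then divide by the summed population.
pooled_rate <- sum(toy$e_inc_num) / sum(toy$e_pop_num) * 1e5

# Naive alternative: the unweighted mean of the country-level rates.
naive_mean <- mean(toy$rate_100k)

c(pooled = pooled_rate, naive = naive_mean)
```

The two results differ because the larger country contributes more people to the pooled rate than it does to the unweighted mean.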

See Functions for more details of the functions used and for more package functionality. Note the fuzzy country matching: all functions first try to match your country request exactly and, if that fails, search for partial matches. The plots above can be made interactive by specifying interactive = TRUE.
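
The country matching behaviour described above can be sketched in base R as follows. The helper match_country and the short country vector are hypothetical stand-ins; getTBinR's own matching code may be implemented differently.

```r
# Hypothetical helper: try an exact match first, then fall back to
# case-insensitive partial matching.
match_country <- function(request, available) {
  exact <- available[available == request]
  if (length(exact) > 0) {
    return(exact)
  }
  # fixed = TRUE treats the request as literal text rather than a regex.
  partial <- available[grepl(tolower(request), tolower(available), fixed = TRUE)]
  if (length(partial) == 0) {
    stop("No country found matching '", request, "'.")
  }
  partial
}

# Made-up list of available country names.
available <- c("United Kingdom", "United States of America", "France")
match_country("France", available)  # exact match
match_country("united", available)  # partial match with two candidates
```

A partial request can match more than one country, so it is worth checking which countries were actually returned before interpreting a plot or summary.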

Additional datasets

On top of the core datasets provided by default, getTBinR also supports importing multiple other datasets. These include data on latent TB, HIV surveillance, intervention budgets, and outcomes. The currently supported datasets are listed below,

knitr::kable(getTBinR::available_datasets[, 1:4])

dataset	description	timespan	default
Estimates	Generated estimates of TB mortality, incidence, case fatality ratio, and treatment coverage (previously called case detection rate). Data available split by HIV status.	2000-2018	yes
Estimates	Generated estimates for the proportion of TB cases that have rifampicin-resistant TB (RR-TB, which includes cases with multidrug-resistant TB, MDR-TB), RR/MDR-TB among notified pulmonary TB cases.	2018	yes
Incidence by age and sex	Generated estimates of TB incidence stratified by age and sex. This dataset is currently experimental.	2018	no
Latent TB infection	Generated estimates incidence of latent TB stratified by age.	2018	no
Notification	TB notification dataset linking to TB notifications as raw numbers. Age-stratified, with good data dictionary coverage but has large amounts of missing data.	1980-2018	no
Drug resistance surveillance	Country level drug resistance surveillance. Lists drug resistance data from country level reporting. Good data dictionary coverage but has large amounts of missing data.	2018	no
Non-routine HIV surveillance	Country level, non-routine HIV surveillance data. Good data dictionary coverage but with a large amount of missing data.	2007-2018	no
Outcomes	Country level TB outcomes data. Lists numeric outcome data, very messy but with good data dictionary coverage.	1994-2018	no
Budget	Current year TB intervention budgets per country. Many of the data fields are cryptic but has good data dictionary coverage.	2018	no
Expenditure and utilisation	Previous year expenditure on TB interventions. Highly detailed, with good data dictionary coverage but lots of missing data.	2018	no
Policies and services	Lists TB policies that have been implemented per country. Highly detailed, with good data dictionary coverage but lots of missing data.	2018	no
Community engagement	Lists community engagement programmes. Highly detailed, with good data dictionary coverage but lots of missing data.	2013-2018	no
Laboratories	Country specific laboratory data. Highly detailed, with good data dictionary coverage but lots of missing data.	2009-2018	no

These datasets can be imported into R by supplying the name of the required dataset to the additional_datasets argument of get_tb_burden (or any of the various plotting/summary functions). Alternatively, they can all be imported in one go using additional_datasets = "all", as below,

getTBinR::get_tb_burden(additional_datasets = "all")
#> # A tibble: 8,694 x 485
#>    country iso2  iso3  iso_numeric g_whoregion  year e_pop_num e_inc_100k
#>    <chr>   <chr> <chr>       <int> <chr>       <int>     <int>      <dbl>
#>  1 Afghan… AF    AFG             4 Eastern Me…  2000  20779953        190
#>  2 Afghan… AF    AFG             4 Eastern Me…  2001  21606988        189
#>  3 Afghan… AF    AFG             4 Eastern Me…  2002  22600770        189
#>  4 Afghan… AF    AFG             4 Eastern Me…  2003  23680871        189
#>  5 Afghan… AF    AFG             4 Eastern Me…  2004  24726684        189
#>  6 Afghan… AF    AFG             4 Eastern Me…  2005  25654277        189
#>  7 Afghan… AF    AFG             4 Eastern Me…  2006  26433049        189
#>  8 Afghan… AF    AFG             4 Eastern Me…  2007  27100536        189
#>  9 Afghan… AF    AFG             4 Eastern Me…  2008  27722276        189
#> 10 Afghan… AF    AFG             4 Eastern Me…  2009  28394813        189
#> # … with 8,684 more rows, and 477 more variables: e_inc_100k_lo <dbl>,
#> #   e_inc_100k_hi <dbl>, e_inc_num <int>, e_inc_num_lo <int>,
#> #   e_inc_num_hi <int>, e_tbhiv_prct <dbl>, e_tbhiv_prct_lo <dbl>,
#> #   e_tbhiv_prct_hi <dbl>, e_inc_tbhiv_100k <dbl>, e_inc_tbhiv_100k_lo <dbl>,
#> #   e_inc_tbhiv_100k_hi <dbl>, e_inc_tbhiv_num <int>, e_inc_tbhiv_num_lo <int>,
#> #   e_inc_tbhiv_num_hi <int>, e_mort_exc_tbhiv_100k <dbl>,
#> #   e_mort_exc_tbhiv_100k_lo <dbl>, e_mort_exc_tbhiv_100k_hi <dbl>,
#> #   e_mort_exc_tbhiv_num <int>, e_mort_exc_tbhiv_num_lo <int>,
#> #   e_mort_exc_tbhiv_num_hi <int>, e_mort_tbhiv_100k <dbl>,
#> #   e_mort_tbhiv_100k_lo <dbl>, e_mort_tbhiv_100k_hi <dbl>,
#> #   e_mort_tbhiv_num <int>, e_mort_tbhiv_num_lo <int>,
#> #   e_mort_tbhiv_num_hi <int>, e_mort_100k <dbl>, e_mort_100k_lo <dbl>,
#> #   e_mort_100k_hi <dbl>, e_mort_num <int>, e_mort_num_lo <int>,
#> #   e_mort_num_hi <int>, cfr <dbl>, cfr_lo <dbl>, cfr_hi <dbl>, cfr_pct <int>,
#> #   cfr_pct_lo <int>, cfr_pct_hi <int>, c_newinc_100k <dbl>, c_cdr <dbl>,
#> #   c_cdr_lo <dbl>, c_cdr_hi <dbl>, source_rr_new <chr>,
#> #   source_drs_coverage_new <chr>, source_drs_year_new <int>,
#> #   e_rr_pct_new <dbl>, e_rr_pct_new_lo <dbl>, e_rr_pct_new_hi <dbl>,
#> #   e_mdr_pct_rr_new <int>, source_rr_ret <chr>, source_drs_coverage_ret <chr>,
#> #   source_drs_year_ret <int>, e_rr_pct_ret <dbl>, e_rr_pct_ret_lo <dbl>,
#> #   e_rr_pct_ret_hi <dbl>, e_mdr_pct_rr_ret <int>, e_inc_rr_num <int>,
#> #   e_inc_rr_num_lo <int>, e_inc_rr_num_hi <int>, e_mdr_pct_rr <dbl>,
#> #   e_rr_in_notified_labconf_pulm <int>,
#> #   e_rr_in_notified_labconf_pulm_lo <int>,
#> #   e_rr_in_notified_labconf_pulm_hi <int>, source_hh <chr>, e_hh_size <dbl>,
#> #   prevtx_data_available <int>, newinc_con04_prevtx <int>,
#> #   ptsurvey_newinc <int>, ptsurvey_newinc_con04_prevtx <int>,
#> #   e_prevtx_eligible <dbl>, e_prevtx_eligible_lo <dbl>,
#> #   e_prevtx_eligible_hi <dbl>, e_prevtx_kids_pct <dbl>,
#> #   e_prevtx_kids_pct_lo <dbl>, e_prevtx_kids_pct_hi <dbl>, new_sp <int>,
#> #   new_sn <int>, new_su <int>, new_ep <int>, new_oth <int>, ret_rel <int>,
#> #   ret_taf <int>, ret_tad <int>, ret_oth <int>, newret_oth <int>,
#> #   new_labconf <int>, new_clindx <int>, ret_rel_labconf <int>,
#> #   ret_rel_clindx <int>, ret_rel_ep <int>, ret_nrel <int>,
#> #   notif_foreign <int>, c_newinc <int>, new_sp_m04 <int>, new_sp_m514 <int>,
#> #   new_sp_m014 <int>, new_sp_m1524 <int>, new_sp_m2534 <int>,
#> #   new_sp_m3544 <int>, new_sp_m4554 <int>, …

Once imported, these datasets can be used in the plotting and summary functions provided by getTBinR (by passing them to their df argument or using the additional_datasets argument in each function). See the contributing section if there are any other datasets that you think getTBinR should support or if you have suggestions for better descriptions for each dataset.

WHO-inspired themes and palettes.

The WHO makes use of several standardised plot themes and colour palettes. getTBinR implements these so that the package can be easily used internally at the WHO or by those collaborating with the WHO.

getTBinR::plot_tb_burden_summary(countries = "United Kingdom", 
                                 compare_all_regions = FALSE, 
                                 compare_to_region = TRUE) +
  getTBinR::theme_who() +
  getTBinR::scale_colour_who(reverse = TRUE) +
  getTBinR::scale_fill_who(reverse = TRUE)

Shiny dashboard

To explore the package functionality in an interactive session, or to investigate TB without having to code extensively in R, a shiny dashboard has been built into the package. This can either be used locally using,

getTBinR::run_tb_dashboard()

Or accessed online. Any metric in the WHO data can be explored, with countries selected using the built-in map and the option to animate by year.

Country report

To get a detailed overview of TB in a country of your choice, run the following. The report can also be generated from the built-in dashboard described above.

## Code saves report into your current working directory
getTBinR::render_country_report(country = "United Kingdom", save_dir = ".")

Contributing

File an issue here if there is a feature, or a dataset, that you think is missing from the package, or better yet submit a pull request!

Please note that the getTBinR project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Citing

If using getTBinR please consider citing the package in the relevant work. Citation information can be generated in R using the following (after installing the package),

citation("getTBinR")
#> 
#> To cite getTBinR in publications use:
#> 
#>   Sam Abbott (2019). getTBinR: an R package for accessing and
#>   summarising the World Health Organisation Tuberculosis data Journal
#>   of Open Source Software, 4(34), 1260. doi: 10.21105/joss.01260
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {getTBinR: an R package for accessing and summarising the World Health Organisation Tuberculosis data},
#>     author = {Sam Abbott},
#>     journal = {Journal of Open Source Software},
#>     year = {2019},
#>     volume = {4},
#>     number = {34},
#>     pages = {1260},
#>     doi = {10.21105/joss.01260},
#>   }

Docker

This package has been developed in docker based on the rocker/tidyverse image. To access the development environment, enter the following at the command line (with an active docker daemon running),

docker pull seabbs/gettbinr
docker run -d -p 8787:8787 -e USER=getTBinR -e PASSWORD=getTBinR --name getTBinR seabbs/gettbinr

The RStudio client can be accessed on port 8787 at localhost (or your machine's IP address). The default username is getTBinR and the default password is getTBinR. Alternatively, access the development environment via binder.

gettbinr's People

Forkers

wojciechniemczyk wenlong-liu themellion frycast mariabnd benkcwong

gettbinr's Issues

Rmarkdown not installed by run_tb_dashboard but needed to render the TB report.

Describe the bug
When rendering a TB country report via the shiny app an error occurs in a clean R install.

To Reproduce

getTBinR::run_tb_dashboard()

Select a country and click country report.

Expected behavior

Report should be generated and downloaded.

Additional context
render_country_report should also check for Rmarkdown.

Data import error

Dear getTBinR gurus,

this is a great application for anyone interested in tuberculosis data. Sadly enough, however, I came across a problem. After installation according to the vignette, trying to use

tb_burden <- get_tb_burden()

leads to an apparent download of the data; however, they are not read in correctly. The screen log is:

tb_burden <- get_tb_burden()
Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=estimates
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 272k 0 272k 0 0 192k 0 --:--:-- 0:00:01 --:--:-- 194k
Downloading the data using fread::data.table has failed. Trying
again using utils::read.csv
Downloading data has failed after 1 tries.
Attempting data download in 3.4 seconds.
Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=estimates
Downloading the data using fread::data.table has failed. Trying
again using utils::read.csv
Downloading data has failed after 2 tries.
Attempting data download in 2.9 seconds.
Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=estimates
Downloading the data using fread::data.table has failed. Trying
again using utils::read.csv
Downloading data has failed after 3 tries.
Attempting data download in 4.4 seconds.
Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=estimates
Downloading the data using fread::data.table has failed. Trying
again using utils::read.csv
Downloading data has failed after 4 tries.
Attempting data download in 2.5 seconds.
Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=estimates
Downloading the data using fread::data.table has failed. Trying
again using utils::read.csv
Downloading data has failed after 5 tries.
Attempting data download in 1.2 seconds.
Error in get_data(url = url, download_data = download_data, data_trans_fn = trans_burden_data, :
Data downloading has failed, check your internet connection.
If this issue is not resolved, contact the package author.
In addition: Warning messages:
1: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
URL 'https://extranet.who.int/tme/generateCSV.asp?ds=estimates': status was 'Failure when receiving data from the peer'
2: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
URL 'https://extranet.who.int/tme/generateCSV.asp?ds=estimates': status was 'Failure when receiving data from the peer'
3: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
URL 'https://extranet.who.int/tme/generateCSV.asp?ds=estimates': status was 'Failure when receiving data from the peer'
4: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
URL 'https://extranet.who.int/tme/generateCSV.asp?ds=estimates': status was 'Failure when receiving data from the peer'
5: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
URL 'https://extranet.who.int/tme/generateCSV.asp?ds=estimates': status was 'Failure when receiving data from the peer'

Using wget https://extranet.who.int/tme/generateCSV.asp?ds=estimates , the data are read into a file on my local machine (GNU/Kubuntu Linux 16.04.3 LTS running Linux 4.15.5, R 3.4.3) like a charm.

I'd love a clue on how I should proceed to have the data read in properly.

Best wishes and regards,

Ernst

Review available data

The WHO TB report appears to draw on additional cost information. It makes sense to integrate this into the package.

Steps are:

Review WHO TB Report and note data being used
Find data sources
Incorporate into get_tb_burden as for MDR data
Add to examples
Add tests for new data importing
Add details to documentation
Add details to news
Review if any new features

Make plots/maps work with categorical shading

Make summary function compute summarised incidence rates and proportions

Return correct confidence intervals when supplying countries to `summarise_tb_burden`

Describe the bug
When computing summary statistics using summarise_tb_burden and supplying a list of countries NA are returned rather than the known data values. This issue is only valid for the "mean". For "median" confidence intervals are not possible so NA values make sense.

To Reproduce

library(getTBinR)

summarise_tb_burden(countries = "United Kingdom", 
                    metric = "e_mdr_pct_rr_new", verbose = FALSE,
                    year = 2017, stat = "mean")

Expected behaviour
The expected behaviour is that the NA values are returned for all single countries. Where it makes sense actual country level confidence intervals should be returned.

Desktop (please complete the following information):
Using development docker container (seabbs/gettbinr)

Expand the feature set of the shiny app.

The built-in shiny app does not currently support any of the features released since 0.5.0 or the new datasets supported from 0.6.0. These should be included prior to a new release.

Get ready for CRAN resubmission

Host package on shiny server
Run tests
Run check
Windows check
CRAN comments
resubmit

Submit to CRAN

Solve all other issues
Add basic functionality and test extensively

Vignette: Case study

Case study using the full functionality of the package. Possible ideas are:

High incidence countries. Similarities/differences.
United Kingdom. Epidemiology in comparison to other countries.
TB in Europe.

CRAN check failed

Issues downloading the data still cause issues for CRAN check

Options

Skip tests and examples that rely on data on CRAN. This is all tests and examples.
Provide a static copy of current data, checking the WHO permissions this appears to be allowed.

Get Passing good practise checks and apply badge.

Get passing good practise for FLOSS projects: https://bestpractices.coreinfrastructure.org/en/projects/2673

Option to convert a metric to annual percentage change

Add for all functions.
Functionality in prepare_df_plot.
Change label to indicate change.
Change axis scale to percentage where appropriate.

curl::curl_download Error

Data fails to download on Windows, as of the 22/01/18

Error is:

>tb_burden <- get_data(url = "https://extranet.who.int/tme/generateCSV.asp?ds=estimates",
+ save_name = "TB_burden",
+ save = TRUE, 
+ download_data = TRUE)
Downloading data from: https://extranet.who.int/tme/generateCSV.asp?ds=estimates
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 19471    0 19471    0     0  19471      0 --:--:--  0:00:01 --:--:-- 10046
100  272k    0  272k    0     0   136k      0 --:--:--  0:00:02 --:--:--  106k
Error in curl::curl_download(input, tt, mode = "wb", quiet = !showProgress) : 
  Failure when receiving data from the peer
Calls: get_data -> <Anonymous> -> <Anonymous> -> .Call
Execution halted

summarise_tb_burden errors with no confidence intervals when estimating rates

Describe the bug
summarise_tb_burden errors when no confidence intervals are used and the rate needs to be calculated.

To Reproduce

summarise_tb_burden(conf = NULL)

Expected behavior
The metric should be returned with no error.

Implement a new `metric_summary` function

The current TB report uses an internal function to summarise a given metric for the country of interest. This functionality should be expanded and added to the general package functionality.

Steps are:

Plot regional comparison

Make a regional comparison plot for any given metric. Leverage #34 to summarise data for each region. The final plot should look like these.

Plot options:

Facet, colour palette, axis scales etc....

Summarise statistic across region/world/user defined group of countries

For a given statistic provide a summary measure across multiple countries (this needs to be specified across regions, or be user-defined). The summary measure should include the appropriate uncertainty.

It also needs to be able to handle measures that have no uncertainty, and measures that should be calculated based on a weighting (e.g. incidence rates).

Forthcoming release of ggplot2 and getTBinR

We are contacting you because you are the maintainer of getTBinR, which imports ggplot2 and uses vdiffr to manage visual test cases. The upcoming release of ggplot2 includes several improvements to plot rendering, including the ability to specify lineend and linejoin in geom_rect() and geom_tile(), and improved rendering of text. These improvements will result in subtle changes to your vdiffr doppelgangers when the new version is released.

Because vdiffr test cases do not run on CRAN by default, your CRAN checks will still pass. However, we suggest updating your visual test cases with the new version of ggplot2 as soon as possible to avoid confusion. You can install the development version of ggplot2 using remotes::install_github("tidyverse/ggplot2").

If you have any questions, let me know!

Performance improvements for shiny app

Make custom name override data set country name

Is your feature request related to a problem? Please describe.
When supplying a custom country name (e.g. United Kingdom) the current default is to override this with the fuzzy matched country name in the data. This has two negative consequences.

User name is not respected. The default should be to do what the user wants.
Country names in the data can be very long and/or archaic. Supplying a custom name is a good way to override this.

Describe the solution you'd like
Switch naming priority with the user name respected.

Remove notes caused by using dplyr

Improve test coverage

Is your feature request related to a problem? Please describe.
Test coverage has degraded with recent package versions.

Describe the solution you'd like
Get coverage over 95% again.

Additional context
Main issues are the changes to summarise_tb_burden and any additional tests required by new functionality

Add "sum" statistic to `summarise_tb_burden`

Is your feature request related to a problem? Please describe.
When looking at the data it would be useful to be able to summarise some metrics (e.g. notifications) with a sum.

Describe the solution you'd like
Add a stat = "sum" option to summarise_tb_burden and feed through to plot_tb_burden_summary

Explore using plotlyProxy

May offer speed ups.

Write unit tests to test functions

Bind in MDR TB data

Add search by partial definitions

Country level report

Parameterised markdown to produce a country level report

Fix failing tests

Tests are failing on both travis and appveyor - evaluate why this is happening and fix

Expand TB report

The current TB report is fairly basic and could be expanded.

Look at the WHO global report for inspiration: https://www.who.int/tb/publications/global_report/en/

Resolve dashboard plot clipping

In the current implementation of run_tb_dashboard box sizes are static and plotly plots clip these box limits. This is related to #54 and both issues need to be resolved at the same time.

Options are:

Fix current plotly plots to be correctly bounded in the current static implementation
Move to static plots
Move to another plotting package (e.g. highcharter)

Note: Adding another package means code duplication.

Shiny dashboard for package

Map based, showing the current incidence rate. When a country is selected, show summary plots for the country and offer a downloadable report option. For country reports, include incidence rates in comparison to the region, incidence rates over time, mortality over time, etc.

New case study using package.

Based on this blog post, make a case study as a vignette and add to the site.

Fix budget dataset

Describe the bug
The budget dataset only contains data for the following year but is joined to data that contains notifications up to the current year. This means that all budget data is excluded by default.

To Reproduce

library(getTBinR)

tb <- get_tb_burden(additional_datasets = "Budget")

summary(tb)

Expected behavior
Need to either backdate the budget data or allow for future dates when joining

Add button to generate country level report to TB dashboard

For the selected country add an option to generate and download a country-specific report.

Additional data sources

Need list of possible additional data sources. These could include more detailed individual country level datasets, data on vaccination, data on treatment etc.

Move to using ggplot2 in package best practises

Based on this post the following best practises need to be implemented.

Don't fully import ggplot2, instead import only used functions
Switch to using vars(.data[[col]]) rather than aes_string.
Make an S3 plot and summary method for TB data (using autoplot).

Update to 2017.

Update all defaults and docs to 2017.
Make robust to future data updates
Update dev version with changes
Push to CRAN after tests
Blog post advertising updates and highlighting key changes year on year.

Incomplete LICENSE file

GPL-3 is an appropriate OSI-approved license for a JOSS submission. However, the bottom of the LICENSE file in this repo is unfilled (note the use of angled brackets near the bottom, e.g. , ).

Part of this JOSS review.

CRAN checks failed due to data downloads

Data downloads are failing on CRAN (https://cran.r-project.org/web/checks/check_results_getTBinR.html) but not locally or on the test server.

This appears to be an issue with the WHO API. It may be due to rate limits from multiple calls when CRAN is checking.

To attempt to resolve this add a while loop with a variable wait time. This may avoid rate limit issues and solve the problem. If not there may be another underlying issue.
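
The proposed loop might look something like the base R sketch below. The helper retry_with_wait, its arguments, and the simulated download are all hypothetical; this is an illustration of the idea under those assumptions, not the code that was added to the package.

```r
# Hypothetical helper: call `fetch()` until it succeeds or `max_tries` is
# reached, sleeping for a random interval between failed attempts.
# Note: a fetch that legitimately returns NULL is treated as a failure here.
retry_with_wait <- function(fetch, max_tries = 5, max_wait = 5) {
  tries <- 0
  result <- NULL
  while (is.null(result) && tries < max_tries) {
    tries <- tries + 1
    result <- tryCatch(fetch(), error = function(e) NULL)
    if (is.null(result) && tries < max_tries) {
      # Variable wait time, so repeated calls are not evenly spaced.
      wait <- runif(1, min = 0, max = max_wait)
      message("Download failed after ", tries, " tries. Retrying in ",
              round(wait, 1), " seconds.")
      Sys.sleep(wait)
    }
  }
  if (is.null(result)) {
    stop("Data downloading has failed after ", tries, " tries.")
  }
  result
}

# Stand-in for a flaky download: fails twice, then succeeds.
counter <- new.env()
counter$n <- 0
flaky_fetch <- function() {
  counter$n <- counter$n + 1
  if (counter$n < 3) stop("simulated failure")
  data.frame(x = 1)
}
out <- retry_with_wait(flaky_fetch, max_tries = 5, max_wait = 0.1)
```

Randomising the wait is one way to avoid several checks hitting the server at the same moment; whether this addresses the underlying cause depends on what the server is actually doing.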

Stop `search_data_dict` from returning an Error when nothing found

Is your feature request related to a problem? Please describe.
When search_data_dict finds nothing currently an error is returned. This is not ideal as it can cause downstream functions to fail if a variable has no entry in the data dictionary. The default should be to fail but return the base metric as the label in these cases.

Describe the solution you'd like
Change search_data_dict to return something other than an error when nothing is found.

Describe alternatives you've considered
Change all use cases to cope better when search_data_dict fails

Additional context
Reproduce with the following:

library(getTBinR)
search_data_dict('g4gt23q2')
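
One possible shape for the requested behaviour is sketched below in base R. Both functions, the column names, and the toy dictionary entry are hypothetical stand-ins used for illustration; they are not the actual search_data_dict implementation or the change that was eventually made.

```r
# Stand-in for a lookup that errors when nothing is found
# (hypothetical; it mimics the behaviour described in this issue).
lookup_label <- function(metric, dict) {
  hit <- dict$definition[dict$variable_name == metric]
  if (length(hit) == 0) {
    stop("No results found for: ", metric)
  }
  hit
}

# Hypothetical wrapper: warn and fall back to the raw metric name as the label.
label_or_metric <- function(metric, dict) {
  tryCatch(
    lookup_label(metric, dict),
    error = function(e) {
      warning("No data dictionary entry for '", metric,
              "'; using the metric name as the label.", call. = FALSE)
      metric
    }
  )
}

# Toy dictionary with a single made-up entry.
dict <- data.frame(
  variable_name = "e_inc_100k",
  definition = "Estimated incidence (all forms) per 100 000 population",
  stringsAsFactors = FALSE
)
label_or_metric("e_inc_100k", dict)
suppressWarnings(label_or_metric("g4gt23q2", dict))
```

Downstream plotting code that only needs an axis label can then carry on with the metric name, while the warning still signals that the dictionary entry is missing.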

Fix issue with assigning custom label to legend.

When adding a custom label to the legend there appears to be a failure when using non-standard variable names. This may be resolved by changing the legend title rather than adding the label as a new variable name.

Add documentation badge

Add documentation badge as seen here: https://github.com/ropensci/rdhs

Plot function to summarise multiple metrics for a single country

Facet over metrics
Default metrics in list, or optionally supply additional metrics
use plot_tb_burden as framework.
adapt prepare_df_plot to handle multiple metrics.
supply multiple countries for comparison

plot country to region/world/user selected countries

Plot a comparison of a metric in a given country to that in the region/world/user selected countries.

The function needs to accept a single country, logical for the region, list of countries, logical for the world.
Should then estimate the summary for each of these using #34
Bind together into a single tibble giving appropriate names (list of countries needs to have a user-settable label)
Plot metric stratifying by colour for country, region, world, list of countries.
Provide options for facetting, axis scaling, showing legend etc.

Improve dashboard performance

The current implementation of the run_tb_dashboard function is feature complete but does not run smoothly on low compute servers. This negatively impacts the user experience.

The majority of the bottleneck appears to be in the generation of plotly interactive plots. There are 3 options for resolving this.

Switch to non-interactive plots
Switch to base plotly interactive plots
Switch to another htmlwidgets package - like highcharter

The downside of all options bar 1 is that this would require using code not included in getTBinR or duplicating functionality that is already present.

Pediatric MDR-TB data

Hello Sam,

is there any way to access data on MDR-TB stratified by age? I would be interested in visualizing MDR-TB cases only in children aged 0-14 years (as WHO routinely reports).

Thank you!

Best,
Matthias

Add generic map plot of data

[JOSS] Statement of Need

Minor issue:
This issue is related to openjournals/joss-reviews#1260

Please add a sentence to the readme that emphasizes the need of this package (e.g., there's no other package, or there's no straightforward way to get the data w/o getTBinR, ...).
