Code Monkey home page Code Monkey logo

episoon's Introduction

EpiSoon

Lifecycle: experimental R build status Codecov test coverage MIT license GitHub contributors universe DOI

This package provides tooling to forecast the time-varying reproduction number and use this to forecast reported case counts via a branching process. It supports a range of time series modelling packages including bsts, forecast, and fable. It also supports ensembles via stackr and forecastHyrbid. Forecasts can be assessed by iteractively fitting and then using proper scoring rules (via scoringutils and scoringRules) to compare to both observed case counts and estimated reproduction numbers.

Installation

Install the stable development version of the package with:

install.packages("EpiSoon", repos = "https://epiforecasts.r-universe.dev")

Install the unstable development version of the package with (few users should need to do this):

remotes::install_github("epiforecasts/EpiSoon")

Quick start

  • Load packages (bsts and fable for models, ggplot2 for plotting, and cowplot for theming)
library(EpiSoon)
library(bsts)
library(fable)
library(future)
library(cowplot)
library(dplyr)
  • Set up example data (using EpiSoon::example_obs_rts and EpiSoon::example_obs_cases as starting data sets). When generating timeseries with EpiNow use get_timeseries to extract the required data.
obs_rts <- EpiSoon::example_obs_rts %>%
  dplyr::mutate(timeseries = "Region 1") %>%
  dplyr::bind_rows(EpiSoon::example_obs_rts %>%
    dplyr::mutate(timeseries = "Region 2"))

obs_cases <- EpiSoon::example_obs_cases %>%
  dplyr::mutate(timeseries = "Region 1") %>%
  dplyr::bind_rows(EpiSoon::example_obs_cases %>%
    dplyr::mutate(timeseries = "Region 2"))
  • Define the list of models to be compared.
models <- list(
  "AR 3" =
    function(...) {
      EpiSoon::bsts_model(
        model =
          function(ss, y) {
            bsts::AddAr(ss, y = y, lags = 3)
          }, ...
      )
    },
  "Semi-local linear trend" =
    function(...) {
      EpiSoon::bsts_model(
        model =
          function(ss, y) {
            bsts::AddSemilocalLinearTrend(ss, y = y)
          }, ...
      )
    },
  "ARIMA" =
    function(...) {
      EpiSoon::fable_model(model = fable::ARIMA(y ~ time), ...)
    }
)
  • Compare models across timeseries (change the future::plan to do this in parallel).
future::plan("sequential")

## Compare models
forecasts <- EpiSoon::compare_timeseries(obs_rts, obs_cases, models,
  horizon = 7, samples = 10,
  serial_interval = EpiSoon::example_serial_interval
)
#> Warning: There were 40 warnings in `dplyr::mutate()`.
#> The first warning was:
#> ℹ In argument: `eval = furrr::future_pmap(...)`.
#> Caused by warning:
#> ! Unknown or uninitialised column: `.distribution`.
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 39 remaining warnings.

forecasts
#> $forecast_rts
#> # A tibble: 511 × 12
#>    timeseries model foreca…¹ date       horizon median  mean     sd bottom lower
#>    <chr>      <chr> <chr>    <date>       <int>  <dbl> <dbl>  <dbl>  <dbl> <dbl>
#>  1 Region 1   AR 3  2020-03… 2020-03-05       1   2.26  2.25 0.0504   2.17  2.25
#>  2 Region 1   AR 3  2020-03… 2020-03-06       2   2.21  2.19 0.0729   2.09  2.19
#>  3 Region 1   AR 3  2020-03… 2020-03-07       3   2.15  2.12 0.0957   1.95  2.13
#>  4 Region 1   AR 3  2020-03… 2020-03-08       4   2.07  2.05 0.128    1.86  2.06
#>  5 Region 1   AR 3  2020-03… 2020-03-09       5   2.00  1.98 0.165    1.69  2.00
#>  6 Region 1   AR 3  2020-03… 2020-03-10       6   1.95  1.93 0.173    1.61  1.93
#>  7 Region 1   AR 3  2020-03… 2020-03-11       7   1.90  1.86 0.199    1.52  1.86
#>  8 Region 1   AR 3  2020-03… 2020-03-07       1   2.11  2.11 0.0389   2.02  2.10
#>  9 Region 1   AR 3  2020-03… 2020-03-08       2   2.04  2.03 0.0449   1.94  2.02
#> 10 Region 1   AR 3  2020-03… 2020-03-09       3   1.97  1.94 0.0713   1.80  1.96
#> # … with 501 more rows, 2 more variables: upper <dbl>, top <dbl>, and
#> #   abbreviated variable name ¹​forecast_date
#> 
#> $rt_scores
#> # A tibble: 399 × 12
#>    timeseries model forec…¹ date       horizon    mad  bias   dss   crps log_s…²
#>    <chr>      <chr> <chr>   <date>       <int>  <dbl> <dbl> <dbl>  <dbl>   <dbl>
#>  1 Region 1   AR 3  2020-0… 2020-03-05       1 0.0514  -0.2 -5.98 0.0110  -1.96 
#>  2 Region 1   AR 3  2020-0… 2020-03-06       2 0.0630   0.4 -5.34 0.0176  -1.55 
#>  3 Region 1   AR 3  2020-0… 2020-03-07       3 0.0879   0   -4.72 0.0232  -1.31 
#>  4 Region 1   AR 3  2020-0… 2020-03-08       4 0.129   -0.2 -4.01 0.0345  -0.961
#>  5 Region 1   AR 3  2020-0… 2020-03-09       5 0.165   -0.4 -3.44 0.0514  -0.764
#>  6 Region 1   AR 3  2020-0… 2020-03-10       6 0.170   -0.4 -3.25 0.0578  -0.713
#>  7 Region 1   AR 3  2020-0… 2020-03-11       7 0.186   -0.6 -2.67 0.0902  -0.483
#>  8 Region 1   AR 3  2020-0… 2020-03-07       1 0.0283  -0.8 -5.53 0.0207  -2.05 
#>  9 Region 1   AR 3  2020-0… 2020-03-08       2 0.0273  -1   -3.32 0.0506  -0.712
#> 10 Region 1   AR 3  2020-0… 2020-03-09       3 0.0398  -1   -2.13 0.0864  -0.162
#> # … with 389 more rows, 2 more variables: ae_median <dbl>, se_mean <dbl>, and
#> #   abbreviated variable names ¹​forecast_date, ²​log_score
#> 
#> $forecast_cases
#> # A tibble: 399 × 12
#>    timeseries model forecas…¹ date       horizon median  mean    sd bottom lower
#>    <chr>      <chr> <chr>     <date>       <int>  <dbl> <dbl> <dbl>  <dbl> <dbl>
#>  1 Region 1   AR 3  2020-03-… 2020-03-05       1   68.5  67.5  5.91     59    61
#>  2 Region 1   AR 3  2020-03-… 2020-03-06       2   81.5  80.5 11.8      61    72
#>  3 Region 1   AR 3  2020-03-… 2020-03-07       3   97    94.4 15.7      68    94
#>  4 Region 1   AR 3  2020-03-… 2020-03-08       4  114   116.  13.9     102   102
#>  5 Region 1   AR 3  2020-03-… 2020-03-09       5  139   131.  24.0      98   132
#>  6 Region 1   AR 3  2020-03-… 2020-03-10       6  150.  156.  23.9     117   141
#>  7 Region 1   AR 3  2020-03-… 2020-03-11       7  186.  182.  41.1     122   155
#>  8 Region 1   AR 3  2020-03-… 2020-03-07       1   94.5  92.7  8.30     80    94
#>  9 Region 1   AR 3  2020-03-… 2020-03-08       2  108   107.   6.34     95   104
#> 10 Region 1   AR 3  2020-03-… 2020-03-09       3  122.  121.  13.8      94   115
#> # … with 389 more rows, 2 more variables: upper <dbl>, top <dbl>, and
#> #   abbreviated variable name ¹​forecast_date
#> 
#> $case_scores
#> # A tibble: 399 × 12
#>    timeseries model sample forecast…¹ date       horizon   mad  bias   dss  crps
#>    <chr>      <chr> <chr>  <chr>      <date>       <int> <dbl> <dbl> <dbl> <dbl>
#>  1 Region 1   AR 3  1      2020-03-04 2020-03-05       1  6.67   0.4  4.09  2.75
#>  2 Region 1   AR 3  1      2020-03-04 2020-03-06       2 11.1    0.4  5.28  4.77
#>  3 Region 1   AR 3  1      2020-03-04 2020-03-07       3 13.3    0.4  5.58  6.3 
#>  4 Region 1   AR 3  1      2020-03-04 2020-03-08       4 14.1    0.6  6.30  6.85
#>  5 Region 1   AR 3  1      2020-03-04 2020-03-09       5 29.7    0.2  6.70 11.1 
#>  6 Region 1   AR 3  1      2020-03-04 2020-03-10       6 14.1    0.6  6.68  7.7 
#>  7 Region 1   AR 3  1      2020-03-04 2020-03-11       7 42.3    0.2  7.47 13.1 
#>  8 Region 1   AR 3  1      2020-03-06 2020-03-07       1  8.90   0.4  4.48  3.65
#>  9 Region 1   AR 3  1      2020-03-06 2020-03-08       2  5.19   0.6  4.23  3.68
#> 10 Region 1   AR 3  1      2020-03-06 2020-03-09       3 13.3    0.2  5.26  4.1 
#> # … with 389 more rows, 2 more variables: ae_median <dbl>, se_mean <dbl>, and
#> #   abbreviated variable name ¹​forecast_date
  • Plot an evaluation of Rt forecasts using iterative fitting.
EpiSoon::plot_forecast_evaluation(forecasts$forecast_rts, obs_rts, c(7)) +
  ggplot2::facet_grid(model ~ timeseries) +
  cowplot::panel_border()

  • Plot an evaluation of case forecasts using iterative fitting
EpiSoon::plot_forecast_evaluation(forecasts$forecast_cases, obs_cases, c(7)) +
  ggplot2::facet_grid(model ~ timeseries, scales = "free") +
  cowplot::panel_border()

  • Summarise the forecasts by model scored against observed cases
EpiSoon::summarise_scores(forecasts$case_scores)
#> # A tibble: 12 × 9
#>    score     model             bottom lower median    mean  upper    top      sd
#>    <chr>     <chr>              <dbl> <dbl>  <dbl>   <dbl>  <dbl>  <dbl>   <dbl>
#>  1 ae_median AR 3               0.85   9     24.5  5.69e+1   91   2.27e2 6.28e+1
#>  2 ae_median Semi-local linea…  0.613  8.5   24.2  5.11e+1   92.4 1.88e2 5.86e+1
#>  3 bias      AR 3              -1      0      0.6  4.47e-1    1   1   e0 6.13e-1
#>  4 bias      Semi-local linea… -1      0.2    0.6  4.81e-1    1   1   e0 5.90e-1
#>  5 crps      AR 3               2.94   6.57  17.8  4.69e+1   79.3 2.00e2 5.59e+1
#>  6 crps      Semi-local linea…  2.80   6.82  15.6  4.18e+1   69.5 1.60e2 5.12e+1
#>  7 dss       AR 3               4.53   6.09   8.33 1.29e+1   15.9 4.67e1 1.09e+1
#>  8 dss       Semi-local linea…  4.56   6.20   7.74 1.28e+1   12.7 5.07e1 1.35e+1
#>  9 mad       AR 3               7.93  13.3   21.5  2.61e+1   34.1 8.51e1 1.76e+1
#> 10 mad       Semi-local linea…  6.10  12.8   19.3  2.46e+1   31.1 6.58e1 1.70e+1
#> 11 se_mean   AR 3               1.15  81    784    7.29e+3 8855.  4.72e4 1.56e+4
#> 12 se_mean   Semi-local linea…  0.524 79.2  595.   6.26e+3 8663.  4.16e4 1.31e+4

Contributing

File an issue here if you have identified an issue with the package. Please note that due to operational constraints priority will be given to users informing government policy or offering methodological insights. We welcome all contributions, in particular those that improve the approach or the robustness of the code base.

episoon's People

Contributors

achateigner avatar andreamrau avatar jhellewell14 avatar maressyl avatar medewitt avatar nikosbosse avatar sbfnk avatar seabbs avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

episoon's Issues

Code styling

The code style is currently not consistent. I suggest we run styler over it and then make sure to lint etc as needed to keep consistent.

Make `bsts` and `fable` suggests

The newly added brms support is a good template for how we should be looking to add modelling packages. It would be great to lighten the dependencies by doing the same for fable_model and bsts_model.

Wrapper for forecastHybrid

fable has been shown to have a memory leak and so is not suitable for large scale applications. forecastHybrid looks like a good drop-in replacement and appears to be a robust package with a good feature set. A potential issue is that it does not support sampling of the forecast out of the box and so an approximation is needed (potentially

A small example of a simple implementation:

library(forecastHybrid)


rts <- EpiSoon::example_obs_rts

y <- rts$rt

model <- forecastHybrid::hybridModel(y, weights = "cv.errors", 
                                     rolling = TRUE, cvHorizon = 7,
                                     horizonAverage = TRUE, 
                                     windowSize = 7)


preds <- forecast(model)


plot(preds)

S3 enhancement

Many of the functions may be better provided as S3 methods. It would be great to get a discussion going on whether we want to do this.

'horizon' in forecast_cases()

Hi,

As we are working on developing unit tests for the package functions (and considering I'm quite new to epidemiology), I could use some clarification regarding the behavior expected from forecast_cases().

I start here from the examples in forecast_cases() manual :

forecast <- forecast_rt(
	EpiSoon::example_obs_rts[1:10,],
	model = function(...) {
		EpiSoon::bsts_model(
			model = function(ss, y){
				bsts::AddAutoAr(ss, y=y, lags=10)
			},
			...
		)
	},
	horizon = 7,
	samples = 10
)

set.seed(42)
forecast_cases(
	EpiSoon::example_obs_cases,
	fit_samples = forecast,
	serial_interval = EpiSoon::example_serial_interval,
	horizon = 1
)

set.seed(42)
forecast_cases(
	EpiSoon::example_obs_cases,
	fit_samples = forecast,
	serial_interval = EpiSoon::example_serial_interval,
	horizon = 5
)

set.seed(42)
forecast_cases(
	EpiSoon::example_obs_cases,
	fit_samples = forecast,
	serial_interval = EpiSoon::example_serial_interval,
	horizon = 10
)

Isn't the output expected to be limited to rows with relevant "horizon" values ? How should forecast_cases() behave when horizon exceed the value provided to fit forecast ?

I am working with a version pulled from Github last week (8eddc07).

Best regards,
Sylvain

Validate against incidences

To do this we need to map back to reported cases and the design a measure of fit quality (could be current scores).

Visualising model scoring

The output of compare_timeseries is complex and hard to use. Sensible visualizations can make understanding the scores easier.

I have started (note this is a public-facing version of some work based on confidential data so may be slightly out of step) to think about how this can be visualised here: https://github.com/nikosbosse/covidevaluation/blob/master/R/evaluate_models.R

Wrapping some of these plots into package functions would be a great start to making our approach more easily adoptable by others.

example in `compare_models` returns an error

## Example data
obs_rts <- EpiSoon::example_obs_rts %>%
    dplyr::mutate(timeseries = "Region 1") %>%
    dplyr::bind_rows(EpiSoon::example_obs_rts %>%
    dplyr::mutate(timeseries = "Region 2"))

obs_cases <- EpiSoon::example_obs_cases %>%
    dplyr::mutate(timeseries = "Region 1") %>%
    dplyr::bind_rows(EpiSoon::example_obs_cases %>%
    dplyr::mutate(timeseries = "Region 2"))

## List of forecasting bsts models wrapped in functions.
models <- list("AR 3" =
                    function(...){EpiSoon::bsts_model(model =
                    function(ss, y){bsts::AddAr(ss, y = y, lags = 3)}, ...)},
               "Semi-local linear trend" =
               function(...) {EpiSoon::bsts_model(model =
                    function(ss, y){bsts::AddSemilocalLinearTrend(ss, y = y)}, ...)},
               "ARIMA" =
                    function(...){fable_model(model = fable::ARIMA(y ~ time), ...)})

returns

Error: Element 3 must be a vector, not NULL
Run `rlang::last_error()` to see where the error occurred

Improve the handling of uncertainty when using `forecastHybrid`

The forecastHybrid package can produce some robust cross-validation weighted ensembles. Unfortunately, it handles confidence intervals by taking an unweighted mean of all candidate models or using the max/min for the candidate models.

As we only use samples from models (except when samples = 1 and a point forecast is used) this behaviour is a problem. In the current wrapper forecastHybrid_model we take samples by assuming the forecast is normally distributed and reconstructing a mean and standard deviation from the upper and lower bounds. This means that we see no benefit of any kind from a weighted ensemble. In an ideal scenario, the underlying package would produce better samples for its forecasts but failing that a more elegant workaround would be ideal.

Speed up renewal

The renewal code can be sped up using a dot product vs its current implementation. This a core part of the simulation code and so would give a good speed up. However, if planning on using thee EpiNow2 simulation model then this change is not needed as this part of the code base will be depreciated I would imagine.

~ sum(cases$cases * draw_from_si_prob(

Wrapper for forecast

Whilst fable is the replacement for the forecast package it looks like much of the functionality has yet to be ported across. It would make sense to provide a wrapper for forecast.

horizon > 7

I was experimenting with higher horizon number like 30

In this cases I think the Rt factor when going close to 0 for the forecast is overestimating the decline

Maybe one can fine a more conservative forecasting method or do you think it can be that for any european country Rt goes much below 1?

JOSS review

As the codebase matures (thanks to all contributors so far!) it might make sense to start thinking about having it peer-reviewed (this is linked to #56 ). I have had great experiences with JOSS so this would be my first thought.

master -> main

To keep up with current practice. This should be done using the tooling in usethis.

score_model function dimension problem

The example:

observations <- data.frame(rt = 1:20,
                           date = as.Date("2020-01-01")
                                   + lubridate::days(1:20))

 ## Fit a model
 samples <- fit_model(observations[1:10],
                      model = function(ss, y){bsts::AddSemilocalLinearTrend(ss, y = y)},
                      horizon = 7, samples = 10)

 ## Score the model fit
 score_model(samples, observations)

doesn't work anymore if observations is used instead of observations[1:10]. In this case the function score_model throws an error

bsts AR 3 model produces different numbers of samples than other models


library(EpiSoon)
library(bsts)
library(fable)
library(future)
library(cowplot)
library(dplyr)

## define modles
models <- list("AR 3" =
                 function(...) {EpiSoon::bsts_model(model =
                                                      function(ss, y){bsts::AddAr(ss, y = y, lags = 3)}, ...)},
               "semilocal" =
                 function(...) {EpiSoon::bsts_model(model =
                                                      function(ss, y){bsts::AddSemilocalLinearTrend(ss, y = y)}, ...)},
               "ARIMA" = 
                 function(...){EpiSoon::fable_model(model = fable::ARIMA(y ~ time), ...)}, 
               "ETS" = 
                 function(...){EpiSoon::fable_model(model = fable::ETS(), ...)})

## get example observations
obs_rts <- EpiSoon::example_obs_rts %>%
  dplyr::mutate(timeseries = 1)
obs_cases <- EpiSoon::example_obs_cases %>%
  dplyr::mutate(timeseries = 1)

## make forecasts
forecasts <- EpiSoon::compare_timeseries(obs_rts, 
                                         obs_cases, 
                                         models,
                                         horizon = 7, samples = 10,
                                         serial_interval = EpiSoon::example_serial_interval, 
                                         return_raw = T)


## check number of obs
forecasts$raw_rt_forecast %>%
  dplyr::group_split(model) %>%
  purrr::map(nrow)

forecasts$raw_rt_forecast %>%
  dplyr::filter(model == 'ARIMA') %>%
  dplyr::select(date) %>%
  table()

forecasts$raw_rt_forecast %>%
  dplyr::filter(model == 'AR 3') %>%
  dplyr::select(date) %>%
  table()

Plot incidences

Can we use current plot functionality in a wrapper for plotting

Warning of potential issue in predict cases

When running compare_timeseries on a large grid I see this warning. Reprex needed for further debug.

Warning message:
In rdist(1, predictions[i, ]$infectiousness * predictions[i, ]$rt) :
NAs produced

Towards a CRAN realise

We have been developing this package very much for our own use case and tightly integrated with EpiNow (our Rt estimation tooling). It would be great to hear if others are interested in EpiSoon as a standalone tool and if so would a CRAN release be helpful.

GitHub actions and travis

We currently are using both - it would be great to standardise around one. I have not personally used GitHub's actions but hear it may be the new kid on the block.

Ideally, we would also automate documentation generation (i.e the pkgdown site) on new pull requests.

README update

As we move to finalise the package it makes sense to remove the warning and improve the documentation. A sensible starting point would be to add a first paragraph that contains the information required by the JOSS guidelines.

Infrastructure for comparing case only forecasts to case forecasts via Rt

This package focusses on providing case forecasts via first forecasting Rt values. An outstanding question is the value this adds over a simple case only forecast. It would be interesting to add tooling to enable case only forecasts to be easily compared to Rt based forecasts but this may require a complex rewrite. Opening this as a starting point for thinking about this.

Reduce dependencies + speed up code

In general, the package has quite a few dependencies it would be great to lighten this where possible. It would also be great to try and improve performance. For example compare_timeseries can be very slow to run when considering many samples + models.

forecastHybrid runs in parallel despite settings

When usingforecastHybrid more than one core is used despite parallelization being turned off explicitly. This means that large runs are slow as cores conflict. Identifying someway to internally turn this off or externally prevent more than one core being accessed would both be good solutions.

This could very easily just be user error rather than a bug!

Issue opened on the package repo (ellisp/forecastHybrid#92)

Error for compare_timeseries: Element 1 must be a vector, not NULL

obs_rts_aus <- readRDS("obs_rts_aus.rds")
obs_cases_aus <- readRDS("obs_cases_aus.rds")
models <- list("AR 1" = function(ss, y){bsts::AddAr(ss, y = y, lags = 1)})

forecast_eval <- compare_timeseries(obs_rts = obs_rts_aus,
obs_cases = obs_cases_aus,
serial_interval = EpiNow::covid_serial_intervals,
models = models,
horizon = 7, samples = 20)

Make compare_timeseries be more efficient

For few regions, it makes sense to run in parallel over both regions and models. To do this need to strip out the reliance on compare_models and roll a new version that makes a data frame/list of all possible combinations and then maps over them.

It also makes sense to make compare_models potentially be parallel when used in standalone mode as well as evaluate_model potentially with a pass-through.

Improve scoring

  • Add integer and continuous scores
  • Add model calibration check
  • Score within samples

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.