Reproduction number Estimation Accounting for Lead time
A minimally parameterized algorithm for estimating, for each date during a viral pandemic, how many new cases stem on average from an existing case. Rather than fitting an exponential curve to the entire dataset, it uses a weighted average of later-dated cases to derive an instantaneous reproduction number (Rt). This weighting is estimated empirically from the virus biology, assuming that most cases are unobserved and asymptomatic (probably true for COVID-19 in many jurisdictions).
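The weighted-average idea can be pictured with the minimal Python sketch below. It is not the script's own code: the function name `estimate_rt`, the example weights, and the exact formula (weighted mean of the following days' cases divided by the current day's cases) are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence


def estimate_rt(cases: Sequence[float],
                weights: Sequence[float]) -> List[Optional[float]]:
    """Illustrative Rt estimate: weighted mean of later cases / today's cases.

    weights[k] is the (hypothetical) share of onward transmission that shows
    up k + 1 days after the index case. The weights are normalized to sum to
    one, so a flat case curve gives Rt = 1.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    norm = [w / total for w in weights]

    rt: List[Optional[float]] = []
    for t in range(len(cases)):
        later = cases[t + 1 : t + 1 + len(norm)]
        # No estimate without a full look-ahead window or with zero cases.
        if len(later) < len(norm) or cases[t] <= 0:
            rt.append(None)
            continue
        weighted = sum(w * c for w, c in zip(norm, later))
        rt.append(weighted / cases[t])
    return rt


# Cases doubling daily, with all transmission assumed to appear one day later.
print(estimate_rt([1, 2, 4, 8, 16], [1.0]))  # [2.0, 2.0, 2.0, 2.0, None]
```

Because the estimate for a date looks ahead at later-dated cases, the most recent dates have no value until enough later data has arrived.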
This script generates Rt numbers very similar to those of the R EpiEstim package when the serial interval function is parameterized to be similar to the viral_shedding_proportions in this script. The main difference is that this script produces confidence intervals for Rt in a different manner, checking whether the Rt estimates in a rolling window follow a Gaussian or exponential distribution. It also normalizes the data for frequent changes in the number of tests performed per date. This aligns more closely with observed SARS-CoV-2 testing results in a jurisdiction with high testing rates. It also mitigates potential MCMC misestimation of the R value when there are multiple staggered drops in testing, such as around Christmas and New Year's Day.
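As a rough illustration of the two ideas above, the Python sketch below rescales daily cases to a common test volume and computes a Gaussian interval over a trailing window. Both functions are hypothetical: the names, the use of the median test count as the reference, the 7-day window, and the mean plus or minus z standard errors interval are assumptions, and the exponential-distribution check is not shown.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple


def normalize_for_testing(cases: Sequence[float],
                          tests: Sequence[float]) -> List[Optional[float]]:
    """Rescale each day's cases to the median daily test volume (assumed)."""
    positive = [n for n in tests if n > 0]
    if not positive:
        raise ValueError("no day has a positive test count")
    reference = statistics.median(positive)
    # Days with no tests cannot be rescaled, so they are marked as missing.
    return [c * reference / n if n > 0 else None
            for c, n in zip(cases, tests)]


def rolling_gaussian_interval(
    values: Sequence[Optional[float]], window: int = 7, z: float = 1.96
) -> List[Optional[Tuple[float, float]]]:
    """Mean +/- z standard errors over a trailing window (Gaussian case only)."""
    if window < 2:
        raise ValueError("window must be at least 2")
    out: List[Optional[Tuple[float, float]]] = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        # Skip positions without a full window of usable estimates.
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
            continue
        mean = statistics.mean(chunk)
        half = z * statistics.stdev(chunk) / math.sqrt(window)
        out.append((mean - half, mean + half))
    return out
```

With this kind of rescaling, a holiday on which testing halves and reported cases halve with it yields the same normalized case count as a normal day, rather than an apparent drop in transmission.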
This script also greatly simplifies the process of plotting multiple Rt estimates (e.g. different "zones") together for comparison, provides sanity checks (e.g. no days with more cases than tests) and tries to handle missing dates gracefully. A breakdown of cases by age group is also displayed, normalized by the age group demographics provided (the Alberta population age structure is taken from the 2016 census, which is somewhat out of date).
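The checks described above might look like the following Python sketch. The function names, the input layout (a dict from date to a cases/tests pair), the choice to mark missing dates with `None`, and the per-100,000 scaling are all assumptions for illustration; the script's actual handling may differ.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Tuple


def check_and_fill(
    records: Dict[date, Tuple[int, int]]
) -> Dict[date, Optional[Tuple[int, int]]]:
    """Reject impossible days and insert None for dates missing from the range."""
    if not records:
        return {}
    for day, (cases, tests) in records.items():
        if cases > tests:
            raise ValueError(f"{day}: {cases} cases but only {tests} tests")
    filled: Dict[date, Optional[Tuple[int, int]]] = {}
    day, last = min(records), max(records)
    while day <= last:
        filled[day] = records.get(day)  # None marks a missing date
        day += timedelta(days=1)
    return filled


def cases_per_100k(cases_by_age: Dict[str, int],
                   population_by_age: Dict[str, int]) -> Dict[str, float]:
    """Scale case counts by the size of each age group."""
    return {group: 100_000 * n / population_by_age[group]
            for group, n in cases_by_age.items()}
```

Scaling by population lets a small age group with few cases be compared fairly against a large one with many.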
Test datasets for graphing are provided, based on data scraped from the Alberta Government COVID-19 data explorer.