reichlab / covid19-forecast-hub

Projections of COVID-19, in standardized format

Home Page: https://covid19forecasthub.org

License: Other

R 2.40% Shell 0.21% JavaScript 3.08% Python 1.97% HTML 9.29% Vue 0.66% CSS 0.74% TypeScript 0.41% Dockerfile 0.01% Jupyter Notebook 81.10% SCSS 0.12% Makefile 0.02%

covid19 forecasts covid-19 forecast-data covid-data github-pages visualization analytics

covid19-forecast-hub's People

youyanggu tomcm39 abrennen yuorme titanache adogan ut-covid jbracher mattk7 tkcy sangeetabhatia03 favoriteprojects annaliesewieler spencerwoody confunguido majohansson mobs-lab statsccpr jinghuichen e3bo greenpdx lacastro auquan xinyuexiong cobeylab eclee25 jlessler superjohn scottglennscott varshaskrish wanchuangzhu adrianxdev zhang-xinyu ianowilliamson matttriano lsturtew predsci erdc-cv19 dthboyd hbiegel payoto zanzan666 yannael rnatara fengxu-pku ruihanwei hanneehan onylab hannanabdul55 alecgiusti ssemyonov-quantori yueyingwang ardenebaxter yumouqiu tsnyder701 kfatyas odalgic sthorstman confidant575 sccoot pedromcruz solarsys hankw507 ryanruff han-tun jocelinelega ffy99 scavany nrsander jpratt1011 josh-wilde gavin-k-lee frostxtj qjhong robertwalraven rbarraud bcua alexrobwong scorsett gentles36 eycramer nikosbosse kathsherratt russwolfinger vethno cnjelita taosunvoyage nssac jsharpna yupengyanghuhu michaellli pypm mingyuanzhou rodrigogonzalez nisargvp valeman kingsleyred zyt9lsb elray1 deankarlen

covid19-forecast-hub's Issues

2020-04-27-MOBS_NEU-GLEAM-COVID-19_v1.csv has no forecast_date or target_end_date

This newly added file does not have required fields: forecast_date and target_end_date.

build first draft local d3 visualization

update target list

What are the next-phase targets that we want to include? We should probably phase these in slowly, to reduce the strain of creating checks, visualizations, and ensembles for new targets. Candidates are:

incident hospitalization demand by week/day?
ICU bed demand by week/day?
...

specify scoring rules and ground-truth data

specify preferred quantiles
specify what scoring will be done
specify when death data will be retrieved for scoring

write script to convert LANL data

Move 2020-04-27 MOBS file to data-raw/ from data-processed

This is related to #66 and #67. I believe

https://github.com/reichlab/covid19-forecast-hub/blob/master/data-processed/MOBS_NEU-GLEAM_COVID/2020-04-27-MOBS_NEU-GLEAM-COVID-19_v1.csv

should be in

https://github.com/reichlab/covid19-forecast-hub/tree/master/data-raw

make LANL forecasts compatible with new timezero structure

Write plausibility checks

Write a script that does some plausibility checks for cleaned data, eg:

no quantile crossing
quantiles for cumulative deaths greater than or equal to those for incident
quantiles for cumulative deaths non-decreasing over time
cumulative week-ahead and corresponding day-ahead forecasts coincide
Maybe related to #13 ?
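
Two of the checks listed above (no quantile crossing, and cumulative deaths non-decreasing over time) could be sketched roughly as follows. This is only an illustration, not the hub's actual validation code: the row layout (dicts with `location`, `target`, `quantile`, `value`) and both function names are assumptions made for the example.

```python
from collections import defaultdict


def find_quantile_crossings(rows):
    """Return (location, target, lower_q, higher_q) wherever a higher
    quantile has a smaller value than a lower one.

    `rows` is an iterable of dicts with the assumed keys
    location, target, quantile, value.
    """
    groups = defaultdict(list)
    for r in rows:
        key = (r["location"], r["target"])
        groups[key].append((float(r["quantile"]), float(r["value"])))

    crossings = []
    for (location, target), pairs in groups.items():
        pairs.sort()  # ascending by quantile level
        for (q_lo, v_lo), (q_hi, v_hi) in zip(pairs, pairs[1:]):
            if v_hi < v_lo:
                crossings.append((location, target, q_lo, q_hi))
    return crossings


def find_decreasing_cumulative(rows):
    """Return (location, unit, quantile, horizon) wherever a cumulative
    forecast drops below the value at the previous horizon.

    Assumes targets look like "3 wk ahead cum death".
    """
    series = defaultdict(list)
    for r in rows:
        parts = r["target"].split()
        if len(parts) == 5 and parts[3] == "cum":
            key = (r["location"], parts[1], float(r["quantile"]))
            series[key].append((int(parts[0]), float(r["value"])))

    problems = []
    for (location, unit, quantile), points in series.items():
        points.sort()  # ascending by horizon
        for (_, v_prev), (h, v) in zip(points, points[1:]):
            if v < v_prev:
                problems.append((location, unit, quantile, h))
    return problems
```

Both functions return a list of offending entries rather than raising, so a script could report every problem in a file at once.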

Separate forecasts from truth

I suggest we reorganize the data so that forecasts are separate from truth, e.g.

data-raw/forecasts
data-raw/truth
data-processed/forecasts
data-processed/truth

The subdirectory structure within the forecasts/ subdirectories would be the same as it is now.

Also, perhaps we should include nytimes "gold-standard" data in addition to the JHU data.

Move processing scripts to data-raw/ folders

Currently most of the code is in the code/ directory and recently organized into subdirectories. As a general principle, I suggest we move code closer to the data it is used on. For example, I suggest we move raw data processing scripts to the data-raw/ folder.

The code/ directory could still be used for functions (rather than scripts) that are used in multiple scripts.

not all day-ahead targets are showing up in shiny app

@jarad I see up to 41 day ahead targets in the recent 2020-04-26 CU files, but only max_n=9 for the day_ahead targets in the app.

add instructions for running the visualization locally to the wiki

put in place data format checks from Travis

add a 1-week ahead forecast for Imperial.

update lanl processed file column names

Geneva processing file uses "days" rather than "day"

For consistency,

covid19-forecast-hub/code/process_geneva_file.R

Line 49 in d3e77ad

times = c("day ahead inc death", "days ahead cum death"))

should use "day" rather than "days"

Pull request incoming.

write code for ensemble

add dates, targets, locations

mismatch in shiny app n_states variable

The app shows JHU IDD as having 50 states, but it actually has 0?

adapt process_lanl_file() to incorporate incident death data in processed files.

migrate to a clearer structure for what forecasts are made when

There are two competing priorities here:
(1) record all (or nearly all; do we really want to store every update, even if daily?) forecasts made by teams, as they make them [useful for "tracker"-like sites that want all versions and real-time updates]
(2) record forecasts made by teams that are available at a specific time, and use them to build an ensemble. Realistically, for the foreseeable future we might just want to update the ensemble once a week. [useful for standardizing our ensemble]

Here is one proposal for how to do this:

we have the data-processed directory contain all (or nearly all) forecasts from each team. no restrictions on when these forecasts are submitted.
each file is marked with the date the forecast was made. This would loosen our current restriction that these YYYY-MM-DD dates refer only to Mondays. I'm going to refer to this date in the filename as fcast_date below.
we set really clear guidelines for when "1 wk ahead" means epiweek(fcast_date) and when it means epiweek(fcast_date)+1. for example, we say that if weekday(fcast_date) is Thursday, Friday or Saturday, then "1 wk ahead" means epiweek(fcast_date)+1 and otherwise epiweek(fcast_date). (I don't feel that strongly about where the threshold is for switching over. Could be Tuesday, could be Thursday.)
to reinforce this and avoid inadvertent errors in assignment of targets to days/weeks, we could also accept a new column name in the files that would be end_date, so files submitted with fcast_date of 2020-04-23 (thursday of EW 17) or 2020-04-27 (Monday of EW 18) would both have a "1 wk ahead" forecast with end_date of 2020-05-02 (Saturday of EW 18).
on Mondays at a fixed time (6pm ET?) we run an ensemble script that finds all available forecasts from a team made since the preceding Thursday (i.e. 4 days prior) and takes the most recent forecast to include in the ensemble.
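
The Thursday threshold in the proposal above can be sketched as a small date calculation. This is a rough illustration only: the function names are hypothetical, epiweeks are assumed to run Sunday through Saturday, and the Thursday cutoff is the one floated above rather than a settled rule.

```python
from datetime import date, timedelta


def epiweek_end(d):
    """Saturday that closes the Sunday-to-Saturday epiweek containing d."""
    # date.weekday(): Monday=0 ... Saturday=5, Sunday=6
    return d + timedelta(days=(5 - d.weekday()) % 7)


def one_week_ahead_end_date(fcast_date):
    """end_date for the "1 wk ahead" target under the proposed rule:
    Thursday, Friday or Saturday forecasts target the following epiweek,
    all other weekdays target the epiweek of fcast_date itself.
    """
    end = epiweek_end(fcast_date)
    if fcast_date.weekday() in (3, 4, 5):  # Thu, Fri, Sat
        end += timedelta(days=7)
    return end


# The two examples from the proposal map to the same Saturday.
print(one_week_ahead_end_date(date(2020, 4, 23)))  # 2020-05-02
print(one_week_ahead_end_date(date(2020, 4, 27)))  # 2020-05-02
```

Moving the cutoff to a different weekday only changes the tuple in the `if` condition.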

Correct 2020-04-26-UMass-MechBayes.csv

This file includes rownames in the first column, which is non-standard.

Also, the file does not include target_end_date.

Remove Imperial ensemble forecast files from data-raw/ folder

All team forecasts should be in a subdirectory of data-raw/, but these files

https://github.com/reichlab/covid19-forecast-hub/blob/master/data-raw/2020-04-19-Imperial-ensemble1.csv
https://github.com/reichlab/covid19-forecast-hub/blob/master/data-raw/2020-04-19-Imperial-ensemble2.csv

are directly in the data-raw/ folder.

I would create a pull request, but these files differ from the files in the data-raw/Imperial subdirectory, so I'm not sure which versions should be preserved.

Add Metadata to the YYG-ParamSearch Model

@youyanggu could you add a metadata file to the YYG-ParamSearch Model?

See https://github.com/reichlab/covid19-forecast-hub/blob/master/data-processed/UMass-ExpertCrowd/metadata-UMass-ExpertCrowd.txt for an example. We need the metadata in order to visualize the model. Thanks!

write script to convert CU data

add global files to LANL automatic download

As of 2020-04-26 there are global files, with country-level forecasts from LANL, e.g.
https://covid-19.bsvgateway.org/forecast/global/files/2020-04-26/confirmed/2020-04-26_confirmed_quantiles_global_website.csv

These files should be included in the raw data download script for LANL.

Cumulative death predictions below current levels

Georgia, Indiana, Alabama, Arkansas, and Iowa to name a few examples

Add point estimate to UMass ExpertCrowd forecast

@tomcm39

add metadata for models

update IHME target labels

make Imperial cumulative projections

update Imperial/MOBS/LANL target labels

location_name isn't consistent when referring to US in processed data

In processed data, the location is "US", but the location_name can be "US", "United States", or . Specifically, it is "United States" in UTexas data and in Imperial data.

Run CU processing files as soon as undamaged April 12 forecasts are available

I already pushed my code, but it should only be run & results committed once the April 12 forecasts are available. Currently I use the April 9 forecasts to test the code, but these are not actually going to be used.

write script to convert IHME data

make IHME forecasts compatible with new timezero structure

determine feasibility of including additional targets in visualization

Could we use the drop-down menu that we have previously used for "season" to toggle between different forecast targets, e.g. incident deaths, cumulative deaths, hospitalizations, etc...? how much customization would this take?

Summarize Zoltpy validation checks in Wiki

Instructions on how to check file locally
Summary of what checks are currently in place

add additional validations

Some additional validations

Ensure that we check for all column names required by the repo. Right now we require forecast_date and target_end_date, which are not part of Zoltar; can we require these?
Are we validating FIPS locations against the specific set of valid codes, or accepting any numeric string between 01 and 95? I would prefer the former, so that only accepted FIPS codes pass.
Can we institute a more complex check to ensure that people are aligning forecast_date and target_end_date correctly? I will explain more below.
Require point estimates (exactly one point estimate per location/target tuple). We know from Katie's code that the forecast_date column is the same for the entire file (based on filename).
update https://github.com/reichlab/covid19-forecast-hub/wiki/Validation-Checks
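
A minimal sketch of two of these validations (required columns, and exactly one point estimate per location/target tuple) might look like the following. It is not the Zoltpy or hub implementation: the column names and their order are an assumption inferred from an example point-estimate row in another issue, and `validate_forecast_csv` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Assumed column set; the authoritative list lives in the hub's documentation.
REQUIRED_COLUMNS = {
    "forecast_date", "target", "target_end_date", "location",
    "location_name", "type", "quantile", "value",
}


def validate_forecast_csv(text):
    """Return a list of human-readable error strings (empty if none found)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        # Without the required columns the remaining checks cannot run.
        return ["missing columns: " + ", ".join(sorted(missing))]

    rows = list(reader)
    errors = []

    # Exactly one point estimate per (location, target) tuple.
    point_counts = Counter(
        (r["location"], r["target"]) for r in rows if r["type"] == "point"
    )
    for key in sorted({(r["location"], r["target"]) for r in rows}):
        n = point_counts.get(key, 0)
        if n != 1:
            errors.append(f"{key}: expected 1 point estimate, found {n}")

    # forecast_date should be the same for the entire file.
    forecast_dates = {r["forecast_date"] for r in rows}
    if len(forecast_dates) > 1:
        errors.append("multiple forecast_date values: "
                      + ", ".join(sorted(forecast_dates)))
    return errors
```

A FIPS check against an explicit set of accepted codes would fit the same pattern, but the set itself would have to come from the hub's location list.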

Standardize processed data filenames

In addition to the missing fields in #66, the newest MOBS processed data file has a non-standard filename (at least from my perspective), which is causing problems when I read processed data for the shiny data-processing app.

Although I could update the data reading script, I think the real issue is that we don't seem to have a standardized filename for processed data. I was assuming "-" was a reserved character such that the files are named

YYYY-MM-DD-team-model.csv

Can we set this as a standard?
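
If that convention were adopted, a filename check could be sketched as below. The regular expression and the `parse_forecast_filename` helper are hypothetical, and they assume that "-" never appears inside the team or model part of the name.

```python
import re

# YYYY-MM-DD-team-model.csv, with "-" reserved as the separator.
FILENAME_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})-(?P<team>[^-]+)-(?P<model>[^-]+)\.csv$"
)


def parse_forecast_filename(name):
    """Return (date, team, model) or None if the name does not match."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    return match.group("date"), match.group("team"), match.group("model")


print(parse_forecast_filename("2020-04-13-UMass-ExpertCrowd.csv"))
# ('2020-04-13', 'UMass', 'ExpertCrowd')
print(parse_forecast_filename("2020-04-27-MOBS_NEU-GLEAM-COVID-19_v1.csv"))
# None
```

The MOBS filename fails because its model part contains extra "-" characters, which is the kind of ambiguity a reserved separator is meant to rule out.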

Edit point estimate format in UMass-ExpertCrowd

@tomcm39

https://github.com/reichlab/covid19-forecast-hub/blob/master/data-processed/UMass-ExpertCrowd/2020-04-13-UMass-ExpertCrowd.csv

It needs point estimates to be in this format:

2020-04-12,1 day ahead cum death,2020-04-13,31,Nebraska,point,NA,6