
covid-19-data's Introduction


Repository that stores datasets used in different COVID CEID projects.

The datasets in the repository were compiled by members of the CEID COVID-19 working group. The data at the top level of the repository have been formatted to be used 'as-is' and are updated often. Data sets were either created by HTML scraping or entered manually. For the automated web scraping, the raw data and scripts are organized into sub-directories. Each data file is described below along with its corresponding sub-directory. The metadata for each data file is in the README of the directory that contains it.

China data

China_casedata
China_TA: Travel advisories or restrictions within China.
Hubei_Evacuation_Repatriation: Reports of evacuations from Hubei province.

US data

US_wikipedia_cases_fatalities/UScases_by_state_wikipedia.csv: Number of new cases in a state by day.
US_wikipedia_cases_fatalities/USfatalities_by_state_wikipedia.csv: Number of new case fatalities in a state by day.

us-state-intervention-data/stateInterventionTimeSeries.csv: Reshaped version of longFormStateInterventions.csv that includes the intervention status of all US states on each date since the beginning of the outbreak. Updated daily.
us-state-intervention-data/longFormStateInterventions.csv: Running summary of interventions at the state level taken from reports and Wikipedia.
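
The relationship between the long-form file and the time-series file can be sketched as below. This is a minimal Python sketch under stated assumptions, not the repository's actual script: the record keys (`state`, `intervention`, `start_date`) and the function name `long_to_time_series` are hypothetical, and an intervention is assumed to stay active from its start date onward.

```python
from datetime import date, timedelta


def long_to_time_series(records, first_day, last_day):
    """Expand long-form intervention records into one row per state per day.

    Each record is a dict with the hypothetical keys 'state', 'intervention',
    and 'start_date' (a datetime.date). An intervention is marked 1 on and
    after its start date and 0 before it.
    """
    states = sorted({r["state"] for r in records})
    interventions = sorted({r["intervention"] for r in records})

    # Keep the earliest start date seen for each (state, intervention) pair.
    starts = {}
    for r in records:
        key = (r["state"], r["intervention"])
        if key not in starts or r["start_date"] < starts[key]:
            starts[key] = r["start_date"]

    rows = []
    day = first_day
    while day <= last_day:
        for state in states:
            row = {"date": day, "state": state}
            for name in interventions:
                start = starts.get((state, name))
                row[name] = int(start is not None and day >= start)
            rows.append(row)
        day += timedelta(days=1)
    return rows
```

Every state gets a row for every date, including dates before any intervention began, which is what makes the reshaped form convenient for joining against daily case counts.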

COVID-19-ILI-forecasting/us_cases_data_weekly.csv: weekly counts of newly reported cases in each MMWR week by state and territory (and NYC) in the United States

us_nursing_homes_HIFLD.csv: locations and some metadata (number of residents, beds, etc.) for nursing homes in all 50 states

us-airports.csv: list and location of airports in the US

us-early-linelist/US_early_linelist.xlsx: Individual infection histories of US COVID19 cases from early in the outbreak that were manually collected from media reports.

Georgia Data

ga-county-intervention-data/countyInterventionTimeSeries.csv: Reshaped version of longFormCountyInterventions.csv that includes the intervention status of all Georgia counties on each date since the beginning of the outbreak. Updated daily.
ga-county-intervention-data/longFormCountyInterventions.csv: Running summary of interventions at the county level taken from reports and Wikipedia.

ga_DGPH_daily_status_report: daily cases, fatalities, and tests in the state of Georgia. Also includes demographic information about cases. GA-DPH-CanvasJS-data-cases-deaths.csv is an alternate scrape of these data.

georgia_icu_beds.csv: number of ICU beds by county for Georgia

Global

global_cases_by_country/worldCases.csv: Number of new cases in a country by day.
global_cases_by_country/worldFatalities.csv: Number of new fatalities in a country by day.

International_TA: Travel advisories announced by country.
Global Health Security Index (GHSI): Index of epidemic preparedness and underlying data
Epidemiological characteristics of COVID-19 and other zoonotics
Canada COVID-19 Case Data: Includes cases, fatalities, recovered, and tested for Canada
global_exposure_locations.csv: information on where cases were exposed to the virus
global_first_case.csv: First case for every ADM1 globally
Global Google Mobility Report: Daily mobility data by country for the globe beginning in Feb 2020. Disaggregated by types of places visited.


How to add new data?

Please follow the data protocols outlined here. Use the template below to add metadata about your dataset to its subdirectory, and add the name of the dataset to this master README.

Data_name

Here is some description of how the data is collected, when it is normally updated, etc.

Metadata:

Source: Link to data source if pulled from a single website (e.g., Wikipedia)

Related subdirectory and/or files

Projects: List/Link related projects

License

See License.txt

Contact John Drake ([email protected]) for questions.

covid-19-data's People

Contributors

mvevans89, rlrichards, arw36, tierney6, renikaul, e3bo, allopole, grighi, lsalvador


covid-19-data's Issues

lookup table of geographic IDs

Create a table that has ID codes and names for different geographic regions so that names are standardized across datasets.

I propose using ISO3 codes for country level (which corresponds with GADM ADM0 GID) and then the GADM GID for lower levels.

To start, we'll get ADM0 for all countries, and up to ADM2 for US, Canada, and China. I hope to update this to eventually be global.
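
One way such a lookup table might be used is sketched below. This is an illustrative Python sketch only: the helper names `build_lookup` and `standardize` and the lists of name variants are hypothetical. The three codes shown are ISO 3166-1 alpha-3 country codes; lower-level GADM GIDs are deliberately left out rather than guessed.

```python
def build_lookup(entries):
    """Map every lower-cased name variant to its standard geographic ID."""
    lookup = {}
    for gid, names in entries.items():
        for name in names:
            lookup[name.strip().lower()] = gid
    return lookup


def standardize(name, lookup):
    """Return the standard ID for a place name, or None if it is unknown."""
    return lookup.get(name.strip().lower())


# Keys are ISO3 country codes; the name variants are illustrative examples.
COUNTRY_NAMES = {
    "USA": ["United States", "United States of America", "US"],
    "CAN": ["Canada"],
    "CHN": ["China", "People's Republic of China"],
}

lookup = build_lookup(COUNTRY_NAMES)
print(standardize("united states of america", lookup))
```

Returning `None` for unknown names, rather than raising, makes it easy to list every unmatched name in a dataset before deciding how to map it.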

ga-county-intervention-data directory

The filenames in this directory do not match conventions. These will be fixed in tonight's pull, but the change could break things downstream for anyone using these data. I don't know who is, though, so they can probably figure it out!

GA_daily_status_report_GDPH.csv - Test data

We started collecting test data today. If we're collecting test data, we might as well also collect the number of positive tests. Test data will be much more informative if we also have positive tests.

create github action for china-dxy script

Get a GitHub Action that works for the china-dxy scraping script. I had something running, but then a merge issue popped up that may have broken it, so I'm just creating this issue so we can keep better track of it.

Georgia case data

Can you add data from 3/21 at 7pm? Currently cases recorded between noon and 7pm are assigned to the next day. Thanks

GA_daily_status_report_GDPH.csv

Both the death and case data for 3/24/20 at 7pm do not agree with the website as of 11:55pm. In particular, the death count is obviously erroneous. Please correct.

State Interventions Metadata

Please add some metadata to the README. There is an empty template at the bottom of the README that you can copy and fill in.

US wiki autoupdate failing.

us-wiki autoupdate : Mon Apr 20 10:04:41 UTC 2020 failed to return well formed data.

Check if wiki table structure has changed, update script, and run manually.
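
A check for "well formed" data could look something like the sketch below. This is a hypothetical Python illustration, not the validation the autoupdate job actually performs: the function name `check_table`, the expected header, and the rule that the first cell of each row is a date label followed by count cells are all assumptions.

```python
def check_table(header, rows, expected_header):
    """Return a list of problems found in a scraped table (empty if none)."""
    problems = []
    if header != expected_header:
        problems.append(f"unexpected header: {header}")
    for i, row in enumerate(rows):
        if len(row) != len(header):
            problems.append(f"row {i}: {len(row)} cells, expected {len(header)}")
            continue
        # Assumed layout: first cell is a date label, the rest are counts.
        # Blank cells are allowed; anything else must be a whole number.
        for cell in row[1:]:
            if cell != "" and not cell.isdigit():
                problems.append(f"row {i}: non-numeric cell {cell!r}")
                break
    return problems
```

A misaligned scrape usually shows up as a wrong cell count or as text landing in a count column, so both are reported with the row index to make the manual follow-up quicker.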

Repo setup

  • set up repo according to R datascience guidelines
  • Add John's UGA email as contact in README (this is for public contacts)
  • make repo public

read_GDPH_daily_status_report.R - problem with data scraping

At the moment the code to scrape data from the GA public health department is not working. The format of the webpage (https://dph.georgia.gov/covid-19-daily-status-report) has changed, and I can't seem to figure out how to extract the HTML. I tried to update the XPaths and Robbie has also tried to play with the CSS, but no success so far. The HTML page is really simple, so it is puzzling that we are not able to do it. I will continue to update the data manually, but if you are experienced in web scraping, any help would be greatly appreciated @arw36 @mvevans89 @allopole @rlrichards

Missing Decatur County?

This is so great! I think y'all are missing Decatur County, GA (at least in the time series), though.

GA DPH pulls are broken as of 2020-06-02

It looks like the previous links that we were using to download data are no longer working. There is now a way to download CSVs of the data, but it does not include the number of tests. I've got something working so far for everything but the test data.

Data License

  • Add the least restrictive license compatible with source datasets and R

Might have overwritten with most recent commit

I pressed the wrong hotkey when I was merging the master updates into my local version. I think I might have overwritten changes to the repo. I suggest reverting to commit 70e5c08 by Liliana and then I can re-push the data I have locally. I would do this myself but 1) I've never actually reverted to a previous commit before (weird) and 2) it seems like that's an administrator function that random contributors shouldn't be doing.

Sorry!!!!!!

Tracking age breakdowns per state

I saw an issue filed about this on the CovidTracking project -- this is worth doing, but would take settling on a set of buckets to map all data into, and building a per-state model that interpolates values for those buckets from the values given [or, for states w/ line lists, perhaps recalculating precisely for the shared buckets]

Then one could ask the outlier states to standardize the buckets they use in their own reporting.

Ping me if there is still interest in this / if you've found a source for this rebucketed data.
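
The rebucketing step described above can be illustrated with a small sketch. This Python example is a simplification under a loudly stated assumption: cases are treated as uniformly spread within each source age bucket, which stands in for the per-state interpolation model the issue proposes. The function name `rebucket` and the bucket boundaries in the usage example are hypothetical.

```python
def rebucket(counts, target_edges):
    """Redistribute counts from source age buckets into shared target buckets.

    counts: list of ((low, high), count) pairs covering ages in [low, high).
    target_edges: ascending list of edges; target bucket i is
        [target_edges[i], target_edges[i + 1]).
    Assumes cases are uniformly distributed within each source bucket.
    """
    totals = [0.0] * (len(target_edges) - 1)
    for (low, high), count in counts:
        width = high - low
        for i in range(len(totals)):
            t_low, t_high = target_edges[i], target_edges[i + 1]
            # Length of the age range shared by the source and target bucket.
            overlap = min(high, t_high) - max(low, t_low)
            if overlap > 0:
                totals[i] += count * overlap / width
    return totals


# Hypothetical state report using buckets 0-19, 20-59, 60-99.
state_counts = [((0, 20), 100), ((20, 60), 400), ((60, 100), 200)]
print(rebucket(state_counts, [0, 10, 50, 100]))
```

The total count is preserved as long as the target edges span all source buckets; states that publish line lists could skip this approximation and count cases per target bucket directly.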

read_UScases_wikipedia.R pulling incorrect table structure

As of this morning, the script is returning a table with some data misaligned. I'm having the same issue with different code accessing the same Wikipedia table. Something might have changed in the wiki table structure that's causing the rvest/xml2 code to return wrong results.

create auto github action for us cases wiki

Create a GitHub Action that will run the script US/US_wikipedia_cases_fatalities/read_UScases_wikipedia.R. This needs to run twice a day, morning and night (specific times not important).

State fatalities

Do we have a dataset for state fatalities? If not, can we scrape this from Wikipedia too?

GA data

Is it possible to also include daily new cases and deaths in the GA data spreadsheet as columns?
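
If the spreadsheet stores cumulative totals (an assumption; the issue does not say), daily new counts can be derived by differencing consecutive rows. A minimal Python sketch, with the hypothetical function name `daily_new`:

```python
def daily_new(cumulative):
    """Convert a series of cumulative totals into daily increments.

    The first day's increment is taken to be the first total itself.
    """
    new = []
    previous = 0
    for total in cumulative:
        new.append(total - previous)
        previous = total
    return new
```

A negative increment in the output would indicate that a cumulative total was revised downward, which is worth flagging rather than silently clipping to zero.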
