
covid-19's Introduction

COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University

On March 10, 2023, the Johns Hopkins Coronavirus Resource Center ceased its collecting and reporting of global COVID-19 data. For updated cases, deaths, and vaccine data please visit the following sources:

For more information, visit the Johns Hopkins Coronavirus Resource Center.


This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).

Visual Dashboard (desktop): https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

Visual Dashboard (mobile): http://www.arcgis.com/apps/opsdashboard/index.html#/85320e2ea5424dfaaa75ae62e5c06e61

Please cite our Lancet Article for any use of this data in a publication: An interactive web-based dashboard to track COVID-19 in real time

The Johns Hopkins University Center for Systems Science and Engineering COVID-19 Dashboard: data collection process, challenges faced, and lessons learned

Provided by Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE): https://systems.jhu.edu/

DONATE to the CSSE dashboard team: https://engineering.jhu.edu/covid-19/support-the-csse-covid-19-dashboard-team/

DATA SOURCES: This is a complete list of all sources ever used in the data set since January 21, 2020. Some sources listed here (e.g. ECDC, US CDC, BNO News) are not currently relied upon as a source of data.

Embed our dashboard into your webpage:

<style>.embed-container {position: relative; padding-bottom: 80%; height: 0; max-width: 100%;} .embed-container iframe, .embed-container object, .embed-container iframe{position: absolute; top: 0; left: 0; width: 100%; height: 100%;} small{position: absolute; z-index: 40; bottom: 0; margin-bottom: -15px;}</style><div class="embed-container"><iframe width="500" height="400" frameborder="0" scrolling="no" marginheight="0" marginwidth="0" title="COVID-19" src="https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6"></iframe></div>

Acknowledgements: We are grateful to the following organizations for supporting our Center’s COVID-19 mapping and modeling efforts: Financial Support: Johns Hopkins University, National Science Foundation (NSF), Bloomberg Philanthropies, Stavros Niarchos Foundation; Resource support: AWS, Slack, Github; Technical support: Johns Hopkins Applied Physics Lab (APL), Esri Living Atlas team

Additional Information about the Visual Dashboard: https://systems.jhu.edu/research/public-health/ncov/

Contact Us:

Terms of Use:

  1. This data set is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) by the Johns Hopkins University on behalf of its Center for Systems Science and Engineering. Copyright Johns Hopkins University 2020.

  2. Attribute the data as the "COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University" or "JHU CSSE COVID-19 Data" for short, and the url: https://github.com/CSSEGISandData/COVID-19.

  3. For publications that use the data, please cite the following publication: "Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Inf Dis. 20(5):533-534. doi: 10.1016/S1473-3099(20)30120-1"

covid-19's People

Contributors

akatz2, arthurzhang434, cssegisanddata, enshengdong, hongru94, tamara-goyea, yuhang065


covid-19's Issues

Data error in 02-13-2020_2115.csv

See the screenshot below: for Hubei, the numbers for Deaths and Recovered are wrong.

Screenshot_2020-02-14 CSSEGISandData COVID-19

See below for a screenshot from qq.com at 6:51 AM EDT, 02-14-2020.

Screenshot_2020-02-14 Real-time updates: latest developments in the COVID-19 outbreak

See below for a screenshot from DXY at 6:56 AM EDT, 02-14-2020.

Screenshot_2020-02-14 Nationwide COVID-19 outbreak real-time updates - DXY · Dingxiang Doctor

Date format issue

When I pull the data into Google Sheets, the date column displays as 37276.4166666667 for the first entry of February 2020. Why?
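
The value 37276.4166666667 looks like a spreadsheet serial date (Google Sheets counts days from 1899-12-30), which suggests the timestamp was auto-parsed as a date instead of being kept as text. Below is a minimal Python sketch for decoding such a serial; `serial_to_datetime` is a hypothetical helper, not something provided by this repository.

```python
from datetime import datetime, timedelta

# Google Sheets counts serial dates in days from 1899-12-30.
SHEETS_EPOCH = datetime(1899, 12, 30)

def serial_to_datetime(serial: float) -> datetime:
    """Convert a spreadsheet serial date (fraction = time of day) to a datetime."""
    # Round to whole seconds to absorb floating-point noise in the fraction.
    return SHEETS_EPOCH + timedelta(seconds=round(serial * 86400))

print(serial_to_datetime(37276.4166666667))  # 2002-01-20 10:00:00
```

The decoded value, 2002-01-20 10:00, is what you would get if a string such as "2/1/20 10:00" were read as year/month/day. One plausible cause (an assumption, since the sheet's locale is not given) is a spreadsheet locale that does not use the month/day/year order found in the files; importing the column as plain text and parsing it explicitly would avoid that.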

Time series data is gone?

The below path is empty

https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series

The below file is clearly broken with rows where province='Confirmed'

https://github.com/CSSEGISandData/COVID-19/blob/master/who_covid_19_situation_reports/who_covid_19_sit_rep_time_series/who_covid_19_sit_rep_time_series.csv

What's going on? What's supposed to be the definitive data source for this now?

Also it would be super nice if you folk didn't randomly make breaking changes with 0 warning. There is a lot of analysis and information tooling downstream of your dataset now. If you need assistance in data management/stewardship open an issue and I or any of the other dozens of SMEs active on this repo can assist with that.

Small problem: calculating death rates

It's tempting to try and calculate how deadly a virus is by just dividing the number of people who have died by the number of people who have died plus the number who have recovered, but there's a problem with the number you get: it's too high.

The calculation needs to compare people who were infected at the same time, because it takes longer to recover than it does to die.

Eventually the simple calculation and the complicated calculation will converge, but we're not there yet. :-/
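
To make the effect concrete, here is a toy Python model with entirely hypothetical numbers: each daily cohort resolves after a fixed delay, with deaths recorded sooner than recoveries. It is only a sketch of the argument above, not an epidemiological method.

```python
def naive_ratio(deaths: float, recovered: float):
    """Deaths divided by closed cases (deaths + recovered); None if none are closed."""
    closed = deaths + recovered
    return deaths / closed if closed else None

def simulate(new_cases, fatality, death_lag, recovery_lag):
    """Return the naive ratio for each day of a toy outbreak.

    A cohort infected on day i contributes its deaths on day i + death_lag
    and its recoveries on day i + recovery_lag (recovery_lag > death_lag).
    """
    deaths = recovered = 0.0
    ratios = []
    for day in range(len(new_cases) + recovery_lag):
        d, r = day - death_lag, day - recovery_lag
        if 0 <= d < len(new_cases):
            deaths += new_cases[d] * fatality
        if 0 <= r < len(new_cases):
            recovered += new_cases[r] * (1 - fatality)
        ratios.append(naive_ratio(deaths, recovered))
    return ratios

# Hypothetical inputs: doubling cohorts, 2% fatality,
# death recorded after 2 days, recovery after 4 days.
ratios = simulate([100, 200, 400], fatality=0.02, death_lag=2, recovery_lag=4)
for day, ratio in enumerate(ratios):
    print(day, "n/a" if ratio is None else f"{ratio:.3f}")
```

In this toy run the ratio starts at 1.0, because only deaths have been recorded, and falls to the assumed 2% only once every cohort has resolved.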

Table Column has changed

As of today, Feb-23 (and also yesterday), I noticed that the tables' column headers have changed. I am not sure, but I also noticed that the class names of the table headers do not correspond to the column values.
Screenshot from 2020-02-12 23-12-49
Screenshot from 2020-02-13 00-52-56

Also I have an open repository here - https://github.com/igoralves1/scrap2019-ncov

I am using puppeteer for async scraping. I just updated it today with the latest changes.
Screenshot from 2020-02-13 00-55-27

add data dictionary to README.md

What does “confirmed” mean? Recently there were discussions on defining confirmed as “tested positive/exhibit symptoms” as opposed to “tested positive/asymptomatic”. Are we looking at daily or cumulative readings? The latter may be obvious, but it would be nice to add a full data description to the repo.

Which time series are correct now? Total figures from earlier days are different

Which daily data are still correct at all?

I am referring to "confirmed" in the time series.

In the Google Spreadsheet, and in the last few days here on GitHub, the data was usually entered twice a day, mostly in the morning and late in the evening. This can be seen in the first row, which gives the date and time.

Now that you have changed it here on GitHub (one total per day), how can the daily total be lower than in the old data (Spreadsheet, GitHub) for the same day, instead of higher?

For example, February 9th. I was comparing saved records. These include the last (February 11th) from Google Spreadsheet and also February 11th from GitHub, as well as February 12th GitHub, yesterday and today GitHub.

If I compare all of them, except for today, the result is the same total sum for the day. 40,536 confirmed. I took the last entry for this on February 9th at 23:20.

If I look at today's data that is available in GitHub and take February 9th, I get only 40,151 confirmed for the calculation.

As you can see, the other data sets had 23:20 as the time of the second measurement on February 9th, so 40 minutes before midnight.

How can the new data sets, with their once-a-day statistics, now show a lower figure? The 40 minutes left until the new day is not long, but it is quite strange that the old end-of-day figure for February 9th is 385 confirmed higher than the new one.

Differences (even big ones) also occur on all other days when I compare them with the data available in GitHub today to previous data sets.

So what is true?

Unfortunately, visualizations by other users who relied on the last measurement of each day are no longer correct. Even if the last measurement was taken many hours before the end of the day, a new once-a-day statistic should not show a lower figure for that day, should it?

github_11te
github_new_now
google_11te

China vs WHO confirmed cases

Will WHO statistics be used for your dataset versus those published from Chinese sources?

China changed its confirmed case statistic to include clinically diagnosed cases (CT scans of patients' lungs to see if they are infected) in addition to laboratory-confirmed cases. The WHO doesn't agree with this change and is publishing statistics based on laboratory-confirmed cases.

Thanks

Github CSV to Google Sheets

I have a pretty intensive set of calculations already set up in my old Google Sheets file. Is there a way to keep updating my Google Sheets file from the new files released twice a day in this repository?

add deaths in graphics

Hi,

Could you add deaths by day to the graphics?

For a better overview: "deaths / recovered / infected".

thanks

Alternative languages

Are there any plans to support multiple languages, especially Chinese and Japanese?

How are transferred individuals counted?

There are special flights out of Wuhan and Japan will let some individuals off of the cruise ship. So how are those transferred individuals counted in the data?

Let's say a cruise ship passenger has the virus and is allowed off the ship, but they die after. Is that a cruise ship death or a Japan death?

Thailand data issue

From files "01-28-2020_1300.csv" to "01-31-2020_1400.csv", Thailand is listed as having 5 "Recovered" cases.

Then in file "02-01-2020_1000.csv", Thailand is listed as having 7 "Recovered" cases, with an update time of "2/1/20 10:00".

Then from files "02-01-2020_1800.csv" to "02-01-07-2020_20204.csv", Thailand is again listed as having 5 "Recovered" cases.

Finally, in file "02-08-2020_1024.csv", Thailand is listed as having 10 "Recovered" cases, with an update time of "2/8/20 12:53".

It appears that either a) file "02-01-2020_1000.csv" should be edited to 5 "Recovered" cases, or b) files "02-01-2020_1800.csv" to "02-01-07-2020_20204.csv" should be edited to 7 "Recovered" cases.
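
A check like the one described here can be automated by flagging any point where a cumulative count goes down. The Python sketch below uses only the values quoted in this issue; `find_decreases` is a hypothetical helper, and the files between the named ones are omitted.

```python
def find_decreases(series):
    """Return (label, previous, current) wherever a cumulative count drops."""
    drops = []
    for (_, prev), (label, cur) in zip(series, series[1:]):
        if cur < prev:
            drops.append((label, prev, cur))
    return drops

# Thailand "Recovered" values as quoted in this issue (explicitly named files only).
thailand_recovered = [
    ("01-28-2020_1300.csv", 5),
    ("01-31-2020_1400.csv", 5),
    ("02-01-2020_1000.csv", 7),
    ("02-01-2020_1800.csv", 5),
    ("02-08-2020_1024.csv", 10),
]

print(find_decreases(thailand_recovered))
# [('02-01-2020_1800.csv', 7, 5)]
```

A drop does not say which file is wrong, only that the two neighbouring values cannot both be correct for a cumulative series.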

Thank you.

Thank you so much for providing us with this data.

I don't see what the issue was with Google Sheets, though. I'm sure anyone who used this data could convert, import, or export it in any way they wanted.

Every time you stop one method of delivery and choose a new one, it brings a new challenge, which is very good for learning new skills. I had never imported and integrity-checked CSV files from GitHub into Google Sheets before.

For the future, though, I would appreciate it if you either kept supporting the previously offered delivery formats or stuck with the current one. Maybe start a Reddit or other "user support" page where people who are unable to get what they need can ask others how to achieve what they want?

If they wanted to download a CSV from your published sheet, it was as easy as creating an empty Google Docs sheet file, importing your 3 tabs into it, and then saving it as a CSV file from there.
Someone would have taken the time to explain to them how to do it, I'm sure.

Again, thank you so much for taking the time to centralize all this data and keeping it up to date for us.

Friendly regards.
A

Contents of csv files are overwritten. Why?

How can it be that, for example, the last data file (daily_case_updates) of 12 February 2020, 22:00 (02-12-2020_2200.csv) contains updates dated 13 February 2020?

Then the data is no longer correct? It's like when I name a file January 10th, but keep updating the contents. Then nothing at all matches anymore.

Bildschirmfoto vom 2020-02-14 01-54-28

currently sick

The dashboard displays 1) "Total Confirmed", 2) "Total Deaths", and 3) "Total Recovered".

Is it possible to show, under "Total Confirmed" (1), the number of people who are currently under treatment?

(For example, "Actual cases" would be Total Confirmed - Total Deaths - Total Recovered.)

Please.
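
The requested figure can already be derived from the three published totals. A minimal Python sketch of that arithmetic follows; the function name and the sample numbers are made up for illustration.

```python
def active_cases(confirmed: int, deaths: int, recovered: int) -> int:
    """Cases still open: confirmed minus deaths minus recovered."""
    active = confirmed - deaths - recovered
    if active < 0:
        raise ValueError("deaths + recovered exceed confirmed; check the inputs")
    return active

# Hypothetical totals, for illustration only.
print(active_cases(confirmed=1000, deaths=20, recovered=300))  # 680
```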

Problem with Singapore data

The Singapore confirmed-case data for the first few days, from 23 Jan to 30 Jan, was changed to 0. From 31 Jan it is correct at 13 cases. Can you change this? Thanks! The file was fine on previous days; the problem only appeared yesterday or today.

Please release data twice a day

I read through the new README and the csv file for 02-14 and I have some concerns: under the new plan, you make the release before 8 AM Beijing time, when most provinces in China have not updated their data yet, and the next release will not come until 24 hours later. (China normally releases data between 8 and 10 AM.) Please make another daily release at around 4 AM UTC if possible. Thanks.

Yokohama Lng

Is the Diamond Cruise Lng right? I think it should be 139.638 (Yokohama) instead of 129.638.

Missing data in `daily updates`

Comparing the Google sheets and the data in the daily update directory of this repository, quite a few datasets appear to be missing. In particular, all of these:

Screen Shot 2020-02-08 at 11 31 28

Was this data faulty or is this simply a synchronization error?

Please add README for new files

Looks like
https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_daily_reports/02-13-2020.csv
is the same as
https://github.com/CSSEGISandData/COVID-19/blob/master/archived_data/daily_case_updates/02-13-2020_1000.csv
with some minor updates, e.g., a 'T' added between the date and time in the 'Last Update' column. So does T mean UTC?

See below screenshots:

Screenshot_2020-02-14 CSSEGISandData COVID-19(1)
Screenshot_2020-02-14 CSSEGISandData COVID-19(2)

More importantly, why did you choose 02-13-2020_1000.csv as the source to create 02-13-2020.csv???

If you need the community's help to clean up the data and make the file names, last update, time zone, etc. consistent, please let us know! I proposed something in this reply, but you chose to ignore it.

If you can do it by yourselves, that's great, and we as the community appreciate it VERY much. I'm afraid you did not do it right and, more importantly, you did it in a rush without giving any warning. I know this is a free source and you are helping the community; I understand and appreciate that very much, but choices like this really hurt. @CSSEGISandData
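
On the 'T' question raised in this issue: in ISO 8601 timestamps the 'T' only separates the date from the time. UTC is indicated by a trailing 'Z' or an explicit offset such as '+00:00'. Whether the 'Last Update' values are actually in UTC is not stated here. A small Python check, using made-up timestamp values:

```python
from datetime import datetime, timezone

naive = datetime.fromisoformat("2020-02-13T10:00:00")        # no zone information
aware = datetime.fromisoformat("2020-02-13T10:00:00+00:00")  # explicit UTC offset

print(naive.tzinfo)                  # None: the 'T' says nothing about the zone
print(aware.tzinfo == timezone.utc)  # True
```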

Created a JSON-based API.

Hello and thank you so much for maintaining this! I have created an API that reads your data and returns it in a way that's more friendly to use in programs. It also supports history. It's still a W.I.P and I would love contributions! It is open-sourced here: https://github.com/ExpDev07/coronavirus-tracker-api. Feel free to use it in your projects!

It is very fast due to caching.

The current endpoints are (more will be added):
https://coronavirus-tracker-api.herokuapp.com/confirmed
https://coronavirus-tracker-api.herokuapp.com/deaths
https://coronavirus-tracker-api.herokuapp.com/recovered

For all of them combined:
https://coronavirus-tracker-api.herokuapp.com/all
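
A minimal Python sketch for calling the endpoints listed above. The helper names are hypothetical, and the layout of the response is not described in this post, so the body is simply decoded as JSON without assuming any particular keys.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://coronavirus-tracker-api.herokuapp.com"
CATEGORIES = ("confirmed", "deaths", "recovered", "all")

def endpoint_url(category: str) -> str:
    """Build the URL for one of the endpoints listed in this post."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return f"{BASE_URL}/{category}"

def fetch(category: str, timeout: float = 10.0):
    """Download an endpoint and decode the body as JSON (layout not assumed)."""
    with urlopen(endpoint_url(category), timeout=timeout) as response:
        return json.load(response)

# Example (requires network access and a running service):
# data = fetch("confirmed")
```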

Time Series - Confirmed

The data point for confirmed cases in Hubei does not change for 6 Feb (and maybe the other provinces). My history from dxy.cn shows Hubei had 22112 confirmed cases in their 6 Feb (CST) update.

Great work putting this together. I've just linked to your tables rather than manually pulling from dxy.

JHU Data Backend Updates (NOTICE)

It makes for a lot of downtime on the developers' end if the file structure is constantly changing without warning. Maybe a notice could allow developers to make changes before the edits go live. :D

An alternate visualization

Thank you very much for your effort and making curated data available to the world! Just wanted to bring to your attention that we have used some of the data that you have generously made available (along with data curated by us) to build a dashboard. This dashboard provides an alternate way of examining surveillance data. In particular:

  • County-level statistics for the United States (click on a State to view), and state/province-level statistics for Canada, Chile, India and Germany;
  • A time slider to view all the historical data;
  • An interactive chart for cumulative and daily number;
  • A visualization of all reported Coronavirus incidence data, filtered by date;
  • A heatmap of selected attributes on an interactive map;
  • A Query tool that allows users to focus on regions of interest;
  • The ability to select regions by clicking on the map; to select multiple regions at once, hold the “command” key on the Mac or the “ctrl” key on Windows while clicking;
  • Users can export subsets of the data for analysis on external tools.

Please see: https://nssac.bii.virginia.edu/covid-19/dashboard/

Documentation:
https://nssac.github.io/covid-19/dashboard/

Time zone

A suggestion: indicate the time zone (Eastern Standard Time, if I remember correctly) or, even better, use UTC.

Keep up the good work!
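
Until the time zone is documented, downstream users can normalize timestamps themselves. The Python sketch below assumes, as this post recalls but the repository does not confirm, that the timestamps are in Eastern Standard Time. It treats EST as a fixed UTC-5 offset and ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Eastern Standard Time as a fixed UTC-5 offset (an assumption; no DST handling).
EST = timezone(timedelta(hours=-5), "EST")

def est_to_utc(naive_local: datetime) -> datetime:
    """Interpret a naive timestamp as EST and return it in UTC."""
    return naive_local.replace(tzinfo=EST).astimezone(timezone.utc)

print(est_to_utc(datetime(2020, 2, 8, 23, 4)).isoformat())
# 2020-02-09T04:04:00+00:00
```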

keep the shape of the data consistent

A column in time_series_2019-ncov-Confirmed.csv used to be named 'First confirmed date in country (est.)' but is now 'First confirmed date in country'. This small change breaks all the downstream analytics.

Besides, the column name is misleading since it contains dates of first confirmed cases in either state/province or in country - depending on which is the smaller administrative unit.

There could be 2 columns:

  1. First confirmed date in province/country <-- with data from current 'First confirmed date in country' column
  2. First confirmed date in country <-- optional, preserves compatibility

The 1st one showing data from former/current column 'First confirmed date in country (est.)'/'First confirmed date in country'.

The 2nd one showing actual first date for the country as a whole. The column is not strictly necessary since people who need it, will add it on their side, but it would preserve backward compatibility with existing analytical solutions.

Either way kindly please keep the names and data consistent because it causes errors and confusion in the analytic pipeline down the line.
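
Until the header is stable, a downstream script can tolerate both spellings. The Python sketch below shows one defensive approach; the helper name and the sample rows are made up and do not reflect the real file contents.

```python
import csv
import io

# Both header spellings mentioned in this issue.
ALIASES = (
    "First confirmed date in country (est.)",
    "First confirmed date in country",
)

def first_confirmed_column(fieldnames):
    """Return whichever known spelling of the column is present in the header."""
    for name in ALIASES:
        if name in fieldnames:
            return name
    raise KeyError(f"none of {ALIASES} found in header")

# Made-up sample standing in for the real CSV file.
sample = (
    "Province/State,Country/Region,First confirmed date in country\n"
    "X,Y,1/1/2020\n"
)
reader = csv.DictReader(io.StringIO(sample))
column = first_confirmed_column(reader.fieldnames)
print([row[column] for row in reader])  # ['1/1/2020']
```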

Feature Request- Change red-only color of map bubbles to green -> red spectrum based on confirmed case rate by region.

I think the red bubbles on the map- currently representing the sheer number of confirmed cases via bubble size, would be even more useful if the color of these bubbles represented rate of change of confirmed cases by region. So, a large green bubble in an area would signify high total quantity of confirmed cases, but with zero growth, and a tiny red bubble would mean few total cases, but high rate of growth. A small legend showing the full color range and the min and / max growth rates would be useful as well.

Using Semicolon as Delimiter

Dear Contributors,

Because some province/state values contain commas, I think it would be much better to use a semicolon as the delimiter (as in the format you used before the current one).

Thank you.
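
If the files quote fields that contain commas (an assumption about how they are written, though it is the usual CSV convention), a CSV-aware parser reads them correctly without a delimiter change. A Python sketch with a made-up row, which also shows how to re-write the data with semicolons for tools that need them:

```python
import csv
import io

# Made-up row: the first field contains a comma and is therefore quoted.
raw = 'Province/State,Country/Region,Confirmed\n"Chicago, IL",US,2\n'

rows = list(csv.reader(io.StringIO(raw)))
print(rows[1])  # ['Chicago, IL', 'US', '2']

# Re-write with a semicolon delimiter, if a downstream tool prefers it.
out = io.StringIO()
csv.writer(out, delimiter=";", lineterminator="\n").writerows(rows)
print(out.getvalue())
```

Splitting lines on "," by hand is what breaks on such rows; the `csv` module handles the quoting either way.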

Data Problems

The data field "Province/States" has rows showing statuses (i.e. confirmed, deaths, severe etc.) along with actual provinces. This is clearly a mistake. Can you fix this?

Also, why did you do away with longitude/latitude? Can you bring it back?

Please commit to a format as to avoid introducing human error into the process.

Infected amount data type (Double/Int)

The .csv files for Confirmed, Dead and Recovered state the numbers as double.
Files in daily_case_updates/ have integer values.
What is the reason behind not having integers everywhere?
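
Whatever the reason, a consumer can accept both notations. The Python sketch below uses a hypothetical helper that reads a count written either as an integer or as a double, and rejects values that are not whole numbers.

```python
def to_count(text: str) -> int:
    """Parse a count written as '45' or '45.0'; reject non-integral values."""
    value = float(text)
    if not value.is_integer():
        raise ValueError(f"not a whole number: {text!r}")
    return int(value)

print(to_count("45.0"), to_count("45"))  # 45 45
```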

Data inconsistencies between dashboard and time_series files

It appears there are some inconsistencies between what the dashboard shows and what the time series CSV files record. If you look at:

2/13/2020 the graph says 59.8k cases in Mainland China while who_covid_19_sit_rep_time_series.csv indicates 46550 and time_series_2019-ncov-Confirmed.csv indicates 63841?

img1

2/11/2020 the graph says 44.3k cases in Mainland China while who_covid_19_sit_rep_time_series.csv indicates 42708 and time_series_2019-ncov-Confirmed.csv indicates 44641?

img2

2/10/2020 the graph says 42.3k cases in Mainland China while who_covid_19_sit_rep_time_series.csv indicates 40235 and time_series_2019-ncov-Confirmed.csv indicates 42310?

img3

2/7/2020 the graph says 34.1k cases in Mainland China while who_covid_19_sit_rep_time_series.csv indicates 31211 and time_series_2019-ncov-Confirmed.csv indicates 34569?

img4

SQL Version of this dataset

We are publishing a SQL Version of this dataset in Dolt, a SQL database with Git-style versioning if anyone is interested.

The Dolt repository can be found here:

https://www.dolthub.com/repositories/Liquidata/corona-virus

We wrote a blog post about SQL Views which also describes how the dataset can be used:

https://www.dolthub.com/blog/2020-02-10-introducing-sql-view-support-in-dolt/

The import job is open source and can be found here:

https://github.com/liquidata-inc/liquidata-etl-jobs/blob/master/airflow_dags/corona-virus/import-data.pl

The import job runs on the hour.

Created a python package to extract data and generate reports 📈

Had this working for the Google Sheets version, but then decided to update it for the GitHub version.

https://github.com/AaronWard/coronavirus-analysis

What does it do?

  • Extracts latest entry for each date from 2019-nCoV
  • creates an aggregated time series dataframe
  • creates summary report csv with information such as currently_infected for a given date
  • report diagrams to visualize the growth in confirmed cases, deaths and recoveries

If you just want to see visualizations, I update this repo daily, so star the repo and check the readme👍

wrong data for Japan

The data for Japan in time_series_2019-ncov-Confirmed.csv looks wrong at 2/5/20 23:00, 2/6/20 9:00, and 2/6/20 14:20 (45 confirmed), because the next value is 25.

Keep the date time format consistent

In all files in time_series, all times follow a specific format (2/5/2020 9:00 AM): %m/%d/%Y %I:%M %p. The last datetime, however, has a different format. The time is now in 24-hour format, and the year has been shortened from 2020 to 20 (2/8/20 23:04). I believe this format is %m/%d/%y %H:%M.
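
A consumer can cope with both header styles by trying each format in turn. This is a minimal Python sketch with a hypothetical helper name. The 24-hour variant is written with %H, since %I is the 12-hour directive and cannot parse an hour of 23.

```python
from datetime import datetime

# 12-hour clock with AM/PM and 4-digit year, then 24-hour clock with 2-digit year.
FORMATS = ("%m/%d/%Y %I:%M %p", "%m/%d/%y %H:%M")

def parse_timestamp(text: str) -> datetime:
    """Try each known column-header format in turn."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text!r}")

print(parse_timestamp("2/5/2020 9:00 AM"))  # 2020-02-05 09:00:00
print(parse_timestamp("2/8/20 23:04"))      # 2020-02-08 23:04:00
```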

Missing column in `time_series_2019-ncov - Recovered.csv`

Thank you so much for putting up the raw data!

I've noticed that in the file time_series_2019-ncov - Recovered.csv, the column with time stamp 1/31/2020 7:00 PM is missing while it's present in the corresponding Death- and Confirmed-files. Would you be able to comment on why that is?

I've noticed that the time stamp 1/31/2020 7:00 PM is missing in the daily update data, as well.

Thanks again!

Daily updates with no changes in Hubei data

Dear @CSSEGISandData, thank you for your work once again.

If I understand correctly, the problem with identical Hubei data in different daily updates (see the image) is probably related to when you publish your updates and when China publishes the Hubei updates (they are sometimes delayed by a few hours). Will you fix this problem, perhaps by publishing your daily update only after the Hubei update is available, or just a few hours later?

Or did I miss something, and is there an easy way to extract the missing Hubei data from your files? There were a lot of changes in the file structure, so maybe I have missed something...

But I see same issues in daily reports and time series - different dates, same Hubei data.

Thousands of people are already using my reports based on your data http://avatorl.org/covid-19/ and I hope

image

Data Connection

I originally connected to this data via Microsoft PowerBI when it was stored in the Google doc, just using the built-in "Get Data - Web Connection."

When the data feed moved to GitHub, I just started pointing PowerBI to the most recently updated CSV file.

It seems that method is no longer working either: instead of a table of data, PowerBI is pulling in just some snippets of HTML code. If anyone has suggestions about where I've gone wrong, I would appreciate it.
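
One possible cause (an assumption, since the URL used is not given) is that the connection points at the github.com page for the file, which is an HTML page, instead of the raw file served from raw.githubusercontent.com. The Python sketch below converts one form to the other; `blob_to_raw` is a hypothetical helper.

```python
from urllib.parse import urlsplit

def blob_to_raw(url: str) -> str:
    """Turn a github.com '/blob/' page URL into its raw.githubusercontent.com form."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: owner / repo / "blob" / branch / path...
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])

page = ("https://github.com/CSSEGISandData/COVID-19/blob/master/"
        "csse_covid_19_data/csse_covid_19_daily_reports/02-13-2020.csv")
print(blob_to_raw(page))
```

The resulting raw URL returns the CSV text itself, which is what a web data connector expects.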
