covid19datahub / covid19
A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution
Home Page: https://covid19datahub.io
License: GNU General Public License v3.0
The deaths data for the UK are incorrect between 24-May and 01-Jun. On 01-Jun, a historical correction of 445 deaths was introduced to add deaths tested by commercial labs (known as "pillar 2"). In the official government figures this correction was retrospectively applied to the previous reporting dates, 24-May to 31-May. Your data applies the entire 445-death correction to 01-Jun, inflating the correct figure for that day from 111 to an artificially high 556. I assume this happened because you take the latest announced cumulative total, so the detail of the correction was missed.
I suggest refreshing your UK data based on the official DHSC numbers available here:
That will result in the following updates to your UK cumulative deaths:
A similar correction of ~4000 deaths was applied on 29-Apr, and your data does incorporate that one correctly, in the same retrospective way the UK government applied it. Both corrections should receive the same treatment.
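The mechanism described above can be sketched in a few lines. This is not the Data Hub's actual pipeline: the daily values, the three-day window, and the per-day split below are all hypothetical, purely to illustrate lumping a correction onto the announcement date versus spreading it retrospectively.

```python
# Hypothetical daily death counts for the last three reporting days.
daily = {"day1": 100, "day2": 120, "day3": 111}
correction = 445  # historical correction announced on day3

# What the issue describes: the whole correction lumped onto the
# announcement date, inflating that day's figure.
lumped = dict(daily)
lumped["day3"] += correction  # 111 -> 556

# What the official series does: spread the correction back over the
# days it applies to (this per-day split is made up).
split = {"day1": 148, "day2": 148, "day3": 149}
spread = {d: daily[d] + split[d] for d in daily}

# Both series add up to the same total, but only the spread version
# keeps each day's figure plausible.
assert sum(lumped.values()) == sum(spread.values())
```

The takeaway: deriving daily values by differencing the latest cumulative totals silently converts every retrospective correction into a one-day spike.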
Many thanks - happy to offer any additional clarification required.
First, I apologize that this issue is quite long. You can basically see my problem by looking at the code and output blocks at the bottom. I think there may be a problem with COVID19 that did not exist yesterday.
I'm wondering whether something has changed very recently with COVID19, in the `deaths` column. Below is some code that shows unexpected results. I am not sure whether this is a difficulty in how `subset` is working, how `[` is working, or perhaps in the `deaths` column itself. I am not familiar with working with tibbles, having started using R long before they were invented, so maybe both of my methods for extracting data are faulty?
NOTE: I am not querying by ISO codes for country names, because I simply don't know all the codes, whereas I do know the actual names. Also, I'm doing this for nearly 200 countries, and I fear that calling `covid19()` that many times will be slow.
My confusion points are:
- Why do `[` and `subset` give different results?
- Why does `subset` give incorrect results (i.e. the max per country is identical to the max for the world)?
- Why does `[` work so differently for different countries?
As a clue, I am pretty sure the results I am getting this morning are different from those I got yesterday; the previous results were not giving 0 deaths in countries where I know for sure there have been deaths.
The R code
library(COVID19)
d <- covid19(end=Sys.Date()-1)
cat("World:\n ", max(d$deaths), "deaths\n")
for (country in c("Australia", "Canada", "United Kingdom", "United States")) {
  cat(country, ":\n", sep="")
  sub1 <- subset(d, d$country == country)
  cat(" method 1 reveals ", max(sub1$deaths), "deaths\n")
  sub2 <- d[d$country == country, ]
  cat(" method 2 reveals ", max(sub2$deaths), "deaths\n")
}
gives output
World:
56259 deaths
Australia:
method 1 reveals 56259 deaths
method 2 reveals 0 deaths
Canada:
method 1 reveals 56259 deaths
method 2 reveals 0 deaths
United Kingdom:
method 1 reveals 56259 deaths
method 2 reveals 21092 deaths
United States:
method 1 reveals 56259 deaths
method 2 reveals 56259 deaths
Hello,
I cannot find the rows for Kansas City, MO in the administrative level 3 data. Could you point out where they are? How did you handle Kansas City and the counties that the city overlaps?
FYI. In the github repo of NYTimes, it says "Four counties (Cass, Clay, Jackson and Platte) overlap the municipality of Kansas City, Mo. The cases and deaths that we show for these four counties are only for the portions exclusive of Kansas City. Cases and deaths for Kansas City are reported as their own line."
Thanks.
Zheng
Is this intentional or perhaps I'm missing something? The geographic flags from OxCGRT aren't included (reasonable simplification, in my opinion), but that means that, e.g. Italy's schools are listed as closed on 23 February (true for Lombardy, presumably), rather than 4 March. Perhaps worth including the acaps dataset as an alternative? (https://www.acaps.org/projects/covid19/data)
How can it be that the number of confirmed cases in France is 156921 on 2020-04-21 and then decreases to 154715 on the next day, 2020-04-22?
Thanks
County-level death data for the United States seem to have been changed to zero for the majority of counties.
Please fix the latitude and longitude for the United Kingdom and the Netherlands:
Country      Lat      Long
UK           55.3781  3.4360
Netherlands  52.1326  5.2913
The Haiti source has split the data into two files, before and after 5th May. A fix is needed.
https://proxy.hxlstandard.org/data/738954
Missing population source. See #46
No Chinese data was available.
The level 3 data I referenced earlier are now there, but have not been updated since 8/22.
Hi!
I just wanted to report that the positive-test data for Austria in the admin level 1 file has not been updated for three days. Where do you source the data for Austria? JHU has different numbers for the positives, at least in the time series.
JHU:
10-27: 86102
10-28: 89496
10-29: 93949
Covid-19:
2020-10-27: 91386
2020-10-28: 94891
2020-10-29: 94891
2020-10-30: 94891
https://storage.covid19datahub.io/data-1.csv
The official data provider for Austria would be the federal health agency AGES: https://covid19-dashboard.ages.at/dashboard.html and the URL of the relevant data CSV is https://covid19-dashboard.ages.at/data/CovidFaelle_Timeline.csv (filter for "Österreich" in the column "Bundesland" and take the latest value from the column "AnzahlFaelle").
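The filtering step just described can be sketched with Python's csv module. The column names come from the message above; the semicolon delimiter and the sample values are assumptions for illustration, so the real file should be checked.

```python
import csv
import io

# Tiny inline sample mimicking the CovidFaelle_Timeline.csv layout.
# Column names follow the AGES file described above; the values and
# the semicolon delimiter are assumptions.
sample = """Time;Bundesland;AnzahlFaelle
2020-10-28;Wien;400
2020-10-28;Österreich;89000
2020-10-29;Österreich;93000
"""

# Keep only the national rows ("Österreich") and take the latest one.
rows = [r for r in csv.DictReader(io.StringIO(sample), delimiter=";")
        if r["Bundesland"] == "Österreich"]
latest = max(rows, key=lambda r: r["Time"])
print(latest["AnzahlFaelle"])
```

Against the real file, the same two steps (filter on "Bundesland", take the latest "Time") would yield the national figure the dashboard shows.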
I know that there seems to be a problem with data transfer from Austria to ECDC. Do you source it from there?
All the best from Vienna!
Dear Emanuele
Following the CovidR contest, and some problems we have had with the data we are currently using, we have decided to switch to the Covid19datahub project for our dashboard.
https://mirai-solutions.ch/gallery/covid19/
On the top left we will replace the "Data Source" text with your info. A screenshot from the feature branch is attached.
In the ReadMe file we have quoted you and David Ardia, and again referenced your website.
miraisolutions/Covid19#112
Let me know if this seems sufficient as a citation.
I believe we can go live on the master today.
We are also happy to be mentioned in your "usage" page!
https://covid19datahub.io/articles/usage.html
I would like to thank you for the great project you have put in place and for all the extraordinary efforts you and your team have made!
Best regards
Guido Maggio
Dear all, data from Brazil and several other countries in South America are not available.
Kind regards
> covid19(country = "BRA")
# A tibble: 0 x 35
# Groups: id [0]
# … with 35 variables: id <chr>, date <date>, tests <int>, confirmed <int>, recovered <int>, deaths <int>,
# hosp <int>, vent <int>, icu <int>, population <int>, school_closing <int>, workplace_closing <int>,
# cancel_events <int>, gatherings_restrictions <int>, transport_closing <int>,
# stay_home_restrictions <int>, internal_movement_restrictions <int>,
# international_movement_restrictions <int>, information_campaigns <int>, testing_policy <int>,
# contact_tracing <int>, stringency_index <dbl>, iso_alpha_3 <chr>, iso_alpha_2 <chr>,
# iso_numeric <int>, currency <chr>, administrative_area_level <int>, administrative_area_level_1 <chr>,
# administrative_area_level_2 <lgl>, administrative_area_level_3 <lgl>, latitude <dbl>, longitude <dbl>,
# key <lgl>, key_apple_mobility <chr>, key_google_mobility <chr>
The readme file says that the JHU data is at the state and city level, but it's at the state and county level.
Thanks for putting this package together!
Looking at Georgia I notice that all policy variables have increased over time without any decrease. Georgia has relaxed a number of restrictions and this relaxation does not seem to be accounted for.
Could you please shine some light on this?
reshape2 has lots of dependencies, too. And it's not recommended by @hadley; ggplot2 3.3.0 has got rid of it. tidyverse/ggplot2#3639
The level-3 data set has not been updated since July 2. This is the second substantial delay in the past 10 days. Will updates be more regular in the future? My needs require the freshest data possible. Thank you, Ken
Hi,
The "key_numeric" data is incorrect in the admin level 3 US data: it is formatted as an integer instead of a character and is losing digits. These are FIPS codes, which should be 5 digits at the county level. For example, Autauga County in Alabama is FIPS code "01001", but it is entered in the COVID-19 Data Hub as 1001. I'm not familiar with other countries, but similar problems may exist in other geographies as well.
Fixing this would make life easier...thanks for the great product!
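On the consumer side, the workaround is to zero-pad the truncated integer back to five characters. A minimal sketch; `to_fips` is a hypothetical helper name, not part of the Data Hub:

```python
def to_fips(value):
    """Restore a 5-digit county FIPS code that lost leading zeros
    when it was parsed as an integer."""
    return str(value).zfill(5)

# Autauga County, AL: stored as the integer 1001, should be "01001".
print(to_fips(1001))  # -> "01001"
```

Note this only recovers codes whose loss was exactly the leading zeros; the proper fix upstream is to store the column as character in the first place.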
Dear @emanuele-guidotti and COVID-19 Data Hub team,
Thank you for creating this dataset and Python interface.
We are developing a Python package to analyse COVID-19 data with SIR-like models (and an open dataset for Japan).
CovsirPhy: https://github.com/lisphilar/covid19-sir
Currently, we are using different datasets for analysis (the JHU dataset, the OxCGRT dataset, population values). However, I'd like to switch to COVID-19 Data Hub and use the Python interface `covid19dh` as a dependency of CovsirPhy in the next version.
I found comments on this package in your paper. When we use `covid19dh` as a dependency to download the dataset, is it enough to cite the paper as follows?
Guidotti, E., Ardia, D., (2020). COVID-19 Data Hub, Working paper, https://www.researchgate.net/publication/340771808_COVID-19_Data_Hub
Should we show the citation lists (the stdout of `covid19dh.covid19(country=None, verbose=True)`) when users download this dataset?
I'm looking forward to collaborating with you!
Best Regards,
Lisphilar
The update at level 2 is not there, even though the source data are current. This is turning out to be a recurrent and frustrating issue. Sorry, but I just needed to say it, because we are very reliant on your data - we have built our surveillance platform on it.
Since Mar 15, 2020 I've been collecting the data that has been published in press releases, etc. from MINSA (the Ministry of Health of Peru).
You can find that in: https://github.com/jmcastagnetto/covid-19-peru-data/
Also, once they started releasing some data as open data, I've put up a repository with some data cleaning and concordance scripts at: https://github.com/jmcastagnetto/covid-19-peru-limpiar-datos-minsa
The second repo is not as complete as the first one.
Reproducible code below:
data3 <- subset(covid19("US", level=3), state=="North Carolina")
table(data3$city) # first county alphabetically is missing; should be "Alamance"
max(data3$date)   # returns "2020-06-12"
Hi - Thank you for all your work. This is a remarkable contribution!!!
I noticed that in an early version of the R package, covid19, the city and state names were visible from the ID variable. Now it outputs the underlying codes without the place names. Is that a bug? If not, is there a crosswalk to link the codes with the place names?
require("COVID19")
us.city <- covid19("USA", level = 3)
us.city.list <- sort(unique(us.city$id))
us.city.list[1:20]
[1] "0007cb93" "00261c81" "004a8ee7" "0051e968" "006b65bd"
[6] "00738b9f" "0083c472" "008b8a54" "00a1a685" "00b3d68a"
[11] "00b948a7" "00cc6d45" "00cebd4e" "00fc7fbd" "010cd779"
[16] "010e0772" "013e158a" "0141ae45" "0163ccb2" "0171bcbd"
Hi,
thanks for the excellent package! I have a minor suggestion - the `id` column is the ISO3C code for each country:
https://github.com/vincentarelbundock/countrycode
As a matter of fact, in other COVID datasets it's called `iso3c`:
https://joachim-gassen.github.io/2020/03/tidying-the-new-johns-hopkins-covid-19-datasests/
What about renaming it to `iso3c` in this package too? I think it would be a more descriptive column name.
remotes::install_github("covid19datahub/COVID19")
Downloading GitHub repo covid19datahub/COVID19@master
Error in utils::download.file(url, path, method = method, quiet = quiet, :
cannot open URL 'https://api.github.com/repos/covid19datahub/COVID19/tarball/master'
Install COVID19
remotes::install_github("covid19datahub/COVID19")
Downloading GitHub repo covid19datahub/COVID19@master
covid19datahub-COVID19-31935e9/man/figures/apple-touch-icon.png: truncated gzip input
tar.exe: Error exit delayed from previous errors.
Error: Failed to install 'COVID19' from GitHub:
Does not appear to be an R package (no DESCRIPTION)
In addition: Warning messages:
1: In utils::untar(tarfile, ...) :
‘tar.exe -xf "C:\Users\choti\AppData\Local\Temp\Rtmpg1s3Jr\file2bbc2489976.tar.gz" -C "C:/Users/choti/AppData/Local/Temp/Rtmpg1s3Jr/remotes2bbc5fc55a4b"’ returned error code 1
2: In system(cmd, intern = TRUE) :
running command 'tar.exe -tf "C:\Users\choti\AppData\Local\Temp\Rtmpg1s3Jr\file2bbc2489976.tar.gz"' had status 1
Hello,
it would be great if the number of tests realised by country and date could be added. Such as available here https://ourworldindata.org/coronavirus-testing
Best
Hi! Just wanted to flag that the lat/long coordinates for Denmark and France, as delivered with the country timelines, are off. This is probably due to their overseas territories: if you run a geo algorithm that searches for the centre point of a country polygon but forget to delete the overseas polygons first, you end up with a point in the middle of nowhere. So the point for Denmark is somewhere in the North Atlantic and the point for France is in western Africa.
France could be 46 lat 3 long
Denmark could be 56 lat 9 long
(EPSG:3857 Web Mercator)
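The effect described above is easy to reproduce with a naive area-weighted centroid. The coordinates and area weights below are rough illustrative values, not authoritative figures:

```python
def centroid(parts):
    """Area-weighted centroid of (lat, lon, weight) tuples."""
    total = sum(w for _, _, w in parts)
    lat = sum(la * w for la, _, w in parts) / total
    lon = sum(lo * w for _, lo, w in parts) / total
    return lat, lon

# Rough values: metropolitan France vs. French Guiana (area in km2).
metropolitan = (46.6, 2.5, 544000)
guiana = (4.0, -53.0, 84000)

with_overseas = centroid([metropolitan, guiana])
mainland_only = centroid([metropolitan])
# Including the overseas polygon drags the point far south-west of
# metropolitan France, matching the symptom reported above.
```

With these numbers the combined centroid lands near (40.9, -4.9), off the Iberian coast, versus (46.6, 2.5) for the mainland alone, which is why dropping overseas polygons (or hard-coding a mainland point) gives a more useful marker.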
First of all: Thank you very much for this fantastic project. I just wanted to know whether there are issues with the data updates or whether the update cycle has been extended to once a week instead of once a day. The last update on admin level 1 was on 2020-06-24, the last update on US admin level 3 on 2020-06-23. https://covid19datahub.io/articles/data.html
Hi, first of all, thanks for putting together this awesome tool.
I am trying to run an analysis using time-series data for the confirmed cases in the German Länder. I am using the Python API, but I also double-checked with the R API and I get the same result:
covid_germany, _ = covid19(['Germany'], level=2, verbose=False)
print(covid_germany.administrative_area_level_2.unique())
['Bayern' 'Schleswig-Holstein' 'Nordrhein-Westfalen' 'Baden-Württemberg'
'Bremen' 'Hamburg' 'Hessen' 'Rheinland-Pfalz' 'Niedersachsen']
So basically, I can only fetch data for 9 out of the 16 Länder. The missing regions are: ['Saarland', 'Berlin', 'Sachsen-Anhalt', 'Thüringen', 'Brandenburg', 'Sachsen', 'Mecklenburg-Vorpommern']
The source, RKI, seems to report for all Länder, so where is this data getting lost?
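A quick way to compute which regions are absent is a set difference between the canonical list of 16 Länder and whatever the query returns (a sketch; the reported set below is copied from the output above):

```python
# All 16 German Länder (the standard list).
all_laender = {
    "Baden-Württemberg", "Bayern", "Berlin", "Brandenburg", "Bremen",
    "Hamburg", "Hessen", "Mecklenburg-Vorpommern", "Niedersachsen",
    "Nordrhein-Westfalen", "Rheinland-Pfalz", "Saarland", "Sachsen",
    "Sachsen-Anhalt", "Schleswig-Holstein", "Thüringen",
}
# The nine regions returned by the level-2 query above.
reported = {
    "Bayern", "Schleswig-Holstein", "Nordrhein-Westfalen",
    "Baden-Württemberg", "Bremen", "Hamburg", "Hessen",
    "Rheinland-Pfalz", "Niedersachsen",
}
missing = sorted(all_laender - reported)
print(missing)
```

Running this kind of completeness check after each fetch makes silent upstream gaps like this one visible immediately.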
Hi, I'm new and have much to learn. First, thank you for making this very cool package!
Second, I am wondering why there is no population data for countries with id ERI, GPC and MSZ.
Thank you again!
Warning message:
In file(con, "r") : InternetOpenUrl failed: 'Unable to establish a connection with the server'
Hi, the level-3 cleaned CSV data were not loaded yesterday, and no backup was made ( https://storage.covid19datahub.io/data-3.csv ).
Thank you. Ken
I think there was a non-NA `pop` for the United States before, but now it seems to be gone. I wonder if that's a problem with the upstream data, or a result of the name change (which I think was "US" until a few days ago, but I might be remembering back to the days when I used my own code to download the Johns Hopkins data).
library(COVID19)
d <- covid19()
for (country in c("France", "United States", "Canada")) {
  ds <- d[d$country==country,]
  cat(ds$country[1], ds$pop[1], "\n")
}
yields
France 66987244
United States NA
Canada 37058856
No updates to level 2 or 3 data sets since August 16. Thank you.
Hi there.
Thank you for the submission - this is a great resource! This is my review as part of openjournals/joss-reviews#2376. Could you please address the following comments:
I can see the tests directory, but I'm not entirely clear on how to run the tests. I don't code in R, so this might be why. Regardless, could you please add some documentation for running these tests (or make it clearer where this documentation is, if it already exists)? Could you also add some continuous integration for your test suite - Travis or similar would be great.
Please could you include DOIs for the references where you can.
Although the paper is very well written and the summary is nice, the actual software description/purpose isn't included until the final paragraph, which is on the second page of the paper. In terms of readability/impact, maybe you could introduce the Data Hub earlier on? Feel free to ignore this comment if you wish. Secondly, and this might be intentional, there are a few extra full stops after the first mention of Excel!
Many thanks for this package.
I'm wondering whether I'm missing something, as illustrated with the R script and output given below, run using COVID19 as updated a few minutes ago. Note the most recent value of `confirmed`, for example.
I can work around this issue by ignoring today's data if they disagree badly with the data from the day before, but I am pointing this out in case it reveals a problem that you might want to look at. (Or, perhaps, does COVID19 provide a way to skip not-yet-complete data?)
R script
library(COVID19)
old <- world("country")
new <- covid19()
for (country in c("Canada", "United States")) {
cat("#", country, "\n")
print(tail(old[old$country == country, ], 3))
print(tail(new[new$country == country, ], 3))
}
Output
R version 4.0.0 alpha (2020-04-01 r78130)
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(COVID19)
> old <- world("country")
> new <- covid19()
> for (country in c("Canada", "United States")) {
+ cat("#", country, "\n")
+ print(tail(old[old$country == country, ], 3))
+ print(tail(new[new$country == country, ], 3))
+ }
# Canada
# A tibble: 3 x 21
# Groups: id [1]
id date deaths confirmed tests recovered hosp icu vent country
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 CAN 2020-04-13 779 25674 0 107480 0 0 0 Canada
2 CAN 2020-04-14 899 27029 0 116822 0 0 0 Canada
3 CAN 2020-04-15 0 8 0 8210 0 0 0 Canada
# … with 11 more variables: state <lgl>, city <lgl>, lat <dbl>, lng <dbl>,
# pop <int>, pop_14 <dbl>, pop_15_64 <dbl>, pop_65 <dbl>, pop_age <dbl>,
# pop_density <dbl>, pop_death_rate <dbl>
# A tibble: 3 x 21
# Groups: id [1]
id date deaths confirmed tests recovered hosp icu vent country
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 CAN 2020-04-13 779 25674 0 107480 0 0 0 Canada
2 CAN 2020-04-14 899 27029 0 116822 0 0 0 Canada
3 CAN 2020-04-15 0 8 0 8210 0 0 0 Canada
# … with 11 more variables: state <lgl>, city <lgl>, lat <dbl>, lng <dbl>,
# pop <int>, pop_14 <dbl>, pop_15_64 <dbl>, pop_65 <dbl>, pop_age <dbl>,
# pop_density <dbl>, pop_death_rate <dbl>
# United States
# A tibble: 3 x 21
# Groups: id [1]
id date deaths confirmed tests recovered hosp icu vent country
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 USA 2020-04-13 23468 578978 0 0 0 0 0 United…
2 USA 2020-04-14 25770 605948 0 0 0 0 0 United…
3 USA 2020-04-15 0 0 0 0 0 0 0 United…
# … with 11 more variables: state <lgl>, city <lgl>, lat <dbl>, lng <dbl>,
# pop <int>, pop_14 <dbl>, pop_15_64 <dbl>, pop_65 <dbl>, pop_age <dbl>,
# pop_density <dbl>, pop_death_rate <dbl>
# A tibble: 3 x 21
# Groups: id [1]
id date deaths confirmed tests recovered hosp icu vent country
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 USA 2020-04-13 23468 578978 0 0 0 0 0 United…
2 USA 2020-04-14 25770 605948 0 0 0 0 0 United…
3 USA 2020-04-15 0 0 0 0 0 0 0 United…
# … with 11 more variables: state <lgl>, city <lgl>, lat <dbl>, lng <dbl>,
# pop <int>, pop_14 <dbl>, pop_15_64 <dbl>, pop_65 <dbl>, pop_age <dbl>,
# pop_density <dbl>, pop_death_rate <dbl>
The level 2 region data is lost at level 3; I'm pretty sure this wasn't the case before.
Previously you could download the level 3 GBR data, filter level 2 by England, and then get all the level 3 regions in England. Now if you pull the level 3 data, the `administrative_area_level_2` column is empty, so there's no way to select a level 2 area and then filter level 3 by that selection.
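The workflow the paragraph describes can be sketched on a toy extract. The two column names follow the Data Hub schema; the rows and place names are made-up examples:

```python
# Toy level-3 rows; in the real data each row also carries dates,
# counts, etc. The place names here are made-up examples.
rows = [
    {"administrative_area_level_2": "England",  "administrative_area_level_3": "Cornwall"},
    {"administrative_area_level_2": "England",  "administrative_area_level_3": "Devon"},
    {"administrative_area_level_2": "Scotland", "administrative_area_level_3": "Fife"},
]

# Select a level 2 area, then list its level 3 regions - exactly the
# step that breaks when the level 2 column is empty.
england = sorted(r["administrative_area_level_3"] for r in rows
                 if r["administrative_area_level_2"] == "England")
print(england)
```

If the `administrative_area_level_2` values were blank, the filter above would return nothing, which is the regression being reported.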
https://storage.covid19datahub.io/data-3.csv is missing - nothing is found there at the moment.
Level 2 data have not been updated since August 2. The NYTimes source data are updated to 8/4 at levels 2 and 3, as of noon on 8/5, as I write this.
Hi
Could you please let me know why the Austria confirmed cases are lower than the published ones, and why the recovered cases are higher than the confirmed cases?
Thanks
ds_opencovid_fr <- function(level=1, cache=cache){
  # Montemurro Paolo 11 05 2020
  # Libraries
  library(dplyr) # You can import different libraries!
  # Download data
  url <- "https://raw.githubusercontent.com/opencovid19-fr/data/master/dist/chiffres-cles.csv"
  x <- read.csv(url, cache=cache) # To test this standalone, remove the cache argument
  # Format and rename columns to the Data Hub's variable names
  x$date <- as.Date(x$date)
  x$tests <- x$depistes
  x$confirmed <- x$cas_confirmes
  x$deaths <- x$deces
  x$recovered <- x$gueris
  x$hosp <- x$hospitalises
  x$icu <- x$reanimation
  x <- x[c("date","tests","confirmed","deaths","recovered","hosp","icu","granularite","maille_code","maille_nom")] # Not needed, but cleaner
  # Keep only the relevant administrative level
  if(level==1){ x <- x[x$granularite=="pays",] }
  if(level==2){ x <- x[x$granularite=="region" | x$granularite=="collectivite-outremer",] }
  if(level==3){ x <- x[x$granularite=="departement",] }
  # Cleaning: keep the first observation per date/code, which is more reliable
  x <- x %>% distinct(date, maille_code, .keep_all = TRUE)
  # Done! Don't forget to check your data!
  return(x)
}