jgehrcke / covid-19-germany-gae
COVID-19 statistics for Germany. For states and counties. With time series data. Daily updates. Official RKI numbers.
License: MIT License
Hi! Great repo! I was just wondering if you are updating the data any time soon. It is now a week old. Thanks!
Hi JGehrcke
Fyinfo, I've put some files under https://gist.github.com/denis-bz
0-covid19-per100000-perweek-allgermany.md
covid19_weeks.py
ags_place_pop.py
destatis-ags-place-pop.csv -- from a destatis .csv, ags -> Pop Place Land
121may-covid19_weeks.log
No plots --
what do you think of the plot Munich + 6 Landkreise from last week ?
What plots on the web give any insight at all on causes ?
cheers
-- denis-bz-py t-online.de
Self-documenting is the best. :)
Relates to #6
human-readable is better, bandwidth is not so much a concern
This might be a little too sensational, of limited use
I'm so glad you use English in your repo, even if it's nation-specific. See
opencovid19-fr/data#141
pcm-dpc/COVID-19#284
I'm starring your repo BIG time!!!
Now, can you imagine exchanging information, experience, etc. with these repos? Would be really great.
Without this header, the API is not usable by web applications because of the same-origin policy.
Error message in Chrome, for example:
Access to XMLHttpRequest at 'https://covid19-germany.appspot.com/timeseries/DE-BW/cases' from origin 'http://localhost:4200' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
See also: https://www.w3.org/wiki/CORS_Enabled#Why_is_CORS_important.3F
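A minimal sketch of the fix, assuming the API runs as a WSGI application (the wrapper name `cors_middleware` is hypothetical, not code from this repo). It wraps the app so that every response carries an `Access-Control-Allow-Origin` header; `*` allows any origin, and a specific origin string can be passed instead.

```python
from typing import Callable, Iterable


def cors_middleware(app: Callable, allow_origin: str = "*") -> Callable:
    """Wrap a WSGI app so every response carries Access-Control-Allow-Origin."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def start_response_with_cors(status, headers, exc_info=None):
            # Drop any existing ACAO header so the response never carries two.
            headers = [
                (name, value)
                for name, value in headers
                if name.lower() != "access-control-allow-origin"
            ]
            headers.append(("Access-Control-Allow-Origin", allow_origin))
            return start_response(status, headers, exc_info)

        return app(environ, start_response_with_cors)

    return wrapped
```

If the app happens to be a Flask app, the usual pattern would be `app.wsgi_app = cors_middleware(app.wsgi_app)`; the flask-cors extension is an off-the-shelf alternative. Only simple GET requests are covered here; preflighted requests would also need `OPTIONS` handling.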
https://www.saarland.de/254259.htm/ seems to be dead. I didn't find any replacement link though.
Makefile-driven?
I love the progress this repo is making, and I don't have an issue, but rather a question, which I hope someone here can answer: Is there a source in Germany for the data that the Dipartimento della Protezione Civile makes available for Italy? There are two parts, both of which are useful:
I have not seen this anywhere, but it is so obviously important to gauge the progress of the outbreak and the effectiveness of countermeasures, that I have to believe that German authorities also capture this information.
Has anyone seen this reported?
Thanks,
Matt
It looks somewhat incomplete - very obvious around April 8th to April 20th, but other dates as well. Given that RKI did report data for the missing days, is there a reason to exclude it, or is this a simple oversight?
Hey there, this is really, really great work, thank you very much. I've built a little dashboard myself and want to include the numbers for all of Germany. So far it looks like I can get the time series for Germany only via CSV download? Would it be possible for you to add it to the API? The sum seems to be already contained in the data file. Cheers, Daniel
Hi,
Thanks for the project! Like you, I've been very dismayed by the state of the data being published. I don't understand why we can't just have a CSV with an event stream, with each case ID and then the changes to each case.
I'm particularly troubled by the prominence of the "CFR" (case fatality rate) stat in many dashboards. This stat is nearly useless because of the extremely fast growth rate. If the median time from onset to death is 22 days, and cases double every 2.75 days, then while the disease is growing, 255 of every 256 cases will simply be too recent to be "eligible" for mortality (22 / 2.75 = 8 doublings, and 2^8 = 256).
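The arithmetic above can be checked with a short sketch, assuming pure exponential growth with a constant doubling time. The second function is a common rough lag correction (today's deaths divided by the cases from one onset-to-death lag earlier); it is an illustration, not the commenter's method, and `lagged_cfr` is a hypothetical name.

```python
def fraction_too_recent(lag_days: float, doubling_days: float) -> float:
    """Share of cumulative cases reported within the last `lag_days`,
    assuming pure exponential growth with a constant doubling time."""
    doublings = lag_days / doubling_days
    # Cumulative cases at (now - lag) are 2**-doublings of today's total.
    return 1.0 - 2.0 ** (-doublings)


def lagged_cfr(cum_cases: list[int], cum_deaths: list[int], lag_days: int) -> float:
    """Hypothetical lag-adjusted CFR: today's cumulative deaths divided by the
    cumulative cases `lag_days` earlier (daily series, oldest first)."""
    if lag_days >= len(cum_cases):
        raise ValueError("series is shorter than the assumed lag")
    return cum_deaths[-1] / cum_cases[-1 - lag_days]


print(fraction_too_recent(22, 2.75))  # 0.99609375, i.e. 255/256
```

The lagged ratio still ignores the spread of onset-to-death times and reporting delays, which is exactly why onset dates per case would be needed for anything more careful.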
I want to make some more detailed calculations about this, but the problem is that we need to know the time of onset of symptoms for the cases, not simply when they were reported. The RKI figures suggest this information is often recorded, e.g. in Figure 3 here it is available for some cases: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsberichte/2020-03-20-en.pdf?__blob=publicationFile
I suppose I could try to laboriously reconstruct the source figures by zooming in on the PDF, but...wtf. Is the underlying data available somewhere instead?
The public is severely misled about what's going on, because they're looking at this useless CFR figure. I think many decision makers are actually looking at the same picture of things and being misled as well.
What's wrong with the RKI data, both deaths and confirmed cases? It's not updating. Thanks a lot, a fix would be very helpful.
There is a great new official data source that receives reports from each hospital on the intensive-care COVID cases they have (and on free ventilators).
https://www.intensivregister.de/#/intensivregister
Unfortunately, they only provide a daily snapshot here, in the form of a picture of a table, and no historic data. My request has gone unanswered for days now. Any interest in starting to include those numbers?
Hi JGehrcke
would you know of .csv files with the number of people recovered each day / each city ?
What I'm really looking for is the number of people who could infect others, i.e. active cases:
active cases = total nr cases (from your cases-rki-by-ags.csv) - nr recovered (?) - nr died - nr in quarantine (estimate?)
Then one could say e.g. "10 people per 100000 in my area could infect me"
which seems to me a simple way to put the risk in perspective.
What do you think ?
Thanks, cheers
-- denis-bz-py t-online.de
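The proposed risk number can be sketched as follows, assuming per-district totals for cases, recoveries and deaths plus a population figure (for example from a destatis table like the destatis-ags-place-pop.csv mentioned earlier). The function name and the optional quarantine estimate are hypothetical.

```python
def active_per_100k(
    total: int, recovered: int, died: int, population: int, quarantined: int = 0
) -> float:
    """Active (potentially infectious) cases per 100,000 inhabitants:
    total - recovered - died - quarantined, scaled by population."""
    if population <= 0:
        raise ValueError("population must be positive")
    active = total - recovered - died - quarantined
    if active < 0:
        raise ValueError("inconsistent inputs: more cases removed than reported")
    return active * 100_000 / population


# Illustrative numbers only, not real data:
print(active_per_100k(total=1500, recovered=1300, died=50, population=1_500_000))  # 10.0
```

The weak spot is the recovered count: if it is itself estimated by an algorithm rather than reported, the result inherits that uncertainty.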
Hello, I’m looking at the data by state, and the latest date is 25 May. How frequently does the data get refreshed?
Thank you for making this!
Could you please share the Zeit JSON URL you use to get the data? I couldn't find it. I guess it's in a .env file not pushed here.
Does the original data have the numbers per Bundesland?
People are lazy, we all are :-). Maybe make a table linking to the individual https://covid19-germany.appspot.com/timeseries/<state>/<metric> combinations.
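Such a table could be generated rather than maintained by hand. A minimal sketch, assuming the 16 ISO 3166-2 state codes (the DE-BW form used in the CORS error URL above) and the metric names `cases` and `deaths`; `cases` appears in that URL, while `deaths` is an assumption to check against the actual API.

```python
BASE_URL = "https://covid19-germany.appspot.com/timeseries"

# ISO 3166-2 codes of the 16 German states.
STATES = [
    "DE-BB", "DE-BE", "DE-BW", "DE-BY", "DE-HB", "DE-HE", "DE-HH", "DE-MV",
    "DE-NI", "DE-NW", "DE-RP", "DE-SH", "DE-SL", "DE-SN", "DE-ST", "DE-TH",
]
# Assumption: "deaths" is a guessed metric name; "cases" is known to exist.
METRICS = ["cases", "deaths"]


def link_table(states=STATES, metrics=METRICS, base_url=BASE_URL) -> str:
    """Render a Markdown table: one row per state, one link per metric."""
    header = "| State | " + " | ".join(metrics) + " |"
    separator = "|---" * (len(metrics) + 1) + "|"
    rows = [
        f"| {state} | "
        + " | ".join(f"[{m}]({base_url}/{state}/{m})" for m in metrics)
        + " |"
        for state in states
    ]
    return "\n".join([header, separator, *rows])


print(link_table())
```

Generating the table from the same lists the API uses would keep the README in sync whenever a metric is added.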
corona-zahlen-landkreis/corona_landkreis_fallzahlen_scraping#39
Highly insightful, from March 23:
https://fragdenstaat.de/anfrage/meldekette-von-coronavirus-zahlen/
Among other things, to further standardize and speed up this process, the RKI is developing an electronic reporting and information system named "DEMIS". It is intended to make incoming reports, together with their respective processing status, available in real time and without media breaks to all institutions involved in the reporting and transmission chain, in accordance with their respective legal access rights.
Also:
The reports arriving from physicians and laboratories through different channels and in different formats must first be recorded and consolidated by the local health office (Gesundheitsamt) and assessed against the case definitions set by the RKI. The health office transmits the data electronically to the responsible state authority no later than the next working day, and from there they are passed on to the RKI.
NPGEO Corona
https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/917fc37a709542548cc3be077a786c17_0
RKI Corona Landkreise
Last updated 14 hours ago | 412 Records
--
This is a great structured resource, describing the individual Landkreise (LK) with their metadata properties. The case count lags behind, because it reflects the RKI's view of the case count.
But this LK metadata can be correlated with the LK case counts obtained by ZEIT ONLINE or https://github.com/corona-zahlen-landkreis/corona_landkreis_fallzahlen_scraping.
Screenshot:
This table as CSV file: here
The Berliner Morgenpost also seems to parse data from the German federal state health ministries. However, their numbers differ slightly from those of zeit.de:
https://interaktiv.morgenpost.de/corona-virus-karte-infektionen-deutschland-weltweit/
Is there an explanation for why this is the case?
an aggregated endpoint per state is convenient
With the current state of tooling in this repo, we're approaching a point where it's easy to compare time series obtained from different data sources, and where it will be easy to do so continuously (with automation). A simple plot showing the four time series named in the title will reveal a lot about the relationships, differences, and commonalities between the data sources.
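Before plotting, the series have to be aligned on common dates. A minimal sketch, assuming each source has been loaded into a dict from ISO date string to cumulative count (the function name and input format are hypothetical):

```python
def compare_series(a: dict[str, int], b: dict[str, int]) -> list[tuple[str, int, int, int]]:
    """Align two time series on the dates they share and return
    (date, value_a, value_b, a - b) tuples in chronological order."""
    # ISO 8601 date strings (YYYY-MM-DD) sort chronologically as plain strings.
    common = sorted(a.keys() & b.keys())
    return [(day, a[day], b[day], a[day] - b[day]) for day in common]


# Illustrative values only:
rows = compare_series(
    {"2020-03-01": 10, "2020-03-02": 20},
    {"2020-03-02": 18, "2020-03-03": 30},
)
print(rows)  # [('2020-03-02', 20, 18, 2)]
```

Dates present in only one source are dropped here; for a freshness comparison it may be more informative to report them separately.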
Hi J Gehrcke,
just fyinfo, not an issue, here's a plot of Covid-19 cases hospitalized per week, from RKI data:
Seems to me that concentrating on the < 10 % of cases who enter hospital
would be more effective than looking at all cases, 90 % of them mild --
what do you think ?
If you know of a data source for nr. hospitalized per Kreis (the RKI Berichte have only the totals for all Germany), please let me know.
cheers
-- denis
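A per-week plot like this needs daily counts grouped by calendar week. A minimal sketch, assuming a hypothetical input of daily new hospitalisations keyed by ISO date string; it groups by ISO week (Monday to Sunday), which may differ from the week definition used in the RKI reports.

```python
import datetime as dt
from collections import defaultdict


def weekly_sums(daily: dict[str, int]) -> dict[tuple[int, int], int]:
    """Sum daily counts into ISO calendar weeks keyed by (iso_year, iso_week)."""
    weeks: dict[tuple[int, int], int] = defaultdict(int)
    for day, count in daily.items():
        iso = dt.date.fromisoformat(day).isocalendar()
        # iso[0] is the ISO year, iso[1] the ISO week number.
        weeks[(iso[0], iso[1])] += count
    return dict(sorted(weeks.items()))


print(weekly_sums({"2020-03-29": 5, "2020-03-30": 7, "2020-04-05": 3}))
```

Keying on the ISO year as well as the week avoids mixing up week 1 of different years around New Year.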
Could you please provide the death counts in cases-rki-by-ags.csv, like you do in data.csv? That would be very helpful.
Hello,
Your dataset was added to CoronaWhy (https://www.coronawhy.org/) Data Lake on Dataverse as a piece of common COVID-19 data https://datasets.coronawhy.org/dataset.xhtml?persistentId=doi:10.5072/FK2/IJWHDT
Would you be willing to help with the maintenance of your dataset in Dataverse, e.g. adding the relevant metadata and keeping the dataset up-to-date? That will help to make the dataset findable and accessible for the medical science community.
Very cool repo!
It would be nice to also plot the daily new cases in the comparison plot on the landing page. It would help separate true divergence between sources from reporting delay.
Related: #58
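Daily new cases can be derived from the cumulative series that are already plotted. A minimal sketch with hypothetical helper names; the 7-day trailing mean is an optional extra that smooths out weekday reporting effects.

```python
def daily_new(cumulative: list[int]) -> list[int]:
    """Turn a cumulative series (oldest first) into daily increments.
    The first day has no predecessor and is therefore dropped.
    Negative values can occur when a source corrects earlier numbers."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


def rolling_mean(values: list[int], window: int = 7) -> list[float]:
    """Trailing moving average; the first window - 1 entries are omitted."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


print(daily_new([1, 3, 6, 6, 10]))  # [2, 3, 0, 4]
```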
It seems that the RKI data has not been updated since 2020-04-08T17:00:00+0000. Is there any reason for that? I haven't found an issue on the topic, sorry if I missed it.
And thanks for the effort, by the way!
Right now this endpoint still consumes a private spreadsheet behind the scenes. I manually curate that spreadsheet and derive both the endpoint's data and the CSV file from it. Make the API implementation consume the CSV file in the repo instead: a cleaner, more transparent, more robust information flow.
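A minimal sketch of the CSV-backed lookup, assuming a layout with one row per timestamp, a `time_iso8601` column and one integer column per state and metric (for example `DE-BW_cases`). These column names are assumptions to be checked against the repo's actual CSV, and `load_timeseries` is a hypothetical name.

```python
import csv
from typing import IO


def load_timeseries(
    csv_file: IO[str], column: str, time_column: str = "time_iso8601"
) -> list[dict]:
    """Read one column of a CSV file into a list of {"time", "value"} points.
    Rows with an empty value in `column` are skipped."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"unknown column: {column}")
    return [
        {"time": row[time_column], "value": int(row[column])}
        for row in reader
        if row[column]
    ]
```

In a request handler, one would likely parse the file once at startup and keep it in memory; the file is small, and the CSV in the repo then becomes the single source of truth for both the download and the API.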
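Just starting a small list of articles discussing the data quality of data sets. Hope it's fine for you as an "issue".
19.03.2020, Berliner Zeitung, "Corona-Statistik - RKI und Johns Hopkins University: Darum weichen die Fallzahlen voneinander ab" ("Corona statistics: RKI and Johns Hopkins University: why their case numbers differ")
19.03.2020, tagesschau.de, "Zahlen über infizierte Menschen Unterschiedlich, aber nicht falsch" ("Numbers of infected people: different, but not wrong")
22.03.2020, Spiegel, "Verwirrung um Fallzahlen vom Robert Koch-Institut - Doch keine Entwarnung bei Corona-Infektionen" ("Confusion over case numbers from the Robert Koch Institute: no all-clear on corona infections after all")
22.03.2020, tagesschau.de, "Infektionszahlen - Zu früh für einen Trend" ("Infection numbers: too early for a trend")
24.03.2020, Spiegel, "Statistikprobleme beim Coronavirus - Die große Meldelücke" ("Statistics problems with the coronavirus: the big reporting gap")
27.04.2020, ndr, "Corona: Neue Daten stellen Epidemie-Verlauf infrage" ("Corona: new data call the course of the epidemic into question")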
ZEIT ONLINE have updated their data flow. They now pull data from individual Landkreise. Great!
Edit: @coezbek made us aware of a nice crowdsourcing effort coordinated in a gsheet linked here: CSSEGISandData/COVID-19#1008 (comment)
Edit: risklayer uses a crowd-sourcing approach and curates LK-based data in a gsheet
Those data sources are relatively early in the reporting chain (Meldekette), and probably quite credible. Will elaborate in the comments.
Best I could find (thx Berliner Morgenpost):
Thüringen:
Schleswig-Holstein:
Sachsen-Anhalt:
Followed your link from the JHU conversation.
I like your use of GAE for this. I once did a tiny little side project on GAE several years ago, but then didn't really keep up with it (as I remember GAE was stuck on Python 2.6 or 2.7 for what seemed like an eternity). I will have to take a look at the Python part of your repo, to get a sense of what's possible now. Performance seems to be excellent: I am in California, and your API is VERY quick, even from here.
Happy to exchange thoughts on this pandemic, here. Or on Twitter, if you prefer. I followed you there, as well.
germany-sum over time derived from this data set should be in line with the same done for JHU data, and it will deviate from RKI data over time, visualizing what the data freshness chitchat is about.
As of this writing, March 27th, 2020, there is actually no scientific test - as in: a provable and repeatable test - to confirm a case of COVID-19. Here's what the Wikipedia community says:
https://en.wikipedia.org/wiki/Coronavirus_disease_2019#Diagnosis
First, the ambiguous part, which is probably misleading a large part of the public into thinking that COVID-19 can be tested for:
The WHO has published several testing protocols for the disease. The standard method of testing is real-time reverse transcription polymerase chain reaction (rRT-PCR). The test can be done on respiratory samples obtained by various methods, including a nasopharyngeal swab or sputum sample.[62] Results are generally available within a few hours to two days. Blood tests can be used, but these require two blood samples taken two weeks apart and the results have little immediate value.
This is apparently making many readers believe that the paragraph is describing the diagnosis of COVID-19. It's not, because the clarification is right in the next sentence, emphasis mine:
Chinese scientists were able to isolate a strain of the coronavirus and publish the genetic sequence so that laboratories across the world could independently develop polymerase chain reaction (PCR) tests to detect infection by the virus. As of 19 March 2020, there were no antibody tests though efforts to develop them are ongoing.
There is no test for COVID-19 (the disease) but a test for SARS-CoV-2 (the virus).
So what are symptoms of COVID-19?
Diagnostic guidelines released by Zhongnan Hospital of Wuhan University suggested methods for detecting infections based upon clinical features and epidemiological risk. These involved identifying people who had at least two of the following symptoms in addition to a history of travel to Wuhan or contact with other infected people: fever, imaging features of pneumonia, normal or reduced white blood cell count, or reduced lymphocyte count.
In other words, a wide range of symptoms caused by anything from bacteria to viruses.
Moreover:
One study in China found that CT scans showed ground-glass opacities in 56%, but 18% had no radiological findings. Bilateral and peripheral ground glass opacities are the most typical CT findings, though they are non-specific.
So these findings are non-specific, but even aside from that: no country has enough CT capacity to scan every suspected case of COVID-19.
Add to this that the vast majority of published and claimed COVID-19 cases had one or more pre-existing conditions which either weaken the immune-system or attack the lungs, i.e. conditions which would also cause the above symptoms: fever, imaging features of pneumonia, normal or reduced white blood cell count, or reduced lymphocyte count.
I repeat: right now there is no way to prove in any scientific sense that a person infected with SARS-CoV-2 and who also developed e.g. pneumonia is actually infected by COVID-19, or if they were "only" infected by SARS-CoV-2, fought off the infection by SARS-CoV-2 but developed the above symptoms due to an unrelated infection with other bacteria or viruses. Which would actually explain why the vast majority of people infected by SARS-CoV-2 survive without developing any of the above symptoms, and why the vast majority of confirmed or suspected COVID-19 cases had pre-existing conditions, namely conditions causing the exact same symptoms.
Here's how the Robert-Koch-Institute confuses the situation even more:
Clinical aspects
Clinical information is available for 26,250 of the notified cases, of which 870 cases were reported as not having any symptoms considered significant for COVID-19. The most common manifestations are cough (14,202; 54%), fever (10,784; 41%), rhinorrhoea (6,158; 23%) and pneumonia (429; 2%). Hospitalisation was reported in 2,664 (10%) of the 26,563 COVID-19 cases with data available. An estimated 5,900 persons have recovered from their COVID-19 infection. Cases were considered to have recovered if they had a known onset of disease on or before 12/03/2020, were not reported to have pneumonia or dyspnea, did not require hospitalisation or had already been discharged and did not die. Cases were included in the algorithm only if information on date of illness onset, symptoms, hospitalisation status and vital status were available.
So the RKI is counting people infected by SARS-CoV-2 who had no symptoms of pneumonia or dyspnea and who were not hospitalised, as cases of cured COVID-19, instead of counting them as cases of SARS-CoV-2 infection without developing COVID-19. I guess that's called proof-through-absence-of-evidence? But don't take my word for it, here's what the RKI itself admits in the German-only case definition file, emphasis mine:
Epidemiological confirmation
Epidemiological confirmation, defined as at least one of the following two findings, taking the incubation period into account:
- an epidemiological link to a laboratory-confirmed human infection through human-to-human transmission
- the occurrence of two or more cases of pneumonia (specific clinical picture) in a medical facility, nursing home or retirement home, where an epidemic link is probable or suspected, even without detection of the pathogen.
Please read the last part carefully:
where an epidemic link is probable or suspected, even without detection of the pathogen.
In other words: a COVID-19 infection is to be treated as confirmed if there were two or more cases of pneumonia in a medical facility, nursing home or retirement home, where an epidemiological connection is probable or suspected, even if there is no positive test result for the virus.
And here is what the RKI says about which cases should be reported as confirmed cases of COVID-19, emphasis mine:
Case to be transmitted to the RKI via the responsible state authority
B. Clinically-epidemiologically confirmed disease
Specific clinical picture of COVID-19, without laboratory confirmation, but with epidemiological confirmation (occurrence of two or more cases of pneumonia in a medical facility, nursing home or retirement home).
Specific or non-specific clinical picture of COVID-19, without laboratory confirmation, but with epidemiological confirmation (contact with a confirmed case).
In other words: all cases of pneumonia where two or more cases of pneumonia occurred in a medical facility, nursing home or retirement home where an epidemiological connection is established should be transmitted by state authorities to the RKI as a case of COVID-19, even in the absence of a positive test for SARS-CoV-2 in the patient who developed the pneumonia.
So given the fact that:
the RKI is basically saying that starting in March 2020 all cases of pneumonia in Germany should be reported as COVID-19. Imagine that.
Therefore I think this project, useful as it may be, should correctly label the findings: if they are really just giving positive test numbers for SARS-CoV-2 then it should read that this is what the findings show, nothing more. The numbers should also point out the above: cases of unrelated pneumonia with no testing done for SARS-CoV-2 where the sick person was somehow connected to a confirmed case of SARS-CoV-2 are (mis-)represented as a confirmed case of COVID-19.
Hamburg: case numbers
The link does not point to anything Hamburg-related; instead it shows numbers from NRW.
For convenience! E.g. sum_cases, sum_deaths.
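Such convenience columns can be derived mechanically from the per-state columns. A minimal sketch, assuming a naming scheme where per-region columns end in `_cases` and `_deaths` (for example `DE-BW_cases`); that scheme and the function name are assumptions, not the repo's confirmed layout.

```python
import csv
import io


def add_sum_columns(csv_text: str, metrics=("cases", "deaths")) -> str:
    """Append a sum_<metric> column to every row, summing all columns whose
    name ends in "_<metric>" (assumed naming scheme, e.g. DE-BW_cases)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    sum_names = [f"sum_{m}" for m in metrics]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames + sum_names, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for metric, sum_name in zip(metrics, sum_names):
            row[sum_name] = sum(
                int(row[name])
                for name in fieldnames
                # Skip pre-existing sum columns and empty cells.
                if name.endswith(f"_{metric}") and not name.startswith("sum_") and row[name]
            )
        writer.writerow(row)
    return out.getvalue()
```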
Do you have a link/dataset to convert the column names in the ags files? I figured out that they are Landkreis keys and tried to make my own converter, but most datasets I find are not sufficient.
For example Landkreiskey 1000 => Flensburg
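A minimal converter sketch, assuming a lookup CSV with an `ags` column and a `place` column (for instance something like the destatis-ags-place-pop.csv mentioned earlier; those header names are assumptions). District-level keys (AGS) have five digits, and files that store them as integers lose the leading zero, so both sides are normalized with `zfill(5)` before matching. The function names are hypothetical.

```python
import csv
from typing import IO


def load_ags_names(csv_file: IO[str], ags_column: str = "ags", name_column: str = "place") -> dict[str, str]:
    """Map zero-padded 5-digit district keys (AGS) to place names."""
    return {
        row[ags_column].strip().zfill(5): row[name_column]
        for row in csv.DictReader(csv_file)
    }


def rename_columns(columns: list[str], names: dict[str, str]) -> list[str]:
    """Replace numeric AGS column headers with place names; keep all others."""
    renamed = []
    for col in columns:
        key = col.strip()
        if key.isdigit():
            renamed.append(names.get(key.zfill(5), col))
        else:
            renamed.append(col)
    return renamed
```

Unknown keys are kept as-is, so a mismatch between the lookup table and the data file shows up as leftover numeric headers rather than an exception.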
There seems to be an issue with the updates. Is that on purpose?
Thank you for providing this API!
I'm using your API to simulate COVID-19 transmission for Germany, together with some other data sources: https://github.com/cjolowicz/covid19
This is probably a rather naive model. I am grateful for any feedback, to learn more about modelling virus spread. The core algorithm is here: https://github.com/cjolowicz/covid19/blob/master/src/covid19/simulation.py