Comments (17)
I want to have one source for both cases and deaths. Having multiple sources with different dates is confusing. I also need daily updates for the current date, since I want to show other decision makers current numbers. Also the risklayer file seems to have duplicate days per line (as shown in your example above). So the easiest thing for me would be to copy the code which generates the csv from your repo to my dashboard, I guess.
from covid-19-germany-gae.
Well, I don't think there is an issue (or at least we didn't properly describe it yet), the slight delay is "on purpose" as in "it's fine" as in "it encourages asking the right questions", as you did here. Thanks for asking.
I'd love you to have a look at #93. It makes sense to look at the RKI timeseries data only up to ~3 days ago -- the RKI data for "the last few days" are a little skewed.
To make this a little easier to understand: the RKI data for today should be looked at in about a week, only then it's not really subject to changes anymore :-).
I might look into doing a daily update here, but then people look at today's data without asking good questions.
What do you think?
Could you tell me, what I would need to do myself for the daily version?
IMHO it makes sense to have daily numbers
We definitely have daily numbers! :-) It's just recommendable to choose between Risklayer data and RKI data depending on which time frame you're looking at.
Use the RKI-provided numbers for anything older than 2-3 days, and the Risklayer data set for today and the last 2-3 days. For example, this is the tail end of the Risklayer data set today (Sep 29):
2020-09-28T01:00:00+0000,4703,7689,20074,2366,68653,18677,10576,49693,67398,3341,14195,4257,1154,7098,2604,4061,286539
2020-09-29T01:00:00+0000,4742,7749,20269,2385,69213,18798,10678,50048,67757,3343,14326,4266,1165,7174,2635,4070,288618
2020-09-29T09:00:00+0000,4742,7749,20269,2385,69213,18798,10678,50048,67757,3343,14326,4268,1165,7174,2635,4070,288620
straight from: https://github.com/jgehrcke/covid-19-germany-gae/blob/master/cases-rl-crowdsource-by-state.csv
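If a single continuous series is needed, the rule of thumb above could be sketched in pandas: take RKI rows older than a cutoff and Risklayer rows from the cutoff onward. This is only an illustration, not something the repository is known to do. The function name `stitch_rki_and_rl`, the `recent_days` parameter, and the assumption that both frames share the same columns and a timezone-aware timestamp index are all made up for this example.

```python
import pandas as pd


def stitch_rki_and_rl(rki, rl, now, recent_days=3):
    """Combine two time series frames into one (hypothetical helper).

    Assumes `rki` and `rl` have the same columns and a timezone-aware
    DatetimeIndex, and that `now` is a timezone-aware timestamp.
    """
    # Midnight of "today" minus the window in which RKI data is still in flux.
    cutoff = pd.Timestamp(now).normalize() - pd.Timedelta(days=recent_days)
    older = rki.loc[rki.index < cutoff]    # settled RKI data
    recent = rl.loc[rl.index >= cutoff]    # fresh Risklayer data
    return pd.concat([older, recent]).sort_index()
```

The two data sets may not agree exactly at the seam, so a small step in the combined curve at the cutoff would not be surprising.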
Could you tell me, what I would need to do myself for the daily version?
Which data set (specific CSV file / files) are you interested in generating yourself? The tooling is all in this repository, and we can certainly try to better document how to use it.
Can continue discussing, but closing this for now!
Which data set (specific CSV file / files) are you interested in generating yourself? The tooling is all in this repository, and we can certainly try to better document how to use it.
It seems to me that AG_RKI_SUMS_QUERY_BASE_URL is not defined in the code.
I also need daily updates for the current date, since I want to show other decision makers current numbers.
I sense some frustration here :-). Note that this is a free time project!
At the same time your ask(s) is/are a little ambiguous. I'll try to address your points.
the risklayer file seems to have duplicate days per line (as shown in your example above)
"Duplicate days per line" is a funny description... aehm, let me clarify: each line reflects one data point. A data point is comprised of a timestamp, and a set of numeric values.
Now, yes, there might be more than one data point (line) per day in the RL data set. But only ever for the last day in the data set.
This is by design: as I said above, the RL data set is supposed to be recent.
The timestamp of each data point is provided with a time resolution of 1 hour. When you consume time series data then in general it's good advice to maybe not expect equidistant samples :-).
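When a plot expects one point per day, one option is to keep only the latest data point per calendar day. A minimal sketch, using the three rows quoted earlier in this thread. The header row is not shown there, so the column labels are placeholders, and `latest_sample_per_day` is a hypothetical helper, not part of the repository.

```python
import io

import pandas as pd

# Tail of the RL data set as quoted in this thread (header row not shown there).
SAMPLE = """\
2020-09-28T01:00:00+0000,4703,7689,20074,2366,68653,18677,10576,49693,67398,3341,14195,4257,1154,7098,2604,4061,286539
2020-09-29T01:00:00+0000,4742,7749,20269,2385,69213,18798,10678,50048,67757,3343,14326,4266,1165,7174,2635,4070,288618
2020-09-29T09:00:00+0000,4742,7749,20269,2385,69213,18798,10678,50048,67757,3343,14326,4268,1165,7174,2635,4070,288620
"""


def latest_sample_per_day(df, time_col="time"):
    """Keep the most recent row for each calendar day (UTC)."""
    ordered = df.sort_values(time_col)
    return ordered.groupby(ordered[time_col].dt.date).tail(1)


# Placeholder column labels: "time" for the first column, integers for the rest.
df = pd.read_csv(io.StringIO(SAMPLE), header=None).rename(columns={0: "time"})
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%dT%H:%M:%S%z", utc=True)

daily = latest_sample_per_day(df)
print(daily[["time", 17]])
```

This keeps the 09:00 sample for Sep 29 and drops the 01:00 one, so the newest numbers are kept and no day appears twice.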
I want to have one source for both cases and deaths.
RKI and RL data are the same source: Gesundheitsaemter.
Fair enough, but why don't you use the RKI data set for that? I guess you're saying that it does not appear to be 'fresh' enough?
I get that you really want to have the latest RKI data point, and well, yeah. We can do that. :)
So the easiest thing for me would be to copy the code which generates the csv from your repo to my dashboard
You're absolutely welcome to do with the code you find in this repository whatever you'd like, subject to the License declared in the file header(s). And of course -- if you find a meaningful way to run a CPython interpreter with pandas as part of 'your dashboard' then this sounds like a great approach!
The RKI data files now contain the most recent data points.
This screenshot shows the RL data set on the left, and the RKI data set on the right, and hopefully clarifies once again how these data sets relate to one another.
It makes the point that the RL data set is more accurate for the very recent past, and it shows again why, to quote myself from above,
there might be more than one data point (line) per day in the RL data set. But only ever for the last day in the data set.
The screenshot also shows that if you're not particularly interested in today/yesterday, then the RKI data set is more credible (better view into the past).
I also need daily updates for the current date, since I want to show other decision makers current numbers.
I sense some frustration here :-). Note that this is a free time project!
No no, sorry if this is what you heard. I'm entirely grateful for your project and help. :)
At the same time your ask(s) is/are a little ambiguous. I'll try to address your points.
the risklayer file seems to have duplicate days per line (as shown in your example above)
"Duplicate days per line" is a funny description... aehm, let me clarify: each line reflects one data point. A data point is comprised of a timestamp, and a set of numeric values.
Yeah, I know that the last entry has two dates with different times. It just seemed odd to me that this only happens for the last line. This screws up my plots a bit, so I just skip the last line :)
Now, yes, there might be more than one data point (line) per day in the RL data set. But only ever for the last day in the data set.
This is by design: as I said above, the RL data set is supposed to be recent.
The timestamp of each data point is provided with a time resolution of 1 hour. When you consume time series data then in general it's good advice to maybe not expect equidistant samples :-).
I want to have one source for both cases and deaths.
RKI and RL data are the same source: Gesundheitsaemter.
Fair enough, but why don't you use the RKI data set for that? I guess you're saying that it does not appear to be 'fresh' enough?
I get that you really want to have the latest RKI data point, and well, yeah. We can do that. :)
So the easiest thing for me would be to copy the code which generates the csv from your repo to my dashboard
You're absolutely welcome to do with the code you find in this repository whatever you'd like, subject to the License declared in the file header(s). And of course -- if you find a meaningful way to run a CPython interpreter with pandas as part of 'your dashboard' then this sounds like a great approach!
Any chance you could publish the environment URLs? Also do you have a specific time when you update your files?
Any chance you could publish the environment URLs?
please have a look at #208 :)
Any chance you could publish the environment URLs?
please have a look at #208 :)
Thank you! Could you also publish the Risklayer URL?
Thank you! Could you also publish the Risklayer URL?
I use the official Risklayer GmbH Google sheet, based on which I constructed a CSV export URL (using the magic ingredient pub?output=csv):
export RISKLAYER_HISTORY_CSV_URL="https://docs.google.com/spreadsheets/d/e/2PACX-1vTiKkV3Iy-BsShsK3DSUeO9Gpen7VwsXM_haCOc8avj1PeoCIWqL4Os-Uza3jWMEUgmTrEizEV-Itq5/pub?output=csv"
no guarantees implied :).
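For anyone wiring this into their own tooling, here is a hedged sketch of reading that environment variable and loading the CSV with pandas. The helper name `load_risklayer_history` is hypothetical, and nothing is assumed about the sheet's columns, since its layout is not described in this thread.

```python
import os

import pandas as pd


def load_risklayer_history(env_var="RISKLAYER_HISTORY_CSV_URL"):
    """Load the CSV that the given environment variable points to."""
    url = os.environ.get(env_var)
    if not url:
        raise RuntimeError(f"environment variable {env_var} is not set")
    # pandas.read_csv accepts both local paths and http(s) URLs.
    return pd.read_csv(url)
```

Failing early on a missing variable gives a clearer error than an undefined name somewhere deeper in the code.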
Quick update: we're very close to having multiple updates per day in this repository now; all done automatically.
To make this a little easier to understand: the RKI data for today should be looked at in about a week, only then it's not really subject to changes anymore :-).
This is especially true for the cumulative count of COVID-19-attributed deaths. I have tried to explain some of that in this Twitter thread: https://twitter.com/gehrcke/status/1343602019651760129
Also do you have a specific time when you update your files?
There are now multiple updates per day done automatically @gerbsen.