Comments (17)

gerbsen avatar gerbsen commented on May 13, 2024 1

I want to have one source for both cases and deaths. Having multiple sources with different dates is confusing. I also need daily updates for the current date, since I want to show other decision makers current numbers. Also the risklayer file seems to have duplicate days per line (as shown in your example above). So the easiest thing for me would be to copy the code which generates the csv from your repo to my dashboard, I guess.

from covid-19-germany-gae.

jgehrcke avatar jgehrcke commented on May 13, 2024

Well, I don't think there is an issue (or at least we haven't properly described it yet). The slight delay is "on purpose", as in "it's fine", as in "it encourages asking the right questions", as you did here. Thanks for asking.

I'd love you to have a look at #93. It makes sense to look at the RKI timeseries data only up to ~3 days ago -- the RKI data for "the last few days" are a little skewed.

To make this a little easier to understand: the RKI data for today should be looked at in about a week, only then it's not really subject to changes anymore :-).

I might look into doing a daily update here, but then people might look at today's data without asking good questions.

What do you think?

gerbsen avatar gerbsen commented on May 13, 2024

gerbsen avatar gerbsen commented on May 13, 2024

Could you tell me, what I would need to do myself for the daily version?

jgehrcke avatar jgehrcke commented on May 13, 2024

> IMHO it makes sense to have daily numbers

We definitely have daily numbers! :-) It's just advisable to choose between Risklayer data and RKI data depending on which time frame you're looking at.

Use the RKI-provided numbers for anything older than 2-3 days, and the Risklayer data set for today and the last 2-3 days. For example, this is the tail end of the Risklayer data set today (Sep 29):

2020-09-28T01:00:00+0000,4703,7689,20074,2366,68653,18677,10576,49693,67398,3341,14195,4257,1154,7098,2604,4061,286539
2020-09-29T01:00:00+0000,4742,7749,20269,2385,69213,18798,10678,50048,67757,3343,14326,4266,1165,7174,2635,4070,288618
2020-09-29T09:00:00+0000,4742,7749,20269,2385,69213,18798,10678,50048,67757,3343,14326,4268,1165,7174,2635,4070,288620

straight from: https://github.com/jgehrcke/covid-19-germany-gae/blob/master/cases-rl-crowdsource-by-state.csv
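
The rule of thumb above (RKI for the older past, Risklayer for the last few days) could be applied on the consuming side roughly as follows. This is only a minimal sketch using the Python standard library, not the tooling from this repository. It assumes that each CSV has one header row, the timestamp in the first column and the Germany-wide total in the last column, as in the rows shown above. The function names `read_totals` and `stitch` and the `recent_days` parameter are made up for illustration.

```python
import csv
import io
from datetime import datetime, timedelta

# Timestamp layout as seen in the rows above, e.g. 2020-09-28T01:00:00+0000.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def read_totals(csv_text):
    """Return a list of (timestamp, total) tuples from CSV text.

    Assumption: one header row, timestamp in the first column,
    Germany-wide total in the last column.
    """
    rows = csv.reader(io.StringIO(csv_text))
    next(rows)  # skip the header row
    return [(datetime.strptime(r[0], TS_FORMAT), int(r[-1])) for r in rows if r]


def stitch(rki, rl, now, recent_days=3):
    """Combine two series: RKI points older than the cutoff, RL points after it.

    `rki` and `rl` are lists of (timestamp, total) tuples, and `now` must be
    a timezone-aware datetime. `recent_days` is a hypothetical knob for the
    "2-3 days" mentioned above.
    """
    cutoff = now - timedelta(days=recent_days)
    older = [p for p in rki if p[0] < cutoff]
    recent = [p for p in rl if p[0] >= cutoff]
    return sorted(older + recent)
```

The cutoff is deliberately a parameter: whether 2 or 3 days is the better choice depends on how quickly the RKI numbers settle.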

> Could you tell me, what I would need to do myself for the daily version?

Which data set (specific CSV file / files) are you interested in generating yourself? The tooling is all in this repository, and we can certainly try to better document how to use it.

jgehrcke avatar jgehrcke commented on May 13, 2024

We can continue discussing, but I'm closing this for now!

gerbsen avatar gerbsen commented on May 13, 2024

> Which data set (specific CSV file / files) are you interested in generating yourself? The tooling is all in this repository, and we can certainly try to better document how to use it.

It seems to me that AG_RKI_SUMS_QUERY_BASE_URL is not defined in the code.

jgehrcke avatar jgehrcke commented on May 13, 2024

> I also need daily updates for the current date, since I want to show other decision makers current numbers.

I sense some frustration here :-). Note that this is a free time project!

At the same time your ask(s) is/are a little ambiguous. I'll try to address your points.

> the risklayer file seems to have duplicate days per line (as shown in your example above)

"Duplicate days per line" is a funny description... aehm, let me clarify: each line reflects one data point. A data point is comprised of a timestamp, and a set of numeric values.

Now, yes, there might be more than one data point (line) per day in the RL data set. But only ever for the last day in the data set.

This is by design: as I said above, the RL data set is supposed to be recent.

The timestamp of each data point is provided with a time resolution of 1 hour. When you consume time series data then in general it's good advice to maybe not expect equidistant samples :-).
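
If a plot or dashboard needs exactly one value per day, one option is to keep only the latest data point of each calendar day instead of dropping the last line. The following is a minimal sketch with the Python standard library, not code from this repository. The helper name `last_point_per_day` is made up, and the sample points are the timestamps and sums from the three RL lines quoted earlier in this thread.

```python
from datetime import datetime

# Timestamp layout of the RL file, e.g. 2020-09-29T09:00:00+0000.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def last_point_per_day(points):
    """Keep only the latest (timestamp, value) pair of each calendar day.

    `points` is an iterable of (iso_timestamp_string, value) in any order.
    The day is taken from the timestamp as written (UTC in the RL file).
    """
    latest = {}
    for ts_text, value in points:
        ts = datetime.strptime(ts_text, TS_FORMAT)
        day = ts.date()
        if day not in latest or ts > latest[day][0]:
            latest[day] = (ts, value)
    return [latest[day] for day in sorted(latest)]


points = [
    ("2020-09-28T01:00:00+0000", 286539),
    ("2020-09-29T01:00:00+0000", 288618),
    ("2020-09-29T09:00:00+0000", 288620),
]
for ts, total in last_point_per_day(points):
    print(ts.isoformat(), total)
```

With the sample above, the 01:00 point for Sep 29 is dropped in favor of the 09:00 point, so the most recent number survives.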

> I want to have one source for both cases and deaths.

RKI and RL data are the same source: Gesundheitsaemter.

Fair enough, but why don't you use the RKI data set for that? I guess you're saying that it does not appear to be 'fresh' enough?

I get that you really want to have the latest RKI data point, and well, yeah. We can do that. :)

> So the easiest thing for me would be to copy the code which generates the csv from your repo to my dashboard

You're absolutely welcome to do with the code you find in this repository whatever you'd like, subject to the License declared in the file header(s). And of course -- if you find a meaningful way to run a CPython interpreter with pandas as part of 'your dashboard' then this sounds like a great approach!

jgehrcke avatar jgehrcke commented on May 13, 2024

The RKI data files now contain the most recent data points.

jgehrcke avatar jgehrcke commented on May 13, 2024

This screenshot shows the RL data set on the left, and the RKI data set on the right, and hopefully clarifies once again how these data sets relate to one another.
[Screenshot from 2020-10-23 14-06-10]

Besides making the point that the RL data set is more accurate for the very recent past, the screenshot also shows again why it makes sense that, to quote myself from above,

> there might be more than one data point (line) per day in the RL data set. But only ever for the last day in the data set.

The screenshot also shows that if you're not particularly interested in today/yesterday, then the RKI data set is more credible (better view into the past).

gerbsen avatar gerbsen commented on May 13, 2024

> > I also need daily updates for the current date, since I want to show other decision makers current numbers.

> I sense some frustration here :-). Note that this is a free time project!

No no, sorry if this is what you heard. I'm entirely grateful for your project and help. :)

> At the same time your ask(s) is/are a little ambiguous. I'll try to address your points.

> > the risklayer file seems to have duplicate days per line (as shown in your example above)

> "Duplicate days per line" is a funny description... aehm, let me clarify: each line reflects one data point. A data point is comprised of a timestamp, and a set of numeric values.

Yeah, I know that the last entry has two dates with different times. It just seemed odd to me that this is only happening for the last line. This screws up my plots a bit, so I just skip the last line :)

> Now, yes, there might be more than one data point (line) per day in the RL data set. But only ever for the last day in the data set.

> This is by design: as I said above, the RL data set is supposed to be recent.

> The timestamp of each data point is provided with a time resolution of 1 hour. When you consume time series data then in general it's good advice to maybe not expect equidistant samples :-).

> > I want to have one source for both cases and deaths.

> RKI and RL data are the same source: Gesundheitsaemter.

> Fair enough, but why don't you use the RKI data set for that? I guess you're saying that it does not appear to be 'fresh' enough?

> I get that you really want to have the latest RKI data point, and well, yeah. We can do that. :)

> > So the easiest thing for me would be to copy the code which generates the csv from your repo to my dashboard

> You're absolutely welcome to do with the code you find in this repository whatever you'd like, subject to the License declared in the file header(s). And of course -- if you find a meaningful way to run a CPython interpreter with pandas as part of 'your dashboard' then this sounds like a great approach!

Any chance you could publish the environment URLs? Also do you have a specific time when you update your files?

jgehrcke avatar jgehrcke commented on May 13, 2024

> Any chance you could publish the environment URLs?

Please have a look at #208 :)

gerbsen avatar gerbsen commented on May 13, 2024

> > Any chance you could publish the environment URLs?

> please have a look at #208 :)

Thank you! Could you also publish the Risklayer URL?

jgehrcke avatar jgehrcke commented on May 13, 2024

> Thank you! Could you also publish the Risklayer URL?

I use the official Risklayer GmbH Google sheet, based on which I constructed a CSV export URL (using the magic ingredient pub?output=csv):

export RISKLAYER_HISTORY_CSV_URL="https://docs.google.com/spreadsheets/d/e/2PACX-1vTiKkV3Iy-BsShsK3DSUeO9Gpen7VwsXM_haCOc8avj1PeoCIWqL4Os-Uza3jWMEUgmTrEizEV-Itq5/pub?output=csv"

no guarantees implied :).
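
For anyone who wants to consume that export directly, here is a minimal sketch with the Python standard library. It is not the code used in this repository. It assumes that the environment variable from above is set and that the export is UTF-8 encoded, and it makes no assumption about the column layout of the sheet. The function names `fetch_csv_rows` and `parse_csv_rows` are made up for illustration.

```python
import csv
import io
import os
import urllib.request


def parse_csv_rows(text):
    """Split CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text)))


def fetch_csv_rows(url, timeout=30):
    """Download a CSV export URL and return its rows.

    Assumption: the response body is UTF-8 encoded text.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    return parse_csv_rows(text)


if __name__ == "__main__":
    # Only fetch when the variable from the `export` line above is set.
    url = os.environ.get("RISKLAYER_HISTORY_CSV_URL")
    if url:
        rows = fetch_csv_rows(url)
        print(f"fetched {len(rows)} rows; first row: {rows[0]}")
```

Since the sheet is maintained by a third party, its layout may change at any time, so it seems wise to inspect the first rows before relying on column positions.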

jgehrcke avatar jgehrcke commented on May 13, 2024

Quick update: we're very close to having multiple updates per day in this repository now; all done automatically.

jgehrcke avatar jgehrcke commented on May 13, 2024

> To make this a little easier to understand: the RKI data for today should be looked at in about a week, only then it's not really subject to changes anymore :-).

This is especially true for the cumulative count of COVID-19-attributed deaths. I have tried to explain some of that in this Twitter thread: https://twitter.com/gehrcke/status/1343602019651760129

jgehrcke avatar jgehrcke commented on May 13, 2024

> Also do you have a specific time when you update your files?

There are now multiple updates per day done automatically @gerbsen.
