
Comments (25)

owahltinez commented on May 13, 2024

Hi @OmarJay1, testing is not widely reported: some countries report it, others don't. Further, what constitutes a "test" depends on the jurisdiction. For example, if 2-3 tests are performed per patient to rule out false negatives, does that count as 1 test or 2-3 tests? Details like these make test data very unreliable, which is why it's not reported here.
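
To make the counting ambiguity concrete, here is a minimal sketch with made-up records (the patient IDs and results are hypothetical, not taken from any real source). The same log gives two different "test" counts depending on whether a jurisdiction counts samples or people:

```python
# Hypothetical test log: one row per sample taken (patient_id, result).
records = [
    ("p1", "negative"),
    ("p1", "negative"),   # repeat test to rule out a false negative
    ("p1", "positive"),
    ("p2", "negative"),
    ("p3", "positive"),
]

def count_samples(rows):
    """Count every sample taken, including repeats on the same patient."""
    return len(rows)

def count_people(rows):
    """Count each patient once, however many samples they gave."""
    return len({patient_id for patient_id, _ in rows})

print(count_samples(records))  # 5
print(count_people(records))   # 3
```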

I would also like to get more data about local circumstances into this dataset. I'm looking into potentially adding some data from Google's mobility report, but I haven't had much time to look into it. Wikipedia has time-series information about events, but it's spread across each country's article on the pandemic, and there's no region-level reporting of those things.

What demographic metadata are you interested in? If I can find a reliable source and it includes region-level data, I'd be happy to add it into the dataset.

from data.

owahltinez commented on May 13, 2024

> These are two sources for tests I've found.

Personally I think it's irresponsible to use unreliable data which can be easy to misinterpret. Even JHU's dataset, which is very liberal with the data it reports, stopped reporting test counts.

> For demographic data, at the very least:

That's not a bad idea. Sadly it will probably only be feasible at the region level for the US, but we should be able to find that info for all country-level datapoints.

> Weather data would be interesting for looking into issues like transmission and temperature, humidity, and pressure. This data is readily available, but sometimes tricky to get.

I'll look into this, I've been wanting to add weather data for a while.

> Other data like mitigation events could be scraped from Wikipedia.

Not unless it is reported using a consistent machine-readable format, like a table of sorts. Unless you want to try to extract that information using NLP, in which case that sounds fairly unreliable.


owahltinez commented on May 13, 2024

FYI I have added mobility and government measures datasets which are relevant to this discussion.


owahltinez commented on May 13, 2024

No worries. In the meantime, I finally was able to add a weather dataset.


owahltinez commented on May 13, 2024

@OmarJay1 I added a screenshot and link to omnimodel.com on the README, let me know if it looks good to you!


OmarJay1 commented on May 13, 2024

These are two sources for tests I've found.

https://en.wikipedia.org/wiki/COVID-19_testing
https://covidtracking.com/data

Dumb question, but for Wikipedia data, is there an automated way to scrape data from their tables?
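
One possible approach, sketched with only Python's standard library: parse the page's HTML and collect the cell text of each `<table>` row. The HTML snippet below is a made-up stand-in for a downloaded page, and the class name `TableExtractor` is hypothetical. Real Wikipedia tables use merged cells, footnotes, and nested markup that this sketch does not handle.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collects rows of cell text from every <table> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = []      # list of tables, each a list of rows
        self._row = None      # cells of the row being read, if any
        self._cell = None     # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

# Made-up sample; in practice this string would be the downloaded page source.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Tests</th></tr>
  <tr><td>Examplestan</td><td>1,234</td></tr>
  <tr><td>Samplonia</td><td>5,678</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(SAMPLE_HTML)
for row in parser.tables[0]:
    print(row)
```

Each table comes back as a list of rows, so the header row can be split off and the remaining rows written to CSV or loaded into whatever analysis tool is in use.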

For demographic data, at the very least:

  1. Age distribution for a region (based on your geo region key).
  2. Race distribution.
  3. Income distribution

I think in the U.S. at least that data can be obtained through Census data.

Weather data would be interesting for looking into issues like transmission and temperature, humidity, and pressure. This data is readily available, but sometimes tricky to get.

Other data like mitigation events could be scraped from Wikipedia.

I plan on gathering all this data myself. I just don't want to recreate anything that's already been done.

Thanks.


OmarJay1 commented on May 13, 2024

@owahltinez, that's awesome that you added mobility and government measures. Sorry, I'm getting bogged down with a few things, but I will respond to the points on this thread ASAP.

Thanks.


OmarJay1 commented on May 13, 2024

Weather data is way cool. The optimist in me hopes this whole thing will literally melt away once it starts getting hot. Data will show if true or not.

Did you see NYC's data page? They now have a "Probable Deaths" category to complicate things even further.

https://www1.nyc.gov/site/doh/covid/covid-19-data.page


aurschmi commented on May 13, 2024

Hey @owahltinez, I really appreciate your dataset and use it to try out forecasting with an LSTM. Unfortunately, the weather data does not seem to be up to date (most countries have no entries after the 26th of April). Do you know why this happens? And could it be improved? :)


OmarJay1 commented on May 13, 2024

@owahltinez, I don't know for sure, but in my experience weather data sources can be very temperamental. I'll take a look at it. I apologize that I started this thread and haven't been very active on it; I do IT and have to help people work from home nowadays.

A couple of questions:

  1. @owahltinez I'm also using RNNs (LSTM or GRU) to classify and analyze sequential Covid data. I'm wondering what the benefits of RNNs are versus other DNN models. Are they just more efficient with large datasets? Right now I'm experimenting with curve classification so I can automatically tell when curves are changing and/or hotspots are appearing. I'll put what I develop on a Git repository.

  2. I've heard that Facebook has US county level Covid data. Does anybody know where that can be found? I looked for it here, but can't seem to find it. Thanks.

https://dataforgood.fb.com


owahltinez commented on May 13, 2024

@aurschmi thanks for the kind words. Your observation about the weather data is correct: the data source I'm using is a couple of days behind. If you can think of a better data source with more up-to-date weather info, I can incorporate it.


owahltinez commented on May 13, 2024

> @owahltinez I'm also using RNNs (LSTM or GRU) to classify and analyze sequential Covid data. I'm wondering what the benefits of RNNs are versus other DNN models. Are they just more efficient with large datasets? Right now I'm experimenting with curve classification so I can automatically tell when curves are changing and/or hotspots are appearing. I'll put what I develop on a Git repository.

I would be surprised if ML can be used effectively to solve this problem when we have decades of research into epidemiology and fine-tuned formulas that take into account things like reinfection, multiple waves, reactions of the population, etc. But I would love to be proven wrong!

> I've heard that Facebook has US county level Covid data. Does anybody know where that can be found? I looked for it here, but can't seem to find it. Thanks.

That's a pretty interesting link; I might add the population density as a table to this repo. I can't find any source claiming that FB has county-level data, but I find the NYT source to be very high-quality and only 1-2 days behind: https://github.com/nytimes/covid-19-data


aurschmi commented on May 13, 2024

@owahltinez Thanks for your reply. I am also not aware of a better source.
@OmarJay1 I agree with owahltinez that, given the still quite sparse and noisy data, it is unlikely that LSTMs will be able to make meaningful predictions. Nevertheless, it is fun to try. :) And if you look at the models used by epidemiologists, you might be surprised that they are not that sophisticated. But of course they are applied by people with experience, which is the added value that other data scientists are missing.


OmarJay1 commented on May 13, 2024

@aurschmi, the models are all wrong, and using real data to demonstrate the failings of the naive exponential models is important. I hope to get some DNN stuff posted this week.


OmarJay1 commented on May 13, 2024

On the subject of DNNs, it's a topic on Kaggle. https://www.kaggle.com/covid19


owahltinez commented on May 13, 2024

> the models are all wrong

@OmarJay1 I very much disagree with this. Here are three examples from a very unsophisticated logistic model for GB, DE and IT that show how well you can fit the data with a minimum amount of variables:

[Three screenshots, taken 2020-05-04: logistic model fits for GB, DE and IT]

I imagine that a generalized logistic function would perform even better, although it may require more data given the larger number of parameters.
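
For readers who want to try this kind of fit, here is a minimal sketch of fitting a three-parameter logistic curve, K / (1 + exp(-r (t - t0))). It is not the code behind the screenshots above: the data is synthetic, the function names are hypothetical, and the fitting method (a scan over candidate plateau values K with a linear regression on the logit for each) is just one simple option that needs no third-party libraries.

```python
import math

def logistic(t, K, r, t0):
    """Logistic curve with plateau K, growth rate r and midpoint t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def fit_logistic(ts, ys, k_candidates):
    """Fit (K, r, t0) by scanning K and regressing logit(y / K) on t.

    For a fixed K, log(y / (K - y)) = r * t - r * t0 is linear in t,
    so r and t0 follow from ordinary least squares. Requires 0 < y < K.
    """
    best = None
    n = len(ts)
    mean_t = sum(ts) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    for K in k_candidates:
        if K <= max(ys):
            continue  # the logit is undefined unless every y is below K
        zs = [math.log(y / (K - y)) for y in ys]
        mean_z = sum(zs) / n
        slope = sum((t - mean_t) * (z - mean_z) for t, z in zip(ts, zs)) / var_t
        intercept = mean_z - slope * mean_t
        r, t0 = slope, -intercept / slope
        # Score each candidate by squared error in the original scale.
        sse = sum((logistic(t, K, r, t0) - y) ** 2 for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, K, r, t0)
    if best is None:
        raise ValueError("every candidate K is at or below the largest observation")
    return best[1:]

# Synthetic cumulative counts generated from known parameters (not real data).
ts = list(range(60))
ys = [logistic(t, 1000, 0.2, 30) for t in ts]

K, r, t0 = fit_logistic(ts, ys, k_candidates=range(800, 1301, 50))
print(K, round(r, 3), round(t0, 2))
```

On this noise-free synthetic series the scan recovers parameters close to the ones used to generate it. With real, noisy data the logit transform puts a lot of weight on the early and late points, so a proper nonlinear least-squares routine would likely be a better choice.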

> On the subject of DNNs, it's a topic on Kaggle. https://www.kaggle.com/covid19

That's for applying NLP to research papers. I honestly don't know what they were thinking; research papers are already hard enough for experts in the field to understand, and there is so much information hidden in very nuanced wording.


OmarJay1 commented on May 13, 2024

Hi @owahltinez, unless I see the actual math, variables, and data behind those charts I can't really evaluate them. What I was getting at is things like in New York where Gov Cuomo grossly over-estimated the number of hospital beds and ventilators.

If you scroll down at the Kaggle page, you'll find this:
https://www.kaggle.com/c/covid19-global-forecasting-week-4


owahltinez commented on May 13, 2024

The math is here. Considering what repo this issue tracker is for, I'll let you guess where the data is coming from ;-)

> What I was getting at is things like in New York where Gov Cuomo grossly over-estimated the number of hospital beds and ventilators.

Fancy maths and ML only worsen the ability for the layperson to understand estimates made by authorities. In my opinion, it's better to have a decent, simple model with a few caveats that are understood rather than a potentially great model with thousands of parameters and many unknowns.


OmarJay1 commented on May 13, 2024

Thanks @owahltinez. I'll take a look at that code.

I totally agree with you on the "fancy maths" issue. What I'm trying to do is sort of "reality check" statements and assumptions by public officials.

For instance the White House is claiming that Covid deaths will reach 3,000 per day by early June.

https://www.cnbc.com/2020/05/04/coronavirus-trump-administration-projects-3000-deaths-per-day-by-june.html

I'm assuming they're talking about the United States, but the article doesn't say for sure.

I'm working on this simple site, which shows that Covid deaths in the US are at about 5/million per day or about 1,650 total, and the curve is trending down. How are we going to get to 3,000 per day in a month? What do simple models suggest?

https://omnimodel.com/graph1?key=US
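
The conversion between the per-million rate and the national total is simple arithmetic. A quick sketch, assuming a US population of roughly 330 million (an assumption implied by the figures above, not a number stated in this thread):

```python
US_POPULATION = 330_000_000  # assumed round figure, not taken from the dataset

def per_million_to_total(rate_per_million, population):
    """Convert a rate per million people into an absolute daily count."""
    return rate_per_million * population / 1_000_000

def total_to_per_million(total, population):
    """Convert an absolute daily count into a rate per million people."""
    return total * 1_000_000 / population

current_daily = per_million_to_total(5, US_POPULATION)
projected_rate = total_to_per_million(3000, US_POPULATION)

print(current_daily)             # 1650.0
print(round(projected_rate, 1))  # 9.1
```

In other words, 3,000 deaths per day would mean the per-million rate nearly doubling from its current level.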

My apologies for the "work in progress" site. I'm doing the best I can with limited time, and I don't guarantee any accuracy.

On the Sweden question which lots of people are talking about, here's an interesting chart.

https://omnimodel.com/graph1?key=US,SE

And again, thank you so much for gathering the data and providing the code.


owahltinez commented on May 13, 2024

> For instance the White House is claiming that Covid deaths will reach 3,000 per day by early June.

That's an... interesting prediction. I did a quick back-of-the-napkin estimate, and I'm not 100% sure that the model works as well for death-related data as for confirmed cases, but it looks like it works reasonably well for cumulative / moving average data. Red is published data, and orange/blue are the model estimates:

[Chart: published cumulative / moving-average data with model estimates]

Daily data is, of course, a lot more spurious so the model does not look like such a great fit but it gives a reasonable expectation of where the moving average should be:

[Chart: published daily data with model estimates]
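
For reference, a minimal sketch of how daily counts and a trailing moving average can be derived from a cumulative series. The window length and the numbers are illustrative assumptions, not the settings or data behind the charts above.

```python
def daily_from_cumulative(cumulative):
    """Difference a cumulative series to get per-day counts."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]

def moving_average(values, window=7):
    """Trailing mean over `window` points; shorter at the start of the series."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged

# Made-up cumulative counts, for illustration only.
cumulative = [0, 10, 30, 35, 70, 80, 120, 125, 160]
daily = daily_from_cumulative(cumulative)
smoothed = moving_average(daily, window=3)

print(daily)     # [10, 20, 5, 35, 10, 40, 5, 35]
print(smoothed)
```

The daily series jumps around from day to day, while the smoothed series varies much less, which is why a model fit is easier to judge against the moving average.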

Based on those estimates, I find the 3,000 daily deaths figure highly unlikely. You can see the method I used here.


owahltinez commented on May 13, 2024

> My apologies for the "work in progress" site

I think that looks very cool! What sources are you using for your data? If it's coming from this repo, I'd be happy to add it to the README whenever you are comfortable sharing that link with everyone that lands in this repo.


OmarJay1 commented on May 13, 2024

Hey Oscar, I'm getting the data from here. That would be cool if you could post the link. I'll link back to this site as the source for the data. I was holding off until making sure it was OK. Thanks.


OmarJay1 commented on May 13, 2024

To make things even more complicated, there appear to be different strains of Covid-19.

"Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2"

https://www.biorxiv.org/content/10.1101/2020.04.29.069054v1.full.pdf

Is the strain in New York and around the Northeast US different from the one on the West Coast? I've read that New York's outbreak probably originated in Italy, while California's probably came from China.


OmarJay1 commented on May 13, 2024

@owahltinez it's awesome. Thank you so much.


owahltinez commented on May 13, 2024

One last update, pertaining to the original title, before closing this issue: the number of tests is now being reported for all regions that have that data available.

Feel free to continue the discussion even after the issue is marked as closed :-)

