
oscovida / oscovida


Explore COVID-19 case numbers and deaths related to the coronavirus outbreak 2019/2020 with Pandas in Jupyter notebooks via MyBinder

Home Page: https://oscovida.github.io

License: BSD 3-Clause "New" or "Revised" License

Languages: Jupyter Notebook 99.14%, Python 0.61%, Makefile 0.02%, Dockerfile 0.01%, CSS 0.07%, JavaScript 0.02%, HTML 0.14%, Shell 0.01%
Topics: jupyter, covid-19, python, covid-analysis

oscovida's Introduction

Most content has moved to the temporary website https://oscovida.github.io

The remaining content here is only for developers.



Disclaimer

The plots and code here have been put together by volunteers who have no training in epidemiology. There are likely to be errors in the processing. You are welcome to use the material at your own risk. The license is available.

Acknowledgements

  • Johns Hopkins University provides data for countries
  • Robert Koch Institute provides data for within Germany
  • Open source and scientific computing community for the data tools
  • GitHub for hosting the repository and HTML files
  • Project Jupyter for the Notebook and Binder service
  • The H2020 project Photon and Neutron Open Science Cloud (PaNOSC)

oscovida's People

Contributors

betatim, catchears, fangohr, kirienko, ko-work, robertrosca, slel, tmichela


oscovida's Issues

PR Specific Testing - Website Generation

We could add a GitHub Workflow which executes only on a PR and triggers a website build into a separate staging directory. That would let us see whether website generation still works correctly without having to pull in the branch locally and run the generation ourselves.

This could be done by creating a gh-pages branch on this repository where we build the HTML into a staging directory, so that going to oscovida.github.io/oscovida/staging would show the test-generated website. However, that would end up with the HTML files and plots being committed to this repo and massively increasing its size, and the URL is a tad confusing and could clash with the existing one.

Could also be put into a separate directory on the oscovida.github.io repo itself, but that could get quite confusing as well, and might clash with the self-hosted runner that generates the reports twice a day.

The best option would probably be to make another repository (called staging?); then, with a bit of work, a PR would trigger a build in the staging repository and we could go to oscovida.github.io/staging/PR_NUMBER to see what the website looks like.

Using the debug flag on the report generator limits it to 10 reports per region, which also means this can easily be run on the GitHub-provided runners, as a few dozen reports won't take too long to execute even on the single-core runners. Use something like:

python -m report_generators.cli -r all --debug --log-level INFO

Hungary Missing Death Numbers Break Table Sorting

Since Hungary has no data for deaths, a "missing" string is used instead. This breaks sorting in the all-countries table because the column is then treated as a string column.

An easy fix would be to specify a sentinel value like -1, but at a glance this looks odd if you are not familiar with using negative values to mark errors.

The other option is to find a way to specify a null value for the JS DataTable so that it sorts the missing data correctly and also clearly marks it as missing.
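On the pandas side, one route towards the second option is to keep the column numeric and use NaN for the unknown values. A minimal sketch with made-up numbers; the actual oscovida table-generation code may handle this elsewhere:

```python
import pandas as pd

# Hypothetical all-countries summary table; Hungary's death count is unknown.
table = pd.DataFrame({
    "country": ["Germany", "Hungary", "Spain"],
    "deaths": [9500, "missing", 28000],
})

# Replace the "missing" marker with NaN so the column stays numeric and
# sorts correctly; NaN rows can then be rendered as blank/"n/a" in the HTML.
table["deaths"] = pd.to_numeric(table["deaths"], errors="coerce")
print(table.sort_values("deaths"))
```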

Karlsruhe and Munich

@fangohr wrote:

Halimah approached me to ask why we have two entries for Munich and two entries for Karlsruhe in our list at https://oscovida.github.io/germany-incidence-rate-14day-20cases.html

Halimah, thank you for pointing this out.

I am pretty certain that this is because one of them is a Landkreis (LK) and the other one is the Stadtkreis (SK). So SK Munich should be a subregion of LK Munich.

For the table on https://oscovida.github.io/germany-incidence-rate-14day-20cases.html, we have removed the SK and LK labels from the name of the region to allow alphabetic sorting by district name. But we hadn’t realised that this can cause confusion as you describe.

So, Halimah, I hope I have answered your question.

Robert, Yury, could you update the code to not remove the LK and SK labels, but to move them to the end of the district name, i.e. turn "SK Hamburg" into "Hamburg (SK)"? I think that might be an easy way to solve this.
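A small illustration of the suggested renaming, assuming the district names arrive as plain strings with a leading "SK"/"LK" prefix (this is not the actual oscovida code):

```python
import re

def move_kreis_label(name: str) -> str:
    """Turn 'SK Hamburg' / 'LK München' into 'Hamburg (SK)' / 'München (LK)'."""
    match = re.match(r"^(SK|LK)\s+(.*)$", name)
    if match:
        label, district = match.groups()
        return f"{district} ({label})"
    return name

print(move_kreis_label("SK Hamburg"))   # Hamburg (SK)
print(move_kreis_label("LK München"))   # München (LK)
```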

Create additional regional data sets for Germany

(i) Sum the numbers for all Landkreise in each Bundesland, and add the Bundesländer to the list of regions for Germany.

(ii) Less important: sum all numbers for Germany as a whole, and also add this to the list. (This allows comparing the whole-country data from RKI with the data we already have from JHU.)
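A minimal pandas sketch of both aggregations, using made-up RKI-style records (the real column names in the oscovida pipeline may differ):

```python
import pandas as pd

# Hypothetical RKI-style case records; the real RKI export has per-Landkreis rows.
rki = pd.DataFrame({
    "Bundesland": ["Bayern", "Bayern", "Hamburg"],
    "Landkreis": ["LK München", "SK München", "SK Hamburg"],
    "AnzahlFall": [120, 300, 80],
})

# (i) Sum all Landkreise within each Bundesland.
by_land = rki.groupby("Bundesland")["AnzahlFall"].sum()

# (ii) Sum everything to get a Germany-wide total for comparison with JHU.
germany_total = rki["AnzahlFall"].sum()

print(by_land)
print("Germany total:", germany_total)
```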

Move files for binder in its own repository

For binder, we only need the ipynb subdirectory, the coronavirus package, and requirements.txt and apt.txt.

Currently, this is all in the wwwroot repository (which is large because of the html versions of the notebooks). Binder needs to clone this big repo every time we start a binder session.

Thus:

  • create a new and extra repository for binder files
  • and update generate-countries and other code accordingly

Add RKI/JHU test data to test suite

The RKI just changed the download/access location of their data. This upsets continuous integration. It is also non-trivial, once a new location is found, to check that it provides the right data set.

For clarity, we should add (at least some rows of) the data we expect to download to the repository. This could be part of the tests:

  • download test data (RKI/JHU) and add to test suite

  • change existing tests to use that downloaded data. This tests that the code works with the expected structure of the data, and documents the structure.

  • add additional test to download live data, to be able to test that this works when required. This checks that (i) the data location is correct and (ii) the structure of that file is compatible with the code.
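A sketch of how these tests could look with pytest; the file path, the custom marker, and the `fetch_jhu` import are hypothetical and not part of the existing oscovida test suite:

```python
# test_data_sources.py -- sketch only; names and paths are hypothetical.
import pandas as pd
import pytest

EXPECTED_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}

def test_cached_jhu_structure():
    # A small, committed excerpt of the JHU CSV documents the expected layout.
    df = pd.read_csv("tests/data/jhu_sample.csv")
    assert EXPECTED_COLUMNS.issubset(df.columns)

@pytest.mark.network
def test_live_jhu_download():
    # Optional live test: checks that the download location still works and
    # that the live file matches the structure the code expects.
    from oscovida import fetch_jhu  # hypothetical import
    df = fetch_jhu()
    assert EXPECTED_COLUMNS.issubset(df.columns)
```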

Binder URL broken

In the notebooks generated (such as https://oscovida.github.io/html/South-Africa.html), there is an "Execute this Jupyter Notebook using myBinder" link at the top of the notebooks (which works) and a second one further down in the markdown before cell [5] which reads "click here to use myBinder".

The second one hasn't got the right link inserted (thus leading to a 400 error).

Template Source Acknowledgements

The new, more generic template generation is cool. There is just one minor observation about the "Download of data from Johns Hopkins university:" part at the bottom of the notebooks. Shouldn't this part change dynamically too?
For example, it should mention the Robert Koch Institute when someone clicks on a region in Germany, or the github/sanbrock page when someone is looking at a region in Hungary.

Improve doubling time plots: decouple y-axis for death doubling time from y-axis for cases doubling time

In some countries, the rate of increase in deaths is very different from that for cases. See the plot below.
[Screenshot, 2020-10-14: doubling-time plot with cases and deaths sharing one y-axis]
In this example, we cannot read what the doubling time for infections is at the moment, because the y-axis is chosen to accommodate the maximum death doubling time, which is very large here.

To keep the plot more useful, we could use the left axis to label cases and the right axis for deaths.
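A standalone matplotlib sketch of the left/right-axis idea, using made-up data rather than the actual oscovida plotting functions:

```python
import matplotlib.pyplot as plt
import numpy as np

days = np.arange(60)
cases_doubling = 20 + 5 * np.sin(days / 10)            # made-up data
deaths_doubling = 200 + 400 * np.abs(np.sin(days / 25))  # made-up data

fig, ax_cases = plt.subplots()
ax_deaths = ax_cases.twinx()  # second y-axis sharing the same x-axis

ax_cases.plot(days, cases_doubling, color="C0", label="cases")
ax_deaths.plot(days, deaths_doubling, color="C1", label="deaths")

ax_cases.set_ylabel("doubling time, cases [days]", color="C0")
ax_deaths.set_ylabel("doubling time, deaths [days]", color="C1")
ax_cases.set_xlabel("day")
plt.show()
```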

Deaths doubling time rolling average is extremely sparse

I noticed that for some regions (say, Hamburg) the 7-day rolling mean of the doubling time is so extremely sparse that it is completely useless:

[Screenshot: very sparse rolling mean of the doubling time for Hamburg]

The reason for that is two-fold: first, wherever our function is constant, the double_time_exponential function naturally returns inf according to the formula, which we turn into NaN; second, the rolling function by default just skips those gaps.

I found an easy fix for that, hold on.
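One possible fix along these lines is to let the rolling window produce a value from fewer valid points via min_periods. A sketch with made-up numbers; the fix actually committed may differ:

```python
import numpy as np
import pandas as pd

# Made-up doubling-time series with many NaNs (inf from constant stretches).
doubling = pd.Series([3.0, np.inf, np.inf, 4.0, np.inf, 5.0, np.inf, 6.0])
doubling = doubling.replace(np.inf, np.nan)

# Default rolling mean needs 7 valid points per window, so almost everything is NaN:
print(doubling.rolling(7).mean())

# Allowing fewer valid points per window keeps the curve continuous:
print(doubling.rolling(7, min_periods=1).mean())
```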

Normalise plots by total population

For some comparisons it is useful to talk about the number of infections/deaths per million citizens rather than in absolute numbers. It would be nice to have that option.
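The normalisation itself is a one-liner in pandas; a sketch with made-up case totals and population figures:

```python
import pandas as pd

cases = pd.Series({"Germany": 380_000, "Spain": 900_000})        # made-up totals
population = pd.Series({"Germany": 83_000_000, "Spain": 47_000_000})

cases_per_million = cases / population * 1_000_000
print(cases_per_million.round(0))
```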

Create framework for notebook that is executed everyday

The idea is to host a (set/subdirectory of) notebook(s) somewhere in the repository, which is re-executed every day when the html is re-computed.

This could produce a table or plot that we want to update every day, for example a table of countries, ordered according to incidence.

This could also be used as a playground to explore new analyses before we incorporate them into the html/javascript tables from which countries are selected: it should be much easier to try this in a notebook. Or it could provide a permanent link and a daily-updated source of information if the notebook is a good format.

An additional benefit is that if we include in the notebook some of the analysis leading towards the tables/plots, this can become part of a tutorial that demonstrates what can be done with oscovida.

Task for this issue: enable automatic processing of arbitrary notebooks with a trivial notebook as an example. First thing we could try as an example: compute the mortality rate as requested in issue #90.

Plots not receiving new data?

This morning the notebooks all executed and the new pages were generated, but the plots only went up to the 25th. Same thing happened this evening as well.

CI logs:

Logs show everything went fine, and the "Notebook executed on: ..." print statements at the top show that the notebooks were re-executed, so it looks like the data has stopped coming through after the 25th? I'll investigate this tomorrow.

Daily new deaths below one on log scale

[Screenshot: output of make_compare_plot_germany("Hamburg") with values below one on a log scale]
The picture above is the output of make_compare_plot_germany("Hamburg"). There are two issues. The first is that the curves go beyond the frame, because the functions' values are in the range (0, 1). The second issue is rather conceptual: it is unclear how we should understand a "daily number of deaths" between zero and one. (It's okay for Schrödinger's cat but not for a real person.)

The same applies to all German Länder, i.e. this is an issue of make_compare_plot_germany().

Improve doubling time plots: limit maximum of the y-axis to some fixed value?

[Screenshot, 2020-10-14: doubling-time plot where a few very large values dominate the y-axis]

In the example above, there are some very large values, which make the line of interest hard to read. An easy-to-implement fix would be to cap either scale at a maximum value of at most 1000 days (or a similar number).

Or introduce this maximum as a parameter in the function call, so that one can easily create one's own plots with a different cap?
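A sketch of the parameter idea with a hypothetical helper name and made-up data; this is not the existing oscovida plotting code:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_doubling_time(days, doubling, max_yvalue=1000):
    """Hypothetical helper: cap the y-axis so outliers don't hide the trend."""
    fig, ax = plt.subplots()
    ax.plot(days, doubling)
    ax.set_ylim(0, min(max_yvalue, np.nanmax(doubling) * 1.1))
    ax.set_ylabel("doubling time [days]")
    ax.set_xlabel("day")
    return ax

days = np.arange(100)
doubling = np.where(days % 30 == 0, 5000.0, 30.0)   # made-up spikes
plot_doubling_time(days, doubling)                   # capped at 1000 days
plt.show()
```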

Remove outdated page https://oscovida.github.io/countries-incidence-rate.html

This page seems to be left over from development work and is not updated: https://oscovida.github.io/countries-incidence-rate.html

We should

  • make it update daily
  • add links (URLs) to the Location column

The same applies to the Germany counterpart page at https://oscovida.github.io/germany-incidence-rate.html

[The same data is available at https://oscovida.github.io/germany-incidence-rate-14day-20cases.html with the highlighting if numbers go above 20. ]

Negative daily cases for Spain

It must be an error in the data (negative new cases), but it doesn't make much sense to have negative numbers in the daily-change plots.
Spotted in the plot for Spain (screenshot attached).

[Screenshot: daily-change plot for Spain showing negative values]

A test for this issue could be useful.
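A hedged sketch of what such a check could look like, assuming the data arrive as a cumulative pandas Series (function name and numbers are made up):

```python
import pandas as pd

def negative_daily_changes(cumulative: pd.Series) -> pd.Series:
    """Return the dates at which the cumulative series decreases (data errors)."""
    daily = cumulative.diff()
    return daily[daily < 0]

# Made-up cumulative case numbers with one downward correction:
spain = pd.Series([100, 150, 140, 200],
                  index=pd.date_range("2020-05-20", periods=4))
print(negative_daily_changes(spain))
```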

Provide a second overview plot which shows only the recent past

In many regions, the big peak has passed, and the maximum sets the scale in many plots.

It would be useful to show just the recent past, to be able to better assess the effectiveness of containment measures.

The last 4 weeks?

This might need an update to the overview function. It would be good if it could be done flexibly, so it can also be used for interactive exploration.
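One flexible way to do this is to slice the time series before plotting; a sketch assuming a pandas Series with a DatetimeIndex (the helper name is made up):

```python
import pandas as pd

def last_n_weeks(series: pd.Series, weeks: int = 4) -> pd.Series:
    """Return only the most recent `weeks` weeks of a DatetimeIndex series."""
    cutoff = series.index.max() - pd.Timedelta(weeks=weeks)
    return series[series.index > cutoff]

cases = pd.Series(range(120),
                  index=pd.date_range("2020-06-01", periods=120))
recent = last_n_weeks(cases, weeks=4)   # pass this slice to the overview plot
print(recent.index.min(), "to", recent.index.max())
```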

Compute trend indicators for every country (/ region)

Similar to what is shown at https://www.zdf.de/nachrichten/heute/coronavirus-ausbreitung-infografiken-102.html, it would be nice to get some data from each region indicating if:

  • number of new daily cases grows, stays the same, or shrinks. Example calculation: average over the last 7 days and over the 7 days before that, then compare the two averages. If the difference is small, say there is no change. If the difference is large, deduce an increasing or decreasing trend.

  • quantify the above by providing the percentage change from week to week.

  • probably useful to also show how the absolute numbers change from week to week (in a different column)

I suggest initially creating a table showing the above (maybe countries only?) and updating it daily (using #91).
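A sketch of the week-over-week comparison described above; the 10% threshold for "no change" is an arbitrary illustration, not a proposed value:

```python
import pandas as pd

def weekly_trend(daily_cases: pd.Series, threshold: float = 0.1) -> dict:
    """Compare the last 7 days with the 7 days before; threshold is hypothetical."""
    this_week = daily_cases.iloc[-7:].mean()
    last_week = daily_cases.iloc[-14:-7].mean()
    change = (this_week - last_week) / last_week if last_week else float("nan")
    if abs(change) < threshold:
        trend = "no change"
    else:
        trend = "increasing" if change > 0 else "decreasing"
    return {"this_week": this_week, "last_week": last_week,
            "change_percent": 100 * change, "trend": trend}

daily = pd.Series([50, 55, 60, 58, 62, 65, 70, 80, 85, 90, 88, 92, 95, 100])
print(weekly_trend(daily))
```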

Region support for all countries

I found that when parsing the JHU data, we just throw away all region data by summing it up. We then built a complex workaround to manage the US states data.

Instead, we could treat all countries equally and get every country region that JHU supports for cheap. This requires some refactoring, but nothing dramatic. (We already have working code for Germany/RKI.)
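A sketch of the difference, using a hypothetical excerpt in the JHU time-series layout:

```python
import pandas as pd

# Hypothetical excerpt of the JHU time-series layout.
jhu = pd.DataFrame({
    "Province/State": ["New South Wales", "Victoria", None],
    "Country/Region": ["Australia", "Australia", "Germany"],
    "1/22/20": [0, 0, 0],
    "1/23/20": [3, 1, 0],
})

# Current behaviour: sum the regions away and keep only country totals.
by_country = jhu.drop(columns="Province/State").groupby("Country/Region").sum()

# Proposed: keep the Province/State breakdown where JHU provides it.
by_region = jhu.set_index(["Country/Region", "Province/State"])

print(by_country)
print(by_region)
```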

Run NBVAL on our notebooks

We have had a number of cases now where interface changes led to problems in notebooks that use these functions, and we only noticed later.

I suggest running pytest --nbval-lax on our notebooks to make sure we notice these problems earlier in the future.

NBVAL: http://nbval.readthedocs.io
