
oscovida / oscovida


Explore COVID-19 case numbers and deaths related to the coronavirus outbreak 2019/2020 with Pandas in Jupyter notebooks via MyBinder

Home Page: https://oscovida.github.io

License: BSD 3-Clause "New" or "Revised" License

Languages: Jupyter Notebook 99.14%, Python 0.61%, Makefile 0.02%, Dockerfile 0.01%, CSS 0.07%, JavaScript 0.02%, HTML 0.14%, Shell 0.01%
Topics: jupyter, covid-19, python, covid-analysis

oscovida's Introduction

Most content has moved to the temporary website https://oscovida.github.io

The remaining content here is only for developers.



Disclaimer

The plots and code here have been put together by volunteers who have no training in epidemiology. There are likely to be errors in the processing. You are welcome to use the material at your own risk. The license is available.

Acknowledgements

  • Johns Hopkins University provides data for countries
  • Robert Koch Institute provides data for within Germany
  • Open source and scientific computing community for the data tools
  • GitHub for hosting the repository and HTML files
  • Project Jupyter for the Notebook and Binder service
  • The H2020 project Photon and Neutron Open Science Cloud (PaNOSC)

oscovida's People

Contributors

betatim, catchears, fangohr, kirienko, ko-work, robertrosca, slel, tmichela


oscovida's Issues

PR Specific Testing - Website Generation

We could add a GitHub Workflow which executes only on a PR and triggers a website build into a separate staging directory. That would let us see whether website generation still works correctly without having to pull in the branch locally and run the generation ourselves.

This could be done by creating a gh-pages branch on this repository where we build the HTML into a staging directory, so that going to oscovida.github.io/oscovida/staging would show the test-generated website. However, that would end up with the HTML files and plots being committed to this repo and massively increasing its size, and the URL is a tad confusing and could clash with the existing one.

Could also be put into a separate directory on the oscovida.github.io repo itself, but that could get quite confusing as well, and might clash with the self-hosted runner that generates the reports twice a day.

The best option would probably be to make another repository (called staging?); then, with a bit of work, a PR would trigger a build in the staging repository and we could go to oscovida.github.io/staging/PR_NUMBER to see what the website looks like.

Using the debug flag on the report generator limits it to 10 reports per region, which also means this can easily be run on the GitHub-provided runners, as a few dozen reports won't take too long to execute even on the single-core runners. Use something like:

python -m report_generators.cli -r all --debug --log-level INFO

Hungary Missing Death Numbers Break Table Sorting

Since Hungary has no data for deaths, a "missing" string is used instead. This breaks sorting in the all-countries table because the column is then treated as a string column.

An easy fix would be to specify a sentinel value like -1, but at a glance this looks odd if you are not familiar with using negative values to mark errors.

The other option is to find a way to specify a null value for the JS DataTable so that it sorts the missing data correctly and also clearly marks it as missing.
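On the pandas side, one route towards the second option is to keep the column numeric and use NaN for the unknown values. A minimal sketch with made-up numbers; the actual oscovida table-generation code may handle this elsewhere:

```python
import pandas as pd

# Hypothetical all-countries summary table; Hungary's death count is unknown.
table = pd.DataFrame({
    "country": ["Germany", "Hungary", "Spain"],
    "deaths": [9500, "missing", 28000],
})

# Replace the "missing" marker with NaN so the column stays numeric and
# sorts correctly; NaN rows can then be rendered as blank/"n/a" in the HTML.
table["deaths"] = pd.to_numeric(table["deaths"], errors="coerce")
print(table.sort_values("deaths"))
```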

Karlsruhe and Munich

@fangohr wrote:

Halimah approached me to ask why we have two entries for Munich and two entries for Karlsruhe in our list at https://oscovida.github.io/germany-incidence-rate-14day-20cases.html

Halimah, thank you for pointing this out.

I am pretty certain that this is because one of them is a Landkreis (LK) and the other one is the Stadtkreis (SK). So SK Munich should be a subregion of LK Munich.

For the table on https://oscovida.github.io/germany-incidence-rate-14day-20cases.html, we have removed the SK and LK labels from the name of the region to allow alphabetic sorting by district name. But we hadn’t realised that this can cause confusion as you describe.

So, Halimah, I hope I have answered your question.

Robert, Yury, could you update the code to not remove the LK and SK labels, but to move them to the end of the district name, i.e. turn "SK Hamburg" into "Hamburg (SK)"? I think that might be an easy way to solve this.
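A small illustration of the suggested renaming, assuming the district names arrive as plain strings with a leading "SK"/"LK" prefix (this is not the actual oscovida code):

```python
import re

def move_kreis_label(name: str) -> str:
    """Turn 'SK Hamburg' / 'LK München' into 'Hamburg (SK)' / 'München (LK)'."""
    match = re.match(r"^(SK|LK)\s+(.*)$", name)
    if match:
        label, district = match.groups()
        return f"{district} ({label})"
    return name

print(move_kreis_label("SK Hamburg"))   # Hamburg (SK)
print(move_kreis_label("LK München"))   # München (LK)
```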

Create additional regional data sets for Germany

(i) Sum the numbers for all Landkreise in each Bundesland, and add the Bundesländer to the list of regions for Germany.

(ii) Less important: sum all numbers for Germany as a whole, and also add this to the list. (This allows comparing the whole-country data from RKI with the data we already have from JHU.)
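A minimal pandas sketch of both aggregations, using made-up RKI-style records (the real column names in the oscovida pipeline may differ):

```python
import pandas as pd

# Hypothetical RKI-style case records; the real RKI export has per-Landkreis rows.
rki = pd.DataFrame({
    "Bundesland": ["Bayern", "Bayern", "Hamburg"],
    "Landkreis": ["LK München", "SK München", "SK Hamburg"],
    "AnzahlFall": [120, 300, 80],
})

# (i) Sum all Landkreise within each Bundesland.
by_land = rki.groupby("Bundesland")["AnzahlFall"].sum()

# (ii) Sum everything to get a Germany-wide total for comparison with JHU.
germany_total = rki["AnzahlFall"].sum()

print(by_land)
print("Germany total:", germany_total)
```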

Move files for binder in its own repository

For binder, we only need the ipynb subdirectory, the coronavirus package, and requirements.txt and apt.txt.

Currently, this is all in the wwwroot repository (which is large because of the html versions of the notebooks). Binder needs to clone this big repo every time we start a binder session.

Thus:

  • create a new and extra repository for binder files
  • and update generate-countries and other code accordingly

Add RKI/JHU test data to test suite

The RKI just changed the download/access location of their data. This upsets continuous integration. It is also non-trivial, once a new location is found, to check that it provides the right data set.

For clarity, we should add (at least some rows of) the data we expect to download to the repository. This could be part of the tests:

  • download test data (RKI/JHU) and add to test suite

  • change existing tests to use that downloaded data. This tests that the code works with the expected structure of the data, and documents the structure.

  • add additional test to download live data, to be able to test that this works when required. This checks that (i) the data location is correct and (ii) the structure of that file is compatible with the code.
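A sketch of how these tests could look with pytest; the file path, the custom marker, and the `fetch_jhu` import are hypothetical and not part of the existing oscovida test suite:

```python
# test_data_sources.py -- sketch only; names and paths are hypothetical.
import pandas as pd
import pytest

EXPECTED_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}

def test_cached_jhu_structure():
    # A small, committed excerpt of the JHU CSV documents the expected layout.
    df = pd.read_csv("tests/data/jhu_sample.csv")
    assert EXPECTED_COLUMNS.issubset(df.columns)

@pytest.mark.network
def test_live_jhu_download():
    # Optional live test: checks that the download location still works and
    # that the live file matches the structure the code expects.
    from oscovida import fetch_jhu  # hypothetical import
    df = fetch_jhu()
    assert EXPECTED_COLUMNS.issubset(df.columns)
```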

Binder URL broken

In the notebooks generated (such as https://oscovida.github.io/html/South-Africa.html), there is an "Execute this Jupyter Notebook using myBinder" link at the top of the notebooks (which works) and a second one further down in the markdown before cell [5] which reads "click here to use myBinder".

The second one hasn't got the right link inserted (thus leading to a 400 error).

Template Source Acknowledgements

The new, more generic template generation is cool. There is just one minor observation about the "Download of data from Johns Hopkins university:" part at the bottom of the notebooks. Shouldn't this part change dynamically too?
For example, it should mention the Robert Koch Institute when someone clicks on a region in Germany, or the github/sanbrock page when someone is looking at a region in Hungary.

Improve doubling time plots: decouple y-axis for death doubling time from y-axis for cases doubling time

In some countries, the rate of increase in deaths is very different from that for cases. See the plot below.
[Screenshot, 2020-10-14: doubling-time plot with cases and deaths sharing one y-axis]
In this example, we cannot read what the doubling time for infections is at the moment, because the y-axis is chosen to accommodate the maximum death doubling time, which is very large here.

To keep the plot more useful, we could use the left axis to label cases and the right axis for deaths.
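A standalone matplotlib sketch of the left/right-axis idea, using made-up data rather than the actual oscovida plotting functions:

```python
import matplotlib.pyplot as plt
import numpy as np

days = np.arange(60)
cases_doubling = 20 + 5 * np.sin(days / 10)            # made-up data
deaths_doubling = 200 + 400 * np.abs(np.sin(days / 25))  # made-up data

fig, ax_cases = plt.subplots()
ax_deaths = ax_cases.twinx()  # second y-axis sharing the same x-axis

ax_cases.plot(days, cases_doubling, color="C0", label="cases")
ax_deaths.plot(days, deaths_doubling, color="C1", label="deaths")

ax_cases.set_ylabel("doubling time, cases [days]", color="C0")
ax_deaths.set_ylabel("doubling time, deaths [days]", color="C1")
ax_cases.set_xlabel("day")
plt.show()
```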

Deaths doubling time rolling average is extremely sparse

I noticed that for some regions (say, Hamburg) the 7-day rolling mean of the doubling time is so extremely sparse that it is completely useless:

[Screenshot: very sparse rolling mean of the doubling time for Hamburg]

The reason for that is two-fold: first, wherever our function is constant, the double_time_exponential function naturally returns inf according to the formula, which we turn into NaN; second, the rolling function by default just skips those gaps.

I found an easy fix for that, hold on.
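One possible fix along these lines is to let the rolling window produce a value from fewer valid points via min_periods. A sketch with made-up numbers; the fix actually committed may differ:

```python
import numpy as np
import pandas as pd

# Made-up doubling-time series with many NaNs (inf from constant stretches).
doubling = pd.Series([3.0, np.inf, np.inf, 4.0, np.inf, 5.0, np.inf, 6.0])
doubling = doubling.replace(np.inf, np.nan)

# Default rolling mean needs 7 valid points per window, so almost everything is NaN:
print(doubling.rolling(7).mean())

# Allowing fewer valid points per window keeps the curve continuous:
print(doubling.rolling(7, min_periods=1).mean())
```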

Normalise plots by total population

For some comparisons it is useful to talk about the number of infections/deaths per million citizens rather than in absolute numbers. It would be nice to have that option.
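The normalisation itself is a one-liner in pandas; a sketch with made-up case totals and population figures:

```python
import pandas as pd

cases = pd.Series({"Germany": 380_000, "Spain": 900_000})        # made-up totals
population = pd.Series({"Germany": 83_000_000, "Spain": 47_000_000})

cases_per_million = cases / population * 1_000_000
print(cases_per_million.round(0))
```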

Create framework for notebook that is executed everyday

The idea is to host a (set/subdirectory of) notebook(s) somewhere in the repository, which is re-executed every day when the html is re-computed.

This could produce a table or plot that we want to update every day, for example a table of countries, ordered according to incidence.

This could also be used as a playground to explore new analyses before we incorporate them into the html/javascript tables from which countries are selected: it should be much easier to try this in a notebook. Or it could provide a permanent link and a daily-updated source of information if the notebook is a good format.

An additional benefit is that if we include in the notebook some of the analysis leading towards the tables/plots, this can become part of a tutorial that demonstrates what can be done with oscovida.

Task for this issue: enable automatic processing of arbitrary notebooks with a trivial notebook as an example. First thing we could try as an example: compute the mortality rate as requested in issue #90.

Plots not receiving new data?

This morning the notebooks all executed and the new pages were generated, but the plots only went up to the 25th. Same thing happened this evening as well.

CI logs:

Logs show everything went fine, and the "Notebook executed on: ..." print statements at the top show that the notebooks were re-executed, so it looks like the data has stopped coming through after the 25th? I'll investigate this tomorrow.

Daily new deaths below one on log scale

[Screenshot: output of make_compare_plot_germany("Hamburg") with values below one on a log scale]
The picture above is the output of make_compare_plot_germany("Hamburg"). There are two issues. The first is that the curves go beyond the frame, because the functions' values are in the range (0, 1). The second issue is rather conceptual: it is unclear how we should understand a "daily number of deaths" between zero and one. (It's okay for Schrödinger's cat but not for a real person.)

The same applies to all German Länder, i.e. this is an issue of make_compare_plot_germany().

Improve doubling time plots: limit maximum of the y-axis to some fixed value?

[Screenshot, 2020-10-14: doubling-time plot where a few very large values dominate the y-axis]

In the example above, there are some very large values, which make the line of interest hard to read. An easy-to-implement fix would be to cap either scale at a maximum value of at most 1000 days (or a similar number).

Or introduce this maximum as a parameter in the function call, so that one can easily create one's own plots with a different cap?
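A sketch of the parameter idea with a hypothetical helper name and made-up data; this is not the existing oscovida plotting code:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_doubling_time(days, doubling, max_yvalue=1000):
    """Hypothetical helper: cap the y-axis so outliers don't hide the trend."""
    fig, ax = plt.subplots()
    ax.plot(days, doubling)
    ax.set_ylim(0, min(max_yvalue, np.nanmax(doubling) * 1.1))
    ax.set_ylabel("doubling time [days]")
    ax.set_xlabel("day")
    return ax

days = np.arange(100)
doubling = np.where(days % 30 == 0, 5000.0, 30.0)   # made-up spikes
plot_doubling_time(days, doubling)                   # capped at 1000 days
plt.show()
```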

Remove outdated page https://oscovida.github.io/countries-incidence-rate.html

This page seems to be left over from development work and is not updated: https://oscovida.github.io/countries-incidence-rate.html

We should

  • make it update daily
  • add links (URLs) to the Location column

The same applies to the Germany counterpart page at https://oscovida.github.io/germany-incidence-rate.html

[The same data is available at https://oscovida.github.io/germany-incidence-rate-14day-20cases.html with the highlighting if numbers go above 20. ]

Negative daily cases for Spain

It must be an error in the data (negative new cases), but it doesn't make much sense to have negative numbers in the daily-change plots.
Spotted in the plot for Spain (screenshot attached).

[Screenshot: daily-change plot for Spain showing negative values]

A test for this issue could be useful.
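A hedged sketch of what such a check could look like, assuming the data arrive as a cumulative pandas Series (function name and numbers are made up):

```python
import pandas as pd

def negative_daily_changes(cumulative: pd.Series) -> pd.Series:
    """Return the dates at which the cumulative series decreases (data errors)."""
    daily = cumulative.diff()
    return daily[daily < 0]

# Made-up cumulative case numbers with one downward correction:
spain = pd.Series([100, 150, 140, 200],
                  index=pd.date_range("2020-05-20", periods=4))
print(negative_daily_changes(spain))
```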

Provide a second overview plot which shows only the recent past

In many regions, the big peak has passed, and the maximum sets the scale in many plots.

It would be useful to show just the recent past, to be able to better assess the effectiveness of containment measures.

The last 4 weeks?

This might need an update to the overview function. It would be good if it could be done flexibly, so it can also be used for interactive exploration.
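One flexible way to do this is to slice the time series before plotting; a sketch assuming a pandas Series with a DatetimeIndex (the helper name is made up):

```python
import pandas as pd

def last_n_weeks(series: pd.Series, weeks: int = 4) -> pd.Series:
    """Return only the most recent `weeks` weeks of a DatetimeIndex series."""
    cutoff = series.index.max() - pd.Timedelta(weeks=weeks)
    return series[series.index > cutoff]

cases = pd.Series(range(120),
                  index=pd.date_range("2020-06-01", periods=120))
recent = last_n_weeks(cases, weeks=4)   # pass this slice to the overview plot
print(recent.index.min(), "to", recent.index.max())
```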

Compute trend indicators for every country (/ region)

Similar to what is shown at https://www.zdf.de/nachrichten/heute/coronavirus-ausbreitung-infografiken-102.html, it would be nice to get some data from each region indicating if:

  • number of new daily cases grows, stays the same, or shrinks. Example calculation: average over the last 7 days and over the 7 days before that, then compare the two averages. If the difference is small, say there is no change. If the difference is large, deduce an increasing or decreasing trend.

  • quantify the above by providing the percentage change from week to week.

  • probably useful to also show how the absolute numbers change from week to week (in a different column)

I suggest initially creating a table showing the above (maybe countries only?) and updating it daily (using #91).
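A sketch of the week-over-week comparison described above; the 10% threshold for "no change" is an arbitrary illustration, not a proposed value:

```python
import pandas as pd

def weekly_trend(daily_cases: pd.Series, threshold: float = 0.1) -> dict:
    """Compare the last 7 days with the 7 days before; threshold is hypothetical."""
    this_week = daily_cases.iloc[-7:].mean()
    last_week = daily_cases.iloc[-14:-7].mean()
    change = (this_week - last_week) / last_week if last_week else float("nan")
    if abs(change) < threshold:
        trend = "no change"
    else:
        trend = "increasing" if change > 0 else "decreasing"
    return {"this_week": this_week, "last_week": last_week,
            "change_percent": 100 * change, "trend": trend}

daily = pd.Series([50, 55, 60, 58, 62, 65, 70, 80, 85, 90, 88, 92, 95, 100])
print(weekly_trend(daily))
```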

Region support for all countries

I found that when parsing the JHU data, we just throw away all region data by summing it up. We then built a complex workaround to manage the US states data.

Instead, we could treat all countries equally and get every country region that JHU supports for cheap. This requires some refactoring, but nothing dramatic. (We already have working code for Germany/RKI.)
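A sketch of the difference, using a hypothetical excerpt in the JHU time-series layout:

```python
import pandas as pd

# Hypothetical excerpt of the JHU time-series layout.
jhu = pd.DataFrame({
    "Province/State": ["New South Wales", "Victoria", None],
    "Country/Region": ["Australia", "Australia", "Germany"],
    "1/22/20": [0, 0, 0],
    "1/23/20": [3, 1, 0],
})

# Current behaviour: sum the regions away and keep only country totals.
by_country = jhu.drop(columns="Province/State").groupby("Country/Region").sum()

# Proposed: keep the Province/State breakdown where JHU provides it.
by_region = jhu.set_index(["Country/Region", "Province/State"])

print(by_country)
print(by_region)
```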

Run NBVAL on our notebooks

We have had a number of cases now where interface changes led to problems in notebooks that use these functions, and we only noticed later.

I suggest running pytest --nbval-lax on our notebooks to make sure we notice these problems earlier in the future.

NBVAL: http://nbval.readthedocs.io
