- This repository contains data in CSV format, scraped from reliable sources (e.g. the World Health Organisation).
- Data are scraped a few times daily and pushed back to this repository together with generated charts (.png files).
- Data scraping is automated with GitHub Actions.
- Look for the direct CSV links below to get the scraped historical data.
- A related repository for news scraping is available at https://github.com/alext234/coronavirus-news/blob/master/README.md
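The scraped CSVs can be loaded directly with pandas. The sketch below parses a small inline sample instead of fetching a file; the column names and values are illustrative assumptions, not the repository's actual schema:

```python
import io

import pandas as pd

# Hypothetical sample of scraped CSV content; the real files are the
# CSVs linked below (column names here are assumptions for illustration).
sample_csv = (
    "date,country,cases\n"
    "2020-02-01,Singapore,18\n"
    "2020-02-02,Singapore,24\n"
)

# pandas accepts any file-like object, so a raw-file URL works the same way.
df = pd.read_csv(io.StringIO(sample_csv), parse_dates=["date"])
latest = df.sort_values("date").iloc[-1]
```
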
Below are international stats, excluding China.
Bar chart of the latest snapshot.
Data are scraped from these reports, which are in PDF format; new reports are released daily.
This page has the real-time stats from China; data are pulled several times a day by the pipeline.
Data are pulled from the Department of Health website.
Data are scraped from the MOH (Ministry of Health) local situation web page.
Cases in the US (data are scraped from here)
The chart for the US is not plotted due to a change in the way stats are collected.
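The generated .png charts follow a simple pattern; below is a minimal matplotlib sketch of a bar chart of a latest snapshot, with illustrative data and a hypothetical output file name (not the repository's actual plotting code):

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, as used in a CI pipeline
import matplotlib.pyplot as plt

# Illustrative snapshot; the real values come from the scraped CSVs.
countries = ["Singapore", "Japan", "Thailand"]
cases = [24, 20, 19]

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(countries, cases)
ax.set_ylabel("Confirmed cases")
ax.set_title("Latest snapshot")
fig.tight_layout()
fig.savefig("latest_snapshot.png")  # hypothetical file name
```
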
- Jupyter notebooks are used to scrape the data and output CSV files.
- The notebooks are executed on a schedule by a GitHub Actions pipeline to scrape new data.
- The pipeline also commits the new data back to this repository.
- Tools: Python 3, Jupyter, pandas, BeautifulSoup, and related tooling (e.g. Selenium for web scraping). It is recommended to start the development environment with this Docker image, which is also used by the GitHub Actions build pipeline:

```
docker run -p 8888:8888 -it -v $PWD:/stats -w /stats alext234/datascience:latest bash
```
- requirements.txt contains the Python dependencies:

```
pip install -r requirements.txt
```
- Start the Jupyter notebook server from inside the container, then visit http://localhost:8888 in the browser:

```
jupyter notebook --allow-root --ip=0.0.0.0
```
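A minimal sketch of the scrape-to-CSV pattern the notebooks follow, using BeautifulSoup and pandas on an inline HTML snippet. The table layout, `id`, and output file name are assumptions for illustration; the real notebooks fetch live pages (e.g. with requests or Selenium):

```python
import pandas as pd
from bs4 import BeautifulSoup

# Inline HTML standing in for a fetched situation-report page
# (hypothetical structure; real pages differ per source).
html = """
<table id="stats">
  <tr><th>Country</th><th>Cases</th></tr>
  <tr><td>Singapore</td><td>24</td></tr>
  <tr><td>Japan</td><td>20</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table", id="stats").find_all("tr")
header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
data = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in rows[1:]]

df = pd.DataFrame(data, columns=header)
df["Cases"] = df["Cases"].astype(int)
df.to_csv("snapshot.csv", index=False)  # hypothetical output file name
```
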
- Feel free to create new issues for any potential data source worth scraping.
- Pull requests are welcome!