PS239 project
This project scrapes data from Taiwan Sinica's website and uses the spatial information in the data to produce maps and graphs.
- R
- Python, version 3.5, Anaconda distribution.
- bio_hist.csv: Contains data from the Taiwan Sinica website, collected via 01-get_biohist.py. Includes records of famous people from the Ming and Qing dynasties. This file uses UTF-8 encoding.
- bio_hist_gis.csv: The final analysis dataset, produced by cleaning bio_hist.csv.
- light05chn.tif: Night light raster data for China in 2005; raw data from NOAA.
- 1999County: Shapefiles for Chinese county borders.
- 01-get_biohist.py: Scrapes the Sinica website to get historical biography data.
- 02-data_clean.R: Cleans the raw scraped data.
- 03-bio_gis.R: Conducts descriptive analysis of the data, producing the tables and visualizations found in the Results directory.
- fig1_bio_pts.png: Visualizes the spatial distribution of the historical records as points.
- fig2_bio_county.png: Visualizes the spatial distribution of the records after aggregating to county level.
- fig3_light_raw.png: Visualizes the raw night light raster data.
- fig4_light_county.png: Visualizes the spatial distribution of night light after aggregating to county level.
- reg_results.html: Results of a simple regression.
- reg.png: Visualizes the regression.
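
Because bio_hist.csv is UTF-8 encoded and contains Chinese text, it helps to state the encoding explicitly when opening it outside the project scripts. The snippet below is a minimal sketch, not part of the project code: `load_bio_hist` is a hypothetical helper name, and it makes no assumption about the column names in the file.

```python
import csv


def load_bio_hist(path="bio_hist.csv"):
    """Read the scraped records into a list of dicts, decoding as UTF-8.

    Hypothetical helper for illustration; the project scripts may load
    the file differently.
    """
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```

For example, `rows = load_bio_hist()` followed by `print(len(rows))` reports how many records were read. Relying on the platform default encoding instead can garble the Chinese names on systems where that default is not UTF-8.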