parsing-gas

This repository contains code and data for the project "Parsing Gas: a Scalable Pipeline for Nearest Neighbour Calculations on Spatial Data". The overarching goal of the project is to answer the following question: What is the distribution of distances from gas-heated buildings to the district-heating network in Denmark?

Data

The repository depends on data from the BBR ("Building and Housing Register"), a public database maintained by the Housing Agency. Below is a description of the data artifacts.

Filename	Description	License	Source	Generated by `reproduce.sh`
`BBR_Aktuelt_Totaludtraek_XML_20220517180008.zip`	The complete 65GB dataset from the BBR	NA	datafordeler.dk	❌
`bbr_clean.csv`	Processed dataset	MIT	Instructions here	✔️
`output/gas_fjernvarme_xy.csv`	Euclidean distance to district heating network for each gas-heated building	MIT	Generated by `analyse_distances.py`	✔️
`output/{KOMMUNE-ID}_road_dist.csv`	Comparison of road distance and Euclidean distance for a specific municipality (see codes here)	MIT	Generated by `analyse_road_dists.py`	✔️
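
To make the "Euclidean distance to district heating network for each gas-heated building" artifact concrete, here is a minimal sketch of a nearest-neighbour distance calculation. It is a brute-force illustration only: the function name `nearest_distances` and the coordinates are hypothetical, and it assumes the points are already in a projected coordinate system measured in metres. It does not show how `analyse_distances.py` actually does it; comparing every pair of points like this would be far too slow for the full dataset, which is why a spatial index is the usual choice at scale.

```python
import math


def nearest_distances(gas_buildings, district_heating):
    """Return, for each gas-heated building, the Euclidean distance
    to the closest district-heating point (brute force, O(n * m))."""
    return [
        min(math.dist(building, dh_point) for dh_point in district_heating)
        for building in gas_buildings
    ]


# Hypothetical (x, y) coordinates in metres, not taken from the BBR data.
gas = [(0.0, 0.0), (10.0, 0.0)]
dh = [(3.0, 4.0), (10.0, 2.0)]

distances = nearest_distances(gas, dh)
print(distances)
```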

Reproducing the results

TL;DR: The entire setup and pipeline can be run end to end with the bash script reproduce.sh.

Below I explain how to reproduce the analyses and plots of my report.

Setting up the Environment

This project uses mamba, a blazingly fast cross-platform package manager for data science. As described in their docs, it is easiest to install through either miniconda or anaconda, so make sure to have one of these installed on your system! After that, it is as easy as running the setup.sh script in a bash terminal.

The dependencies of this project are listed in two yml-files. full_environment.yml has the minimal dependencies and is the file used by setup.sh. frozen_env.yml has the complete 'frozen' environment exactly as it was used on my machine. If there are any problems with the setup script, it might be a good idea to install directly from the frozen environment with the following command:

mamba env create -f frozen_env.yml

Tests

Parts of the project are developed test-first, using pytest as the test framework. The tests can be run using the following command:

python -m pytest --cov-report term --cov ./src

This will print a coverage report to the terminal.

Downloading the data

The formatted data is stored in a .csv-file in Google Drive. It can be downloaded manually by following this link and unzipping the file to the data/raw directory. However, the recommended way is to run download_data.sh, as this does it all automagically.

Description of the Scripts

Below is a high-level overview of the different scripts in the repo in relation to the analysis pipeline:

Name	Component of Pipeline	Description	Part of `reproduce.sh`
`extract_bbr.py`	1. Extract BBR	Parses building information from the full BBR xml	❌
`format_bbr.py`	2. Format to CSV	Extracts relevant columns to a .CSV	❌
`analyse_distances.py`	3. Find Nearest District Heating	Does Euclidean distance calculations	✔️
`analyse_road_dists.py`	4. Compare Road Distances	Compares road distances with Euclidean distances for Aabenraa and Gentofte respectively	✔️
`plot_dists.R`	5.1 Plot Distributions	Plots distribution of distances (found here)	✔️
`leaflet_map.R`	5.2 Create Map	Creates an interactive map of gas-heated buildings and their distance	✔️

All of the Python scripts are documented using argparse. This means that full documentation can be found using the `--help` flag.
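
As an illustration of what argparse-based documentation looks like, here is a minimal sketch of a parser. The script name and the `--input-path` and `--output-dir` arguments are hypothetical and are not taken from the actual scripts in this repo; run each script with `--help` to see its real options.

```python
import argparse


def build_parser():
    """Build a parser whose help text doubles as the script's documentation."""
    parser = argparse.ArgumentParser(
        prog="example_script.py",  # hypothetical script name
        description="Hypothetical script showing argparse-based documentation.",
    )
    # Both arguments below are illustrative assumptions, not the repo's real flags.
    parser.add_argument(
        "--input-path",
        required=True,
        help="Path to the input CSV file",
    )
    parser.add_argument(
        "--output-dir",
        default="output",
        help="Directory to write results to (default: %(default)s)",
    )
    return parser


parser = build_parser()

# This is the text a user sees when passing --help on the command line.
help_text = parser.format_help()
print(help_text)

# Parsing an explicit argument list instead of sys.argv keeps the sketch self-contained.
args = parser.parse_args(["--input-path", "input.csv"])
print(args.input_path, args.output_dir)
```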

Description of /src

To improve coherence and make the code more SOLID, I have refactored much of the functionality into a /src directory. An overview can be seen below:

Name	Description	Part of tests
`extract.py`	For parsing the BBR data efficiently	✔️
`wrangle_bbr.py`	Formats the BBR data to a readable format	✔️
`geo_transform`	Transforms the data into coordinates	✔️
`util.py`	Simple helper functions for reading and writing files	❌
