
License: MIT | R version 4.1 | Python

parsing-gas

This repository contains the code and data for the project "Parsing Gas: a Scalable Pipeline for Nearest Neighbour Calculations on Spatial Data". The overarching goal of the project is to answer the following question: What is the distribution of distances from gas-heated buildings to the district-heating network in Denmark?

Data

The repository depends on data from the BBR ("Building and Housing Register"), a public database maintained by the Housing Agency. The data artifacts are described below.

| Filename | Description | License | Source | Generated by reproduce.sh |
|---|---|---|---|---|
| BBR_Aktuelt_Totaludtraek_XML_20220517180008.zip | The complete 65 GB BBR dataset | NA | datafordeler.dk | |
| bbr_clean.csv | Processed dataset | MIT | Instructions here | ✔️ |
| output/gas_fjernvarme_xy.csv | Euclidean distance to the district-heating network for each gas-heated building | MIT | Generated by analyse_distances.py | ✔️ |
| output/{KOMMUNE-ID}_road_dist.csv | Comparison of road distance and Euclidean distance for a specific municipality (see codes here) | MIT | Generated by analyse_road_dists.py | ✔️ |
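Since the project's question is about the *distribution* of distances, a natural first look at output/gas_fjernvarme_xy.csv is a five-number summary. The sketch below uses only the standard library. It assumes the distances live in a numeric column, and the column name "distance" used in the comment is hypothetical: check the real CSV header before using it. The function names are illustrative, not part of the repository.

```python
import csv
import statistics


def read_column(lines, column):
    """Parse one numeric column from CSV text (an open file or any iterable of lines)."""
    return [float(row[column]) for row in csv.DictReader(lines)]


def summarise(values):
    """Five-number summary of a list of distances (needs at least two values)."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Example usage (hypothetical column name "distance"):
# with open("output/gas_fjernvarme_xy.csv", newline="", encoding="utf-8") as f:
#     print(summarise(read_column(f, "distance")))
```

statistics.quantiles uses its default "exclusive" interpolation here; plots such as those made by plot_dists.R give the fuller picture.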

Reproducing the results

TL;DR: The bash script reproduce.sh is a worked example of the full setup and runs the pipeline.

Below I explain how to reproduce the analyses and plots of my report.

Setting up the Environment

This project uses mamba, a blazingly fast cross-platform package manager for data science. As described in their docs, it is easiest to install through either Miniconda or Anaconda, so make sure one of these is installed on your system! After that, it is as easy as running the setup.sh script in a bash terminal.

The dependencies of this project are specified in two YAML files. full_environment.yml contains the minimal dependencies and is the file used by setup.sh. frozen_env.yml contains the complete, 'frozen' environment exactly as it was used on my machine. If the setup script causes problems, it may be a good idea to install directly from the frozen environment with the following command:

mamba env create -f frozen_env.yml

Tests

Parts of the project were developed test-first with pytest. The tests can be run with the following command:

python -m pytest --cov-report term --cov ./src

This prints a coverage report for ./src to the terminal. The --cov options are provided by the pytest-cov plugin, which must be installed.
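For readers new to pytest, a test in this setup is just a function whose name starts with test_, placed in a file named test_*.py or *_test.py, using plain assert statements. The sketch below is illustrative only: euclidean_distance is a hypothetical helper, not a function from /src.

```python
import math


def euclidean_distance(a, b):
    """Hypothetical helper: straight-line distance between two (x, y) points."""
    return math.dist(a, b)


def test_distance_is_zero_for_same_point():
    # A point is at distance zero from itself.
    assert euclidean_distance((3.0, 4.0), (3.0, 4.0)) == 0.0


def test_distance_matches_pythagoras():
    # 3-4-5 right triangle; isclose guards against floating-point rounding.
    assert math.isclose(euclidean_distance((0.0, 0.0), (3.0, 4.0)), 5.0)
```

Saved under a name like tests/test_distance.py (a hypothetical path), these functions are collected automatically by the pytest command above.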

Downloading the data

The formatted data is stored as a .csv file on Google Drive. It can be downloaded manually by following this link and unzipping the file into the data/raw directory. However, the recommended way is to run download_data.sh, as it does all of this automagically.
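The README does not show what download_data.sh does internally. If you take the manual route, the unzip step can be sketched in Python as below, assuming the archive has already been downloaded from the Google Drive link. The function name unzip_to_raw is hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_to_raw(zip_path, target_dir="data/raw"):
    """Extract every member of zip_path into target_dir and return the extracted paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create data/raw if it does not exist
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        names = archive.namelist()
    return [target / name for name in names]
```

extractall writes whatever paths the archive contains, so it should only be used on archives from a trusted source.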

Description of the Scripts

Below is a high-level overview of the scripts in the repo and where they fit in the analysis pipeline:

Analysis pipeline

| Name | Component of Pipeline | Description | Part of reproduce.sh |
|---|---|---|---|
| extract_bbr.py | 1. Extract BBR | Parses building information from the full BBR XML | |
| format_bbr.py | 2. Format to CSV | Extracts relevant columns to a CSV | |
| analyse_distances.py | 3. Find Nearest District Heating | Calculates the Euclidean distance to the district-heating network for each gas-heated building | ✔️ |
| analyse_road_dists.py | 4. Compare Road Distances | Compares road distances with Euclidean distances for Aabenraa and Gentofte | ✔️ |
| plot_dists.R | 5.1 Plot Distributions | Plots the distribution of distances (found here) | ✔️ |
| leaflet_map.R | 5.2 Create Map | Creates an interactive map of gas-heated buildings and their distance to the district-heating network | ✔️ |
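Step 3 boils down to a nearest-neighbour query: for every gas-heated building, find the closest point of the district-heating network. The README does not say which data structure analyse_distances.py uses, so the following is a pure-Python sketch of one standard technique, a 2-d tree (k-d tree). It assumes both point sets are already projected into a metric coordinate system so that Euclidean distance is meaningful; all names are hypothetical.

```python
import math
from typing import NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, float]


class Node(NamedTuple):
    point: Point
    axis: int  # 0 = split on x, 1 = split on y
    left: Optional["Node"]
    right: Optional["Node"]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a 2-d tree by splitting on the median, alternating between x and y."""
    if not points:
        return None
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


def nearest(node: Optional[Node], target: Point,
            best: Optional[Tuple[float, Point]] = None) -> Optional[Tuple[float, Point]]:
    """Return (distance, point) for the tree point closest to target."""
    if node is None:
        return best
    d = math.dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = target[node.axis] - node.point[node.axis]
    # Search the side of the splitting line that contains the target first.
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Only cross the splitting line if it is closer than the best match so far.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best


def nearest_distances(buildings: Sequence[Point], network: Sequence[Point]) -> list:
    """Distance from each building to its nearest network point."""
    tree = build_kdtree(network)
    return [nearest(tree, b)[0] for b in buildings]
```

The pruning test is what makes this scale: whole subtrees are skipped once the splitting line is farther away than the best match found so far. For data the size of BBR, a compiled implementation such as scipy.spatial.cKDTree would likely be much faster than this sketch.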

All of the Python scripts use argparse, so their full documentation can be printed with the --help flag.
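As a reminder of how argparse produces that documentation, here is a minimal sketch of a script interface in the same style. The arguments shown (input, --output, --verbose) are hypothetical and are not the actual flags of any script in this repo; run the real scripts with --help to see those.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical CLI; argparse builds the --help text from these definitions."""
    parser = argparse.ArgumentParser(
        description="Compute the Euclidean distance from each gas-heated building "
        "to the district-heating network."
    )
    parser.add_argument("input", help="path to the cleaned BBR CSV, e.g. bbr_clean.csv")
    parser.add_argument(
        "-o", "--output",
        default="output/gas_fjernvarme_xy.csv",
        help="where to write the distances (default: %(default)s)",
    )
    parser.add_argument("--verbose", action="store_true", help="print progress information")
    return parser.parse_args(argv)  # argv=None means "read sys.argv"
```

Every help string ends up in the output of --help, which is why no separate usage documentation is needed.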

Description of /src

To improve coherence and make the code more SOLID, I have refactored much of the functionality into the /src directory. An overview is given below:

| Name | Description | Part of tests |
|---|---|---|
| extract.py | For parsing the BBR data efficiently | ✔️ |
| wrangle_bbr.py | Formats the BBR data to a readable format | ✔️ |
| geo_transform | Transforms the data into coordinates | ✔️ |
| util.py | Simple helper functions for reading and writing files | |
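The README does not describe how extract.py achieves efficient parsing. One common approach for a 65 GB XML file is to stream it with xml.etree.ElementTree.iterparse and drop each record once it has been read, so memory stays roughly flat. The sketch below assumes records are direct children of the root element; the tag name Bygning is a hypothetical placeholder, and the real BBR export probably uses XML namespaces, so tags would look like {namespace}Name.

```python
import xml.etree.ElementTree as ET


def iter_records(source, tag="Bygning"):
    """Yield one {child_tag: text} dict per <tag> element while streaming the XML.

    `tag` is a hypothetical element name; assumes records are direct children of the root.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            root.clear()  # detach processed records from the root so they can be freed
```

Because the BBR download is a .zip, the XML member could be streamed straight from the archive: zipfile.ZipFile(...).open(name) returns a file-like object that iterparse accepts.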


Contributors

rysias
