Code Monkey home page Code Monkey logo

citybikesdata's Introduction

CityBikes Data Dashboard

An interactive dashboard that displays local public bike sharing availability, powered by a simple ETL pipeline that collects data from the CityBikes API.

Instructions

To run locally

Prerequisites:

  1. Docker
  2. Docker Compose v1.27.0 or later

Clone repo and run following commands inside project folder to spin up docker containers:

docker compose --env-file env up --build -d # starts containers
sleep 15    # wait for containers to finish starting
docker exec pipelinerunner yoyo develop --no-config-file --database postgres://postgres1:Password1@warehouse:5432/citybikes ./migrations  

# OR use this simple make command:
make up

Pipeline will automatically start running in 'pipelinerunner' container. Confirm container statuses using:

docker ps

To use Metabase instance and start building dashboard, navigate to http:localhost:3000

To run on cloud (ec2)

Prerequisites:

  1. AWS CLI
  2. Terraform

Run the following commands from inside project folder to spin up ec2 instance:

terraform -chdir=./terraform init   # must first initialize 
terraform -chdir=./terraform apply  # will compare changes and ask to approve

If successful, the same containers that were running locally should now be running on the ec2 instance. Thus you can navigate to http:YourEc2AddressHere:3000 to access the cloud's metabase instance.

If not working, most likely one of the 'setup' steps from 'main.tf' (line 123-153) did not complete. Use the following make command to start ssh session with your ec2 instance and troubleshoot and rerun relevant commands from the 'main.tf' file.

make ssh-ec2    # start ssh with ec2 instance

Architecture

Pipeline

The pipeline begins with simple get requests to the API. These requests happen every 5 minutes and all responses are immediately stored in a single table. This single table was meant to act as a Data Lake so that I could preserve the raw data and have as many options later down the road for development and/or analysis.

The next step is to process the API responses from the previous step and update relevant tables. These new tables were meant to represent a Data Warehouse. There are two of these tables, one for each API endpoint, which we will go over later.

The last step is a transformation step where a list of transformations are performed on the two tables from the previous step to satisfy reporting needs.

The last two steps currently run every 30 minutes to reduce resource costs.

Finally, a Metabase instance is spun up and connected to all three database tables which is used to build the dashboard.

Infrastructure

The majority of code written to facilitate the architecture above is written in Python w/ Pandas. PostgreSQL is used for all data storage. Apache Airflow is used for scheduling.* Metabase is used for data visualization and dashboard creation. All these components are containerized using Docker and are currently running on an AWS EC2 instance as our production environment. Github is used for version control and repository hosting. Github Actions is used to automate CI/CD workflows such as consistency measures and pushing new changes to the production environment via ssh(rsync). Terraform is used for IaC, storing all settings and parameters for the EC2 instance. Making it easy to modify the instance when needed.

Currently using Cron to save money as the Airflow instance was consuming too much RAM.

infra

citybikesdata's People

Contributors

cowboychicken avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.