ozkary / data-engineering-mta-turnstile

Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis

Home Page: https://www.ozkary.com/2023/03/data-engineering-process-fundamentals.html

License: Apache License 2.0

Jupyter Notebook 79.69% Python 1.09% HCL 0.08% Dockerfile 0.04% Shell 0.16% C# 0.27% HTML 18.66%
data-lake docker analysis data-engineering python sql static-analysis vscode bigquery data-modeling data-orchestration data-pipeline data-warehouse dbt jupyter-notebook prefect terraform

data-engineering-mta-turnstile's Introduction

Data Engineering - Metropolitan Transportation Authority (MTA) Subway Turnstile Data Analysis

Written by Oscar Garcia

Twitter @ozkary

Use this project's Wiki for installation and configuration information

Announcement and Updates

👉 Join this list to receive updates on new content about Data Engineering: Sign up here
👉 Follow us on Twitter
👉 Data Engineering Process Fundamental Series
👉 Data Engineering Process Fundamental YouTube Video
👉 Data Engineering Process Fundamental Book on Amazon

Data Engineering Process Fundamentals: Master the Fundamentals of Data Engineering with a Hands-on Approach

Problem Statement

In New York City, millions of commuters use the Metropolitan Transportation Authority (MTA) subway system every day. Businesses near the subway stations would like to use geofencing advertising to target those commuters, who are potential customers, and attract them to their business locations at peak hours of the day.

Geofencing is a location-based technology service in which a mobile device's electronic signal is tracked as it enters or leaves a virtual boundary (geo-fence) around a geographical location. Businesses around those locations would like to use this technology to increase their sales.
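To make the idea of a virtual boundary concrete, here is a minimal sketch of a geo-fence check. It assumes a circular fence defined by a center point and a radius, and it measures distance with the haversine formula on a spherical Earth. The function names and the coordinates are hypothetical and are not taken from this project.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def inside_geofence(device, center, radius_m):
    """Return True when the device position lies within the circular fence."""
    return distance_m(device[0], device[1], center[0], center[1]) <= radius_m

# Hypothetical fence center and device positions (latitude, longitude).
fence_center = (40.7580, -73.9855)
near_device = (40.7590, -73.9855)   # about 0.001 degrees of latitude away
far_device = (40.7680, -73.9855)    # about 0.01 degrees of latitude away

near_inside = inside_geofence(near_device, fence_center, radius_m=200)
far_inside = inside_geofence(far_device, fence_center, radius_m=200)
```

A real geofencing service may use polygons rather than circles; the circle is only the simplest shape to illustrate entering and leaving a boundary.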

ozkary MTA Geo Fence

The MTA subway system has stations around the city. All the stations are equipped with turnstiles or gates that track each person entering or leaving the station. The MTA provides this information in CSV files, which can be imported into a data warehouse so that analysis can identify patterns that help these businesses understand how best to target consumers.
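As a rough illustration of the first step, the sketch below parses turnstile CSV text into typed records before they would be loaded into a warehouse. The column names (STATION, DATE, TIME, ENTRIES, EXITS), the date format, and the sample rows are assumptions made for this example; check them against the actual files. The function name `read_turnstile_rows` is hypothetical.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the assumed column layout; not real MTA data.
SAMPLE_CSV = """STATION,DATE,TIME,ENTRIES,EXITS
59 ST,01/02/2023,08:00:00,120,340
59 ST,01/02/2023,12:00:00,210,180
"""

def read_turnstile_rows(text):
    """Parse CSV text into dictionaries with typed fields."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "station": record["STATION"].strip(),
            # Combine the assumed DATE and TIME columns into one timestamp.
            "created": datetime.strptime(
                f'{record["DATE"]} {record["TIME"]}', "%m/%d/%Y %H:%M:%S"
            ),
            "entries": int(record["ENTRIES"]),
            "exits": int(record["EXITS"]),
        })
    return rows

rows = read_turnstile_rows(SAMPLE_CSV)
```

Converting the text fields to integers and timestamps at this stage keeps later grouping and aggregation simple.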

Analytical Approach

Dataset Criteria

We are using the MTA Turnstile data for 2023. Using this data, we can investigate the following criteria:

  • Stations with a high number of exits by day and hour
  • Stations with a high number of entries by day and hour

Exits indicate that commuters are arriving at those locations. Entries indicate that commuters are departing from those locations.
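The two criteria above can be sketched as a simple aggregation: sum entries and exits per station and day, then rank by either measure. The readings, station names, and function names below are hypothetical, and the sketch assumes each reading already holds a count for its interval.

```python
from collections import defaultdict

# Hypothetical readings: (station, date, entries, exits).
readings = [
    ("STATION A", "2023-01-02", 100, 300),
    ("STATION A", "2023-01-02", 150, 250),
    ("STATION B", "2023-01-02", 400, 120),
    ("STATION B", "2023-01-03", 380, 90),
]

def totals_by_station_day(data):
    """Sum entries and exits for each (station, date) pair."""
    totals = defaultdict(lambda: {"entries": 0, "exits": 0})
    for station, day, entries, exits in data:
        totals[(station, day)]["entries"] += entries
        totals[(station, day)]["exits"] += exits
    return dict(totals)

def rank(totals, measure):
    """Order (station, date) pairs from highest to lowest by the given measure."""
    return sorted(totals.items(), key=lambda item: item[1][measure], reverse=True)

totals = totals_by_station_day(readings)
top_arrivals = rank(totals, "exits")[0]      # most commuters arriving
top_departures = rank(totals, "entries")[0]  # most commuters departing
```

In the warehouse, the same result would typically come from a SQL GROUP BY over station and date; the Python version only shows the shape of the calculation.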

Data Analysis Criteria

The data can be grouped by station, date, and time of day. The data is audited in four-hour blocks, for example 8am to 12pm. We analyze the data by those time blocks to identify the best times, both in the morning and in the afternoon, for each station location. This should allow businesses to target a particular geo-fence that is close to their business.
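A minimal sketch of the time-block grouping follows. It assumes the four-hour blocks are aligned to midnight (00:00, 04:00, 08:00, and so on) and that each timestamp falls inside the block it belongs to; if a timestamp marks the end of an audit interval instead, it would need to be shifted first. The sample values and the name `time_block` are hypothetical.

```python
from collections import Counter
from datetime import datetime

BLOCK_HOURS = 4  # block length described in the text

def time_block(moment):
    """Return a label such as '08:00-12:00' for the block containing moment."""
    start = (moment.hour // BLOCK_HOURS) * BLOCK_HOURS
    end = start + BLOCK_HOURS
    return f"{start:02d}:00-{end:02d}:00"

# Hypothetical samples: (station, timestamp, exits counted in that reading).
samples = [
    ("STATION A", datetime(2023, 1, 2, 8, 30), 120),
    ("STATION A", datetime(2023, 1, 2, 11, 59), 80),
    ("STATION A", datetime(2023, 1, 2, 12, 0), 60),
]

# Sum exits per (station, time block).
exits_by_block = Counter()
for station, moment, exits in samples:
    exits_by_block[(station, time_block(moment))] += exits
```

The first two samples land in the 08:00-12:00 block and the third starts the 12:00-16:00 block, which is the boundary behavior this alignment implies.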

Analysis Results

ozkary MTA dashboard

https://lookerstudio.google.com/reporting/94749e6b-2a1f-4b41-aff6-35c6c33f401e

Data Analysis Conclusions

From the dashboard, the following conclusions can be drawn:

  • The stations with the highest share of the distribution are the busiest locations
  • The busiest time slot for both exits and entries is between 4pm and 9pm
  • All days of the week show a high volume of commuters

With these observations, businesses can plan marketing campaigns that target users by geo-fence area and hour of the day, based on proximity to the corresponding business locations.
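One way to turn such observations into a campaign schedule is to pick the busiest time block for each station. The sketch below assumes exits have already been aggregated per station and time block; the numbers are hypothetical and are not the dashboard results, and `peak_block_per_station` is an illustrative name.

```python
# Hypothetical aggregated exits per (station, time block).
exits_by_block = {
    ("STATION A", "08:00-12:00"): 900,
    ("STATION A", "16:00-20:00"): 1500,
    ("STATION B", "08:00-12:00"): 700,
    ("STATION B", "16:00-20:00"): 650,
}

def peak_block_per_station(totals):
    """Return {station: (block, count)} for the block with the most exits."""
    best = {}
    for (station, block), count in totals.items():
        if station not in best or count > best[station][1]:
            best[station] = (block, count)
    return best

peaks = peak_block_per_station(exits_by_block)
```

A business near a given station could then schedule its geo-fence campaign for that station's peak block.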

Architecture

ozkary MTA architecture

Data Engineering Process

This project was executed following this process. The details for each of these steps can be found in this project's subdirectories.

Note: Follow each link for more details

Brainstorming Process Diagram

ozkary MTA brain storming

Technologies

The following technologies have been used for this project:

  • GitHub and Git
  • Docker and Docker Hub
  • Terraform
  • Visual Studio Code
  • Python language
  • SQL
  • Jupyter Notebooks
  • Google Cloud
    • VM, Storage, BigQuery
  • Prefect Cloud (Workflow automation)
  • dbt Cloud (Data modeling)
