prayagnshah / data-pipelines-with-airflow

This project showcases the implementation of a data pipeline using Apache Airflow. Leveraging the OpenWeather API, it fetches real-time weather data and performs ETL processing. Results are stored in AWS S3 buckets for further analysis, and Slack notifications provide timely alerts when the pipeline completes.

License: MIT License

Python 100.00%
apache-airflow aws-ec2 dataengineering openweathermap-api s3-bucket

data-pipelines-with-airflow's Introduction

Data Pipelines with Airflow and Slack Notifications

This project demonstrates how to build a data pipeline using Apache Airflow to fetch data from the OpenWeather API, perform ETL processing, and store the results in AWS S3 buckets. Additionally, it includes the integration of Slack notifications to alert the data engineering team when the pipeline is successfully executed.

Architecture

The architecture of the data pipeline is as follows:

(Architecture diagram)

  1. OpenWeather API: The pipeline starts by fetching weather data from the OpenWeather API. This data includes information such as temperature, humidity, and wind speed.

  2. ETL Processing: Once the data is retrieved, it undergoes ETL (Extract, Transform, Load) processing. This step involves cleaning the data, performing any necessary transformations, and preparing it for storage.

  3. AWS S3 Buckets: The processed data is then stored in AWS S3 buckets. S3 provides a scalable and durable storage solution for large datasets.

  4. Apache Airflow: The entire pipeline is orchestrated using Apache Airflow. Airflow allows you to define and schedule workflows as directed acyclic graphs (DAGs). In this project, the DAG is responsible for executing the data retrieval, ETL processing, and data storage tasks (a minimal sketch of such a DAG follows this list).

  5. Slack Notifications: To keep the data engineering team (in this case myself) informed about the pipeline's status, Slack notifications are integrated. When the pipeline successfully completes, a notification is sent to the designated Slack channel.
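
To make the orchestration concrete, here is a minimal sketch of how these steps could be wired together as an Airflow DAG. It assumes Airflow 2.4+, requests, and boto3, and uses illustrative names (OPENWEATHER_API_KEY, WEATHER_S3_BUCKET, the query city) that are not taken from the repository itself.

```python
# Minimal sketch of the extract -> transform -> load flow described above.
# Assumes Airflow 2.4+, requests, and boto3; all names are illustrative.
import json
import os
from datetime import datetime

import boto3
import requests
from airflow.decorators import dag, task


@dag(schedule="0 7 * * *", start_date=datetime(2023, 1, 1), catchup=False)
def weather_api():
    @task
    def extract() -> dict:
        # Fetch current weather for one city from the OpenWeather API.
        resp = requests.get(
            "https://api.openweathermap.org/data/2.5/weather",
            params={"q": "Halifax", "appid": os.environ["OPENWEATHER_API_KEY"]},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    @task
    def transform(raw: dict) -> dict:
        # Keep only the fields needed downstream; convert Kelvin to Celsius.
        return {
            "city": raw["name"],
            "temperature_c": round(raw["main"]["temp"] - 273.15, 2),
            "humidity": raw["main"]["humidity"],
            "wind_speed": raw["wind"]["speed"],
        }

    @task
    def load(record: dict) -> None:
        # Write the transformed record to S3 as a dated JSON object.
        key = f"weather/{datetime.utcnow():%Y-%m-%d}.json"
        boto3.client("s3").put_object(
            Bucket=os.environ["WEATHER_S3_BUCKET"],
            Key=key,
            Body=json.dumps(record),
        )

    load(transform(extract()))


weather_api()
```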

Environment Setup

Hardware Used

t2.medium EC2 instance on AWS (2 vCPUs, 4 GiB memory)

How to Run

Make sure the Airflow webserver and scheduler are running. Open the Airflow UI and enable the weather_api DAG. Once enabled, trigger the DAG to start the pipeline.

Store the OpenWeather API key in the .test.env file, then rename the file to .env by dropping the .test prefix. Once that is done, run airflow standalone to run the pipeline, then check the S3 bucket for the output.
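
The DAG can then read the key from the environment at runtime. The snippet below is a sketch assuming the python-dotenv package and a variable named OPENWEATHER_API_KEY; the actual variable name in the repository's .env file may differ.

```python
# Sketch: load the OpenWeather API key from .env (assumes python-dotenv is
# installed and the variable is named OPENWEATHER_API_KEY -- illustrative only).
import os

from dotenv import load_dotenv

load_dotenv()  # copies key=value pairs from .env into the process environment
api_key = os.environ["OPENWEATHER_API_KEY"]
```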

Screenshots: the OpenWeather pipeline DAG, the DAG graph view, the output in the S3 bucket, and the notification in the Slack channel.

Results

The pipeline runs daily at 7 a.m. UTC and the results are stored in the S3 bucket. The Slack channel also receives a notification when the pipeline executes successfully.
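
The success notification can be wired up as an Airflow callback. The following is a sketch assuming the slack_sdk package and a SLACK_WEBHOOK_URL environment variable; both names are illustrative, and the repository may use a different mechanism, such as the Airflow Slack provider.

```python
# Sketch: send a Slack message when the DAG succeeds, assuming slack_sdk and a
# SLACK_WEBHOOK_URL environment variable (illustrative names, not the repo's own).
import os

from slack_sdk.webhook import WebhookClient


def notify_slack_success(context):
    # Airflow passes the run context to callbacks; use it to build the message.
    dag_id = context["dag"].dag_id
    run_id = context.get("run_id", "unknown run")
    WebhookClient(os.environ["SLACK_WEBHOOK_URL"]).send(
        text=f"DAG {dag_id} completed successfully ({run_id})."
    )


# Attach it when defining the DAG, e.g.:
# @dag(..., on_success_callback=notify_slack_success)
```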
