
weather-project's Introduction

Weather Pipeline

A daily ELT pipeline that writes weather data for Orlando, New York, and San Francisco to BigQuery. Transformations and business logic are then applied in BigQuery, and a view is created on top of the result. That view is intended to serve as the basis for any BI layer.

High Level Architecture

[Image: high-level architecture diagram]

Pipeline Breakdown

Mage Blocks

  • get_weather_data
    • Pulls data from the https://www.weatherapi.com/ API (the free tier is enough for me), flattens it, creates a dataframe, and renames some of the columns in that dataframe - BQ wasn't a gigantic fan of the raw column names.
  • create_bq_dataset
    • Confirms that the BigQuery dataset exists and creates it if it doesn't. My thought was that if the dataset somehow got deleted, this would account for that scenario.
  • upload_to_gcs
    • Writes the dataframe to a parquet file and uploads that file to a bucket in Google Cloud Storage. The bucket has a year/month/day structure, so if some days need to be backfilled, or even if the whole table is dropped, the raw data is already accessible, which should speed up backfills.
  • create_or_update_bq_table
    • Requirements can change a lot from what I've seen, so this block is a way to account for schema changes in the dataframe. It updates the table schema in BQ so that data with added columns can still be written successfully, or just confirms the schema is unchanged. It also creates the table if it doesn't exist, gets dropped, etc.
  • load_data_to_bigquery
    • Loads the daily partition from GCS to BQ and appends it to the table. While testing this function locally I ran into the issue where the data would append and duplicate if I ran it multiple times, so I added logic that checks whether data for that date already exists and skips the load if it does.
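
As a rough illustration of the flatten-and-rename step in get_weather_data, here is a minimal sketch using only the standard library. It is not the repo's code: the block itself builds a dataframe, while this version works on a plain dict so it runs anywhere. The helper names `flatten` and `bq_safe` and the sample payload are hypothetical, and the renaming rule (replace anything other than letters, digits, and underscores) is an assumption about what "BQ-friendly" means here.

```python
import re


def flatten(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Collapse nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def bq_safe(name: str) -> str:
    """Replace characters other than letters, digits, and underscores.

    Assumption: column names should start with a letter or underscore,
    so a leading underscore is added when they do not.
    """
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", name)
    return cleaned if re.match(r"[A-Za-z_]", cleaned) else f"_{cleaned}"


# Illustrative payload only; the real API response has many more fields.
sample = {
    "location": {"name": "Orlando"},
    "current": {"temp_f": 81.0, "condition": {"text": "Sunny"}},
}

row = {bq_safe(key): value for key, value in flatten(sample).items()}
print(row)
```

Flattening with a dot separator first and cleaning the names afterwards keeps the two concerns separate, so the renaming rule can change without touching the flattening logic.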

image
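
The year/month/day bucket layout from upload_to_gcs and the skip-if-already-loaded check from load_data_to_bigquery can be sketched as follows. This is a hedged outline, not the repo's implementation: the function names, the `weather.parquet` filename, and the `date` column name are all assumptions. The BigQuery wiring uses the `google-cloud-bigquery` client and is imported lazily, so the pure helpers can be exercised without GCP credentials.

```python
from datetime import date
from typing import Callable


def partition_uri(bucket: str, day: date, filename: str = "weather.parquet") -> str:
    """Build a gs:// URI using a year/month/day folder layout."""
    return f"gs://{bucket}/{day:%Y/%m/%d}/{filename}"


def load_once(
    day: date,
    bucket: str,
    already_loaded: Callable[[date], bool],
    load: Callable[[str], None],
) -> bool:
    """Load one day's file unless rows for that day already exist.

    Returns True if a load ran, False if it was skipped.
    """
    if already_loaded(day):
        return False
    load(partition_uri(bucket, day))
    return True


def bigquery_callables(table_id: str, date_column: str = "date"):
    """Return (already_loaded, load) backed by BigQuery.

    Assumptions: `table_id` is a fully qualified table name and
    `date_column` is a DATE column; both names are hypothetical.
    """
    from google.cloud import bigquery  # lazy import: needs google-cloud-bigquery

    client = bigquery.Client()

    def already_loaded(day: date) -> bool:
        sql = f"SELECT COUNT(*) AS n FROM `{table_id}` WHERE {date_column} = @day"
        config = bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("day", "DATE", day)]
        )
        result = client.query(sql, job_config=config).result()
        return next(iter(result))["n"] > 0

    def load(uri: str) -> None:
        config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.PARQUET,
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )
        client.load_table_from_uri(uri, table_id, job_config=config).result()

    return already_loaded, load
```

Passing the existence check and the load as callables keeps `load_once` idempotent and easy to test with fakes. In a real run the two callables would come from `bigquery_callables("project.dataset.table")`, where the table name is a placeholder.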

weather-project's People

Contributors

marosenthal18
