
real-time-flight-status's Introduction

Mini Take-Home Project

Objective

Create a data pipeline using NiFi that ingests data from the AviationStack API, instantiates a master data set (fact model), and creates a view of the data with a simple transformation. The services are instantiated with docker-compose.

Conceptual View

Overall Arch

Services Instantiated

* postgres
  * Master Data Set
  * Batch Views
* nifi
  * Ingestion Processor
  * ETL Batch Process
* api (mock data)
  * Emulated AviationStack API
* notebook
  * Jupyter Notebook
  • Create a process in Nifi that allows obtaining the information from the API.
    • Aviation Data Flows
      • Ingestion Phase
        • Get Data from AviationStack API
        • Insert the information in the database (Postgres) Master Set
      • ETL on Batch Processing
        • Replace the "/" in the arrivalTerminal and departureTimezone fields with " - ", e.g. "Asia/Shanghai" becomes "Asia - Shanghai"
        • Insert the information in the database (Postgres).
  • Create a Jupyter notebook to consume the information stored in the database.
    • Instantiated as a service, with the avstackhelper library on the Python path.
    • Showcase how to use a Pandas DataFrame to retrieve data from the database
  • Master Data Set:
    • The dataset is an ever-growing list of immutable, atomic facts
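The batch transformation described above can be sketched in plain Python. This is only an illustration of the rule, not the NiFi processor configuration used in the project; the record layout and the `flightNumber` field are assumptions, while `arrivalTerminal` and `departureTimezone` come from the requirement itself.

```python
from typing import Any, Dict, Iterable

# Field names taken from the requirement; the surrounding record layout is assumed.
FIELDS_TO_CLEAN = ("arrivalTerminal", "departureTimezone")


def clean_record(
    record: Dict[str, Any], fields: Iterable[str] = FIELDS_TO_CLEAN
) -> Dict[str, Any]:
    """Return a copy of `record` with "/" replaced by " - " in the given fields."""
    cleaned = dict(record)  # copy, so the original (master set) fact is not mutated
    for field in fields:
        value = cleaned.get(field)
        if isinstance(value, str):  # skip missing or null values
            cleaned[field] = value.replace("/", " - ")
    return cleaned


# Hypothetical sample record
raw = {
    "flightNumber": "CA1234",
    "departureTimezone": "Asia/Shanghai",
    "arrivalTerminal": None,
}
print(clean_record(raw)["departureTimezone"])  # Asia - Shanghai
```

Returning a copy rather than editing in place matches the idea that master data set facts are immutable and the cleaned rows belong to a separate batch view.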

What Did I Learn?

  • NiFi is a powerful tool for distributing and processing data. However, I do not feel comfortable with its UI or with how to manage DAGs in terms of CI pipelines.
  • Connect a NiFi process to
    • Postgres DB
    • Create an ETL batch

NOTE: Despite the power of the tool, I prefer other tools such as Airflow, which allow DAGs to be created in code rather than through the UI.

  • Note: The master repository should be placed on HDFS and use a columnar, compressed format such as Apache Parquet
  • Interface SQL - Pandas
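As a rough illustration of the SQL to Pandas interface, the sketch below loads a query result into a DataFrame. It makes several assumptions: an in-memory SQLite database stands in for the Postgres service so the example runs anywhere, and the `flights_view` table and its columns are hypothetical names, not the project's actual schema.

```python
import sqlite3

import pandas as pd

# Assumption: SQLite stands in for the Postgres service used by the project.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights_view (flight_date TEXT, departure_timezone TEXT)"
)
conn.executemany(
    "INSERT INTO flights_view VALUES (?, ?)",
    [("2021-01-01", "Asia - Shanghai"), ("2021-01-01", "Europe - Madrid")],
)
conn.commit()


def load_flights(connection) -> pd.DataFrame:
    """Run a SELECT against the (hypothetical) batch view and return a DataFrame."""
    query = "SELECT flight_date, departure_timezone FROM flights_view"
    return pd.read_sql_query(query, connection)


df = load_flights(conn)
print(df)
```

Against the real Postgres service, the same `pd.read_sql_query` call would typically receive a SQLAlchemy engine instead of the SQLite connection.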

Background

Real-time Flight Status

The AviationStack API was built to provide a simple way of accessing global aviation data for real-time and historical flights as well as allow customers to tap into an extensive data set of airline routes and other up-to-date aviation-related information. Requests to the REST API are made using a straightforward HTTP GET URL structure and responses are provided in lightweight JSON format. The objective of this project is to construct an ETL for a client in order to query information from the API, clean it, and store the results into a consumable database.
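A minimal sketch of the HTTP GET structure mentioned above, using only the standard library. The `access_key` query parameter name and the `flight_status` filter are assumptions based on the parameter names used in this project's configuration, and the helper names are hypothetical; the endpoint shown is the mock API address from the Configuration section.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_flights_url(endpoint: str, access_key: str, **filters: str) -> str:
    """Build the GET URL: the endpoint plus the key and optional filters as a query string."""
    params = {"access_key": access_key, **filters}
    return f"{endpoint}?{urlencode(params)}"


def fetch_flights(endpoint: str, access_key: str, **filters: str) -> dict:
    """Call the API and decode the JSON response (not executed in this sketch)."""
    url = build_flights_url(endpoint, access_key, **filters)
    with urlopen(url, timeout=10) as response:
        return json.load(response)


url = build_flights_url(
    "http://aviationstack_api/v1/flights", "YOUR_KEY", flight_status="active"
)
print(url)
# http://aviationstack_api/v1/flights?access_key=YOUR_KEY&flight_status=active
```

In the project itself the request is issued by a NiFi processor; this sketch only shows the shape of the URL that such a processor would call.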

Nifi

An easy to use, powerful, and reliable system to process and distribute data. Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.

FastAPI

FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+.

Jupyter Notebooks

The notebook extends the console-based approach to interactive computing in a qualitatively new direction, providing a web-based application suitable for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results.

Requirements

Environment

Folder: docker-aviationstak

  • It is based on docker images orchestrated by docker-compose

How to Use?

Build the Notebook image

docker-compose -f docker-aviationstak/docker-composer.yml build

Start Services

docker-compose -f docker-aviationstak/docker-composer.yml up

Now you can visit the following pages. Remember to update the api_key and endpoint configuration to fetch data from the AviationStack API.

Links

NOTE: Neither a password nor a token is required to access either the NiFi UI or the Aviation Notebook.

Configuration

Since this is a conceptual approach, you need to open the NiFi UI and configure the following Parameter Contexts of the Aviation Data Flow NiFi process group:

  • AviationStack Api Key

  • AviationStack EndPoint URL

    • Depending on the account type, the endpoint can use https or http
    • For testing purposes, it can be pointed at the test API
      • http://aviationstack_api/v1/flights
    1. Go to Menu -> Parameter Contexts. Edit the ingestionDataCtx parameter context. Edit the aviationstack_access_key parameter.

AVStack Helper

Objective

The library is intended to create functions that are reused in data processing, analysis, and visualization.

Set Environments
