Airflow-Tutorial

A tutorial on getting started with Apache Airflow to automate pipelines.

Contents

Introduction to Apache Airflow

A slide deck that briefly introduces what Apache Airflow is, why to use it, and where and when it is a suitable choice can be found here.

Getting Started

We run Airflow in Docker containers because we also need to spin up containers for other services. Follow the steps below to get Apache Airflow up and running with Docker:

  1. Clone this repo and cd into the repo's root.
  2. Configure the host user ID, additional Python packages, and directories:
     echo -e "AIRFLOW_UID=$(id -u)\n_PIP_ADDITIONAL_REQUIREMENTS=pymongo pandas scikit-learn apache-airflow-providers-mongo" > .env && mkdir -p dags logs plugins
  3. Initialize the database and create the first user account:
     docker-compose up airflow-init

     You should see messages similar to the following, which indicate successful execution:

     airflow-init_1       | User "airflow" created with role "Admin"
     airflow-init_1       | 2.2.0
     airflow-tutorial_airflow-init_1 exited with code 0
  4. Start all services: docker-compose up

  5. Visit localhost:8080. The default username and password are both airflow.

  6. Spin down the containers: docker-compose down

To completely remove the containers, the volumes holding the database data, and the downloaded images, run: docker-compose down --volumes --rmi all

Alternatively, you can install Airflow from PyPI by following the instructions here.
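
When the Docker services from the steps above are running, one way to confirm that the webserver is ready before opening localhost:8080 is to query its /health endpoint. The sketch below is a minimal illustration that uses only the Python standard library. The helper names is_healthy and fetch_health are hypothetical, and the payload shape (one entry per component, each with a "status" field) is an assumption based on Airflow 2.x behaviour.

```python
import json
import urllib.request

# Webserver address from the steps above; /health is assumed to be exposed.
HEALTH_URL = "http://localhost:8080/health"


def is_healthy(payload):
    """Return True when every component in the payload reports 'healthy'."""
    return bool(payload) and all(
        component.get("status") == "healthy" for component in payload.values()
    )


def fetch_health(url=HEALTH_URL):
    """Fetch and decode the health payload (requires the containers to be up)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read())
```

With the containers running, is_healthy(fetch_health()) should return True once the scheduler has reported its first heartbeat.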

Prerequisites

  • Docker installed [manual]
  • Docker Compose installed [manual]
  • At least 4 GB of memory allocated to Docker Engine

Write an Airflow DAG

DAG stands for directed acyclic graph: a collection of tasks, linked by dependencies that never loop back on themselves, that together constitute a pipeline or workflow. The steps below are required to construct a DAG.

  • Step 1: Import Libraries
  • Step 2: Configure default arguments
  • Step 3: Instantiate DAG object
  • Step 4: Configure tasks by instantiating operators
  • Step 5: Configure task dependencies
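
The five steps above can be sketched as a single DAG file placed in the dags/ directory. This is a minimal illustration rather than a DAG taken from this repo: it assumes Airflow 2.2 (the version printed by airflow-init above), and the DAG ID, task IDs, and the summarize function are hypothetical. The try/except guard around the Airflow imports is not something Airflow requires; it is added here only so the plain Python function can be imported and tested on a machine without Airflow installed.

```python
# Step 1: import libraries
from datetime import datetime, timedelta

try:
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator
except ImportError:
    # Assumption for this sketch only: tolerate a missing Airflow install
    # so the plain functions below stay importable for unit tests.
    DAG = None


def summarize(numbers):
    """Hypothetical task logic: return the mean of a list of numbers."""
    return sum(numbers) / len(numbers)


# Step 2: configure default arguments shared by every task
default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

if DAG is not None:
    # Step 3: instantiate the DAG object
    with DAG(
        dag_id="example_tutorial_dag",  # hypothetical name
        default_args=default_args,
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,  # do not backfill runs between start_date and today
    ) as dag:
        # Step 4: configure tasks by instantiating operators
        say_hello = BashOperator(task_id="say_hello", bash_command="echo hello")
        compute_mean = PythonOperator(
            task_id="compute_mean",
            python_callable=summarize,
            op_kwargs={"numbers": [1, 2, 3]},
        )

        # Step 5: configure task dependencies (say_hello runs first)
        say_hello >> compute_mean
```

Keeping the task logic in an ordinary function such as summarize means it can be checked on its own, independently of the scheduler.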

Execute DAGs

To execute or trigger a DAG manually, first unpause the DAG.

Next, click on the DAG and then on the "play" button at the top right.

The UI then displays the execution status.
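
The same unpause-and-trigger sequence can also be scripted against Airflow's stable REST API instead of the UI. The sketch below is a hedged illustration using only the Python standard library. It assumes that the Docker Compose setup enables basic authentication for the API, that the webserver is reachable at localhost:8080, and that the default airflow/airflow credentials are in use. The helper names and the DAG ID in the usage note are hypothetical.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/v1"  # assumed webserver address
USERNAME = PASSWORD = "airflow"            # default credentials from this tutorial


def _request(method, path, payload):
    """Build (but do not send) an authenticated JSON request."""
    token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode(),
        method=method,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


def unpause_request(dag_id):
    """Request that clears the paused flag on a DAG."""
    return _request("PATCH", f"/dags/{dag_id}", {"is_paused": False})


def trigger_request(dag_id):
    """Request that creates a new DAG run with an empty configuration."""
    return _request("POST", f"/dags/{dag_id}/dagRuns", {"conf": {}})


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())
```

With the containers running, calling send(unpause_request("example_tutorial_dag")) followed by send(trigger_request("example_tutorial_dag")) would mirror the two UI steps above for a DAG with that ID.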

Monitor Workflow Using Airflow UI

To inspect the details of a task, click on the task's shape in the UI.

Then select the details you would like to view. The log and the graph are among the most useful.

Interact with Docker Container

You might want to interact with the Airflow environment. To do so, run the following and copy the container ID of the airflow-worker service:

docker ps

Then run the following to get a bash terminal (remember to substitute the correct container ID):

docker exec -it <CONTAINER_ID> bash

You can then inspect the environment. For example, list all the available DAGs with:

ls -ltr dags/
