Turbine is the set of bare-metal components behind a simple yet complete and efficient Airflow setup.
The project is intended to be easily deployed, making it great for testing, demos and showcasing Airflow solutions. It is also expected to be easily tinkered with, allowing it to be used in real production environments with little extra effort. Deploy in a few clicks, personalize in a few fields, configure in a few commands.
The stack is composed mainly of two EC2 machines, one for the Airflow webserver and one for the Airflow scheduler, plus an Auto Scaling Group of EC2 machines for Airflow workers. Supporting resources include an RDS instance to host the Airflow metadata database, an SQS queue to be used as the broker backend, S3 buckets for logs and deployment bundles, an EFS file system to serve as a shared directory, and auto scaling metrics, alarms and triggers. All other resources are the usual boilerplate to keep the wind blowing.
The deployment process through CodeDeploy is very flexible and can be tailored to each project structure, the only invariant being the Airflow home directory at `/airflow`. It ensures that every Airflow process has the same files and can be upgraded gracefully, but most importantly it makes deployments really fast and easy to begin with.
There's also an EFS shared directory mounted at `/mnt/efs`, which can be useful for staging files potentially used by workers on different machines and other synchronization scenarios commonly found in ETL/Big Data applications. It also facilitates migrating legacy workloads that are not ready to run on distributed workers.
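For example, one worker can stage an intermediate file on the shared directory and any other worker can pick it up at the same path. A minimal sketch, assuming a hypothetical staging subdirectory and file name:

```bash
# On the worker that produces the data (e.g. from a BashOperator task):
$ mkdir -p /mnt/efs/staging
$ cp /tmp/extracted_orders.csv /mnt/efs/staging/

# On any other worker, the same file is visible through EFS:
$ ls -lh /mnt/efs/staging/
$ wc -l /mnt/efs/staging/extracted_orders.csv
```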
The stack includes an estimate of the cluster load average, made by analyzing the amount of failed attempts to retrieve a task from the queue. The rationale is detailed elsewhere, but the metric's objective is to measure whether the cluster is correctly sized for the influx of tasks. Worker instances have lifecycle hooks promoting a graceful shutdown, waiting for task completion when terminating.
The goal of the auto scaling feature is to respond to changes in queue load, which could mean an idle cluster becoming active or a busy cluster becoming idle, the start or end of a backfill, many DAGs with similar schedules hitting their due time, or DAGs that branch into many parallel operators. Scaling in response to machine resources (e.g. CPU-intensive tasks) is not the goal; that is a very advanced scenario best handled by Celery's own scaling mechanism or by offloading the computation to another system (like Spark or Kubernetes) and using Airflow only for orchestration.
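If you want to eyeball the raw data behind the load estimate mentioned above, failed attempts to retrieve messages from an SQS queue show up in CloudWatch as the NumberOfEmptyReceives metric, which you can query directly. A minimal sketch, assuming a hypothetical queue name and time window:

```bash
$ aws cloudwatch get-metric-statistics \
    --namespace AWS/SQS \
    --metric-name NumberOfEmptyReceives \
    --dimensions Name=QueueName,Value=yourcoolstackname-queue \
    --start-time 2021-01-01T00:00:00Z \
    --end-time 2021-01-01T01:00:00Z \
    --period 300 \
    --statistics Sum
```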
Create a new stack using the latest template definition at `aws/cloud-formation-template.yml`. The following button will deploy the stack available in this project's master branch (defaults to your last used region):
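If you prefer the command line over the button, creating the stack from the template looks roughly like the following (the stack name is hypothetical, and you may need extra --parameters depending on the template's inputs):

```bash
# IAM capabilities are usually required for stacks that create roles
$ aws cloudformation create-stack \
    --stack-name yourcoolstackname \
    --template-body file://aws/cloud-formation-template.yml \
    --capabilities CAPABILITY_NAMED_IAM
```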
The stack resources take around 10 minutes to create, while the Airflow installation and bootstrap take another 3 to 5 minutes. After that you can already access the Airflow UI and deploy your own Airflow DAGs.
The only requirement is that you configure the deployment to copy your Airflow home directory to `/airflow`. After crafting your `appspec.yml`, you can use the AWS CLI to deploy your project.
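A minimal sketch of that flow, assuming hypothetical CodeDeploy application and deployment group names (check your stack's resources for the real ones) and an S3 bucket for deployment bundles:

```bash
# Package the directory containing appspec.yml and upload it as a new revision
$ aws deploy push \
    --application-name yourcoolstackname-deployment-application \
    --s3-location s3://yourcoolstackname-deployments/airflow.zip \
    --source .

# Roll that revision out to the Airflow instances
$ aws deploy create-deployment \
    --application-name yourcoolstackname-deployment-application \
    --deployment-group-name yourcoolstackname-deployment-group \
    --s3-location bucket=yourcoolstackname-deployments,key=airflow.zip,bundleType=zip
```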
For convenience, you can use this Makefile to handle the packaging, upload and deployment commands. A minimal working example of an Airflow project to deploy can be found at `src/airflow`.
If you follow this blueprint, a deployment is as simple as:
make deploy stack-name=yourcoolstackname
GOTCHA: if you rely on the default connections, be sure to configure `aws_default` to use the appropriate region!
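One way to do that, shown here with the Airflow 1.x-style CLI (flags may differ in your Airflow version), is to recreate the connection with the region in its extras:

```bash
$ airflow connections --delete --conn_id aws_default
$ airflow connections --add --conn_id aws_default --conn_type aws \
    --conn_extra '{"region_name": "eu-west-1"}'   # use your stack's region
```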
Sometimes the cluster operators will want to perform some additional setup, debug issues or just inspect the Airflow services and database. The stack is designed to minimize this need, but just in case it also offers decent internal tooling for those scenarios.
The environment variables used by the Airflow service are not immediately available to the `ec2-user` when you SSH into one of the instances. Before running Airflow commands, you need to use a convenience script that exports the right variables:
$ source /tmp/env.sh
$ airflow list_dags
The Airflow service runs under `systemd`, so logs are available through `journalctl`. Frequently used arguments include `--follow` to keep the logs coming, or `--no-pager` to directly dump the text lines, but it offers much more.
$ sudo journalctl -u airflow -n 50
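For example, to keep following the service logs live without the pager:

```bash
$ sudo journalctl -u airflow --follow --no-pager
```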
- Why is there an empty `Dummy` subnet in the VPC? There's no official CloudFormation support for choosing in which VPC an RDS instance is deployed. The only alternatives are to let it live in the default VPC and communicate through peering, or to use a DBSubnetGroup, which requires associated subnets covering at least 2 Availability Zones.
- Why does auto scaling take so long to kick in? AWS doesn't provide minute-level granularity on SQS metrics, only 5 minute aggregates. Also, CloudWatch stamps aggregate metrics with their initial timestamp, meaning that the latest stable SQS metrics are from 10 minutes in the past. This is why the load metric is always 5~10 minutes delayed. To avoid oscillating allocations, the alarm action has a 10 minute cooldown.
- Why can't I stop running tasks by terminating all workers? Workers have lifecycle hooks that make sure Celery finishes its tasks before EC2 is allowed to terminate that instance (except maybe for Spot Instances going out of capacity). If you want to kill running tasks, you will need to SSH into the worker instances and stop the airflow service forcefully, as in the sketch below.
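A minimal sketch of that forceful stop, assuming the `airflow` systemd unit referenced above:

```bash
# On each worker instance, after SSHing in:
$ sudo systemctl stop airflow
# If tasks still refuse to die, escalate with SIGKILL:
$ sudo systemctl kill -s SIGKILL airflow
```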
This project aims to be constantly evolving with up to date tooling and newer AWS features, as well as improving its design qualities and maintainability. Requests for Enhancement should be abundant and anyone is welcome to pick them up.
Stacks can get quite opinionated. If you have a divergent fork, you may open a Request for Comments and we will index it. Hopefully this will help to build a diverse set of possible deployment models for various production needs.
See the contribution guidelines for details.
You may also want to take a look at the Citizen Code of Conduct.
Did this project help you? Consider buying me a cup of coffee ;-)
MIT License
Copyright (c) 2017 Victor Villas
See the license file for details.