The setup files are copied directly from Airflow's repository and modified to fit the requirements.
- An EKS Cluster.
- Spot Managed Nodes on the EKS Cluster with the following setup:
  a. Label `lifecycle: Ec2Spot`
  b. Taint `spotInstance: true:PreferNoSchedule`
  c. InstancesDistribution with `spotAllocationStrategy: capacity-optimized`
  d. Note: without Spot Nodes, jobs will run on On-Demand Nodes.
- An ECR repo to push Airflow Docker images.
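A Spot node group matching the label, taint, and allocation strategy above could be declared with an eksctl config file. This is a minimal sketch, not the config used by this repo: the cluster name, region, sizes, and instance types are illustrative placeholders.

```yaml
# Sketch of an eksctl ClusterConfig with a Spot node group.
# Cluster name, region, sizes, and instance types are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: airflow-eks        # placeholder cluster name
  region: us-east-1        # placeholder region
nodeGroups:
  - name: spot-ng
    minSize: 2             # placeholder sizing
    maxSize: 4
    labels:
      lifecycle: Ec2Spot   # label from the setup above
    taints:
      spotInstance: "true:PreferNoSchedule"   # taint from the setup above
    instancesDistribution:
      instanceTypes: ["m5.large", "m5a.large", "m4.large"]  # placeholders
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0  # 100% Spot
      spotAllocationStrategy: capacity-optimized
```

Diversifying `instanceTypes` gives the capacity-optimized strategy more Spot pools to choose from, which reduces interruptions.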
- Navigate to the `scripts/docker` directory and build the Docker image using `docker build -t <ECR-uri:tag> .`
- Push the image to the ECR repo using `docker push <ECR-uri:tag>`
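The `<ECR-uri:tag>` placeholder expands to your account-specific ECR address. A sketch of the full build-and-push sequence with hypothetical account, region, and repo values (the commands are echoed rather than executed so the pattern is visible; drop the `echo` to run them for real):

```shell
# Hypothetical values -- substitute your own account ID, region, and repo name.
AWS_ACCOUNT_ID=123456789012
AWS_REGION=us-east-1
REPO_NAME=airflow-eks
TAG=latest

# ECR URIs follow the fixed pattern <account>.dkr.ecr.<region>.amazonaws.com/<repo>
ECR_URI="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${REPO_NAME}"

# Authenticate to ECR, then build and push the image.
echo "aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"
echo "docker build -t ${ECR_URI}:${TAG} ."
echo "docker push ${ECR_URI}:${TAG}"
```

The `docker login` step is required once per session before the push will succeed.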
- Set the following environment variable on your terminal:
  a. `export AOK_AIRFLOW_REPOSITORY=<ECR-uri>`
- Navigate to the `scripts/kube` directory and run `./deploy.sh` to deploy the Kubernetes infrastructure for Airflow.
- Obtain the Airflow URL by running `kubectl get svc -n airflow`
- Log in to Airflow at the above URL with `eksuser` as the username and `ekspassword` as the password.
- On your terminal, run `kubectl get nodes --label-columns=lifecycle --selector=lifecycle=Ec2Spot` to get a list of the EC2 Spot nodes.
- On your terminal, run `kubectl get pods -n airflow -w -o wide` to watch the Airflow pods.
- Trigger one of the DAGs in the Airflow console and watch the pods created for the job.
- On your terminal, verify the pods are scheduled on the Spot nodes with label `lifecycle: Ec2Spot` listed by the earlier `kubectl get nodes` step.
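For a pod to land on the tainted Spot nodes in the verification step above, its spec needs a `nodeSelector` matching the node label and a toleration matching the taint. A minimal sketch, with a placeholder pod name and image (not a manifest from this repo):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker-example   # placeholder name
  namespace: airflow
spec:
  nodeSelector:
    lifecycle: Ec2Spot           # matches the Spot node group label
  tolerations:
    - key: spotInstance
      operator: Equal
      value: "true"
      effect: PreferNoSchedule   # matches the Spot node group taint
  containers:
    - name: worker
      image: <ECR-uri:tag>       # the Airflow image pushed earlier
```

Because `PreferNoSchedule` is a soft taint, pods without the toleration can still end up on Spot nodes; the `nodeSelector` is what guarantees the job pods are placed there.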