auto-compose

auto-compose is a utility for dynamically generating DAGs for Google Cloud managed Apache Airflow (Cloud Composer) from YAML configuration files. It is a fork of dag-factory and uses its logic to parse YAML files and convert them into Airflow DAGs.

Installation

To run auto-compose without checking out the GitHub repository, run /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/suchitpuri/auto-compose/master/scripts/bootstrap.sh)". It requires Docker, which has all the required dependencies baked in.

You can also check out the repository and run /bin/bash ./scripts/bootstrap.sh
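
A sketch of the clone-based path (the clone URL is inferred from the bootstrap URL above; adjust if your fork lives elsewhere):

# Clone the repository and run the bootstrap script locally (requires Docker)
git clone https://github.com/suchitpuri/auto-compose.git
cd auto-compose
/bin/bash ./scripts/bootstrap.sh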

Usage

Once you run auto-compose, it will ask you for the following details (an example session is sketched after this list):

  1. project-id: your GCP project ID. auto-compose relies on the environment's existing authentication to GCP, so if you are not logged in, run gcloud auth login (or a similar command) before running auto-compose.
  2. composer-id: the name/ID of the Composer environment. You can get it from the Name column of https://console.cloud.google.com/composer/environments
  3. composer-location: the region (e.g. asia-northeast1) where Composer is running. You can get it from the Location column of https://console.cloud.google.com/composer/environments
  4. YAML file absolute path: the absolute path of the YAML file. A correct absolute path is needed so that Docker can mount the file.
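
A hypothetical prompt/answer session might look like this (all values below are placeholders; the exact prompt wording may differ):

  project-id: my-gcp-project
  composer-id: my-composer-env
  composer-location: asia-northeast1
  YAML file absolute path: /home/user/dags/example_dag.yaml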

To deploy a DAG to Airflow managed by Google Cloud (Cloud Composer), you first need to create a YAML configuration file. For example:

default:
  default_args:
    owner: 'default_owner'
    start_date: 2019-08-02
    email: ['[email protected]']
    email_on_failure: True
    retries: 1
    email_on_retry: True
  max_active_runs: 1
  schedule_interval: '0 * * * */1'

bq_dag_complex:
  default_args:
    owner: 'add_your_ldap'
    start_date: 2019-02-14
  description: 'this is a sample bigquery dag which runs every day'
  tasks:
    query_1:
      operator: airflow.contrib.operators.bigquery_operator.BigQueryOperator
      bql: 'SELECT count(*) FROM `bigquery-public-data.noaa_gsod.gsod2018`'
      use_legacy_sql: false
    query_2:
      operator: airflow.contrib.operators.bigquery_operator.BigQueryOperator
      bql: 'SELECT count(*) FROM `bigquery-public-data.noaa_gsod.gsod2017`'
      dependencies: [query_1]
      use_legacy_sql: false
    query_3:
      operator: airflow.contrib.operators.bigquery_operator.BigQueryOperator
      bql: 'SELECT count(*) FROM `bigquery-public-data.noaa_gsod.gsod2016`'
      dependencies: [query_1]
      use_legacy_sql: false
    query_4:
      operator: airflow.contrib.operators.bigquery_operator.BigQueryOperator
      bql: 'SELECT count(*) FROM `bigquery-public-data.noaa_gsod.gsod2015`'
      dependencies: [query_1, query_2]
      use_legacy_sql: false
    query_5:
      operator: airflow.contrib.operators.bigquery_operator.BigQueryOperator
      bql: 'SELECT count(*) FROM `bigquery-public-data.noaa_gsod.gsod2014`'
      dependencies: [query_3]
      use_legacy_sql: false

bq_dag_simple:
  default_args:
    owner: 'add_your_ldap'
    start_date: 2019-02-14
  description: 'this is a sample bigquery dag which runs every 12 hours'
  schedule_interval: '0 */12 * * *'
  tasks:
    query_1:
      operator: airflow.contrib.operators.bigquery_operator.BigQueryOperator
      bql: 'SELECT count(*) FROM `bigquery-public-data.noaa_gsod.gsod2018`'
      use_legacy_sql: false
    query_2:
      operator: airflow.contrib.operators.bigquery_operator.BigQueryOperator
      bql: 'SELECT count(*) FROM `bigquery-public-data.noaa_gsod.gsod2017`'
      dependencies: [query_1]
      use_legacy_sql: false
    query_3:
      operator: airflow.contrib.operators.bigquery_operator.BigQueryOperator
      bql: 'SELECT count(*) FROM `bigquery-public-data.noaa_gsod.gsod2016`'
      dependencies: [query_1]
      use_legacy_sql: false

You can see that it supports all the Airflow semantics, like default args, schedule interval, max active runs, and more. You can find a complete list here.

The best part is that you can currently use any of the available Google Cloud operators directly in the YAML file without any additional configuration.

And the DAGs will be generated and ready to run in Airflow!
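
For instance, a hypothetical configuration chaining the BigQuery operator with the BigQuery-to-GCS export operator follows the same pattern (a sketch: the DAG name, dataset, table, and bucket below are placeholders, and operator arguments are passed through as keyword arguments just as in the examples above):

export_results_example:
  default_args:
    owner: 'add_your_ldap'
    start_date: 2019-02-14
  description: 'a sketch chaining a BigQuery query with a GCS export'
  tasks:
    query_1:
      operator: airflow.contrib.operators.bigquery_operator.BigQueryOperator
      bql: 'SELECT count(*) FROM `bigquery-public-data.noaa_gsod.gsod2018`'
      destination_dataset_table: 'my_project.my_dataset.gsod_2018_count'
      use_legacy_sql: false
    export_to_gcs:
      operator: airflow.contrib.operators.bigquery_to_gcs.BigQueryToCloudStorageOperator
      source_project_dataset_table: 'my_project.my_dataset.gsod_2018_count'
      destination_cloud_storage_uris: ['gs://my-bucket/exports/gsod_2018_count.csv']
      dependencies: [query_1]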

Benefits

  • Construct DAGs without knowing Python
  • Construct DAGs without learning Airflow primitives
  • Avoid duplicative code
  • Use any of the available Google Cloud operators
  • Everyone loves YAML! ;)

Contributing

Contributions are welcome! Just submit a pull request or GitHub issue.
