airflow-castor's Introduction

Castor - An orchestration framework for Apache Airflow

A framework for building Airflow DAGs from YAML files. Castor comprises five components:

  • Config files
  • DAG factory
  • Task creator
  • Task strategies
  • Operator Factory

Config Files

A set of user-defined YAML files, each of which represents one Airflow DAG.

Syntax

Each YAML file comprises two sections: dag and tasks.

DAG section

The dag section accepts the official parameters supported by an Airflow DAG. See the Airflow DAG documentation for more information.

This is an example of a DAG section in a YAML file:

dag:
  dag_id: 'init_castor_dag'
  default_args: '{"owner": "castor", "start_date": "2021-06-13"}'
  schedule_interval: '@once'
  catchup: False
  tags:
    - example
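Note that default_args is written as a JSON string rather than as a nested YAML mapping. The following is a minimal sketch of how that section could be turned into keyword arguments for airflow.DAG, assuming the YAML has already been loaded into a dict (for example with PyYAML's yaml.safe_load) and that start_date must become a datetime. The helper name dag_kwargs_from_section is hypothetical and is not Castor's API.

```python
from __future__ import annotations

import json
from datetime import datetime
from typing import Any


def dag_kwargs_from_section(dag_section: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: turn a parsed `dag:` section into DAG kwargs."""
    kwargs = dict(dag_section)  # shallow copy so the caller's dict is not mutated
    default_args = kwargs.get("default_args") or {}
    # In the example, default_args is a JSON string, so decode it first.
    if isinstance(default_args, str):
        default_args = json.loads(default_args)
    default_args = dict(default_args)
    # Airflow expects start_date as a datetime, not an ISO date string.
    if isinstance(default_args.get("start_date"), str):
        default_args["start_date"] = datetime.fromisoformat(default_args["start_date"])
    kwargs["default_args"] = default_args
    return kwargs


# What yaml.safe_load would return for the dag section above.
section = {
    "dag_id": "init_castor_dag",
    "default_args": '{"owner": "castor", "start_date": "2021-06-13"}',
    "schedule_interval": "@once",
    "catchup": False,
    "tags": ["example"],
}
kwargs = dag_kwargs_from_section(section)
print(kwargs["default_args"])
# With Airflow installed: from airflow import DAG; dag = DAG(**kwargs)
```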

Task section

The parameters a task should include are:

  • [Mandatory] name: Name of the task
  • [Mandatory] strategy: The strategy the Task Creator should use to create the task (e.g., PythonOperatorStrategy)
  • [Optional] depends_on: List of the task's dependencies. These are the names of other tasks defined in the same file
  • [Optional] args: Arguments supported by the Airflow operator associated with the task strategy
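The rules above can be checked mechanically for each task entry. A minimal sketch, assuming the task has been loaded into a dict and using the key name from the YAML examples (name); validate_task is a hypothetical helper, not part of Castor.

```python
from __future__ import annotations

from typing import Any

MANDATORY_KEYS = {"name", "strategy"}
OPTIONAL_KEYS = {"depends_on", "args"}


def validate_task(task: dict[str, Any]) -> list[str]:
    """Hypothetical check of a single task entry; returns error messages."""
    errors = []
    keys = set(task)
    for key in sorted(MANDATORY_KEYS - keys):
        errors.append(f"missing mandatory key '{key}'")
    for key in sorted(keys - MANDATORY_KEYS - OPTIONAL_KEYS):
        errors.append(f"unknown key '{key}'")
    # An empty `depends_on:` in YAML loads as None, so treat it as "no dependencies".
    depends_on = task.get("depends_on") or []
    if not isinstance(depends_on, list) or not all(isinstance(d, str) for d in depends_on):
        errors.append("'depends_on' must be a list of task names")
    args = task.get("args") or {}
    if not isinstance(args, dict):
        errors.append("'args' must be a mapping of operator arguments")
    return errors


print(validate_task({"name": "t1", "strategy": "PythonOperatorStrategy"}))
print(validate_task({"name": "t1", "depends_on": "start"}))
```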

This is an example of a task section in a YAML file:

tasks:
  - name: 'task_name'
    strategy: 'strategy_name'
    depends_on:
      - 'AnotherTask'
      - 'YetAnotherTask'
    args:
      retries: 2
      trigger_rule: 'all_success'
      provide_context: True
      python_callable: 'print_params'
      op_kwargs:
        param1: 'value1'

Example

This is a YAML file containing a simple Airflow DAG that showcases Castor's capabilities.

dag:
  dag_id: 'init_castor_dag'
  default_args: '{"owner": "castor", "start_date": "2021-06-13"}'
  schedule_interval: '@once'
  catchup: False
  tags:
    - example
tasks:
    - name: 'start'
      strategy: 'DummyOperatorStrategy'
    - name: 't1'
      strategy: 'PythonOperatorStrategy'
      depends_on: 
        - 'start'
      args:
        retries: 2
        trigger_rule: 'all_success'
        provide_context: True
        python_callable: 'print_params'
        op_kwargs:
          param1: 'value1' 
    - name: 't2'
      strategy: 'PythonOperatorStrategy'
      depends_on: 
        - 'start'
      args:
        retries: 2
        trigger_rule: 'all_success'
        provide_context: True
        python_callable: 'print_params'
        op_kwargs:
          param1: 'value1'
    - name: 't3'
      strategy: 'PythonOperatorStrategy'
      depends_on: 
        - 't1'
        - 't2'
      args:
        retries: 2
        trigger_rule: 'all_success'
        provide_context: True
        python_callable: 'print_params'
        op_kwargs:
          param1: 'value1' 
    - name: 'end'
      strategy: 'DummyOperatorStrategy'
      depends_on: 
        - 't3'
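Each depends_on entry describes an upstream-to-downstream edge, which is what upstream >> downstream would declare in a hand-written DAG. A small sketch of that translation for the example above, using only the name and depends_on keys and the standard-library graphlib (Python 3.9+) to derive one valid execution order:

```python
from graphlib import TopologicalSorter

# Task names and depends_on lists from the example above (other keys omitted).
tasks = [
    {"name": "start"},
    {"name": "t1", "depends_on": ["start"]},
    {"name": "t2", "depends_on": ["start"]},
    {"name": "t3", "depends_on": ["t1", "t2"]},
    {"name": "end", "depends_on": ["t3"]},
]

# Each depends_on entry becomes an (upstream, downstream) edge.
edges = [(up, t["name"]) for t in tasks for up in t.get("depends_on", [])]

# TopologicalSorter takes a mapping of node -> set of predecessors.
graph = {t["name"]: set(t.get("depends_on", [])) for t in tasks}
order = list(TopologicalSorter(graph).static_order())

print(edges)  # [('start', 't1'), ('start', 't2'), ('t1', 't3'), ('t2', 't3'), ('t3', 'end')]
print(order)
```

t1 and t2 have no dependency on each other, so they may appear in either order and can run in parallel.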

DAG Factory

The DAG Factory is responsible for creating the DAGs based on the configuration defined in the YAML file.
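A minimal sketch of such a factory, assuming each YAML file has already been parsed into a dict with dag and tasks keys. The function create_dags and the FakeDAG stand-in are hypothetical; in a real DAG file you would pass airflow.DAG and expose the results at module level, because Airflow only discovers DAG objects assigned to module-level names.

```python
from __future__ import annotations

from typing import Any, Callable


def create_dags(
    configs: list[dict[str, Any]],
    dag_cls: Callable[..., Any],
) -> dict[str, Any]:
    """Hypothetical DAG factory: one DAG object per parsed YAML config."""
    dags = {}
    for config in configs:
        dag_args = config["dag"]
        dag_id = dag_args["dag_id"]
        if dag_id in dags:
            raise ValueError(f"duplicate dag_id '{dag_id}'")
        dags[dag_id] = dag_cls(**dag_args)
        # The Task Creator would add config.get("tasks", []) to this DAG here.
    return dags


# Minimal stand-in for airflow.DAG so the sketch runs without Airflow installed.
class FakeDAG:
    def __init__(self, dag_id: str, **kwargs: Any) -> None:
        self.dag_id = dag_id
        self.kwargs = kwargs


configs = [{"dag": {"dag_id": "init_castor_dag", "schedule_interval": "@once"}, "tasks": []}]
dags = create_dags(configs, FakeDAG)
# In a real DAG file (default_args must first be decoded from its JSON string):
#   globals().update(create_dags(configs, DAG))
```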

Task Creator

The Task Creator is responsible for creating DAG tasks based on task strategies.
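One way to picture this step, as a hedged sketch: look up each task's strategy by name, let it build the operator, then wire depends_on with set_upstream (a real BaseOperator method). The names create_tasks and FakeOperator, and the assumption that a strategy is a callable taking (task_config, dag), are illustrative only.

```python
from __future__ import annotations

from typing import Any, Callable, Dict

# Hypothetical: a strategy is any callable (task_config, dag) -> operator.
Strategy = Callable[[Dict[str, Any], Any], Any]


def create_tasks(dag: Any, tasks: list[dict[str, Any]], strategies: dict[str, Strategy]) -> dict[str, Any]:
    """Hypothetical Task Creator: build every task, then wire depends_on."""
    operators = {}
    for task in tasks:
        strategy = strategies.get(task["strategy"])
        if strategy is None:
            raise KeyError(f"unknown strategy '{task['strategy']}' for task '{task['name']}'")
        operators[task["name"]] = strategy(task, dag)
    # Second pass: every operator exists before any dependency is wired.
    for task in tasks:
        for upstream in task.get("depends_on") or []:
            operators[task["name"]].set_upstream(operators[upstream])
    return operators


# Stand-in operator that records upstream task ids, mimicking set_upstream.
class FakeOperator:
    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.upstream: list[str] = []

    def set_upstream(self, other: FakeOperator) -> None:
        self.upstream.append(other.task_id)


strategies = {"DummyOperatorStrategy": lambda task, dag: FakeOperator(task["name"])}
ops = create_tasks(
    None,
    [
        {"name": "start", "strategy": "DummyOperatorStrategy"},
        {"name": "end", "strategy": "DummyOperatorStrategy", "depends_on": ["start"]},
    ],
    strategies,
)
```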

Task Strategies

A task strategy encapsulates how a particular kind of task is built and executed. A strategy can be based on an Airflow operator (e.g., PythonOperatorStrategy).
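This is the classic strategy pattern. A minimal sketch, assuming each strategy turns one YAML task into an operator name plus keyword arguments, and that the string in python_callable is resolved through a user-supplied mapping. TaskStrategy, operator_spec, and that mapping are hypothetical; Castor's real interface may differ.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Callable


class TaskStrategy(ABC):
    """Hypothetical interface: how one YAML task becomes an operator spec."""

    @abstractmethod
    def operator_spec(self, task: dict[str, Any]) -> tuple[str, dict[str, Any]]:
        """Return (operator name, operator kwargs) for the task."""


class DummyOperatorStrategy(TaskStrategy):
    def operator_spec(self, task: dict[str, Any]) -> tuple[str, dict[str, Any]]:
        return "DummyOperator", {"task_id": task["name"]}


class PythonOperatorStrategy(TaskStrategy):
    def __init__(self, callables: dict[str, Callable[..., Any]]) -> None:
        # Assumption: maps the string in `python_callable` to a real function.
        self.callables = callables

    def operator_spec(self, task: dict[str, Any]) -> tuple[str, dict[str, Any]]:
        kwargs = dict(task.get("args") or {})
        kwargs["python_callable"] = self.callables[kwargs["python_callable"]]
        kwargs["task_id"] = task["name"]
        return "PythonOperator", kwargs


def print_params(param1: str, **context: Any) -> None:
    print(param1)


strategy = PythonOperatorStrategy({"print_params": print_params})
name, kwargs = strategy.operator_spec({
    "name": "t1",
    "strategy": "PythonOperatorStrategy",
    "args": {"python_callable": "print_params", "op_kwargs": {"param1": "value1"}},
})
```

Adding a new kind of task then means adding one strategy class, without touching the DAG Factory or the Task Creator.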

The strategies currently supported by Castor are:

Operator Factory

The Operator Factory is responsible for creating Airflow operators from a set of parameters supplied by the DAG Factory.
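A minimal sketch of such a factory, assuming operators are looked up by name in a registry of "module:ClassName" paths and imported lazily with importlib. The registry contents use Airflow 2.x module paths and may need adjusting for other Airflow versions; create_operator and load_class are hypothetical names.

```python
from __future__ import annotations

import importlib
from typing import Any

# Hypothetical registry: operator name -> "module:ClassName" (Airflow 2.x paths).
OPERATORS = {
    "PythonOperator": "airflow.operators.python:PythonOperator",
    "DummyOperator": "airflow.operators.dummy:DummyOperator",
}


def load_class(path: str) -> type:
    """Import `module` and return its attribute `ClassName`."""
    module_name, _, class_name = path.partition(":")
    return getattr(importlib.import_module(module_name), class_name)


def create_operator(name: str, dag: Any, registry: dict[str, str] = OPERATORS, **params: Any) -> Any:
    """Hypothetical factory: instantiate the registered operator with `params`."""
    try:
        path = registry[name]
    except KeyError:
        raise ValueError(f"unsupported operator '{name}'") from None
    return load_class(path)(dag=dag, **params)


# Usage without Airflow: SimpleNamespace accepts arbitrary keyword arguments.
stub = create_operator("Stub", None, registry={"Stub": "types:SimpleNamespace"}, task_id="t1")
# With Airflow installed: create_operator("PythonOperator", dag, task_id="t1", python_callable=fn)
```

Importing lazily keeps unsupported or uninstalled operators from breaking DAG parsing until they are actually requested.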

The operators currently supported by Castor are:

airflow-castor's Issues

Syntax checker for config files

Config files follow a defined syntax. That syntax should be validated before the DAG is created, so that we can avoid undesired side effects.
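Such a checker could look for errors that span several tasks: duplicate names, dependencies on undefined tasks, and dependency cycles. A minimal sketch, assuming the tasks list has already been loaded into dicts; check_tasks is hypothetical, and cycle detection relies on graphlib.TopologicalSorter.prepare() raising CycleError (Python 3.9+).

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter
from typing import Any


def check_tasks(tasks: list[dict[str, Any]]) -> list[str]:
    """Hypothetical pre-flight check run before any DAG is created."""
    errors = []
    seen = set()
    for task in tasks:
        name = task.get("name")
        if name in seen:
            errors.append(f"duplicate task name '{name}'")
        seen.add(name)
    graph = {}
    for task in tasks:
        deps = task.get("depends_on") or []
        for dep in deps:
            if dep not in seen:
                errors.append(f"task '{task.get('name')}' depends on unknown task '{dep}'")
        graph[task.get("name")] = set(deps)
    if not errors:
        try:
            TopologicalSorter(graph).prepare()
        except CycleError as exc:
            # exc.args[1] is the list of nodes that form the cycle.
            errors.append(f"dependency cycle: {' -> '.join(exc.args[1])}")
    return errors


print(check_tasks([{"name": "a", "depends_on": ["b"]}, {"name": "b", "depends_on": ["a"]}]))
```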

Add initial components of codebase

Initial components of the codebase:

  • Interface
  • DAG factory
  • Task creator
  • Task strategy
  • Strategies (python_operator_strategy)
  • Operator factory

Example project

Once the framework codebase is written, we should add an example project to it, so that people can understand how to use the framework.
