jaywonder20 / apache_airflow_basics

This is a simple demonstration of Apache Airflow hosted on Heroku. This project implements a simple DAG that fetches the top questions from Stack Overflow tagged airflow and forwards them to a specified email address. The DAG is set to run daily. Check out my article at https://medium.com/analytics-vidhya/apache-airflow-what-it-is-and-why-you-should-start-using-it-c6334090265d

Home Page: https://airflow-stackoverflow.herokuapp.com/

Python 91.44% HTML 4.94% Shell 3.62%
airflow airflow-dags heroku herokuapp apache

apache_airflow_basics's Introduction

Apache Airflow instance on Heroku

This is a simple demonstration of Apache Airflow hosted on Heroku.

This project implements a simple DAG that fetches the top questions from Stack Overflow with the tag "airflow" and forwards them to a specified email address.

Admittedly, this is over-engineered: a cron job or a plain .py script would do the same work. It is a small project I used to learn Apache Airflow.
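
As a rough illustration of the fetch-and-format part of the DAG, here is a minimal sketch that is not taken from the repo's code. It assumes the public Stack Exchange API endpoint at api.stackexchange.com (version 2.3) and assumes that "top" means sorted by votes. The function names and the page size are hypothetical.

```python
import html
from urllib.parse import urlencode

# Assumption: the public Stack Exchange API, version 2.3.
API_ROOT = "https://api.stackexchange.com/2.3/questions"


def build_questions_url(tag="airflow", page_size=10):
    """Return the URL for the highest-voted Stack Overflow questions with `tag`."""
    params = {
        "order": "desc",
        "sort": "votes",  # assumption: "top" is read as "most votes"
        "tagged": tag,
        "site": "stackoverflow",
        "pagesize": page_size,
    }
    return f"{API_ROOT}?{urlencode(params)}"


def format_email_body(payload):
    """Turn an API response (already parsed from JSON) into a plain-text list."""
    lines = []
    for rank, item in enumerate(payload.get("items", []), start=1):
        title = html.unescape(item["title"])  # the API returns HTML-escaped titles
        lines.append(f"{rank}. {title} (score {item['score']})\n   {item['link']}")
    return "\n".join(lines)
```

In a DAG, one task would download and parse the URL from build_questions_url() and a second task would mail the text returned by format_email_body().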



Setup

To get started, you need a basic knowledge of Apache Airflow, the Heroku CLI, AWS S3 buckets and Python.

Step 1

  • Option 1

    • 🍴 Fork this repo!
  • Option 2

    • 👯 Clone this repo to your local machine using https://github.com/jaywonder20/apache_airflow_basics.git

Step 2

  • Create a Heroku app and add the PostgreSQL add-on 🔨🔨🔨

Necessary configuration for the Heroku app

Set the following from the Heroku CLI:
heroku config:set AIRFLOW_HOME=/app

Set environment variables

Set AIRFLOW__CORE__SQL_ALCHEMY_CONN in .profile to your PostgreSQL connection string.

Heroku automatically runs .profile on dyno startup and exports its variables to the environment. This way, if your DB URL changes, the setting updates automatically.
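
The variable name above follows Airflow's convention of mapping an environment variable AIRFLOW__{SECTION}__{KEY} onto the matching airflow.cfg entry. The sketch below only illustrates that naming rule. It also shows one assumption-laden detail: Heroku Postgres URLs may start with postgres://, while SQLAlchemy 1.4 and later only accept postgresql://, so a rewrite can be needed depending on your versions. Both helper names are hypothetical.

```python
def airflow_env_name(section, key):
    """Build the environment variable name Airflow reads for [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def sqlalchemy_url(database_url):
    """Rewrite a legacy postgres:// scheme to postgresql://, leave others alone."""
    prefix = "postgres://"
    if database_url.startswith(prefix):
        return "postgresql://" + database_url[len(prefix):]
    return database_url


if __name__ == "__main__":
    # Hypothetical connection string, for illustration only.
    print(airflow_env_name("core", "sql_alchemy_conn"))
    print(sqlalchemy_url("postgres://user:secret@host:5432/dbname"))
```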


  • NB: Because the Airflow instance is not configured yet, change "dags_folder" in the airflow.cfg file to a non-existent folder to prevent errors during configuration.
  • Push the app to Heroku.

Step 3

Now some configuration.

Configure the following in the airflow.cfg file:
sql_alchemy_conn = <Postgres DB URI>
smtp_user = [email protected]
smtp_password = password
smtp_port = 587
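
Before pushing, it can help to check that none of these settings was left empty. The following is a minimal sketch using Python's standard configparser. It assumes sql_alchemy_conn lives under [core] and the smtp_* keys under [smtp], as in Airflow 1.10-era config files. The sample values and the function name are hypothetical.

```python
from configparser import ConfigParser

# Assumption: section layout of an Airflow 1.10-era airflow.cfg.
REQUIRED = {
    "core": ["sql_alchemy_conn"],
    "smtp": ["smtp_user", "smtp_password", "smtp_port"],
}

# Hypothetical sample; smtp_password is deliberately left empty.
SAMPLE = """
[core]
sql_alchemy_conn = postgresql://user:secret@host:5432/dbname

[smtp]
smtp_user = sender@example.com
smtp_password =
smtp_port = 587
"""


def missing_settings(cfg_text):
    """Return 'section.key' for every required setting that is absent or empty."""
    # interpolation=None keeps '%' characters in passwords or URLs literal.
    parser = ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            if not parser.get(section, key, fallback="").strip():
                missing.append(f"{section}.{key}")
    return missing


if __name__ == "__main__":
    print(missing_settings(SAMPLE))
```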

Step 4

Create an S3 bucket and get an access key: https://preventdirectaccess.com/docs/amazon-s3-quick-start-guide/

Step 5

Set the following connection parameters:

s3_connection
postgres_default
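
Besides the UI, Airflow can also read a connection from an environment variable named AIRFLOW_CONN_{CONN_ID} whose value is a connection URI. The sketch below shows how such a pair could be built for postgres_default. This is an alternative to whatever method this project uses, not a description of it, and the helper name and all credentials are hypothetical.

```python
from urllib.parse import quote


def connection_env(conn_id, scheme, login, password, host, port, schema):
    """Return (env var name, URI) for an Airflow connection defined via the environment."""
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    # Percent-encode credentials so characters such as '@' or '/' stay unambiguous.
    uri = (
        f"{scheme}://{quote(login, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{schema}"
    )
    return name, uri


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    name, uri = connection_env(
        "postgres_default", "postgres", "user", "p@ss/word",
        "db.example.com", 5432, "airflow",
    )
    print(f"export {name}='{uri}'")
```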

Step 6

  • Create a Stack Overflow app
  • Set the parameters in the variables.json file
  • Import the variables.json file into Variables from the Airflow UI
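
Airflow's variable import expects a single JSON object mapping variable names to values. The sketch below checks a file's contents before you upload it. The two key names are hypothetical placeholders, so substitute the keys that the variables.json in this repo actually defines.

```python
import json

# Hypothetical key names; replace them with the keys from the repo's variables.json.
REQUIRED_KEYS = ("stackoverflow_api_key", "recipient_email")


def load_variables(text):
    """Parse the contents of variables.json and fail early on missing values."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("variables file must contain a single JSON object")
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise ValueError(f"missing variables: {', '.join(missing)}")
    return data


if __name__ == "__main__":
    sample = '{"stackoverflow_api_key": "abc123", "recipient_email": "me@example.com"}'
    print(sorted(load_variables(sample)))
```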

Step 7

  • Run the DAG from the Airflow UI (the DAG runs successfully and sends the mail to the specified email address)

Step 8

Secure your account

Secure the app by adding extra environment variables to the .profile file.


export AIRFLOW__WEBSERVER__AUTHENTICATE=True
export AIRFLOW__WEBSERVER__AUTH_BACKEND=airflow.contrib.auth.backends.password_auth

Step 9

Open Heroku bash with the command:

heroku run bash

Start Python in the Heroku bash session and enter the following commands (copying and pasting is fine), as also described in Airflow's official documentation.


>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> user = PasswordUser(models.User())
>>> user.username = 'new_user_name'
>>> user.email = '[email protected]'
>>> user.password = 'set_the_password'
>>> session = settings.Session()
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()

If everything went well, you should be able to see this screen in your browser:

Proceed to modify the DAG for further customization.

Support

Reach out to me at one of the following places!


License
