

Twitter scheduler using Apache Airflow on Docker

Here we will build a Twitter scheduler data pipeline. The idea is to collect hundreds of tweets in files; the pipeline then segregates the tweets and posts each one to a Twitter profile at the time it is scheduled for.

Key takeaways from this project are:

  • Build a Twitter scheduler data pipeline.
  • Understand how Airflow works on Docker.

To learn more about it, follow my Medium blog here 👈 📚

Pre-Requisites

Step - 1

Install Docker from their official site.

Quick links to download:

Step - 2

Make a Twitter Developer API account. (Apply for access - Twitter Developers | Twitter Developer)

After you have created your Twitter Developer account, save the required keys and credentials and put them in the topic_tweet.py file:

consumer_key = ''           # Add your API key here
consumer_secret = ''        # Add your API secret key here
access_token = ''           # Add your Access Token key here
access_token_secret = ''    # Add your Access Token secret key here
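Hardcoding secrets in source files is easy to leak via version control. As a minimal sketch (the dictionary keys match the variable names in topic_tweet.py, but the environment-variable names and helper function are my own assumptions, not part of the repo), you could load the credentials from the environment instead:

```python
import os


def load_twitter_credentials():
    """Read the four Twitter API secrets from environment variables.

    The variable names (TWITTER_CONSUMER_KEY, ...) are hypothetical;
    use whatever names fit your deployment.
    """
    creds = {
        "consumer_key": os.getenv("TWITTER_CONSUMER_KEY", ""),
        "consumer_secret": os.getenv("TWITTER_CONSUMER_SECRET", ""),
        "access_token": os.getenv("TWITTER_ACCESS_TOKEN", ""),
        "access_token_secret": os.getenv("TWITTER_ACCESS_TOKEN_SECRET", ""),
    }
    # Fail fast if any secret is missing rather than erroring mid-pipeline.
    missing = [name for name, value in creds.items() if not value]
    if missing:
        raise RuntimeError("Missing Twitter credentials: " + ", ".join(missing))
    return creds
```

If you run Airflow via docker-compose, the same variables can be passed to the containers through the compose file's `environment` section.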

Step - 3

Enable the Google Drive API v3 to back up all your data. (Python Quickstart | Google Drive API | Google Developers)

To set up the Google Drive API, create a Python environment on your local machine and follow the quickstart linked above. After you grant your app permission, you will get two files: credentials.json and token.pickle. Copy both files into the twitter-pipeline/dags/daglibs folder:

β”œβ”€β”€ dags
β”‚   β”œβ”€β”€ daglibs
β”‚   β”‚   β”œβ”€β”€ credentials.json
β”‚   β”‚   β”œβ”€β”€ etl_job.py
β”‚   β”‚   β”œβ”€β”€ token.pickle
β”‚   β”‚   β”œβ”€β”€ topic_tweet.py
β”‚   β”‚   └── upload.py
β”‚   └── post_tweet.py
β”œβ”€β”€ data

How to run this project?

We will run Airflow in Local Executor mode, which means the compose file won't build your image for you; build it locally before moving on.

cd twitter-pipeline
docker build -t aflatest .

Now you are ready to start the containers and run Apache Airflow. Make sure you are in the root of the twitter-pipeline repo.

docker-compose -f docker-compose-LocalExecutor.yml up -d

The -d flag starts your containers in the background (i.e., detached mode).

Hit the web UI at http://localhost:8080

You should see any DAGs you put in the ./dags directory, although sometimes it can take a minute for them to show up.

Once the DAG shows up, any changes you make to the Python file take effect the next time you trigger the DAG.

Other helpful commands

You can tear down the Compose setup using

docker-compose -f docker-compose-LocalExecutor.yml down 

You can check the logs of services running in background mode using

docker-compose -f docker-compose-LocalExecutor.yml logs

Now, let's test this pipeline.

Add some tweets to the txt files in the data/tweets folder; you can add multiple tweets to one file or create a new file and add tweets there. Then trigger the pipeline, and you will find your tweets posted to your Twitter account and all the files backed up to Google Drive.
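The exact file format the DAG expects is defined by the repo's etl_job.py; as an illustrative sketch only (assuming one tweet per line, which may differ from the real parser), reading the tweet files back could look like:

```python
from pathlib import Path


def read_tweets(tweets_dir="data/tweets"):
    """Collect tweets from every .txt file, assuming one tweet per line."""
    tweets = []
    for path in sorted(Path(tweets_dir).glob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:  # skip blank separator lines
                tweets.append(line)
    return tweets
```

Keep in mind Twitter's 280-character limit per tweet; the API will reject anything longer, so it is worth checking line lengths before triggering the DAG.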

If you want to learn more about how this pipeline works and how to use it effectively, check out this blog where I have explained the process thoroughly. If this project is helping you in some way, please support my work by liking the blog or giving this repo a star.

Happy Learning!!
