
Chef'd Mini Project

Overview

This simple set of Ruby files fetches tweets from the Twitter public search API, parses them, and stores information about the tweets, users, and hashtags in a Postgres database for later analysis. The code has three basic components: fetching, parsing, and storage.

Fetching

The twitter Ruby gem is used to fetch records from the public Twitter API. The search term is currently hardcoded for convenience. The since_id is read from storage at startup and updated as API calls are made, so if the script stops for any reason it can pick up where it left off. For simplicity, the fetching code polls the Twitter API roughly once per second to check for new records. These records are pushed onto a queue, where they can be handled by one of the parsing threads.
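A minimal sketch of one polling step, assuming tweets arrive as hashes with an :id key. FakeSearch and poll_once are hypothetical names; FakeSearch stands in for the real twitter gem search call so the example runs on its own. The point is how since_id advances: it becomes the largest id seen, and stays put when a poll returns nothing.

```ruby
# Hypothetical stand-in for the Twitter search call: given a since_id, return
# only the tweets with a larger id, newest first. Not the twitter gem's API.
FakeSearch = Struct.new(:tweets) do
  def call(since_id)
    tweets.select { |t| t[:id] > since_id }.sort_by { |t| -t[:id] }
  end
end

# One polling step (hypothetical helper): fetch everything newer than
# since_id, enqueue it for the parsing threads, and return the next since_id.
def poll_once(search, since_id, queue)
  tweets = search.call(since_id)
  tweets.each { |tweet| queue << tweet }
  # Keep the previous since_id when nothing new came back.
  tweets.map { |t| t[:id] }.max || since_id
end

search = FakeSearch.new([{ id: 1, text: "first" }, { id: 2, text: "second" }])
queue = Queue.new
since_id = 0
since_id = poll_once(search, since_id, queue) # enqueues both tweets; since_id becomes 2
since_id = poll_once(search, since_id, queue) # nothing newer; since_id stays 2
```

In a long-running script this step would sit inside `loop { ...; sleep 1 }`, with since_id written back to storage after each call so a restart resumes from the last value.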

Parsing

Most of the parsing of the raw API response is done for us by the twitter gem, which extracts the tweet text, user id, hashtags, etc. and exposes them through convenient accessor methods. The main job of the parsing threads is to transform each tweet into a format suitable for insertion into our database schema. To this end, each thread takes a record, packs the relevant tweet, user, and hashtag info into an object, and pushes that object onto a queue for the storage mechanism.
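The queue-to-queue handoff can be sketched with Ruby's built-in Queue and a small pool of threads. This assumes raw tweets are plain hashes; ParsedRecord, parse_tweet, start_parsers, and the field names are hypothetical and illustrative, not the project's actual schema or the twitter gem's objects.

```ruby
# Hypothetical object holding one tweet's worth of rows to store.
ParsedRecord = Struct.new(:tweet, :user, :hashtags)

# Parse one raw tweet hash into tweet, user, and hashtag data. The input keys
# (:id, :text, :user, :hashtags) are assumptions for this sketch.
def parse_tweet(raw)
  ParsedRecord.new(
    { id: raw[:id], text: raw[:text], user_id: raw[:user][:id] },
    { id: raw[:user][:id], screen_name: raw[:user][:screen_name] },
    raw[:hashtags].uniq # one entry per distinct tag in the tweet
  )
end

# Start `count` parser threads that move records from `input` to `output`.
# Queue#pop returns nil once `input` is closed and empty, ending each thread.
def start_parsers(input, output, count: 2)
  Array.new(count) do
    Thread.new do
      while (raw = input.pop)
        output << parse_tweet(raw)
      end
    end
  end
end

raw_queue = Queue.new
parsed_queue = Queue.new
parsers = start_parsers(raw_queue, parsed_queue)
raw_queue << { id: 10, text: "Dinner #ruby #food #ruby",
               user: { id: 7, screen_name: "cook" }, hashtags: %w[ruby food ruby] }
raw_queue.close
parsers.each(&:join)
record = parsed_queue.pop # record.hashtags is ["ruby", "food"]
```

Closing the input queue (available since Ruby 2.3) gives every worker a clean exit signal without a separate sentinel value.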

Storage

Storage is simple in this case: pop the data off the queue provided by the parser and try to insert the records in a single transaction. Each record should result in one new tweet row, zero or one new user rows, and zero to N new hashtag rows.
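To show the all-or-nothing behavior without a database, here is a sketch that uses a hypothetical in-memory MemoryStore in place of the Postgres tables. Its `transaction` snapshots the tables and restores them if the block raises, which mimics (but is not) a real Postgres transaction. The row shapes and the "empty hashtag" failure are illustrative assumptions.

```ruby
# Hypothetical in-memory stand-in for the tweets, users, and hashtags tables.
class MemoryStore
  attr_reader :tweets, :users, :hashtags

  def initialize
    @tweets = {}   # tweet id => row
    @users = {}    # user id => row
    @hashtags = [] # [tweet id, tag] rows
  end

  # Mimic a transaction: restore the snapshot and re-raise on any error.
  def transaction
    snapshot = [@tweets.dup, @users.dup, @hashtags.dup]
    yield
  rescue StandardError
    @tweets, @users, @hashtags = snapshot
    raise
  end

  # Insert one parsed record: 1 tweet row, 0 or 1 user rows, 0..N hashtag rows.
  def store(record)
    transaction do
      @tweets[record[:tweet][:id]] = record[:tweet]
      @users[record[:user][:id]] ||= record[:user] # known users add no row
      record[:hashtags].each do |tag|
        raise ArgumentError, "empty hashtag" if tag.empty? # example failure
        @hashtags << [record[:tweet][:id], tag]
      end
    end
  end
end

store = MemoryStore.new
first = { tweet: { id: 1, user_id: 7 }, user: { id: 7 }, hashtags: %w[ruby food] }
second = { tweet: { id: 2, user_id: 7 }, user: { id: 7 }, hashtags: [] }
broken = { tweet: { id: 3, user_id: 8 }, user: { id: 8 }, hashtags: [""] }
store.store(first)
store.store(second)
begin
  store.store(broken)
rescue ArgumentError
  # The partial insert of tweet 3 and user 8 was rolled back.
end
```

After these calls there are two tweet rows, one user row (the second tweet reuses user 7), and two hashtag rows; the failed record leaves no trace.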

Schema

Postgres is being used as the persistent data store. The three main tables of interest are the tweets, users, and hashtags tables. The schema for each can be seen in migrations/001_create_models.rb.

Setup

  • install Ruby 2.5.0 (e.g. with RVM)
  • install Postgres
  • create a Twitter app to obtain the API keys and access tokens
  • set up the following environment variables in a .env file
    • DATABASE_URL
    • TWITTER_CONSUMER_KEY
    • TWITTER_CONSUMER_SECRET
    • TWITTER_ACCESS_TOKEN
    • TWITTER_ACCESS_SECRET
  • bundle
  • bin/rake db:create
  • bin/rake db:migrate
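Before running the script it can help to confirm that every variable above is actually set. A small sketch, assuming the .env file has already been loaded into ENV (how the project loads it is not stated here); REQUIRED_ENV and missing_env are hypothetical names.

```ruby
# The variables listed in the setup steps.
REQUIRED_ENV = %w[
  DATABASE_URL
  TWITTER_CONSUMER_KEY
  TWITTER_CONSUMER_SECRET
  TWITTER_ACCESS_TOKEN
  TWITTER_ACCESS_SECRET
].freeze

# Hypothetical helper: names of required variables that are unset or blank in
# `env` (ENV by default, or any hash with string keys).
def missing_env(env = ENV)
  REQUIRED_ENV.select { |name| env[name].to_s.strip.empty? }
end

partial = { "DATABASE_URL" => "postgres://localhost/example_db" }
missing = missing_env(partial) # the four TWITTER_* names
# At startup one might do:
#   missing = missing_env
#   abort "Missing: #{missing.join(', ')}" unless missing.empty?
```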

Running It

  • ruby main.rb

The script will run in an infinite loop polling/parsing/storing until the user interrupts it or an error is raised.
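One possible way to stop on Ctrl-C without dropping tweets that are already queued is sketched below; this is an assumption, not a description of what main.rb does. Closing the input queue lets each stage drain and then close the next one. Strings stand in for tweets and `String#upcase` stands in for parsing; start_pipeline and shutdown are hypothetical names.

```ruby
# Start one parser thread and one storage thread joined by queues.
def start_pipeline(raw, parsed, stored)
  parser = Thread.new do
    while (item = raw.pop) # nil once raw is closed and drained
      parsed << item.upcase
    end
    parsed.close # no more parsed records: lets the storage thread finish
  end
  storer = Thread.new do
    while (item = parsed.pop)
      stored << item
    end
  end
  [parser, storer]
end

# Stop accepting input, then wait until everything already queued is stored.
def shutdown(raw, threads)
  raw.close
  threads.each(&:join)
end

raw = Queue.new
parsed = Queue.new
stored = []
threads = start_pipeline(raw, parsed, stored)

begin
  # A real main loop would poll forever here; Ctrl-C raises Interrupt.
  %w[a b c].each { |tweet| raw << tweet }
rescue Interrupt
  puts "Interrupted, draining queues..."
ensure
  shutdown(raw, threads)
end
```

Interrupt is not a StandardError, so it has to be rescued by name; the `ensure` runs the same shutdown whether the loop ends normally or is interrupted.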
