Code Monkey home page Code Monkey logo

gdq-collector's Introduction

๐Ÿ‘พ gdq-collector

Data Collection Utilities for GDQStatus

(Successor to sgdq-collector)

Explanation

gdq-collector is an amalgamation of services and utilities designed to be a serverless-ish backend for gdq-stats. There are 3 distinct components of the project:

  • gdq_collector (the python module) - Python scraping module designed to be run constantly on a compute platform like EC2 which updates a Postgres database with new timeseries and GDQ schedule data.
  • lambda_suite - Lambda application that caches the Postgres database to a JSON file in S3 (to reduce the load on the Database). Also includes a simple API to query recent timeseries data that doesn't appear in the cached JSON. The Lambda Suite has 3 stages (three separate configurations that are deployed independently):
    • The API stage (dev/prod) - Serves recent data to a publicly facing REST endpoing
    • The Caching stage (cache_databases) - Queries Postgres database and stores query results in S3 as a cache
    • The Monitoring stage (monitoring) - Queries the API stage to do periodic health checks on the system

gdq_collector uses APScheduler its schedule and execute the scraping / refreshing tasks.

The Lambda applications use Zappa for deployment.

Architecture Diagram

aws_setup

Building / Running

gdq_collector

Note: If you're running this on an Ubuntu EC2 instance, bootstrap_aws.sh will be more useful then the following bullet list for specific setup.

  1. Clone the repo and cd into the root project directory.
  2. Pull down the dependencies with pip install -r requirements.txt --user
    • You may wish to run aws/install.sh, as there will be necessary system dependencies to install some of the python packages.
  3. Copy credentials_template.py to credentials.py. Fill in your credentials for Twitch, your Postgres server, and Twitch.
    • You'll need to register a new Twitch application to get your clientid.
    • You'll want to use this site to generate an oauth code for Twitch.
  4. Ensure your Postgres server is running and that your credentials are valid. Create the necessary tables by executing the SQL commands in schema.sql
  5. Run python -m gdq_collector to start the collector.
    • You can run python -m gdq_collector --help to learn about the optional command line args.

lambda_suite

  1. Clone the repo and cd into the root project directory.
  2. Pull down the dependencies with pip install -r requirements.txt --user
    • You may wish to run aws/install.sh, as there will be necessary system dependencies to install some of the python packages.
  3. Copy credentials_template.py to credentials.py. Fill in your credentials for your Postgres database. Add the ARN of the monitoring SNS topic to sns_arn if you want to use the lambda functions to send you notifications when monitoring alarms occur.
  4. Update zappa_settings.json to fit your AWS configuration. Of particular note is that you'll need to update your vpc_config. You'll also need to make an S3 bucket that matches your S3_CACHE_BUCKET config.
  5. Run zappa deploy dev to deploy the application and schedule the caching operations.
  6. Run zappa deploy cache_databases to deploy the lambdas that cache the Postgres data to JSON blobs in S3.
  7. Run zappa deploy monitoring to deploy the lambdas that check the output of the APIs to detect problems with the collector.

gdq-collector's People

Contributors

bcongdon avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Forkers

crunchtime-ali

gdq-collector's Issues

Finish making branding event agnostic

Plan is to have this repo be the *GDQ tracker for both SGDQ and AGDQ.

There's still a bit of AGDQ specific verbiage in the code base (see schema.sql, for example) that should be made agnostic of the specific event, for clarity's sake.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.