flamingo's People

Contributors

hayesdavis, jcsalterego

flamingo's Issues

Separate Redis Instances for Internal Use and Subscription Dispatch

Internally, flamingo uses Redis as a queue between the wader and dispatcher processes. It's also used for stats tracking. Currently it assumes it can dispatch Resque jobs for subscriptions to this Redis instance as well.

If we allow one or more subscription Redis instances that are separate from the "core" instance, we can ensure that flamingo's performance is not affected by a misbehaving Redis instance used by other applications.

The design of this separation should make it possible to still use one Redis instance for both core functionality and for subscription dispatching if desired.
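One way to keep the single-instance setup working while allowing a separate subscription Redis is to let the subscription settings fall back to the core settings. This is a minimal sketch, assuming a parsed YAML config hash; the `redis` and `subscription_redis` keys, `RedisSettings`, and `resolve_redis_settings` are hypothetical names, not flamingo's.

```ruby
# Hypothetical connection settings; the defaults mirror a stock local Redis.
RedisSettings = Struct.new(:host, :port, :db) do
  def self.from_hash(hash)
    new(hash.fetch("host", "localhost"), hash.fetch("port", 6379), hash.fetch("db", 0))
  end
end

# Resolve core and subscription settings from a parsed config hash.
# The "redis" and "subscription_redis" keys are assumptions for illustration.
def resolve_redis_settings(config)
  core = RedisSettings.from_hash(config.fetch("redis", {}))
  sub_hash = config["subscription_redis"]
  # With no separate section, subscriptions reuse the core instance.
  subscription = sub_hash ? RedisSettings.from_hash(sub_hash) : core
  { core: core, subscription: subscription }
end

settings = resolve_redis_settings("redis" => { "host" => "core.local" })
settings[:subscription].equal?(settings[:core]) # => true
```

Because the fallback returns the very same settings object, code that wants to share one connection can detect the single-instance case with `equal?`.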

Make wader reconnects more robust

The current method of reconnecting to the stream after a predicate change is to simply kill the wader and start a new one. This should be made more robust as described in the Streaming API docs.
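A more robust wader would retry with a growing delay instead of reconnecting immediately. The sketch below is a hypothetical `ReconnectBackoff` helper (not part of flamingo) that uses linear backoff for network-level errors and exponential backoff for HTTP errors. The specific constants are assumptions modeled on the scheme Twitter's Streaming API documentation described and should be checked against those docs.

```ruby
# Hypothetical helper, not part of flamingo: tracks consecutive failures and
# returns how many seconds the wader should sleep before reconnecting.
class ReconnectBackoff
  # Assumed constants; verify against the current Streaming API documentation.
  NETWORK_STEP = 0.25   # linear step for TCP/IP-level errors (seconds)
  NETWORK_MAX  = 16.0
  HTTP_START   = 5.0    # first delay after an HTTP error (seconds)
  HTTP_MAX     = 320.0

  def initialize
    reset
  end

  # Call once a connection has been established and data is flowing.
  def reset
    @network_failures = 0
    @http_failures = 0
  end

  # Linear backoff: 0.25, 0.5, 0.75, ... capped at NETWORK_MAX.
  def network_error!
    @network_failures += 1
    [NETWORK_STEP * @network_failures, NETWORK_MAX].min
  end

  # Exponential backoff: 5, 10, 20, ... capped at HTTP_MAX.
  def http_error!
    @http_failures += 1
    [HTTP_START * 2**(@http_failures - 1), HTTP_MAX].min
  end
end

backoff = ReconnectBackoff.new
3.times.map { backoff.http_error! } # => [5.0, 10.0, 20.0]
```

The wader would `sleep` for the returned delay after each failure and call `reset` once a new connection succeeds.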

Remove Resque Dependency

Resque is likely too heavyweight a solution for the dispatcher. Ideally the dispatcher would be stripped down to a simple process that takes an event from a Redis list (put there by the wader), dispatches it to subscribers, and then repeats. No forking, etc.

This would need to be coupled with a new subscription mechanism that makes it easy to create different types of subscriptions. One type would place the event onto a Resque queue so it could be consumed by a worker; another could append the event to a file, etc.
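The stripped-down loop and pluggable subscriptions described above can be sketched as follows. This assumes a hypothetical interface in which every subscription type responds to `deliver(event)`; `SimpleDispatcher` and `IOSubscription` are illustrative names. A Ruby `Queue` stands in for the Redis list so the example runs without a server. In flamingo the loop would instead block on that list. A Resque-backed subscription would implement the same `deliver` method by enqueuing a job.

```ruby
require "json"
require "stringio"

# Hypothetical subscription type: appends each event as a JSON line to an IO
# (a File in practice, a StringIO here).
class IOSubscription
  def initialize(io)
    @io = io
  end

  def deliver(event)
    @io.puts(JSON.generate(event))
  end
end

# Minimal dispatcher: pop a raw event, parse it, hand it to every subscription,
# repeat. No forking. A Queue stands in for the Redis list the wader writes to;
# a nil entry tells the loop to stop.
class SimpleDispatcher
  def initialize(queue, subscriptions)
    @queue = queue
    @subscriptions = subscriptions
  end

  def run
    while (raw = @queue.pop)
      event = JSON.parse(raw)
      @subscriptions.each { |sub| sub.deliver(event) }
    end
  end
end

queue = Queue.new
queue << '{"text":"hello"}'
queue << nil
out = StringIO.new
SimpleDispatcher.new(queue, [IOSubscription.new(out)]).run
out.string # => "{\"text\":\"hello\"}\n"
```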

REST Resources Intermittently Unresponsive

On low-volume streams in the development branch (version 0.3.0), it seems the *.json resources in the REST API sometimes become unresponsive, while the / resource still returns correctly.

This may be due to a Redis client connection issue after forking child processes. Version 0.3.0 now talks to Redis before forking the flamingod children, which opens a socket connection in the parent flamingod process. This socket may need to be reset after the fork.
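If the cause is a socket shared across the fork, one common remedy is to make sure each process opens its own connection. The hypothetical `ForkSafeConnection` wrapper below rebuilds its connection whenever `Process.pid` differs from the PID of the process that built it. It is a generic sketch, not code from flamingo or redis-rb.

```ruby
# Hypothetical wrapper: builds the connection lazily and rebuilds it whenever
# it is used from a process other than the one that built it, so a forked
# flamingod child never reuses its parent's socket.
class ForkSafeConnection
  def initialize(&factory)
    @factory = factory
    @owner_pid = nil
    @conn = nil
  end

  def connection
    unless @owner_pid == Process.pid
      @conn = @factory.call
      @owner_pid = Process.pid
    end
    @conn
  end
end
```

With redis-rb the factory block might simply be `{ Redis.new }`. A simpler alternative is to have flamingod explicitly reconnect in each child right after forking.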

Should Revert to Last Known Good Stream Config

Sometimes a user may specify a bad track term or other invalid data for a stream, which can result in a 406 error from Twitter that makes the wader die fatally. It would be nice if the system could revert to the last known good config so that tracking continues. This would likely need to be coupled with a notification system so that the situation can be detected in production.
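Reverting requires remembering which config last produced a working connection. A minimal sketch, assuming a hypothetical `StreamConfigHistory` class and plain hashes as stream configs; the optional block is where a notification hook could report the rejected config.

```ruby
# Hypothetical sketch: remembers the last stream config that produced a working
# connection so the wader can fall back to it when Twitter rejects a new one
# (for example with HTTP 406).
class StreamConfigHistory
  attr_reader :current, :last_good

  def initialize(initial)
    @current = initial
    @last_good = nil
  end

  # Record a newly requested config (e.g. after a predicate change).
  def propose(config)
    @current = config
  end

  # Call once a connection with the current config has succeeded.
  def confirm!
    @last_good = @current
  end

  # Call when the current config is rejected. Invokes the optional block so the
  # failure can be reported, then returns the config to retry with (nil if no
  # config has ever worked).
  def revert!
    rejected = @current
    yield(rejected, @last_good) if block_given?
    @current = @last_good
  end
end
```

A `nil` return from `revert!` means there is nothing to fall back to, so the wader should stop and alert rather than loop.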

Better Limit Handling

Limit events are very important and should be:

  1. Logged as a WARN message to the flamingo log
  2. Recorded in the meta info as the limit status of the current stream, for easy lookup

Limit messages are a single key-value pair of the form {"limit":{STREAM:NNN}}, where STREAM is the name of the stream endpoint (usually "filter") and NNN is the integer number of events that have not been delivered since the current connection began. Limit meta information should take restarts of the wader into account, since a restart resets the limit values.
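Because NNN is counted per connection, a running total has to bank the last count whenever the wader reconnects. The hypothetical `LimitTracker` below logs each limit message at WARN through Ruby's standard `Logger` and keeps that total in memory. Writing it into flamingo's meta info is left out, and the wader is assumed to call `connection_reset!` on every reconnect.

```ruby
require "json"
require "logger"

# Hypothetical sketch: log every limit message at WARN and keep a running
# total of undelivered events that survives wader reconnects.
class LimitTracker
  attr_reader :stream, :current_count

  def initialize(logger)
    @logger = logger
    @carried = 0        # undelivered counts banked from earlier connections
    @current_count = 0  # latest count reported on the current connection
    @stream = nil
  end

  # Returns true if raw was a limit message of the form {"limit":{STREAM:NNN}}.
  def handle(raw)
    limit = JSON.parse(raw)["limit"]
    return false unless limit.is_a?(Hash) && limit.size == 1
    @stream, @current_count = limit.first
    @logger.warn("limit on #{@stream}: #{@current_count} undelivered this connection, #{total_undelivered} overall")
    true
  end

  # Call when the wader reconnects: the per-connection count starts over.
  def connection_reset!
    @carried += @current_count
    @current_count = 0
  end

  def total_undelivered
    @carried + @current_count
  end
end
```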

Web UI

Create a resque-web style UI for flamingo that makes it easy to see what's going on.

Incorrect Event Handling

The dispatcher only distinguishes two event types, delete and link, from tweets. It needs to parse and handle all non-tweet events, including unexpected types, and tag each with the correct type before sending it to downstream subscribers.
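A sketch of typed classification that never silently treats an unrecognized payload as a tweet. The list of non-tweet keys and the "has `text` and `user`" test for tweets are assumptions for illustration, as is the `classify_event` name.

```ruby
require "json"

# Hypothetical classifier. Which top-level keys mark a non-tweet event is an
# assumption; anything unrecognized is passed on as :unknown rather than
# being mistaken for a tweet.
NON_TWEET_TYPES = %w[delete link limit scrub_geo].freeze

def classify_event(raw)
  data = JSON.parse(raw)
  type = NON_TWEET_TYPES.find { |key| data.key?(key) }
  return [type.to_sym, data] if type
  return [:tweet, data] if data.key?("text") && data.key?("user")
  [:unknown, data]
end
```

Subscribers can then decide per type whether to handle, ignore, or log an `:unknown` event.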

~/flamingo.yml Detection

flamingo does not detect ~/flamingo.yml, although ./flamingo.yml is detected correctly.

To reproduce:

  1. Create $HOME_DIR/flamingod.yml
  2. Run $ flamingo-web
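One plausible cause, not confirmed here, is passing a literal `~/flamingo.yml` to `File.exist?`, which does not expand `~`. A lookup that builds absolute paths avoids the problem. The search order and the `find_config` helper below are hypothetical.

```ruby
# Hypothetical lookup: check the working directory first, then the user's home
# directory. Dir.home resolves the home directory explicitly, so no "~" ever
# reaches File.file?, which would treat it as a literal directory name.
def find_config(name = "flamingo.yml", dirs = [Dir.pwd, Dir.home])
  dirs.map { |dir| File.join(dir, name) }.find { |path| File.file?(path) }
end
```

Called with no arguments, `find_config` returns the absolute path of the first flamingo.yml found, or `nil`. `File.expand_path("~/flamingo.yml")` is an equivalent way to expand the tilde.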

Event Log

Currently if the log level is set to DEBUG, the dispatcher will log the JSON of each event received to the overall flamingo log. This is annoying for a few reasons:

  1. The application log is now filled with JSON, making it hard to locate any actual issues
  2. Replaying or reusing that JSON in some way (in case of a failure somewhere downstream) is difficult because it requires extracting events from a noisy log.

I'm proposing to log all JSON events received to a rotating set of logs, where they will be stored newline-separated. Log rotation will be configurable by the number of events per log file. Log retention is outside the scope of this feature; it will be up to the user to occasionally purge logs that are no longer needed.

The event log should be optional and controlled by configuration parameters. It should also allow a flamingo instance to be configured with no subscriptions and only an event log, if the user simply wants to capture information and store it to a file.
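A count-based rotating writer for the proposed event log might look like the following. `RotatingEventLog`, its file-naming scheme, and the `events_per_file` parameter are hypothetical. Events are written exactly as received, one per line, and purging old files is left to the user as proposed.

```ruby
# Hypothetical sketch: appends one raw JSON event per line and starts a new
# file every events_per_file events. Retention/purging is left to the user.
class RotatingEventLog
  def initialize(dir, events_per_file: 10_000, prefix: "events")
    @dir = dir
    @events_per_file = events_per_file
    @prefix = prefix
    @file_index = 0
    @count = 0
    @io = nil
  end

  # Write one event (a JSON string, unmodified) as a single line.
  def <<(event_json)
    rotate! if @io.nil? || @count >= @events_per_file
    @io.puts(event_json)
    @count += 1
    self
  end

  def close
    @io&.close
    @io = nil
  end

  private

  # Close the current file and open the next numbered one. Numbering restarts
  # at 1 for each new RotatingEventLog; a real implementation might resume
  # from the highest existing index instead.
  def rotate!
    @io&.close
    @file_index += 1
    path = File.join(@dir, format("%s-%06d.log", @prefix, @file_index))
    @io = File.open(path, "a")
    @count = 0
  end
end
```

Rotating on an event count rather than on file size keeps every file replayable as a fixed-size batch, matching the configuration knob described above.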

Dispatcher Queue Can Grow Large Under Heavy Loads

Under heavy stream rates, dispatcher throughput is too low to keep the queue from growing and to dispatch tweets in real time. On a Rackspace Cloud VM it tops out at around 1.2k events per minute (about 20 events per second), beyond which the queue grows.

This is likely due to the forking overhead imposed by using Resque for the dispatching infrastructure.
