hayesdavis / flamingo
Flamingo is a service for wading into the Twitter Streaming API.
License: MIT License
What it says in the title.
Internally, flamingo uses Redis as a queue between the wader and dispatcher processes. It's also used for stats tracking. Currently it assumes it can dispatch Resque jobs for subscriptions to this Redis instance as well.
If we allow one or more separate Redis instances for subscriptions, independent of the "core" instance, we can ensure that flamingo's performance is not affected by a misbehaving Redis instance that other applications also use.
The design of this separation should make it possible to still use one Redis instance for both core functionality and for subscription dispatching if desired.
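A minimal sketch of how that separation could look in configuration, assuming a hypothetical `redis` section with `core` and optional `subscriptions` keys (none of these names come from flamingo itself). When `subscriptions` is omitted, subscription dispatching falls back to the core instance, which keeps the single-Redis setup working.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class RedisSettings:
    host: str = "localhost"
    port: int = 6379
    db: int = 0


def resolve_redis_settings(config: Dict[str, Any]) -> Dict[str, RedisSettings]:
    """Return settings for the 'core' and 'subscriptions' Redis roles.

    Hypothetical config layout:
      redis:
        core:          {host: ..., port: ..., db: ...}
        subscriptions: {host: ..., port: ..., db: ...}   # optional
    """
    redis_cfg = config.get("redis", {})
    core = RedisSettings(**redis_cfg.get("core", {}))
    sub_cfg: Optional[Dict[str, Any]] = redis_cfg.get("subscriptions")
    # No separate subscription instance configured: share the core one.
    subscriptions = RedisSettings(**sub_cfg) if sub_cfg else core
    return {"core": core, "subscriptions": subscriptions}
```

Returning the same object for both roles in the shared case makes it easy for callers to open a single client when the two settings are identical.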
The current method of reconnecting to the stream after a predicate change is to simply kill the wader and start a new one. This should be made more robust as described in the Streaming API docs.
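One way to avoid the gap caused by killing the wader is a "make before break" swap: open the stream with the new predicates first and close the old one only once the new one is up. This is a sketch of that idea under assumptions, not a summary of the Streaming API docs; `connect` and the `StreamConnection` interface are hypothetical.

```python
from typing import Callable, List, Protocol


class StreamConnection(Protocol):
    """Minimal interface assumed for an open streaming connection."""

    def close(self) -> None: ...


def swap_predicates(
    current: StreamConnection,
    connect: Callable[[List[str]], StreamConnection],
    new_track: List[str],
) -> StreamConnection:
    """Switch to new track predicates without a gap in coverage.

    `connect` is a hypothetical function that opens a stream for the given
    predicates and raises ConnectionError if the stream cannot be opened.
    """
    try:
        candidate = connect(new_track)
    except ConnectionError:
        # Keep the old stream running rather than ending up with none.
        return current
    # Only drop the old stream once the new one is up.
    current.close()
    return candidate
```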
Resque is likely too heavyweight a solution for the dispatcher. Ideally the dispatcher would be stripped down to a simple process that takes an event from a Redis list (put there by the wader), dispatches it to subscribers, and then repeats. No forking, etc.
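A minimal sketch of that single-process loop. `pop_event` stands in for a blocking pop on the Redis list the wader writes to (for example `BRPOP` with a timeout); injecting it as a callable is an assumption that keeps the sketch free of any Redis client dependency.

```python
import json
from typing import Callable, Iterable, Optional

Subscriber = Callable[[dict], None]


def run_dispatcher(
    pop_event: Callable[[], Optional[str]],
    subscribers: Iterable[Subscriber],
    max_events: Optional[int] = None,
) -> int:
    """Pop raw JSON events one at a time and hand each to every subscriber.

    Everything runs in this one process: no forking per event.
    Returns the number of events dispatched.
    """
    subscribers = list(subscribers)
    dispatched = 0
    while max_events is None or dispatched < max_events:
        raw = pop_event()
        if raw is None:
            # Pop timed out with nothing queued. A long-running dispatcher
            # would loop and wait again; this sketch stops so it can be tested.
            break
        event = json.loads(raw)
        for subscriber in subscribers:
            subscriber(event)
        dispatched += 1
    return dispatched
```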
This would need to be coupled with a new subscription mechanism that makes it easier to create types of subscriptions: one type would place the event onto a Resque queue so it could be consumed by a worker; another could append the event to a file, and so on.
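A sketch of what pluggable subscription types might look like. The class names are hypothetical. The Resque type writes jobs the way Resque stores them, as I understand its format: a JSON `{"class": ..., "args": [...]}` payload pushed onto the list `resque:queue:<name>`, with the queue name added to the set `resque:queues`. Verify this against the Resque version in use. `rpush`/`sadd` are injected so no Redis client is required here.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable, TextIO


class Subscription(ABC):
    """Base class for a hypothetical pluggable subscription type."""

    @abstractmethod
    def deliver(self, event: dict) -> None: ...


class ResqueQueueSubscription(Subscription):
    """Enqueue each event as a Resque job for a worker to consume."""

    def __init__(self, queue: str, job_class: str,
                 rpush: Callable[[str, str], object],
                 sadd: Callable[[str, str], object]):
        self.queue = queue
        self.job_class = job_class
        self.rpush = rpush
        self.sadd = sadd

    def deliver(self, event: dict) -> None:
        # Register the queue name, then push the JSON-encoded job.
        self.sadd("resque:queues", self.queue)
        payload = json.dumps({"class": self.job_class, "args": [event]})
        self.rpush(f"resque:queue:{self.queue}", payload)


class FileAppendSubscription(Subscription):
    """Append each event as one line of JSON to an open text stream."""

    def __init__(self, stream: TextIO):
        self.stream = stream

    def deliver(self, event: dict) -> None:
        self.stream.write(json.dumps(event) + "\n")
```

New destinations only need another `deliver` implementation; the dispatcher itself does not change.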
On low-volume streams, the development branch at version 0.3.0 sometimes leaves the *.json resources in the REST API unresponsive, while the / resource still returns correctly.
This may be due to a Redis client connection issue after forking child processes. In 0.3.0, flamingod now talks to Redis before forking its children, which opens a socket connection in the parent flamingod process. The children inherit that socket, so it may need to be reset after the fork.
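A minimal sketch of one way to reset the connection after a fork: remember which process opened it and open a fresh one the first time it is used under a different PID. `factory` is a hypothetical callable that creates a new client connection.

```python
import os
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ForkSafeConnection(Generic[T]):
    """Lazily (re)open a connection whenever the current PID changes.

    A child created by fork() inherits the parent's open sockets; if parent
    and child share one socket, their requests and replies can interleave.
    """

    def __init__(self, factory: Callable[[], T]):
        self._factory = factory
        self._conn: Optional[T] = None
        self._pid: Optional[int] = None

    def get(self) -> T:
        pid = os.getpid()
        if self._conn is None or self._pid != pid:
            # First use in this process: abandon any inherited connection
            # object and open a new one owned by this process.
            self._conn = self._factory()
            self._pid = pid
        return self._conn
```

The inherited object is abandoned rather than closed, on the assumption that a client's close routine might talk over the socket the parent is still using.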
After exhausting the maximum number of reconnects, the wader stops trying and effectively sits idle. This can be a problem if connectivity to Twitter is disrupted for an extended period.
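A sketch of a reconnect schedule that never gives up: the delay grows and then holds at a cap, so the wader keeps retrying through a long outage. The 250 ms linear step and 16 s cap are assumptions modeled on Twitter's published guidance for TCP/IP-level errors; check the current Streaming API docs, which also describe different back-off rules for HTTP errors.

```python
from typing import Iterator


def reconnect_delays(step: float = 0.25, cap: float = 16.0) -> Iterator[float]:
    """Yield an endless sequence of reconnect delays in seconds.

    Delays grow linearly by `step` and then stay at `cap`; there is no
    maximum attempt count after which retrying stops.
    """
    delay = 0.0
    while True:
        delay = min(delay + step, cap)
        yield delay
```

A caller would sleep for each yielded delay before the next attempt, and start a fresh generator after a successful connection so the delay resets.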
There are times when a user might specify a bad track term or other invalid data for a stream, which can result in a 406 error from Twitter that makes the wader die fatally. It would be nice if the system could revert to the last known good config so that tracking continues. This would likely need to be coupled with a notification system so that this situation can be detected in production.
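A sketch of the fallback, assuming a hypothetical `connect` that returns the HTTP status of a connection attempt and a `notify` hook for the production alert. Each config Twitter accepts is remembered; a 406 triggers a notification and a retry with the last accepted config.

```python
import copy
from typing import Any, Callable, Dict, Optional

Config = Dict[str, Any]


class StreamConfigManager:
    """Track the last stream config that Twitter accepted (hypothetical helper)."""

    def __init__(self, notify: Callable[[str], None]):
        self.notify = notify
        self.last_known_good: Optional[Config] = None

    def apply(self, config: Config, connect: Callable[[Config], int]) -> Config:
        """Try `config`; on HTTP 406, revert to the last known good config."""
        status = connect(config)
        if status == 200:
            # Keep a private copy so later edits to `config` can't alter it.
            self.last_known_good = copy.deepcopy(config)
            return config
        if status == 406 and self.last_known_good is not None:
            self.notify("stream config rejected with HTTP 406; "
                        "reverting to last known good config")
            fallback = copy.deepcopy(self.last_known_good)
            if connect(fallback) == 200:
                return fallback
        raise RuntimeError(f"stream connection failed with HTTP {status}")
```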
Limit events are very important and should be:
Limits are a single key-value pair of the form {"limit":{STREAM:NNN}}, where STREAM is the name of the stream endpoint (usually "filter") and NNN is the integer number of events that have not been delivered since the current connection began. Limit metadata should account for wader restarts, which reset the limit values.
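A sketch of keeping a running total across restarts. Because NNN counts from the start of the current connection, the tracker adds only the increase between successive values, and treats a value that goes down as a new connection. The class and method names are hypothetical. The heuristic misses a reset whose first count already exceeds the old one, so `connection_restarted` lets the caller signal restarts explicitly.

```python
import json
from typing import Dict


class LimitTracker:
    """Accumulate undelivered-event counts from {"limit": {STREAM: NNN}} events."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, int] = {}  # latest NNN per stream
        self.total: Dict[str, int] = {}      # running total across connections

    def connection_restarted(self) -> None:
        """Call when the wader reconnects so the next count is taken as fresh."""
        self.last_seen.clear()

    def record(self, raw: str) -> None:
        for stream, count in json.loads(raw)["limit"].items():
            previous = self.last_seen.get(stream, 0)
            # A drop means a new connection: the whole value is new.
            delta = count - previous if count >= previous else count
            self.total[stream] = self.total.get(stream, 0) + delta
            self.last_seen[stream] = count
```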
Create a resque-web style UI for flamingo that makes it easy to see what's going on.
The dispatcher currently distinguishes only delete and link events from tweets. It needs to parse and handle all non-tweet events, including unexpected types, and give each the correct type before sending it to downstream subscribers.
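A sketch of type detection that never drops an event. Known wrapper types are matched by their top-level key, tweets are recognized by a top-level "text" field (a heuristic assumption), and anything else is passed through tagged as "unknown". `KNOWN_TYPES` is a hypothetical list to extend as new message types appear.

```python
import json
from typing import Any, Dict, Tuple

KNOWN_TYPES = ("delete", "limit", "link")  # hypothetical; extend as needed


def classify_event(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Return (event_type, parsed_event) for one JSON line from the stream."""
    event = json.loads(raw)
    if not isinstance(event, dict):
        # Not an object at all: wrap it so subscribers still receive it.
        return "unknown", {"value": event}
    for key in KNOWN_TYPES:
        if key in event:
            return key, event
    if "text" in event:
        return "tweet", event
    return "unknown", event
```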
Doesn't detect existence of ~/flamingo.yml. Detection of ./flamingo.yml works just fine.
To reproduce:
$HOME_DIR/flamingod.yml
$ flamingo-web
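The report doesn't identify the cause. One common cause of this symptom, assumed here rather than confirmed, is checking a literal "~/..." path: file-existence checks such as Python's `os.path.exists` do not expand "~", so the path is looked up relative to the current directory. A sketch of a lookup that expands it first; the search order ("." then "~") is hypothetical.

```python
import os
from typing import Iterable, Optional


def find_config(filename: str = "flamingo.yml",
                search_dirs: Iterable[str] = (".", "~")) -> Optional[str]:
    """Return the first existing config path, checking each directory in order."""
    for directory in search_dirs:
        # expanduser turns "~" into the user's home directory; without it,
        # "~" would be treated as an ordinary relative directory name.
        path = os.path.join(os.path.expanduser(directory), filename)
        if os.path.isfile(path):
            return path
    return None
```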
Currently if the log level is set to DEBUG, the dispatcher will log the JSON of each event received to the overall flamingo log. This is annoying for a few reasons:
I'm proposing to write every JSON event received to a rotating set of log files, stored one event per line. Log rotation will be configurable by the number of events per file. Log retention is outside the scope of this feature; it will be up to the user to occasionally purge logs that are no longer needed.
The event log should be optional and controlled by configuration. It should also be possible to configure flamingo with no subscriptions and only an event log, for users who simply want to capture events and store them to files.
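A minimal sketch of the proposed event log: newline-separated JSON, a new file every `events_per_file` events, and no deletion of old files. The file naming scheme (`<prefix>.<index>.log`) and the `event_log` config section with `dir` and `events_per_file` keys are hypothetical. Returning `None` when the section is absent keeps the log optional.

```python
import os
from typing import Optional, TextIO


class RotatingEventLog:
    """Write one raw JSON event per line, starting a new file every N events."""

    def __init__(self, directory: str, prefix: str = "events",
                 events_per_file: int = 100_000):
        if events_per_file < 1:
            raise ValueError("events_per_file must be at least 1")
        self.directory = directory
        self.prefix = prefix
        self.events_per_file = events_per_file
        self.index = 0
        self.count_in_file = 0
        self._file: Optional[TextIO] = None  # opened lazily on first write

    def _path(self) -> str:
        return os.path.join(self.directory, f"{self.prefix}.{self.index}.log")

    def _rotate(self) -> None:
        if self._file is not None:
            self._file.close()
            self.index += 1
        self._file = open(self._path(), "a", encoding="utf-8")
        self.count_in_file = 0

    def write(self, raw_json: str) -> None:
        if self._file is None or self.count_in_file >= self.events_per_file:
            self._rotate()
        self._file.write(raw_json.rstrip("\n") + "\n")
        self.count_in_file += 1

    def close(self) -> None:
        if self._file is not None:
            self._file.close()
            self._file = None


def event_log_from_config(config: dict) -> Optional[RotatingEventLog]:
    """Build the event log only if the hypothetical 'event_log' section exists."""
    section = config.get("event_log")
    if not section:
        return None
    return RotatingEventLog(section["dir"],
                            events_per_file=section.get("events_per_file", 100_000))
```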
The dispatcher's throughput is too low to keep the queue from growing and to dispatch tweets in real time at heavy stream rates. On a Rackspace Cloud VM it tops out at around 1.2k events per minute, beyond which the queue grows.
This is likely due to the forking overhead imposed by using resque for the dispatching infrastructure.
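Putting the observed ceiling in per-event terms (plain arithmetic on the figure above):

```latex
\frac{1200\ \text{events/min}}{60\ \text{s/min}} = 20\ \text{events/s}
\quad\Rightarrow\quad
\frac{1\ \text{s}}{20\ \text{events}} = 50\ \text{ms per event}
```

At the ceiling, each event takes roughly 50 ms end to end. Any per-job fork cost is paid inside that budget, which is consistent with, though not proof of, the forking explanation.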