twitter-stream's Introduction

twitter-stream

Simple Ruby client library for the Twitter Streaming API. Uses EventMachine for connection handling. Adheres to Twitter's reconnection guideline.

JSON format only.

Install

sudo gem install twitter-stream -s http://gemcutter.org

Usage

require 'rubygems'
require 'twitter/json_stream'

EventMachine::run {
  stream = Twitter::JSONStream.connect(
    :path    => '/1/statuses/filter.json?track=football',
    :auth    => 'LOGIN:PASSWORD'
  )

  stream.each_item do |item|
    # Do something with the unparsed JSON item.
  end

  stream.on_error do |message|
    # No need to worry here. It might be an issue with Twitter.
    # Log message for future reference. JSONStream will try to reconnect after a timeout.
  end

  stream.on_max_reconnects do |timeout, retries|
    # Something is wrong on your side. Send yourself an email.
  end

  stream.on_no_data do
    # Twitter has stopped sending any data on the currently active
    # connection, reconnecting is probably in order
  end
}
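
Each item reaches the each_item block as an unparsed JSON string. A minimal sketch of a handler, assuming Ruby's standard json library; the helper name handle_item and the sample payload are illustrative and not part of twitter-stream:

```ruby
require 'json'

# Hypothetical helper: parse one raw item as yielded by stream.each_item.
# Returns the parsed Hash, or nil when the item is not valid JSON.
def handle_item(raw)
  JSON.parse(raw)
rescue JSON::ParserError
  nil
end

# Made-up payload, for illustration only.
tweet = handle_item('{"text":"goal!","user":{"screen_name":"fan"}}')
puts tweet['text'] if tweet
```

Inside the EventMachine block this would be called as handle_item(item) from stream.each_item.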

Examples

Open examples/reader.rb and replace LOGIN:PASSWORD with your real Twitter login and password. Then run ruby examples/reader.rb.

twitter-stream's People

Contributors

antono, bryckbost, carloslopes, kimoto, kou, lifo, matsuu, methodmissing, mgartner, mgreen, migbar, moredip, rud, stve, sujal, thbar, voloko

twitter-stream's Issues

consider updating simple_oauth to 0.3.1

We try to keep only one version of the library in debian and we already have 0.3.1 in the archive. This will help us maintain twitter-stream package in debian.

EventMachine 1.0.7 Breaks

The updated BufferedTokenizer initializer in EventMachine 1.0.7 no longer accepts the arguments twitter-stream passes to it, so it raises an ArgumentError.

 /Users/ryan/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/eventmachine-1.0.7/lib/em/buftok.rb:15:in `initialize': wrong number of arguments (2 for 0..1) (ArgumentError)
    from /Users/ryan/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/twitter-stream-0.1.16/lib/twitter/json_stream.rb:206:in `new'
    from /Users/ryan/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/twitter-stream-0.1.16/lib/twitter/json_stream.rb:206:in `reset_state'

See eventmachine/eventmachine@e385d0a#diff-a6fc4ed08b8c357cd44b13bcefb869faL25

Please release a new gem!

The on_no_data changes are very helpful, but they're not in the current gem on gemcutter. Could you release an upgraded version?

NoneType has no attribute strip

Not sure what happened here, maybe a bad read? In any case, read_line() returned a None.

2015-04-30 17:16:24,350 Unexpected exception.
Traceback (most recent call last):
  File "twitter-streamer/streamer/streamer.py", line 118, in process_tweets
    streamer.filter(**kwargs)
  File "/home/kevin/.virtualenvs/streamer/local/lib/python2.7/site-packages/tweepy/streaming.py", line 428, in filter
    self._start(async)
  File "/home/kevin/.virtualenvs/streamer/local/lib/python2.7/site-packages/tweepy/streaming.py", line 346, in _start
    self._run()
  File "/home/kevin/.virtualenvs/streamer/local/lib/python2.7/site-packages/tweepy/streaming.py", line 255, in _run
    self._read_loop(resp)
  File "/home/kevin/.virtualenvs/streamer/local/lib/python2.7/site-packages/tweepy/streaming.py", line 298, in _read_loop
    line = buf.read_line().strip()
AttributeError: 'NoneType' object has no attribute 'strip'

Security issue: GHSL-2020-097

The GitHub Security Lab reported a potential security vulnerability (GHSL-2020-097) in your project on May 18 2020. It has been 90 days since our initial report and as per our coordinated disclosure policy, we intend to publish a public advisory detailing this issue. If you do wish to further coordinate a response to this issue with the GitHub Security Lab, please contact us at [email protected] within the next 7 days in reference to GHSL-2020-097 and we would love to help you resolve these issues. If not, feel free to close this issue after which we will proceed with advisory publication.

Does not reconnect properly when using https

To solve the problem, start_tls needs to be called in connection_completed so it gets called each time the connection is connected.

connection.start_tls if options[:ssl] needs to be removed from self.connect

and the following needs to be added to the connection_completed callback:

self.start_tls if @options[:ssl]

Reconnect with updated content

The README asserts that twitter-stream adheres to Twitter's reconnection guideline, so I've been happily exposing the @options variable, modifying :content to change track keywords, and issuing immediate_reconnect to update things when I receive SIGUSR1.

I've had no problems; I was just wondering whether there is anything I should worry about with this approach. Is there any reason this couldn't be explicitly exposed in future releases?

Twitter OAuth

Would be great to see support for Twitter OAuth here!

need to_s before blank? check in query param construction

json_stream.rb:298

It's possible for params to be passed in as Fixnum or Boolean values, which don't respond to blank? while the query string is being constructed. Adding a to_s prevents the exceptions.

def params
  flat = {}
  @options[:params].merge!(:track => @options[:filters]) unless @options[:filters].blank?
  @options[:params].each do |param, val|
    next if val.to_s.empty?
    val = val.join(",") if val.respond_to?(:join)
    flat[escape(param)] = escape(val)
  end
  flat
end
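
The same idea can be tried outside the library. The following is a sketch only: flatten_params is a hypothetical stand-in for the params method, and URI.encode_www_form_component is used in place of the library's own escape helper. Unlike the snippet above, it joins arrays before the emptiness check so that an empty array is skipped as well.

```ruby
require 'uri'

# Hypothetical stand-in for JSONStream#params (name and escaping are assumptions).
# Works for Integer and true/false values because everything goes through to_s.
def flatten_params(params)
  flat = {}
  params.each do |key, val|
    val = val.join(',') if val.respond_to?(:join)
    next if val.to_s.empty?   # skips nil, '' and empty arrays
    flat[URI.encode_www_form_component(key.to_s)] =
      URI.encode_www_form_component(val.to_s)
  end
  flat
end

p flatten_params(:count => 5, :delimited => true,
                 :track => ['foo', 'bar'], :locations => '')
```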

New gem release

Any chance of seeing a new gem released with pull requests and fork features added?

Application Failure retries never incremented

When Twitter returns an error status (e.g., HTTP/1.1 500), it throws an error. However, when it reaches the end of the HTTP headers in parse_header_line, reset_timeouts resets the app's status, so if there's a similar error on the next reconnect, there's no record of consecutive failures and the timeout between retries never increases.

To address this issue, I changed line 178 from

reset_timeouts

to

reset_timeouts if @code == 200

Thus, it only resets the state when there's a successful connection.
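
The effect of that one-line change can be modelled in isolation. This is a sketch under stated assumptions: RetryState, the initial timeout of 10 seconds and the doubling rule are all hypothetical and are not taken from json_stream.rb.

```ruby
# Hypothetical model of the retry bookkeeping (names and numbers are assumptions).
class RetryState
  attr_reader :retries, :timeout

  INITIAL_TIMEOUT = 10

  def initialize
    reset_timeouts
  end

  def reset_timeouts
    @retries = 0
    @timeout = INITIAL_TIMEOUT
  end

  # Called once the end of the response headers is reached.
  # Only a successful response clears the failure history.
  def headers_parsed(code)
    reset_timeouts if code == 200
  end

  # Called when a reconnect is scheduled; returns the delay to wait
  # and doubles the delay for the next consecutive failure.
  def schedule_reconnect
    @retries += 1
    delay = @timeout
    @timeout *= 2
    delay
  end
end

state = RetryState.new
3.times do
  state.schedule_reconnect
  state.headers_parsed(500)  # error status: counters are kept
end
puts state.retries           # prints 3
state.headers_parsed(200)
puts state.retries           # prints 0
```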

(Sorry I don't know Ruby and Git well enough to implement this patch and test it myself.)

Add chunked encoding support

The Streaming API actually delivers results using chunked encoding, but due to a quirk in their load balancer (so I hear), twitter-stream actually parses everything OK. Other streams (Gnip Power Track) use different load balancers and results occasionally get truncated.

json_stream.rb:24: warning: key :path is duplicated and overwritten on line 27

I'm getting this warning from twitter-stream-0.1.12:

json_stream.rb:24: warning: key :path is duplicated and overwritten on line 27

Turns out it's right; the value is definitely in the hash twice.

    DEFAULT_OPTIONS = {
      :method       => 'GET',
      :path         => '/',
      :content_type => "application/x-www-form-urlencoded",
      :content      => '',
      :path         => '/1/statuses/filter.json',
      :host         => 'stream.twitter.com',
      :port         => 80,
      :ssl          => false,
      :user_agent   => 'TwitterStream',
      :timeout      => 0,
      :proxy        => ENV['HTTP_PROXY'],
      :auth         => nil,
      :oauth        => {},
      :filters      => [],
      :params       => {},
    }

It's not a high-priority thing, of course.

Rails 4.2.8
Ruby 2.4.6
OSX 10.15.7

Missing license

Please add a license to this library. If you want a license that is as open as possible, you could use MIT.

How to update keywords on the fly?

I'm totally stuck on how to use this to update the track keywords on the fly. Any pointers? I've googled a ton, but the documentation for how to do something like that in EventMachine seems lacking.

Doesn't handle messages split over multiple lines

Using sitestreams, sometimes messages are split over multiple lines (possibly in userstreams too - I don't know?).

There are a few changes that need to be made to handle split lines with this library, but I wanted to point out a non-obvious one.

If a message is split, the split can be anywhere in the message content. Sometimes this means a space from the message is at the start or end of a split line.

The parse_stream_line method starts out by calling ln.strip! with the intention of removing the leading newline. This is problematic, as it also removes the leading or trailing space when they should be there in split lines, causing data corruption.

Changing ln.strip! to ln[0] = '' resolves the issue, but you probably also want to update the later checks for ln.empty?, as the heartbeat signals from Twitter will now look like "\n" (or something similar) instead of being empty strings.
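
The corruption is easy to reproduce without the library. In this sketch the fragments are made up, and the assumption that each fragment arrives with a leading "\n" follows the description above rather than a captured stream. It removes only a leading newline with sub, which is a slightly safer variant of ln[0] = '' because it leaves the first character alone when it is not a newline.

```ruby
require 'json'

# Joins fragments the way the current code would: strip every fragment first.
def join_stripped(fragments)
  fragments.map(&:strip).join
end

# Joins fragments after removing only a single leading newline from each.
def join_preserved(fragments)
  fragments.map { |ln| ln.sub(/\A\n/, '') }.join
end

# Hypothetical message split mid-content; the second fragment
# legitimately starts with a space after its leading newline.
fragments = ["\n{\"text\":\"hello", "\n world\"}"]

puts JSON.parse(join_stripped(fragments))['text']   # prints "helloworld"
puts JSON.parse(join_preserved(fragments))['text']  # prints "hello world"
```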

no_data_callback not being called when newlines (\n\r) are being received

Twice in the past month I've had my long running Twitter Stream consumer stall and not receive any tweets, without it calling the no_data_callback. I'm wondering if this could be because the Streaming API is sending whitespace (\n or \r or both) and this whitespace is actually treated as data at https://github.com/voloko/twitter-stream/blob/master/lib/twitter/json_stream.rb#L130. This line that resets @last_data_received_at doesn't seem to check whether the data is whitespace or not - and maybe that's why the no_data_callback is never called.

I guess this would be the wrong place to set @last_data_received_at anyway because it can't know that the data is whitespace without first parsing it.

I'm happy to work on a fix, but I wanted to get some feedback first on whether or not my logic seems sound.
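
One way to express the idea, as a sketch only: DataWatchdog is a hypothetical class, and it assumes it is fed data after any transfer framing has been removed, which is exactly the caveat raised above about receive_data being too early a place for this check.

```ruby
# Hypothetical tracker: only non-whitespace data counts as activity.
class DataWatchdog
  attr_reader :last_data_received_at

  def initialize(now = Time.now)
    @last_data_received_at = now
  end

  # Record activity unless the data is nothing but whitespace (keep-alives).
  def receive_data(data, now = Time.now)
    @last_data_received_at = now unless data.strip.empty?
  end

  # True when no real data has arrived for more than limit_seconds.
  def stalled?(limit_seconds, now = Time.now)
    now - @last_data_received_at > limit_seconds
  end
end

watchdog = DataWatchdog.new(Time.at(0))
watchdog.receive_data("\r\n", Time.at(60))   # keep-alive only, ignored
puts watchdog.stalled?(90, Time.at(100))     # prints true
```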

Proxy support?

It doesn't appear that twitter-stream and/or EventMachine support HTTP proxies. Is that right?

HTTP::Parser::Error: Could not parse data entirely (EDIT: with a fix)

Tested against master today - when I do this to change what I track:

stream.options[:content] = new_content
stream.immediate_reconnect

I get this error (after 2 on_reconnect calls):

[GEM_ROOT]/gems/twitter-stream-0.1.14/lib/twitter/json_stream.rb:121:in `<<'
[GEM_ROOT]/gems/twitter-stream-0.1.14/lib/twitter/json_stream.rb:121:in `receive_data'
[GEM_ROOT]/gems/eventmachine-1.0.0.beta.4/lib/eventmachine.rb:179:in `run_machine'
[GEM_ROOT]/gems/eventmachine-1.0.0.beta.4/lib/eventmachine.rb:179:in `run'

I updated to master and it still applies.

Has anyone else run into this issue?

Problem working with Squid web proxy

I'm having a problem going through Squid despite the changes bgreenlee suggested 3 years ago in Issue #2.

Using Twitter::JSONStream as shown at https://gist.github.com/4179836.js?file=Listen_in_on_Campfire_room.rb , I get the following results - with a little bit of annotation sprinkled in at bottom of JSONStream#send_request, and temporarily backing out the "full url" fix bgreenlee suggested, please read on:

[aburnheimer@hostname ~]$ ./Listen_in_on_Campfire_room.rb
options: {:port=>443, :method=>"GET", :content_type=>"application/x-www-form-urlencoded", :auto_reconnect=>true, :content=>"", :timeout=>0, :params=>{}, :user_agent=>"TwitterStream", :host=>"streaming.campfirenow.com", :proxy=>"http://...proxy_host_name removed...:3128", :auth=>"1234567890abcdef123467890abcdef123456789:X", :ssl=>true, :oauth=>{}, :path=>"/room/123456/live.json", :filters=>[]}
send_request: ["GET /room/123456/live.json HTTP/1.1", "Host: streaming.campfirenow.com", "Accept: */*", "User-Agent: TwitterStream", "Authorization: Basic ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCD", "\r\n"]
Tried 11 times to connect.

Meanwhile, using curl, I can generate the exact same headers, and it works...

[aburnheimer@hostname ~]$ echo $http_proxy
http://...proxy_host_name removed...:3128
[aburnheimer@hostname ~]$ echo $HTTP_PROXY
http://...proxy_host_name removed...:3128
[aburnheimer@hostname ~]$ curl -k -v -A "TwitterStream" -u 123467890abcdef123467890abcdef123456789:X -x $http_proxy https://streaming.campfirenow.com/room/123467/live.json
* About to connect() to proxy ...proxy_host_name removed... port 3128 (#0)
*   Trying proxy_ip removed... connected
* Connected to proxy_host_name removed (proxy_ip removed) port 3128 (#0)
* Establish HTTP proxy tunnel to streaming.campfirenow.com:443
* Server auth using Basic with user '123467890abcdef123467890abcdef123456789'
> CONNECT streaming.campfirenow.com:443 HTTP/1.1
> Host: streaming.campfirenow.com:443
> User-Agent: TwitterStream
> Proxy-Connection: Keep-Alive
>
< HTTP/1.0 200 Connection established
<
* Proxy replied OK to CONNECT request
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* warning: ignoring value of ssl.verifyhost
* skipping SSL peer certificate verification
* SSL connection using TLS_DHE_RSA_WITH_AES_256_CBC_SHA
* Server certificate:
*   subject: CN=*.campfirenow.com,OU=Domain Control Validated - RapidSSL(R),OU=S
ee www.rapidssl.com/resources/cps (c)11,OU=GT23036785,O=*.campfirenow.com,C=US,
serialNumber=JoGozIJmZwhtvHRQCuYEMvwbe6AYqcjR
*   start date: Dec 24 15:40:13 2011 GMT
*   expire date: Jan 25 19:52:44 2014 GMT
*   common name: *.campfirenow.com
*   issuer: CN=RapidSSL CA,O="GeoTrust, Inc.",C=US
* Server auth using Basic with user '123467890abcdef123467890abcdef123456789'
> GET /room/123467/live.json HTTP/1.1
> Authorization: Basic ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCD
> User-Agent: TwitterStream
> Host: streaming.campfirenow.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx
< Date: Sat, 01 Dec 2012 00:41:52 GMT
< Content-Type: application/json
< Transfer-Encoding: chunked
< Connection: keep-alive
<
{"room_id":123467,"created_at":"2012/12/01 00:41:54 +0000","body":"Lorem ipsum
dolor sit amet, consectetur adipiscing elit. Donec a diam lectus. Sed sit amet i
psum mauris. Maecenas congue ligula ac quam viverra nec consectetur ante hendre
rit. Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit ame
t vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. D
onec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit
amet, c

Even just advice on where or how to investigate further would be appreciated. I've been annotating twitter-stream-0.1.16 and eventmachine-1.0.0 to try to determine where they diverge from whatever curl is doing right. Unfortunately, I haven't found any clues after a lot of trial and error.

My next move is to switch to the pure_ruby version of eventmachine, but the initial resistance I ran into there suggested that I should reach out for help first.

input buffer full

Hi, I've implemented a daemon which uses twitter-stream, but we're getting "input buffer full" messages in the logs:

[Chugger 07:05] 4564 tweets
[Chugger 07:06] 4727 tweets
[Chugger 07:07] Error: input buffer full
[Chugger 07:07] Error: input buffer full
[Chugger 07:07] Warning: reconnect in 10 seconds
[Chugger 07:07] 3798 tweets
[Chugger 07:08] 4626 tweets

This looks like something emanating from EventMachine's BufferedTokenizer. I'm not sure why this would start happening now, but Twitter is adding metadata to the JSON, so perhaps it's that. A possible solution: raise MAX_LINE_LENGTH (where does your current value of 16*1024 come from?).

Cheers,
Kameron
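
To illustrate how a fixed line-length limit produces this error, here is a sketch of a size-limited line tokenizer. LimitedLineBuffer is hypothetical and is not EventMachine's class; only the 16*1024 default and the error message are taken from the report above.

```ruby
# Hypothetical line tokenizer with a configurable size limit.
class LimitedLineBuffer
  def initialize(delimiter = "\r", max_line_length = 16 * 1024)
    @delimiter = delimiter
    @max = max_line_length
    @input = String.new
  end

  # Append data and return every complete line; the unterminated tail is
  # kept for the next call and must stay within the size limit.
  def extract(data)
    @input << data
    lines = @input.split(@delimiter, -1)
    @input = lines.pop || String.new
    raise 'input buffer full' if @input.bytesize > @max
    lines
  end
end

buffer = LimitedLineBuffer.new("\n", 8)
p buffer.extract("ab\ncd")   # prints ["ab"]
p buffer.extract("ef\n")     # prints ["cdef"]
begin
  buffer.extract("123456789")  # 9 bytes without a delimiter exceeds the limit of 8
rescue RuntimeError => e
  puts "Error: #{e.message}"   # prints "Error: input buffer full"
end
```

With a limit like this, any single message longer than the limit triggers the error, so growth in the per-tweet metadata would be enough to start hitting it.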
