
mining-the-social-web's People

Contributors

cb372, gar, jeffhoek, megansquire, ptwobrussell, sammyrulez, vkrest


mining-the-social-web's Issues

Running IPython Notebook for 1st Edition

I have successfully downloaded the repo for the 1st edition and started the IPython Notebook under Linux.

I am using the Vagrant instance from the 2nd edition.
As a result, on initial startup of the IPython Notebook in the VM, it reports a conflict on localhost:8888 and falls back to localhost:8889.

Problem is, I can't access localhost:8889 in a browser on my host machine.
I can't seem to find the shadow directory on my host machine to modify bootstrap.sh either.

Below is an image of the initial startup. Any insight is appreciated. Thanks!

[Screenshot of the initial startup: ipython_prob]
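A hedged suggestion: Vagrant only exposes the ports listed in the Vagrantfile, so when the notebook falls back to 8889 the host browser has no route to it. Either free port 8888 inside the VM, or forward the fallback port too with a line like the following (a hypothetical Vagrantfile entry; run `vagrant reload` afterwards):

config.vm.network "forwarded_port", guest: 8889, host: 8889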

couchdb-lucene "Search Timed Out"

In case anyone is having trouble with "mailboxes__participants_in_conversations.py" due to a timeout (even with the 300000 timeout set in CouchDB's local.ini), try adding "stale=ok" to the request:

conn.request('GET', '/%s/_fti/_design/lucene/%s?q=%s&limit=50000&stale=ok'

It worked for me. Hope it helps.
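For reference, a fuller sketch of the request with the flag added; the database, index, and query names below are placeholders, not necessarily the ones used in the script:

import httplib

DB = 'enron'            # placeholder database name
INDEX_NAME = 'search'   # placeholder couchdb-lucene index name
QUERY = 'raptor'        # placeholder query string

conn = httplib.HTTPConnection('localhost', 5984)
# stale=ok asks couchdb-lucene to answer from the current index instead of
# blocking while the index is rebuilt, which sidesteps the timeout.
conn.request('GET', '/%s/_fti/_design/lucene/%s?q=%s&limit=50000&stale=ok'
             % (DB, INDEX_NAME, QUERY))
print conn.getresponse().read()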

Protovis: Introduction Retweet Visualization

Hi Matthew,

I'm trying to work through the Protovis Retweet Visualization code you referenced in the book:

https://github.com/ptwobrussell/Mining-the-Social-Web/blob/master/python_code/introduction__retweet_visualization.py

After working through some directory issues with the code, I get stuck at:
html = open(HTML_TEMPLATE).read() % (json_data,)
if not os.path.isdir(OUT_DIR):
    os.mkdir(OUT_DIR)
f = open(os.path.join(os.getcwd(), OUT_DIR, out_file + ".html"), 'w')
f.write(html)
f.close()

With the error being:
Traceback (most recent call last):
File "C:\Python27\trends", line 343, in
protovis_output = write_protovis_output(g, OUT)
File "C:\Python27\trends", line 276, in write_protovis_output
html = open(HTML_TEMPLATE).read() % (json_data,)
TypeError: not all arguments converted during string formatting

Something to do with the string formatting being wrong. I noticed that you resolved a similar problem for someone running one of the recipe codes with Protovis; you mentioned that it was a problem with Twitter updating the API to 1.1.

I'm a little confused, because in the source code you explicitly mention that you updated the code to API 1.1, so I don't understand why it would crash for me :(

As mentioned earlier, I had to hard-code some directories and files so as not to get a "This directory or filename does not exist" error. I hope that is not contributing to the string formatting error I'm getting now.

Thanks for your time!
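For the record, "not all arguments converted during string formatting" means the string being formatted contained no %s placeholder to consume json_data, which fits the hard-coded paths: the file read as HTML_TEMPLATE is probably not the template shipped with the repo. A defensive sketch (the assertion is illustrative, not part of the original script):

template = open(HTML_TEMPLATE).read()
# The repo's template contains a single %s placeholder for the JSON payload;
# if this assertion fails, HTML_TEMPLATE points at the wrong file.
assert '%s' in template, HTML_TEMPLATE + ' does not look like the expected template'
html = template % (json_data,)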

'module' object has no attribute 'Twitter' error

I'm also getting the dreaded 'module' object has no attribute 'Twitter' error.
After reading through the comment threads & misc web forums I can confirm that

  1. The dependent libraries are installed, listed in easy-install.pth, & recognized by my IDE
  2. I downloaded the entire repository (message 2 above this) to make sure there wasn't a dependency I had missed
  3. I'm using Python 2.7.2 on Mac OS 10.8.2 (and PyCharm, for what it's worth)
  • Tried both the http://code.google.com version of the latest & the GitHub version (I'm assuming GitHub is most recent, based on comments on the other site).
  • Tried removing all files, the egg, etc. & re-tracing my steps - same result.

Thanks for any help/suggestions.

-J

P.S. I realized it would perhaps be more appropriate to open a new issue rather than comment on an old, closed one.
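A guess worth ruling out (not a confirmed diagnosis): this error often means a local file named twitter.py, or a leftover twitter.pyc, is shadowing the installed twitter package, so the import succeeds but the module has none of the expected attributes. A quick check:

import twitter

print twitter.__file__              # should point into site-packages, not your project directory
print hasattr(twitter, 'Twitter')   # False if the wrong module was imported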

introduction__retweet_visualization.py

Hi Matthew,

Thanks for your very interesting book, and the video too - that's what drew me in!

Anyway, I've never used Python before. I've set up IDLE etc. and have run the scripts in the early part of the book. But when I try to run introduction__retweet_visualization.py as a module in IDLE, I get an error:

line 13, in <module>
Q = sys.argv[1]
IndexError: list index out of range

So is this where I add a query somehow, or have I got some sort of system setup error? I installed Python 2.7 and the necessary libraries (twitter etc.)...

edit
I got it running by taking out the offending line, and adding the search term directly into:

search_results.append(twitter_search.search(q="mysearchterm", rpp=100, page=page))

but I'm not sure if that's the way it's supposed to work! (Why was the Q = sys.argv[1] line there in the first place?) Any help much appreciated,

thanks,

Gavin
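For anyone who hits the same wall: Q = sys.argv[1] reads the search term from the command line (python introduction__retweet_visualization.py mysearchterm), and raises IndexError when the script is started with no arguments, which is what IDLE's "Run Module" does. A tolerant sketch that accepts either style (the default term is a placeholder):

import sys

# Use the first command-line argument if present, otherwise fall back to a default.
Q = sys.argv[1] if len(sys.argv) > 1 else 'mysearchterm'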

Example 5-4 not working with databases produced by Example 5-14

I have produced a database in CouchDB using the_tweet__search.py as in Example 5-14. Now I want to run the_tweet__count_entities_in_tweets.py from Example 5-4, as suggested in the book on page 148. Here is the error that I get:

$ python the_tweet__count_entities_in_tweets_from_search.py search-xkcd
Traceback (most recent call last):
File "the_tweet__count_entities_in_tweets_from_search.py", line 83, in
db.view('index/entity_count_by_doc', group=True)],
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/client.py", line 984, in iter
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/client.py", line 1003, in rows
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/client.py", line 990, in _fetch
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/client.py", line 880, in _exec
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/http.py", line 393, in get_json
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/http.py", line 374, in get
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/http.py", line 419, in _request
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/http.py", line 239, in request
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/http.py", line 205, in _try_request_with_retries
socket.error: 54

The CouchDB server is running, and in Futon I can view the database "search-xkcd" after the error. The _design/index document is present. The view index/entity_count_by_doc is present, but is empty.

I read the debugging suggestions given to other users and tried to execute the count-entities program in a Python terminal one piece at a time. When I get to the view definition, I get:

view = ViewDefinition('index', 'entity_count_by_doc', entityCountMapper,
                      reduce_fun=summingReducer, language='python')
Traceback (most recent call last):
File "", line 2, in
File "/Library/Python/2.6/site-packages/CouchDB-0.8-py2.6.egg/couchdb/design.py", line 93, in init
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/inspect.py", line 694, in getsource
lines, lnum = getsourcelines(object)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/inspect.py", line 683, in getsourcelines
lines, lnum = findsource(object)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/inspect.py", line 531, in findsource
raise IOError('could not get source code')
IOError: could not get source code

Is there something different about the databases produced by the_tweet__search.py compared with the ones harvested from the user timeline? I have several of those and can run the count-entities program on them without problems.

Thank you.
njmccracken
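A note on the second traceback, for anyone else debugging this way: the IOError is an artifact of testing interactively rather than a clue. couchdb-python's ViewDefinition with language='python' ships the source of the mapper and reducer to CouchDB via inspect.getsource(), and inspect cannot recover the source of a function typed into the interactive interpreter. A minimal sketch (with stand-in mapper/reducer bodies, not the book's) that works when saved to a file and run as a script:

from couchdb.design import ViewDefinition

def entityCountMapper(doc):
    yield (doc['_id'], 1)  # stand-in body; the book's mapper is more involved

def summingReducer(keys, values, rereduce):
    return sum(values) if rereduce else len(values)

# inspect.getsource() succeeds here because the functions live in a file on disk.
view = ViewDefinition('index', 'entity_count_by_doc', entityCountMapper,
                      reduce_fun=summingReducer, language='python')

The original socket.error 54 (a connection reset) is more likely the Python view server dying or timing out while the view indexes the new database; see the os_process_timeout note under the Example 3-6 issue below.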

example 4-8

Could you please help me understand why I get the following error with example 4-8?

File "friendsfollowerscrawl.py", line 65, in
crawl([SCREEN_NAME])
File "friendsfollowerscrawl.py", line 59, in crawl
getUserInfo(user_ids=next_queue)
TypeError: getUserInfo() takes at least 2 arguments (1 given)

I just don't understand what's missing here or what I could do to fix it. I'd appreciate any input.

-Manos
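The message means crawl() is calling getUserInfo() with only the user_ids keyword argument while getUserInfo() declares more required positional parameters. In the book's refactored helpers those extra parameters are typically the Twitter API and Redis connection handles, so the fix is probably to pass them through, though without seeing your exact copy of the script this is a guess:

getUserInfo(t, r, user_ids=next_queue)  # hypothetical: t = twitter.Twitter handle, r = redis client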

I can't get Twitter trends in China

My code is below, and the error I get looks like a network problem.

Traceback (most recent call last):
File "C:\Python27\twitter1", line 27, in
world_trends = twitter_api.trends.place(_id=1)
File "C:\Python27\lib\site-packages\twitter-1.9.1-py2.7.egg\twitter\api.py", line 194, in call
return self._handle_response(req, uri, arg_data, _timeout)
File "C:\Python27\lib\site-packages\twitter-1.9.1-py2.7.egg\twitter\api.py", line 201, in _handle_response
handle = urllib_request.urlopen(req, **kwargs)
File "C:\Python27\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 394, in open
response = self._open(req, data)
File "C:\Python27\lib\urllib2.py", line 412, in _open
'_open', req)
File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 1207, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "C:\Python27\lib\urllib2.py", line 1174, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 10060] >

//codes--------------------------------------------------------------
import twitter
import json

# Go to http://twitter.com/apps/new to create an app and get these items.
# See https://dev.twitter.com/docs/auth/oauth for more information
# on Twitter's OAuth implementation.

# I have got these credentials (redacted here).
CONSUMER_KEY = '***'
CONSUMER_SECRET = '***'
OAUTH_TOKEN = '***'
OAUTH_TOKEN_SECRET = '***'

auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)

twitter_api = twitter.Twitter(domain='api.twitter.com',
                              api_version='1.1',
                              auth=auth)

# With an authenticated twitter_api in existence, you can now use it to
# query Twitter resources as usual. However, the trends resource is
# cleaned up a bit in v1.1, so requests are a bit simpler than in the
# latest printing. See https://dev.twitter.com/docs/api/1.1/get/trends/place

# The Yahoo! Where On Earth ID for the entire world is 1.
WORLD_WOE_ID = 1

# Prefix id with the underscore for query string parameterization.
# Without the underscore, it's appended to the URL itself.
world_trends = twitter_api.trends.place(_id=1)

print json.dumps(world_trends, indent=1)
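Errno 10060 is a plain connection timeout, and api.twitter.com is generally unreachable from mainland China without a proxy, so the code itself may be fine. The twitter package ultimately issues its requests through urllib2, which honours the standard proxy environment variables; one hedged workaround (the proxy address is hypothetical):

import os

# Route HTTPS traffic through a local proxy before making Twitter requests.
os.environ['https_proxy'] = 'http://127.0.0.1:8087'  # hypothetical proxy address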

Query RE: IndexError: list index out of range

Hi

Firstly, just wanted to say how fantastic the book is - and this Git repo is just what I was looking for.

I'm learning Python as I go, and following all the tutorials I can. But I've hit a stumbling block: I keep getting the following message from friends_followers__get_friends.py (and other friends_followers files):

Traceback (most recent call last):
File "", line 1, in
File "friends_followers__get_friends.py", line 9, in
SCREEN_NAME = sys.argv[1]
IndexError: list index out of range

I've edited the twitter__login.py file with my consumer secret/key, but I haven't found out where SCREEN_NAME is supposed to come from.

Hoping you can help

Thanks

Lee
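For anyone else who hits this: SCREEN_NAME = sys.argv[1] reads the screen name from the command line, so the friends_followers scripts are meant to be invoked with the target user as the first argument, e.g. (timoreilly is only an example):

$ python friends_followers__get_friends.py timoreilly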

Example 1.6 Error (TypeError: string indices must be integers)

First off, I would like to say I think this is an absolutely great book; I picked it up a little while ago and just got around to looking at it today. I'm having trouble with example 1.6 in the book. I have a text file with a single tweet in JSON that looks like:

{"favorited": false, "in_reply_to_user_id": null, "contributors": null, "retweeted_status": {"favorited": false, "in_reply_to_user_id": null, "contributors": null, "truncated": false, "text": "The NFL's new Nike uniforms will be unveiled today at 11 a.m. ET in Brooklyn.", "created_at": "Tue Apr 03 13:42:00 +0000 2012", "retweeted": false, "in_reply_to_status_id": null, "coordinates": null, "in_reply_to_user_id_str": null, "entities": {"user_mentions": [], "hashtags": [], "urls": []}, "in_reply_to_status_id_str": null, "id_str": "187173104408199168", "in_reply_to_screen_name": null, "user": {"follow_request_sent": null, "profile_use_background_image": true, "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/169719095/pfw_background.gif", "verified": false, "profile_image_url_https": "https://si0.twimg.com/profile_images/313743829/PFW_FB1_normal.jpg", "profile_sidebar_fill_color": "F6F6F6", "is_translator": false, "id": 42703684, "profile_text_color": "333333", "followers_count": 18662, "protected": false, "location": "Riverwoods, Illinois", "default_profile_image": false, "listed_count": 768, "utc_offset": -21600, "statuses_count": 9397, "description": "The Authority on Pro Football - \r\nBecome a fan on Facebook at www.facebook.com/ProFootballWeekly", "friends_count": 329, "profile_link_color": "038543", "profile_image_url": "http://a0.twimg.com/profile_images/313743829/PFW_FB1_normal.jpg", "notifications": null, "show_all_inline_media": true, "geo_enabled": false, "profile_background_color": "545c67", "id_str": "42703684", "profile_background_image_url": "http://a0.twimg.com/profile_background_images/169719095/pfw_background.gif", "screen_name": "ProFootballWkly", "lang": "en", "profile_background_tile": true, "favourites_count": 0, "name": "Pro Football Weekly", "url": "http://www.ProFootballWeekly.com", "created_at": "Tue May 26 19:52:34 +0000 2009", "contributors_enabled": false, "time_zone": "Central Time (US & Canada)", "profile_sidebar_border_color": "EEEEEE", "default_profile": false, "following": null}, "place": null, "retweet_count": 5, "geo": null, "id": 187173104408199168, "source": "<a href=\"http://www.tweetdeck.com\" rel=\"nofollow\">TweetDeck</a>"}, "truncated": false, "text": "RT @ProFootballWkly: The NFL's new Nike uniforms will be unveiled today at 11 a.m. ET in Brooklyn.", "created_at": "Tue Apr 03 13:51:42 +0000 2012", "retweeted": false, "in_reply_to_status_id": null, "coordinates": null, "in_reply_to_user_id_str": null, "entities": {"user_mentions": [{"indices": [3, 19], "screen_name": "ProFootballWkly", "id": 42703684, "name": "Pro Football Weekly", "id_str": "42703684"}], "hashtags": [], "urls": []}, "in_reply_to_status_id_str": null, "id_str": "187175547175059456", "in_reply_to_screen_name": null, "user": {"follow_request_sent": null, "profile_use_background_image": true, "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/431674307/vbgiants_545_140211.jpg", "verified": false, "profile_image_url_https": "https://si0.twimg.com/profile_images/1901950920/asp---AAq2Ez.jpg_large_normal.jpg", "profile_sidebar_fill_color": "211d18", "is_translator": false, "id": 125403702, "profile_text_color": "736d66", "followers_count": 1373, "protected": false, "location": "Queretaro, Mexico", "default_profile_image": false, "listed_count": 15, "utc_offset": -21600, "statuses_count": 15347, "description": "Esposo, Padre, hermano de muchos, 49er y SFGiant de nacimiento,Detroit Piston y  Red Wing. 
colaborador en gradacentral.com de NFL y MLB", "friends_count": 1798, "profile_link_color": "f9a65e", "profile_image_url": "http://a0.twimg.com/profile_images/1901950920/asp---AAq2Ez.jpg_large_normal.jpg", "notifications": null, "show_all_inline_media": false, "geo_enabled": false, "profile_background_color": "000000", "id_str": "125403702", "profile_background_image_url": "http://a0.twimg.com/profile_background_images/431674307/vbgiants_545_140211.jpg", "screen_name": "asosa49", "lang": "es", "profile_background_tile": false, "favourites_count": 31, "name": "alvaro sosa", "url": "http://www.49ers.com", "created_at": "Mon Mar 22 18:04:40 +0000 2010", "contributors_enabled": false, "time_zone": "Mexico City", "profile_sidebar_border_color": "413d39", "default_profile": false, "following": null}, "place": null, "retweet_count": 5, "geo": null, "id": 187175547175059456, "source": "web"}

I'm trying to load the text field values so that I can move on to counting them (example 1.7). I'm trying the following code as shown / interpreted from the book:

import json
import sys

d = []
d = json.loads(open(sys.argv[1]).read())
print d
print json.dumps(d, sort_keys=True, indent=4)

tweets = [ r['text'] \
           for result in d \
                for r in result['results'] ]
#print tweets

my output looks like:

python example1p7.py test.json 
{u'user': {u'follow_request_sent': None, u'profile_use_background_image': True, u'id': 125403702, u'verified': False, u'profile_image_url_https': u'https://si0.twimg.com/profile_images/1901950920/asp---AAq2Ez.jpg_large_normal.jpg', u'profile_sidebar_fill_color': u'211d18', u'geo_enabled': False, u'profile_text_color': u'736d66', u'followers_count': 1373, u'profile_sidebar_border_color': u'413d39', u'location': u'Queretaro, Mexico', u'default_profile_image': False, u'listed_count': 15, u'utc_offset': -21600, u'statuses_count': 15347, u'description': u'Esposo, Padre, hermano de muchos, 49er y SFGiant de nacimiento,Detroit Piston y  Red Wing. colaborador en gradacentral.com de NFL y MLB', u'friends_count': 1798, u'profile_link_color': u'f9a65e', u'profile_image_url': u'http://a0.twimg.com/profile_images/1901950920/asp---AAq2Ez.jpg_large_normal.jpg', u'notifications': None, u'show_all_inline_media': False, u'profile_background_image_url_https': u'https://si0.twimg.com/profile_background_images/431674307/vbgiants_545_140211.jpg', u'profile_background_color': u'000000', u'id_str': u'125403702', u'profile_background_image_url': u'http://a0.twimg.com/profile_background_images/431674307/vbgiants_545_140211.jpg', u'name': u'alvaro sosa', u'lang': u'es', u'following': None, u'profile_background_tile': False, u'favourites_count': 31, u'screen_name': u'asosa49', u'url': u'http://www.49ers.com', u'created_at': u'Mon Mar 22 18:04:40 +0000 2010', u'contributors_enabled': False, u'time_zone': u'Mexico City', u'protected': False, u'default_profile': False, u'is_translator': False}, u'favorited': False, u'contributors': None, u'retweeted_status': {u'user': {u'follow_request_sent': None, u'profile_use_background_image': True, u'id': 42703684, u'verified': False, u'profile_image_url_https': u'https://si0.twimg.com/profile_images/313743829/PFW_FB1_normal.jpg', u'profile_sidebar_fill_color': u'F6F6F6', u'geo_enabled': False, u'profile_text_color': u'333333', u'followers_count': 18662, u'profile_sidebar_border_color': u'EEEEEE', u'location': u'Riverwoods, Illinois', u'default_profile_image': False, u'listed_count': 768, u'utc_offset': -21600, u'statuses_count': 9397, u'description': u'The Authority on Pro Football - \r\nBecome a fan on Facebook at www.facebook.com/ProFootballWeekly', u'friends_count': 329, u'profile_link_color': u'038543', u'profile_image_url': u'http://a0.twimg.com/profile_images/313743829/PFW_FB1_normal.jpg', u'notifications': None, u'show_all_inline_media': True, u'profile_background_image_url_https': u'https://si0.twimg.com/profile_background_images/169719095/pfw_background.gif', u'profile_background_color': u'545c67', u'id_str': u'42703684', u'profile_background_image_url': u'http://a0.twimg.com/profile_background_images/169719095/pfw_background.gif', u'name': u'Pro Football Weekly', u'lang': u'en', u'following': None, u'profile_background_tile': True, u'favourites_count': 0, u'screen_name': u'ProFootballWkly', u'url': u'http://www.ProFootballWeekly.com', u'created_at': u'Tue May 26 19:52:34 +0000 2009', u'contributors_enabled': False, u'time_zone': u'Central Time (US & Canada)', u'protected': False, u'default_profile': False, u'is_translator': False}, u'favorited': False, u'contributors': None, u'truncated': False, u'source': u'<a href="http://www.tweetdeck.com" rel="nofollow">TweetDeck</a>', u'text': u"The NFL's new Nike uniforms will be unveiled today at 11 a.m. 
ET in Brooklyn.", u'created_at': u'Tue Apr 03 13:42:00 +0000 2012', u'retweeted': False, u'in_reply_to_status_id_str': None, u'coordinates': None, u'id': 187173104408199168, u'entities': {u'user_mentions': [], u'hashtags': [], u'urls': []}, u'in_reply_to_status_id': None, u'in_reply_to_screen_name': None, u'in_reply_to_user_id': None, u'place': None, u'retweet_count': 5, u'geo': None, u'in_reply_to_user_id_str': None, u'id_str': u'187173104408199168'}, u'truncated': False, u'source': u'web', u'text': u"RT @ProFootballWkly: The NFL's new Nike uniforms will be unveiled today at 11 a.m. ET in Brooklyn.", u'created_at': u'Tue Apr 03 13:51:42 +0000 2012', u'retweeted': False, u'in_reply_to_status_id_str': None, u'coordinates': None, u'id': 187175547175059456, u'entities': {u'user_mentions': [{u'indices': [3, 19], u'id_str': u'42703684', u'screen_name': u'ProFootballWkly', u'name': u'Pro Football Weekly', u'id': 42703684}], u'hashtags': [], u'urls': []}, u'in_reply_to_status_id': None, u'in_reply_to_screen_name': None, u'in_reply_to_user_id': None, u'place': None, u'retweet_count': 5, u'geo': None, u'in_reply_to_user_id_str': None, u'id_str': u'187175547175059456'}
{
    "contributors": null, 
    "coordinates": null, 
    "created_at": "Tue Apr 03 13:51:42 +0000 2012", 
    "entities": {
        "hashtags": [], 
        "urls": [], 
        "user_mentions": [
            {
                "id": 42703684, 
                "id_str": "42703684", 
                "indices": [
                    3, 
                    19
                ], 
                "name": "Pro Football Weekly", 
                "screen_name": "ProFootballWkly"
            }
        ]
    }, 
    "favorited": false, 
    "geo": null, 
    "id": 187175547175059456, 
    "id_str": "187175547175059456", 
    "in_reply_to_screen_name": null, 
    "in_reply_to_status_id": null, 
    "in_reply_to_status_id_str": null, 
    "in_reply_to_user_id": null, 
    "in_reply_to_user_id_str": null, 
    "place": null, 
    "retweet_count": 5, 
    "retweeted": false, 
    "retweeted_status": {
        "contributors": null, 
        "coordinates": null, 
        "created_at": "Tue Apr 03 13:42:00 +0000 2012", 
        "entities": {
            "hashtags": [], 
            "urls": [], 
            "user_mentions": []
        }, 
        "favorited": false, 
        "geo": null, 
        "id": 187173104408199168, 
        "id_str": "187173104408199168", 
        "in_reply_to_screen_name": null, 
        "in_reply_to_status_id": null, 
        "in_reply_to_status_id_str": null, 
        "in_reply_to_user_id": null, 
        "in_reply_to_user_id_str": null, 
        "place": null, 
        "retweet_count": 5, 
        "retweeted": false, 
        "source": "<a href=\"http://www.tweetdeck.com\" rel=\"nofollow\">TweetDeck</a>", 
        "text": "The NFL's new Nike uniforms will be unveiled today at 11 a.m. ET in Brooklyn.", 
        "truncated": false, 
        "user": {
            "contributors_enabled": false, 
            "created_at": "Tue May 26 19:52:34 +0000 2009", 
            "default_profile": false, 
            "default_profile_image": false, 
            "description": "The Authority on Pro Football - \r\nBecome a fan on Facebook at www.facebook.com/ProFootballWeekly", 
            "favourites_count": 0, 
            "follow_request_sent": null, 
            "followers_count": 18662, 
            "following": null, 
            "friends_count": 329, 
            "geo_enabled": false, 
            "id": 42703684, 
            "id_str": "42703684", 
            "is_translator": false, 
            "lang": "en", 
            "listed_count": 768, 
            "location": "Riverwoods, Illinois", 
            "name": "Pro Football Weekly", 
            "notifications": null, 
            "profile_background_color": "545c67", 
            "profile_background_image_url": "http://a0.twimg.com/profile_background_images/169719095/pfw_background.gif", 
            "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/169719095/pfw_background.gif", 
            "profile_background_tile": true, 
            "profile_image_url": "http://a0.twimg.com/profile_images/313743829/PFW_FB1_normal.jpg", 
            "profile_image_url_https": "https://si0.twimg.com/profile_images/313743829/PFW_FB1_normal.jpg", 
            "profile_link_color": "038543", 
            "profile_sidebar_border_color": "EEEEEE", 
            "profile_sidebar_fill_color": "F6F6F6", 
            "profile_text_color": "333333", 
            "profile_use_background_image": true, 
            "protected": false, 
            "screen_name": "ProFootballWkly", 
            "show_all_inline_media": true, 
            "statuses_count": 9397, 
            "time_zone": "Central Time (US & Canada)", 
            "url": "http://www.ProFootballWeekly.com", 
            "utc_offset": -21600, 
            "verified": false
        }
    }, 
    "source": "web", 
    "text": "RT @ProFootballWkly: The NFL's new Nike uniforms will be unveiled today at 11 a.m. ET in Brooklyn.", 
    "truncated": false, 
    "user": {
        "contributors_enabled": false, 
        "created_at": "Mon Mar 22 18:04:40 +0000 2010", 
        "default_profile": false, 
        "default_profile_image": false, 
        "description": "Esposo, Padre, hermano de muchos, 49er y SFGiant de nacimiento,Detroit Piston y  Red Wing. colaborador en gradacentral.com de NFL y MLB", 
        "favourites_count": 31, 
        "follow_request_sent": null, 
        "followers_count": 1373, 
        "following": null, 
        "friends_count": 1798, 
        "geo_enabled": false, 
        "id": 125403702, 
        "id_str": "125403702", 
        "is_translator": false, 
        "lang": "es", 
        "listed_count": 15, 
        "location": "Queretaro, Mexico", 
        "name": "alvaro sosa", 
        "notifications": null, 
        "profile_background_color": "000000", 
        "profile_background_image_url": "http://a0.twimg.com/profile_background_images/431674307/vbgiants_545_140211.jpg", 
        "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/431674307/vbgiants_545_140211.jpg", 
        "profile_background_tile": false, 
        "profile_image_url": "http://a0.twimg.com/profile_images/1901950920/asp---AAq2Ez.jpg_large_normal.jpg", 
        "profile_image_url_https": "https://si0.twimg.com/profile_images/1901950920/asp---AAq2Ez.jpg_large_normal.jpg", 
        "profile_link_color": "f9a65e", 
        "profile_sidebar_border_color": "413d39", 
        "profile_sidebar_fill_color": "211d18", 
        "profile_text_color": "736d66", 
        "profile_use_background_image": true, 
        "protected": false, 
        "screen_name": "asosa49", 
        "show_all_inline_media": false, 
        "statuses_count": 15347, 
        "time_zone": "Mexico City", 
        "url": "http://www.49ers.com", 
        "utc_offset": -21600, 
        "verified": false
    }
}
Traceback (most recent call last):
  File "example1p7.py", line 11, in <module>
    for r in result['results'] ]
TypeError: string indices must be integers

I've tried a number of things but nothing seems to work; I always end up with the "TypeError: string indices must be integers".

Any help would be much appreciated.
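For what it's worth, the comprehension in the book is written for Search API responses, which are dictionaries carrying a 'results' list (one per page of results); your file holds a single tweet object, so iterating over a dictionary yields its keys (strings), and indexing a string with 'results' raises exactly this TypeError. A sketch covering the possible shapes (the branching is illustrative, not from the book):

import json
import sys

d = json.loads(open(sys.argv[1]).read())

if isinstance(d, list):                  # several pages of search results
    tweets = [r['text'] for page in d for r in page['results']]
elif 'results' in d:                     # a single page of search results
    tweets = [r['text'] for r in d['results']]
else:                                    # a single tweet object, as here
    tweets = [d['text']]

print tweets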

uppercase letters causing issues when used as CouchDB database names (the_tweet__harvest_timeline.py)

When using the script the_tweet__harvest_timeline.py, I got errors at the point of database creation whenever a Twitter screen name contained uppercase letters. The CouchDB installation on my Windows machine does not allow uppercase letters in database names, which is a problem since Twitter screen names often contain them. If your CouchDB installation has this limitation, I recommend pre-processing the Twitter screen names to lowercase before using them as part of a CouchDB database name. Instead of line 66, which reads DB = '%s-%s' % (DB, USER), my code uses the following lines:

screennamelower = USER.lower() # added to make names lowercase
DB = '%s-%s' % (DB, screennamelower)

iPython -> IPython

Hi,

I saw here and there that IPython is written "iPython".
I know Fernando Perez keeps saying it should be an uppercase I in his talks,
so I guess when you have the occasion, if you can update it, that would be great.

Thanks !

Socket error 104 from Example 3-6

When I run Example 3-6 (code as downloaded from GitHub) on a CouchDB database created from the Enron corpus, on an Ubuntu 10.04 machine, I get:

Finding docs dated from 2002-1-1 to 2002-2-1
Traceback (most recent call last):
File "mailboxes_map_by_date.py", line 46, in
for row in db.view('index/by_date_time', startkey=start, endkey=end):
File "/usr/local/lib/python2.7/dist-packages/CouchDB-0.8-py2.7.egg/couchdb/client.py", line 984, in iter
return iter(self.rows)
File "/usr/local/lib/python2.7/dist-packages/CouchDB-0.8-py2.7.egg/couchdb/client.py", line 1003, in rows
self._fetch()
File "/usr/local/lib/python2.7/dist-packages/CouchDB-0.8-py2.7.egg/couchdb/client.py", line 990, in _fetch
data = self.view._exec(self.options)
File "/usr/local/lib/python2.7/dist-packages/CouchDB-0.8-py2.7.egg/couchdb/client.py", line 880, in _exec
_, _, data = self.resource.get_json(*self._encode_options(options))
File "/usr/local/lib/python2.7/dist-packages/CouchDB-0.8-py2.7.egg/couchdb/http.py", line 393, in get_json
status, headers, data = self.get(*a, **k)
File "/usr/local/lib/python2.7/dist-packages/CouchDB-0.8-py2.7.egg/couchdb/http.py", line 374, in get
return self._request('GET', path, headers=headers, **params)
File "/usr/local/lib/python2.7/dist-packages/CouchDB-0.8-py2.7.egg/couchdb/http.py", line 419, in _request
credentials=self.credentials)
File "/usr/local/lib/python2.7/dist-packages/CouchDB-0.8-py2.7.egg/couchdb/http.py", line 239, in request
resp = _try_request_with_retries(iter(self.retry_delays))
File "/usr/local/lib/python2.7/dist-packages/CouchDB-0.8-py2.7.egg/couchdb/http.py", line 205, in _try_request_with_retries
raise e
socket.error: 104

This is, I think, a "Connection reset by peer" error.
I've looked through the various Python and CouchDB forums and not found anything that appears relevant.
Any suggestions gratefully received.

Arthur
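A hedged guess: on the first request CouchDB has to build the index/by_date_time view over the whole Enron corpus, and if the Python view server exceeds CouchDB's OS-process timeout, the server side gives up and the client sees the connection reset (socket.error 104). Raising the timeout in local.ini (the same setting mentioned in the couchdb-lucene issue above) and letting the view finish building may help:

[couchdb]
; Timeout for external (e.g. Python) view servers, in milliseconds.
os_process_timeout = 300000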

stuck on example 5-4:

Hello there,

I have been really enjoying working through this book, but I cannot seem to get example 5-4 to run. I successfully ran example 5-3 and harvested the tweets from a timeline, and I know that CouchDB is otherwise working well. I noticed in the errata for the book that someone else has submitted the same error that I am about to describe, so hopefully you can point me in the right direction. I am new to Python, so I am probably overlooking something very obvious. Any help is greatly appreciated!

This is what I see in Terminal:

Traceback (most recent call last):
File "tweet_freq.py", line 82, in
db.view('index/entity_count_by_doc', group=True)],
File "build/bdist.macosx-10.6-intel/egg/couchdb/client.py", line 984, in iter
File "build/bdist.macosx-10.6-intel/egg/couchdb/client.py", line 1003, in rows
File "build/bdist.macosx-10.6-intel/egg/couchdb/client.py", line 990, in _fetch
File "build/bdist.macosx-10.6-intel/egg/couchdb/client.py", line 880, in _exec
File "build/bdist.macosx-10.6-intel/egg/couchdb/http.py", line 393, in get_json
File "build/bdist.macosx-10.6-intel/egg/couchdb/http.py", line 374, in get
File "build/bdist.macosx-10.6-intel/egg/couchdb/http.py", line 419, in _request
File "build/bdist.macosx-10.6-intel/egg/couchdb/http.py", line 310, in request
couchdb.http.ServerError: (500, (u'EXIT', u'{{badmatch,[]},\n [{couch_query_servers,new_process,3,\n [{file,"/Users/hs/prj/build-couchdb/dependencies/couchdb/src/couchdb/couch_query_servers.erl"},\n {line,472}]},\n {couch_query_servers,lang_proc,3,\n [{file,"/Users/hs/prj/build-couchdb/dependencies/couchdb/src/couchdb/couch_query_servers.erl"},\n {line,462}]},\n {couch_query_servers,handle_call,3,\n [{file,"/Users/hs/prj/build-couchdb/dependencies/couchdb/src/couchdb/couch_query_servers.erl"},\n {line,334}]},\n {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,578}]},\n {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}'))

When I go into CouchDB Futon, I see that the entity_count_by_doc document HAS been created, but I get the following error popup:

Error: EXIT

{{badmatch,[]},
[{couch_query_servers,new_process,3,
[{file,"/Users/hs/prj/build-couchdb/dependencies/couchdb/src/couchdb/couch_query_servers.erl"},
{line,472}]},
{couch_query_servers,lang_proc,3,
[{file,"/Users/hs/prj/build-couchdb/dependencies/couchdb/src/couchdb/couch_query_servers.erl"},
{line,462}]},
{couch_query_servers,handle_call,3,
[{file,"/Users/hs/prj/build-couchdb/dependencies/couchdb/src/couchdb/couch_query_servers.erl"},
{line,334}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,578}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}

Any advice at all would be so helpful. I look forward to your response and suggestions.
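That badmatch in couch_query_servers usually means CouchDB could not start a Python query server at all: views declared with language='python' need the view server from the couchdb-python package (couchpy) registered in local.ini. A hedged sketch of the stanza (use the path that `which couchpy` reports on your machine):

[query_servers]
python = /usr/local/bin/couchpy

Restart CouchDB after adding it.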

linkedin__analyze_titles.py

Traceback (most recent call last):
File "C:/Users/Desktop/linkedin_analyze_titles.py", line 6, in
CSV_FILE = sys.argv[1]
IndexError: list index out of range

and when I try to work around this error by using sys.argv[0]:

File "C:/Users/Desktop/linkedin_analyze_titles.py", line 30, in
titles.extend([t.strip() for t in contact['Job Title'].split('/')
KeyError: 'Job Title'
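Two separate problems appear here. The IndexError means the script was run without a command-line argument: sys.argv[1] is meant to be the path to your exported LinkedIn CSV, and changing it to sys.argv[0] only makes the script try to parse its own file path, since argv[0] is the script name. The intended invocation looks like (the file name is an example):

$ python linkedin_analyze_titles.py my_connections.csv

The subsequent KeyError: 'Job Title' means the CSV being read has no 'Job Title' column header, which is exactly what happens when the script reads itself; with a real LinkedIn export, check that the header row still uses that column name.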

example 5-4 gives long error

It might be similar to issue #9, but in my case dateutil is not the cause.

My environment:
Mac OS X 10.6.8
Python 2.7
CouchDB 1.2.0
python-couchdb 0.8
virtualenv 1.8.2

Executing the code of example 5-4 raises an error. Do you have any idea why?

soichi

------error

(py27)soichi% [~/Dropbox/Inspiron]
=CRASH REPORT==== 16-Dec-2012::20:24:29 ===
crasher:
initial call: couch_os_process:init/1
pid: <0.6432.0>
registered_name: []
exception exit: {function_clause,
[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,
{eol,
<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,
"/Users/soichi/domains/py27/bin/couchpy",
#Port<0.2748>,
#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}
in function gen_server:terminate/6 (gen_server.erl, line 747)
ancestors: [couch_query_servers,couch_secondary_services,
couch_server_sup,<0.31.0>]
messages: []
links: [<0.6431.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 2584
stack_size: 24
reductions: 1741
neighbours:
neighbour: [{pid,<0.6430.0>},
{registered_name,[]},
{initial_call,{couch_work_queue,init,['Argument__1']}},
{current_function,{gen_server,loop,6}},
{ancestors,[<0.6428.0>]},
{messages,[]},
{links,[<0.6428.0>]},
{dictionary,[]},
{trap_exit,false},
{status,waiting},
{heap_size,233},
{stack_size,9},
{reductions,37}]
neighbour: [{pid,<0.6433.0>},
{registered_name,[]},
{initial_call,{erlang,apply,2}},
{current_function,{gen,do_call,4}},
{ancestors,[]},
{messages,[]},
{links,[<0.6428.0>]},
{dictionary,
[{task_status_props,
[{changes_done,0},
{database,<<"tweets-user-timeline-timoreilly">>},
{design_document,<<"_design/index">>},
{progress,0},
{started_on,1355657069},
{total_changes,642},
{type,indexer},
{updated_on,1355657069}]},
{task_status_update,{{0,0,0},500000}}]},
{trap_exit,false},
{status,waiting},
{heap_size,610},
{stack_size,25},
{reductions,111}]
neighbour: [{pid,<0.6429.0>},
{registered_name,[]},
{initial_call,{couch_work_queue,init,['Argument__1']}},
{current_function,{gen_server,loop,6}},
{ancestors,[<0.6428.0>]},
{messages,[]},
{links,[<0.6428.0>]},
{dictionary,[]},
{trap_exit,false},
{status,waiting},
{heap_size,987},
{stack_size,9},
{reductions,904}]
neighbour: [{pid,<0.6428.0>},
{registered_name,[]},
{initial_call,{erlang,apply,2}},
{current_function,{gen,do_call,4}},
{ancestors,[]},
{messages,[]},
{links,[<0.6429.0>,<0.6431.0>,<0.6433.0>,<0.6430.0>,
<0.6424.0>]},
{dictionary,[]},
{trap_exit,false},
{status,waiting},
{heap_size,2584},
{stack_size,61},
{reductions,5322}]
neighbour: [{pid,<0.6431.0>},
{registered_name,[]},
{initial_call,{erlang,apply,2}},
{current_function,{gen,do_call,4}},
{ancestors,[]},
{messages,
[{#Ref<0.0.0.76288>,
{ok,{json,
<<"{"log": "Traceback (most recent call last):\n File \"/Users/soichi/domains/py27/lib/python2.7/site-packages/couchdb/view.py\", line 79, in map_doc\n results.append([[key, value] for key, value in function(doc)])\n File \"\", line 28, in entityCountMapper\nKeyError: 'hastags'\n"}">>}}}]},
{links,[<0.6428.0>,<0.6432.0>]},
{dictionary,[]},
{trap_exit,false},
{status,runnable},
{heap_size,4181},
{stack_size,196},
{reductions,741}]
[error] [<0.6424.0>] ** Generic server <0.6424.0> terminating
** Last message in was {'EXIT',<0.6428.0>,
{function_clause,
[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,
{eol,<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,
"/Users/soichi/domains/py27/bin/couchpy",
#Port<0.2748>,#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}}
** When Server state == {group_state,undefined,
<<"tweets-user-timeline-timoreilly">>,
{"/usr/local/var/lib/couchdb",
<<"tweets-user-timeline-timoreilly">>,
{group,
<<237,92,122,114,162,28,196,163,160,252,121,84,46,
99,167,36>>,
nil,<<"_design/index">>,<<"python">>,[],
[{view,0,0,0,[],
<<"def entityCountMapper(doc):\n if not doc.get('entities'):\n import twitter_text\n def getEntities(tweet):\n extractor = twitter_text.Extractor(tweet['text'])\n entities = {}\n entities['user_mentions'] = []\n\n for um in extractor.extract_mentioned_screen_names_with_indices():\n entities['user_mentions'].append(um)\n entities['hashtags'] = []\n for ht in extractor.extract_hashtags_with_indices():\n ht['text'] = ht['hashtag']\n del ht['hashtag']\n entities['hashtags'].append(ht)\n entities['urls'] = []\n for url in extractor.extract_urls_with_indices():\n entities['urls'].append(url)\n return entities\n\n doc['entities'] = getEntities(doc)\n\n\n if doc['entities'].get('user_mentions'):\n for user_mention in doc['entities']['user_mentions']:\n yield ('@' + user_mention['screen_name'].lower(), [doc['_id'], doc['id']])\n if doc['entities'].get('hashtags'):\n for hashtag in doc['entities']['hastags']:\n yield ('#' + hashtag['text'], [doc['_id'], doc['id']])\n if doc['entites'].get('urls'):\n for url in doc['entites']['urls']:\n yield (url['url'], [doc['_id'], doc['id']])">>,
nil,
[{<<"entity_count_by_doc">>,
<<"def summingReducer(keys, values, rereduce):\n if rereduce:\n return sum(values)\n else:\n return len(values)">>}],
[]},
{view,1,0,0,[],
<<"def idMapper(doc):\n yield(None, doc['id'])">>,
nil,
[{<<"max_tweet_id">>,
<<"def maxFindingReducer(keys, values, rereduce):\n return max(values)">>}],
[]}],
{[]},
nil,0,0,nil,nil}},
{group,
<<237,92,122,114,162,28,196,163,160,252,121,84,46,99,
167,36>>,
<0.6425.0>,<<"_design/index">>,<<"python">>,[],
[{view,0,0,0,[],
<<"def entityCountMapper(doc):\n if not doc.get('entities'):\n import twitter_text\n def getEntities(tweet):\n extractor = twitter_text.Extractor(tweet['text'])\n entities = {}\n entities['user_mentions'] = []\n\n for um in extractor.extract_mentioned_screen_names_with_indices():\n entities['user_mentions'].append(um)\n entities['hashtags'] = []\n for ht in extractor.extract_hashtags_with_indices():\n ht['text'] = ht['hashtag']\n del ht['hashtag']\n entities['hashtags'].append(ht)\n entities['urls'] = []\n for url in extractor.extract_urls_with_indices():\n entities['urls'].append(url)\n return entities\n\n doc['entities'] = getEntities(doc)\n\n\n if doc['entities'].get('user_mentions'):\n for user_mention in doc['entities']['user_mentions']:\n yield ('@' + user_mention['screen_name'].lower(), [doc['_id'], doc['id']])\n if doc['entities'].get('hashtags'):\n for hashtag in doc['entities']['hastags']:\n yield ('#' + hashtag['text'], [doc['_id'], doc['id']])\n if doc['entites'].get('urls'):\n for url in doc['entites']['urls']:\n yield (url['url'], [doc['_id'], doc['id']])">>,
{btree,<0.6425.0>,nil,
#Fun<couch_btree.3.133731799>,
#Fun<couch_btree.4.133731799>,
#Fun<couch_view.less_json_ids.2>,
#Fun<couch_view_group.10.75991388>,snappy},
[{<<"entity_count_by_doc">>,
<<"def summingReducer(keys, values, rereduce):\n if rereduce:\n return sum(values)\n else:\n return len(values)">>}],
[]},
{view,1,0,0,[],
<<"def idMapper(doc):\n yield(None, doc['id'])">>,
{btree,<0.6425.0>,nil,
#Fun<couch_btree.3.133731799>,
#Fun<couch_btree.4.133731799>,
#Fun<couch_view.less_json_ids.2>,
#Fun<couch_view_group.10.75991388>,snappy},
[{<<"max_tweet_id">>,
<<"def maxFindingReducer(keys, values, rereduce):\n return max(values)">>}],
[]}],
{[]},
{btree,<0.6425.0>,nil,
#Fun<couch_btree.3.133731799>,
#Fun<couch_btree.4.133731799>,
#Fun<couch_btree.5.133731799>,nil,snappy},
0,0,nil,nil},
<0.6428.0>,nil,false,
[{{<0.139.0>,#Ref<0.0.0.76126>},650}],
<0.6427.0>,false}
** Reason for termination ==
** {function_clause,
[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,{eol,<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,"/Users/soichi/domains/py27/bin/couchpy",#Port<0.2748>,
#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}

=ERROR REPORT==== 16-Dec-2012::20:24:29 ===
** Generic server <0.6424.0> terminating
** Last message in was {'EXIT',<0.6428.0>,
{function_clause,
[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,
{eol,<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,
"/Users/soichi/domains/py27/bin/couchpy",
#Port<0.2748>,#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}}
** When Server state == {group_state,undefined,
<<"tweets-user-timeline-timoreilly">>,
{"/usr/local/var/lib/couchdb",
<<"tweets-user-timeline-timoreilly">>,
{group,
<<237,92,122,114,162,28,196,163,160,252,121,84,46,
99,167,36>>,
nil,<<"_design/index">>,<<"python">>,[],
[{view,0,0,0,[],
<<"def entityCountMapper(doc):\n if not doc.get('entities'):\n import twitter_text\n def getEntities(tweet):\n extractor = twitter_text.Extractor(tweet['text'])\n entities = {}\n entities['user_mentions'] = []\n\n for um in extractor.extract_mentioned_screen_names_with_indices():\n entities['user_mentions'].append(um)\n entities['hashtags'] = []\n for ht in extractor.extract_hashtags_with_indices():\n ht['text'] = ht['hashtag']\n del ht['hashtag']\n entities['hashtags'].append(ht)\n entities['urls'] = []\n for url in extractor.extract_urls_with_indices():\n entities['urls'].append(url)\n return entities\n\n doc['entities'] = getEntities(doc)\n\n\n if doc['entities'].get('user_mentions'):\n for user_mention in doc['entities']['user_mentions']:\n yield ('@' + user_mention['screen_name'].lower(), [doc['_id'], doc['id']])\n if doc['entities'].get('hashtags'):\n for hashtag in doc['entities']['hastags']:\n yield ('#' + hashtag['text'], [doc['_id'], doc['id']])\n if doc['entites'].get('urls'):\n for url in doc['entites']['urls']:\n yield (url['url'], [doc['_id'], doc['id']])">>,
nil,
[{<<"entity_count_by_doc">>,
<<"def summingReducer(keys, values, rereduce):\n if rereduce:\n return sum(values)\n else:\n return len(values)">>}],
[]},
{view,1,0,0,[],
<<"def idMapper(doc):\n yield(None, doc['id'])">>,
nil,
[{<<"max_tweet_id">>,
<<"def maxFindingReducer(keys, values, rereduce):\n return max(values)">>}],
[]}],
{[]},
nil,0,0,nil,nil}},
{group,
<<237,92,122,114,162,28,196,163,160,252,121,84,46,99,
167,36>>,
<0.6425.0>,<<"_design/index">>,<<"python">>,[],
[{view,0,0,0,[],
<<"def entityCountMapper(doc):\n if not doc.get('entities'):\n import twitter_text\n def getEntities(tweet):\n extractor = twitter_text.Extractor(tweet['text'])\n entities = {}\n entities['user_mentions'] = []\n\n for um in extractor.extract_mentioned_screen_names_with_indices():\n entities['user_mentions'].append(um)\n entities['hashtags'] = []\n for ht in extractor.extract_hashtags_with_indices():\n ht['text'] = ht['hashtag']\n del ht['hashtag']\n entities['hashtags'].append(ht)\n entities['urls'] = []\n for url in extractor.extract_urls_with_indices():\n entities['urls'].append(url)\n return entities\n\n doc['entities'] = getEntities(doc)\n\n\n if doc['entities'].get('user_mentions'):\n for user_mention in doc['entities']['user_mentions']:\n yield ('@' + user_mention['screen_name'].lower(), [doc['_id'], doc['id']])\n if doc['entities'].get('hashtags'):\n for hashtag in doc['entities']['hastags']:\n yield ('#' + hashtag['text'], [doc['_id'], doc['id']])\n if doc['entites'].get('urls'):\n for url in doc['entites']['urls']:\n yield (url['url'], [doc['_id'], doc['id']])">>,
{btree,<0.6425.0>,nil,
#Fun<couch_btree.3.133731799>,
#Fun<couch_btree.4.133731799>,
#Fun<couch_view.less_json_ids.2>,
#Fun<couch_view_group.10.75991388>,snappy},
[{<<"entity_count_by_doc">>,
<<"def summingReducer(keys, values, rereduce):\n if rereduce:\n return sum(values)\n else:\n return len(values)">>}],
[]},
{view,1,0,0,[],
<<"def idMapper(doc):\n yield(None, doc['id'])">>,
{btree,<0.6425.0>,nil,
#Fun<couch_btree.3.133731799>,
#Fun<couch_btree.4.133731799>,
#Fun<couch_view.less_json_ids.2>,
#Fun<couch_view_group.10.75991388>,snappy},
[{<<"max_tweet_id">>,
<<"def maxFindingReducer(keys, values, rereduce):\n return max(values)">>}],
[]}],
{[]},
{btree,<0.6425.0>,nil,
#Fun<couch_btree.3.133731799>,
#Fun<couch_btree.4.133731799>,
#Fun<couch_btree.5.133731799>,nil,snappy},
0,0,nil,nil},
<0.6428.0>,nil,false,
[{{<0.139.0>,#Ref<0.0.0.76126>},650}],
<0.6427.0>,false}
** Reason for termination ==
** {function_clause,
[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,{eol,<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,"/Users/soichi/domains/py27/bin/couchpy",#Port<0.2748>,
#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
[error] [<0.6424.0>] {error_report,<0.30.0>,
{<0.6424.0>,crash_report,
[[{initial_call,
{couch_view_group,init,['Argument__1']}},
{pid,<0.6424.0>},
{registered_name,[]},
{error_info,
{exit,
{function_clause,
[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,
{eol,
<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,
"/Users/soichi/domains/py27/bin/couchpy",
#Port<0.2748>,
#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]},
[{gen_server,terminate,6,
[{file,"gen_server.erl"},{line,747}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}},
{ancestors,[<0.6423.0>]},
{messages,[]},
{links,[<0.6425.0>,<0.121.0>]},
{dictionary,[]},
{trap_exit,true},
{status,running},
{heap_size,1597},
{stack_size,24},
{reductions,476}],
[]]}}

=CRASH REPORT==== 16-Dec-2012::20:24:29 ===
crasher:
initial call: couch_view_group:init/1
pid: <0.6424.0>
registered_name: []
exception exit: {function_clause,
[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,
{eol,
<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,
"/Users/soichi/domains/py27/bin/couchpy",
#Port<0.2748>,
#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}
in function gen_server:terminate/6 (gen_server.erl, line 747)
ancestors: [<0.6423.0>]
messages: []
links: [<0.6425.0>,<0.121.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 1597
stack_size: 24
reductions: 476
neighbours:
[error] [<0.139.0>] {error_report,<0.30.0>,
{<0.139.0>,crash_report,
[[{initial_call,
{mochiweb_acceptor,init,
['Argument__1','Argument__2','Argument__3']}},
{pid,<0.139.0>},
{registered_name,[]},
{error_info,
{error,badarg,
[{erlang,list_to_binary,
[[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,
{eol,
<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,
"/Users/soichi/domains/py27/bin/couchpy",
#Port<0.2748>,
#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]],
[]},
{couch_httpd,error_info,1,
[{file,"couch_httpd.erl"},{line,779}]},
{couch_httpd,send_error,2,
[{file,"couch_httpd.erl"},{line,883}]},
{couch_httpd,handle_request_int,5,
[{file,"couch_httpd.erl"},{line,345}]},
{mochiweb_http,headers,5,
[{file,"mochiweb_http.erl"},{line,136}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}},
{ancestors,
[couch_httpd,couch_secondary_services,
couch_server_sup,<0.31.0>]},
{messages,[]},
{links,[<0.123.0>,#Port<0.2745>]},
{dictionary,
[{mochiweb_request_qs,[{"group","true"}]},
{mochiweb_request_cookie,[]}]},
{trap_exit,false},
{status,running},
{heap_size,2584},
{stack_size,24},
{reductions,43524}],
[]]}}

=CRASH REPORT==== 16-Dec-2012::20:24:29 ===
crasher:
initial call: mochiweb_acceptor:init/3
pid: <0.139.0>
registered_name: []
exception error: bad argument
in function list_to_binary/1
called as list_to_binary([{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,
{eol,
<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,
"/Users/soichi/domains/py27/bin/couchpy",
#Port<0.2748>,
#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},
{line,207}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}])
in call from couch_httpd:error_info/1 (couch_httpd.erl, line 779)
in call from couch_httpd:send_error/2 (couch_httpd.erl, line 883)
in call from couch_httpd:handle_request_int/5 (couch_httpd.erl, line 345)
in call from mochiweb_http:headers/5 (mochiweb_http.erl, line 136)
ancestors: [couch_httpd,couch_secondary_services,couch_server_sup,
<0.31.0>]
messages: []
links: [<0.123.0>,#Port<0.2745>]
dictionary: [{mochiweb_request_qs,[{"group","true"}]},
{mochiweb_request_cookie,[]}]
trap_exit: false
status: running
heap_size: 2584
stack_size: 24
reductions: 43524
neighbours:
[error] [<0.6425.0>] ** Generic server <0.6425.0> terminating
** Last message in was {'EXIT',<0.6424.0>,
{function_clause,
[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,
{eol,<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,
"/Users/soichi/domains/py27/bin/couchpy",
#Port<0.2748>,#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}}
** When Server state == {file,{file_descriptor,prim_file,{#Port<0.2747>,22}},
51}
** Reason for termination ==
** {function_clause,
[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,{eol,<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,"/Users/soichi/domains/py27/bin/couchpy",#Port<0.2748>,
#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}

=ERROR REPORT==== 16-Dec-2012::20:24:29 ===
** Generic server <0.6425.0> terminating
** Last message in was {'EXIT',<0.6424.0>,
{function_clause,
[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,
{eol,<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,
"/Users/soichi/domains/py27/bin/couchpy",
#Port<0.2748>,#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}}
** When Server state == {file,{file_descriptor,prim_file,{#Port<0.2747>,22}},
51}
** Reason for termination ==
** {function_clause,
[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,{eol,<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,"/Users/soichi/domains/py27/bin/couchpy",#Port<0.2748>,
#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
[error] [<0.6425.0>] {error_report,<0.30.0>,
{<0.6425.0>,crash_report,
[[{initial_call,{couch_file,init,['Argument__1']}},
{pid,<0.6425.0>},
{registered_name,[]},
{error_info,
{exit,
{function_clause,
[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,
{eol,
<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,
"/Users/soichi/domains/py27/bin/couchpy",
#Port<0.2748>,
#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]},
[{gen_server,terminate,6,
[{file,"gen_server.erl"},{line,747}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}},
{ancestors,[<0.6424.0>,<0.6423.0>]},
{messages,[{'EXIT',<0.6427.0>,shutdown}]},
{links,[]},
{dictionary,[]},
{trap_exit,true},
{status,running},
{heap_size,987},
{stack_size,24},
{reductions,1189}],
[]]}}

=CRASH REPORT==== 16-Dec-2012::20:24:29 ===
crasher:
initial call: couch_file:init/1
pid: <0.6425.0>
registered_name: []
exception exit: {function_clause,
[{couch_os_process,handle_info,
[{#Port<0.2748>,
{data,
{eol,
<<"[[], [[null, 274152780170682369]]]">>}}},
{os_proc,
"/Users/soichi/domains/py27/bin/couchpy",
#Port<0.2748>,
#Fun<couch_os_process.2.55582190>,
#Fun<couch_os_process.3.55582190>,5000}],
[{file,"couch_os_process.erl"},{line,207}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}
in function gen_server:terminate/6 (gen_server.erl, line 747)
ancestors: [<0.6424.0>,<0.6423.0>]
messages: [{'EXIT',<0.6427.0>,shutdown}]
links: []
dictionary: []
trap_exit: true
status: running
heap_size: 987
stack_size: 24
reductions: 1189
neighbours:
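The decisive line is buried in the first crash report: KeyError: 'hastags'. The view source echoed later in the log spells 'hashtags' as 'hastags' and 'entities' as 'entites' in the final lines of entityCountMapper, so dateutil is indeed not the culprit; the mapper crashes on the typos. The tail of the mapper should read:

if doc['entities'].get('hashtags'):
    for hashtag in doc['entities']['hashtags']:
        yield ('#' + hashtag['text'], [doc['_id'], doc['id']])
if doc['entities'].get('urls'):
    for url in doc['entities']['urls']:
        yield (url['url'], [doc['_id'], doc['id']])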

Redis Blues

Hi Matthew,

Here is my situation:

  • I'm running Redis on Windows
  • I (think) I successfully installed the ServiceStack package that includes Redis via the Package Manager console
    (The reason I think this is that when I type "import redis" in the Python GUI, it returns without any errors.)

So when I run friends_followers__redis_to_networkx.py (from the repository of course) I get the following error:

Traceback (most recent call last):
File "C:\Python27\python_code\friends_followers__redis_to_networkx.py", line 22, in
friend_ids = list(r.smembers(getRedisIdByScreenName(SCREEN_NAME, 'friend_ids')))
File "C:\Python27\lib\site-packages\redis-2.7.5-py2.7.egg\redis\client.py", line 1075, in smembers
return self.execute_command('SMEMBERS', name)
File "C:\Python27\lib\site-packages\redis-2.7.5-py2.7.egg\redis\client.py", line 381, in execute_command
connection.send_command(*args)
File "C:\Python27\lib\site-packages\redis-2.7.5-py2.7.egg\redis\connection.py", line 299, in send_command
self.send_packed_command(self.pack_command(_args))
File "C:\Python27\lib\site-packages\redis-2.7.5-py2.7.egg\redis\connection.py", line 281, in send_packed_command
self.connect()
File "C:\Python27\lib\site-packages\redis-2.7.5-py2.7.egg\redis\connection.py", line 229, in connect
raise ConnectionError(self._error_message(e))
ConnectionError: Error 10061 connecting localhost:6379. No connection could be made because the target machine actively refused it.

I'm practically stuck at this point because I don't know where to go to open up the redis console and begin to change the port/configure redis etc.

I know that there are some Redis files under the Python directory (connection.py, __init__.py, etc.). I tried running those to make sure the Redis service was started and connected, but it doesn't change anything, so I'm wondering if I'm looking in completely the wrong place as far as Redis is concerned.

Can you configure Redis directly from the Python GUI, or do you need to open another console (and where would that be)?

Thanks much!
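
Error 10061 means nothing is accepting connections on localhost:6379, i.e. the Redis server process itself isn't running; installing the Python client (which is all that "import redis" exercises) doesn't start a server. A minimal check from Python, assuming the redis-py client:

    import redis

    r = redis.Redis(host='localhost', port=6379)
    try:
        r.ping()  # raises ConnectionError if no server is listening
        print 'Redis is up'
    except redis.ConnectionError:
        print 'Start redis-server first, then re-run the script'

On Windows that usually means launching redis-server.exe from the directory where the Redis build was unpacked; the book's scripts assume the default port 6379.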

JSON Problem with mailboxes__jsonify_mbox.py

When executing the provided script, I encounter the following error:

Traceback (most recent call last):
File "C:\Users\Christian\workspace\Mail Datamining\mailboxes__jsonify_mbox.py", line 89, in
json.dump(json_msgs,open(OUT_FILE, 'wb'), indent=4)
File "C:\Python27\lib\json__init__.py", line 181, in dump
for chunk in iterable:
File "C:\Python27\lib\json\encoder.py", line 436, in _iterencode
o = _default(o)
File "C:\Python27\lib\json\encoder.py", line 178, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <generator object gen_json_msgs at 0x024BFB98> is not JSON serializable

I'm new to Python, but it seems the code is passing the generator object itself to json.dump instead of the items it yields.

Thank you very much for any help!
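
That reading is right: gen_json_msgs returns a generator, and json.dump refuses generator objects. A minimal fix, assuming the generator yields one serializable dict per message, is to materialize it into a list first:

    json.dump(list(json_msgs), open(OUT_FILE, 'wb'), indent=4)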

Sampling technique doesn't work

In twitter__util.py, line 107, the code lst = lst[:int(len(lst) * sample)] fails to trim the lists as intended because it's assigning the trimmed list to lst, not screen_names or user_ids.
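
A minimal fix under that diagnosis: use in-place slice assignment, so the loop mutates the original lists instead of rebinding the loop variable:

    for lst in (screen_names, user_ids):
        lst[:] = lst[:int(len(lst) * sample)]  # slice assignment trims the list object itself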

Issue with 'the_tweet_harvest_timeline.py' code

Hi Matthew,

First of all I love the book Mining the Social Web and been working through the examples. I tried to run the harvest script and get the following error:

error: [Errno 10061] No connection could be made because the target machine actively refused it

Any ideas why connection is being refused?

I'm running python 27 on a windows 7 machine.

Thanks.

Regards,
Paula
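
Errno 10061 means nothing is listening on the port being connected to. The harvest script writes tweets into CouchDB, so CouchDB needs to be up on localhost:5984 before the script runs; a quick check, assuming the couchdb-python package the book uses:

    import couchdb

    server = couchdb.Server('http://localhost:5984/')
    print server.version()  # raises a socket error if CouchDB isn't reachable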

twitter__util.py does not work on example 4-3 (friends_followers__get_friends_refactored.py)

Hi.
twitter__util.py does not work on example 4-3.
I'm using latest version of twitter__util.py on GitHub.

The error message is below; it's a 404 error.

xxxxxxx-MacBook-Air:~ onozka$ python friends_followers__get_friends_refactored.py
Traceback (most recent call last):
File "friends_followers__get_friends_refactored.py", line 37, in
ids = getFriendIds(sys.argv[0], friends_limit=10000)
File "friends_followers__get_friends_refactored.py", line 26, in getFriendIds
response = makeTwitterRequest(t, t.friends.ids, **params)
File "/Users/pamsuke/twitter__util.py", line 21, in makeTwitterRequest
wait_period = handleTwitterHTTPError(e, t, wait_period)
File "/Users/pamsuke/twitter__util.py", line 61, in handleTwitterHTTPError
raise e
twitter.api.TwitterHTTPError: Twitter sent status 404 for URL: 1/friends/ids.json using parameters: (cursor=-1&oauth_consumer_key=xxxxxxxx&oauth_nonce=xxxxxxxxx&oauth_signature_method=HMAC-SHA1&oauth_timestamp=1325677562&oauth_token=xxxxxxx&oauth_version=1.0&screen_name=friends_followers__get_friends_refactored.py&oauth_signature=xxxxxxxxxxxx)
details: {"error":"Not found","request":"/1/friends/ids.json?cursor=-1&oauth_consumer_key=xxxxxxxxx&oauth_nonce=xxxxxxxxxx&oauth_signature_method=HMAC-SHA1&oauth_timestamp=1325677562&oauth_token=xxxxxxxxxx&oauth_version=1.0&screen_name=friends_followers__get_friends_refactored.py&oauth_signature=xxxxxxxxxx"}

Same error occurs in any other examples which imports twitter__util.py.

How to fix twitter__util.py? X(
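
The parameters in the traceback give it away: screen_name=friends_followers__get_friends_refactored.py. The script is passing sys.argv[0], its own filename, as the screen name, so Twitter quite reasonably returns a 404 for a user that doesn't exist. The fix is to pass the first real command-line argument instead:

    # sys.argv[0] is the script's own name; the screen name is the first argument
    ids = getFriendIds(sys.argv[1], friends_limit=10000)

and to invoke it as, e.g., python friends_followers__get_friends_refactored.py timoreilly.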

SyntaxError in Ex 1-12 and when using 'except IndexError:'

Hi :)

I'm working through the examples in MTSW in example 1-12

OUT="snl_search_results.dot
try:
nx.drawing.write_dot(g, OUT) except Importer, e:
dot=['"%s" -> "%s" [tweet_id=%s]' % (n1, n2, g[n1][n2]['tweet_id'])
for n1,n2 in g.edges()]
f=open(OUT, 'w')
f.write(strict digraph {\n%s\n}' % (';\n' .join(dot),))
f.close()
...:

File "", line 2
nx.drawing.write_dot(g, OUT)

except Importer, e:
^
SyntaxError: invalid syntax

Then I moved to a different example 2-2
import sys
import urllib2
import HTMLParser
from BeautifulSoup import BeautifulSoup
URL=sys.argv[1]

and get the following error


IndexError Traceback (most recent call last)
/Library/Frameworks/ in ()
----> 1 URL=sys.argv[1]

IndexError: list index out of range

can you help please?

Thanks in advance

Lee
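
Two separate things seem to be going on here. In the Example 1-12 transcription, "Importer" should be "ImportError", and two string quotes were dropped (after the OUT filename and before 'strict digraph'). A corrected version, a sketch matching the book's listing:

    OUT = "snl_search_results.dot"
    try:
        nx.drawing.write_dot(g, OUT)
    except ImportError, e:
        dot = ['"%s" -> "%s" [tweet_id=%s]' % (n1, n2, g[n1][n2]['tweet_id'])
               for n1, n2 in g.edges()]
        f = open(OUT, 'w')
        f.write('strict digraph {\n%s\n}' % (';\n'.join(dot),))
        f.close()

The Example 2-2 IndexError is unrelated: sys.argv[1] only exists when a URL is passed on the command line, e.g. python example_2_2.py http://example.com/page (the script name here is hypothetical), so running those lines interactively, or with no argument, raises IndexError.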

Example 1-3 fails with 404 errors

Hello,
I've done some searching on both this site, as well as the site for the twitter package (version 1.6.1), and the twitter API, but I'm unclear as to why this code fails to execute properly. I'm on Snow Leopard, using the Enthought python distribution (32-bit) with python 2.7.1, and entering the code in the IDLE shell. Executing the code results in the following error message:

Traceback (most recent call last):
File "<pyshell#4>", line 1, in
trends = twitter_search.trends()
File "/Library/Frameworks/Python.framework/Versions/7.0/lib/python2.7/site-packages/twitter/api.py", line 153, in call
return self._handle_response(req, uri, arg_data)
File "/Library/Frameworks/Python.framework/Versions/7.0/lib/python2.7/site-packages/twitter/api.py", line 168, in _handle_response
raise TwitterHTTPError(e, uri, self.format, arg_data)
TwitterHTTPError: Twitter sent status 404 for URL: trends.json using parameters: ()
details:

<title>The page you were looking for doesn't exist (404)</title> [inline CSS omitted]

The page you were looking for doesn't exist.

You may have mistyped the address or the page may have moved.


My suspicion is that something has changed in how the search API handles trends, but I'm not very experienced with APIs, so I may have misunderstood what the search API is supposed to be doing.

I began to raise the issue on an existing thread in the twitter package repository (python-twitter-tools/twitter#41 (comment)) but as I said, I think this problem is separate from the one experienced there, involving the search api, and setting the api_version flag to None.

Any help, or even confirmation that it's a change in how the Twitter API is handling trends (thus suggesting I'm not an idiot), would be appreciated.
Thanks!
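
Confirmation: the v1 trends endpoint the first printing used has been retired, so the 404 isn't anything you're doing wrong. A sketch of the replacement call, assuming a current release of the same twitter package and v1.1 OAuth credentials registered at dev.twitter.com (the four placeholder names are whatever your app's settings page shows):

    import twitter

    auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                               CONSUMER_KEY, CONSUMER_SECRET)
    twitter_api = twitter.Twitter(auth=auth)

    WORLD_WOE_ID = 1  # Yahoo! Where On Earth ID for "worldwide"
    trends = twitter_api.trends.place(_id=WORLD_WOE_ID)

v1.1 requires OAuth on every request, so the unauthenticated twitter.Twitter(domain=...) calls from the first printing can't be fixed by swapping in a different domain alone.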

Graphing 2nd degree friendships with redis & networkx

I have gone through your book and found it to be a really awesome resource as I try to teach myself more about this fascinating subject. I also really appreciated the updates to make the code work with the twitter 1.1 api, as that had been causing me some problems initially as I worked through some of the examples.

After getting all the example scripts working, I have started making some of my own modifications as a way to learn more (I'm still new to programming and development in general). I have been playing around with the script "friends_followers__redis_to_networkx", and after reading a bit of the networkx documentation I figured out how to have it write the files as .gexf instead of pickling them, since I have also been playing around with Gephi and like to view the results in that app.

One thing I would still like to figure out, but am stuck on, is how to modify the script to include the 2nd-degree nodes in the graph file. I know the graphs could quickly become very large, but I'd still like to know how. I have tried modifying the for loops to iterate through all friends of friends for the ID in Redis and add them to the graph as well, but I must be doing it wrong, since I always end up with the same output when viewing the graph file (i.e., I can see all the 1st-degree friendships, as well as the edges that exist between those nodes, but I don't see any of the actual 2nd-degree nodes themselves). I realize this is what the script is designed to do as written, but I wondered if you could give me any help making this tweak.

In any case, thanks for all your work on this awesome book, it has really been a great help getting started!
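
A sketch of the kind of change that should work, assuming friend ids were harvested into Redis one hop out and using an illustrative key helper (mirror whatever key scheme your harvest actually used): after adding the first-degree edges, loop over each friend's own friend set, which pulls the second-degree nodes into the graph along with their edges.

    import networkx as nx
    import redis

    r = redis.Redis()
    g = nx.Graph()

    def friend_ids_key(user_id):
        # hypothetical helper; match the getRedisId* convention from your scripts
        return 'id$%s$friend_ids' % (user_id,)

    central_id = '123456789'  # hypothetical seed user id

    for fid in r.smembers(friend_ids_key(central_id)):
        g.add_edge(central_id, fid)                      # first degree
        for fof_id in r.smembers(friend_ids_key(fid)):
            g.add_edge(fid, fof_id)                      # second degree: adds the new nodes too

    nx.write_gexf(g, 'friends_2nd_degree.gexf')

One thing to check first: the second-degree sets only exist in Redis if the harvest actually crawled one hop further, and that is the usual reason the extra nodes never show up.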

Trend_timeline not working

I'm trying to run the code written here:
https://github.com/ptwobrussell/Recipes-for-Mining-Twitter/blob/master/recipe__get_search_results_for_trending_topic.py

And get this:

Traceback (most recent call last):
File "C:\Python27\trend_timeline.py", line 19, in
trends = json.dumps(t.trends(), indent=1)
File "C:\Documents and Settings\Zanzamar\Datos de programa\Python\Python27\site-packages\twitter\api.py", line 165, in call
return self._handle_response(req, uri, arg_data)
File "C:\Documents and Settings\Zanzamar\Datos de programa\Python\Python27\site-packages\twitter\api.py", line 180, in _handle_response
raise TwitterHTTPError(e, uri, self.format, arg_data)
TwitterHTTPError: Twitter sent status 404 for URL: 1/trends.json using parameters: ()
details:

(an HTML 404 page: "Sorry, that page doesn't exist! Thanks for noticing—we're going to fix it up and have things back to normal soon.")
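
As with the other trends reports in this list, 1/trends.json has been retired; the v1.1 replacement is an authenticated call along the lines of t.trends.place(_id=1) (WOE ID 1 = worldwide), sketched more fully under the "Example 1-3 fails with 404 errors" issue above.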

NameError 'search results' not defined

Example 1-11. I keep getting the following error. Can you please help? Has anyone else seen this error?

import networkx as nx
import re
g = nx.DiGraph()

all_tweets = [ tweet
... for page in search_results
... for tweet in page["results"] ]
Traceback (most recent call last):
File "", line 2, in
NameError: name 'search_results' is not defined
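
search_results is built by the chapter's earlier search-harvesting example, so starting at Example 1-11 in a fresh interpreter means the name genuinely doesn't exist yet. A sketch of that earlier step from the book (note the search.twitter.com v1 endpoint has since been retired, so the call itself may no longer succeed):

    import twitter

    twitter_search = twitter.Twitter(domain="search.twitter.com")
    search_results = []
    for page in range(1, 6):
        search_results.append(twitter_search.search(q="SNL", rpp=100, page=page))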

Same issue with 'the_tweet_search.py' code

Hi Matthew,

I thought I could move on in the book but came across the same problem when running the tweet search code.

I entered teaparty as the argument in the PythonWin argument line and got the same error.

error: [Errno 10061] No connection could be made because the target machine actively refused it

This is the error code that proceeded the above:

File "C:\Python27\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 325, in RunScript
exec codeObject in __main__.__dict__
File "C:\Python27\search.py", line 23, in <module>
db = server.create(DB)
File "build\bdist.win32\egg\couchdb\client.py", line 193, in create
self.resource.put_json(validate_dbname(name))
File "build\bdist.win32\egg\couchdb\http.py", line 405, in put_json
status, headers, data = self.put(*a, **k)
File "build\bdist.win32\egg\couchdb\http.py", line 384, in put
return self._request('PUT', path, body=body, headers=headers, **params)
File "build\bdist.win32\egg\couchdb\http.py", line 419, in _request
credentials=self.credentials)
File "build\bdist.win32\egg\couchdb\http.py", line 239, in request
resp = _try_request_with_retries(iter(self.retry_delays))
File "build\bdist.win32\egg\couchdb\http.py", line 205, in _try_request_with_retries
raise e

Would really love to run the tweet search code on other topics. Please advise.

Thanks.

Regards,
Paula
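
Same diagnosis as the harvesting issue above: Errno 10061 here comes from couchdb failing to reach localhost:5984 (the traceback dies inside server.create(DB)). Make sure the CouchDB service is running before launching the_tweet_search.py; the quick server.version() check shown earlier will confirm it.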

Error for example 5-3

Hi Russell,

I got an error when I ran example 5-3; here is the output:
** Reason for termination ==
** {'EXIT',
{{badmatch,
{error,{bad_return_value,{os_process_error,{exit_status,4}}}}},
[{couch_query_servers,new_process,3,
[{file,"d:/relax/couchdb/src/couchdb/couch_query_servers.erl"},
{line,477}]},
{couch_query_servers,lang_proc,3,
[{file,"d:/relax/couchdb/src/couchdb/couch_query_servers.erl"},
{line,462}]},
{couch_query_servers,handle_call,3,
[{file,"d:/relax/couchdb/src/couchdb/couch_query_servers.erl"},
{line,334}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,578}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}}

Any reason? Thanks
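
An os_process_error with a nonzero exit_status generally means CouchDB failed to launch the external Python view server that the book's Python-language views depend on. A sketch of the relevant stanza in CouchDB's local.ini, assuming a Windows Python 2.7 layout (use whatever path your couchpy actually installed to):

    [query_servers]
    python = C:\Python27\Scripts\couchpy.exe

Restart CouchDB after editing, and confirm the couchpy path starts cleanly from a console on its own.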

Protovis example at the top of page 17

Hi Matthew

I've been trying out the protovis example at the top of page 17 and the generated file (which incidentally is named ....HTML.HTML) refuses to display anything (in FF 3.6.2).

I can't see how to attach the file to this post. I would be very grateful for your comments on the file as I'm rather stuck, so could you please tell me how to get it to you, perhaps via the O'Reilly address in the book (http://getsatisfaction.com/oreilly) or a direct email?

Many Thanks

David Rush
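
On the doubled extension: the visualization scripts append ".html" to the output name themselves, so passing a name that already ends in .html produces NAME.html.html. The misnamed file should still open, though, so a blank page in Firefox 3.6 more likely means the template's relative reference to protovis.js isn't resolving from where the file ended up; the browser's error console should say which resource failed to load.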

Question about Figure 5-1

Does someone know how to draw Figure 5-1, "The frequency of entities that have been retweeted by @timoreilly for a sample of recent tweets"?

e.g.
Is the figure drawn by Protovis?
If so, how can I use Protovis to draw the figure?

A step by step explanation will be highly appreciated !
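
The book's Protovis templates cover the graph and tag-cloud visualizations, but a frequency chart like Figure 5-1 can be approximated with any plotting library. A sketch using matplotlib with stand-in data (in practice you would feed it the (entity, frequency) tuples produced in Example 5-4):

    import matplotlib.pyplot as plt

    # hypothetical output from the entity-counting example
    entities_freqs = [('#gov20', 25), ('@timoreilly', 18), ('#opengov', 12)]

    labels, counts = zip(*entities_freqs)
    positions = range(len(counts))

    plt.barh(positions, counts)
    plt.yticks(positions, labels)
    plt.xlabel('Frequency')
    plt.title('Entities retweeted by @timoreilly')
    plt.tight_layout()
    plt.show()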

Example 9-1 not working

I've tried my App ID/API Key, my App Secret, and my own Facebook ID in the CLIENT_ID variable of the Example 9-1 code.

Every time the python interpreter gets the command

webbrowser.open('https://graph.facebook.com/oauth/authorize?'
+ urllib.urlencode(args))

I get in the browser the message

{
"error": {
"message": "Error validating application.",
"type": "OAuthException"
}
}

Thanks for your attention, jcprandini.
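
"Error validating application" almost always means the client_id sent to the OAuth endpoint isn't the application's numeric App ID: the App Secret and a personal profile ID won't validate. In Example 9-1, CLIENT_ID should be the numeric ID from the top of the app's settings page, e.g.:

    CLIENT_ID = '210987654321098'  # hypothetical numeric App ID -- not the App Secret or your user id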

the_tweet__search with geocode

    import sys
    import twitter
    from twitter_login import login

    q = "xxxxx"
    count = 500
    geocode = 'xx.055207,xx.745055,10mi'
    MAX_PAGES = 5

    t = login()

    search_results = t.search.tweets(q=q, geocode=geocode, count=count)
    tweets = search_results['statuses']

    for _ in range(MAX_PAGES-1):
        next_results = search_results['search_metadata']['next_results']
        kwargs = dict([ kv.split('=') for kv in next_results[1:].split("&") ])  # Create a dictionary from the query string params
        search_results = twitter_api.search.tweets(**kwargs)
        tweets += search_results['statuses']
        if len(search_results['statuses']) == 0:
            break

        print 'Fetched %i tweets so far' % (len(tweets),)

    import json
    print json.dumps(statuses[0:1], indent=1)

    tweets = [ status['text'] for status in statuses ]

    print tweets[0]

ERROR:
next_results = search_results['search_metadata']['next_results']
KeyError: 'next_results'
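
search_metadata only contains a next_results key while more pages exist, and a narrow geocoded query can easily run dry after the first batch, so the loop has to check before indexing. A sketch of a guarded loop (note the original also references twitter_api and statuses where t and tweets appear to be meant):

    for _ in range(MAX_PAGES - 1):
        metadata = search_results['search_metadata']
        if 'next_results' not in metadata:
            break  # no more pages for this query/geocode

        kwargs = dict([ kv.split('=') for kv in metadata['next_results'][1:].split('&') ])
        search_results = t.search.tweets(**kwargs)
        tweets += search_results['statuses']
        print 'Fetched %i tweets so far' % (len(tweets),)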

Unable to fetch more than 75,000 ids

Hi,

Using friends_followers__friend_follower_symmetry.py, I am unable to fetch more than 75,000 follower ids.

The error message is
twitter.api.TwitterHTTPError: Twitter sent status 404 for URL: 1.1/account/rate_limit_status.json using parameters: (oauth_consumer_key=XXXX&oauth_nonce=XXXX&oauth_signature_method=HMAC-SHA1&oauth_timestamp=1369338285&oauth_token=XXXX&oauth_version=1.0&oauth_signature=XXXX)
details: {"errors":[{"message":"Sorry, that page does not exist","code":34}]}

Looks like the retry code of handleTwitterHTTPError goes straight to the last branch:

    else:
        raise e

I think I have not reached the 350 requests/hour rate.

How do I fix this error?

Thanks
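
The 404 isn't the id count itself: under API v1.1 the rate-limit endpoint moved from account/rate_limit_status to application/rate_limit_status, so the helper's status check dies as soon as it tries to wait out the limit. A sketch of the change inside handleTwitterHTTPError, assuming the same twitter package:

    # v1 (retired): status = t.account.rate_limit_status()
    status = t.application.rate_limit_status()  # v1.1; the limits are nested under status['resources']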

memory error on example 3-5

Hi,

I would really appreciate your help on the following issues:

In example 3-6: after adding the CouchDB configuration path for couchpy (absolute path "C:\Python27\Scripts\couchpy.exe") and restarting the service, I executed the following code:

import sys
import couchdb
from couchdb.design import ViewDefinition
try:
... import jsonlib2 as json
... except ImportError:
... import json
...
DB = 'enronami'
START_DATE = '1900-01-01' #YYYY-MM-DD
END_DATE = '2100-01-01' #YYYY-MM-DD
def dateTimeToDocMapper(doc):
... from dateutil.parser import parse
... from datetime import datetime as dt
... if doc.get('Date'):
... _date = list(dt.timetuple(parse(doc['Date']))[:-3])
... yield (_date, doc)
...
view = ViewDefinition('index', 'by_date_time', dateTimeToDocMapper,
... language='python')
Traceback (most recent call last):
File "", line 2, in
File "C:\Python27\lib\site-packages\couchdb-0.8-py2.7.egg\couchdb\design.py",
line 93, in __init__
map_fun = _strip_decorators(getsource(map_fun).rstrip())
File "C:\Python27\lib\inspect.py", line 699, in getsource
lines, lnum = getsourcelines(object)
File "C:\Python27\lib\inspect.py", line 688, in getsourcelines
lines, lnum = findsource(object)
File "C:\Python27\lib\inspect.py", line 529, in findsource
raise IOError('source code not available')
IOError: source code not available

Also in example 3-5 I got the following error:

db.update(docs, all_or_nothing=True)
Traceback (most recent call last):
File "", line 1, in
File "C:\Python27\lib\site-packages\couchdb-0.8-py2.7.egg\couchdb\client.py",
line 733, in update
_, _, data = self.resource.post_json('_bulk_docs', body=content)
File "C:\Python27\lib\site-packages\couchdb-0.8-py2.7.egg\couchdb\http.py", li
ne 399, in post_json
status, headers, data = self.post(*a, **k)
File "C:\Python27\lib\site-packages\couchdb-0.8-py2.7.egg\couchdb\http.py", li
ne 381, in post
**params)
File "C:\Python27\lib\site-packages\couchdb-0.8-py2.7.egg\couchdb\http.py", li
ne 419, in _request
credentials=self.credentials)
File "C:\Python27\lib\site-packages\couchdb-0.8-py2.7.egg\couchdb\http.py", li
ne 176, in request
body = json.encode(body).encode('utf-8')
MemoryError

But this error atleast temporarily I was able to solve by trimming enron.mbox.json to 2000 objects instead of the full size which had 41000 json objects.

With Regards,
Amitabh
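
Two different problems here. The IOError from ViewDefinition happens because couchdb-python calls inspect.getsource on the mapping function, and source simply isn't available for functions typed into the interactive interpreter; put the code in a .py file and run it as a script and that error goes away.

The MemoryError comes from trying to JSON-encode all 41,000 documents in a single _bulk_docs request. Trimming the file works, but batching keeps the full corpus; a sketch, reusing the docs list and db handle from the session above:

    # bulk-load the corpus in manageable batches instead of one giant request
    BATCH_SIZE = 1000
    for i in range(0, len(docs), BATCH_SIZE):
        db.update(docs[i:i + BATCH_SIZE], all_or_nothing=True)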

Redis difficulties

Hi, I'm having problems getting Redis to work on Windows. Every time I try to run your code that uses Redis, I get an error message: can't connect to port 6379. From looking at some posts online, I'm guessing that Redis isn't installed where it would usually be expected to be. Is there a way I can check where it is installed and get it listening on port 6379? Thanks for your help, and great book by the way; I'm really enjoying it!
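
Redis doesn't relocate to a different port depending on where it's installed; 6379 is the default it binds whenever the server is running at all, so this error means the redis-server process simply isn't started. Launch redis-server.exe from wherever the Windows build was unpacked, then re-run the script; the Python-side ping check sketched under "Redis Blues" above will confirm the connection.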

Example 5-2 "the_tweet__extract_tweet_entities.py"

Hi,

I just tried to run "the_tweet__extract_tweet_entities.py", but it returned
"{"errors":[{"message":"Bad Authentication data","code":215}]}",
even though other examples which need login are working correctly so far.

I tried using "t = login()" instead of "t = twitter.Twitter(domain='api.twitter.com', api_version='1.1')", and it seems to be working now.

Thank you.

example 1-3 does not work for me even when I go with domain="api.twitter.com"

Looks like api.twitter.com is no longer current.

        twitter_search = twitter.Twitter(domain="api.twitter.com")
        twitter_search.domain
        'api.twitter.com'
        trends = twitter_search.trends()

Traceback (most recent call last):
File "", line 1, in
trends = twittersearch.trends()
File "/opt/ActivePython-2.7/lib/python2.7/site-packages/twitter-1.8.0-py2.7.egg/twitter/api.py", line 167, in _call
return self._handle_response(req, uri, arg_data)
File "/opt/ActivePython-2.7/lib/python2.7/site-packages/twitter-1.8.0-py2.7.egg/twitter/api.py", line 182, in _handle_response
raise TwitterHTTPError(e, uri, self.format, arg_data)
TwitterHTTPError: Twitter sent status 404 for URL: 1/trends.json using parameters: ()
details: {"errors":[{"message":"Sorry, that page does not exist","code":34}]}

Now from the developer page I see that the new place for grabbing trends is api.twitter.com/1/. When I try that, it still breaks.

        twitter_search = twitter.Twitter(domain="api.twitter.com/1/")
        twitter_search.domain
        'api.twitter.com/1/'
        trends = twitter_search.trends()

Traceback (most recent call last):
File "", line 1, in
trends = twittersearch.trends()
File "/opt/ActivePython-2.7/lib/python2.7/site-packages/twitter-1.8.0-py2.7.egg/twitter/api.py", line 167, in _call
return self._handle_response(req, uri, arg_data)
File "/opt/ActivePython-2.7/lib/python2.7/site-packages/twitter-1.8.0-py2.7.egg/twitter/api.py", line 182, in _handle_response
raise TwitterHTTPError(e, uri, self.format, arg_data)
TwitterHTTPError: Twitter sent status 404 for URL: trends.json using parameters: ()
details: {"errors":[{"message":"Sorry, that page does not exist","code":34}]}
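
Right, neither api.twitter.com nor api.twitter.com/1/ serves trends.json any more; the version belongs in the api_version argument rather than in the domain, and the surviving replacement is the authenticated v1.1 trends/place call sketched under the "Example 1-3 fails with 404 errors" issue above.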

Ex 4.10 fails with Unexpected String or buffer

I've been able to follow the examples well enough until 4.10, when line 23
id_for_screen_name = json.loads(r.get(getRedisIdByScreenName(SCREEN_NAME,'info.json'))) ['id']
throws "TypeError: expected string or buffer" down in the decoder.py script.

I have friend and follower sets loaded in Redis for two screen_name handles, and I checked to make sure I'm spelling the screen name properly in my ex 4.1 script -- I checked in Komodo and it's properly loading the friend_ids list in line 22. What am I missing?

Here's the traceback:

Traceback (most recent call last):
File "C:\Python27\Scripts\kevin_py_scripts\FFredis2nx.py", line 23, in
id_for_screen_name = json.loads(r.get(getRedisIdByScreenName(SCREEN_NAME,'info.json'))) ['id']
File "C:\Python27\lib\json__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "C:\Python27\lib\json\decoder.py", line 360, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer
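
redis-py's get returns None for a missing key, and json.loads(None) produces exactly this TypeError. The friend/follower id sets and the info.json profile blob are written by different harvesting steps, so having the id sets loaded doesn't guarantee the profile was ever stored. A defensive sketch:

    raw = r.get(getRedisIdByScreenName(SCREEN_NAME, 'info.json'))
    if raw is None:
        # profile never harvested; re-run the user-info harvesting step for SCREEN_NAME
        raise SystemExit('No info.json stored in Redis for %s' % SCREEN_NAME)
    id_for_screen_name = json.loads(raw)['id']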

Arrggg. Twitter Trends

I'm sorry to beat a dead horse... I've read the errata and tried to understand how Example 1-3 has changed, and then seemingly changed again. It looks dead simple, but nothing I do seems to make it work.

Here is where I've ended up:

import twitter
twitter_api=twitter.Twitter(domain="api.twitter.com", api_version='1')
trends=twitter_api.trends()

Traceback (most recent call last):
File "", line 1, in
File "build/bdist.macosx-10.7-intel/egg/twitter/api.py", line 165, in call
File "build/bdist.macosx-10.7-intel/egg/twitter/api.py", line 180, in _handle_response
twitter.api.TwitterHTTPError: Twitter sent status 404 for URL: 1/trends.json using parameters: ()
details:

(followed by a ton of HTML and CSS... )

Reading the API documentation, I'm pretty sure I need to be including some parameters in the line:

twitter_api=twitter.Twitter(domain="api.twitter.com", api_version='1')

but no matter what I try it just doesn't work. I'm sure there's a simple solution, but I've bashed my head against it so long that I just can't see it.

Anyone care to show me what I'm doing wrong?

Thanks!
John
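
Nothing wrong with your reading of it: 1/trends.json was removed outright, so no parameter to that call can revive it. The replacement is the authenticated v1.1 trends/place request sketched under the earlier "Example 1-3 fails with 404 errors" issue; with v1.1 you will also need OAuth credentials.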

enron json file

I can't find the JSON data for Chapter 3. Is it supposed to be provided? Or should we create it from scratch (from the CMU CS webpage)?
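
The JSON file is generated rather than shipped: the usual workflow is to download the Enron corpus referenced in Chapter 3, get it into mbox form, and then run mailboxes__jsonify_mbox.py over it, which writes the enron.mbox.json file that the later examples load.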

the_tweet__count_entities_in_tweets.py dose not work (example 5-4).

Hi. I have a problem again. Sorry...

When I invoked the script ($ python the_tweet__count_entities_in_tweets.py tweets-user-timeline-onozka 5), an error occurred.
I wrote the error message below.

Please tell me how to solve this error.
(Both example 5-2 and 5-3 worked correctly!)

And please tell me what [(row.key, row.value) for row in db.view('index/entity_count_by_doc', group=True)] means.

Traceback (most recent call last):
File "the_tweet__count_entities_in_tweets.py", line 83, in
entities_freqs = sorted([(row.key, row.value) for row in db.view('index/entity_count_by_doc', group=True)], key=lambda x: x[1], reverse=True)
File "build/bdist.macosx-10.6-intel/egg/couchdb/client.py", line 984, in iter
File "build/bdist.macosx-10.6-intel/egg/couchdb/client.py", line 1003, in rows
File "build/bdist.macosx-10.6-intel/egg/couchdb/client.py", line 990, in _fetch
File "build/bdist.macosx-10.6-intel/egg/couchdb/client.py", line 880, in _exec
File "build/bdist.macosx-10.6-intel/egg/couchdb/http.py", line 393, in get_json
File "build/bdist.macosx-10.6-intel/egg/couchdb/http.py", line 374, in get
File "build/bdist.macosx-10.6-intel/egg/couchdb/http.py", line 419, in _request
File "build/bdist.macosx-10.6-intel/egg/couchdb/http.py", line 310, in request
couchdb.http.ServerError: (500, (u'EXIT', u'{{badmatch,[]},\n [{couch_query_servers,new_process,3},\n {couch_query_servers,lang_proc,3},\n {couch_query_servers,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]}'))

Sorry... X(
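
On the comprehension: db.view returns Row objects whose key and value attributes carry the grouped reduce output, here the entity and its total count, so the line builds (entity, count) tuples and sorts them by count, descending. An equivalent long-hand version:

    entities_freqs = []
    for row in db.view('index/entity_count_by_doc', group=True):
        # key = the entity text, value = its summed count from the reduce step
        entities_freqs.append((row.key, row.value))
    entities_freqs.sort(key=lambda x: x[1], reverse=True)

The badmatch crash itself points at the Python query server again; the same [query_servers] couchpy configuration noted under the Example 5-3 issue above applies here.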

Example 6-1, Exception: Invalid field name: Company!

Just did a fresh clone on MacOS X 10.8.2 (Mountain Lion) and hitting problems in example 6-1, linkedin__analyze_companies.py: Exception: Invalid field name: Company!

See my fix below...
I am running python2.7 within virtualenv and have prettytable==0.6.1.
here's the stack trace:

Traceback (most recent call last):
File "/Applications/eclipse/plugins/org.python.pydev_2.7.1.2012100913/pysrc/pydevd.py", line 1397, in
debugger.run(setup['file'], None, None)
File "/Applications/eclipse/plugins/org.python.pydev_2.7.1.2012100913/pysrc/pydevd.py", line 1090, in run
pydev_imports.execfile(file, globals, locals) #execute the script
File "/Users/jeff/Documents/workspace/Mining-the-Social-Web/python_code/linkedin__analyze_companies.py", line 23, in
pt = PrettyTable(fields=['Company', 'Freq'])
File "/Users/jeff/.virtualenvs/kitsink/lib/python2.7/site-packages/prettytable.py", line 125, in init
self._validate_option(option, kwargs[option])
File "/Users/jeff/.virtualenvs/kitsink/lib/python2.7/site-packages/prettytable.py", line 210, in _validate_option
self._validate_all_field_names(option, val)
File "/Users/jeff/.virtualenvs/kitsink/lib/python2.7/site-packages/prettytable.py", line 285, in _validate_all_field_names
self._validate_field_name(name, x)
File "/Users/jeff/.virtualenvs/kitsink/lib/python2.7/site-packages/prettytable.py", line 280, in _validate_field_name
raise Exception("Invalid field name: %s!" % val)
Exception: Invalid field name: Company!

I found the following 3 changes on lines 23-28 were necessary to make this module work:

23: pt = PrettyTable(['Company', 'Freq'])

25: pt.align['Company'] = 'l'

28: print(pt)
