
radices's People

Contributors

flxvctr, thiesben

radices's Issues

What should our seed nodes be?

IMHO we have some options here:

  1. Random from Axel's German accounts
  2. The list compiled by HBI comprising 'public speakers'

Both would be interesting; however, we have to start with one.

Operational Error in db

Exception in thread 15775749:
Traceback (most recent call last):
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1170, in _execute_context
    context)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 105, in do_executemany
    rowcount = cursor.executemany(statement, parameters)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 197, in executemany
    self._get_db().encoding)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 234, in _do_execute_many
    rows += self.execute(sql + postfix)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 516, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 727, in _read_query_result
    result.read()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 1066, in read
    first_packet = self.connection._read_packet()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 683, in _read_packet
    packet.check_error()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.IntegrityError: (1062, "Duplicate entry '918408053074153472' for key 'PRIMARY'")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 623, in work_through_seed_get_next_seed
    index=False, con=self.dbh.engine)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pandas/core/generic.py", line 2130, in to_sql
    dtype=dtype)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pandas/io/sql.py", line 450, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pandas/io/sql.py", line 1127, in to_sql
    table.insert(chunksize)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pandas/io/sql.py", line 641, in insert
    self._execute_insert(conn, keys, chunk_iter)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pandas/io/sql.py", line 616, in _execute_insert
    conn.execute(self.insert_statement(), data)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1170, in _execute_context
    context)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 105, in do_executemany
    rowcount = cursor.executemany(statement, parameters)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 197, in executemany
    self._get_db().encoding)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 234, in _do_execute_many
    rows += self.execute(sql + postfix)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 516, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 727, in _read_query_result
    result.read()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 1066, in read
    first_packet = self.connection._read_packet()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 683, in _read_packet
    packet.check_error()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
sqlalchemy.exc.IntegrityError: (pymysql.err.IntegrityError) (1062, "Duplicate entry '918408053074153472' for key 'PRIMARY'") [SQL: 'INSERT INTO user_details (created_at, followers_count, id, lang, statuses_count) VALUES (%(created_at)s, %(followers_count)s, %(id)s, %(lang)s, %(statuses_count)s)'] [parameters: ({'created_at': 'Thu Oct 12 09:28:37 +0000 2017', 'followers_count': 6621, 'id': 918408053074153472, 'lang': 'de', 'statuses_count': 2126}, {'created_at': 'Wed Feb 26 11:17:10 +0000 2014', 'followers_count': 9738, 'id': 2362527018, 'lang': 'de', 'statuses_count': 4161}, {'created_at': 'Thu Mar 19 10:08:49 +0000 2009', 'followers_count': 313, 'id': 25265816, 'lang': 'de', 'statuses_count': 1346}, {'created_at': 'Sun Apr 06 10:09:07 +0000 2008', 'followers_count': 1898, 'id': 14314684, 'lang': 'de', 'statuses_count': 57136}, {'created_at': 'Sat Sep 05 09:24:45 +0000 2015', 'followers_count': 5433, 'id': 3554009357, 'lang': 'de', 'statuses_count': 1793}, {'created_at': 'Tue Jun 09 20:17:13 +0000 2009', 'followers_count': 1048, 'id': 45932478, 'lang': 'de', 'statuses_count': 38493}, {'created_at': 'Sun May 10 20:49:03 +0000 2009', 'followers_count': 2820, 'id': 39108885, 'lang': 'de', 'statuses_count': 28330}, {'created_at': 'Tue Sep 07 18:12:42 +0000 2010', 'followers_count': 132957, 'id': 188006776, 'lang': 'de', 'statuses_count': 11213}  ... displaying 10 of 441 total bound parameter sets ...  {'created_at': 'Sun Mar 16 11:56:41 +0000 2008', 'followers_count': 1365, 'id': 14157169, 'lang': 'de', 'statuses_count': 4446}, {'created_at': 'Thu Jan 03 12:43:35 +0000 2008', 'followers_count': 549, 'id': 11795412, 'lang': 'de', 'statuses_count': 2})] (Background on this error at: http://sqlalche.me/e/gkpj)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 516, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 727, in _read_query_result
    result.read()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 1066, in read
    first_packet = self.connection._read_packet()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 683, in _read_packet
    packet.check_error()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/muench/.pyenv/versions/3.6.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 56, in run
    raise self.err
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 53, in run
    mp.Process.run(self)
  File "/Users/muench/.pyenv/versions/3.6.6/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 634, in work_through_seed_get_next_seed
    self.dbh.engine.execute(query)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2075, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 942, in execute
    return self._execute_text(object, multiparams, params)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1104, in _execute_text
    statement, parameters
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 516, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 727, in _read_query_result
    result.read()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 1066, in read
    first_packet = self.connection._read_packet()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 683, in _read_packet
    packet.check_error()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') [SQL: 'REPLACE INTO user_details SELECT *, CURRENT_TIMESTAMP FROM temp_b9f5ee95_e4fb_45ec_8953_a22faa2b4cad;'] (Background on this error at: http://sqlalche.me/e/e3q8)
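
The final error (MySQL 1213) itself suggests restarting the transaction, so one possible mitigation is to retry the write a few times with a short backoff. A minimal sketch, not the project's actual code: `DeadlockError` is a stand-in for `sqlalchemy.exc.OperationalError` with error code 1213, and `retry_on_deadlock` and its parameters are hypothetical names.

```python
import random
import time


class DeadlockError(Exception):
    """Stand-in for sqlalchemy.exc.OperationalError carrying MySQL error 1213."""


def retry_on_deadlock(operation, retries=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on DeadlockError wait and try again, up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except DeadlockError:
            if attempt == retries:
                raise  # give up and surface the last deadlock
            # Exponential backoff plus jitter, so that competing workers
            # are less likely to collide again at the same moment.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

In the collector, `operation` would presumably wrap the `REPLACE INTO user_details ...` statement; real code would also need to check that the caught error really is code 1213 before retrying.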

add verbose command line option

e.g. for suppressing printouts such as

Token starting with 4901 not tried yet. Trying.
REMAINING CALLS FOR /friends/ids WITH TOKEN STARTING WITH 4901:  0
Attempt with next available token.
REMAINING CALLS FOR /friends/ids WITH TOKEN STARTING WITH 8366:  7
Token starting with 8366 not tried yet. Trying.
REMAINING CALLS FOR /friends/ids WITH TOKEN STARTING WITH 8366:  5
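
One way this could look, assuming the printouts are switched from `print` to the standard `logging` module: a `-v/--verbose` flag decides whether the per-token messages are shown. The logger name `collector` and the function names are assumptions for illustration.

```python
import argparse
import logging


def parse_args(argv=None):
    """Parse the command line; -v/--verbose enables the chatty output."""
    parser = argparse.ArgumentParser(description="Collector start script (sketch)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="also show per-token progress messages")
    return parser.parse_args(argv)


def configure_logging(verbose):
    """Return a logger that shows DEBUG messages only in verbose mode."""
    logger = logging.getLogger("collector")
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    return logger
```

A message such as "Token starting with 4901 not tried yet. Trying." would then be emitted with `logger.debug(...)` and would only appear when the script is started with `-v`.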

mp.dummy has no timeout error

Traceback (most recent call last):
  File "start.py", line 98, in <module>
    lang=args.language, test_fail=args.fail)
  File "start.py", line 29, in main_loop
    raise mp.TimeoutError
AttributeError: module 'multiprocessing.dummy' has no attribute 'TimeoutError'
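
`TimeoutError` is exposed by the `multiprocessing` package itself, not by the thread-based `multiprocessing.dummy` wrapper that `mp` refers to here. A minimal sketch of a possible fix; `raise_timeout` is a hypothetical helper, not a function from `start.py`.

```python
import multiprocessing
import multiprocessing.dummy as mp  # thread-based API, imported under the name mp


def raise_timeout():
    # Take the exception class from the parent package instead of mp.
    raise multiprocessing.TimeoutError
```

Any `except` clause that handles the timeout would need to name `multiprocessing.TimeoutError` as well.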

Organise servers

We need some Linux servers (most likely Ubuntu) to parallelise the data collection.

We can ask Sebastian to set up some VMs, but that will not take us far beyond initial development. In the end we can either ask Axel Bruns whether we can use Nectar resources as part of a collaboration, or ask Cornelius whether it would be possible to rent something from Google, Amazon, or the like. Or maybe Cornelius has other ideas.

Find possible outlets

We have to do some research on which publications and conferences could be interested in this kind of research.

Handle exception if no friends found in user_details because there were no friends with right language

Exception in thread 1621528116:
Traceback (most recent call last):
  File "/Users/muench/.pyenv/versions/3.6.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 56, in run
    raise self.err
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 53, in run
    mp.Process.run(self)
  File "/Users/muench/.pyenv/versions/3.6.6/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 643, in work_through_seed_get_next_seed
    == max_follower_count]['id'].values[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
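
The `IndexError` comes from taking element 0 of an empty selection. A minimal sketch of a guard, using a plain list of dicts as a stand-in for the `user_details` DataFrame; `pick_next_seed` is a hypothetical name, and what to do when it returns `None` (for example, drawing a fresh seed) is left to the caller.

```python
def pick_next_seed(friends):
    """Return the id of the friend with the most followers, or None if there are none."""
    if not friends:
        return None  # no friend with the right language was stored
    return max(friends, key=lambda friend: friend["followers_count"])["id"]
```

With pandas, the equivalent check would be to test `DataFrame.empty` on the filtered frame before indexing into `.values[0]`.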

Conceptualise architecture of data crawler

Because we have to parallelise the data collection, we need a central database that several crawlers can write to and read from at the same time.

IMHO possible candidates would be MySQL or Google BigQuery.

avoid multiple entries or catch exception differently after restart

Encountered exception in work_through_seed_get_next_seed((<collector.Coordinator object at 0x7ff7921c25c0>,), {'seed': 13951532, 'select': ['contributors_enabled', 'created_at', 'default_profile', 'default_profile_image', 'description', 'entities_description_urls', 'entities_url_urls', 'favourites_count', 'follow_request_sent', 'followers_count', 'following', 'friends_count', 'geo_enabled', 'has_extended_profile', 'id', 'id_str', 'is_translation_enabled', 'is_translator', 'lang', 'listed_count', 'location', 'name', 'needs_phone_verification', 'notifications', 'profile_background_color', 'profile_background_image_url', 'profile_background_image_url_https', 'profile_background_tile', 'profile_banner_url', 'profile_image_url', 'profile_image_url_https', 'profile_link_color', 'profile_sidebar_border_color', 'profile_sidebar_fill_color', 'profile_text_color', 'profile_use_background_image', 'protected', 'screen_name', 'status_contributors', 'status_coordinates', 'status_coordinates_coordinates', 'status_coordinates_type', 'status_created_at', 'status_entities_hashtags', 'status_entities_media', 'status_entities_symbols', 'status_entities_urls', 'status_entities_user_mentions', 'status_extended_entities_media', 'status_favorite_count', 'status_favorited', 'status_geo', 'status_geo_coordinates', 'status_geo_type', 'status_id', 'status_id_str', 'status_in_reply_to_screen_name', 'status_in_reply_to_status_id', 'status_in_reply_to_status_id_str', 'status_in_reply_to_user_id', 'status_in_reply_to_user_id_str', 'status_is_quote_status', 'status_lang', 'status_place', 'status_place_bounding_box_coordinates', 'status_place_bounding_box_type', 'status_place_contained_within', 'status_place_country', 'status_place_country_code', 'status_place_full_name', 'status_place_id', 'status_place_name', 'status_place_place_type', 'status_place_url', 'status_possibly_sensitive', 'status_quoted_status_id', 'status_quoted_status_id_str', 'status_retweet_count', 'status_retweeted', 'status_retweeted_status_contributors', 'status_retweeted_status_coordinates', 'status_retweeted_status_created_at', 'status_retweeted_status_entities_hashtags', 'status_retweeted_status_entities_media', 'status_retweeted_status_entities_symbols', 'status_retweeted_status_entities_urls', 'status_retweeted_status_entities_user_mentions', 'status_retweeted_status_extended_entities_media', 'status_retweeted_status_favorite_count', 'status_retweeted_status_favorited', 'status_retweeted_status_geo', 'status_retweeted_status_id', 'status_retweeted_status_id_str', 'status_retweeted_status_in_reply_to_screen_name', 'status_retweeted_status_in_reply_to_status_id', 'status_retweeted_status_in_reply_to_status_id_str', 'status_retweeted_status_in_reply_to_user_id', 'status_retweeted_status_in_reply_to_user_id_str', 'status_retweeted_status_is_quote_status', 'status_retweeted_status_lang', 'status_retweeted_status_place', 'status_retweeted_status_possibly_sensitive', 'status_retweeted_status_quoted_status_id', 'status_retweeted_status_quoted_status_id_str', 'status_retweeted_status_retweet_count', 'status_retweeted_status_retweeted', 'status_retweeted_status_source', 'status_retweeted_status_full_text', 'status_retweeted_status_truncated', 'status_source', 'status_full_text', 'status_truncated', 'statuses_count', 'suspended', 'time_zone', 'translator_type', 'url', 'utc_offset', 'verified'], 'lang': 'de', 'fail': False, 'fail_hidden': False, 'restart': True, 'retries': 10}).
Connection was burned already or there were multiple entries.Retrying in 1.

Catch rate limit error

Started with 100 seeds; this exception occurred:

Exception in thread 14645160:
Traceback (most recent call last):
  File "/Users/muench/.pyenv/versions/3.6.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 56, in run
    raise self.err
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 53, in run
    mp.Process.run(self)
  File "/Users/muench/.pyenv/versions/3.6.6/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 582, in work_through_seed_get_next_seed
    raise e
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 566, in work_through_seed_get_next_seed
    friend_list = collector.get_friend_list()
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 244, in get_friend_list
    for page in tweepy.Cursor(self.connection.api.friends_ids, user_id=twitter_id).pages():
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/tweepy/cursor.py", line 49, in __next__
    return self.next()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/tweepy/cursor.py", line 75, in next
    **self.kargs)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/tweepy/binder.py", line 250, in _call
    return method.execute()
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/tweepy/binder.py", line 232, in execute
    raise RateLimitError(error_msg, resp)
tweepy.error.RateLimitError: [{'message': 'Rate limit exceeded', 'code': 88}]
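
One possible way to catch this, sketched under assumptions: on a rate-limit error move on to the next token, and only wait once every token is exhausted. `RateLimitError` here is a stand-in for `tweepy.error.RateLimitError`, `call_with_token_rotation` and its parameters are hypothetical, and the 15-minute default wait assumes Twitter's usual rate-limit window.

```python
import time


class RateLimitError(Exception):
    """Stand-in for tweepy.error.RateLimitError."""


def call_with_token_rotation(fetch, tokens, wait_seconds=15 * 60,
                             sleep=time.sleep, max_rounds=2):
    """Try fetch(token) with each token; if all are rate-limited, wait and retry."""
    for round_number in range(max_rounds):
        for token in tokens:
            try:
                return fetch(token)
            except RateLimitError:
                continue  # this token is exhausted, try the next one
        if round_number < max_rounds - 1:
            sleep(wait_seconds)  # all tokens exhausted: wait for the window to reset
    raise RateLimitError("all tokens are still rate-limited")
```

For a single token, tweepy's `API` constructor also accepts `wait_on_rate_limit=True`, which sleeps until the limit resets instead of raising.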

Get more API tokens

All interested researchers at HBI (and maybe QUT DMRC) should provide us with personal API keys to parallelise the data collection.

AttributeError: 'numpy.int64' object has no attribute 'translate'

This seems to occur when a user is already in the database.

Exception in thread 2560446173:
Traceback (most recent call last):
  File "/Users/muench/.pyenv/versions/3.6.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 123, in run
    raise self.err
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 120, in run
    mp.Process.run(self)
  File "/Users/muench/.pyenv/versions/3.6.6/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 110, in func_wrapper
    return func(*args, **kwargs)
  File "/Users/muench/HBI/Projects/SparseTwitter/collector.py", line 886, in work_through_seed_get_next_seed
    result.to_sql('result', if_exists='append', index=False, con=self.dbh.engine)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pandas/core/generic.py", line 2130, in to_sql
    dtype=dtype)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pandas/io/sql.py", line 450, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pandas/io/sql.py", line 1127, in to_sql
    table.insert(chunksize)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pandas/io/sql.py", line 641, in insert
    self._execute_insert(conn, keys, chunk_iter)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pandas/io/sql.py", line 616, in _execute_insert
    conn.execute(self.insert_statement(), data)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1416, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 249, in reraise
    raise value
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1170, in _execute_context
    context)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 105, in do_executemany
    rowcount = cursor.executemany(statement, parameters)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 197, in executemany
    self._get_db().encoding)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 213, in _do_execute_many
    v = values % escape(next(args), conn)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 127, in _escape_args
    return dict((key, conn.literal(val)) for (key, val) in args.items())
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/cursors.py", line 127, in <genexpr>
    return dict((key, conn.literal(val)) for (key, val) in args.items())
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 469, in literal
    return self.escape(obj, self.encoders)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/connections.py", line 462, in escape
    return converters.escape_item(obj, self.charset, mapping=mapping)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/converters.py", line 27, in escape_item
    val = encoder(val, mapping)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/converters.py", line 118, in escape_unicode
    return u"'%s'" % _escape_unicode(value)
  File "/Users/muench/.local/share/virtualenvs/SparseTwitter-OibZFs7Y/lib/python3.6/site-packages/pymysql/converters.py", line 73, in _escape_unicode
    return value.translate(_escape_table)
AttributeError: 'numpy.int64' object has no attribute 'translate'
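
The traceback shows pymysql falling back to its string escaper for a `numpy.int64`, which suggests that the value reaches the driver as a NumPy scalar rather than a built-in `int`. A minimal sketch of a possible workaround that converts such values before they are written; `to_native` and `rows_to_native` are hypothetical helpers, and the sketch relies only on NumPy scalars providing an `.item()` method.

```python
def to_native(value):
    """Convert a NumPy scalar to the matching built-in Python type."""
    item = getattr(value, "item", None)
    return item() if callable(item) else value


def rows_to_native(rows):
    """Apply to_native to every value in a list of row dicts."""
    return [{key: to_native(val) for key, val in row.items()} for row in rows]
```

Whether the conversion is best done on the DataFrame (for example by casting the column) or on the row dicts depends on how `result` is built, which the traceback does not show.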

Test fails if tokens are depleted

======================================================================
FAIL: test_invalid_token (__main__.CollectorTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/SparseTwitter/collector.py", line 280, in wrapper
    if collector.token_blacklist[old_token] <= time.time():
KeyError: 'invalid'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/SparseTwitter/collector.py", line 342, in check_API_calls_and_update_if_necessary
    next_reset_at = token_dict[token]
KeyError: 'invalid'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tests/tests.py", line 828, in test_invalid_token
    collector.check_API_calls_and_update_if_necessary(endpoint='/friends/ids')
tweepy.error.TweepError: [{'code': 89, 'message': 'Invalid or expired token.'}]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tests/tests.py", line 830, in test_invalid_token
    self.fail()
AssertionError: None

----------------------------------------------------------------------
Ran 50 tests in 123.955s
