ahmia-crawler's People

Contributors

chamalis, dependabot[bot], iriahi, juhanurmi, online-pol-ads, skrish13

ahmia-crawler's Issues

socks.SOCKS5Error: 0x06: TTL expired

After fixing #9, this error came up. It seems like a PySocks bug (#todo: report it).


Exception happened during processing of request from ('127.0.0.1', 48068)
Traceback (most recent call last):
  File "<path2venv>/ahmia-crawler/lib/python3.6/site-packages/socks.py", line 851, in connect
    negotiate(self, dest_addr, dest_port)
  File "<path2venv>/ahmia-crawler/lib/python3.6/site-packages/socks.py", line 497, in _negotiate_SOCKS5
    self, CONNECT, dest_addr)
  File "<path2venv>/ahmia-crawler/lib/python3.6/site-packages/socks.py", line 578, in _SOCKS5_request
    raise SOCKS5Error("{0:#04x}: {1}".format(status, error))
socks.SOCKS5Error: 0x06: TTL expired

Currently polipo remains the stable proxy solution for the crawler.
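
For reading the traceback above: the `0x06` status is one of the reply codes defined for SOCKS5 in RFC 1928, and the message is built with the same format string shown in the `raise` line. The sketch below is only an illustration of that mapping; the names `SOCKS5_REPLIES` and `describe_reply` are made up for this example and are not part of PySocks.

```python
# Reply codes from RFC 1928, section 6 (0x00 means success).
SOCKS5_REPLIES = {
    0x01: "General SOCKS server failure",
    0x02: "Connection not allowed by ruleset",
    0x03: "Network unreachable",
    0x04: "Host unreachable",
    0x05: "Connection refused",
    0x06: "TTL expired",
    0x07: "Command not supported",
    0x08: "Address type not supported",
}


def describe_reply(status):
    """Format a SOCKS5 reply code the way the traceback above prints it."""
    error = SOCKS5_REPLIES.get(status, "Unknown error")
    return "{0:#04x}: {1}".format(status, error)


if __name__ == "__main__":
    print(describe_reply(0x06))
```

The code alone does not say which hop timed out, so it narrows the failure to the proxy side of the connection rather than pinpointing a bug.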

scripts of type [inline], operation [update] and lang [groovy] are disabled

Anyone seen this before?

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapyelasticsearch/scrapyelasticsearch.py", line 116, in process_item
    self.index_item(item)
  File "/home/akent/Desktop/ahmia-crawler/ahmia/ahmia/pipelines.py", line 87, in index_item
    self.send_items()
  File "/usr/local/lib/python2.7/dist-packages/scrapyelasticsearch/scrapyelasticsearch.py", line 109, in send_items
    helpers.bulk(self.es, self.items_buffer)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 190, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 162, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 134, in _process_bulk_chunk
    raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
BulkIndexError: (u'316 document(s) failed to index.', [{u'update': {u'status': 400, u'_type': u'tor', u'_id': u'd3ba376d34cbea850dabc25b1d4a884da852c995', u'error': {u'caused_by': {u'reason': u'scripts of type [inline], operation [update] and lang [groovy] are disabled', u'type': u'script_exception'}, u'reason': u'failed to execute script', u'type': u'illegal_argument_exception'}, u'_index': u'crawl'}}, {u'update': {u'status': 400, u'_type': u'tor', u'_id': u'9bbd33851fb71cc9980b262797a00c1a06932094', u'error': {u'caused_by': {u'reason': u'scripts of type [inline], operation [update] and lang [groovy] are disabled', u'type': u'script_exception'}, u'reason': u'failed to execute script', u'type': u'illegal_argument_exception'}, u'_index': u'crawl'}}, {u'update': {u'status': 400, u'_type': u'tor', u'_id': u'd730f4a540adad5892aa58a5870baed904b6c4c1', u'error': {u'caused_by': {u'reason': u'sc
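
With 316 failures in one exception, it can help to collapse the error list into a count per reason before digging further. The sketch below assumes the list has the shape visible in the traceback above (one dict per failed action, with an `error` entry that may carry a nested `caused_by`). The function name `summarize_bulk_errors` and the sample data are hypothetical.

```python
from collections import Counter


def summarize_bulk_errors(errors):
    """Count failures per (action, reason) in a bulk error list."""
    counts = Counter()
    for entry in errors:
        for action, detail in entry.items():
            error = detail.get("error")
            if isinstance(error, dict):
                # Prefer the innermost cause when one is reported.
                cause = error.get("caused_by") or {}
                reason = cause.get("reason") or error.get("reason") or "unknown"
            else:
                # Some responses may carry the error as a plain string.
                reason = str(error) if error else "unknown"
            counts[(action, reason)] += 1
    return counts


# Sample entries shaped like the ones in the traceback (IDs are placeholders).
sample_errors = [
    {"update": {"status": 400, "_id": "doc-1", "error": {
        "type": "illegal_argument_exception",
        "reason": "failed to execute script",
        "caused_by": {"type": "script_exception",
                      "reason": "scripts of type [inline], operation [update] "
                                "and lang [groovy] are disabled"}}}},
    {"update": {"status": 400, "_id": "doc-2", "error": {
        "type": "illegal_argument_exception",
        "reason": "failed to execute script"}}},
]

if __name__ == "__main__":
    for (action, reason), count in summarize_bulk_errors(sample_errors).items():
        print(count, action, reason)
```

In the traceback the error list is the second argument of `BulkIndexError`, so that is the value one would pass in. If every entry shares the same reason, as the visible part of the output suggests, the problem is likely a server-side scripting setting rather than the individual documents.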

twisted 19.2.1 has requirement attrs>=17.4.0

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic

ERROR:

ERROR: twisted 19.2.1 has requirement attrs>=17.4.0, but you'll have attrs 17.2.0 which is incompatible.
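
The message boils down to a minimum-version check that the installed `attrs` release does not pass. A deliberately simplified sketch of that comparison follows. It assumes plain dotted numeric versions only (no pre-release or local tags, which real resolvers handle), and the helper names are invented for this example.

```python
def parse_version(text):
    """Turn a dotted release string such as '17.4.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True when `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Versions taken from the error message above.
    print(meets_minimum("17.2.0", "17.4.0"))
```

Comparing tuples of integers avoids the trap of comparing the raw strings, where "17.10.0" would sort before "17.4.0".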

TorSocks proxy throws: unexpected keyword argument 'fileno'

$ python http_tor_proxy.py 
Serving at port 14444
$ curl -x http://localhost:14444 http://msydqstlz2kzerdg.onion
Traceback (most recent call last):
  File "http_tor_proxy.py", line 67, in <module>
    httpd.serve_forever()
  File "/usr/lib/python3.6/socketserver.py", line 238, in serve_forever
    self._handle_request_noblock()
  File "/usr/lib/python3.6/socketserver.py", line 312, in _handle_request_noblock
    request, client_address = self.get_request()
  File "/usr/lib/python3.6/socketserver.py", line 500, in get_request
    return self.socket.accept()
  File "/usr/lib/python3.6/socket.py", line 210, in accept
    sock = socket(self.family, type, self.proto, fileno=fd)
TypeError: __init__() got an unexpected keyword argument 'fileno'

Tor

Does this require Tor to be installed? It's failing the initial run daily when trying to retrieve an .onion site.

Would it be hard to build a Docker image so this all just works?
