
freshonions-torscraper's Introduction

Fresh Onions TOR Hidden Service Crawler

This is a copy of the source for the http://zlal32teyptf4tvi.onion hidden service, which implements a tor hidden service crawler / spider and web site.

Features

  • Crawls the darknet looking for new hidden services
  • Finds hidden services from a number of clearnet sources
  • Optional fulltext elasticsearch support
  • Marks clone sites of the /r/darknet superlist
  • Finds SSH fingerprints across hidden services
  • Finds email addresses across hidden services
  • Finds bitcoin addresses across hidden services
  • Shows incoming / outgoing links to onion domains
  • Up-to-date alive / dead hidden service status
  • Portscanner
  • Search for "interesting" URL paths, useful 404 detection
  • Automatic language detection
  • Fuzzy clone detection (requires elasticsearch, more advanced than superlist clone detection)

Licence

This software is made available under the GNU Affero GPL 3 License. What this means is that if you deploy this software as part of networked software that is available to the public, you must make the source code (and any modifications) available.

From the GNU site:

The GNU Affero General Public License is a modified version of the ordinary GNU GPL version 3. It has one added requirement: if you run a modified program on a server and let other users communicate with it there, your server must also allow them to download the source code corresponding to the modified version running there.

Dependencies

  • python
  • tor

pip install:

pip install -r requirements.txt

Install

Create mysql db from schema.sql

Edit etc/database for your database setup

Edit etc/proxy for your TOR setup

script/push.sh someoniondirectory.onion 
script/push.sh anotheroniondirectory.onion
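The two push.sh calls above seed the crawler with known onion directories. As a sketch, a small wrapper (hypothetical, not part of the repo) can sanity-check that each seed looks like a v2 onion address before pushing:

```shell
#!/bin/sh
# Hypothetical helper, not part of the repo: validate that a seed looks
# like a v2 onion address (16 base32 characters) before pushing it.

is_onion() {
    printf '%s\n' "$1" | grep -Eq '^[a-z2-7]{16}\.onion$'
}

seed() {
    if is_onion "$1"; then
        echo "pushing $1"        # in a real run: script/push.sh "$1"
    else
        echo "skipping $1: not a v2 onion address" >&2
    fi
}

seed zlal32teyptf4tvi.onion      # valid v2 address, accepted
seed not-an-onion.example.com    # rejected
```

v3 addresses are 56 characters long; the regex would need extending if you seed those.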

Edit etc/uwsgi_only and set BASEDIR to wherever torscraper is installed (i.e. /home/user/torscraper)

Run:

init/scraper_service.sh # to start crawling
init/isup_service.sh # to keep site status up to date

Optional ElasticSearch Fulltext Search

The torscraper comes with optional elasticsearch capability (enabled by default). Edit etc/elasticsearch and set vars or set ELASTICSEARCH_ENABLED=false to disable. Run scripts/elasticsearch_migrate.sh to perform the initial setup after configuration.

If elasticsearch is disabled there will be no fulltext search; however, crawling and discovering new sites will still work.
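Only ELASTICSEARCH_ENABLED is named above; the other variable names in this sketch of etc/elasticsearch are assumptions, shown purely for illustration:

```shell
# etc/elasticsearch -- illustrative sketch. Only ELASTICSEARCH_ENABLED is
# confirmed by the README; the host/port variable names are assumptions.
export ELASTICSEARCH_ENABLED=true
export ELASTICSEARCH_HOST=127.0.0.1   # assumed variable name
export ELASTICSEARCH_PORT=9200        # assumed variable name
```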

cronjobs

# harvest onions from various sources
1 18 * * * /home/scraper/torscraper/scripts/harvest.sh

# get ssh fingerprints for new sites
1 4,16 * * * /home/scraper/torscraper/scripts/update_fingerprints.sh

# mark sites as genuine / fake from the /r/darknetmarkets superlist    
1 9 * * 1 /home/scraper/torscraper/scripts/get_valid.sh

# scrape pastebin for onions (needs paid account / IP whitelisting)                 
*/5 * * * * /home/scraper/torscraper/scripts/pastebin.sh

# portscan new onions               
1 */6 * * * /home/scraper/torscraper/scripts/portscan_up.sh

# scrape stronghold paste
32 */2 * * * /home/scraper/torscraper/scripts/stronghold_paste_rip.sh

# detect clones
16 3 * * * /home/scraper/torscraper/scripts/detect_clones.sh

Infrastructure

Fresh Onions runs on two servers, a frontend host running the database and hidden service web site, and a backend host running the crawler. Probably most interesting to the reader is the setup for the backend. TOR as a client is COMPLETELY SINGLETHREADED. I know! It's 2017, and along with a complete lack of flying cars, TOR runs in a single thread. What this means is that if you try to run a crawler on a single TOR instance you will quickly find you are maxing out your CPU at 100%.

The solution to this problem is running multiple TOR instances and connecting to them through some kind of frontend that will round-robin your requests. The Fresh Onions crawler runs eight Tor instances.

Debian (and Ubuntu) comes with a useful program, "tor-instance-create", for quickly creating multiple instances of TOR. I used Squid as my frontend proxy, but unfortunately it can't connect to SOCKS directly, so I used "privoxy" as an intermediate proxy. You will need one privoxy instance for every TOR instance. There is a script in "scripts/create_privoxy.sh" to help with creating privoxy instances on Debian systems. It also helps to replace /etc/privoxy/default.filter with an empty file, to reduce CPU load by removing unnecessary regexes.
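As an illustration of what scripts/create_privoxy.sh automates, the loop below writes one minimal privoxy config per Tor SOCKS instance. The port numbering is an example only; the repo's script is authoritative:

```shell
#!/bin/sh
# Sketch only: generate one minimal privoxy config per Tor instance.
# Ports are illustrative. On Debian, each Tor instance itself would be
# created with: tor-instance-create crawlerN
OUTDIR="${OUTDIR:-./privoxy-confs}"
mkdir -p "$OUTDIR"

n=0
while [ "$n" -le 7 ]; do
    cat > "$OUTDIR/config-$n" <<EOF
listen-address 127.0.0.1:811$n
forward-socks5 / 127.0.0.1:905$n .
toggle 0
EOF
    n=$((n + 1))
done
```

Combined with an empty default.filter, each privoxy instance stays cheap enough to run eight side by side.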

Additionally, this resource https://www.howtoforge.com/ultimate-security-proxy-with-tor might be useful in setting up Squid. If all you are doing is crawling and you don't care about anonymity, I also recommend running TOR in tor2web mode (requires recompilation) for increased speed.
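For the Squid side, a fragment along these lines (illustrative and untested here; the howtoforge article linked above walks through a complete configuration) round-robins requests across the privoxy parents:

```
# illustrative squid.conf fragment -- ports must match your privoxy instances
cache_peer 127.0.0.1 parent 8110 0 round-robin no-query name=tor0
cache_peer 127.0.0.1 parent 8111 0 round-robin no-query name=tor1
# ... one cache_peer line per privoxy instance ...
never_direct allow all        # force every request through the peers
```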

freshonions-torscraper's People

Contributors

dirtyfilthy


freshonions-torscraper's Issues

No links shown for a specific domain

Hi guys, I am working with this for a personal project, but I ran into a problem and couldn't find the solution.
When I launch harvest.sh, push_list.sh or scrape.sh everything works well, but when I find some live hidden pages, I can't find the links to other domains or to this domain. Do I need to enable something? I would like some help understanding how "links to this domain / links from this domain" works.

In this example, I assume that I should see some links for this page, but I got nothing.

Information for jd6yhuwmmfyykhl4.onion
Dream Market Login - Featured anonymous marketplace [SITE] [JSON]
Open Ports:
80:http
Interesting Paths:
No interesting paths in database.
Emails:
No emails in database.
Bitcoin Addresses:
No bitcoin addresses in database.
SSH fingerprint
No SSH fingerprint in database.
Clones
This site appears to have no clones.

Links to this domain:
No links to this domain in database.
Links from this domain:
No links from this domain in database.

a problem in running some scripts

Hi there. I can run some of this project's scripts successfully, but there is a problem with others.
By default I disabled elasticsearch, and I pushed a .onion domain with push.sh; I can see the results in the domain and page tables. But when I run scraper_service.sh it gets stuck in a loop like below:

(terminal screenshots attached)

As you can see from this part of my terminal output while running scraper_service.sh, scrapy opens and closes again and again and cannot get out of this loop by itself.
The second problem is with elasticsearch: when I enable it, none of the scripts run and I get this error:

(screenshots attached)

Would you please guide me? If you need any files, I will share them. Thanks.

Issue with connecting to DB

File "/usr/lib/python2.7/site-packages/pony/orm/dbapiprovider.py", line 53, in wrap_dbapi_exceptions
except dbapi_module.OperationalError as e: raise OperationalError(e)
pony.orm.dbapiprovider.OperationalError: (1045, u"Access denied for user 'xxxx' at 'ip_address' (using password: YES)")

Please advise, as I cannot work out where this is coming from.

Installation Guide

Can someone explain to me how to use this crawler? I am very interested in it for my research project.

Where are the results saved?

Hello guys,
I ran the torscraper and the harvest.sh spider.
After the spider ran and finished, MySQL is empty too.

(screenshot attached)

It works, but I don't know where the results are saved.

(screenshot attached)

Error in before_insert()

Harvest is working well, but when trying to use push.sh I get the message below:

2018-05-04 14:37:58 [scrapy.core.scraper] ERROR: Spider error processing <GET http://oxwugzccvk3dk6tj.onion/index.html> (referer: None)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 102, in iter_errback
yield next(it)
File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
for x in result:
File "/root/freshonions/torscraper/middlewares.py", line 192, in <genexpr>
return (_set_range(r) for r in result or ())
File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
return (_set_referer(r) for r in result or ())
File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
return (r for r in result or () if _filter(r))
File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
return (r for r in result or () if _filter(r))
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 533, in new_gen_func
output = wrapped_interact(iterator)
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 520, in wrapped_interact
rollback_and_reraise(sys.exc_info())
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 320, in rollback_and_reraise
reraise(*exc_info)
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 510, in wrapped_interact
output = interact(iterator, input, exc_info)
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 484, in interact
return next(iterator) if input is None else iterator.send(input)
File "/root/freshonions/torscraper/spiders/tor_scrapy.py", line 323, in parse
page = self.update_page_info(response.url, title, response.status, is_frontpage, size)
File "<auto generated wrapper of update_page_info() function>", line 2, in update_page_info
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 460, in new_func
try: return func(*args, **kwargs)
File "/root/freshonions/torscraper/spiders/tor_scrapy.py", line 230, in update_page_info
page = Page.get(url=url)
File "<auto generated wrapper of get() function>", line 2, in get
File "/usr/local/lib/python2.7/dist-packages/pony/utils/utils.py", line 58, in cut_traceback
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 3702, in get
try: return entity.find_one(kwargs) # can throw MultipleObjectsFoundError
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 3801, in find_one
if obj is None: obj = entity.find_in_db(avdict, unique, for_update, nowait)
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 3860, in find_in_db
cursor = database._exec_sql(sql, arguments)
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 813, in _exec_sql
connection = cache.prepare_connection_for_query_execution()
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 1651, in prepare_connection_for_query_execution
if not cache.noflush_counter and cache.modified: cache.flush()
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 1728, in flush
if obj is not None: obj.before_save()
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 5049, in before_save
if status == 'created': obj.before_insert()
File "/root/freshonions/lib/tor_db/models/domain.py", line 155, in before_insert
dom.save()
File "/usr/local/lib/python2.7/dist-packages/elasticsearch_dsl/document.py", line 437, in save
return meta['created']

Problem installing txsocksx

Upon installing requirements, I get the following error on txsocksx

Collecting txsocksx (from -r requirements.txt (line 10))
  Using cached https://files.pythonhosted.org/packages/ed/36/5bc796eb2612b500baa26a68481d699e08af5382152a9de18e5a45b44ea7/txsocksx-1.15.0.2.tar.gz
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: zip_safe flag not set; analyzing archive contents...

    Installed /tmp/pip-install-wsr9xyns/txsocksx/.eggs/vcversioner-2.16.0.0-py3.7.egg
    error in txsocksx setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; 'int' object is not iterable
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-wsr9xyns/txsocksx/

I think this is probably the same issue, but it means this repo does not support Python 3.x as a consequence. Has anyone managed to fix this and get it running on Python 3?

What exactly does this dependency do? Can it be replaced? I'm amazed at the number of dependencies in this project.

Searching for a "long string" like "this site is made as a joke" returns error 500

Searching for a long string - selecting the "match phrase" returns an error - it looks like elastic search returns the error: "elasticsearch_dsl.response.hit.HitMeta object' has no attribute 'highlight'"

Any tips on how to fix this problem?

127.0.0.1 - - [27/Sep/2018 22:25:57] "GET /?search=This+site+is+made+as+a+joke&submit=Go+%3E%3E%3E HTTP/1.1" 200 -

[2018-09-27 22:26:01,051] ERROR in app: Exception on / [GET]
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1982, in wsgi_app
response = self.full_dispatch_request()
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1614, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1517, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1612, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1598, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/app/torsearch/server/lib/tor_cache.py", line 60, in my_decorator
response = f(*args, **kwargs)
File "<auto generated wrapper of index() function>", line 2, in index
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 460, in new_func
try: return func(*args, **kwargs)
File "/app/torsearch/server/web/app.py", line 154, in index
r, n_results = helpers.render_elasticsearch(context)
File "<auto generated wrapper of render_elasticsearch() function>", line 2, in render_elasticsearch
File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 460, in new_func
try: return func(*args, **kwargs)
File "/app/torsearch/server/lib/helpers.py", line 47, in render_elasticsearch
return (render_template('index_fulltext.html', domains=domains, results=results, context=context, orig_count=orig_count, n_results=n_results, page=page, per_page=result_limit, sort=sort, is_more = is_more), orig_count)
File "/usr/lib/python2.7/dist-packages/flask/templating.py", line 134, in render_template
context, ctx.app)
File "/usr/lib/python2.7/dist-packages/flask/templating.py", line 116, in _render
rv = template.render(context)
File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 989, in render
return self.environment.handle_exception(exc_info, True)
File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 754, in handle_exception
reraise(exc_type, exc_value, tb)
File "/app/torsearch/server/web/templates/index_fulltext.html", line 4, in top-level template code
{% from 'search_panel.macro.html' import search_panel %}
File "/app/torsearch/server/web/templates/layout.html", line 5, in top-level template code
{% block body %}{% endblock %}
File "/app/torsearch/server/web/templates/index_fulltext.html", line 16, in block "body"
{{ domain_fulltext_table(domains, results, sortable=True, context=context) }}
File "/app/torsearch/server/web/templates/domain_table.macro.html", line 161, in template
{{break_long_words(hit.meta.highlight.body_stripped[0])|safe}}
File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 408, in getattr
return getattr(obj, attribute)
UndefinedError: 'elasticsearch_dsl.response.hit.HitMeta object' has no attribute 'highlight'

Installation Error

I have the following problem: when installing torscraper I get the following error message every time.

Collecting txsocksx
  Downloading txsocksx-1.15.0.2.tar.gz (19 kB)
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-dp1fhm0i/txsocksx/setup.py'"'"'; __file__='"'"'/tmp/pip-install-dp1fhm0i/txsocksx/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-dp1fhm0i/txsocksx/pip-egg-info
         cwd: /tmp/pip-install-dp1fhm0i/txsocksx/
    Complete output (1 lines):
    error in txsocksx setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; 'int' object is not iterable
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

How can I fix this error message?

Thank you very much if someone can help.

no directory '/etc/private'

I can't run ./scraper-service.sh,
ERROR-code:
./scraper-service.sh: 33: .: Can't open /home/ubuntu/freshonions/etc/private/flask.secret

And when I look in the etc directory, there is no private directory.
Could anyone help me?

503 - Forwarding failure (Privoxy@localhost)

2018-09-13 10:19:43 [tor] DEBUG: Finding links...
2018-09-13 10:19:43 [scrapy.core.engine] DEBUG: Crawled (503) <GET http://mbupzaotochxi6od.onion/> (referer: None)
2018-09-13 10:19:43 [tor] INFO: Page count is 1 for 5gkj3x7i2ucyli7g.onion
2018-09-13 10:19:43 [scrapy.core.engine] DEBUG: Crawled (503) <GET http://fa5qwuxkk2vokvem.onion/> (referer: None)
2018-09-13 10:19:43 [tor] INFO: Page count is 1 for 3y67mtxyjyw4uiul.onion
2018-09-13 10:19:43 [scrapy.core.engine] DEBUG: Crawled (503) <GET http://mweyederzght2ujt.onion/> (referer: None)
2018-09-13 10:19:43 [tor] INFO: Page count is 1 for xrivkrh73mcmiuwx.onion
2018-09-13 10:19:43 [scrapy.core.engine] DEBUG: Crawled (503) <GET http://asylumdlmb6gne6g.onion/> (referer: None)
2018-09-13 10:19:43 [scrapy.core.engine] DEBUG: Crawled (503) <GET http://5gkj3x7i2ucyli7g.onion/> (referer: None)
2018-09-13 10:19:43 [scrapy.core.engine] DEBUG: Crawled (503) <GET http://3y67mtxyjyw4uiul.onion/> (referer: None)
2018-09-13 10:19:43 [scrapy.core.engine] DEBUG: Crawled (503) <GET http://xrivkrh73mcmiuwx.onion/> (referer: None)
2018-09-13 10:19:43 [tor] DEBUG: Got http://songstems5d5b6y3.onion/ (503 - Forwarding failure (Privoxy@localhost))
2018-09-13 10:19:43 [tor] DEBUG: Finding links...
2018-09-13 10:19:43 [tor] DEBUG: Got http://silkroadyh2dmuad.onion/ (503 - Forwarding failure (Privoxy@localhost))
2018-09-13 10:19:43 [tor] DEBUG: Finding links...
2018-09-13 10:19:43 [tor] DEBUG: Got http://qtjp7izj3ir33hpi.onion/ (503 - Forwarding failure (Privoxy@localhost))

Can anyone help with this? Any help would be appreciated. Thanks.

Support for v3 onions?

Great work on this project. Just wondering if you plan to add support for the new v3 onions, which have a much longer address.

Detect and bypass Captcha

Hi guys,

I thought it could be interesting to bypass pages that are protected by a captcha. I started developing a proof of concept, but it was really basic.

I was able to solve captchas like this:

(image attached)

It would be interesting to solve harder captchas like these:

(images attached)

Through this service, we found one case in Korea.

Through this service, we found one case in Korea and arrested the actual criminal.

We are a team working on Korea's dark web. Through this service I unexpectedly found a site that exposed a university database, and I reported it to the government. I hope this case becomes widely known and is helpful for freshonions.

It became a well-known example: it was broadcast in Korea and appeared in the newspapers.

Hey! Collab?

Hey,
Nice project! I have been working on something similar. I was checking out how to make Scrapy work with TOR and found your project! I was wondering if we could collaborate: https://github.com/danieleperera/OnionIngestor
It's very extensible and is built from modules: source modules collect onion links from the clear web, operators scan and crawl the onion links, and database engines index/store the collected information. Currently I'm using onionscan as the main operator, and elasticsearch and telegram as the main database engines!

automatic login and register

Hi all,
as we know, there are many onion domains that have a login/register page you must get past to crawl their pages. Such domains require a username, password and captcha. I have run freshonions-torscraper, and it seems it doesn't crawl such domains' contents; we see just the index page. I would like to know how I can crawl such domains by logging in to them. I searched a bit and saw something like Scrapy's FormRequest object, but I don't know what I can do with it in this project. Do you have any ideas about my issue? Please guide me.

(screenshot attached)

ConnectionError('N/A', str(e), e)

Hi, could you tell me what this error is?

ERROR in app: Exception on / [GET]
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1982, in wsgi_app
response = self.full_dispatch_request()
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1614, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1517, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1612, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1598, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/user/tor/freshonions-torscraper/lib/tor_cache.py", line 60, in my_decorator
response = f(*args, **kwargs)
File "<auto generated wrapper of index() function>", line 2, in index
File "/home/user/.local/lib/python2.7/site-packages/pony/orm/core.py", line 528, in new_func
result = func(*args, **kwargs)
File "/home/user/tor/freshonions-torscraper/web/app.py", line 156, in index
r, n_results = helpers.render_elasticsearch(context)
File "<auto generated wrapper of render_elasticsearch() function>", line 2, in render_elasticsearch
File "/home/user/.local/lib/python2.7/site-packages/pony/orm/core.py", line 515, in new_func
return func(*args, **kwargs)
File "/home/user/tor/freshonions-torscraper/lib/helpers.py", line 28, in render_elasticsearch
results = elasticsearch_pages(context, sort, page)
File "/home/user/tor/freshonions-torscraper/lib/tor_elasticsearch.py", line 89, in elasticsearch_pages
return query.execute()
File "/usr/local/lib/python2.7/dist-packages/elasticsearch_dsl/search.py", line 639, in execute
**self._params
File "/home/user/.local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
return func(*args, params=params, **kwargs)
File "/home/user/.local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 632, in search
doc_type, '_search'), params=params, body=body)
File "/home/user/.local/lib/python2.7/site-packages/elasticsearch/transport.py", line 312, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/home/user/.local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 123, in perform_request
raise ConnectionError('N/A', str(e), e)
ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f71b5fcf350>: Failed to establish a new connection: [Errno -2] Name or service not known) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f71b5fcf350>: Failed to establish a new connection: [Errno -2] Name or service not known)
127.0.0.1 - - [03/Feb/2020 12:07:36] "GET /?rep=n%2Fa&search=port&submit=Go+%3E%3E%3E HTTP/1.1" 500 -
127.0.0.1 - - [03/Feb/2020 12:09:24] "GET / HTTP/1.1" 200 -
[2020-02-03 12:09:27,196] ERROR in app: Exception on / [GET]
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1982, in wsgi_app
response = self.full_dispatch_request()
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1614, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1517, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1612, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1598, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/user/tor/freshonions-torscraper/lib/tor_cache.py", line 60, in my_decorator
response = f(*args, **kwargs)
File "<auto generated wrapper of index() function>", line 2, in index
File "/home/user/.local/lib/python2.7/site-packages/pony/orm/core.py", line 528, in new_func
result = func(*args, **kwargs)
File "/home/user/tor/freshonions-torscraper/web/app.py", line 156, in index
r, n_results = helpers.render_elasticsearch(context)
File "<auto generated wrapper of render_elasticsearch() function>", line 2, in render_elasticsearch
File "/home/user/.local/lib/python2.7/site-packages/pony/orm/core.py", line 515, in new_func
return func(*args, **kwargs)
File "/home/user/tor/freshonions-torscraper/lib/helpers.py", line 28, in render_elasticsearch
results = elasticsearch_pages(context, sort, page)
File "/home/user/tor/freshonions-torscraper/lib/tor_elasticsearch.py", line 89, in elasticsearch_pages
return query.execute()
File "/usr/local/lib/python2.7/dist-packages/elasticsearch_dsl/search.py", line 639, in execute
**self._params
File "/home/user/.local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
return func(*args, params=params, **kwargs)
File "/home/user/.local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 632, in search
doc_type, '_search'), params=params, body=body)
File "/home/user/.local/lib/python2.7/site-packages/elasticsearch/transport.py", line 312, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/home/user/.local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 123, in perform_request
raise ConnectionError('N/A', str(e), e)
ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f71b61bb910>: Failed to establish a new connection: [Errno -2] Name or service not known) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f71b61bb910>: Failed to establish a new connection: [Errno -2] Name or service not known)

Result problem

Hello,
I am a postgraduate student trying to run gosecure/freshonions-torscraper, but I found that the repository was archived, so I am asking my question here.
I used the Docker setup, but there is something wrong with my results, as shown in the picture. I have set the system's http, https and socks5 proxies. Can you give me some help? Thanks.

(screenshot attached)

TransportError 400 when trying to insert into Elasticsearch 5.6.10

    for x in result:
  File "/root/freshonions-torscraper/torscraper/middlewares.py", line 192, in <genexpr>
    return (_set_range(r) for r in result or ())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 533, in new_gen_func
    output = wrapped_interact(iterator)
  File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 520, in wrapped_interact
    rollback_and_reraise(sys.exc_info())
  File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 320, in rollback_and_reraise
    reraise(*exc_info)
  File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 510, in wrapped_interact
    output = interact(iterator, input, exc_info)
  File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 484, in interact
    return next(iterator) if input is None else iterator.send(input)
  File "/root/freshonions-torscraper/torscraper/spiders/tor_scrapy.py", line 379, in parse
    pg.save()
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch_dsl/document.py", line 429, in save
    **doc_meta
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 300, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 312, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 129, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
RequestError: TransportError(400, u'illegal_argument_exception', u"can't specify parent if no parent field has been configured")
