TorScrapper

A basic scraper written in Python, with BeautifulSoup and Tor support, that is meant to:

  • Scrape onion and normal links.
  • Save the output in HTML format in the Output folder.
  • Filter the HTML output and keep only the useful data (work in progress).
  • Strip out IOCs and other related data (on the to-do list).
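
IOC extraction is still on the to-do list, so the following is only a rough sketch of what that step might look like, not the project's implementation. The pattern set and the `extract_iocs` name are hypothetical, and the regular expressions are deliberately loose (for example, the IPv4 pattern does not check octet ranges).

```python
import re

# Illustrative patterns only; real IOC extraction needs stricter validation.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # 56-character (v3) or 16-character (v2) base32 onion hostnames.
    "onion": re.compile(r"\b(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def extract_iocs(text):
    """Return a dict mapping each IOC type to a sorted list of unique matches."""
    return {
        name: sorted(set(pattern.findall(text)))
        for name, pattern in IOC_PATTERNS.items()
    }
```

Running `extract_iocs` over the text of a saved page would give one list per indicator type, which could then be written next to the HTML output.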

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

  • You will need Python 3 to run this project. In a terminal, run the following commands to install Python 3 and the project's dependencies, or get Python 3 from the Python website.
[sudo] apt-get install python3 python3-dev
[sudo] pip3 install -r requirements.txt

TL;DR: We recommend installing TorScrapper inside a virtual environment on all platforms.

Python packages can be installed either globally (a.k.a. system-wide) or in user space. We do not recommend installing TorScrapper system-wide.

Instead, we recommend that you install TorScrapper within a so-called "virtual environment" (virtualenv). Virtualenvs avoid conflicts with already-installed Python system packages (which could break some of your system tools and scripts), and still let you install packages normally with pip (without sudo and the like).

To get started with virtual environments, see virtualenv installation instructions. To install it globally (having it globally installed actually helps here), it should be a matter of running:

[sudo] pip install virtualenv
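
As a quick sanity check before installing the requirements, a small script can report whether the interpreter is running inside a virtual environment. This is a minimal sketch; the `in_virtualenv` helper is a hypothetical name and not part of TorScrapper.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv or virtualenv."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```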

Basic setup

Before you run TorScrapper, make sure the following things are done properly:

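  • Run the tor service:
sudo service tor start

  • Generate a hashed password for tor:
tor --hash-password "my_password"

  • Give the password inside /Modules/Scrape.py:
from stem import Signal
from stem.control import Controller

with Controller.from_port(port=9051) as controller:
    # authenticate() takes the plain-text password, not its hash
    controller.authenticate("my_password")
    controller.signal(Signal.NEWNYM)

  • Go to /etc/tor/torrc, uncomment ControlPort 9051, and set HashedControlPassword to the hash printed by the tor --hash-password command. Restart tor afterwards so the change takes effect.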

Read more about torrc here: Torrc
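
Once tor is running, HTTP traffic has to be sent through its SOCKS listener. The sketch below assumes the `requests` library with SOCKS support (`requests[socks]`) and Tor's default SOCKS port 9050; the source does not show how Scrape.py actually issues requests, so `tor_proxies` and `fetch_via_tor` are hypothetical helper names.

```python
def tor_proxies(host="127.0.0.1", port=9050):
    """Build a requests-style proxies mapping for Tor's SOCKS listener."""
    # socks5h (rather than socks5) lets the proxy resolve hostnames,
    # which .onion addresses require.
    proxy = f"socks5h://{host}:{port}"
    return {"http": proxy, "https": proxy}


def fetch_via_tor(url, timeout=60):
    """Fetch a URL through Tor and return the response body as text."""
    # Imported lazily so the helper above works without requests installed.
    import requests

    response = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    response.raise_for_status()
    return response.text
```

Calling `fetch_via_tor` with a link from onions.txt would return the page HTML, which can then be handed to BeautifulSoup.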

Deployment

A step-by-step guide to getting this project running:

  • Enter the project directory.
  • Copy all the onion and normal links you want to scrape into onions.txt:
[nano]/[vim]/[gedit]/[your choice of editor] onions.txt
  • Run TorScrapper.py using Python 3:
[sudo] python3 TorScrapper.py
  • Check the scraped outputs in the Output folder.
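
The flow above (read links from onions.txt, write one HTML file per link into Output) could be sketched roughly as follows. This assumes one link per line in onions.txt; the file naming scheme and the `read_links` and `output_path` helpers are hypothetical and may differ from what TorScrapper.py actually does.

```python
from pathlib import Path
from urllib.parse import urlparse


def read_links(text):
    """Return the non-empty lines from the contents of onions.txt."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def output_path(url, output_dir="Output"):
    """Map a link to an .html file inside the output folder."""
    # Links without a scheme get one so urlparse can find the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    raw = parsed.netloc + parsed.path
    # Keep only filesystem-safe characters from the host and path.
    safe = "".join(c if c.isalnum() or c in "-." else "_" for c in raw)
    return Path(output_dir) / (safe.strip("_") + ".html")
```

For example, `read_links(Path("onions.txt").read_text())` yields the list of links, and `output_path(link)` gives the file each scraped page would be saved to.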

Built With

  • Python - Python programming language.
  • Tor - If you don't know about Tor then you probably shouldn't be here :)
  • BeautifulSoup - Beautiful Soup is a Python library for pulling data out of HTML and XML files.

Contributing

If you have new ideas that are worth implementing, mention them by opening a new issue with the title [FEATURE_REQUEST]. If the idea is worth implementing, congrats, you are now a contributor.

Versioning

Version 1.something Mehh...

Authors

  • Shivam Kapoor - An avid learner who likes to know every tiny detail of how real-life systems work. A real enthusiast of cyber security and the underlying networking concepts. (Email - [email protected])

License

Too lazy to decide on a License. zZzZ
