spider's Introduction

Web Crawler

Web crawler that looks for local and external links starting from a specified URL.

How does it work ⚙️

The crawler takes a URL specified by the user and looks for the links in that page through their "href" attributes. Once it has all the links, it classifies each one as local or external relative to the main URL given. Then the crawler takes all those URLs and repeats the process, collecting all the URLs related to the main one.
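The classification step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code in spider.py: the names HrefCollector and classify_links are hypothetical, and treating a link as local when its host matches the host of the main URL is an assumption about how "local" is defined.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HrefCollector(HTMLParser):
    """Collects the value of every href attribute found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(base_url, html):
    """Return (local, external) sets of absolute URLs found in html.

    Assumption: a link is local when its host equals the host of base_url.
    """
    collector = HrefCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    local, external = set(), set()
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar schemes
        if parsed.netloc == base_host:
            local.add(absolute)
        else:
            external.add(absolute)
    return local, external
```

Calling classify_links("http://www.example.com", page_html) on a downloaded page returns the two sets; feeding each local URL back into the same function gives the repeated crawl described above.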

The URLs are saved in two folders:

URLS_locales: where all the local links are saved, in a "url_.txt" file
URLS_externas: where all the external links are saved, in a "url_.txt" file
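The folder layout above could be produced with something like the sketch below. The function save_links and its base_dir parameter are hypothetical, the file name "url_.txt" is used literally as written above, and writing one URL per line is an assumption; the real crawler may name and format its files differently.

```python
from pathlib import Path


def save_links(base_dir, local, external):
    """Write local and external URLs into the two output folders.

    Assumptions: one URL per line, and the literal file name "url_.txt".
    Returns the list of files written.
    """
    targets = {"URLS_locales": local, "URLS_externas": external}
    written = []
    for folder, urls in targets.items():
        directory = Path(base_dir) / folder
        directory.mkdir(parents=True, exist_ok=True)  # create folder if missing
        out_file = directory / "url_.txt"
        out_file.write_text(
            "".join(f"{url}\n" for url in sorted(urls)), encoding="utf-8"
        )
        written.append(out_file)
    return written
```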

Installing 🔧

First clone the repo:

git clone https://github.com/Carliquiss/spider

Then run the following command to install needed libs:

pip3 install -r requirements.txt

Usage ⌨️

The URL is given with the "-u" param: -u url (in the format http://www.example.com)

You can use the "-c" param to clear all folders and files created by the crawler in previous runs:

python3 spider.py -u <url> -c

This also saves the external URLs found in the local links under the "URLS_externas" folder.

If you want to read the URLs from a file, use "-i input_file":

python3 spider.py -i <input_file>

Verbose mode can be added to any of these options with "-v":

python3 spider.py -i <input_file> -v

python3 spider.py -u <url> -v

python3 spider.py -i <input_file> -cv
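For reference, the options listed in this section could be parsed with argparse roughly as sketched below. This is an assumption about the command-line handling, not the actual code in spider.py; build_parser and the attribute names (url, input_file, clear, verbose) are hypothetical. It does show why a combined form such as "-cv" works: argparse accepts grouped single-letter boolean flags.

```python
import argparse


def build_parser():
    """Build a parser for the -u, -i, -c and -v options described above."""
    parser = argparse.ArgumentParser(
        prog="spider.py",
        description="Crawl a URL and collect local and external links.",
    )
    parser.add_argument("-u", dest="url",
                        help="start URL, e.g. http://www.example.com")
    parser.add_argument("-i", dest="input_file",
                        help="file to read the URLs from")
    parser.add_argument("-c", dest="clear", action="store_true",
                        help="clear folders and files from previous runs")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="enable verbose output")
    return parser
```

With this sketch, build_parser().parse_args(["-i", "urls.txt", "-cv"]) sets both clear and verbose to True.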
