Crawler

About

Crawler is a library that simplifies writing web crawlers.
It provides a modern application programming interface
built on classes and event-based callbacks. It can be used to write
applications that populate databases for search engines such as Google Search or Microsoft Bing.
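
To illustrate what an interface built on classes and event-based callbacks can look like, here is a minimal sketch. It is not taken from the library: `Event`, `EventDispatcher`, `on`, `emit` and `collect_links` are hypothetical names chosen for this example, and the actual classes in Source/Crawler may be organized differently.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event types; the real library may define different ones.
enum class Event { PageFetched, LinkFound };

// Minimal dispatcher: callbacks are registered per event type and are
// invoked in registration order whenever that event is emitted.
class EventDispatcher {
public:
    using Callback = std::function<void(const std::string&)>;

    // Register a callback for one event type.
    void on(Event event, Callback callback) {
        callbacks_[event].push_back(std::move(callback));
    }

    // Call every callback registered for the event with the given payload.
    void emit(Event event, const std::string& payload) const {
        const auto it = callbacks_.find(event);
        if (it == callbacks_.end()) {
            return;  // nobody is listening for this event
        }
        for (const auto& callback : it->second) {
            callback(payload);
        }
    }

private:
    std::map<Event, std::vector<Callback>> callbacks_;
};

// Usage example: gather every URL reported through the LinkFound event.
std::vector<std::string> collect_links(const std::vector<std::string>& found) {
    EventDispatcher dispatcher;
    std::vector<std::string> links;
    dispatcher.on(Event::LinkFound,
                  [&links](const std::string& url) { links.push_back(url); });
    for (const auto& url : found) {
        dispatcher.emit(Event::LinkFound, url);
    }
    return links;
}
```

In a design like this, the crawling loop only emits events, and each application (an email collector, a link directory) attaches its own callbacks without changing the loop.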

Supported Protocols

  • HTTP 1.0
  • HTTP 1.1
  • Other protocols can be integrated easily

Future

  • Protocol Support
    • HTTP 2.0
    • HTTPS
    • FTP
  • Event System
    • More events
    • More flexible
    • More extensible
  • Worker
    • Better synchronization
    • Better performance

Tested Operating Systems

  • Debian (Stretch and Jessie)
  • Ubuntu (15.04 and 14.04.2 LTS)
  • Fedora (22, 21 and 20)
  • CentOS 7
  • Arch Linux
  • Mac OS X

Directories

  • CMake - CMake script
  • Documentation - documentation files
    • Documentation/html - HTML documentation
    • Documentation/man - man pages
    • Documentation/latex - LaTeX documentation
    • Documentation/rtf - RTF documentation
  • Documents - important documents
    • Documents/Templates - document templates provided by the instructor
    • Documents/Presentations - presentations
    • Documents/Charts - charts used to design the library
  • Source - source code files
    • Source/Crawler - library
    • Source/Example - example crawler without any functionality
    • Source/Email - email crawler which fetches all email addresses from a website
    • Source/GlobalLinkDirectory - example crawler which creates a directory of all used links from a website
    • Source/LocalLinkDirectory - example crawler which creates a directory of only local links from a website
  • Tests - simple unit tests
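
The difference between the GlobalLinkDirectory and LocalLinkDirectory examples comes down to deciding whether a link stays on the crawled website. The sketch below shows one simplified way to make that decision by comparing host names. It is an assumption for illustration, not the code in Source: `host_of` and `is_local_link` are hypothetical helpers, and they ignore cases a real URI parser such as uriparser would handle (user info, IPv6 literals, case-insensitive hosts, protocol-relative links, and schemes like `mailto:`).

```cpp
#include <cassert>
#include <string>

// Returns the host of an absolute "scheme://host[:port]/path" URL, or an
// empty string when the input does not start with a "scheme://" prefix.
std::string host_of(const std::string& url) {
    const std::string::size_type scheme_end = url.find("://");
    if (scheme_end == std::string::npos || scheme_end == 0) {
        return "";
    }
    // The first '/', '?' or '#' must be the slash inside "://"; otherwise
    // the "://" belongs to a path or query (e.g. "/go?to=http://x").
    if (url.find_first_of("/?#") != scheme_end + 1) {
        return "";
    }
    const std::string::size_type host_begin = scheme_end + 3;
    const std::string::size_type host_end =
        url.find_first_of(":/?#", host_begin);
    if (host_end == std::string::npos) {
        return url.substr(host_begin);
    }
    return url.substr(host_begin, host_end - host_begin);
}

// A link counts as local when it is relative (no host of its own) or when
// its host matches the host of the page it was found on.
bool is_local_link(const std::string& page_url, const std::string& link) {
    const std::string link_host = host_of(link);
    return link_host.empty() || link_host == host_of(page_url);
}
```

Under these assumptions, a global link directory would record every link it sees, while a local one would keep only the links for which `is_local_link` returns true.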

CMake

  • Console
    • cd $ROOT
    • mkdir Build
    • cd Build
    • cmake ../CMake
  • GUI
    • Where is the source code: "$ROOT/CMake"
    • Where to build the binaries: "$ROOT/Build"
    • Configure
    • Generate

Dependencies

  • SFML
  • pugixml
  • uriparser

Installing Dependencies

  • Mac OS X
    • brew install sfml
    • brew install uriparser
  • Debian and Ubuntu
    • apt-get install libsfml-dev
    • apt-get install liburiparser-dev
  • Fedora and CentOS
    • yum install SFML-devel
    • yum install uriparser-devel
  • Arch Linux
    • pacman -S sfml
    • pacman -S uriparser

Team

Justus Flerlage
