Scrapers

A list of scrapers from around the web.

Find your way around with the Table of Contents. It lists every scraper, with easy navigation to each one's pros and cons and a link to its website.

Please contribute by adding links, adding pros/cons, titles, or anything else you think would be helpful! Please help maintain alphabetical order.

Table Of Contents

Description: Cloud-based scraper for JavaScript.

Applicable Language(s)

  • JavaScript

Description: A Python library for navigating and parsing results from the Web. It allows searching the HTML tree to find various tags.

Applicable Language(s)

  • Python
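
To make the idea of searching the HTML tree for tags concrete, here is a minimal sketch. It assumes this entry refers to Beautiful Soup (the `bs4` package), which the text above does not name; the sample HTML and the helper names `extract_links` and `page_title` are hypothetical.

```python
from bs4 import BeautifulSoup  # assumption: the entry describes Beautiful Soup

# Hypothetical sample document, used only for illustration.
HTML = """
<html><head><title>Example page</title></head>
<body>
  <h1>Scrapers</h1>
  <a href="/first">First</a>
  <a href="/second">Second</a>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


def page_title(html):
    """Return the text of the <title> tag, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("title")
    return title.get_text(strip=True) if title else None


if __name__ == "__main__":
    print(page_title(HTML))
    print(extract_links(HTML))
```

The built-in `"html.parser"` backend is used here so that no extra parser needs to be installed; other backends may handle malformed markup differently.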

Description: Fast, flexible & lean implementation of core jQuery designed

Applicable Language(s)

  • JavaScript

Description: Service for looking up company and people information.

Applicable Language(s)


Description: Open dataset of crawled websites.

Applicable Language(s)


Description: Automatic service that turns a website into structured data in the form of JSON or CSV.
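
As a rough illustration of what "structured data in the form of JSON or CSV" can look like, the sketch below serializes a list of already-extracted records with the Python standard library. It does not reflect how this service works internally; the record fields and the helper names `to_json` and `to_csv` are hypothetical.

```python
import csv
import io
import json


def to_json(records):
    """Serialize a list of dict records as pretty-printed JSON."""
    return json.dumps(records, indent=2, sort_keys=True)


def to_csv(records):
    """Serialize a list of dict records as CSV with a header row."""
    if not records:
        return ""
    # Use the union of all keys so records with missing fields still fit.
    fieldnames = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical records, as a scraper might produce them.
    rows = [
        {"name": "Widget", "price": "9.99"},
        {"name": "Gadget", "price": "19.50"},
    ]
    print(to_json(rows))
    print(to_csv(rows))
```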

Applicable Language(s)


Description: Website data extraction using a visual programming language.

Applicable Language(s)


Description: Automated tool for extracting structured information from pages, crawling websites, and turning a website into an API.

Applicable Language(s)


Description: Cloud-based web scraping platform.

Applicable Language(s)

  • SML
  • JavaScript

Pros

  • Scrapers can be built using a visual tool and a scraping meta-language
  • Can execute JS snippets inside scraper
  • Supports Selenium (optionally) and OCR
  • Automated data validation and export to any text based format
  • Can run scrapers manually and scheduled in the cloud or compile and run locally
  • Full automation using API and integrations with other APIs

Cons

  • Currently in beta
  • Doesn't support PDF parsing yet

Description: Tool to mine LinkedIn profiles based on keywords.

Applicable Language(s)


Description: Local software that can download a proxy list and let users choose which one to use.

Applicable Language(s)


Description: API to find e-mail addresses for a given domain name.

Applicable Language(s)


Description: Provide various website extraction and transformation tools such as Full-Text RSS and Term Extraction as services.

Applicable Language(s)


Description: Local software for web scraping using a recording and a visual programming language.

Applicable Language(s)


Description: API to retrieve more information on a person.

Applicable Language(s)


Description: Service that searches a website for e-mails.
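
The core of searching page text for e-mail addresses can be sketched with a regular expression. This is only an illustration, not how this service is implemented: the pattern is deliberately simple and will miss some valid addresses, and the names `EMAIL_RE` and `find_emails` are hypothetical.

```python
import re

# Simplified pattern: local part, "@", domain, and a TLD of 2+ letters.
# Assumption: good enough for illustration, not a full address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return unique e-mail addresses found in text, lowercased, in order."""
    seen = set()
    found = []
    for match in EMAIL_RE.findall(text):
        address = match.lower()
        if address not in seen:
            seen.add(address)
            found.append(address)
    return found


if __name__ == "__main__":
    sample = "Contact Jane.Doe@example.com or sales@example.org."
    print(find_emails(sample))
```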

Applicable Language(s)


Description: A Chrome extension that scrapes all the hrefs from a web page.
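
For readers who want the same result outside the browser, collecting every href from a page can be sketched with Python's standard `html.parser` module. This is not the extension's code; the class `HrefCollector` and the function `collect_hrefs` are hypothetical names.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_hrefs(html):
    """Return all non-empty href values from <a> tags, in document order."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    page = '<p><a href="https://example.com/a">A</a> <a href="/b">B</a></p>'
    print(collect_hrefs(page))
```

Relative links such as `/b` are returned as written; resolving them against the page URL (for example with `urllib.parse.urljoin`) is left out of this sketch.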

Applicable Language(s)


Description: Automated tool to extract structured information from websites.

Applicable Language(s)


Description: Kimono was acquired by Palantir. This was a cloud-based service for turning websites into structured APIs. Now they offer a desktop-based alternative for continuing to use their tools.

Applicable Language(s)


Description: lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.

Pros

Applicable Language(s)

  • Python

Description: Extract structured information from HTML, PDF, Excel, and Word by clicking on document elements.

Applicable Language(s)


Description: Based on ScraperWiki, run scrapers in Python, Ruby, R, Perl or Node.js.

Applicable Language(s)

  • Node.js
  • Perl
  • Python
  • R
  • Ruby

Description: Web Crawler/Spider for NodeJS + server-side jQuery

Applicable Language(s)

  • Node.js

Description: Web crawler that can be combined with the Hadoop ecosystem to run in a cluster.

Applicable Language(s)


Description: Application that can extract information from a website and turn it into structured data (CSV, Excel, etc.).

Applicable Language(s)


Description: Free web scraping tool that extracts web page data into several structured file formats.

Applicable Language(s)


Description: R package to scrape information from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like Beautiful Soup.

Applicable Language(s)

  • R

Description: A Node.js scraper for humans.

Applicable Language(s)

  • JavaScript (Node.js)

Description: Write a scraper in the browser and run on their cloud-based service. This is used by many news organisations.

Applicable Language(s)


Description: Scraper cloud hosting as a service. Allows developers to deploy their own scrapers on their platform and benefit from their existing infrastructure.

Applicable Language(s)


Description: Local tool for scraping websites.

Applicable Language(s)


Description: Service for looking up business e-mails.

Applicable Language(s)


Description: Web automation software using a visual programming language and recorder.

Applicable Language(s)


Description: Visual tool for GUI automation by recording.

Applicable Language(s)


Description: Venom is an open source focused crawler for the Deep Web.

Features

  • Multi-threaded
  • Structured crawling
  • Page Validation
  • Automatic Retries
  • Proxy support

Applicable Language(s)

  • Java

Description: Data as a Service platform for web scraping.

Pros

  • Scraping dynamic, JavaScript-heavy websites
  • Login and form fill on websites
  • Data normalization and validation
  • Data uploads

Cons

  • Currently in beta
  • Possible payment model in the future

Applicable Language(s)


Description: Extension that downloads websites and turns them into structured data. Data is selected by element or by specialised selectors (e.g., for tables).

Applicable Language(s)


Description: Turn a website into an API. The structure of the data is defined by clicking elements or regular expressions.

Applicable Language(s)


Description: NPM module for scraping structured data via jQuery-like selectors.

Applicable Language(s)

  • JavaScript (Node.js)
