
xjsv / crawlers


This project forked from norconex/crawlers


Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem and storing it in various data repositories such as search engines.

Home Page: https://opensource.norconex.com/crawlers

License: Apache License 2.0

Shell 0.07% Java 99.06% HTML 0.77% Batchfile 0.09%


Norconex Crawlers

Norconex web and filesystem crawlers are full-featured crawlers (or spiders) that can manipulate and store collected data in a repository of your choice (e.g., a search engine). They are flexible, powerful, easy to extend, and portable. They can be run from the command line with file-based configuration on any OS, or embedded into Java applications using well-documented APIs.

Visit the website for binary downloads and documentation: https://opensource.norconex.com/crawlers/

Are you on the right branch?

This branch holds version 4 code, which is still in development.

For the latest stable release of Norconex Web Crawler, use the version 3 branch.

UPCOMING: Crawler V4 Stack

As of Feb 24, 2024, the default main branch holds code for the upcoming version 4 crawler stack. It is now a monorepo containing all Norconex crawler-related projects previously maintained in separate repos. All projects in this monorepo will now be released simultaneously and share the same version number.

Until v4 is officially released, this branch should not be considered stable.

Projects


Folder                            Artifact Id
crawler/core/                     nx-crawler-core
crawler/fs/                       nx-crawler-fs
crawler/web/                      nx-crawler-web
importer/                         nx-importer
committer/amazoncloudsearch/      nx-committer-amazoncloudsearch
committer/apachekafka/            nx-committer-apachekafka
committer/azurecognitivesearch/   nx-committer-azurecognitivesearch
committer/core/                   nx-committer-core
committer/idol/                   nx-committer-idol
committer/elasticsearch/          nx-committer-elasticsearch
committer/neo4j/                  nx-committer-neo4j
committer/solr/                   nx-committer-solr
committer/sql/                    nx-committer-sql

All projects in this repository share the same Maven group id:

com.norconex.crawler
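
As a rough sketch of how the shared group id combines with an artifact id from the projects list, a Maven dependency on the web crawler might look like the fragment below. The version value is a placeholder, not a real release number: v4 is still in development, so check the website for what has actually been published before relying on these coordinates.

```xml
<!-- Illustrative pom.xml fragment only. -->
<!-- groupId and artifactId are taken from this README; the version is a placeholder (assumption). -->
<dependency>
  <groupId>com.norconex.crawler</groupId>
  <artifactId>nx-crawler-web</artifactId>
  <version>REPLACE_WITH_RELEASED_VERSION</version>
</dependency>
```

Because all projects are released together under one version number, the same version value would presumably apply to any other artifact from this repository that you add.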
