spatie-crawler-queue-with-laravel-model's Introduction

spatie-crawler-queue-with-laravel-model

Spatie's Crawler with Laravel Model as Queue

This is a Laravel 8.x application with a model class, a crawl-queue class, a migration, and a command class for using Spatie's Crawler package.

Why is this better than other spatie/crawler queue packages?

The main reason is that other queue packages store all items in a single in-memory array, which can exhaust RAM on large sites. Furthermore, you can persist crawled links and reuse them however you want.

To expire items, we use mvdnbrk/laravel-model-expires.

Processed items are marked as soft-deleted.
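
The queue-item model described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the class name is an assumption, SoftDeletes is Laravel's own trait, and the expiry behavior comes from a trait shipped by mvdnbrk/laravel-model-expires (see that package's README for its exact trait name and usage).

```php
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\SoftDeletes;

// Model backing the crawl queue (hypothetical name). Marking an item as
// processed is done via soft deletion, so the row -- and the crawled
// link it stores -- is preserved for later use.
class CrawlQueue extends Model
{
    use SoftDeletes;

    // Allow mass assignment of crawl data (url, found_on_url, etc.).
    protected $guarded = [];
}
```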

Steps

  1. Clone the repo
  2. Run composer install
  3. Run php artisan migrate (after configuring config/database.php)
  4. Adjust app/Console/Commands/CrawlerRun.php
  5. Run php artisan craw https://site_or_blog.com

Main files to take a look at:
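
As a rough sketch of how the command from step 5 might wire the model-backed queue into Spatie's Crawler (the queue class name and namespace here are assumptions; check app/Console/Commands/CrawlerRun.php for the real wiring):

```php
<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Spatie\Crawler\Crawler;

class CrawlerRun extends Command
{
    // Matches the "php artisan craw" invocation from the steps above.
    protected $signature = 'craw {url}';

    protected $description = 'Crawl a site using the model-backed queue';

    public function handle(): void
    {
        Crawler::create()
            // Hypothetical class name for the Eloquent-backed queue.
            ->setCrawlQueue(new \App\Queues\DatabaseCrawlQueue())
            ->startCrawling($this->argument('url'));
    }
}
```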

spatie-crawler-queue-with-laravel-model's Issues

Could not check compatibility between App\Observers\Crawler\ConsoleObserver::crawlFailed

PHP Fatal error: Could not check compatibility between App\Observers\Crawler\ConsoleObserver::crawlFailed(Psr\Http\Message\UriInterface $url, App\Observers\Crawler\RequestException $requestException, ?Psr\Http\Message\UriInterface $foundOnUrl = null): void and Spatie\Crawler\CrawlObservers\CrawlObserver::crawlFailed(Psr\Http\Message\UriInterface $url, GuzzleHttp\Exception\RequestException $requestException, ?Psr\Http\Message\UriInterface $foundOnUrl = null): void, because class App\Observers\Crawler\RequestException is not available in /var/www/html/app/Observers/Crawler/ConsoleObserver.php on line 57

Getting this issue on a fresh install.
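
The "not available" class in the message is App\Observers\Crawler\RequestException, which suggests the observer file is missing a use statement, so PHP resolves RequestException relative to the observer's own namespace instead of GuzzleHttp\Exception. A sketch of the likely fix, with the method signature copied from the parent class shown in the error message (other observer methods omitted):

```php
<?php

namespace App\Observers\Crawler;

use GuzzleHttp\Exception\RequestException; // the missing import
use Psr\Http\Message\UriInterface;
use Spatie\Crawler\CrawlObservers\CrawlObserver;

class ConsoleObserver extends CrawlObserver
{
    // ... other observer methods ...

    // Signature now matches Spatie\Crawler\CrawlObservers\CrawlObserver.
    public function crawlFailed(
        UriInterface $url,
        RequestException $requestException,
        ?UriInterface $foundOnUrl = null
    ): void {
        echo "Failed to crawl {$url}: {$requestException->getMessage()}\n";
    }
}
```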

Crawlers overlap when running 2 crawlers against 2 URLs at the same time.

When starting 2 crawlers (A and B) at the same time to crawl 2 URLs with setTotalCrawlLimit(100), the queues share the same total crawl limit (100), and sometimes crawler A crawls URLs belonging to crawler B and vice versa.

That behavior doesn't happen when using the ArrayCrawlQueue (the default one).

Moreover, the getProcessedUrlCount method always returns a negative number.
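
This is consistent with all concurrent crawls reading from the same database table, so pending and processed counts are computed over every crawler's rows at once. One hypothetical workaround, with every class and column name here an assumption rather than the repository's actual API, is to tag each run's rows with an identifier and scope every queue query to that tag:

```php
<?php

namespace App\Queues;

use App\Models\CrawlQueue as CrawlQueueModel;

// Hypothetical sketch: scope all queue queries to a single crawl run so
// that two crawlers started at the same time never share pending URLs
// or each other's processed counts.
class ScopedCrawlQueue
{
    /** @var string */
    private $runId;

    public function __construct(string $runId)
    {
        $this->runId = $runId;
    }

    // Every lookup filters on run_id, an extra column on the queue table.
    protected function query()
    {
        return CrawlQueueModel::query()->where('run_id', $this->runId);
    }

    public function getProcessedUrlCount(): int
    {
        // Count only this run's soft-deleted (processed) rows, so the
        // result cannot be skewed by a concurrent crawl.
        return $this->query()->onlyTrashed()->count();
    }
}
```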
