Refactor signal handling similar how <a href="http://code.djangoproject.com/wiki/Backw

Refactor signals about scrapy HOT 15 OPEN

scrapy commented on May 22, 2024

Refactor signals

from scrapy.

Comments (15)

pablohoffman commented on May 22, 2024

Signals got their own API in 0.15 (you no longer need to import pydispatcher), but we can still rewrite the backend, which is what this issue is for. It will actually be easier to do now.

artemdevel commented on May 22, 2024

Perhaps we still need some refactoring here (or maybe I found a bug). I added some test code (https://gist.github.com/artem-dev/4996685). At first I tried to use the new API as shown in test5.py, but this approach didn't work for me. I managed to make it work as shown in test5_2.py, but to me this is even less convenient than the old approach shown in test5_1.py. Did I use the new API the wrong way, or is this some sort of bug?

artemdevel commented on May 22, 2024

Just after my previous post I decided to do one more small test. It looks like this change (https://gist.github.com/artem-dev/4996755) fixes the issue I described earlier, so sm = SignalManager() works. I'm not sure whether using dispatcher.Any instead of dispatcher.Anonymous is OK for the rest of the engine, although my simple example worked fine.

artemdevel commented on May 22, 2024

Hi @pablohoffman, this code (https://gist.github.com/artem-dev/5046355) works. Is it the correct way to use the new SignalManager?

pablohoffman commented on May 22, 2024

I'm pasting a comment I mistakenly posted in another issue:

@artem-dev you are instantiating a SignalManager in your own spider, that's not how it's supposed to work. You should be connecting to the SignalManager of the Crawler controlling your spider (accessible through crawler.signals).

User code should never instantiate a SignalManager, but use the one instantiated by framework code (and accessible through the Crawler object).

I hope this is more clear now.
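
The ownership rule above can be sketched with toy classes. This is only an illustration, not Scrapy's actual code: ToySignalManager, ToyCrawler, ANY and ANONYMOUS are hypothetical stand-ins, and the sender filtering is an assumption modelled on the dispatcher.Any / dispatcher.Anonymous distinction mentioned earlier in the thread.

```python
# Toy stand-ins, NOT Scrapy's real classes: they only model who owns the manager.
ANY = object()        # hypothetical wildcard sender
ANONYMOUS = object()  # hypothetical "no sender given" marker

_REGISTRY = []  # shared (signal, sender, handler) entries, like a global dispatcher


class ToySignalManager:
    """Routes signals to handlers, filtered by the manager's sender."""

    def __init__(self, sender=ANONYMOUS):
        self.sender = sender

    def connect(self, handler, signal):
        _REGISTRY.append((signal, self.sender, handler))

    def send(self, signal, **kwargs):
        # Only handlers registered for this sender (or for ANY) are called.
        return [
            handler(**kwargs)
            for sig, sender, handler in _REGISTRY
            if sig is signal and (sender is ANY or sender is self.sender)
        ]


class ToyCrawler:
    """The framework side: it creates the manager and sends as itself."""

    def __init__(self):
        self.signals = ToySignalManager(sender=self)


spider_closed = object()  # hypothetical signal


def on_closed(reason):
    return f"closed: {reason}"


crawler = ToyCrawler()

# A manager created by user code registers under ANONYMOUS,
# so it never sees what the crawler sends.
own_manager = ToySignalManager()
own_manager.connect(on_closed, spider_closed)
missed = crawler.signals.send(spider_closed, reason="finished")

# Connecting through the crawler's own manager matches the sender.
crawler.signals.connect(on_closed, spider_closed)
received = crawler.signals.send(spider_closed, reason="finished")
```

In this model the handler connected through own_manager is never called, while the one connected through crawler.signals is, which is the behaviour the advice above is meant to avoid and achieve respectively.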

pablohoffman commented on May 22, 2024

That's right @artem-dev. However, doing it in parse would cause it to be called many times, so start_requests would be more appropriate.

artemdevel commented on May 22, 2024

yeah, it was just for a test

kmike commented on May 22, 2024

For reference, Django's implementation: https://github.com/django/django/tree/master/django/dispatch
One advantage of switching to the Django implementation is that it supports Python 3.x, but I haven't checked whether switching is possible.

jdemaeyer commented on May 22, 2024

Here is a partial proof of concept for switching the backend to django.dispatch while preserving support for the old signaling API: http://jakobdemaeyer.com/x/scrapy_signaling_proof.html

In this implementation, the SignalManager is deprecated and replaced by a NewSignalManager whose sole purpose is backwards-compatibility. The major open question is how to assign the Crawler instance as default sender for a signal, as the signals are now essentially completely decoupled from anything else.

A different implementation could keep the SignalManager and rewrite only its backend, using the NewSignalManager implementation without the warnings. Signals would then still be emitted (and handlers registered) via the Crawler.signals attribute instead of on the Signals themselves.
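
To make the two options concrete, here is a minimal sketch, not the code from the linked proof of concept. Signal is a simplified Django-style signal written from scratch, and CompatSignalManager is a hypothetical name for a facade that keeps manager-style calls working while filling in a default sender.

```python
import warnings


class Signal:
    """Minimal Django-style signal: handlers are registered on the signal itself."""

    def __init__(self):
        self._receivers = []  # list of (receiver, wanted_sender or None)

    def connect(self, receiver, sender=None):
        self._receivers.append((receiver, sender))

    def send(self, sender, **kwargs):
        # Returns (receiver, response) pairs; sender=None at connect time means "any".
        return [
            (receiver, receiver(sender=sender, **kwargs))
            for receiver, wanted in self._receivers
            if wanted is None or wanted is sender
        ]


class CompatSignalManager:
    """Hypothetical facade: old manager-style API on top of Signal objects."""

    def __init__(self, sender):
        self.sender = sender  # e.g. the crawler that owns this manager

    def connect(self, receiver, signal):
        warnings.warn(
            "connect on the signal object instead",
            DeprecationWarning,
            stacklevel=2,
        )
        signal.connect(receiver, sender=self.sender)

    def send(self, signal, **kwargs):
        # The manager supplies the default sender the signal no longer knows about.
        return signal.send(sender=self.sender, **kwargs)


spider_opened = Signal()
crawler = object()  # stand-in for a Crawler instance
manager = CompatSignalManager(sender=crawler)


def handler(sender, spider):
    return f"opened {spider}"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    manager.connect(handler, spider_opened)

responses = manager.send(spider_opened, spider="example")
```

Dropping the warning from connect gives the second variant described above: the manager stays the public entry point and only the backend changes.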

jdemaeyer commented on May 22, 2024

Here's a list of signaling frameworks that might be of interest: http://www.fortpedro.com/blog/2015/1/observer-pattern-event-oriented-programming-in-python

kmike commented on May 22, 2024

another option: https://pypi.python.org/pypi/blinker

eLRuLL commented on May 22, 2024

I would like to work on this issue 😄

curita commented on May 22, 2024

If anybody is currently working on this issue let us know, otherwise @eLRuLL: all yours!

kmike commented on May 22, 2024

I've profiled a simple spider which downloads a page and follows all links from it using LinkExtractor. There is a non-standard CrawlerProcess which listens to all signals from its Crawlers and re-sends them through its own SignalManager to provide a single place to listen for all signals; nothing complicated. I haven't checked it with a standard CrawlerProcess.

For this spider, signal handling (the SignalManager.send_catch_log function) is one of the bottlenecks: it takes ~5 times longer than HTML parsing.
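
For anyone who wants to make a similar comparison, one way to do it is with cProfile from the standard library. The sketch below is not the profiled spider: send_catch_log, parse_html and crawl here are hypothetical stand-ins with made-up workloads, so the numbers it produces say nothing about Scrapy itself.

```python
import cProfile
import io
import pstats


def send_catch_log(handlers, **kwargs):
    """Hypothetical dispatch stand-in: call each handler, collecting errors."""
    results = []
    for handler in handlers:
        try:
            results.append(handler(**kwargs))
        except Exception as exc:
            results.append(exc)
    return results


def parse_html(text):
    """Hypothetical parsing stand-in: count opening anchor tags."""
    return text.count("<a ")


def crawl(pages, handlers):
    """Parse every page and emit one signal per page."""
    for page in pages:
        parse_html(page)
        send_catch_log(handlers, page=page)


pages = ["<a href='x'>x</a>"] * 1000
handlers = [lambda page: len(page)] * 5

profiler = cProfile.Profile()
profiler.runcall(crawl, pages, handlers)

# Sort by cumulative time so dispatch and parsing can be compared directly.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Comparing the cumulative-time rows for the dispatch function and the parsing function gives the kind of ratio quoted above.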

redapple commented on May 22, 2024

See #2030
