Comments (15)
Signals got their own API in 0.15 (you no longer need to import pydispatcher), but we can still rewrite the backend (which is what this issue is for). It will actually be easier to do now.
from scrapy.
Perhaps we still need some refactoring here (or maybe I found a bug). I added some test code (https://gist.github.com/artem-dev/4996685). At first I tried to use the new API as shown in test5.py, but this approach didn't work for me. I managed to make it work as shown in test5_2.py, but to me this is even less convenient than the old approach shown in test5_1.py. Did I use the new API the wrong way, or is this some sort of bug?
from scrapy.
Just after my previous post I decided to do one more small test. It looks like this change (https://gist.github.com/artem-dev/4996755) fixes the issue I described earlier, so sm = SignalManager() now works. I'm not sure whether using dispatcher.Any instead of dispatcher.Anonymous is OK for the rest of the engine, but my simple example worked fine.
from scrapy.
Hi @pablohoffman, this code (https://gist.github.com/artem-dev/5046355) is working. Is this a correct way to use the new SignalManager?
from scrapy.
I'm pasting a comment I wrongly made in another issue:
@artem-dev you are instantiating a SignalManager in your own spider, that's not how it's supposed to work. You should be connecting to the SignalManager of the Crawler controlling your spider (accessible through crawler.signals).
User code should never instantiate a SignalManager, but use the one instantiated by framework code (and accessible through the Crawler object).
I hope this is more clear now.
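To make the point above concrete, here is a minimal sketch using toy stand-ins. ToySignalManager and ToyCrawler are hypothetical classes written only for this illustration; they are not Scrapy's real implementation. The sketch only shows why a manager created by user code never sees the signals the framework sends through its own manager.

```python
# Toy stand-ins, NOT Scrapy's real classes: just enough to show ownership.
class ToySignalManager:
    """Keeps its own private registry of handlers per signal."""

    def __init__(self):
        self._handlers = {}

    def connect(self, handler, signal):
        self._handlers.setdefault(signal, []).append(handler)

    def send(self, signal, **kwargs):
        # Only handlers registered on *this* manager are called.
        return [handler(**kwargs) for handler in self._handlers.get(signal, [])]


class ToyCrawler:
    """The framework side: creates the one manager that signals go through."""

    def __init__(self):
        self.signals = ToySignalManager()

    def close_spider(self, reason):
        return self.signals.send("spider_closed", reason=reason)


calls = []
crawler = ToyCrawler()

# Wrong: a manager created by user code is never used by the crawler.
own_manager = ToySignalManager()
own_manager.connect(lambda reason: calls.append(("own", reason)), "spider_closed")

# Right: connect to the manager the crawler already owns.
crawler.signals.connect(
    lambda reason: calls.append(("crawler", reason)), "spider_closed"
)

crawler.close_spider(reason="finished")
print(calls)  # [('crawler', 'finished')]
```

Only the handler connected through crawler.signals runs, which matches the behaviour described in the comment.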
from scrapy.
That's right @artem-dev. However, doing it in parse would cause it to be called many times, so start_requests would be more appropriate.
from scrapy.
yeah, it was just for a test
from scrapy.
For reference, django's implementation: https://github.com/django/django/tree/master/django/dispatch
One advantage of switching to the django implementation is that it supports Python 3.x. But I haven't checked if it is possible to switch.
from scrapy.
Here is a partial proof of concept for switching the backend to django.dispatch while keeping support for the old signaling API: http://jakobdemaeyer.com/x/scrapy_signaling_proof.html
In this implementation, the SignalManager is deprecated and replaced by a NewSignalManager whose sole purpose is backwards-compatibility. The major open question is how to assign the Crawler instance as default sender for a signal, as the signals are now essentially completely decoupled from anything else.
A different implementation could preserve using the SignalManager by solely rewriting the backend, using the NewSignalManager implementation without warnings. Signals would then still be emitted (and handlers registered) via the Crawler.signals attribute instead of on the Signals themselves.
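As a rough illustration of that second option, the sketch below keeps a manager-style connect/send interface while the receivers live on the signal objects, with the owning crawler filled in as default sender. ToySignal and FacadeSignalManager are hypothetical names invented here; this is neither the linked proof of concept nor the actual django.dispatch or Scrapy code, only one way the idea could look.

```python
class ToySignal:
    """Hypothetical django.dispatch-style signal: receivers live on the signal."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver, sender=None):
        # sender=None means "accept this signal from any sender".
        self._receivers.append((receiver, sender))

    def send(self, sender, **named):
        return [
            (receiver, receiver(sender=sender, **named))
            for receiver, wanted in self._receivers
            if wanted is None or wanted is sender
        ]


class FacadeSignalManager:
    """Facade keeping the connect/send interface; fills in a default sender."""

    def __init__(self, sender):
        self.sender = sender  # assumed to be the owning crawler

    def connect(self, receiver, signal, **kwargs):
        kwargs.setdefault("sender", self.sender)
        signal.connect(receiver, **kwargs)

    def send(self, signal, **named):
        named.setdefault("sender", self.sender)
        return signal.send(**named)


spider_opened = ToySignal()
crawler_a, crawler_b = object(), object()  # placeholders for two crawlers
signals_a = FacadeSignalManager(crawler_a)
signals_b = FacadeSignalManager(crawler_b)

seen = []


def on_open(sender, spider):
    seen.append(spider)


signals_a.connect(on_open, spider_opened)
signals_a.send(spider_opened, spider="spider-a")
signals_b.send(spider_opened, spider="spider-b")  # filtered: different sender
print(seen)  # ['spider-a']
```

With this shape, handlers are still registered and signals still emitted through a per-crawler manager, and the default sender keeps signals from different crawlers apart.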
from scrapy.
Here's a list of signaling frameworks that might be of interest: http://www.fortpedro.com/blog/2015/1/observer-pattern-event-oriented-programming-in-python
from scrapy.
another option: https://pypi.python.org/pypi/blinker
from scrapy.
I would like to work on this issue 😄
from scrapy.
If anybody is currently working on this issue let us know, otherwise @eLRuLL: all yours!
from scrapy.
I've profiled a simple spider which downloads a page and follows all links from it using LinkExtractor. There is a non-standard CrawlerProcess which listens to all signals from its Crawlers and re-sends them through its own SignalManager to provide a single place to listen for all signals - nothing complicated. I haven't checked it with a standard CrawlerProcess.
For this spider, signal handling (the SignalManager.send_catch_log function) is one of the bottlenecks - it takes ~5 times more time than HTML parsing.
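For anyone who wants to repeat this kind of comparison, here is a minimal sketch of the measurement using only cProfile and pstats from the standard library. The send_catch_log and parse_html functions below are toy stand-ins written for this example, not Scrapy code, so the timings they produce say nothing about Scrapy itself; only the profiling pattern carries over.

```python
import cProfile
import io
import pstats


def send_catch_log(handlers, **kwargs):
    """Toy stand-in for a signal send: call every handler, collect results."""
    return [handler(**kwargs) for handler in handlers]


def parse_html(text):
    """Toy stand-in for HTML parsing work."""
    return text.count("<a ")


def crawl(pages=200):
    handlers = [lambda **kw: None for _ in range(5)]
    for _ in range(pages):
        parse_html("<a href='/x'>x</a>" * 50)
        send_catch_log(handlers, signal="response_received")


profiler = cProfile.Profile()
profiler.enable()
crawl()
profiler.disable()

# Print only the two functions being compared, sorted by cumulative time.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats("send_catch_log|parse_html")
print(report.getvalue())
```

Comparing the cumtime column for the two rows gives the same kind of ratio as the one quoted above.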
from scrapy.
See #2030
from scrapy.