scrapy-plugins / scrapy-jsonrpc
Scrapy extension to control spiders using JSON-RPC
The logging of messages via the log module is deprecated; Scrapy now uses the standard Python logging module. This affects scrapy_jsonrpc\webservice.py.
Is this repo still alive?
scrapy-jsonrpc is not compatible with Python 3.
Apart from the example client code that uses urllib.urlopen(), the "crawler" resource is not found: the child resource name "crawler" needs to be passed as bytes to Twisted.
Once I have started a spider and I try to access the URL http://localhost:6080/crawler, the following error is thrown.
web.Server Traceback (most recent call last):
exceptions.TypeError: <scrapy.crawler.Crawler object at 0x7fc808b829d0> is not JSON serializable
/usr/local/lib/python2.7/dist-packages/twisted/web/server.py:189 in process
188 self._encoder = encoder
189 self.render(resrc)
190 except:
/usr/local/lib/python2.7/dist-packages/twisted/web/server.py:238 in render
237 try:
238 body = resrc.render(self)
239 except UnsupportedMethod as e:
/usr/local/lib/python2.7/dist-packages/scrapy/utils/txweb.py:11 in render
10 r = resource.Resource.render(self, txrequest)
11 return self.render_object(r, txrequest)
12
/usr/local/lib/python2.7/dist-packages/scrapy/utils/txweb.py:14 in render_object
13 def render_object(self, obj, txrequest):
14 r = self.json_encoder.encode(obj) + "\n"
15 txrequest.setHeader('Content-Type', 'application/json')
/usr/local/lib/python2.7/dist-packages/scrapy/utils/serialize.py:89 in encode
88 o = self.spref.encode_references(o)
89 return super(ScrapyJSONEncoder, self).encode(o)
90
/usr/lib/python2.7/json/encoder.py:207 in encode
206 # equivalent to the PySequence_Fast that ''.join() would do.
207 chunks = self.iterencode(o, _one_shot=True)
208 if not isinstance(chunks, (list, tuple)):
/usr/lib/python2.7/json/encoder.py:270 in iterencode
269 self.skipkeys, _one_shot)
270 return _iterencode(o, 0)
271
/usr/local/lib/python2.7/dist-packages/scrapy/utils/serialize.py:109 in default
108 else:
109 return super(ScrapyJSONEncoder, self).default(o)
110
/usr/lib/python2.7/json/encoder.py:184 in default
183 """
184 raise TypeError(repr(o) + " is not JSON serializable")
185
exceptions.TypeError: <scrapy.crawler.Crawler object at 0x7fc808b829d0> is not JSON serializable
Do you think extending the encoder to serialize the Crawler object would be the right thing to do here? I can create a pull request with the fix if that's the case.
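Not the project's actual fix, just a sketch of one direction: a json.JSONEncoder subclass whose default() degrades gracefully instead of raising TypeError. CrawlerSummary is a hypothetical stand-in for whatever attributes of a real Crawler one would choose to expose.

```python
import json

class CrawlerSummary:
    """Hypothetical stand-in for scrapy.crawler.Crawler, for illustration only."""
    def __init__(self, spider_name, status):
        self.spider_name = spider_name
        self.status = status

class SafeEncoder(json.JSONEncoder):
    """Serialize known objects as summary dicts; fall back to repr() for the rest."""
    def default(self, o):
        if isinstance(o, CrawlerSummary):
            return {"spider": o.spider_name, "status": o.status}
        try:
            return super().default(o)
        except TypeError:
            # Last resort: a lossy string beats a 500 from the webservice.
            return repr(o)
```

The trade-off is that repr() output is not machine-readable, so a client can no longer rely on the shape of the response for unknown object types.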
A part of "settings.py":
JSONRPC_ENABLED = True
EXTENSIONS = {
    'scrapy_jsonrpc.webservice.WebService': 500,
}
I use Python 3.5 and Scrapy 1.3.2.
If you know what the problem is, could you please answer me? Thank you very much.
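For reference, the extension speaks JSON-RPC, so a request body can be built with nothing but the stdlib. The endpoint path and method name below are assumptions for illustration, not the extension's documented API; the request is constructed but not sent.

```python
import json
from urllib.request import Request

def build_jsonrpc_request(url, method, params, req_id=1):
    """Build a JSON-RPC 2.0 POST request object (not sent here)."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": req_id,
    }).encode("utf-8")
    return Request(url, data=payload,
                   headers={"Content-Type": "application/json"})

# Hypothetical call against a locally running service:
req = build_jsonrpc_request("http://localhost:6080/crawler", "list_resources", [])
```

Sending it would be a matter of passing req to urllib.request.urlopen() while the webservice is listening.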
Some suggestions:
https://github.com/songhao8080/scrapy-jsonrpc
On Windows 7 with Python 2.7: how do I install this extension and enable it in a Scrapy project?
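A hedged answer sketch (the PyPI package name is assumed from the repo name; confirm against the README):

```shell
pip install scrapy-jsonrpc
```

Then enable it in the project's settings.py with JSONRPC_ENABLED = True and the EXTENSIONS entry quoted earlier in this thread.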
    __import__(name)
  File "/Users/jerry/venv2/lib/python2.7/site-packages/scrapy_jsonrpc/webservice.py", line 7, in <module>
    from scrapy_jsonrpc.jsonrpc import jsonrpc_server_call
  File "/Users/jerry/venv2/lib/python2.7/site-packages/scrapy_jsonrpc/jsonrpc.py", line 11, in <module>
    from scrapy_jsonrpc.serialize import ScrapyJSONDecoder
  File "/Users/jerry/venv2/lib/python2.7/site-packages/scrapy_jsonrpc/serialize.py", line 8, in <module>
    from scrapy.spider import Spider
ImportError: No module named spider
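This ImportError comes from scrapy.spider having been renamed to scrapy.spiders in newer Scrapy releases. One defensive pattern (a sketch, not the project's actual code) is a fallback import; import_first below is a hypothetical helper demonstrating the idea with importlib.

```python
import importlib

def import_first(*candidates):
    """Return the first module in `candidates` that imports cleanly."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %r could be imported" % (candidates,))

# In scrapy_jsonrpc/serialize.py the fix would look roughly like:
# Spider = import_first("scrapy.spiders", "scrapy.spider").Spider
```

Pinning a minimum Scrapy version in setup.py and importing the new path unconditionally would be the simpler alternative.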
Note: Originally reported by @ThiagoF at scrapy/scrapy#1122
I'm running a long concurrent crawl from a shell script; there are many Scrapy processes running in parallel.
From time to time one of them throws errors like this:
2015-03-31 01:11:12-0300 [scrapy] ERROR: Error caught on signal handler: <bound method ?.stop_listening of <scrapy.webservice.WebService instance at 0x7f48362a4710>>
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1107, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 300, in _finish_stopping_engine
yield self.signals.send_catch_log_deferred(signal=signals.engine_stopped)
File "/usr/local/lib/python2.7/dist-packages/scrapy/signalmanager.py", line 23, in send_catch_log_deferred
return signal.send_catch_log_deferred(*a, **kw)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/signal.py", line 53, in send_catch_log_deferred
*arguments, **named)
--- <exception caught here> ---
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 140, in maybeDeferred
result = f(*args, **kw)
File "/usr/local/lib/python2.7/dist-packages/scrapy/xlib/pydispatch/robustapply.py", line 54, in robustApply
return receiver(*arguments, **named)
File "/usr/local/lib/python2.7/dist-packages/scrapy/webservice.py", line 96, in stop_listening
self.port.stopListening()
exceptions.AttributeError: WebService instance has no attribute 'port'
2015-03-31 01:12:16-0300 [scrapy] ERROR: Error caught on signal handler: <bound method ?.start_listening of <scrapy.webservice.WebService instance at 0x7fa8a733e710>>
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1107, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 77, in start
yield self.signals.send_catch_log_deferred(signal=signals.engine_started)
File "/usr/local/lib/python2.7/dist-packages/scrapy/signalmanager.py", line 23, in send_catch_log_deferred
return signal.send_catch_log_deferred(*a, **kw)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/signal.py", line 53, in send_catch_log_deferred
*arguments, **named)
--- <exception caught here> ---
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 140, in maybeDeferred
result = f(*args, **kw)
File "/usr/local/lib/python2.7/dist-packages/scrapy/xlib/pydispatch/robustapply.py", line 54, in robustApply
return receiver(*arguments, **named)
File "/usr/local/lib/python2.7/dist-packages/scrapy/webservice.py", line 90, in start_listening
self.port = listen_tcp(self.portrange, self.host, self)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/reactor.py", line 14, in listen_tcp
return reactor.listenTCP(x, factory, interface=host)
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 495, in listenTCP
p.startListening()
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/tcp.py", line 991, in startListening
skt.listen(self.backlog)
File "/usr/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
socket.error: [Errno 98] Address already in use
We had a similar problem with the telnet console, but we disabled it.
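The "Address already in use" failures happen when parallel processes race for the same port; Scrapy's listen_tcp walks a port range for exactly this reason, and the stop_listening AttributeError is the follow-on symptom when start_listening never succeeded. A stdlib-only sketch of the port-range idea (listen_first_free is a hypothetical name, not Scrapy's API):

```python
import socket

def listen_first_free(portrange, host="127.0.0.1", backlog=5):
    """Bind and listen on the first free port in `portrange`.

    Returns (socket, bound_port); raises OSError if every port is taken.
    A port of 0 asks the OS for any free port.
    """
    for port in portrange:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
            s.listen(backlog)
            return s, s.getsockname()[1]
        except OSError:
            s.close()  # port taken; try the next one
    raise OSError("no free port in %r" % (portrange,))
```

Widening JSONRPC_PORT to a range per the extension's settings (or disabling the webservice for workers that don't need it) avoids the race entirely.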