scrapy-plugins / scrapy-jsonrpc
Scrapy extension to control spiders using JSON-RPC
The logging of messages via the log module is deprecated; Scrapy now uses the standard Python logging module. This affects scrapy_jsonrpc\webservice.py.
Is this repo still alive?
scrapy-jsonrpc is not compatible with Python 3.
Apart from the example client code that uses urllib.urlopen(), the "crawler" resource is not found: the child resource name "crawler" needs to be passed as bytes to Twisted.
Once I have started a spider and I try to access the URL http://localhost:6080/crawler, the following error is thrown.
web.Server Traceback (most recent call last):
exceptions.TypeError: <scrapy.crawler.Crawler object at 0x7fc808b829d0> is not JSON serializable
/usr/local/lib/python2.7/dist-packages/twisted/web/server.py:189 in process
188 self._encoder = encoder
189 self.render(resrc)
190 except:
/usr/local/lib/python2.7/dist-packages/twisted/web/server.py:238 in render
237 try:
238 body = resrc.render(self)
239 except UnsupportedMethod as e:
/usr/local/lib/python2.7/dist-packages/scrapy/utils/txweb.py:11 in render
10 r = resource.Resource.render(self, txrequest)
11 return self.render_object(r, txrequest)
12
/usr/local/lib/python2.7/dist-packages/scrapy/utils/txweb.py:14 in render_object
13 def render_object(self, obj, txrequest):
14 r = self.json_encoder.encode(obj) + "\n"
15 txrequest.setHeader('Content-Type', 'application/json')
/usr/local/lib/python2.7/dist-packages/scrapy/utils/serialize.py:89 in encode
88 o = self.spref.encode_references(o)
89 return super(ScrapyJSONEncoder, self).encode(o)
90
/usr/lib/python2.7/json/encoder.py:207 in encode
206 # equivalent to the PySequence_Fast that ''.join() would do.
207 chunks = self.iterencode(o, _one_shot=True)
208 if not isinstance(chunks, (list, tuple)):
/usr/lib/python2.7/json/encoder.py:270 in iterencode
269 self.skipkeys, _one_shot)
270 return _iterencode(o, 0)
271
/usr/local/lib/python2.7/dist-packages/scrapy/utils/serialize.py:109 in default
108 else:
109 return super(ScrapyJSONEncoder, self).default(o)
110
/usr/lib/python2.7/json/encoder.py:184 in default
183 """
184 raise TypeError(repr(o) + " is not JSON serializable")
185
exceptions.TypeError: <scrapy.crawler.Crawler object at 0x7fc808b829d0> is not JSON serializable
Do you think extending the encoder to serialize the Crawler object would be the right thing to do here? I can create a pull request with the fix if that's the case.
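Not the project's actual fix, just a sketch of one direction: a json.JSONEncoder subclass whose default() degrades gracefully instead of raising TypeError. CrawlerSummary is a hypothetical stand-in for whatever attributes of a real Crawler one would choose to expose.

```python
import json

class CrawlerSummary:
    """Hypothetical stand-in for scrapy.crawler.Crawler, for illustration only."""
    def __init__(self, spider_name, status):
        self.spider_name = spider_name
        self.status = status

class SafeEncoder(json.JSONEncoder):
    """Serialize known objects as summary dicts; fall back to repr() for the rest."""
    def default(self, o):
        if isinstance(o, CrawlerSummary):
            return {"spider": o.spider_name, "status": o.status}
        try:
            return super().default(o)
        except TypeError:
            # Last resort: a lossy string beats a 500 from the webservice.
            return repr(o)
```

The trade-off is that repr() output is not machine-readable, so a client can no longer rely on the shape of the response for unknown object types.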
A part of "settings.py":
JSONRPC_ENABLED = True
EXTENSIONS = {
    'scrapy_jsonrpc.webservice.WebService': 500,
}
I use Python 3.5 and Scrapy 1.3.2.
If you know what the problem is, could you please answer me? Thank you very much.
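For reference, the extension speaks JSON-RPC, so a request body can be built with nothing but the stdlib. The endpoint path and method name below are assumptions for illustration, not the extension's documented API; the request is constructed but not sent.

```python
import json
from urllib.request import Request

def build_jsonrpc_request(url, method, params, req_id=1):
    """Build a JSON-RPC 2.0 POST request object (not sent here)."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": req_id,
    }).encode("utf-8")
    return Request(url, data=payload,
                   headers={"Content-Type": "application/json"})

# Hypothetical call against a locally running service:
req = build_jsonrpc_request("http://localhost:6080/crawler", "list_resources", [])
```

Sending it would be a matter of passing req to urllib.request.urlopen() while the webservice is listening.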
Some suggestions:
https://github.com/songhao8080/scrapy-jsonrpc
On Windows 7 with Python 2.7: how do I install this extension and enable it in a Scrapy project?
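A hedged answer sketch (the PyPI package name is assumed from the repo name; confirm against the README):

```shell
pip install scrapy-jsonrpc
```

Then enable it in the project's settings.py with JSONRPC_ENABLED = True and the EXTENSIONS entry quoted earlier in this thread.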
    __import__(name)
  File "/Users/jerry/venv2/lib/python2.7/site-packages/scrapy_jsonrpc/webservice.py", line 7, in <module>
    from scrapy_jsonrpc.jsonrpc import jsonrpc_server_call
  File "/Users/jerry/venv2/lib/python2.7/site-packages/scrapy_jsonrpc/jsonrpc.py", line 11, in <module>
    from scrapy_jsonrpc.serialize import ScrapyJSONDecoder
  File "/Users/jerry/venv2/lib/python2.7/site-packages/scrapy_jsonrpc/serialize.py", line 8, in <module>
    from scrapy.spider import Spider
ImportError: No module named spider
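This ImportError comes from scrapy.spider having been renamed to scrapy.spiders in newer Scrapy releases. One defensive pattern (a sketch, not the project's actual code) is a fallback import; import_first below is a hypothetical helper demonstrating the idea with importlib.

```python
import importlib

def import_first(*candidates):
    """Return the first module in `candidates` that imports cleanly."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %r could be imported" % (candidates,))

# In scrapy_jsonrpc/serialize.py the fix would look roughly like:
# Spider = import_first("scrapy.spiders", "scrapy.spider").Spider
```

Pinning a minimum Scrapy version in setup.py and importing the new path unconditionally would be the simpler alternative.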
Note: Originally reported by @ThiagoF at scrapy/scrapy#1122
I'm running a long concurrent crawl from a shell script; there are many Scrapy processes running in parallel.
From time to time one of them throws errors like this:
2015-03-31 01:11:12-0300 [scrapy] ERROR: Error caught on signal handler: <bound method ?.stop_listening of <scrapy.webservice.WebService instance at 0x7f48362a4710>>
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1107, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 300, in _finish_stopping_engine
yield self.signals.send_catch_log_deferred(signal=signals.engine_stopped)
File "/usr/local/lib/python2.7/dist-packages/scrapy/signalmanager.py", line 23, in send_catch_log_deferred
return signal.send_catch_log_deferred(*a, **kw)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/signal.py", line 53, in send_catch_log_deferred
*arguments, **named)
--- <exception caught here> ---
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 140, in maybeDeferred
result = f(*args, **kw)
File "/usr/local/lib/python2.7/dist-packages/scrapy/xlib/pydispatch/robustapply.py", line 54, in robustApply
return receiver(*arguments, **named)
File "/usr/local/lib/python2.7/dist-packages/scrapy/webservice.py", line 96, in stop_listening
self.port.stopListening()
exceptions.AttributeError: WebService instance has no attribute 'port'
2015-03-31 01:12:16-0300 [scrapy] ERROR: Error caught on signal handler: <bound method ?.start_listening of <scrapy.webservice.WebService instance at 0x7fa8a733e710>>
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1107, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 77, in start
yield self.signals.send_catch_log_deferred(signal=signals.engine_started)
File "/usr/local/lib/python2.7/dist-packages/scrapy/signalmanager.py", line 23, in send_catch_log_deferred
return signal.send_catch_log_deferred(*a, **kw)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/signal.py", line 53, in send_catch_log_deferred
*arguments, **named)
--- <exception caught here> ---
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 140, in maybeDeferred
result = f(*args, **kw)
File "/usr/local/lib/python2.7/dist-packages/scrapy/xlib/pydispatch/robustapply.py", line 54, in robustApply
return receiver(*arguments, **named)
File "/usr/local/lib/python2.7/dist-packages/scrapy/webservice.py", line 90, in start_listening
self.port = listen_tcp(self.portrange, self.host, self)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/reactor.py", line 14, in listen_tcp
return reactor.listenTCP(x, factory, interface=host)
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 495, in listenTCP
p.startListening()
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/tcp.py", line 991, in startListening
skt.listen(self.backlog)
File "/usr/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
socket.error: [Errno 98] Address already in use
We had a similar problem with the telnet console, but we disabled it.
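The "Address already in use" failures happen when parallel processes race for the same port; Scrapy's listen_tcp walks a port range for exactly this reason, and the stop_listening AttributeError is the follow-on symptom when start_listening never succeeded. A stdlib-only sketch of the port-range idea (listen_first_free is a hypothetical name, not Scrapy's API):

```python
import socket

def listen_first_free(portrange, host="127.0.0.1", backlog=5):
    """Bind and listen on the first free port in `portrange`.

    Returns (socket, bound_port); raises OSError if every port is taken.
    A port of 0 asks the OS for any free port.
    """
    for port in portrange:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
            s.listen(backlog)
            return s, s.getsockname()[1]
        except OSError:
            s.close()  # port taken; try the next one
    raise OSError("no free port in %r" % (portrange,))
```

Widening JSONRPC_PORT to a range per the extension's settings (or disabling the webservice for workers that don't need it) avoids the race entirely.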