
Comments (5)

lukasschwab commented on August 27, 2024

Hmm, I can't reproduce the issue. I ran precisely the code you included in Python 2.7, and both papers downloaded without issue.

~ » python
Python 2.7.15 (default, Aug 22 2018, 16:36:18)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import arxiv
>>> # Query for a paper of interest, then download
... paper = arxiv.query(id_list=["1707.08567"])[0]
>>> arxiv.download(paper)
u'./Proceedings of Workshop AEW10: Concepts in Information Theory and\n  Communications.pdf'
>>> # You can skip the query step if you have the paper info!
... paper2 = {"pdf_url": "http://arxiv.org/pdf/1707.08567v1", "title": "The Paper Title"}
>>> arxiv.download(paper2)
'./The Paper Title.pdf'

It's possible that the API itself was temporarily down, which is outside the scope of this project. Can you confirm that the above code still produces this error for you?

from arxiv.py.

chen-bowen commented on August 27, 2024

Yes, it does. I was receiving the following error:

`ConnectionResetError Traceback (most recent call last)
~\Anaconda3\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
1317 h.request(req.get_method(), req.selector, req.data, headers,
-> 1318 encode_chunked=req.has_header('Transfer-encoding'))
1319 except OSError as err: # timeout error

~\Anaconda3\lib\http\client.py in request(self, method, url, body, headers, encode_chunked)
1238 """Send a complete request to the server."""
-> 1239 self._send_request(method, url, body, headers, encode_chunked)
1240

~\Anaconda3\lib\http\client.py in _send_request(self, method, url, body, headers, encode_chunked)
1284 body = _encode(body, 'body')
-> 1285 self.endheaders(body, encode_chunked=encode_chunked)
1286

~\Anaconda3\lib\http\client.py in endheaders(self, message_body, encode_chunked)
1233 raise CannotSendHeader()
-> 1234 self._send_output(message_body, encode_chunked=encode_chunked)
1235

~\Anaconda3\lib\http\client.py in _send_output(self, message_body, encode_chunked)
1025 del self._buffer[:]
-> 1026 self.send(msg)
1027

~\Anaconda3\lib\http\client.py in send(self, data)
963 if self.auto_open:
--> 964 self.connect()
965 else:

~\Anaconda3\lib\http\client.py in connect(self)
1399 self.sock = self._context.wrap_socket(self.sock,
-> 1400 server_hostname=server_hostname)
1401 if not self._context.check_hostname and self._check_hostname:

~\Anaconda3\lib\ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
406 server_hostname=server_hostname,
--> 407 _context=self, _session=session)
408

~\Anaconda3\lib\ssl.py in __init__(self, sock, keyfile, certfile, server_side, cert_reqs, ssl_version, ca_certs, do_handshake_on_connect, family, type, proto, fileno, suppress_ragged_eofs, npn_protocols, ciphers, server_hostname, _context, _session)
813 raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 814 self.do_handshake()
815

~\Anaconda3\lib\ssl.py in do_handshake(self, block)
1067 self.settimeout(None)
-> 1068 self._sslobj.do_handshake()
1069 finally:

~\Anaconda3\lib\ssl.py in do_handshake(self)
688 """Start the SSL/TLS handshake."""
--> 689 self._sslobj.do_handshake()
690 if self.context.check_hostname:

ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

URLError Traceback (most recent call last)
in ()
----> 1 arxiv.download(paper_info[0], dirname='./papers/quantitative_biology', slugify=True)

~\Anaconda3\lib\site-packages\arxiv\arxiv.py in download(obj, dirname, prepend_id, slugify)
103 filename = dirname + filename + '.pdf'
104 # Download
--> 105 urlretrieve(obj['pdf_url'], filename)
106 return filename
107 else:

~\Anaconda3\lib\urllib\request.py in urlretrieve(url, filename, reporthook, data)
246 url_type, path = splittype(url)
247
--> 248 with contextlib.closing(urlopen(url, data)) as fp:
249 headers = fp.info()
250

~\Anaconda3\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
221 else:
222 opener = _opener
--> 223 return opener.open(url, data, timeout)
224
225 def install_opener(opener):

~\Anaconda3\lib\urllib\request.py in open(self, fullurl, data, timeout)
530 for processor in self.process_response.get(protocol, []):
531 meth = getattr(processor, meth_name)
--> 532 response = meth(req, response)
533
534 return response

~\Anaconda3\lib\urllib\request.py in http_response(self, request, response)
640 if not (200 <= code < 300):
641 response = self.parent.error(
--> 642 'http', request, response, code, msg, hdrs)
643
644 return response

~\Anaconda3\lib\urllib\request.py in error(self, proto, *args)
562 http_err = 0
563 args = (dict, proto, meth_name) + args
--> 564 result = self._call_chain(*args)
565 if result:
566 return result

~\Anaconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
502 for handler in handlers:
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:
506 return result

~\Anaconda3\lib\urllib\request.py in http_error_302(self, req, fp, code, msg, headers)
754 fp.close()
755
--> 756 return self.parent.open(new, timeout=req.timeout)
757
758 http_error_301 = http_error_303 = http_error_307 = http_error_302

~\Anaconda3\lib\urllib\request.py in open(self, fullurl, data, timeout)
524 req = meth(req)
525
--> 526 response = self._open(req, data)
527
528 # post-process response

~\Anaconda3\lib\urllib\request.py in _open(self, req, data)
542 protocol = req.type
543 result = self._call_chain(self.handle_open, protocol, protocol +
--> 544 '_open', req)
545 if result:
546 return result

~\Anaconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
502 for handler in handlers:
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:
506 return result

~\Anaconda3\lib\urllib\request.py in https_open(self, req)
1359 def https_open(self, req):
1360 return self.do_open(http.client.HTTPSConnection, req,
-> 1361 context=self._context, check_hostname=self.check_hostname)
1362
1363 https_request = AbstractHTTPHandler.do_request

~\Anaconda3\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
1318 encode_chunked=req.has_header('Transfer-encoding'))
1319 except OSError as err: # timeout error
-> 1320 raise URLError(err)
1321 r = h.getresponse()
1322 except:

URLError: <urlopen error [WinError 10054] An existing connection was forcibly closed by the remote host>


`
I had already downloaded 1,000 papers before this happened. Do you know if there is a limit on downloading papers?


lukasschwab commented on August 27, 2024

Are you downloading 1000 papers programmatically, in quick succession? If so, there are a couple of arXiv policies that may cause them to close your connection:

  • The API documentation requests a 3-second delay between multiple API calls, though I don't know this to be enforced.
  • You're more likely running into arXiv's scraper policy; downloading papers accesses the general site, so it is subject to different rules: https://arxiv.org/help/robots
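
As a rough illustration of the 3-second spacing mentioned in the first point, here is a minimal sketch. `download_all` and its parameters are hypothetical names, not part of this wrapper. The `download` argument is assumed to be any callable that takes a paper and returns a filename, such as `arxiv.download` from the session earlier in this thread. Spacing out calls this way is not guaranteed to satisfy the separate rules for the general site.

```python
import time


def download_all(papers, download, delay_seconds=3.0, sleep=time.sleep):
    """Call download(paper) for each paper, pausing between consecutive calls.

    `download` is any callable taking one paper and returning a filename.
    `sleep` is injectable so the pacing can be checked without waiting.
    """
    filenames = []
    for index, paper in enumerate(papers):
        if index > 0:
            # Pause before every call except the first one.
            sleep(delay_seconds)
        filenames.append(download(paper))
    return filenames
```

With the wrapper imported, a call might look like `download_all(papers, arxiv.download)`, where `papers` is a list of query results.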

Either way, as far as I can tell, this is a matter of arXiv's server behavior rather than a bug in the API wrapper:

URLError: <urlopen error [WinError 10054] An existing connection was forcibly closed by the remote host>

I'm going to close this issue because I'm fairly confident it's a usage policy issue. I'll reopen it if this really turns out to be an issue with this API wrapper. I can't give much meaningful advice on what usage will or won't be permitted by the arXiv servers; this is an unofficial project.

If someone would like more graceful handling of unexpected HTTP behavior by the arXiv servers, I recommend opening a new PR.
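
For anyone considering such a PR, one possible shape is to retry with an increasing delay when `urllib` raises `URLError`, which is the exception at the bottom of the traceback above. This is only a sketch under assumptions: `download_with_retries`, `max_attempts`, and `base_delay` are hypothetical names and values, not part of this wrapper or of any arXiv guidance.

```python
import time
from urllib.error import URLError  # Python 3; Python 2 exposes URLError in urllib2


def download_with_retries(download, paper, max_attempts=3, base_delay=5.0,
                          sleep=time.sleep):
    """Call download(paper), retrying on URLError with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    The last URLError is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return download(paper)
        except URLError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))
```

Retrying only helps with transient resets; if the server is closing connections because of a usage policy, slowing down or using a different access route is the more likely fix.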


lukasschwab commented on August 27, 2024

@chenbowen184 you might be interested in arXiv Bulk Data Access!


chen-bowen commented on August 27, 2024

Thank you so much, Luke. I think that will solve my problem.

