scihub2pdf's Introduction

SciHub to PDF (Beta)

Description

scihub2pdf is a module of bibcure

Downloads PDFs given a DOI, an article title, or a BibTeX file, using the Libgen, Sci-Hub, and arXiv databases.

Install

$ sudo python /usr/bin/pip install scihub2pdf

If you want to download files from Sci-Hub, you will need to install PhantomJS.

OSX

$ npm install -g phantomjs

Linux (using npm)

$ sudo apt-get install npm
$ sudo npm install -g phantomjs

Features and how to use

Given a BibTeX file

$ scihub2pdf -i input.bib
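
As a rough illustration of what a BibTeX input provides, the sketch below pulls the DOI fields out of BibTeX text with a regular expression. The name `extract_dois` is hypothetical and is not part of scihub2pdf, and a regex only approximates a real BibTeX parser.

```python
import re

# Matches fields such as:  doi = {10.1038/s41524-017-0032-0},
# or:                      doi = "10.1063/1.3149495"
DOI_FIELD = re.compile(
    r'^\s*doi\s*=\s*[{"]\s*([^}"]+?)\s*[}"]',
    re.IGNORECASE | re.MULTILINE,
)


def extract_dois(bibtex_text):
    """Return the DOI values found in BibTeX text, in order of appearance."""
    return DOI_FIELD.findall(bibtex_text)


# Hypothetical entry used only for illustration.
sample = """
@article{example,
  title = {A useful paper},
  doi = {10.1038/s41524-017-0032-0},
}
"""

print(extract_dois(sample))
```

Each DOI found this way could then be passed to the command line one at a time.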

Given a DOI number...

$ scihub2pdf 10.1038/s41524-017-0032-0

Given a title...

$ scihub2pdf --title A useful paper

Arxiv...

$ scihub2pdf arxiv:0901.2686
$ scihub2pdf --title arxiv:Periodic table for topological insulators

Download folder as an argument

$ scihub2pdf -i input.bib -l somefolder/

Use Libgen instead of Sci-Hub

$ scihub2pdf -i input.bib --uselibgen

Sci-Hub:

  • Stable
  • Annoying CAPTCHA
  • Fast

Libgen:

  • Unstable
  • No CAPTCHA
  • Slow

Download from list of items

Given a text file like

10.1038/s41524-017-0032-0
10.1063/1.3149495
.....

download all the PDFs:

$ scihub2pdf -i dois.txt --txt

Given a text file like

Some Title 1
Some Title 2
.....

download all the PDFs:

$ scihub2pdf -i titles.txt --txt --title

Given a text file like

arXiv:1708.06891
arXiv:1708.06071
arXiv:1708.05948
.....

download all the PDFs:

$ scihub2pdf -i arxiv_ids.txt --txt
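
The list files above differ only in what each line holds: a DOI, an arXiv identifier, or a title. The sketch below shows one way a line could be classified. The function `classify_line` and both patterns are assumptions made for illustration; scihub2pdf itself relies on the `--title` flag and may decide differently.

```python
import re

# A DOI starts with "10.", a registrant code, and a slash.
DOI_RE = re.compile(r'^(?:doi:)?\s*(10\.\d{4,9}/\S+)$', re.IGNORECASE)
# New-style arXiv identifiers look like arXiv:1708.06891 (optionally with v2).
ARXIV_RE = re.compile(r'^arxiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)$', re.IGNORECASE)


def classify_line(line):
    """Return (kind, value) for one line of a list file, or None if blank."""
    line = line.strip()
    if not line:
        return None
    match = ARXIV_RE.match(line)
    if match:
        return ("arxiv", match.group(1))
    match = DOI_RE.match(line)
    if match:
        return ("doi", match.group(1))
    # Anything else is treated as a title.
    return ("title", line)


for entry in ["10.1063/1.3149495", "arXiv:1708.06891", "Some Title 1"]:
    print(classify_line(entry))
```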

scihub2pdf's People

Contributors

devmessias

scihub2pdf's Issues

Exceptions with example sci2pdf 10.1038/s41524-017-0032-0

Debian 8

jaap@jaap:/$ sci2pdf 10.1038/s41524-017-0032-0
10.1038/s41524-017-0032-0
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 595, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 393, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 389, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.4/http/client.py", line 1172, in getresponse
    response.begin()
  File "/usr/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 423, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 640, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 261, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3/dist-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 595, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 393, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 389, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.4/http/client.py", line 1172, in getresponse
    response.begin()
  File "/usr/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', BadStatusLine("''",))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/sci2pdf", line 80, in <module>
    main()
  File "/usr/local/bin/sci2pdf", line 68, in main
    download_from_doi(value, location)
  File "/usr/local/lib/python3.4/dist-packages/sci2pdf/libgen.py", line 91, in download_from_doi
    bib_libgen = get_libgen_url(bib)
  File "/usr/local/lib/python3.4/dist-packages/sci2pdf/libgen.py", line 22, in get_libgen_url
    r = requests.get(url, params=params, headers=headers)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 596, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 473, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))
jaap@jaap:/$

Multiple requests at same time

The Libgen site says the connection limit per user is 40, but for some reason I can only make ~3 requests at the same time. I think this issue is related to my code (I have not yet studied the documentation of the requests library). Libgen may also limit the number of requests per user in a given time interval, but I did not find any information about that.
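
One way to stay under a concurrency limit, whatever its real value is, is to cap the number of worker threads. The sketch below uses the standard library's `ThreadPoolExecutor`. The helper `fetch_all` and the stand-in `fake_fetch` are hypothetical; this is not how scihub2pdf currently issues requests, and it does not address a per-interval rate limit.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(items, fetch, max_workers=3):
    """Call fetch(item) for every item, using at most max_workers threads.

    Results are returned in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, items))


def fake_fetch(doi):
    # Stand-in for a real download function; no network access here.
    return "fetched " + doi


print(fetch_all(["10.1038/s41524-017-0032-0", "10.1063/1.3149495"], fake_fetch))
```

Passing the real download function in place of `fake_fetch` would keep at most three requests in flight at once.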

scihub2pdf does not work now! I guess the base Sci-Hub URL has changed.

root@ecs-6e13:~# scihub2pdf  doi:10.1016/j.patcog.2016.10.023  --uselibgen

	 Using Libgen.

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 144, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -5] No address associated with hostname

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 357, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1042, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 980, in send
    self.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 169, in connect
    conn = self._new_conn()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 153, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fbf0d0f29e8>: Failed to establish a new connection: [Errno -5] No address associated with hostname

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='libgen.io', port=80): Max retries exceeded with url: /scimag/ads.php?doi=doi%3A10.1016%2Fj.patcog.2016.10.023&downloadname= (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fbf0d0f29e8>: Failed to establish a new connection: [Errno -5] No address associated with hostname',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/scihub2pdf", line 191, in <module>
    main()
  File "/usr/local/bin/scihub2pdf", line 148, in main
    download_from_doi(value, location, use_libgen)
  File "/usr/local/lib/python3.6/dist-packages/scihub2pdf/download.py", line 161, in download_from_doi
    download_from_libgen(doi, pdf_file)
  File "/usr/local/lib/python3.6/dist-packages/scihub2pdf/download.py", line 68, in download_from_libgen
    found, r = ScrapLib.navigate_to(doi, pdf_file)
  File "/usr/local/lib/python3.6/dist-packages/scihub2pdf/libgen.py", line 44, in navigate_to
    headers=self.headers
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 533, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 520, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 630, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='libgen.io', port=80): Max retries exceeded with url: /scimag/ads.php?doi=doi%3A10.1016%2Fj.patcog.2016.10.023&downloadname= (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fbf0d0f29e8>: Failed to establish a new connection: [Errno -5] No address associated with hostname',))

Problems with the domain

It seems that my school has blocked the "sci-hub.cc" domain, but "sci-hub.bz" is still working. Is there any way to change the domain, maybe by passing it as an argument or using some kind of config file? Searching through a list of possible domains could also be useful.
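
The "search through a list of domains" idea could look roughly like the sketch below. The function `first_working_domain` and the `is_reachable` callback are hypothetical and not an existing scihub2pdf option; the probe is passed in so that the example runs without network access.

```python
def first_working_domain(domains, is_reachable):
    """Return the first domain for which is_reachable(domain) is true.

    Returns None when no candidate works.
    """
    for domain in domains:
        try:
            if is_reachable(domain):
                return domain
        except OSError:
            # Low-level network failures such as socket.gaierror are
            # OSError subclasses; treat them as "not reachable".
            continue
    return None


def fake_probe(domain):
    # Stand-in probe that pretends only sci-hub.bz answers.
    if domain == "sci-hub.cc":
        raise OSError("blocked")
    return domain == "sci-hub.bz"


print(first_working_domain(["sci-hub.cc", "sci-hub.bz"], fake_probe))
```

A real probe would issue a short HTTP request with a timeout; the candidate list could come from a command-line argument or a config file, as the issue suggests.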

Direct output info to files

Nice job. Most articles are downloaded automatically, but a few cannot be found, as shown below:
DOI: 10.1080/16742834.2014.11447220
Sci-Hub Link: https://sci-hub.se/10.1080/16742834.2014.11447220
checking if has captcha...
No pdf found. Maybe, the sci-hub dosen't have the file
Try to open the link in your browser.

Then I'd like to grep only the Sci-Hub link and save it to a.txt.
scihub2pdf --title ${article_title} > a.txt
doesn't work: it treats '${article_title} > a.txt' as the whole title.
I would appreciate it if you could solve this,
for example by adding an option -d to direct the scihub2pdf info to files. Thanks.
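
Until such an option exists, the links can be pulled out of captured output after the fact. The sketch below assumes the output has already been saved as text and that link lines keep the `Sci-Hub Link:` prefix shown above; `extract_links` is a hypothetical helper, not part of scihub2pdf.

```python
import re

# Captures the URL that follows the "Sci-Hub Link:" label.
LINK_RE = re.compile(r'Sci-Hub Link:\s*(\S+)')


def extract_links(output_text):
    """Return every Sci-Hub link that appears in the captured output."""
    return LINK_RE.findall(output_text)


log = """DOI: 10.1080/16742834.2014.11447220
Sci-Hub Link: https://sci-hub.se/10.1080/16742834.2014.11447220
checking if has captcha...
"""

for link in extract_links(log):
    print(link)
```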

Download has stopped because of the captcha?

I tried to download PDFs using a list of DOIs stored in a .txt file. After 3-4 PDFs had downloaded successfully, I got this error:

`DOI:  10.1016/j.telpol.2009.08.001
	Sci-Hub Link:  http://sci-hub.tw/10.1016/j.telpol.2009.08.001
	checking if has captcha...
	Download: ok

	DOI:  10.1080/0268396032000150816
	Sci-Hub Link:  http://sci-hub.tw/10.1080/0268396032000150816
	checking if has captcha...
Traceback (most recent call last):
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/bin/scihub2pdf", line 191, in <module>
    main()
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/bin/scihub2pdf", line 163, in main
    download_from_doi(value, location, use_libgen)
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/lib/python2.7/site-packages/scihub2pdf/download.py", line 163, in download_from_doi
    download_from_scihub(doi, pdf_file)
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/lib/python2.7/site-packages/scihub2pdf/download.py", line 105, in download_from_scihub
    captcha_img = ScrapSci.get_captcha_img()
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/lib/python2.7/site-packages/scihub2pdf/scihub.py", line 98, in get_captcha_img
    self.driver.execute_script("document.getElementById('content').style.zIndex = 9999;")
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 635, in execute_script
    'args': converted_args})['value']
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
    self.error_handler.check_response(response)
  File "/Users/mustikarizkifitriyanti/anaconda/envs/thesis/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: {"errorMessage":"null is not an object (evaluating 'document.getElementById('content').style')","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"134","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:49931","User-Agent":"selenium/3.13.0 (python mac)"},"httpVersion":"1.1","method":"POST","post":"{\"sessionId\": \"927e3730-9652-11e8-ae2e-f99d263e318f\", \"args\": [], \"script\": \"document.getElementById('content').style.zIndex = 9999;\"}","url":"/execute","urlParsed":{"anchor":"","query":"","file":"execute","directory":"/","path":"/execute","relative":"/execute","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/execute","queryKey":{},"chunks":["execute"]},"urlOriginal":"/session/927e3730-9652-11e8-ae2e-f99d263e318f/execute"}}
Screenshot: available via screen
`

I wonder whether this happens because that specific DOI has a CAPTCHA. Can anyone help me solve this issue?
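
Whatever the root cause, a batch run does not have to stop at the first failing DOI. The sketch below wraps each download in a try/except and records the failures. `download_all` and `fake_download` are hypothetical names; the real download function in scihub2pdf takes different arguments.

```python
def download_all(dois, download):
    """Try to download every DOI; return (succeeded, failed) lists.

    failed holds (doi, error message) pairs so they can be retried later.
    """
    succeeded, failed = [], []
    for doi in dois:
        try:
            download(doi)
            succeeded.append(doi)
        except Exception as exc:
            # Record the problem and move on to the next DOI.
            failed.append((doi, str(exc)))
    return succeeded, failed


def fake_download(doi):
    # Stand-in that fails for one DOI, mimicking the error reported above.
    if doi == "10.1080/0268396032000150816":
        raise RuntimeError("captcha element not found")


ok, bad = download_all(
    ["10.1016/j.telpol.2009.08.001", "10.1080/0268396032000150816"],
    fake_download,
)
print(ok)
print(bad)
```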
