Comments (12)
Thanks for the issue. I can't run this as is. Where is the skg package? The issue you link to is four years old. There may have been a problem with Crossref at that time, but it's unlikely to be the same one.
This error is very strange. See https://github.com/WolfgangFahl/pysotsog/blob/main/tests/test_crossref.py for the test source code and https://github.com/WolfgangFahl/pysotsog/blob/main/skg/crossref.py for the helper package.
The CI runs fine, and the code runs without problems on most of my machines. The Python versions are 3.9 and 3.10, and the operating systems are Linux and macOS. The machine that is not working uses Python 3.10.8 on macOS 11.6.2. I have tried quite a few workarounds (see below). None of them worked, so I wonder why I get a 401.
To reproduce:
git clone https://github.com/WolfgangFahl/pysotsog
pip install green
cd pysotsog
green
File "/Users/wf/Library/Python/3.10/lib/python/site-packages/requests/models.py", line 960, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://api.crossref.org/v1/works/10.1016%2FJ.ARTMED.2017.07.002/transform
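The failing URL in the traceback contains the DOI with its slash percent-encoded as %2F. The minimal sketch below rebuilds that URL with the standard library. With it, exactly the same request can be repeated with curl on the failing machine and its response headers inspected. crossref_transform_url is a hypothetical helper, not a habanero function.

```python
from urllib.parse import quote


def crossref_transform_url(doi: str) -> str:
    """Rebuild the /transform URL seen in the traceback (hypothetical helper)."""
    # safe="" makes quote() encode "/" as %2F, matching the traceback URL
    return f"https://api.crossref.org/v1/works/{quote(doi, safe='')}/transform"


url = crossref_transform_url("10.1016/J.ARTMED.2017.07.002")
print(url)
# the same request from a shell; -i prints the status line and response headers
print(f"curl -i '{url}'")
```

If curl gets a 200 for this URL on the failing machine while Python gets a 401, the difference lies in the Python client setup rather than in the network.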
import requests

def test_curl_style(self):
    session = requests.Session()
    # mimic curl's request headers
    session.headers.update({
        'User-Agent': 'curl/7.86.0',
        'Accept': 'application/x-bibtex',
    })
    from http.cookiejar import DefaultCookiePolicy
    # refuse all cookies (empty allowed_domains list)
    session.cookies.set_policy(DefaultCookiePolicy(allowed_domains=[]))
    response = session.get('https://doi.org/10.1021/acs.jpcc.0c05161')
    print(response.status_code)
    print(response.text)
def doi2bib(self, doi):
    """
    Return a bibTeX string of metadata for a given DOI.
    """
    url = f"https://doi.org/{doi}"
    headers = {
        "accept": "application/x-bibtex"
    }
    r = requests.get(url, headers=headers)
    if r.status_code == 200:
        return r.text
    else:
        # on failure the HTTP status code is returned instead of BibTeX text
        return r.status_code
def test_crossref_bib(self):
    doi = "10.1016/J.ARTMED.2017.07.002"
    bib_text = self.doi2bib(doi)
    print(bib_text)
def test_crossref_direct(self):
    """
    query the Crossref REST API directly with requests
    """
    headers = {
        'User-Agent': 'Mozilla/5.0; mailto:@doe.com',
    }
    doi = "10.1016/J.ARTMED.2017.07.002"
    url = f"https://api.crossref.org/v1/works/{doi}"
    print(url)
    response = requests.get(url, headers=headers)
    print(response.status_code)
    if response.status_code == 200:
        print(response.json())
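The doi2bib helper above returns either the BibTeX text or an integer status code, so a 401 can slip through as a value. Below is a sketch of the same content-negotiation lookup using only the standard library. It raises urllib.error.HTTPError instead, and it takes requests (and with it habanero's HTTP layer) out of the picture. The function names are hypothetical. ProxyHandler({}) is used on the assumption that ruling out proxy settings is useful.

```python
import urllib.error
import urllib.request


def build_bibtex_request(doi: str) -> urllib.request.Request:
    """Content-negotiation request asking doi.org for BibTeX (hypothetical helper)."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},
    )


def doi2bib_stdlib(doi: str, timeout: float = 30.0) -> str:
    """Fetch BibTeX for a DOI; raises urllib.error.HTTPError on 4xx/5xx."""
    # an empty ProxyHandler ignores HTTP(S)_PROXY and system proxy settings,
    # so the request goes out directly from this machine
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(build_bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

On the failing machine, you could wrap doi2bib_stdlib("10.1016/J.ARTMED.2017.07.002") in a try/except for urllib.error.HTTPError and print err.code and err.headers. That would show which server actually produced the 401.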
Just tried Python 3.9 and got the same error.
It is very strange. The error depends on the computer, not on the IP or MAC address. What on earth could Crossref evaluate to return a 401 for one specific computer?
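One machine-specific factor that is independent of IP and MAC address is local client configuration. The traceback shows that habanero goes through requests. With its default trust_env behaviour, requests picks up proxy settings (on macOS also the system network proxies) and credentials from ~/.netrc. A "default" entry there matches every host. The aiohttp ClientSession in doi.py does not consult these by default. Whether a stray ~/.netrc entry or proxy causes this 401 is an unverified hypothesis. The sketch below only reports what is configured, using the standard library; local_http_config is a hypothetical helper.

```python
import netrc
import os
import urllib.request
from urllib.parse import urlparse


def local_http_config(url: str) -> dict:
    """Report proxy and netrc settings a requests-based client may pick up for url.

    Hypothetical diagnostic helper; it never includes passwords in the report.
    """
    host = urlparse(url).hostname or ""
    # requests honours the NETRC variable, else ~/.netrc (it also tries ~/_netrc,
    # which is not checked here)
    netrc_path = os.environ.get("NETRC") or os.path.expanduser("~/.netrc")
    report = {
        "host": host,
        # on macOS this falls back to the system network proxy settings
        "proxies": urllib.request.getproxies(),
        "netrc_path": netrc_path,
        "netrc_entry": None,
    }
    if os.path.isfile(netrc_path):
        try:
            # authenticators() falls back to a "default" entry if one exists
            entry = netrc.netrc(netrc_path).authenticators(host)
        except (netrc.NetrcParseError, OSError) as exc:
            report["netrc_entry"] = f"unreadable: {exc}"
        else:
            if entry is not None:
                report["netrc_entry"] = f"login={entry[0]!r} (password hidden)"
    return report


for target in ("https://api.crossref.org/works", "https://doi.org/"):
    print(local_http_config(target))
```

If either check turns something up, setting session.trust_env = False on a requests.Session makes it ignore proxies from the environment and ~/.netrc. That is a quick way to test the hypothesis with the test_crossref_direct code.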
Does habanero have some kind of proxy capability, e.g. to let another computer do the actual work?
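Because habanero issues its calls through requests (see the traceback), the standard proxy environment variables apply to it without any habanero-specific option, as long as requests keeps its default trust_env behaviour. Another computer only does the work if a proxy actually runs there. Below is a sketch with a hypothetical proxy address; use_proxy_for_https is a hypothetical helper.

```python
import os
import urllib.request

# hypothetical proxy on another machine; replace with a real host:port
PROXY = "http://proxy.example.org:3128"


def use_proxy_for_https(proxy: str) -> None:
    """Route HTTPS traffic of requests-based clients in this process through proxy."""
    os.environ["HTTPS_PROXY"] = proxy
    # the lowercase form is also read; set both so they agree
    os.environ["https_proxy"] = proxy


use_proxy_for_https(PROXY)
# requests resolves environment proxies through this same urllib function
print(urllib.request.getproxies().get("https"))
```

The variables have to be set before habanero makes its first request, or exported in the shell that starts the tests.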
Thanks for the details, @WolfgangFahl. I'll take a look soon.
I'd be surprised if the problem were with habanero, but it's possible, I guess.
I have opened a ticket with CrossRef in the meantime but haven't received a reply yet. For my daily work this is still a showstopper, and I have to use a different machine. I wonder whether a simple Docker environment would change the situation; I may try it out in the coming weeks if no other solution comes up.
I ran the code in your comment #110 (comment), and green ran without any problems. If you can find where the issue is coming from, and it turns out to be habanero, then I can help fix it.
There is now a reply from CrossRef, and I explained that the problem occurs only on a single machine and only when using habanero. I can access the service itself just fine using the class below. See the latest changes at WolfgangFahl/pysotsog@64bf3c9.
test_doi.py
import json
from unittest import IsolatedAsyncioTestCase

from skg.doi import DOI  # assumption: doi.py below lives at skg/doi.py in pysotsog


class TestDOILookup(IsolatedAsyncioTestCase):
    """
    test DOI lookup
    """

    async def testDOILookup(self):
        """
        test DOI lookup
        """
        debug = True
        dois = ["10.1109/TBDATA.2022.3224749"]
        expected = ["@article{Li_2022,", "@inproceedings{Faruqui_2015,"]
        for i, doi in enumerate(dois):
            doi_obj = DOI(doi)
            result = await doi_obj.doi2bibTex()
            if debug:
                print(result)
            self.assertTrue(result.startswith(expected[i]))

    async def testCiteproc(self):
        """
        cite proc lookup
        """
        dois = ["10.3115/v1/N15-1184"]
        debug = True
        for doi in dois:
            doi_obj = DOI(doi)
            json_data = await doi_obj.doi2Citeproc()
            if debug:
                print(json.dumps(json_data, indent=2))
            self.assertIn("DOI", json_data)
            self.assertEqual(doi.lower(), json_data["DOI"])

    async def testDataCiteLookup(self):
        """
        test the dataCite Lookup api
        """
        debug = True
        dois = ["10.5438/0012"]
        for doi in dois:
            doi_obj = DOI(doi)
            json_data = await doi_obj.dataCiteLookup()
            if debug:
                print(json.dumps(json_data, indent=2))
            self.assertIn("data", json_data)
            data = json_data["data"]
            self.assertIn("id", data)
            self.assertEqual(doi, data["id"])
doi.py
'''
Created on 2022-11-22
@author: wf
'''
import re

import aiohttp


class DOI:
    """
    Digital Object Identifier handling
    see e.g. https://www.wikidata.org/wiki/Property:P356
    see https://www.doi.org/doi_handbook/2_Numbering.html#2.2
    see https://github.com/davidagraf/doi2bib2/blob/master/server/doi2bib.js
    see https://citation.crosscite.org/docs.html
    """
    pattern = re.compile(r"((?P<directory_indicator>10)\.(?P<registrant_code>[0-9]{4,})(?:\.[0-9]+)*(?:\/|%2F)(?:(?![\"&\'])\S)+)")

    def __init__(self, doi: str):
        """
        a DOI
        """
        self.doi = doi
        match = re.match(DOI.pattern, doi)
        self.ok = bool(match)
        if self.ok:
            self.registrant_code = match.group("registrant_code")

    @classmethod
    def isDOI(cls, doi: str):
        """
        check that the given string is a doi

        Args:
            doi(str): the potential DOI string
        """
        if not doi:
            return False
        if isinstance(doi, list):
            ok = len(doi) > 0
            for single_doi in doi:
                ok = ok and cls.isDOI(single_doi)
            return ok
        if not isinstance(doi, str):
            return False
        doi_obj = DOI(doi)
        return doi_obj.ok

    async def fetch_json(self, url, headers):
        """
        fetch JSON for the given url with the given headers
        """
        async with aiohttp.ClientSession(headers=headers) as session:
            async with session.get(url) as response:
                return await response.json()

    async def fetch_text(self, url, headers):
        """
        fetch text for the given url with the given headers
        """
        async with aiohttp.ClientSession(headers=headers) as session:
            async with session.get(url) as response:
                return await response.text()

    async def doi2bibTex(self):
        """
        get the bibtex result for my doi
        """
        url = f"https://doi.org/{self.doi}"
        headers = {
            'Accept': 'application/x-bibtex; charset=utf-8'
        }
        return await self.fetch_text(url, headers)

    async def doi2Citeproc(self):
        """
        get the Citeproc JSON result for my doi
        see https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html
        """
        url = f"https://doi.org/{self.doi}"
        headers = {
            'Accept': 'application/vnd.citationstyles.csl+json; charset=utf-8'
        }
        return await self.fetch_json(url, headers)

    async def dataCiteLookup(self):
        """
        get the dataCite json result for my doi
        """
        url = f"https://api.datacite.org/dois/{self.doi}"
        headers = {
            'Accept': 'application/vnd.api+json; charset=utf-8'
        }
        return await self.fetch_json(url, headers)
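The class pattern above checks the DOI shape described in the DOI handbook. That shape is the directory indicator 10, a registrant code (four or more digits in this pattern), a slash or its %2F encoding, and a suffix. The standalone sketch below copies the pattern verbatim to show what it accepts. registrant_code is a hypothetical helper mirroring DOI.__init__. Because re.match anchors at the start of the string, a prefixed form such as doi:10.5438/0012 is rejected.

```python
import re

# pattern copied verbatim from the DOI class above
pattern = re.compile(r"((?P<directory_indicator>10)\.(?P<registrant_code>[0-9]{4,})(?:\.[0-9]+)*(?:\/|%2F)(?:(?![\"&\'])\S)+)")


def registrant_code(doi: str):
    """Return the registrant code of a DOI, or None if the string does not match."""
    match = pattern.match(doi)
    return match.group("registrant_code") if match else None


for candidate in [
    "10.1016/J.ARTMED.2017.07.002",
    "10.1016%2FJ.ARTMED.2017.07.002",  # percent-encoded slash, as in the traceback
    "10.5438/0012",
    "doi:10.5438/0012",  # prefix is not stripped, so this does not match
]:
    print(candidate, "->", registrant_code(candidate))
```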
Great, glad it works for you. Sounds like no changes are needed here.
I still can't use habanero; the above is only a workaround.
Okay, sorry it doesn't work! I closed it because it's been a while and I have no idea how to fix this for you.
The "401 Client Error: Unauthorized for url" error doesn't make sense because the API does not require authentication. The mailto header is just to get into the "faster lane", where requests should be more reliable and faster.
The only thing I can think of is that perhaps your IP address got on their block list. Perhaps you were hitting the API pretty hard at some point? I don't know whether they do that kind of thing.
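Both suggestions above can be sketched with the standard library: identifying yourself for the faster lane, and not hitting the API too hard. The sketch assumes Crossref's documented conventions at the time. A contact email goes in the User-Agent or in a mailto query parameter, and X-Rate-Limit-Limit / X-Rate-Limit-Interval response headers advertise the allowed rate. The helper names, the pysotsog-debug/0.1 agent string and you@example.org are placeholders. A rate limit would typically surface as 429 Too Many Requests rather than 401, so this is client hygiene rather than a confirmed fix.

```python
import time
from urllib.parse import quote, urlencode


def polite_works_request(doi: str, email: str) -> tuple[str, dict]:
    """Return (url, headers) for a Crossref /works lookup in the polite pool."""
    # contact email both as mailto query parameter and in the User-Agent
    query = urlencode({"mailto": email})
    url = f"https://api.crossref.org/works/{quote(doi, safe='')}?{query}"
    headers = {"User-Agent": f"pysotsog-debug/0.1 (mailto:{email})"}
    return url, headers


def min_interval_seconds(headers) -> float:
    """Seconds between requests implied by X-Rate-Limit-* response headers.

    Falls back to one request per second when the headers are absent.
    A plain dict is case-sensitive; requests' response.headers is not.
    """
    limit = int(headers.get("X-Rate-Limit-Limit", "1"))
    interval = float(headers.get("X-Rate-Limit-Interval", "1s").rstrip("s"))
    return interval / max(limit, 1)


class Throttle:
    """Sleep just long enough to keep calls at least `interval` seconds apart."""

    def __init__(self, interval: float):
        self.interval = interval
        self._last = float("-inf")

    def wait(self) -> None:
        delay = self._last + self.interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```

Here is one way to use it with requests. Call throttle.wait() before each requests.get(url, headers=headers). After each response, set throttle.interval = min_interval_seconds(response.headers).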