error message requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url even when ua and mailto is set about habanero HOT 12 CLOSED

WolfgangFahl commented on August 16, 2024

error message requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url even when ua and mailto is set

from habanero.

Comments (12)

sckott commented on August 16, 2024

Thanks for the issue. I can't run this as is. Where is the skg package? The issue you link to is 4 years old. There may have been a problem with Crossref at that time, but it's unlikely to be the same problem.

WolfgangFahl commented on August 16, 2024

This error is very strange. See https://github.com/WolfgangFahl/pysotsog/blob/main/tests/test_crossref.py for the test source code and https://github.com/WolfgangFahl/pysotsog/blob/main/skg/crossref.py for the helper package.
The CI runs fine and the code runs on most of my machines with no problems. The Python versions are 3.9 and 3.10 and the operating systems are Linux and macOS. The machine that is not working uses Python 3.10.8 on macOS 11.6.2. I have tried quite a few workarounds (see below). None of them worked, so I wonder why I get a 401.

To reproduce the error:

git clone https://github.com/WolfgangFahl/pysotsog
pip install green
cd pysotsog
green

  File "/Users/wf/Library/Python/3.10/lib/python/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://api.crossref.org/v1/works/10.1016%2FJ.ARTMED.2017.07.002/transform

    def test_curl_style(self):
        session = requests.Session()
        session.headers.update({
            'User-Agent': 'curl/7.86.0',
            'Accept': 'application/x-bibtex',
        })
        from http.cookiejar import DefaultCookiePolicy
        session.cookies.set_policy(DefaultCookiePolicy(allowed_domains=[]))
        response=session.get('https://doi.org/10.1021/acs.jpcc.0c05161')
        print (response.status_code)
        print (response.text)
    
    def doi2bib(self,doi):
        """
        Return a bibTeX string of metadata for a given DOI.
        """
        url = f"https://doi.org/{doi}" 
        headers = {
            "accept": "application/x-bibtex"
        }
        r = requests.get(url, headers = headers)
        if r.status_code==200:
            return r.text
        else:
            return r.status_code
    
    def test_crossref_bib(self):
        doi="10.1016/J.ARTMED.2017.07.002"
        bib_text=self.doi2bib(doi)
        print (bib_text)
    
    def test_crossref_direct(self):
        """
        Workaround attempt: call the Crossref REST API directly with a mailto in the User-Agent
        """
        headers = {
            'User-Agent': 'Mozilla/5.0; mailto:@doe.com',
        } 
        doi="10.1016/J.ARTMED.2017.07.002"
        url=f"https://api.crossref.org/v1/works/{doi}"
        print (url)
        response = requests.get(url,headers=headers)
        print(response.status_code)
        if response.status_code==200:
            print(response.json())

WolfgangFahl commented on August 16, 2024

Just tried python 3.9 and getting the same error.

WolfgangFahl commented on August 16, 2024

It is very strange. The error depends on the computer, not on the IP or MAC address. What on earth could Crossref evaluate to create a 401 specifically for one computer?
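One way to narrow down a machine-specific difference is to list the local settings that an HTTP client may pick up implicitly. The sketch below is only a diagnostic aid, not a confirmed cause of this 401. It relies on two facts about requests with its default `trust_env=True`: it honours proxy and CA-bundle environment variables, and it reads a netrc file and sends matching credentials as HTTP Basic auth. The function name `local_request_config` and the list of variables are hypothetical and illustrative, not exhaustive.

```python
import os
from pathlib import Path

# Environment variables that requests (with its default trust_env=True) or the
# libraries beneath it may consult. Illustrative list, not exhaustive.
ENV_KEYS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY",
            "NETRC", "REQUESTS_CA_BUNDLE", "CURL_CA_BUNDLE")


def local_request_config(env=None, home=None):
    """Collect machine-local settings that can alter outgoing HTTP requests."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    found = {}
    for key in ENV_KEYS:
        # Proxy variables are commonly set in upper or lower case.
        for name in (key, key.lower()):
            if env.get(name):
                found[name] = env[name]
    # A netrc file in the home directory can add credentials to requests.
    for name in (".netrc", "_netrc"):
        candidate = home / name
        if candidate.is_file():
            found["netrc_file"] = str(candidate)
            break
    return found


if __name__ == "__main__":
    # Values may contain secrets (e.g. proxy passwords); review before sharing.
    for key, value in local_request_config().items():
        print(f"{key}={value}")
```

Running this on a working and on the failing machine and comparing the output would show whether the two differ in any of these settings.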

WolfgangFahl commented on August 16, 2024

Does habanero have some kind of proxy capability, e.g. to ask another computer to do the actual work?

sckott commented on August 16, 2024

Thanks for the details @WolfgangFahl I'll take a look soon.

I'd be surprised if the problem was with habanero, but it's possible, I guess.

WolfgangFahl commented on August 16, 2024

I have opened a ticket with Crossref in the meantime but haven't received a reply yet. For my daily work this is still a showstopper and I have to use a different machine. I wonder whether a simple Docker environment would change the situation; I may try it out in the upcoming weeks if no other solution comes up.

sckott commented on August 16, 2024

I ran the code in your comment #110 (comment) and green ran without any problems. If you can find where the issue is coming from, and if it's coming from habanero, then I can help fix it.

WolfgangFahl commented on August 16, 2024

There is now a reply from Crossref, and I explained that this happens only on a single machine and only when using habanero. I can access the service itself just fine using the class below. See the latest changes at WolfgangFahl/pysotsog@64bf3c9

test_doi.py

from unittest import IsolatedAsyncioTestCase
import json
from skg.doi import DOI  # assumed module path of the doi.py shown below

class TestDOILookup(IsolatedAsyncioTestCase): 
    """
    test DOI lookup
    """
    async def testDOILookup(self):
        """
        test DOI lookup 
        """
        debug=True
        dois=["10.1109/TBDATA.2022.3224749"]
        expected=["@article{Li_2022,","@inproceedings{Faruqui_2015,"]
        for i,doi in enumerate(dois):
            doi_obj=DOI(doi)
            result=await doi_obj.doi2bibTex()
            if debug:
                print(result)
            self.assertTrue(result.startswith(expected[i]))
            
    async def testCiteproc(self):
        """
        cite proc lookup
        """ 
        dois=["10.3115/v1/N15-1184"]
        debug=True
        for doi in dois:
            doi_obj=DOI(doi)
            json_data=await doi_obj.doi2Citeproc()
            if debug:
                print(json.dumps(json_data,indent=2))
            self.assertTrue("DOI" in json_data)
            self.assertEqual(doi.lower(),json_data["DOI"])
        
    async def testDataCiteLookup(self):
        """
        test the dataCite Lookup api
        """
        debug=True
        dois=["10.5438/0012"]
        for doi in dois:
            doi_obj=DOI(doi)
            json_data=await doi_obj.dataCiteLookup()
            if debug:
                print(json.dumps(json_data,indent=2))
            self.assertTrue("data" in json_data)
            data=json_data["data"]
            self.assertTrue("id" in data)
            self.assertEqual(doi,data["id"])

doi.py

'''
Created on 2022-11-22

@author: wf
'''
import re
import aiohttp

class DOI:
    """
    Digital Object Identifier handling
    
    see e.g. https://www.wikidata.org/wiki/Property:P356
    see https://www.doi.org/doi_handbook/2_Numbering.html#2.2
    see https://github.com/davidagraf/doi2bib2/blob/master/server/doi2bib.js
    see https://citation.crosscite.org/docs.html
    
    """
    pattern=re.compile(r"((?P<directory_indicator>10)\.(?P<registrant_code>[0-9]{4,})(?:\.[0-9]+)*(?:\/|%2F)(?:(?![\"&\'])\S)+)")
  
    def __init__(self,doi:str):
        """
        a DOI
        """
        self.doi=doi
        match=re.match(DOI.pattern,doi)
        self.ok=bool(match)
        if self.ok:
            self.registrant_code=match.group("registrant_code")
        
    @classmethod
    def isDOI(cls,doi:str):
        """
        check that the given string is a doi
        
        Args:
            doi(str): the potential DOI string
        """
        if not doi:
            return False
        if isinstance(doi,list):
            ok=len(doi)>0
            for single_doi in doi:
                ok=ok and cls.isDOI(single_doi)
            return ok
        if not isinstance(doi,str):
            return False
        doi_obj=DOI(doi)
        return doi_obj.ok
    
    async def fetch_json(self,url,headers):
        """
        fetch json for the given url with the given headers
        """
        async with aiohttp.ClientSession(headers=headers) as session:
            async with session.get(url) as response:
                return await response.json()
    
    async def fetch_text(self,url,headers):
        """
        fetch text for the given url with the given headers
        """
        async with aiohttp.ClientSession(headers=headers) as session:
            async with session.get(url) as response:
                return await response.text()
    
    async def doi2bibTex(self):
        """
        get the bibtex result for my doi
        """
        url=f"https://doi.org/{self.doi}"
        headers= {
            'Accept': 'application/x-bibtex; charset=utf-8'
        }
        return await self.fetch_text(url,headers)     
    
    async def doi2Citeproc(self):
        """
        get the Citeproc JSON result for my doi
        see https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html
        """
        url=f"https://doi.org/{self.doi}"
        headers= {
            'Accept': 'application/vnd.citationstyles.csl+json; charset=utf-8'
        }
        return await self.fetch_json(url, headers)
    
    async def dataCiteLookup(self):
        """
        get the dataCite json result for my doi
        """
        url=f"https://api.datacite.org/dois/{self.doi}"
        headers= {
            'Accept': 'application/vnd.api+json; charset=utf-8'
        }
        return await self.fetch_json(url, headers)

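As a small usage illustration of the `DOI.pattern` regular expression above, the sketch below copies the pattern into a standalone helper (`registrant_code` is a hypothetical name, not part of the class) and shows that it accepts both the plain form of a DOI and the percent-encoded form that appears in the traceback URL.

```python
import re

# Pattern copied verbatim from the DOI class above.
DOI_PATTERN = re.compile(r"((?P<directory_indicator>10)\.(?P<registrant_code>[0-9]{4,})(?:\.[0-9]+)*(?:\/|%2F)(?:(?![\"&\'])\S)+)")


def registrant_code(doi):
    """Return the registrant code of a DOI string, or None if it does not match."""
    match = DOI_PATTERN.match(doi)
    return match.group("registrant_code") if match else None


print(registrant_code("10.1016/J.ARTMED.2017.07.002"))    # plain form
print(registrant_code("10.1016%2FJ.ARTMED.2017.07.002"))  # percent-encoded slash
print(registrant_code("not-a-doi"))                       # no match
```

Note that `match` only anchors the start of the string, so trailing text after a valid DOI is not rejected by this check.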

sckott commented on August 16, 2024

great, glad it works for you. sounds like no changes are needed here

WolfgangFahl commented on August 16, 2024

I still can't use habanero; the above is only a workaround.

sckott commented on August 16, 2024

Okay, sorry it doesn't work! I closed it because it's been a while and I have no ideas for how to fix this for you.

The 401 Client Error: Unauthorized for url error doesn't make sense because the API does not require authentication. The mailto header is just to get into the "faster lane", where requests should be more reliable and faster.
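For illustration, a contact address can be attached to a Crossref request either as a `mailto` query parameter or inside the User-Agent header. The sketch below builds such a request with the standard library without sending it, so the resulting URL and header can be inspected. The function name `polite_works_request`, the tool name `example-tool/0.1` and the address `name@example.org` are placeholders, and the URL prefix is the one used in the test code earlier in this thread.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def polite_works_request(doi, mailto, tool="example-tool/0.1"):
    """Build (but do not send) a Crossref works request that carries a contact address."""
    # Percent-encode the DOI, including its slash, for use as one path segment.
    url = f"https://api.crossref.org/v1/works/{quote(doi, safe='')}"
    # The contact address can go into the query string ...
    url += "?" + urlencode({"mailto": mailto})
    # ... and/or into the User-Agent header.
    headers = {"User-Agent": f"{tool} (mailto:{mailto})"}
    return Request(url, headers=headers)


req = polite_works_request("10.1016/J.ARTMED.2017.07.002", "name@example.org")
print(req.full_url)
print(req.get_header("User-agent"))
```

Printing the prepared request on the failing machine makes it easy to confirm that nothing beyond these two identifying fields is being set on purpose.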

The only thing I can think of is that perhaps your IP address got on their block list. Perhaps you were hitting the API pretty hard at some point? I don't know whether they do that kind of thing or not.
