Comments (12)

sckott commented on August 16, 2024

Thanks for the issue. I can't run this as is. Where is the skg package? The issue you link to is 4 years old. There may have been an issue with Crossref at that time, but it's unlikely to be the same problem.

from habanero.

WolfgangFahl commented on August 16, 2024

This error is very strange. See https://github.com/WolfgangFahl/pysotsog/blob/main/tests/test_crossref.py for the test source code and https://github.com/WolfgangFahl/pysotsog/blob/main/skg/crossref.py for the helper package.
The CI runs fine and the code runs on most of my machines with no problems. The Python versions are 3.9 and 3.10 and the operating systems are Linux and macOS. The machine that is not working uses Python 3.10.8 on macOS 11.6.2. I have tried quite a few workarounds (see below). None of them worked, so I wonder why I get a 401.

To reproduce the error:

git clone https://github.com/WolfgangFahl/pysotsog
pip install green
cd pysotsog
green

  File "/Users/wf/Library/Python/3.10/lib/python/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://api.crossref.org/v1/works/10.1016%2FJ.ARTMED.2017.07.002/transform

    def test_curl_style(self):
        session = requests.Session()
        session.headers.update({
            'User-Agent': 'curl/7.86.0',
            'Accept': 'application/x-bibtex',
        })
        from http.cookiejar import DefaultCookiePolicy
        session.cookies.set_policy(DefaultCookiePolicy(allowed_domains=[]))
        response=session.get('https://doi.org/10.1021/acs.jpcc.0c05161')
        print (response.status_code)
        print (response.text)
    
    def doi2bib(self,doi):
        """
        Return a bibTeX string of metadata for a given DOI.
        """
        url = f"https://doi.org/{doi}" 
        headers = {
            "accept": "application/x-bibtex"
        }
        r = requests.get(url, headers = headers)
        if r.status_code==200:
            return r.text
        else:
            return r.status_code
    
    def test_crossref_bib(self):
        doi="10.1016/J.ARTMED.2017.07.002"
        bib_text=self.doi2bib(doi)
        print (bib_text)
    
    def test_crossref_direct(self):
        """
        Query the Crossref REST API directly, sending a mailto in the User-Agent.
        """
        headers = {
            'User-Agent': 'Mozilla/5.0; mailto:@doe.com',
        } 
        doi="10.1016/J.ARTMED.2017.07.002"
        url=f"https://api.crossref.org/v1/works/{doi}"
        print (url)
        response = requests.get(url,headers=headers)
        print(response.status_code)
        if response.status_code==200:
            print(response.json())

WolfgangFahl commented on August 16, 2024

I just tried Python 3.9 and I am getting the same error.

WolfgangFahl commented on August 16, 2024

It is very strange. The error depends on the computer, not on the IP or the MAC address. What on earth could Crossref evaluate to create a 401 specifically for one computer?
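
One machine-local factor that may be worth ruling out (an assumption; nothing in this thread confirms it as the cause): by default the requests library, which appears in the traceback above, picks up credentials from a ~/.netrc file and proxy settings from environment variables. A stale or default netrc entry would make one machine send an Authorization header that other machines do not. The sketch below uses only the standard library; the helper names `netrc_entry_for` and `proxy_settings` are hypothetical.

```python
import netrc
import os
from typing import Mapping, Optional


def netrc_entry_for(host: str, netrc_path: str) -> Optional[str]:
    """Return the login a netrc file would supply for host, or None.

    A "default" entry in the file applies to every host, so it is
    reported here as well.
    """
    try:
        entry = netrc.netrc(netrc_path).authenticators(host)
    except (OSError, netrc.NetrcParseError):
        # Missing, unreadable, or malformed file: no credentials apply.
        return None
    return entry[0] if entry else None


def proxy_settings(environ: Mapping[str, str]) -> dict:
    """Collect proxy-related variables from an environment mapping."""
    names = ("http_proxy", "https_proxy", "all_proxy", "no_proxy")
    return {k: v for k, v in environ.items() if k.lower() in names}


if __name__ == "__main__":
    path = os.path.expanduser("~/.netrc")
    print("netrc login for api.crossref.org:",
          netrc_entry_for("api.crossref.org", path))
    print("proxy variables:", proxy_settings(os.environ))
```

If either check reports something on the failing machine only, repeating the request with a `requests.Session()` whose `trust_env` attribute is set to `False` should show whether that local configuration is involved.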

WolfgangFahl commented on August 16, 2024

Does habanero have some kind of proxy capability, e.g. to ask another computer to do the actual work?

sckott commented on August 16, 2024

Thanks for the details, @WolfgangFahl. I'll take a look soon.

I'd be surprised if the problem was with habanero, but it's possible, I guess.

WolfgangFahl commented on August 16, 2024

I have opened a ticket with Crossref in the meantime but haven't received a reply yet. For my daily work this is still a showstopper and I have to use a different machine. I wonder whether a simple Docker environment would change the situation, and I may try it out in the upcoming weeks if no other solution comes up.

sckott commented on August 16, 2024

I ran the code in your comment #110 (comment) and green ran without any problems. If you can find where the issue is coming from, and if it's coming from habanero, then I can help fix it.

WolfgangFahl commented on August 16, 2024

There is now a reply from Crossref, and I explained that this happens only on a single machine and only when using habanero. I can access the service itself just fine using the class below. See the latest changes at WolfgangFahl/pysotsog@64bf3c9

test_doi.py

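from unittest import IsolatedAsyncioTestCase
import json
# assumed import path: the DOI class from doi.py (below) inside the skg package
from skg.doi import DOI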

class TestDOILookup(IsolatedAsyncioTestCase): 
    """
    test DOI lookup
    """
    async def testDOILookup(self):
        """
        test DOI lookup 
        """
        debug=True
        dois=["10.1109/TBDATA.2022.3224749"]
        expected=["@article{Li_2022,","@inproceedings{Faruqui_2015,"]
        for i,doi in enumerate(dois):
            doi_obj=DOI(doi)
            result=await doi_obj.doi2bibTex()
            if debug:
                print(result)
            self.assertTrue(result.startswith(expected[i]))
            
    async def testCiteproc(self):
        """
        cite proc lookup
        """ 
        dois=["10.3115/v1/N15-1184"]
        debug=True
        for doi in dois:
            doi_obj=DOI(doi)
            json_data=await doi_obj.doi2Citeproc()
            if debug:
                print(json.dumps(json_data,indent=2))
            self.assertTrue("DOI" in json_data)
            self.assertEqual(doi.lower(),json_data["DOI"])
        
    async def testDataCiteLookup(self):
        """
        test the dataCite Lookup api
        """
        debug=True
        dois=["10.5438/0012"]
        for doi in dois:
            doi_obj=DOI(doi)
            json_data=await doi_obj.dataCiteLookup()
            if debug:
                print(json.dumps(json_data,indent=2))
            self.assertTrue("data" in json_data)
            data=json_data["data"]
            self.assertTrue("id" in data)
            # assertEquals is a deprecated alias that newer Python versions removed
            self.assertEqual(doi,data["id"])

doi.py

'''
Created on 2022-11-22

@author: wf
'''
import re
import aiohttp

class DOI:
    """
    Digital Object Identifier handling
    
    see e.g. https://www.wikidata.org/wiki/Property:P356
    see https://www.doi.org/doi_handbook/2_Numbering.html#2.2
    see https://github.com/davidagraf/doi2bib2/blob/master/server/doi2bib.js
    see https://citation.crosscite.org/docs.html
    
    """
    pattern=re.compile(r"((?P<directory_indicator>10)\.(?P<registrant_code>[0-9]{4,})(?:\.[0-9]+)*(?:\/|%2F)(?:(?![\"&\'])\S)+)")
  
    def __init__(self,doi:str):
        """
        a DOI
        """
        self.doi=doi
        match=re.match(DOI.pattern,doi)
        self.ok=bool(match)
        if self.ok:
            self.registrant_code=match.group("registrant_code")
        
    @classmethod
    def isDOI(cls,doi:str):
        """
        check that the given string is a doi
        
        Args:
            doi(str): the potential DOI string
        """
        if not doi:
            return False
        if isinstance(doi,list):
            ok=len(doi)>0
            for single_doi in doi:
                ok=ok and cls.isDOI(single_doi)
            return ok
        if not isinstance(doi,str):
            return False
        doi_obj=DOI(doi)
        return doi_obj.ok
    
    async def fetch_json(self,url,headers):
        """
        fetch json for the given url with the given headers
        """
        async with aiohttp.ClientSession(headers=headers) as session:
            async with session.get(url) as response:
                return await response.json()
    
    async def fetch_text(self,url,headers):
        """
        fetch text for the given url with the given headers
        """
        async with aiohttp.ClientSession(headers=headers) as session:
            async with session.get(url) as response:
                return await response.text()
    
    async def doi2bibTex(self):
        """
        get the bibtex result for my doi
        """
        url=f"https://doi.org/{self.doi}"
        headers= {
            'Accept': 'application/x-bibtex; charset=utf-8'
        }
        return await self.fetch_text(url,headers)     
    
    async def doi2Citeproc(self):
        """
        get the Citeproc JSON result for my doi
        see https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html
        """
        url=f"https://doi.org/{self.doi}"
        headers= {
            'Accept': 'application/vnd.citationstyles.csl+json; charset=utf-8'
        }
        return await self.fetch_json(url, headers)
    
    async def dataCiteLookup(self):
        """
        get the dataCite json result for my doi
        """
        url=f"https://api.datacite.org/dois/{self.doi}"
        headers= {
            'Accept': 'application/vnd.api+json; charset=utf-8'
        }
        return await self.fetch_json(url, headers)
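
As a small illustration of how the pattern in doi.py above behaves, the sketch below applies the same regular expression to a few sample strings, including the URL-encoded form that appears in the failing request URL. The helper name `registrant_code` and the sample list are hypothetical and only serve the example.

```python
import re

# Same regular expression as DOI.pattern in doi.py above, split over two lines.
DOI_PATTERN = re.compile(
    r"((?P<directory_indicator>10)\.(?P<registrant_code>[0-9]{4,})"
    r"(?:\.[0-9]+)*(?:\/|%2F)(?:(?![\"&\'])\S)+)"
)


def registrant_code(candidate: str):
    """Return the registrant code if candidate starts with a DOI, else None."""
    match = DOI_PATTERN.match(candidate)
    return match.group("registrant_code") if match else None


samples = [
    "10.1016/J.ARTMED.2017.07.002",    # plain DOI
    "10.1016%2FJ.ARTMED.2017.07.002",  # slash URL-encoded as %2F
    "10.5438/0012",                    # DataCite DOI used in the tests
    "not-a-doi",
]
for sample in samples:
    print(sample, "->", registrant_code(sample))
```

Note that `re.match` only anchors at the start of the string, so a string that begins with a DOI and continues with other text after whitespace is still accepted.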

sckott commented on August 16, 2024

Great, glad it works for you. Sounds like no changes are needed here.

WolfgangFahl commented on August 16, 2024

I still can't use habanero; the above is only a workaround.

sckott commented on August 16, 2024

Okay, sorry it doesn't work! I closed it because it's been a while and I have no ideas for how to fix this for you.

The 401 Client Error: Unauthorized for url error doesn't make sense because the API does not require authentication. The mailto header is just to get into the "faster lane", where requests should be more reliable and faster.

The only thing I can think of is that perhaps your IP address got on their block list. Perhaps you were hitting the API pretty hard at some point? I don't know whether they do that kind of thing or not.
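
For reference, the mailto convention mentioned above can be sketched as follows: no credentials are sent, only a contact address inside the User-Agent header, and the DOI is percent-encoded in the path as in the failing URL from the traceback. The helper names `works_url` and `polite_headers`, the application name `doi-check/0.1`, and the example address are hypothetical; this does not show how habanero builds its requests.

```python
from urllib.parse import quote

# Base URL as it appears in the traceback earlier in this thread.
CROSSREF_WORKS = "https://api.crossref.org/v1/works"


def works_url(doi: str) -> str:
    """Build the works URL for a DOI, percent-encoding the slash."""
    return f"{CROSSREF_WORKS}/{quote(doi, safe='')}"


def polite_headers(email: str, app: str = "doi-check/0.1") -> dict:
    """Headers carrying a contact address; no Authorization header is needed."""
    return {"User-Agent": f"{app} (mailto:{email})"}


print(works_url("10.1016/J.ARTMED.2017.07.002"))
print(polite_headers("me@example.org"))
```

The resulting URL and headers can be passed to any HTTP client, for example `requests.get(url, headers=headers)`.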
