
python-epo-ops-client's Introduction

python-epo-ops-client


python-epo-ops-client is an Apache2 licensed client library for accessing the European Patent Office's ("EPO") Open Patent Services ("OPS") v3.2 (based on v1.3.16 of the reference guide).

import epo_ops

client = epo_ops.Client(key='abc', secret='xyz')  # Instantiate client
response = client.published_data(  # Retrieve bibliography data
    reference_type='publication',  # publication, application, priority
    input=epo_ops.models.Docdb('1000000', 'EP', 'A1'),  # docdb, epodoc
    endpoint='biblio',  # optional, defaults to biblio in case of published_data
    constituents=[]  # optional, list of constituents
)

Features

python-epo-ops-client abstracts away the complexities of accessing EPO OPS:

  • Format the requests properly
  • Bubble up quota problems as proper HTTP errors
  • Handle token authentication and renewals automatically
  • Handle throttling properly
  • Add optional caching to minimize impact on the OPS servers

There are two main layers to python-epo-ops-client: Client and Middleware.

Client

The Client contains all the formatting and token-handling logic and is what you'll mostly interact with.

When you issue a request, the response is a requests.Response object. If response.status_code != 200, a requests.HTTPError exception is raised; it's your responsibility to handle those exceptions. The one case handled for you is an expired access token: the client automatically handles the HTTP 400 response and renews the token.
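The renewal behavior just described can be sketched as plain logic. The function and callback names below are hypothetical and are not part of python-epo-ops-client's API:

```python
def request_with_token_renewal(send, renew_token, token):
    """Sketch of the expired-token handling described above (hypothetical
    names, not the client's real API): on an HTTP 400 the token is
    renewed and the request is retried once."""
    status, body = send(token)
    if status == 400:  # access token expired
        token = renew_token()
        status, body = send(token)
    return status, body
```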

Note that the Client does not attempt to interpret the data supplied by OPS, so it's your responsibility to parse the XML or JSON payload for your own purpose.

The following custom exceptions are raised when OPS quotas are exceeded. They all live in the epo_ops.exceptions module and are subclasses of requests.HTTPError, so they offer the same behaviors:

  • IndividualQuotaPerHourExceeded
  • RegisteredQuotaPerWeekExceeded

Again, it's up to you to parse the response and decide what to do.

Currently the Client knows how to issue requests for the following services:

Client method                                                               | API end point         | throttle
family(reference_type, input, endpoint=None, constituents=None)             | family                | inpadoc
image(path, range=1, extension='tiff')                                      | published-data/images | images
number(reference_type, input, output_format)                                | number-service        | other
published_data(reference_type, input, endpoint='biblio', constituents=None) | published-data        | retrieval
published_data_search(cql, range_begin=1, range_end=25, constituents=None)  | published-data/search | search
register(reference_type, input, constituents=['biblio'])                    | register              | other
register_search(cql, range_begin=1, range_end=25)                           | register/search       | other

Bulk operations can be achieved by passing a list of valid models to the published_data input field.

See the OPS guide or use the Developer's Area for more information on how to use each service.

Please submit pull requests for the following services by enhancing the epo_ops.api.Client class:

  • Legal service

Middleware

All requests and responses are passed through each middleware object listed in client.middlewares. Requests are processed in the order listed, and responses are processed in the reverse order.

Each middleware should subclass middlewares.Middleware and implement the process_request and process_response methods.
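As an illustration of that interface, here is a minimal request-logging middleware. It is a sketch only: a real middleware should subclass epo_ops.middlewares.Middleware, which is omitted here to keep the example self-contained:

```python
class LoggingMiddleware:
    """Sketch of the middleware interface described above; a real
    implementation should subclass epo_ops.middlewares.Middleware."""

    def __init__(self):
        self.urls = []

    def process_request(self, env, url, data, **kwargs):
        # Record the outgoing URL, then pass the request along unchanged.
        self.urls.append(url)
        return url, data, kwargs

    def process_response(self, env, response):
        # Responses pass back through middlewares in reverse order.
        return response
```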

There are two middleware classes out of the box: Throttler and Dogpile. Throttler is in charge of the OPS throttling rules and will delay requests accordingly. Dogpile is an optional cache which will cache all HTTP status 200, 404, 405, and 413 responses.

By default, only the Throttler middleware is enabled. If you want to enable caching as well:

import epo_ops

middlewares = [
    epo_ops.middlewares.Dogpile(),
    epo_ops.middlewares.Throttler(),
]
client = epo_ops.Client(
    key='key',
    secret='secret',
    middlewares=middlewares,
)

You'll also need to install the caching dependency in your project: pip install dogpile.cache.

Note that the caching middleware should come first in most cases.

Dogpile

Dogpile is based on (surprise) dogpile.cache. By default it is instantiated with a DBMBackend region with a two-week timeout.

Dogpile takes three optional instantiation parameters:

  • region: You can pass whatever valid dogpile.cache Region you want to backend the cache
  • kwargs_handlers: A list of keyword-argument handlers used to extract elements from the kwargs passed to the request object when generating the cache key. Currently one handler is implemented (and instantiated by default); it makes sure that the range request header is part of the cache key.
  • http_status_codes: A list of HTTP status codes that you would like to have cached. By default 200, 404, 405, and 413 responses are cached.

Note: dogpile.cache is not installed by default. If you want to use it, run pip install dogpile.cache in your project.

Throttler

Throttler contains all the logic for handling the different throttling scenarios. Since OPS throttling is based on a one-minute rolling window, we must persist historical throttling data (at least for the past minute) in order to know the proper request frequency. Each Throttler must be instantiated with a Storage object.

Storage

The Storage object is responsible for:

  1. Knowing how to update the historical record with each request (Storage.update()), making sure to observe the one minute rolling window rule.
  2. Calculating how long to wait before issuing the next request (Storage.delay_for()).

Currently the only Storage backend provided is SQLite, but you can easily write your own Storage backend (such as file, Redis, etc.). To use a custom Storage type, just pass the Storage object when you're instantiating a Throttler object. See epo_ops.middlewares.throttle.storages.Storage for more implementation details.
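To make the two Storage responsibilities concrete, here is a stdlib-only sketch of a rolling-window store. The method names mirror update() and delay_for() above, but the signatures and the per-minute limit are assumptions of this example; a real backend should subclass epo_ops.middlewares.throttle.storages.Storage:

```python
import time
from collections import deque

class InMemoryStorage:
    """Illustrative rolling-window store (not the library's SQLite backend;
    signatures here are assumptions for the sketch)."""

    WINDOW = 60.0  # OPS throttling uses a one-minute rolling window

    def __init__(self):
        self.history = {}  # service name -> deque of request timestamps

    def update(self, service, now=None):
        """Record a request, dropping entries older than the window."""
        now = time.time() if now is None else now
        q = self.history.setdefault(service, deque())
        q.append(now)
        while q and now - q[0] > self.WINDOW:
            q.popleft()

    def delay_for(self, service, limit_per_minute, now=None):
        """Seconds to wait so the next request stays under the limit."""
        now = time.time() if now is None else now
        q = self.history.get(service, deque())
        while q and now - q[0] > self.WINDOW:
            q.popleft()
        if len(q) < limit_per_minute:
            return 0.0
        # Wait until the oldest request falls out of the rolling window.
        return self.WINDOW - (now - q[0])
```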

python-epo-ops-client's People

Contributors

amotl, benecollyridam, daniel-vcodex, dependabot[bot], eltermann, fe60, geoffcline, gsong, mattkeanny, mmath


python-epo-ops-client's Issues

404 Client Error: Not Found for url: https://ops.epo.org/3.2/rest-services/published-data/publication/docdb/claims

Looking here, I see the Published Claims Retrieval Service (POST)

with the prototype http://ops.epo.org/3.2/rest-services/published-data/{type}/{format}/claims

When I call:

    # Retrieve bibliography data
    response = client.published_data(
        reference_type="publication",
        input=epo_ops.models.Docdb('2024054343', 'WO', 'A2'),
        endpoint="claims",
    )

I get an exception:

404 Client Error: Not Found for url: https://ops.epo.org/3.2/rest-services/published-data/publication/docdb/claims
  File "/home/dan/Build/python-epo-ops-client/epo_ops/api.py", line 177, in _make_request
    response.raise_for_status()
  File "/home/dan/Build/python-epo-ops-client/epo_ops/api.py", line 219, in _service_request
    return self._make_request(url, data)
  File "/home/dan/Build/python-epo-ops-client/epo_ops/api.py", line 78, in published_data
    return self._service_request(
  File "/home/dan/Geromics/PatentBot2000/EPO/getting_started_with_epo_ops_and_pydantic.py", line 97, in test_published_data
    response = client.published_data(
  File "/home/dan/Geromics/PatentBot2000/EPO/getting_started_with_epo_ops_and_pydantic.py", line 164, in <module>
    data = test_published_data(client, test_docdb)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://ops.epo.org/3.2/rest-services/published-data/publication/docdb/claims

The url parameter seems to match the prototype... Not sure what I'm doing wrong

epo_ops.middlewares.Dogpile() module 'epo_ops.middlewares' has no attribute 'Dogpile'

When running:

middlewares = [
    epo_ops.middlewares.Dogpile(),
    epo_ops.middlewares.Throttler(),
]
I get:
'epo_ops.middlewares' has no attribute 'Dogpile'

I checked the dir of epo_ops.middlewares
print(dir(epo_ops.middlewares))

result:
['Middleware', 'Throttler', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'middleware', 'throttle']

'Throttler' is there but not 'Dogpile'.

Is everything all right?

[Question] Is it possible to change the range of the results `published_data_search` ?

First of all, great work you've done! Thanks a lot!

This is what I found in EPO FAQ:

By default, OPS displays only the first 25 results but if you want to see more, you can change the interval at the end of the URL ("&Range=xx-xx"). You will then be able to download batches of up to 100 results at a time:

"&Range=1-100"

However, OPS can only display a total of 2,000 results. If your list of results exceeds this number, you will need to limit your query to reduce the number of search results.
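The batching rules quoted above (windows of at most 100 results, capped at 2,000 results overall) can be sketched as a small helper. This function is illustrative and not part of the client:

```python
def range_windows(total, batch=100, cap=2000):
    """Yield (range_begin, range_end) pairs suitable for successive
    published_data_search calls, honoring OPS's batch size and cap."""
    limit = min(total, cap)
    begin = 1
    while begin <= limit:
        end = min(begin + batch - 1, limit)
        yield begin, end
        begin = end + 1
```

Each yielded pair could then be passed as range_begin/range_end to a separate published_data_search call.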

Also, I saw here that you explicitly put the range. Is it possible to change the range from 1 to 1000 as it's said in the FAQ?

Thank you.

Constantly get 404 error

Hello!
I am using the EPO OPS client but I'm having some trouble with my request.
When I use the example request from the EPO OPS client everything seems to work fine, but when using my own code I constantly get a 404 error.
HTTPError: 404 Client Error: Not Found for url: https://ops.epo.org/3.2/rest-services/register/application/epodoc/biblio
It seems like the application number is missing between /epodoc and /biblio, right?
This is my code:

# example
client = epo_ops.Client(key='key', secret='secret')  # Instantiate client
response = client.register(  # Retrieve bibliography data
    reference_type='application',  # publication, application, priority
    input=epo_ops.models.Epodoc('12345'),  # original, docdb, epodoc
    constituents=['biblio']  # optional, list of constituents
)

I tried to do it similarly to the example

input = epo_ops.models.Docdb('1000000', 'EP', 'A1'),

but using

input = epo_ops.models.Epodoc(appnum),

instead, as I want to make a register request.

https://www.pydoc.io/pypi/python-epo-ops-client-2.3.0/autoapi/models/index.html says that models.Epodoc just needs a number as input.
Does anyone know what I misunderstood or what is going wrong?

SQLite objects created in a thread can only be used in that same thread. The object was created in thread

Hello,

I encountered this new error. Do you have any idea how to resolve it?
FYI, I don't use any multithreading nor SQLite in my code.

"Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/epo_ops/middlewares/throttle/storages/sqlite.py", line 76, in prune
    self.db.execute(sql)
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 69052472574080 and this is thread id 69052152153856.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/epo_ops/api.py", line 97, in published_data_search
    range,
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/epo_ops/api.py", line 223, in _search_request
    url, {"q": cql}, {range["key"]: "{begin}-{end}".format(**range)}
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/epo_ops/api.py", line 172, in _make_request
    response = request_method(url, data=data, headers=headers, params=params)
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/epo_ops/models.py", line 86, in post
    return self._request(_post_callback, url, data, **kwargs)
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/epo_ops/models.py", line 95, in _request
    url, data, kwargs = mw.process_request(self.env, url, data, **kwargs)
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/epo_ops/middlewares/throttle/throttler.py", line 20, in process_request
    time.sleep(self.history.delay_for(service))
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/epo_ops/middlewares/throttle/storages/sqlite.py", line 110, in delay_for
    self.prune()
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/epo_ops/middlewares/throttle/storages/sqlite.py", line 76, in prune
    self.db.execute(sql)
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 69052472574080 and this is thread id 69052152153856."

Multiple published documents search

Multiple document search is supported by OPS 3.2, but I don't see an interface for it in the code. Overloading the Docdb constructor would achieve this in Python 3. Here is an edited version of models.py that I am using to achieve it.

import inspect
import types

class MultiMethod:
    '''
    Represents a single multimethod.
    '''
    def __init__(self, name):
        self._methods = {}
        self.__name__ = name

    def register(self, meth):
        '''
        Register a new method as a multimethod
        '''
        sig = inspect.signature(meth)

        # Build a type-signature from the method's annotations
        types = []
        for name, parm in sig.parameters.items():
            if name == 'self': 
                continue
            if parm.annotation is inspect.Parameter.empty:
                raise TypeError(
                    'Argument {} must be annotated with a type'.format(name)
                    )
            if not isinstance(parm.annotation, type):
                raise TypeError(
                    'Argument {} annotation must be a type'.format(name)
                    )
            if parm.default is not inspect.Parameter.empty:
                self._methods[tuple(types)] = meth
            types.append(parm.annotation)

        self._methods[tuple(types)] = meth

    def __call__(self, *args):
        '''
        Call a method based on type signature of the arguments
        '''
        types = tuple(type(arg) for arg in args[1:])
        meth = self._methods.get(types, None)
        if meth:
            return meth(*args)
        else:
            raise TypeError('No matching method for types {}'.format(types))
        
    def __get__(self, instance, cls):
        '''
        Descriptor method needed to make calls work in a class
        '''
        if instance is not None:
            return types.MethodType(self, instance)
        else:
            return self
    
class MultiDict(dict):
    '''
    Special dictionary to build multimethods in a metaclass
    '''
    def __setitem__(self, key, value):
        if key in self:
            # If key already exists, it must be a multimethod or callable
            current_value = self[key]
            if isinstance(current_value, MultiMethod):
                current_value.register(value)
            else:
                mvalue = MultiMethod(key)
                mvalue.register(current_value)
                mvalue.register(value)
                super().__setitem__(key, mvalue)
        else:
            super().__setitem__(key, value)

class MultipleMeta(type):
    '''
    Metaclass that allows multiple dispatch of methods
    '''
    def __new__(cls, clsname, bases, clsdict):
        return type.__new__(cls, clsname, bases, dict(clsdict))

    @classmethod
    def __prepare__(cls, clsname, bases):
        return MultiDict()

## Actual Docdb change
class Docdb(BaseInput, metaclass=MultipleMeta):
    def __init__(self, number:str, country_code:str, kind_code:str, date:str=None):
        if not all([country_code, kind_code]):
            raise MissingRequiredValue(
                'number, country_code, and kind_code must be present'
            )
        super(Docdb, self).__init__(number, country_code, kind_code, date)

    def __init__(self, data_points:list):
        self.data_points = []
        for dp in data_points:
            d = Docdb(dp[0], dp[1], dp[2])
            self.data_points.append(d)

    def as_api_input(self):
        print(self.__dict__.keys(), )
        if hasattr(self, 'data_points'):
            print(self.data_points)
            print("#" * 20)
            return '\n'.join([x.as_api_input() for x in self.data_points])
        return super(Docdb, self).as_api_input()

Different results between espacenet and API

Hello,

I would like to know if any of you have encountered the same issue as me.
I get the feeling that the results differ between the Espacenet platform and the API: I don't get the same number of documents, nor the same relevant documents.

Thanks,

TypeError: _make_request() got an unexpected keyword argument 'use_get'

Dear @gsong,

first things first: Thanks again for your exceptional work on maintaining this library.

We just tried to integrate the most recent release python-epo-ops-client 3.1.1 into PatZilla as we are looking forward to the updates coming from #33 and #34.

However, we hit a speed bump we would like to share with you. When requesting family information, python_epo_ops_client-3.1.1 croaked like

  File "/Users/amo/dev/elmyra/sources/ip-navigator/patzilla/access/epo/ops/api.py", line 718, in ops_family_inpadoc
    response = ops.family(reference_type, ops_id, constituents=to_list(constituents))
  File "/Users/amo/dev/elmyra/sources/ip-navigator/.venv2/lib/python2.7/site-packages/python_epo_ops_client-3.1.1-py2.7.egg/epo_ops/api.py", line 50, in family
    return self._make_request(url, None, params=input.as_api_input(), use_get=True)
TypeError: _make_request() got an unexpected keyword argument 'use_get'

While we haven't investigated this more thoroughly, it looks like the problem is coming from within the module.

Thanks already for looking into this and with kind regards,
Andreas.

`client.image` Method Fails to Retrieve Specific Document Pages

Issue:
Using client.image to fetch pages from EPO's API always returns the first page, regardless of the range parameter provided.

Steps to Reproduce:

  1. Used client.image with range=page to fetch the nth page of a document.
  2. Only the first page was returned in every attempt.

Workaround:
Manually constructing the URL and headers to specify the X-OPS-Range and making a direct request with client._make_request retrieves the correct page.

Expected:
client.image should interpret the range parameter to fetch the specified page number.

Actual:
Always retrieves the first page.

Details:

  • Failing call: client.image("http://ops.epo.org/rest-services/published-data/images/EP/1000000/PA/fullimage", range=page, document_format="application/pdf")
  • Successful workaround:
base_url = "http://ops.epo.org/rest-services/published-data/images"
publication_authority = "EP"
publication_number = "1000000"
publication_kind = "A1"
url = f"{base_url}/{publication_authority}/{publication_number}/{publication_kind}/fullimage"

# Make the request using client._make_request, specifying the Range through parameters and headers
image_response = client._make_request(
    url, 
    data="", 
    params={"Range": page}, 
    extra_headers={"Accept": "application/pdf", "X-OPS-Range": str(page)}, 
    use_get=True
)

(I found the above through trial and error, because I couldn't understand what the data parameter does here)

always get 404

Hello,
with the supplied example I always get the following errors:

Traceback (most recent call last):
  File "C:\Users\Philippe\Develop\Python\Divers\Patents\patents.py", line 17, in <module>
    constituents = []  # optional, list of constituents
  File "C:\Python27\lib\site-packages\epo_ops\api.py", line 92, in published_data
    constituents
  File "C:\Python27\lib\site-packages\epo_ops\api.py", line 85, in _service_request
    return self.make_request(url, input.as_api_input())
  File "C:\Python27\lib\site-packages\epo_ops\api.py", line 167, in make_request
    response.raise_for_status()
  File "C:\Python27\lib\site-packages\requests\models.py", line 795, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found

I have Python 2.7 32-bit on Windows 7 64-bit. Tried at work behind a proxy (looked OK: the response was a valid 404...) and at home (no proxy). Tried as an anonymous and as a registered user...
I traced through the code but couldn't find what's wrong...
Any idea? Thanks!

Caching problems with requests for register information

Dear George,

first things first: Thanks for conceiving and maintaining this great library for accessing the EPO OPS service!

When working on the Patent2net project, we might have found a minor anomaly when using your library.

Anomaly

We found that the requests for register information might not get cached properly, while it works like a charm for published data search and family information requests.

How to reproduce?

For reproducing this, using the Patent2Net project is just three commands away (you might want to run this inside a virtualenv).

Prepare

Install Patent2Net:
pip install 'https://github.com/Patent2net/P2N/archive/3.0.0-dev5.tar.gz'

Configure (use your OPS credentials here ;]):
p2n ops init --key=ScirfedyifJiashwOckNoupNecpainLo --secret=degTefyekDevgew1

Run

p2n adhoc worldmap --expression='TA=lentille' --country-field='designated_states' --with-family --with-register

Observations

When running this a second time with a warm cache, you will recognize from the log output that all requests to published data search and family information are speedy as they don't hit the OPS API at all.

However, all requests for register information feel like they hit the OPS API each time again, which can be recognized by the significant visible delay between each request.

Conclusion

I tried to peek around the code a bit, but couldn't find anything reasonable what could cause this behavior. Maybe you can find some time to look into this issue?

Thank you very much in advance!

With kind regards,
Andreas.

Epodoc biblio url not found

When I try to query using Epodoc, I always get the same issue:

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://ops.epo.org/3.2/rest-services/register/publication/epodoc/biblio

Is there something I am missing in the code?

data = 'EP07102956'
client = Client(key=USER, secret=PASS)  # Instantiate client
response = client.register(
    reference_type='publication',
    input=models.Epodoc(data),
)
print(response.text)

Replace Apiary tests with monkeypatch or in-process mockery

Dear George,

in the backlog file, I've picked up that item which raised my interest.

Maybe you can quickly educate me what this Apiary API is used for?

  • What does this API do?
  • Is the test suite running against the Apiary API only? Why do I need an OPS account then?
  • Where is the source code for the Apiary API, and how would one make changes to it?

Following up on that, I will be more than happy to help replace that testing subsystem with something more self-contained, if that makes sense and is actually possible.

With kind regards,
Andreas.

Module epo_ops not found error

I ran the following basic code for OPS authentication using my API keys. Code looks like below

import epo_ops

client = epo_ops.Client(key='Key1', secret='Seckey1')  # Instantiate client
response = client.published_data(  # Retrieve bibliography data
    reference_type='publication',  # publication, application, priority
    input=epo_ops.models.Docdb('1000000', 'EP', 'A1'),  # original, docdb, epodoc
    endpoint='biblio',  # optional, defaults to biblio in case of published_data
    constituents=[]  # optional, list of constituents
)

When I ran it, it showed ImportError: No module named epo_ops. Fixes I have tried so far:

  1. pip install epo_ops
  2. Downloaded the zip file from GitHub and ran the setup file, but it still did not recognize epo_ops.

SSL Error

Hi, I want to do a basic published-data search request.

On my home PC it works fine, but on my laptop I get the following error:

requests.exceptions.SSLError: HTTPSConnectionPool(host='ops.epo.org', port=443): Max retries exceeded with url: /3.2/auth/accesstoken (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:997)')))

Code:

import epo_ops

client = epo_ops.Client(key='...', secret='...')  # Instantiate client
response = client.published_data_search('ti all "plastic"',1,25)

Is this a device configuration problem or some other library version problem?

Any ideas?

HTTP 500 error

Hi,

Since last night, I'm having problems sending requests to EPO OPS.

For example, with python-epo-ops-client v2.3.0:

import epo_ops
import epo_ops.models
client = epo_ops.Client(key='X', secret='X')
inp = epo_ops.models.Epodoc('CA826085')
client.published_data(reference_type='publication', input=inp, endpoint='biblio')

And I receive a 500 error with the body:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  <fault xmlns="http://ops.epo.org">
    <code>SERVER.NotSupported</code>
    <message>An unknown error occurred. Please contact Open Patent Services Team (http://forums.epo.org/open-patent-services-and-publication-server-web-service)</message>
   </fault>

It seems that yesterday a patch was released for OPS: https://forums.epo.org/maintenance-release-of-open-patent-services-ops-v-3-2-expected-soon-7224

I've noticed that this works fine:

curl -X POST --header "Authorization: Bearer XXX" --header "Content-Type: text/plain" -d "(CA826085)" "http://ops.epo.org/3.2/rest-services/published-data/publication/epodoc/biblio"

But it generates a 500 error if the Content-Type header is missing:

curl -X POST --header "Authorization: Bearer XXX" -d "(CA826085)" "http://ops.epo.org/3.2/rest-services/published-data/publication/epodoc/biblio"

In the _post method at epo_ops/api.py:64, a default 'Content-Type' header needs to be provided:

    def _post(self, url, data, extra_headers=None, params=None):
        headers = {
            'Accept': self.accept_type,
            'Content-Type': 'text/plain'
        }
        headers.update(extra_headers or {})
        return self.request.post(
            url, data=data, headers=headers, params=params
        )

404 HTTP Error in Search

I am trying to use the python-epo-ops-client library to perform a search; however, for some searches (not all), I am getting a 404 response status.

Here is a minimal example of a situation where this error occurs:

import epo_ops

key = '****'
consumer_secret = '*******'

client = epo_ops.Client(key=key, secret=consumer_secret)  # Instantiate client

query = '(txt all "water" AND "derailleur" AND "reuse") AND (cl all "E03B")'

response = client.published_data_search(
    cql=query, 
    range_begin=1,
    range_end=100
)

I am using Python 3.9 and python_epo_ops_client version 4.0.0.

Does anyone know how to fix this or indicate what I am doing wrong?

type hints

Please add type hints; it will make the library much easier to use!

Negative delay when using Sqlite throttle middleware

When using the library for an extended time we eventually got the following error:

  • Your operating system name and version.
    • We got the error in Docker (Linux d65ebfed45f6 4.19.121-linuxkit x86_64 GNU/Linux) hosted on macOS Big Sur (x86_64)
  • Any details about your local setup that might be helpful in troubleshooting.
    • It is a very long running program, so it seems like a bug that only happens in rare cases
  • Detailed steps to reproduce the bug.
    • Keep sending queries for ~20 hrs. It happens reliably for us but very rarely
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
~/.heroku/python/lib/python3.9/site-packages/epo_ops/api.py in published_data_search(self, cql, range_begin, range_end, constituents)
     90     ):
     91         range = dict(key="X-OPS-Range", begin=range_begin, end=range_end)
---> 92         return self._search_request(
     93             dict(
     94                 service=self.__published_data_search_path__, constituents=constituents
~/.heroku/python/lib/python3.9/site-packages/epo_ops/api.py in _search_request(self, info, cql, range)
    220     def _search_request(self, info, cql, range):
    221         url = self._make_request_url(info)
--> 222         return self._make_request(
    223             url, {"q": cql}, {range["key"]: "{begin}-{end}".format(**range)}
    224         )
~/.heroku/python/lib/python3.9/site-packages/epo_ops/api.py in _make_request(self, url, data, extra_headers, params, use_get)
    170             request_method = self.request.get
    171 
--> 172         response = request_method(url, data=data, headers=headers, params=params)
    173         response = self._check_for_expired_token(response)
    174         response = self._check_for_exceeded_quota(response)
~/.heroku/python/lib/python3.9/site-packages/epo_ops/models.py in post(self, url, data, **kwargs)
     84 
     85     def post(self, url, data=None, **kwargs):
---> 86         return self._request(_post_callback, url, data, **kwargs)
     87 
     88     def get(self, url, data=None, **kwargs):
~/.heroku/python/lib/python3.9/site-packages/epo_ops/models.py in _request(self, callback, url, data, **kwargs)
     93 
     94         for mw in self.middlewares:
---> 95             url, data, kwargs = mw.process_request(self.env, url, data, **kwargs)
     96 
     97         # Either get response from cache environment or request from upstream
~/.heroku/python/lib/python3.9/site-packages/epo_ops/middlewares/throttle/throttler.py in process_request(self, env, url, data, **kwargs)
     18         if not env["from-cache"]:
     19             service = service_for_url(url)
---> 20             time.sleep(self.history.delay_for(service))
     21         return url, data, kwargs
     22 
ValueError: sleep length must be non-negative

When I dig into the code it seems that it is in the calculation here epo_ops/middlewares/throttle/storages/sqlite.py#L123

I think it happens when the next run is in the past (i.e., next_run < _now). It could probably be fixed by adding a check for that and returning 0.

I have made a PR that should fix it, but maybe I'm missing something/not doing it the desired way: #49
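The fix described above amounts to clamping the computed delay at zero. This is a sketch of the idea, not the actual patch from the PR:

```python
def clamped_delay(next_run, now):
    """Never return a negative sleep length, even when next_run
    already lies in the past (which time.sleep rejects)."""
    return max(0.0, next_run - now)
```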

Is "description" supported by this client?

I did not find "description" or "claims" in the table under "Currently the Client knows how to issue requests for the following services".
Is it supported by this client yet? (I am not familiar with Python...) Thanks a lot.
What I have tried:

response = client.published_data(
    reference_type = 'publication', # publication, application, priority
    input = epo_ops.models.Epodoc('EP1000000'), # original, docdb, epodoc
    endpoint = 'description', # optional, defaults to biblio in case of published_data
    constituents = [] # optional, list of constituents
)
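For what it's worth, the underlying OPS service itself exposes description and claims under published-data. A sketch of how such a URL is shaped (base URL per the OPS v3.2 documentation; the helper below is illustrative, not part of this client):

```python
OPS_BASE = "https://ops.epo.org/3.2/rest-services"

def published_data_url(reference_type, input_format, number, endpoint):
    """Build an OPS published-data URL (illustrative helper, not the client's API)."""
    return f"{OPS_BASE}/published-data/{reference_type}/{input_format}/{number}/{endpoint}"

print(published_data_url("publication", "epodoc", "EP1000000", "description"))
```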

Fix test case `test_mock_quota_exceeded`

About

At #63 (review), we reported that the test case test_mock_quota_exceeded currently fails. The reason is:

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://opsv31.docs.apiary.io//individual-per-hour-exceeded/publication/docdb/biblio
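Note the doubled slash after the host in the failing URL, which hints that a base URL ending in `/` is being naively concatenated with the path. A sketch of avoiding that with the standard library (base and path are taken from the error message above):

```python
from urllib.parse import urljoin

base = "https://opsv31.docs.apiary.io/"
path = "individual-per-hour-exceeded/publication/docdb/biblio"

# Naive concatenation reproduces the doubled slash from the error message:
print(base + "/" + path)

# urljoin handles the trailing slash correctly:
print(urljoin(base, path))
```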

Trouble retrieving text

Hi,

First, thank you for creating this package. I am a beginner when it comes to using APIs. I am trying to extract data but am not receiving any text.

I ran this code:

import epo_ops

anonymous_client = epo_ops.Client()  # Instantiate a default client
response = anonymous_client.published_data(  # Retrieve bibliography data
    reference_type='publication',  # publication, application, priority
    input=epo_ops.models.Docdb('1000000', 'EP', 'A1'),  # original, docdb, epodoc
    endpoint='biblio',  # optional, defaults to biblio in case of published_data
    constituents=[]  # optional, list of constituents
)

registered_client = epo_ops.RegisteredClient(key='abc', secret='xyz')
registered_client.access_token # To see the current token
response = registered_client.published_data(…)

Then I replaced the last line with:

response = registered_client.published_data('publication', epo_ops.models.Docdb('1000000', 'EP', 'A1'), endpoint='biblio', constituents=None)

The output I got: <Response [200]>

How should I modify the code to retrieve text? I would appreciate your help.
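`<Response [200]>` is just the repr of the requests.Response object; the actual payload is in its `.text` (decoded string) or `.content` (raw bytes) attribute. A minimal sketch of pulling a value out of such a payload, using an illustrative XML snippet in place of real OPS output:

```python
import xml.etree.ElementTree as ET

# Stand-in for response.text; real OPS biblio responses are much larger
# and use namespaced elements.
sample_xml = "<document><title>Apparatus for manufacturing</title></document>"

root = ET.fromstring(sample_xml)
print(root.findtext("title"))  # the extracted title text
```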

Module epo_ops not found

I ran the following basic code for OPS authentication using my API keys. Code looks like below

import epo_ops

client = epo_ops.Client(key='Key1', secret='Seckey1')  # Instantiate client
response = client.published_data(  # Retrieve bibliography data
    reference_type='publication',  # publication, application, priority
    input=epo_ops.models.Docdb('1000000', 'EP', 'A1'),  # original, docdb, epodoc
    endpoint='biblio',  # optional, defaults to biblio in case of published_data
    constituents=[]  # optional, list of constituents
)

When I ran it, it showed ImportError: No module named epo_ops. Fixes I have tried so far:

  1. pip install epo_ops
  2. Downloaded the zip file from GitHub and ran the setup file, but it still did not recognize epo_ops.
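A likely cause: the distribution on PyPI is named python-epo-ops-client, while the importable module is epo_ops, so the install command is pip install python-epo-ops-client. A small stdlib sketch for checking whether a module is importable at all:

```python
import importlib.util

def module_available(name):
    """Return True if the named module can be found on sys.path."""
    return importlib.util.find_spec(name) is not None

print(module_available("json"))     # stdlib module, always available
print(module_available("epo_ops"))  # True only after installing python-epo-ops-client
```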

Make invoking "response.raise_for_status()" optional

Hi there,

coming here from #35 (comment), @gsong was so kind to offer he would think about an option on how to toggle the behavior of one of the core methods within this library.

We are talking about the _make_request method here, which invokes response.raise_for_status() in order to raise exceptions on all HTTP requests to the OPS API which respond with HTTP status >= 400.

For reasons I am currently not able to remember exactly, we wanted to turn that behavior off within PatZilla.

Most probably it was because the exception bubbling up from this method lacked important information from the original request object that we wanted to present to the user, such as details of OAuth failures or rate-limiting errors that we extract from HTTP response headers.

Please advise if we are getting this wrong and the same things could be achieved with this call in place. Thanks!
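Until such a toggle exists, the upstream details are still reachable: requests.HTTPError carries the original response on its `.response` attribute, so a thin wrapper can surface headers such as rate-limit information before re-raising. A sketch (the wrapper and the error message format are illustrative, not part of this library):

```python
import requests

def call_with_details(func, *args, **kwargs):
    """Invoke a client method, enriching HTTP errors with response details."""
    try:
        return func(*args, **kwargs)
    except requests.HTTPError as err:
        # err.response is the full upstream response, headers included
        throttle = err.response.headers.get("X-Throttling-Control", "n/a")
        raise RuntimeError(
            f"OPS request failed ({err.response.status_code}); "
            f"throttling info: {throttle}"
        ) from err
```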

With kind regards,
Andreas.

Bibliographic data missing from OPS family response

Dear George,

some users and the lovely people from @Patent2net reported that family responses from OPS have stopped yielding bibliographic information for some weeks now. This broke some features of Patent2Net [2] as well as PatZilla [3].

As outlined on [1], it looks like only POST requests would fail while GET requests would still succeed.

With kind regards,
Andreas.

[1] https://meta.ip-tools.org/t/bibliographic-data-missing-from-ops-family-response/161
[2] https://github.com/Patent2net/P2N-v3
[3] https://github.com/ip-tools/ip-navigator

Next steps in maintenance and modernization

Dear George,

thank you for your trust in handing over the maintenance of this repository, per GH-60. To become better acquainted with the project and code base, I would like to spend a few iterations on submitting maintenance patches, modernizing both the sandbox infrastructure and the CI configuration for this repository.

While doing so, I would like to gradually work towards a modern Python project setup: ultimately using Ruff as the main linter, a pyproject.toml file for project metadata, poethepoet as the task runner, versioningit for versioning, and GitHub Actions (GHA) for CI, as I do on all our other newly conceived projects (see, for example, [1, 2]), so I can reuse many parts from there without much effort.

In the same iterations, I will also add support for Python 3.10 and 3.11 to the CI configuration. As I do not expect any changes, I believe the functional side of the code base will not be touched at all. Along the way, the requirements-based dependency management will be dissolved, and, if you agree, I will also remove the use of tox: I don't know much about it, and I would like to be rid of the maintenance burden.

The modernization process will happen gradually, and the Make-based sandbox will still be around for a while, so you can get accustomed to the changes step by step. It is important to me that you do not feel lost in this process; still, I would like to make reasonable progress, because regular work will also restart soon on my end.

In this spirit, I am humbly asking about any concerns or objections you may have on this topic, how much you would like to stay involved in all the details, whether you trust me enough to welcome the small decisions I will make along the way, and/or whether you are giving carte blanche on this.

I hope you appreciate the plan outlined in this post in general; otherwise, please let me know so I can back off accordingly.

With kind regards,
Andreas.

[1] https://github.com/panodata/aika
[2] https://github.com/daq-tools/lorrystream

Future maintenance of repository and package

Dear George,

first of all, I would like to thank you tremendously for conceiving and maintaining this package. I think it has high value for open-data and open-access applications aiming to connect to EPO's OPS service to gather information.

At #59 (comment), you mentioned that you are not actively working on this project anymore, and that you would welcome anyone to take it over.

In this spirit, I am offering my support. A few references about our past work in this area are attached below. Let me know if this resonates with you. 🌻

With kind regards,
Andreas.
