
bookops-worldcat's Introduction

Build Status Coverage Status PyPI version PyPI - Python Version Code style: black License: MIT

bookops-worldcat

BookOps-Worldcat provides a Python interface for the WorldCat Metadata API. This wrapper simplifies requests to OCLC web services, making them more accessible to OCLC member libraries.

Bookops-Worldcat version 1.0 supports changes released in version 2.0 (May 2023) of the OCLC Metadata API.

Installation

Use pip:

$ pip install bookops-worldcat

Documentation

For full documentation please see https://bookops-cat.github.io/bookops-worldcat/

Features

Bookops-Worldcat takes advantage of the functionality of the popular Requests library, and interactions with OCLC's services are built around Requests sessions. MetadataSession inherits all requests.Session properties. Server responses are requests.Response objects with all of their properties and methods.

Authorizing a web service session simply requires passing an access token to MetadataSession. Opening a session allows the user to call specific methods to facilitate communication between the user's script/client and a particular endpoint of the Metadata API. Many of the hurdles related to making valid requests are hidden under the hood of this package, making it as simple as possible.

BookOps-Worldcat supports requests to all endpoints of the WorldCat Metadata API 2.0 and Authentication using the Client Credential Grant flow:

  • Authentication via Client Credential Grant
  • WorldCat Metadata API
    • Manage Bibliographic Records
    • Manage Institution Holdings
    • Manage Local Bibliographic Data
    • Manage Local Holdings Records
    • Search Member Shared Print Holdings
    • Search Member General Holdings
    • Search Bibliographic Resources
    • Search Local Holdings Resources
    • Search Local Bibliographic Resources

Basic usage:

Authorizing a MetadataSession

from bookops_worldcat import WorldcatAccessToken
token = WorldcatAccessToken(
    key="my_WSKey",
    secret="my_secret",
    scopes="WorldCatMetadataAPI",
)
print(token)
#>"access_token: 'tk_Yebz4BpEp9dAsghA7KpWx6dYD1OZKWBlHjqW', expires_at: '2024-01-01 12:00:00Z'"
print(token.is_expired())
#>False

Search for brief bibliographic resources

from bookops_worldcat import MetadataSession

with MetadataSession(authorization=token) as session:
    response = session.brief_bibs_search(q="ti:The Power Broker AND au: Caro, Robert")
    print(response.json())
{
  "numberOfRecords": 89,
  "briefRecords": [
    {
      "oclcNumber": "1631862",
      "title": "The power broker : Robert Moses and the fall of New York",
      "creator": "Robert A. Caro",
      "date": "1975",
      "machineReadableDate": "1975",
      "language": "eng",
      "generalFormat": "Book",
      "specificFormat": "PrintBook",
      "edition": "Vintage Books edition",
      "publisher": "Vintage Books",
      "catalogingInfo": {
        "catalogingAgency": "DLC",
        "catalogingLanguage": "eng",
        "levelOfCataloging": " ",
        "transcribingAgency": "DLC"
      }
    }
  ]
}
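
Because response.json() returns a plain dictionary, individual fields can be pulled out with ordinary Python. The following is a minimal sketch that works on a trimmed copy of the sample payload above rather than on a live response; summarize_brief_bibs is a hypothetical helper and is not part of bookops-worldcat.

```python
from typing import Any, Dict, List, Tuple

# Trimmed copy of the sample payload shown above (not a live response).
sample_payload: Dict[str, Any] = {
    "numberOfRecords": 89,
    "briefRecords": [
        {
            "oclcNumber": "1631862",
            "title": "The power broker : Robert Moses and the fall of New York",
            "creator": "Robert A. Caro",
            "date": "1975",
        }
    ],
}


def summarize_brief_bibs(payload: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (oclcNumber, title) pairs. Hypothetical helper for illustration."""
    # .get() guards against a payload that has no "briefRecords" key.
    return [
        (rec.get("oclcNumber", ""), rec.get("title", ""))
        for rec in payload.get("briefRecords", [])
    ]


pairs = summarize_brief_bibs(sample_payload)
print(pairs)
```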

MetadataSession as Context Manager:

with MetadataSession(authorization=token) as session:
    result = session.bib_get("1631862")
    print(result.text) 
<?xml version='1.0' encoding='UTF-8'?>
  <record xmlns="http://www.loc.gov/MARC21/slim">
    <leader>00000cam a2200000 i 4500</leader>
    <controlfield tag="001">ocm01631862</controlfield>
    <controlfield tag="003">OCoLC</controlfield>
    <controlfield tag="005">20240201163642.4</controlfield>
    <controlfield tag="008">750320t19751974nyuabf   b    001 0beng  </controlfield>
    <datafield tag="010" ind1=" " ind2=" ">
      <subfield code="a">   75009557 </subfield>
    </datafield>
<!--...-->
    <datafield tag="020" ind1=" " ind2=" ">
      <subfield code="a">9780394720241</subfield>
      <subfield code="q">(paperback)</subfield>
<!--...-->
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Caro, Robert A.,</subfield>
      <subfield code="e">author.</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="4">
      <subfield code="a">The power broker :</subfield>
      <subfield code="b">Robert Moses and the fall of New York /</subfield>
      <subfield code="c">by Robert A. Caro.</subfield>
    </datafield>
    <datafield tag="246" ind1="3" ind2="0">
      <subfield code="a">Robert Moses and the fall of New York</subfield>
    </datafield>
    <datafield tag="250" ind1=" " ind2=" ">
      <subfield code="a">Vintage Books edition.</subfield>
    </datafield>
    <datafield tag="264" ind1=" " ind2="1">
      <subfield code="a">New York :</subfield>
      <subfield code="b">Vintage Books,</subfield>
      <subfield code="c">1975.</subfield>
    </datafield>
<!--...-->
</record>
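
The MARC XML returned above can be read with the standard library alone. This is a minimal sketch, assuming a shortened copy of the record shown above; get_subfield is a hypothetical helper, not something bookops-worldcat provides, and a dedicated MARC library may be a better fit for real workflows.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace used by the MARCXML record shown above.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Shortened copy of the sample record (most fields omitted).
sample_xml = """<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">ocm01631862</controlfield>
  <datafield tag="245" ind1="1" ind2="4">
    <subfield code="a">The power broker :</subfield>
    <subfield code="b">Robert Moses and the fall of New York /</subfield>
  </datafield>
</record>"""


def get_subfield(record: ET.Element, tag: str, code: str) -> Optional[str]:
    """Return the first matching subfield value, or None if it is absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text if node is not None else None


# Parse from bytes so the encoding declaration is honored.
record = ET.fromstring(sample_xml.encode("utf-8"))
title = get_subfield(record, "245", "a")
control_number = record.findtext("marc:controlfield[@tag='001']", namespaces=MARC_NS)
print(control_number, title)
```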

Changes in Version 1.0

New functionality available in version 1.0:

  • Send requests to all endpoints of WorldCat Metadata API
    • Match bib records and retrieve bib classification
    • Create, update, and validate bib records
    • Create, retrieve, update, and delete local bib and holdings records
  • Add automatic retries to failed requests
  • Authenticate and authorize for multiple institutions within MetadataSession
  • Support for Python 3.11 and 3.12
  • Dropped support for Python 3.7

Migration Information

Bookops-Worldcat 1.0 introduces many breaking changes for users of previous versions. Due to a complete refactor of the Metadata API, the methods from Bookops-Worldcat 0.5.0 have been rewritten. Most of the functionality from previous versions of the Metadata API is still available in Version 2.0. For a comparison of the functionality available in Versions 1.0, 1.1, and 2.0 of the Metadata API, see OCLC's documentation and their functionality comparison table.

Versions 1.0 and 1.1 of the Metadata API will be sunset after April 30, 2024, at which point tools that rely on Bookops-Worldcat 0.5 will no longer be able to query the Metadata API.

For more information on changes made in Version 1.0, see Features in Version 1.0 in the docs.

Changelog

Consult the Changelog page for a full list of fixes and enhancements for each version.

Bugs/Requests

Please use the GitHub issue tracker to submit bugs or request features.

Contributing

See Contribution Guidelines for information on how to contribute to bookops-worldcat.


bookops-worldcat's Issues

Fix handling of authorization errors on internal server errors

When the auth server has internal issues, it returns an HTTP 500 code and HTML (no JSON). Raising WorldcatAuthorizationError should provide the content of this response instead of trying to parse nonexistent JSON.
In authorize.WorldcatAccessToken._parse_server_response (line 179), replace response.json() with response.content.
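
One way to picture the fix is a helper that tries JSON first and falls back to the raw body. This is only a sketch under stated assumptions: describe_error_body is a hypothetical name, and the real method may be structured differently.

```python
import json


def describe_error_body(body: bytes) -> str:
    """Return a readable error detail from a response body (hypothetical helper)."""
    try:
        parsed = json.loads(body)
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError,
        # so an HTML error page ends up here and is returned as plain text.
        return body.decode("utf-8", errors="replace")
    return json.dumps(parsed)


html_detail = describe_error_body(b"<html><body>500</body></html>")
json_detail = describe_error_body(b'{"code": 401, "message": "bad key"}')
print(html_detail)
print(json_detail)
```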

README update

Update the README with watered-down info from the docs.

Update OCLC links

OCLC developer network pages have been redesigned - update links in README and documentation.

itemSubType issue

Apparently the API has a problem with the itemSubType parameter. Any request with this parameter returns a 500 error:
https://americas.metadata.api.oclc.org/worldcat/search/v1/brief-bibs/?q=ti:zendegi&itemType=book&itemSubType=digital
{
"type": "SYSTEM_ERROR",
"title": "Internal Server Error",
"detail": "There was a system error. Please try again later"
}
Requests without the parameter work. It appears to be a service issue. Brought this to OCLC's attention on 3/22/21. Awaiting their answer/fix.

Temporarily removed that parameter from tests.

timeout argument

It seems the session's timeout argument is not actually passed into session requests. At this point, timeout does not do anything.
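
A common pattern for this is to override the session's request method so that a session-level timeout is applied to every call. The sketch below uses a stand-in base class (BaseSession) in place of requests.Session so that it runs offline; TimeoutSession and the default of (5, 5) are assumptions for illustration, not the library's code.

```python
from typing import Any, Dict, Tuple, Union

Timeout = Union[int, float, Tuple[float, float], None]


class BaseSession:
    """Stand-in for requests.Session so the sketch runs without a network."""

    def request(self, method: str, url: str, **kwargs: Any) -> Dict[str, Any]:
        # A real session would send the request; this one echoes its arguments.
        return {"method": method, "url": url, **kwargs}


class TimeoutSession(BaseSession):
    """Applies a session-level timeout to every request unless overridden."""

    def __init__(self, timeout: Timeout = (5, 5)) -> None:
        self.timeout = timeout

    def request(self, method: str, url: str, **kwargs: Any) -> Dict[str, Any]:
        # setdefault keeps an explicit per-call timeout and fills in the rest.
        kwargs.setdefault("timeout", self.timeout)
        return super().request(method, url, **kwargs)


session = TimeoutSession(timeout=3)
sent = session.request("GET", "https://example.org")
overridden = session.request("GET", "https://example.org", timeout=10)
print(sent["timeout"], overridden["timeout"])
```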

Tests: fix exception msg assertions

Fix indentation in exception assertion errors.

Correct way to assert:
with pytest.raises(Exception) as exc:
    do_something()
assert "err msg" in str(exc.value)

SearchSession: enhance sru_query with available search types

Currently only "=" (keyword proximity search) is supported. Expand to:

  • Exact: provides a phrase search. The exact phrase will start and end with the exact terms of the phrase. This is also called an anchored phrase search.
  • =: provides a keyword proximity search. The proximity search uses WITH between terms, with no words separating them. This is sometimes called an unanchored phrase search.
  • All: provides a keyword search which gives results as if there were an AND between the search terms.
  • Any: provides a keyword search which gives results as if there were an OR between the search terms.
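
As a rough illustration of what supporting the four search types could look like, here is a sketch of a clause builder. build_cql_clause, the RELATIONS set, and the srw.ti index name are all assumptions made for this example; they are not the SearchSession API.

```python
# Relations named in this issue; anything else is rejected.
RELATIONS = {"exact", "=", "all", "any"}


def build_cql_clause(index: str, relation: str, term: str) -> str:
    """Build a single CQL clause such as: srw.ti exact "power broker"."""
    if relation not in RELATIONS:
        raise ValueError(f"Unsupported relation: {relation!r}")
    # The term is always double-quoted so multi-word phrases stay together.
    return f'{index} {relation} "{term}"'


exact_clause = build_cql_clause("srw.ti", "exact", "power broker")
any_clause = build_cql_clause("srw.ti", "any", "power broker")
print(exact_clause)
print(any_clause)
```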

Utilities

Add utilities that parse web services' responses and can be plugged easily into pymarc or a similar library to create records.

Live tests can only be run by BookOps contributors

Live keys are loaded from a Windows file path with a BookOps-specific file name. Possible fixes:

  • Change how API credentials are loaded in order to allow non-BookOps contributors, and/or
  • Add info about running live tests to contributors section of documentation?
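
One possible direction for the first option is to read credentials from environment variables instead of a fixed file path. This is a sketch only: the variable names (WORLDCAT_KEY, WORLDCAT_SECRET, WORLDCAT_SCOPES) and load_credentials are hypothetical and not used by the current test suite.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names chosen for this example.
REQUIRED_VARS = ("WORLDCAT_KEY", "WORLDCAT_SECRET", "WORLDCAT_SCOPES")


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect API credentials from a mapping such as os.environ."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "key": env["WORLDCAT_KEY"],
        "secret": env["WORLDCAT_SECRET"],
        "scopes": env["WORLDCAT_SCOPES"],
    }


# A plain dict stands in for the environment so the example is repeatable.
creds = load_credentials(
    {
        "WORLDCAT_KEY": "my_WSKey",
        "WORLDCAT_SECRET": "my_secret",
        "WORLDCAT_SCOPES": "WorldCatMetadataAPI",
    }
)
print(sorted(creds))
```

Live tests could then be skipped when load_credentials raises, which would let contributors without keys still run the rest of the suite.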

showHoldingsIndicators param

It appears that the showHoldingsIndicators parameter of the /brief-bibs/{oclcNumber} endpoint is not working.
Bring this to OCLC's attention and verify.

Trouble getting an access token from WorldcatAccessToken()

I'm having some trouble getting an access token from WorldcatAccessToken() to query the Worldcat Search API. I see that the constructor accepts a few different parameters. I understand what key and secret are, but I'm not sure what kind of values to use for scopes, principal_id, and principal_idns.

Is there a list of scopes I can refer to? And what should I use for principal_id and principal_idns?

Thank you very much for any help you can provide. I appreciate it.

Get current holdings sample code question

I find that the holdings_get_current sample code (https://bookops-cat.github.io/bookops-worldcat/latest/manage_holdings/) only returns the following error, unless the OCLC number is enclosed in quotes:

Traceback (most recent call last):
  File "C:\Users\tosaka\PycharmProjects\DataSync2\main.py", line 20, in <module>
    response = session.holdings_get_current(oclcNumbers=123456789)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tosaka\Anaconda3\envs\pythonProject\Lib\site-packages\bookops_worldcat\metadata_api.py", line 879, in holdings_get_current
    vetted_numbers = verify_oclc_numbers(oclcNumbers)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tosaka\Anaconda3\envs\pythonProject\Lib\site-packages\bookops_worldcat\utils.py", line 84, in verify_oclc_numbers
    raise InvalidOclcNumber(
bookops_worldcat.errors.InvalidOclcNumber: Argument 'oclcNumbers' must be a list or comma separated string of valid OCLC #s.
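
The error message suggests the argument is expected to be a list or a comma-separated string, so a bare integer is rejected. Until that is handled in the package, a caller-side workaround might look like the sketch below; to_oclc_number_list is a hypothetical helper, not part of bookops-worldcat.

```python
from typing import Iterable, List, Union


def to_oclc_number_list(
    numbers: Union[int, str, Iterable[Union[int, str]]]
) -> List[str]:
    """Coerce an int, a comma-separated string, or an iterable into a list of strings."""
    if isinstance(numbers, int):
        return [str(numbers)]
    if isinstance(numbers, str):
        # Split on commas and drop empty pieces left by stray separators.
        return [part.strip() for part in numbers.split(",") if part.strip()]
    return [str(item).strip() for item in numbers]


from_int = to_oclc_number_list(123456789)
from_str = to_oclc_number_list("123456789, 987654321")
from_mixed = to_oclc_number_list([123456789, "987654321"])
print(from_int, from_str, from_mixed)
```

The result could then be passed as oclcNumbers, assuming the method accepts a list of strings as the error message implies.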

(OCoLC) prefix handling

def prep_oclc_number_str(oclcNumber: str) -> str:
    """
    Checks for OCLC prefixes and removes them.

    Args:
        oclcNumber: OCLC record as string

    Returns:
        oclcNumber as str
    """
    if oclcNumber.strip().startswith("ocm") or oclcNumber.strip().startswith("ocn"):
        oclcNumber = oclcNumber.strip()[3:]
    elif oclcNumber.strip().startswith("on"):
        oclcNumber = oclcNumber.strip()[2:]
    try:
        oclcNumber = str(int(oclcNumber))
        return oclcNumber
    except ValueError:
        raise InvalidOclcNumber("Argument 'oclcNumber' does not look like real OCLC #.")

What do you think about adding functionality to remove the "(OCoLC)" prefix as it is recorded in the 035 MARC tag?
It may be more common to record the OCLC # in this tag than in a local control field (001).
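
A sketch of what that could look like, assuming 035 values such as "(OCoLC)1631862" or "(OCoLC)ocm01631862". strip_oclc_prefixes is a hypothetical name, and ValueError stands in for the package's InvalidOclcNumber so the snippet runs on its own.

```python
import re

# Optional "(OCoLC)" followed by an optional ocm/ocn/on prefix, anchored at the start.
_PREFIX = re.compile(r"^\s*(?:\(OCoLC\))?\s*(?:ocm|ocn|on)?\s*", re.IGNORECASE)


def strip_oclc_prefixes(oclc_number: str) -> str:
    """Remove known OCLC prefixes and leading zeros; raise ValueError otherwise."""
    digits = _PREFIX.sub("", oclc_number, count=1).strip()
    if not digits.isdigit():
        # ValueError stands in for InvalidOclcNumber in this sketch.
        raise ValueError("Argument 'oclc_number' does not look like real OCLC #.")
    # int() drops leading zeros, matching the behavior of the function above.
    return str(int(digits))


print(strip_oclc_prefixes("(OCoLC)1631862"))
print(strip_oclc_prefixes("(OCoLC)ocm01631862"))
```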

SearchSession.sru_query CQL syntax

CQL queries in sru_query could omit the "+" sign and use white space instead. When a URL is built, white space would be replaced with "+". Research how this would work for exact phrase searches enclosed in double quotes.
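
For reference, this is how the standard library encodes such a query: spaces become "+" and double quotes are percent-encoded. The query string and the srw.ti index name are only examples, and whether the service accepts the encoded quotes still needs to be checked against the live endpoint.

```python
from urllib.parse import quote_plus, urlencode

# Example query with an exact phrase in double quotes (illustrative only).
query = 'srw.ti exact "power broker"'

# quote_plus turns spaces into "+" and percent-encodes the quotes.
encoded = quote_plus(query)

# urlencode applies the same encoding when building a full query string.
query_string = urlencode({"query": query})

print(encoded)
print(query_string)
```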

WorldcatAccessToken default timeout's mixed info

https://github.com/BookOps-CAT/bookops-worldcat/blob/6a865bb91dbbcd6a4bea5773df113fd479cccf33/bookops_worldcat/authorize.py#L42C1-L43C70

I'm concerned about the somewhat mixed/confusing timeout defaults in the WorldcatAccessToken class. The docstring states that the timeout arg defaults to 3, while its type annotation uses Optional and gives the default value as None. This is somewhat exacerbated in the documentation, where the table displays the default value as None. I think an approach similar to the one in the _session module is clearer:

    ...
        timeout: Union[int, float, Tuple[int, int], Tuple[float, float], None] = (
            5,
            5,
        )
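
To illustrate the point, here is a sketch in which the default lives in the signature, so the docstring, the annotation, and generated documentation cannot drift apart. TokenSketch is a hypothetical stand-in, not the library's class, and the default of 3 is taken from the docstring value mentioned above.

```python
import inspect
from typing import Tuple, Union

Timeout = Union[int, float, Tuple[int, int], Tuple[float, float], None]


class TokenSketch:
    """Hypothetical stand-in for a token class; not the library's code."""

    def __init__(self, key: str, secret: str, timeout: Timeout = 3) -> None:
        """
        Args:
            key: API key
            secret: API secret
            timeout: seconds to wait for the server; defaults to 3
        """
        self.key = key
        self.secret = secret
        self.timeout = timeout


# The default is discoverable from the signature itself.
default_timeout = inspect.signature(TokenSketch.__init__).parameters["timeout"].default
print(default_timeout)
```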

Camel case in methods' arguments

For clarity, use camel case for arguments passed to methods. This will be consistent with the OCLC APIs' approach and will be more intuitive to users.

tokens don't get refreshed before a request is made

In cases where requests are made in very quick succession (e.g., looping over a list), the delay between the method's check for an expired token and the actual request being sent/received is enough for the token to expire, which causes the request to fail.
The is_expired method should include at least 1 second of padding.
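
The padding idea can be sketched as follows. This is a standalone function written for illustration, assuming a timezone-aware expiration time; the real is_expired method on the token may be implemented differently.

```python
import datetime
from typing import Optional

# Treat the token as expired this long before its actual expiration time.
PADDING = datetime.timedelta(seconds=1)


def is_expired(
    expires_at: datetime.datetime, now: Optional[datetime.datetime] = None
) -> bool:
    """Return True if the token is expired or will expire within PADDING."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return expires_at - PADDING <= now


expires = datetime.datetime(2024, 1, 1, 12, 0, 0, tzinfo=datetime.timezone.utc)
two_seconds_left = expires - datetime.timedelta(seconds=2)
half_second_left = expires - datetime.timedelta(milliseconds=500)
print(is_expired(expires, now=two_seconds_left))
print(is_expired(expires, now=half_second_left))
```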

Live service tests

Run selected, lightweight tests against live OCLC web services. Set up a cron job in Travis to run these tests once a week (month?) to validate and ensure no changes to the service were introduced.

SearchSession sru/cql query more in depth syntax validation

Provide deeper validation of syntax of SearchSession method for sru/cql queries.

  • catch query that is limits only
  • syntax errors
  • conflicts between Worldcat index and service_level argument (some work only when service level is set to 'full')

Use ValueError or custom exception?

Changes to /bibs-summary-holdings endpoint

The /bibs-summary-holdings endpoint documentation no longer shows parameters for offset and limit.
Check if they are no longer accepted by the endpoint and, if so, remove them from search_general_holdings().

add tox to test multiple configurations

While basic Python versions are tested by Travis, that is not the case for different dependency configurations. This will become more and more important with time. tox will work well with poetry instead of setup.py (reminder: review the dependencies config in the pyproject.toml file and make sure no unbound dependencies are defined).
This new setup will change the workflow for publishing to PyPI (update the release-checklist doc).

see for reference:
https://wrongsideofmemphis.com/2018/10/28/package-and-deploy-a-python-module-in-pypi-with-poetry-tox-and-travis/
https://python-poetry.org/docs/faq/#is-tox-supported
