mrmiguez / pymods



process MODS records from Python

Home Page: https://pypi.python.org/pypi/pymods

License: MIT License

Python 100.00%

mods xml oai metadata o

pymods's Issues

URL search in record.collection can return an AttributeError

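The check if related_item.find('./{0}location/{0}url'.format(NAMESPACES['mods'])).text is not None: (line 79) should test for elem.text only after testing for elem. find() returns None when there is no matching element, and reading .text on None raises the AttributeError.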
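A minimal sketch of the guarded lookup, using the standard library's xml.etree.ElementTree as a stand-in for lxml (find() behaves the same way for this path). NAMESPACES and related_item_url are hypothetical illustrations, not pymods' actual code; NAMESPACES['mods'] is assumed to hold the Clark-notation prefix '{http://www.loc.gov/mods/v3}'.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for pymods' NAMESPACES mapping (assumed Clark-notation prefix).
NAMESPACES = {'mods': '{http://www.loc.gov/mods/v3}'}


def related_item_url(related_item):
    """Return relatedItem/location/url text, or None when the element or its text is missing."""
    url = related_item.find('./{0}location/{0}url'.format(NAMESPACES['mods']))
    # Check the element first: find() returns None when nothing matches,
    # and reading None.text is what raises the AttributeError.
    if url is not None and url.text is not None:
        return url.text
    return None


with_url = ET.fromstring(
    '<relatedItem xmlns="http://www.loc.gov/mods/v3">'
    '<location><url>http://example.org/collection</url></location>'
    '</relatedItem>')
without_url = ET.fromstring('<relatedItem xmlns="http://www.loc.gov/mods/v3"/>')

print(related_item_url(with_url))     # http://example.org/collection
print(related_item_url(without_url))  # None
```

The explicit "is not None" test matters: in both ElementTree and lxml an element with no children is falsy, so "if url:" would wrongly skip a <url> that has text.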

names.role should act more like languages

That is, roles=Role(text, code, authority) in a single tuple, rather than a list of Role(text, type, ...) tuples.
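One way the proposed shape could look, sketched with xml.etree.ElementTree: a single Role(text, code, authority) is built per <role>, merging its type="text" and type="code" roleTerms. The Role fields follow the issue; parse_role, and treating untyped roleTerms as text, are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

MODS = '{http://www.loc.gov/mods/v3}'

# Shape proposed in the issue: one tuple per <role>, not one per <roleTerm>.
Role = namedtuple('Role', ['text', 'code', 'authority'])


def parse_role(role_elem):
    """Merge the text and code roleTerms of one <mods:role> into a single Role (hypothetical helper)."""
    text = code = authority = None
    for term in role_elem.findall(MODS + 'roleTerm'):
        value = (term.text or '').strip() or None
        if term.get('type') == 'code':
            code = value
        else:
            # Assumption: untyped roleTerms are treated as text.
            text = value
        # Keep the first authority found on any roleTerm.
        authority = authority or term.get('authority')
    return Role(text, code, authority)


role = ET.fromstring(
    '<role xmlns="http://www.loc.gov/mods/v3">'
    '<roleTerm type="text" authority="marcrelator">creator</roleTerm>'
    '<roleTerm type="code" authority="marcrelator">cre</roleTerm>'
    '</role>')
print(parse_role(role))  # Role(text='creator', code='cre', authority='marcrelator')
```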

DCRecord strip whitespace from delimited returns

Like so (lines 646-648):

        return [split_text.strip()
                for item in self.findall('{0}'.format(elem))
                for split_text in item.text.split(delimiter)]
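A standalone sketch of the same idea as a plain function over an ElementTree element, rather than a DCRecord method. The signature get_element(root, elem, delimiter) is hypothetical; it also skips elements with no text (a None .text would break item.text.split) and drops pieces that are empty after stripping, which goes slightly beyond the issue.

```python
import xml.etree.ElementTree as ET

DC = '{http://purl.org/dc/elements/1.1/}'


def get_element(root, elem, delimiter=None):
    """Return the whitespace-stripped text of every `elem` under `root`.

    If `delimiter` is given, each text is split on it and every piece is stripped.
    """
    values = []
    for item in root.findall(elem):
        if item.text is None:  # empty element: nothing to split
            continue
        pieces = item.text.split(delimiter) if delimiter else [item.text]
        # Strip each piece and drop pieces that are empty after stripping.
        values.extend(p.strip() for p in pieces if p.strip())
    return values


record = ET.fromstring(
    '<dc xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:subject>Cash, Johnny ; Carter, June</dc:subject>'
    '<dc:subject/>'
    '</dc>')
print(get_element(record, DC + 'subject', ';'))  # ['Cash, Johnny', 'Carter, June']
```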

OAIRecord.metadata.get_element needs to strip whitespace

Call .strip() on list items before they are appended and returned.

lxml.etree persistence

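This is a known limitation of custom element classes in lxml: element proxies are created and garbage-collected as needed, so Python-side state stored on them does not persist (http://lxml.de/element_classes.html#element-initialization).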

An internal container needs to be built.

Pass up @valueURI for subject/name

Should valueURIs from subject/elem/@valueURI be passed up to subject/@valueURI when the subject doesn't have one?

Example:

<subject>
  <name authority="lcnaf" 
              authorityURI="http://id.loc.gov/authorities/names" 
              type="personal" 
              valueURI="http://id.loc.gov/authorities/names/n79056767">
    <namePart type="date">1877-1956</namePart>
    <namePart type="given">Alben William</namePart>
    <namePart type="family">Barkley</namePart>
  </name>
</subject>

returns:

{ 'children': [ { 'type': 'name', 'valueURI': 'http://id.loc.gov/authorities/names/n79056767', ... }, ... ], ... }

when it might be more useful to return:

{ 'valueURI': 'http://id.loc.gov/authorities/names/n79056767', 'children': [...], ... }
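A sketch of the second shape using xml.etree.ElementTree. subject_to_dict is a hypothetical helper, and the rule of passing a URI up only when exactly one child carries one is an assumption, not something the issue settles.

```python
import xml.etree.ElementTree as ET

MODS = '{http://www.loc.gov/mods/v3}'


def subject_to_dict(subject):
    """Describe a <mods:subject>, passing a child's @valueURI up when the subject lacks one."""
    children = [
        {'type': child.tag.replace(MODS, ''), 'valueURI': child.get('valueURI')}
        for child in subject
    ]
    value_uri = subject.get('valueURI')
    if value_uri is None:
        child_uris = [c['valueURI'] for c in children if c['valueURI']]
        # Only pass up an unambiguous URI: several URI-bearing children
        # (e.g. a topic and a geographic) would each describe something different.
        if len(child_uris) == 1:
            value_uri = child_uris[0]
    return {'valueURI': value_uri, 'children': children}


subject = ET.fromstring(
    '<subject xmlns="http://www.loc.gov/mods/v3">'
    '<name authority="lcnaf" type="personal" '
    'valueURI="http://id.loc.gov/authorities/names/n79056767">'
    '<namePart type="family">Barkley</namePart>'
    '</name>'
    '</subject>')
print(subject_to_dict(subject)['valueURI'])
# http://id.loc.gov/authorities/names/n79056767
```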

Use MODSReader on string objects

Hello!

I'm trying to apply MODSReader not to an XML file (as in the provided examples) but to requests.get responses. I've tried turning the XML string into a file-like object with io.StringIO (which, I guess, would be the usual way to handle this with etree), but I get a ValueError:

  File "mods_parse.py", line 6, in <module>
    MODSReader(io.StringIO(request_opac("pica.sys=j2017").text))
  File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 58, in __init__
    super(MODSReader, self).__init__(file_location, '{0}mods'.format(NAMESPACES['mods']), parser=mods_parser)
  File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 27, in __init__
    self.iterator = parse(file_location, parser=parser).iter(iter_elem)
  File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 8, in parse
    return etree.parse(source, parser=parser)
  File "src/lxml/etree.pyx", line 3469, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1856, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1871, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

Could you suggest a way to pipe the XML string directly into the parser?

Thank you very much!
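The ValueError comes from lxml refusing a str that carries an encoding declaration; handing the parser bytes avoids it. A sketch, assuming the response comes from requests (whose response.content holds the raw bytes): wrap those bytes in io.BytesIO. Since the traceback shows MODSReader passing its argument straight to lxml's etree.parse, which accepts binary file-like objects, MODSReader(io.BytesIO(response.content)) should work, though that is untested here. The as_xml_source helper is hypothetical, and the demo parses with xml.etree.ElementTree so it runs without lxml or pymods.

```python
import io
import xml.etree.ElementTree as ET


def as_xml_source(payload):
    """Wrap an XML payload as a binary file-like object for etree.parse().

    Raw bytes (e.g. requests' response.content) are preferred, because the parser
    then honours the document's own encoding declaration. A str is encoded as
    UTF-8, which is only correct if the declaration also says UTF-8.
    """
    if isinstance(payload, str):
        payload = payload.encode('utf-8')
    return io.BytesIO(payload)


xml_bytes = (b'<?xml version="1.0" encoding="UTF-8"?>'
             b'<modsCollection xmlns="http://www.loc.gov/mods/v3"><mods/></modsCollection>')

tree = ET.parse(as_xml_source(xml_bytes))
print(tree.getroot().tag)  # {http://www.loc.gov/mods/v3}modsCollection
```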

Extend instance vars for more authority types

RDA content: <genre authority="rdacontent">text</genre>
RDA media: <form authority="rdamedia" type="RDA media terms">computer</form>
RDA carrier: <form authority="rdacarrier" type="RDA carrier terms">online resource</form>
COAR resource type: <genre authority="coar" authorityURI="http://purl.org/coar/resource_type" valueURI="http://purl.org/coar/resource_type/">bachelor thesis</genre>
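One possible shape for this, sketched with xml.etree.ElementTree: a lookup table keyed by the authority types above, and one generic accessor instead of a new instance variable per type. AUTHORITY_TYPES, its keys, and terms_for are hypothetical names, not pymods attributes.

```python
import xml.etree.ElementTree as ET

MODS = '{http://www.loc.gov/mods/v3}'

# Hypothetical table of the authority types listed above: key -> (element, @authority).
AUTHORITY_TYPES = {
    'rda_content': ('genre', 'rdacontent'),
    'rda_media': ('form', 'rdamedia'),
    'rda_carrier': ('form', 'rdacarrier'),
    'coar_resource_type': ('genre', 'coar'),
}


def terms_for(record, kind):
    """Return the stripped text of every matching element whose @authority fits `kind`."""
    tag, authority = AUTHORITY_TYPES[kind]
    return [el.text.strip()
            for el in record.iter(MODS + tag)
            if el.get('authority') == authority and el.text]


record = ET.fromstring(
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    '<genre authority="rdacontent">text</genre>'
    '<form authority="rdamedia" type="RDA media terms">computer</form>'
    '<form authority="rdacarrier" type="RDA carrier terms">online resource</form>'
    '</mods>')
print(terms_for(record, 'rda_carrier'))  # ['online resource']
```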

Element comparison errors when sorting NamedTuples

Some pymods.Record functions that return sorted lists of NamedTuples containing lxml elements raise a comparison error when an element is repeated.

Given the structure:

<name>
  <namePart>Cash</namePart>
</name>
<name>
  <namePart>Cash</namePart>
</name>

pymods will create the tuples:
Name('Cash', '', '', '', '', '', lxml.Element pointer A)
Name('Cash', '', '', '', '', '', lxml.Element pointer B)

The only unique tuple field for sorting is the element pointer, but comparisons between lxml.Elements are not allowed:

TypeError: '<' not supported between instances of 'lxml.etree._Element' and 'lxml.etree._Element'

Really, there's no reason for any pymods.Record function or property to return a sorted list; removing the sort() calls would easily fix this.
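A small reproduction with xml.etree.ElementTree, whose elements likewise define no ordering, plus the two ways out: keep document order (the issue's proposal), or sort on a key that leaves the element out. The two-field Name tuple is a trimmed, hypothetical stand-in for pymods' Name.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical, trimmed-down stand-in for pymods' Name tuple: text plus source element.
Name = namedtuple('Name', ['text', 'elem'])

a = Name('Cash', ET.Element('name'))
b = Name('Cash', ET.Element('name'))

try:
    # The texts are equal, so tuple comparison falls through to the elements.
    sorted([a, b])
    sort_failed = False
except TypeError as err:
    sort_failed = True
    print(err)  # '<' not supported between instances of ...

# Option 1 (the issue's proposal): return names in document order, unsorted.
names = [a, b]

# Option 2: if an ordering is wanted, sort on a key that excludes the element.
# sorted() is stable, so equal texts keep their input order.
ordered = sorted([b, a], key=lambda n: n.text)
```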

Single records must be called with an iterator

In [2]: rec = pymods.MODSReader('FSU_ARHHouse_1018.xml')

In [3]: rec.
rec.close              rec.makeelement
rec.copy               rec.resolvers
rec.error_log          rec.setElementClassLookup
rec.feed               rec.set_element_class_lookup
rec.feed_error_log     rec.target
rec.iterator           rec.version

In [3]: mods = next(rec)

In [4]: mods.tag
Out[4]: '{http://www.loc.gov/mods/v3}mods'

Typo in pymods.DCRecord

line 647, in get_element (self.finall should be self.findall):
for item in self.finall('{0}'.format(elem))

Empty elements throwing TypeErrors

For example:

        <subject>
          <name authority="lcnaf" authorityURI="http://id.loc.gov/authorities/names" type="personal" valueURI="http://id.loc.gov/authorities/names/n85385724">
            <namePart type="date"/>
            <namePart type="given">Benjamin V.</namePart>
            <namePart type="family">Cohen</namePart>
          </name>
        </subject>

returns
KeyError - sourceResource.subject: None, Can't convert 'NoneType' object to str implicitly
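A sketch of a None-safe join using xml.etree.ElementTree: empty or whitespace-only nameParts are skipped before any string concatenation. name_text and the family/given/termsOfAddress/date display order are assumptions for illustration, not pymods' behavior.

```python
import xml.etree.ElementTree as ET

MODS = '{http://www.loc.gov/mods/v3}'

# Assumed display order for typed nameParts; other parts follow in document order.
TYPE_ORDER = ['family', 'given', 'termsOfAddress', 'date']


def name_text(name_elem):
    """Join the non-empty nameParts of a <mods:name> with ', ' (hypothetical helper)."""
    parts = []
    for part in name_elem.findall(MODS + 'namePart'):
        # <namePart type="date"/> has .text == None; joining that None into a str
        # is the kind of operation that raises the TypeError above.
        text = (part.text or '').strip()
        if text:
            parts.append((part.get('type'), text))
    # Stable sort on the type's position only, so texts are never compared.
    parts.sort(key=lambda p: TYPE_ORDER.index(p[0]) if p[0] in TYPE_ORDER else len(TYPE_ORDER))
    return ', '.join(text for _, text in parts)


name = ET.fromstring(
    '<name xmlns="http://www.loc.gov/mods/v3" type="personal">'
    '<namePart type="date"/>'
    '<namePart type="given">Benjamin V.</namePart>'
    '<namePart type="family">Cohen</namePart>'
    '</name>')
print(name_text(name))  # Cohen, Benjamin V.
```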

VRA Core support

Add support for VRA Core schema and elements: https://www.loc.gov/standards/vracore/

DCRecord needs tests
