
pymods's Issues

Pass up @valueURI for subject/name

Should a valueURI found on a child of subject (subject/*/@valueURI) be passed up to subject/@valueURI when the subject element has none of its own?

Example:

<subject>
  <name authority="lcnaf" 
              authorityURI="http://id.loc.gov/authorities/names" 
              type="personal" 
              valueURI="http://id.loc.gov/authorities/names/n79056767">
    <namePart type="date">1877-1956</namePart>
    <namePart type="given">Alben William</namePart>
    <namePart type="family">Barkley</namePart>
  </name>
</subject>

returns:

{ children: [ { type: "name", valueURI: "http://id.loc.gov/authorities/names/n79056767", ... }, ... ], ... }

when it might be more useful to return:

{ valueURI: "http://id.loc.gov/authorities/names/n79056767", children: [ ... ], ... }
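
One possible shape for that fallback, as a minimal sketch only: it uses the standard library's ElementTree instead of pymods, and the helper name subject_value_uri is hypothetical, not part of the library.

```python
import xml.etree.ElementTree as ET


def subject_value_uri(subject):
    """Return subject@valueURI, falling back to the first child that has one."""
    uri = subject.get("valueURI")
    if uri:
        return uri
    for child in subject:
        uri = child.get("valueURI")
        if uri:
            return uri
    return None


# Trimmed copy of the example above (no namespace, for brevity).
subject = ET.fromstring(
    '<subject>'
    '<name authority="lcnaf" type="personal" '
    'valueURI="http://id.loc.gov/authorities/names/n79056767">'
    '<namePart type="family">Barkley</namePart>'
    '</name>'
    '</subject>'
)
uri = subject_value_uri(subject)
```

With this approach a valueURI set directly on subject always wins, so existing records would keep their current value.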

Use MODSReader on string objects

Hello!

I'm trying to apply MODSReader not to an XML file (as in the examples provided) but to requests.get responses. I've tried turning the XML string into a file-like object using io.StringIO (which would be the usual way to deal with this in etree, I guess), but I'm getting a ValueError:

  File "mods_parse.py", line 6, in <module>
    MODSReader(io.StringIO(request_opac("pica.sys=j2017").text))
  File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 58, in __init__
    super(MODSReader, self).__init__(file_location, '{0}mods'.format(NAMESPACES['mods']), parser=mods_parser)
  File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 27, in __init__
    self.iterator = parse(file_location, parser=parser).iter(iter_elem)
  File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 8, in parse
    return etree.parse(source, parser=parser)
  File "src/lxml/etree.pyx", line 3469, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1856, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1871, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

Could you suggest a way to pipe the XML string directly into the parser?

Thank you very much!
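
A possible workaround, sketched under assumptions: the traceback shows MODSReader handing its first argument to lxml's etree.parse, and lxml accepts binary file-like objects, so wrapping the raw bytes in io.BytesIO may be enough. The helper as_bytes_source is hypothetical, and the MODSReader call appears only in comments because it has not been tested against pymods here.

```python
import io


def as_bytes_source(xml, encoding="utf-8"):
    """Wrap XML text or bytes in a binary file-like object.

    lxml rejects str input that carries an encoding declaration,
    so the document is handed over as bytes instead.
    """
    if isinstance(xml, str):
        # Assumes the encoding declared in the XML matches `encoding`.
        xml = xml.encode(encoding)
    return io.BytesIO(xml)


# Hypothetical usage; `response` would be a requests.Response:
#   source = as_bytes_source(response.content)  # raw bytes, not response.text
#   record = next(pymods.MODSReader(source))
sample = '<?xml version="1.0" encoding="UTF-8"?><mods xmlns="http://www.loc.gov/mods/v3"/>'
source = as_bytes_source(sample)
```

Using response.content instead of response.text avoids re-encoding the document, so the bytes stay consistent with whatever encoding the XML declaration names.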

Extend instance vars for more authority types

RDA content: <genre authority="rdacontent">text</genre>
RDA media: <form authority="rdamedia" type="RDA media terms">computer</form>
RDA carrier: <form authority="rdacarrier" type="RDA carrier terms">online resource</form>
COAR resource type: <genre authority="coar" authorityURI="http://purl.org/coar/resource_type" valueURI="http://purl.org/coar/resource_type/">bachelor thesis</genre>

Element comparison errors when sorting NamedTuples

Some pymods.Record functions return sorted lists of NamedTuples that include lxml elements. These raise a comparison error when an element is repeated.

Given the structure:

<name>
  <namePart>Cash</namePart>
</name>
<name>
  <namePart>Cash</namePart>
</name>

pymods will create the tuples:
Name('Cash', '', '', '', '', '', lxml.Element pointer A)
Name('Cash', '', '', '', '', '', lxml.Element pointer B)

The only tuple field that differs is the element pointer, so the sort falls through to comparing it, but ordering comparisons between lxml elements are not supported:

TypeError: '<' not supported between instances of 'lxml.etree._Element' and 'lxml.etree._Element'

There's really no reason for any pymods.Record function or property to return a sorted list. Removing the sort() calls would easily fix this.
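
If sorted output is still wanted, an alternative is to sort on the text fields only. The sketch below reproduces the failure without lxml: Name is a hypothetical two-field simplification of the pymods tuple, and FakeElement is a stand-in for lxml.etree._Element, which likewise defines no ordering.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for the pymods Name tuple.
Name = namedtuple("Name", ["text", "elem"])


class FakeElement:
    """Stand-in for an lxml element: supports identity, but no ordering."""


names = [
    Name("Cash", FakeElement()),
    Name("Cash", FakeElement()),
    Name("Adams", FakeElement()),
]

# A plain sort compares whole tuples, so equal text falls through to the element.
try:
    sorted(names)
    plain_sort_failed = False
except TypeError:
    plain_sort_failed = True

# Sorting on the text field alone never touches the element.
by_text = sorted(names, key=lambda n: n.text)
```

Because Python's sort is stable, repeated names keep their document order relative to each other.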

Single records must be called with an iterator

In [2]: rec = pymods.MODSReader('FSU_ARHHouse_1018.xml')

In [3]: rec.
rec.close            rec.makeelement
rec.copy             rec.resolvers
rec.error_log        rec.setElementClassLookup
rec.feed             rec.set_element_class_lookup
rec.feed_error_log   rec.target
rec.iterator         rec.version

In [3]: mods = next(rec)

In [4]: mods.tag
Out[4]: '{http://www.loc.gov/mods/v3}mods'
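
Until the reader offers something more direct, a small wrapper can hide the next() call. This is a sketch only: first_record is a hypothetical helper, and it relies on nothing beyond the reader supporting next(), as the session above shows.

```python
def first_record(reader, default=None):
    """Return the first record from a reader, or `default` if it is empty."""
    return next(reader, default)


# Hypothetical usage with pymods:
#   mods = first_record(pymods.MODSReader('FSU_ARHHouse_1018.xml'))
# Demonstrated here with plain iterators standing in for the reader.
only = first_record(iter(["record-1"]))
missing = first_record(iter([]))
```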

Empty elements throwing TypeErrors

For example:

        <subject>
          <name authority="lcnaf" authorityURI="http://id.loc.gov/authorities/names" type="personal" valueURI="http://id.loc.gov/authorities/names/n85385724">
            <namePart type="date"/>
            <namePart type="given">Benjamin V.</namePart>
            <namePart type="family">Cohen</namePart>
          </name>
        </subject>

returns
KeyError - sourceResource.subject: None, Can't convert 'NoneType' object to str implicitly
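
The likely cause is that an empty element's text is None, which cannot be concatenated with a str. A minimal sketch of a guard, using the standard library's ElementTree instead of pymods (lxml reports empty text the same way); the join format is an assumption, not the library's actual output:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example above (no namespace, for brevity).
name = ET.fromstring(
    '<name type="personal">'
    '<namePart type="date"/>'
    '<namePart type="given">Benjamin V.</namePart>'
    '<namePart type="family">Cohen</namePart>'
    '</name>'
)

# The empty <namePart type="date"/> has no text at all.
empty_text = name.find("namePart").text

# Skip parts whose text is None or empty before joining.
parts = [part.text for part in name.findall("namePart") if part.text]
joined = ", ".join(parts)
```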
