mrmiguez / pymods
process MODS records from Python
Home Page: https://pypi.python.org/pypi/pymods
License: MIT License
if related_item.find('./{0}location/{0}url'.format(NAMESPACES['mods'])).text is not None:
(line 79) should test for elem.text only after testing for elem, since find() returns None when nothing matches.
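A minimal sketch of the two-step check described above. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml (both return None from find() when there is no match); the helper name related_url and the sample records are hypothetical and not part of pymods.

```python
import xml.etree.ElementTree as ET

# Namespace prefix in the "{uri}" form that find() expects.
MODS = "{http://www.loc.gov/mods/v3}"


def related_url(related_item):
    """Return the text of location/url, or None if either the element or its text is missing."""
    url = related_item.find("./{0}location/{0}url".format(MODS))
    # Test the element first: calling .text on None raises AttributeError.
    if url is not None and url.text is not None:
        return url.text
    return None


# Hypothetical sample records: one with a URL, one without a location at all.
with_url = ET.fromstring(
    '<relatedItem xmlns="http://www.loc.gov/mods/v3">'
    "<location><url>http://example.org/x</url></location>"
    "</relatedItem>"
)
without_url = ET.fromstring('<relatedItem xmlns="http://www.loc.gov/mods/v3"/>')

found = related_url(with_url)
missing = related_url(without_url)
```

The explicit "is not None" comparison matters here, because an element without children evaluates as false in a plain truth test.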
roles=Role(text, code, authority) in the same tuple rather than a list of Role(text, type, ...) tuples.
Like so (lines 646-648):
return [split_text.strip()
        for item in self.findall('{0}'.format(elem))
        for split_text in item.text.split(delimiter)]
.strip()
list items before they are appended and returned.
This is a known issue with custom parsers: http://lxml.de/element_classes.html#element-initialization.
An internal container needs to be built.
Should a valueURI on subject/elem@valueURI pass up to subject@valueURI when the subject has none of its own?
Example:
<subject>
<name authority="lcnaf"
authorityURI="http://id.loc.gov/authorities/names"
type="personal"
valueURI="http://id.loc.gov/authorities/names/n79056767">
<namePart type="date">1877-1956</namePart>
<namePart type="given">Alben William</namePart>
<namePart type="family">Barkley</namePart>
</name>
</subject>
returns:
{ children: [ { type: "name", valueURI: "http://id.loc.gov/authorities/names/n79056767", ... }, ... ], ... }
when it might be more useful to return:
{ valueURI: "http://id.loc.gov/authorities/names/n79056767", children: [ ... ], ... }
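One possible answer to the question, sketched under assumptions: the subject keeps its own valueURI when it has one, and otherwise inherits a child's valueURI only when exactly one child carries one. The function name subject_value_uri and the single-child rule are illustrative choices, not pymods behavior, and the example uses the standard library's ElementTree rather than lxml.

```python
import xml.etree.ElementTree as ET


def subject_value_uri(subject):
    """Return the subject's valueURI, falling back to a lone child's valueURI."""
    own = subject.get("valueURI")
    if own:
        return own
    # Collect valueURIs from direct children only.
    child_uris = [child.get("valueURI") for child in subject if child.get("valueURI")]
    # Promote only when unambiguous; several URIs would describe different parts.
    return child_uris[0] if len(child_uris) == 1 else None


subject = ET.fromstring(
    '<subject xmlns="http://www.loc.gov/mods/v3">'
    '<name authority="lcnaf" type="personal" '
    'valueURI="http://id.loc.gov/authorities/names/n79056767">'
    '<namePart type="family">Barkley</namePart>'
    "</name>"
    "</subject>"
)

promoted = subject_value_uri(subject)
```

Restricting promotion to the single-child case avoids guessing which URI represents a compound subject.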
Hello!
I'm trying to apply the MODSReader not to an XML file (as in the examples provided) but to requests.get responses. I've tried transforming the XML string into a file-like object using io.StringIO (which would be the usual way to deal with the issue in etree, I guess), but I'm getting a ValueError:
File "mods_parse.py", line 6, in <module>
MODSReader(io.StringIO(request_opac("pica.sys=j2017").text))
File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 58, in __init__
super(MODSReader, self).__init__(file_location, '{0}mods'.format(NAMESPACES['mods']), parser=mods_parser)
File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 27, in __init__
self.iterator = parse(file_location, parser=parser).iter(iter_elem)
File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 8, in parse
return etree.parse(source, parser=parser)
File "src/lxml/etree.pyx", line 3469, in lxml.etree.parse
File "src/lxml/parser.pxi", line 1856, in lxml.etree._parseDocument
File "src/lxml/parser.pxi", line 1871, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
Could you suggest a way to pipe the XML string directly into the parser?
Thank you very much!
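The error message itself points at a likely workaround: hand the parser bytes instead of a str. A minimal sketch follows, assuming the XML is wrapped in io.BytesIO rather than io.StringIO. To stay self-contained it parses with the standard library's ElementTree and uses a hypothetical response body; whether MODSReader accepts the same stream is untested here, although the traceback shows it passing its argument straight to etree.parse, which accepts file-like objects.

```python
import io
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical body, standing in for the .text of a requests.get response.
xml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<modsCollection xmlns="http://www.loc.gov/mods/v3">'
    "<mods><titleInfo><title>Example</title></titleInfo></mods>"
    "</modsCollection>"
)


def as_bytes_stream(text, encoding="utf-8"):
    """Encode a str and wrap it in a binary file-like object."""
    return io.BytesIO(text.encode(encoding))


stream = as_bytes_stream(xml_text)
tree = ET.parse(stream)
titles = [t.text for t in tree.iter("{%s}title" % MODS_NS)]

# Assumption, untested: the pymods equivalent would be
#   MODSReader(io.BytesIO(response.content))
# where response.content is the raw bytes of the requests response.
```

With requests, response.content already holds the undecoded bytes, so no re-encoding step should be needed in that case.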
RDA content: <genre authority="rdacontent">text</genre>
RDA media: <form authority="rdamedia" type="RDA media terms">computer</form>
RDA carrier: <form authority="rdacarrier" type="RDA carrier terms">online resource</form>
COAR resource type: <genre authority="coar" authorityURI="http://purl.org/coar/resource_type" valueURI="http://purl.org/coar/resource_type/">bachelor thesis</genre>
Some pymods.Record functions that return sorted lists of NamedTuples containing lxml elements raise a comparison error when an element is repeated.
Given the structure:
<name>
<namePart>Cash</namePart>
</name>
<name>
<namePart>Cash</namePart>
</name>
pymods will create the tuples:
Name('Cash', '', '', '', '', '', lxml.Element pointer A)
Name('Cash', '', '', '', '', '', lxml.Element pointer B)
The only tuple field that differs, and so the only one left to sort on, is the element pointer, but comparisons between lxml.Elements are not allowed:
TypeError: '<' not supported between instances of 'lxml.etree._Element' and 'lxml.etree._Element'
Really there's no reason for any pymods.Record function or property to return a sorted list. Removing the sort() calls would easily fix this.
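The failure can be reproduced without pymods. The sketch below uses a simplified, hypothetical three-field Name tuple and the standard library's ElementTree, whose elements are also unorderable. It shows the plain sort failing, and sorting on the text fields alone as an alternative where ordering is still wanted.

```python
from collections import namedtuple
import xml.etree.ElementTree as ET

# Simplified stand-in for the pymods Name tuple (the real one has more fields).
Name = namedtuple("Name", ["text", "type", "elem"])

root = ET.fromstring(
    "<mods>"
    "<name><namePart>Cash</namePart></name>"
    "<name><namePart>Cash</namePart></name>"
    "</mods>"
)
names = [Name(n.findtext("namePart"), "", n) for n in root.findall("name")]

# Tuple comparison falls through the equal text fields to the elements,
# which do not define an ordering.
try:
    sorted(names)
    sort_failed = False
except TypeError:
    sort_failed = True

# Alternative: order by the comparable fields only; ties keep document order
# because Python's sort is stable.
by_text = sorted(names, key=lambda n: (n.text, n.type))
```

Dropping the sort entirely, as suggested above, would simply return names in document order.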
In [2]: rec = pymods.MODSReader('FSU_ARHHouse_1018.xml')
In [3]: rec.
rec.close rec.makeelement
rec.copy rec.resolvers
rec.error_log rec.setElementClassLookup
rec.feed rec.set_element_class_lookup
rec.feed_error_log rec.target
rec.iterator rec.version
In [3]: mods = next(rec)
In [4]: mods.tag
Out[4]: '{http://www.loc.gov/mods/v3}mods'
line 647, in get_element
for item in self.finall('{0}'.format(elem))
For example
<subject>
<name authority="lcnaf" authorityURI="http://id.loc.gov/authorities/names" type="personal" valueURI="http://id.loc.gov/authorities/names/n85385724">
<namePart type="date"/>
<namePart type="given">Benjamin V.</namePart>
<namePart type="family">Cohen</namePart>
</name>
</subject>
returns
KeyError - sourceResource.subject: None, Can't convert 'NoneType' object to str implicitly
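The empty namePart has a text value of None, which is a plausible source of the NoneType error. A minimal sketch of a split-and-strip helper that skips such elements, using the standard library's ElementTree in place of lxml; the function name text_parts and the default delimiter are hypothetical, not the pymods implementation.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"


def text_parts(parent, tag, delimiter=";"):
    """Split each matching element's text on delimiter, skipping empty elements."""
    return [part.strip()
            for item in parent.findall(tag)
            if item.text                      # None for <namePart type="date"/>
            for part in item.text.split(delimiter)
            if part.strip()]                  # drop whitespace-only fragments


name = ET.fromstring(
    '<name xmlns="http://www.loc.gov/mods/v3" type="personal">'
    '<namePart type="date"/>'
    '<namePart type="given">Benjamin V.</namePart>'
    '<namePart type="family">Cohen</namePart>'
    "</name>"
)

parts = text_parts(name, MODS + "namePart")
```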
Add support for VRA Core schema and elements: https://www.loc.gov/standards/vracore/