
Comments (6)

iherman commented on June 8, 2024

Tim,

I am on vacation right now, so I cannot really look at it for another two weeks. However... I suspect I know the answer. As has been discussed on the core
RDFLib mailing list, the latest version of HTML5Lib has a bug in handling Unicode characters. Unfortunately, until the HTML5Lib people fix that, we have
to rely on an earlier version (I think it was 0.95) that worked without problems.

I will have to update the README file in the GitHub repository; I will do that when I am back.

I hope this answers your question/issue (even if, I know, it is not a very nice situation...)

Thanks for your nice words on the tool itself!

Sincerely

Ivan Herman

On 2013-7-15 13:46 , Tim Strehle wrote:

Thanks for this very useful tool! I’m trying to turn this RDFa into RDF/XML using scripts/localRDFa.py (note the Unicode ellipsis characters):

|

Unicode is accepted here…

… but not here!
|

It fails with these error messages:

|
[digicol@timsdcxvm pyrdfa3-master]$ scripts/localRDFa.py -p /tmp/unicode.html
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pyRdfa/__init__.py", line 648, in graph_from_source
    return self.graph_from_DOM(dom, graph, pgraph)
  File "/usr/lib/python2.6/site-packages/pyRdfa/__init__.py", line 501, in graph_from_DOM
    parse_one_node(topElement, default_graph, None, state, [])
  File "/usr/lib/python2.6/site-packages/pyRdfa/parse.py", line 67, in parse_one_node
    _parse_1_1(node, graph, parent_object, incoming_state, parent_incomplete_triples)
  File "/usr/lib/python2.6/site-packages/pyRdfa/parse.py", line 289, in _parse_1_1
    _parse_1_1(n, graph, object_to_children, state, incomplete_triples)
  File "/usr/lib/python2.6/site-packages/pyRdfa/parse.py", line 289, in _parse_1_1
    _parse_1_1(n, graph, object_to_children, state, incomplete_triples)
  File "/usr/lib/python2.6/site-packages/pyRdfa/parse.py", line 289, in _parse_1_1
    _parse_1_1(n, graph, object_to_children, state, incomplete_triples)
  File "/usr/lib/python2.6/site-packages/pyRdfa/parse.py", line 275, in _parse_1_1
    ProcessProperty(node, graph, current_subject, state, typed_resource).generate_1_1()
  File "/usr/lib/python2.6/site-packages/pyRdfa/property.py", line 126, in generate_1_1
    object = Literal(self._get_HTML_literal(self.node), datatype=HTMLLiteral)
  File "/usr/lib/python2.6/site-packages/rdflib-4.0.1-py2.6.egg/rdflib/term.py", line 564, in __new__
    _value, _datatype = _castPythonToLiteral(value)
  File "/usr/lib/python2.6/site-packages/rdflib-4.0.1-py2.6.egg/rdflib/term.py", line 1386, in _castPythonToLiteral
    return castFunc(obj), dType
  File "/usr/lib/python2.6/site-packages/rdflib-4.0.1-py2.6.egg/rdflib/term.py", line 1319, in _writeXML
    if s.startswith(b(u'')):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 38: ordinal not in range(128)
Traceback (most recent call last):
  File "scripts/localRDFa.py", line 126, in <module>
    print processor.rdf_from_sources(value, outputFormat = format, rdfOutput = rdfOutput)
  File "/usr/lib/python2.6/site-packages/pyRdfa/__init__.py", line 685, in rdf_from_sources
    self.graph_from_source(name, graph, rdfOutput)
  File "/usr/lib/python2.6/site-packages/pyRdfa/__init__.py", line 657, in graph_from_source
    if not rdfOutput : raise b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 38: ordinal not in range(128)
|

If I remove the Unicode ellipsis character from the schema:articleBody, the HTML parses fine. It doesn’t hurt in the schema:headline.

I don’t know Python (yet) so I’m reporting this here, hoping that someone has the time for a hopefully quick fix. Thanks for looking into this!
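
For readers unfamiliar with the error: byte 0xe2 is the first byte of the UTF-8 encoding of the ellipsis character (U+2026). Python 2 decodes byte strings with the ASCII codec implicitly when they are combined with unicode strings, which is most likely what happens in the startswith call at the bottom of the traceback. The sketch below is written for Python 3 (not the Python 2.6 used above) and triggers the same message with an explicit ASCII decode. The sample string and the helper name ascii_decode_error are illustrative assumptions, not taken from pyRdfa or RDFLib.

```python
# The ellipsis from the report, U+2026, is three bytes in UTF-8.
ELLIPSIS = "\u2026"
encoded = ELLIPSIS.encode("utf-8")  # b'\xe2\x80\xa6'


def ascii_decode_error(data):
    """Return the UnicodeDecodeError raised by an ASCII decode, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as err:
        return err
    return None


# Hypothetical sample literal, not the reporter's actual HTML.
sample = ("<p>" + ELLIPSIS + " but not here!</p>").encode("utf-8")
err = ascii_decode_error(sample)

print(encoded)
print(err)
```

The reported position differs from the one in the traceback because the sample input is different; the failing byte value is the same.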



Ivan Herman
Bankrashof 108
1183NW Amstelveen
The Netherlands
tel: +31-64-1044153
http://www.ivan-herman.net

from pyrdfa3.

tistre commented on June 8, 2024

Ivan,

thanks a lot for the quick reply. It’s not urgent, enjoy your vacation :-)

This page told me how to downgrade with “pip install html5lib==0.95”:

http://stackoverflow.com/questions/17462385/python-rdflib-not-parsing-creative-commons-license-information-correctly

Even though “pip list” now says:

html5lib (0.95)
pyRdfa (3.4.3)
rdflib (4.0.1)

… the above example still fails for me. But I might be doing something wrong.

Kind regards,
Tim
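
One way to double-check that the interpreter running the script sees the same versions that "pip list" reports is to query the package metadata from Python itself. This is a minimal sketch assuming Python 3.8 or later (importlib.metadata does not exist in the Python 2.6 used in this thread); the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as they appear in the "pip list" output above.
for name in ("html5lib", "pyRdfa", "rdflib"):
    print(name, installed_version(name))
```

If a name prints None, that distribution is not installed for the interpreter that ran the snippet, which can differ from the one pip manages.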


iherman commented on June 8, 2024

Tim,

I have tried it on my local machine (which runs 0.95), and indeed there seems to be a problem. Let me look into this when I am back to work!

Cheers

Ivan

On 2013-7-16 23:27, Tim Strehle wrote:

… the above example still fails for me. But I might be doing something wrong.

iherman commented on June 8, 2024

Sigh...

I hope I have handled it, although I must say it is pretty much a hack,
because there are some mysterious things going on with the encoding of Unicode
strings, UTF-8 and all that mess. In Python 3 this ought to be much better.
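
As a rough illustration of why Python 3 is stricter here: it refuses to mix bytes and text instead of silently decoding with ASCII, so the mismatch surfaces as a TypeError at the call site and the fix is an explicit decode. The sketch below uses a made-up sample string and a hypothetical helper, starts_with_text; it does not reproduce the actual change made in pyRdfa.

```python
# UTF-8 encoded sample starting with the ellipsis character (U+2026).
utf8_bytes = "\u2026 but not here!".encode("utf-8")


def starts_with_text(data, prefix):
    """Call data.startswith(prefix); return the result or the TypeError text."""
    try:
        return data.startswith(prefix)
    except TypeError as err:
        return "TypeError: " + str(err)


# bytes compared against str: Python 3 rejects this instead of guessing a codec.
mixed = starts_with_text(utf8_bytes, "\u2026")

# Decoding explicitly with the right codec makes the comparison well defined.
explicit = utf8_bytes.decode("utf-8").startswith("\u2026")

print(mixed)
print(explicit)
```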

If you use the version on git, it should be updated now. If you use
the service on the W3C web site, I will have to get back to the system guys to
make an update for me; that will not happen before next week...

Thanks!

Ivan


Ivan Herman
4, rue Beauvallon, Clos St. Joseph
13090 Aix-en-Provence
France
tel: +31-64-1044153 ou +33 6 52 46 00 43
http://www.ivan-herman.net


tistre commented on June 8, 2024

Thanks a lot, the latest git master branch works fine now!

[digicol@timsdcxvm pyrdfa3-master]$ scripts/localRDFa.py -p /tmp/unicode.html
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:schema="http://schema.org/"
>
  <schema:BlogPosting rdf:about="http://example.com/blog/1">
    <schema:articleBody rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML">… but not here!</schema:articleBody>
    <schema:headline xml:lang="en">Unicode is accepted here…</schema:headline>
  </schema:BlogPosting>
</rdf:RDF>


iherman commented on June 8, 2024

:-)

Ivan

On Jul 30, 2013, at 21:36, Tim Strehle wrote:

Thanks a lot, the latest git master branch works fine now!
