Code Monkey home page Code Monkey logo

Comments (7)

drake127 avatar drake127 commented on September 24, 2024

Hi, can I vouch for this ticket?

I happen to need to parse thousands of character (BIC) files, process them in XML a save them back. When I fixed up encodings (cp1250->utf-8->cp1250) I encountered color codes in one of the files.

Specifically <c\x01\x01\x01> ie. black which does convert to UTF-8 (noop, remains <c\x01\x01\x01>) but does not convert back to cp-1250 in xml2gff.

The actual error was:

ERROR: XML document failed to parse
    Because: xmlvault/Jiraiya/gorky.xml:287: parser error : PCDATA invalid Char value 1
    <exostring label="FamiliarName">&lt;c&gt;&lt;/c&gt;</exostring>

Thanks.

from xoreos-tools.

drake127 avatar drake127 commented on September 24, 2024

To clarify: gff2xml does NOT transform them to <CRRGGBBAA>, they stays in raw values. See attachment.

char.zip

from xoreos-tools.

DrMcCoy avatar DrMcCoy commented on September 24, 2024

Ah, yes, seems like we're only doing that for LocStrings and TLK strings, not for ExoStrings. Since we're not reconstructing them anyway, do consider gff2xml and xml2gff broken for GFF files with color codes for now.

What do you mean with wanting to "vouch" for this ticket? Just to add your vote that you consider this an important issue, or that you'd want to work on a fix yourself?

I can offer pointers for the latter case, but I can't guarantee when I'll have time to work on this myself. To be fair, I do have a few weeks of vacations coming up, though, so chances are good I'll find time to work on this before the end of the year, but I wouldn't say no to "converting" you to a contributors to the xoreos project. ;)

from xoreos-tools.

drake127 avatar drake127 commented on September 24, 2024

😄 I intended to just add a vote but if it annoys me too much, I'd try to fix it somehow.

My issue is not only with color codes but with any invalid characters. I would be quite happy if it was just encoded like <c&#01;&#01;&#01;> but libxml2 will not parse it anyway (I tried). This is unfortunate as it's not possible to store arbitrary payload to XML...

Maybe it would be more reasonable to base64 encode whole string (when it contains !libxml2::IS_CHAR(c) character) instead of converting just color codes with arbitrary format?

from xoreos-tools.

DrMcCoy avatar DrMcCoy commented on September 24, 2024

Yeah, gff2xml already base64 encodes string with invalid byte values (which it detects by checking if iconv throws an error), but again not for ExoStrings, that's one thing I overlooked. Also apparently not for all LocStrings? I need to recheck that.

You can see this happening for your example file, because it shows a warning on stderr, and some strings consist of "[!?!]" (a dummy value) instead. That's certainly a bug, in that case gff2xml should base64-encode the string instead, so that we're at least no losing any data.

In the case of the color codes, I would still like to convert them to something reasonable and human-readable, like the <cRRGGBB> format.

from xoreos-tools.

drake127 avatar drake127 commented on September 24, 2024

I haven't look into code yet but I don't think that iconv would fail (in some instances) as all characters < 0x80 are valid in UTF-8 but not in XML. That's what causes the error.

I use cp1250 and so far didn't see an issue with dummy characters. Entire cp1250 (0-255) should map to UTF-8 and back to cp1250 correctly.

from xoreos-tools.

DrMcCoy avatar DrMcCoy commented on September 24, 2024

No, there are undefined values in CP1250, for example 0x90.

Compare

echo -ne "\x66\x6f\x6f\x0a" | iconv -f cp1250 -t utf8

vs

echo -ne "\x66\x6f\x6f\x90\x0a" | iconv -f cp1250 -t utf8

For gorky.bic, look at line 593ff of the xml produced by gff2xml:

        <locstring label="DescIdentified" strref="4294967295">
          <string language="0">[!?!]</string>
        </locstring>

Still, for color codes, certain values might not throw an error. Which is why they still need to be preparsed first.

See line 287, for example:

<exostring label="FamiliarName">&lt;c^A^A^A&gt;&lt;/c&gt;</exostring>

For readability, it might an idea to wrap color-coded strings in a CDATA section, to increase readiblity with the angle brackets.

from xoreos-tools.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.