Comments (7)
Hi, can I vouch for this ticket?
I happen to need to parse thousands of character (BIC) files, process them as XML, and save them back. When I fixed up encodings (cp1250->utf-8->cp1250), I encountered color codes in one of the files.
Specifically <c\x01\x01\x01>, i.e. black, which does convert to UTF-8 (a no-op, it remains <c\x01\x01\x01>) but does not convert back to cp1250 in xml2gff.
The actual error was:
ERROR: XML document failed to parse
Because: xmlvault/Jiraiya/gorky.xml:287: parser error : PCDATA invalid Char value 1
<exostring label="FamiliarName"><c></c></exostring>
Thanks.
from xoreos-tools.
To clarify: gff2xml does NOT transform them to <CRRGGBBAA>; they stay as raw values. See attachment.
from xoreos-tools.
Ah, yes, seems like we're only doing that for LocStrings and TLK strings, not for ExoStrings. Since we're not reconstructing them anyway, do consider gff2xml and xml2gff broken for GFF files with color codes for now.
What do you mean by wanting to "vouch" for this ticket? Just to add your vote that you consider this an important issue, or that you'd want to work on a fix yourself?
I can offer pointers for the latter case, but I can't guarantee when I'll have time to work on this myself. To be fair, I do have a few weeks of vacation coming up, so chances are good I'll find time to work on this before the end of the year, but I wouldn't say no to "converting" you to a contributor to the xoreos project. ;)
from xoreos-tools.
😄 I intended to just add a vote but if it annoys me too much, I'd try to fix it somehow.
My issue is not only with color codes but with any invalid characters. I would be quite happy if it was just encoded like <c> but libxml2 will not parse it anyway (I tried). This is unfortunate, as it means arbitrary payloads cannot be stored in XML...
Maybe it would be more reasonable to base64-encode the whole string (when it contains a !libxml2::IS_CHAR(c) character) instead of converting just the color codes to an arbitrary format?
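To make the proposal concrete, here is a minimal Python sketch of the fallback: keep the string as text when every character is allowed by the XML 1.0 Char production, otherwise base64-encode the raw bytes. The function names, the returned flag, and the default encoding are hypothetical and only for illustration; this is not how gff2xml marks base64 strings in its output.

```python
import base64
import re

# Any character outside the XML 1.0 "Char" production.
_INVALID_XML = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def encode_exostring(raw, encoding="cp1250"):
    """Return (text, is_base64) for a raw string taken from a GFF file.

    Hypothetical helper: falls back to base64 when the bytes cannot be
    decoded, or when the decoded text contains characters XML forbids.
    """
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        text = None
    if text is None or _INVALID_XML.search(text):
        return base64.b64encode(raw).decode("ascii"), True
    return text, False

def decode_exostring(text, is_base64, encoding="cp1250"):
    """Inverse of encode_exostring: recover the original raw bytes."""
    if is_base64:
        return base64.b64decode(text)
    return text.encode(encoding)
```

With this approach the round trip is lossless for any input, at the cost of readability for strings that take the base64 path.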
from xoreos-tools.
Yeah, gff2xml already base64-encodes strings with invalid byte values (which it detects by checking if iconv throws an error), but again not for ExoStrings; that's one thing I overlooked. Also apparently not for all LocStrings? I need to recheck that.
You can see this happening for your example file, because it shows a warning on stderr, and some strings consist of "[!?!]" (a dummy value) instead. That's certainly a bug; in that case gff2xml should base64-encode the string instead, so that we're at least not losing any data.
In the case of the color codes, I would still like to convert them to something reasonable and human-readable, like the <cRRGGBB> format.
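As a rough illustration of that conversion, the Python sketch below rewrites raw color codes (`<c` followed by three arbitrary bytes and `>`) into a six-digit hex form and back. The function names and the uppercase hex convention are assumptions for the example, not the format xoreos-tools actually emits.

```python
import re

# Raw form: "<c" + three arbitrary bytes (R, G, B) + ">".
_RAW_COLOR = re.compile(rb"<c(.{3})>", re.DOTALL)
# Readable form: "<c" + six hex digits + ">".
_HEX_COLOR = re.compile(rb"<c([0-9A-Fa-f]{6})>")

def colors_to_hex(raw):
    """Rewrite every raw color code as <cRRGGBB> with hex digits."""
    return _RAW_COLOR.sub(
        lambda m: b"<c" + m.group(1).hex().upper().encode("ascii") + b">",
        raw,
    )

def colors_to_raw(text):
    """Rewrite every <cRRGGBB> hex color code back to three raw bytes."""
    return _HEX_COLOR.sub(
        lambda m: b"<c" + bytes.fromhex(m.group(1).decode("ascii")) + b">",
        text,
    )
```

The two forms differ in length (three bytes versus six hex digits), which is what lets the inverse pass tell them apart in this sketch.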
from xoreos-tools.
I haven't looked into the code yet, but I don't think iconv would fail (in some instances), as all bytes below 0x80 are valid in UTF-8, but control characters such as 0x01 are not valid in XML. That's what causes the error.
I use cp1250 and so far haven't seen an issue with dummy characters. The entire cp1250 range (0-255) should map to UTF-8 and back to cp1250 correctly.
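The distinction between valid UTF-8 and valid XML can be checked with a small Python sketch. It uses the XML 1.0 Char production and Python's bundled expat-based parser rather than libxml2, so the error text differs from the one quoted above; the helper names are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_xml_char(cp):
    """True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def parses_as_xml(payload):
    """Try to parse payload as the text content of a single element."""
    try:
        # escape() handles &, < and >, but cannot make 0x01 legal.
        ET.fromstring("<exostring>" + escape(payload) + "</exostring>")
        return True
    except ET.ParseError:
        return False

raw = b"<c\x01\x01\x01>"
text = raw.decode("utf-8")  # succeeds: 0x01 is valid UTF-8
bad = [hex(ord(ch)) for ch in text if not is_xml_char(ord(ch))]
```

Here the decode step raises no error, `bad` lists the three 0x01 characters, and `parses_as_xml(text)` returns False, which is why an encoding-error check alone does not catch color codes.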
from xoreos-tools.
No, there are undefined values in CP1250, for example 0x90.
Compare
echo -ne "\x66\x6f\x6f\x0a" | iconv -f cp1250 -t utf8
vs
echo -ne "\x66\x6f\x6f\x90\x0a" | iconv -f cp1250 -t utf8
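The same comparison can be sketched in Python by probing every byte value against the cp1250 codec. Note the assumption: this uses Python's own cp1250 table, which may not match the iconv implementation on a given system byte for byte.

```python
def undefined_bytes(encoding):
    """Return the byte values (0-255) the codec refuses to decode."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(value)
    return bad

cp1250_gaps = undefined_bytes("cp1250")
print([hex(v) for v in cp1250_gaps])
```

Any string containing one of the reported bytes fails the cp1250 to UTF-8 step, so it cannot survive the round trip as plain text.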
For gorky.bic, look at line 593ff of the xml produced by gff2xml:
<locstring label="DescIdentified" strref="4294967295">
<string language="0">[!?!]</string>
</locstring>
Still, for color codes, certain values might not throw an error. Which is why they still need to be preparsed first.
See line 287, for example:
<exostring label="FamiliarName"><c^A^A^A></c></exostring>
It might also be an idea to wrap color-coded strings in a CDATA section, to make the angle brackets more readable.
from xoreos-tools.