Comments (6)
I've just done this for now:
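# Every code point from U+0000 to U+FFFF except the UTF-16 surrogates (U+D800..U+DFFF), which UTF-8 cannot encode
C_BYTES = (0..0xFFFF).reject { |c| (0xD800..0xDFFF).cover?(c) }.pack("U*").freeze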
Not really a solution, but enough so I can continue testing.
Unfortunately, X12 does not support UTF-8. The specification lists the set of allowed characters, which is H_BASIC (a subset of C_BYTES). There is also an extended set of characters, H_EXTENDED, which can be used if both trading partners agree.
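To see which characters in an outgoing value fall outside the allowed sets, a check like the one below can help. This is a minimal sketch: BASIC and EXTENDED are hypothetical, hand-written approximations of the X12 basic and extended character sets, not stupidedi's actual H_BASIC and H_EXTENDED constants, so consult those for the exact contents.

```ruby
require "set"

# Hypothetical approximation of the X12 basic character set:
# uppercase letters, digits, space, and some punctuation.
BASIC = Set.new("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 !\"&'()*+,-./:;?=".chars)

# Hypothetical approximation of the extended set (usable only by agreement):
# lowercase letters plus additional punctuation. Single quotes avoid "#$" interpolation.
EXTENDED = Set.new('abcdefghijklmnopqrstuvwxyz%@[]_{}\\|<>~^`#$'.chars)

# Returns the distinct characters in +text+ that are not in +allowed+, in order of appearance.
def disallowed_characters(text, allowed)
  text.each_char.reject { |c| allowed.include?(c) }.uniq
end

disallowed_characters("ACME CORP", BASIC)              # => []
disallowed_characters("Acme Corp", BASIC)              # => ["c", "m", "e", "o", "r", "p"]
disallowed_characters("吉祥寺 STORE", BASIC | EXTENDED) # => ["吉", "祥", "寺"]
```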
If you're constructing an X12 document to send to someone, you may have to transcode your UTF-8 text to the limited character set. I would be surprised if extending C_BYTES works, simply because it wasn't intended to. You might try generating a document, writing it out as X12, and then reading it back in (even just with edi-pp) to make sure it looks right.
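One way to transcode UTF-8 down to a limited character set is to decompose accented characters, drop the combining marks, and replace whatever still has no ASCII equivalent. This sketch uses only Ruby's standard String#unicode_normalize and String#encode; the method name to_ascii_for_x12 and the "?" replacement are illustrative choices. The result still needs checking against the set your partner accepts (lowercase letters, for instance, are only in the extended set), so you may also want to upcase it.

```ruby
# Hypothetical helper: reduce a UTF-8 string to plain ASCII.
def to_ascii_for_x12(text, replacement: "?")
  # Decompose characters like "ō" into "o" plus a combining macron.
  decomposed = text.unicode_normalize(:nfd)
  # Remove combining marks (Unicode category Mn), leaving the base letters.
  stripped = decomposed.gsub(/\p{Mn}/, "")
  # Replace anything still outside ASCII (e.g., kanji) with +replacement+.
  stripped.encode(Encoding::US_ASCII, invalid: :replace, undef: :replace, replace: replacement)
end

to_ascii_for_x12("Kichijōji Honchō") # => "Kichijoji Honcho"
to_ascii_for_x12("吉祥寺")            # => "???"
```

Replacing unmappable characters with "?" makes data loss visible; for addresses, a romanized source (rather than character-by-character replacement) is usually needed.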
I'm trying to think through how it would work, but one thing that might cause trouble is that Reader.is_control_character? returns true for any character that isn't in H_BASIC or H_EXTENDED, so I think most UTF-8 characters would be treated as control characters.
From what I can work out, the consume_isa method in StreamReader looks for the start of the document, which is always ISA, and ignores control characters. That should be OK unless something like IあああSああA occurs before the X12 part of the file starts, since it will be tokenized as ISA and StreamReader will think that's the beginning of an X12 document. Probably most people have files that are entirely X12, but I had files with an arbitrary header message before the ISA token (the spec doesn't forbid this), so consume_isa is written to skip that.
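The false positive described above can be reproduced with a simplified scanner. This is not stupidedi's consume_isa; it's a hypothetical stand-in that, like the behavior described, skips anything it classifies as a control character (here, anything outside a rough, hand-written approximation of the basic and extended sets) while looking for the letters I, S, A.

```ruby
# Rough, hypothetical approximation of H_BASIC + H_EXTENDED (not stupidedi's constants).
ALLOWED = ("A".."Z").to_a + ("a".."z").to_a + ("0".."9").to_a +
          " !\"&'()*+,-./:;?=%@[]_{}|<>~^".chars

# Anything outside ALLOWED counts as a control character.
def control_character?(char)
  !ALLOWED.include?(char)
end

# Hypothetical simplified scanner (not the real consume_isa): returns the index
# where the first I, S, A sequence starts, skipping control characters in between.
def find_isa_start(input)
  target = "ISA"
  matched = 0
  start = nil
  input.each_char.with_index do |char, i|
    next if control_character?(char) # skipped as if it were not in the input
    if char == target[matched]
      start = i if matched.zero?
      matched += 1
      return start if matched == target.length
    elsif char == target[0]
      start = i # a mismatching "I" begins a new candidate
      matched = 1
    else
      start = nil
      matched = 0
    end
  end
  nil
end

find_isa_start("HEADER\nISA*00*")              # => 7
find_isa_start("IあああSああA HEADER\nISA*00*") # => 0, a false positive
```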
So, to summarize: StreamReader figures out where the X12 starts in a stream of arbitrary characters. Because it throws away the new UTF-8 characters (it thinks they are control characters), it might identify a sequence of characters as the start of the X12 when it isn't. That seems unlikely to actually happen, unless your X12 files have random junk in between the ISA/IEA envelopes.
Next, TokenReader scans the stream of characters for segment identifiers like ST and GE, for specific characters or delimiters, or for entire parts of a segment, such as all of its elements. In most of these functions, when reading until a particular substring is matched, any "control characters" are thrown out as if they weren't present in the input at all. So I think this will probably cause all of the new UTF-8 characters to be discarded, since they are classified as control characters along with things like line endings.
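The effect on element values can be sketched with a hypothetical read_until helper that drops control characters while reading up to a delimiter. The classification is again a rough approximation of H_BASIC and H_EXTENDED, and the helper is illustrative, not TokenReader's real code.

```ruby
# Rough, hypothetical approximation of H_BASIC + H_EXTENDED (not stupidedi's constants).
ALLOWED = ("A".."Z").to_a + ("a".."z").to_a + ("0".."9").to_a +
          " !\"&'()*+,-./:;?=%@[]_{}|<>~^".chars

def control_character?(char)
  !ALLOWED.include?(char)
end

# Hypothetical helper: reads +input+ up to (not including) the first +delimiter+,
# discarding control characters along the way. Returns [value, remaining_input].
def read_until(input, delimiter)
  value = +""
  input.each_char.with_index do |char, i|
    return [value, input[(i + 1)..-1]] if char == delimiter
    value << char unless control_character?(char)
  end
  [value, ""]
end

read_until("ACME 東京 CORP*92~", "*") # => ["ACME  CORP", "92~"]  (the kanji vanish)
read_until("AB\nC*D", "*")           # => ["ABC", "D"]            (the newline vanishes too)
```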
If you notice that happening, you might look at Reader.is_control_character? and change it so that everything it previously classified as a control character (e.g., \n, \t, \f, \v, and various other single-byte characters) is still a control character, but the characters you've added above 255.chr aren't. That might actually work!
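A sketch of the suggested change, written as a standalone predicate rather than a patch to stupidedi: characters with code points up to 255 keep their old classification, and anything above that is no longer a control character. Comparing char.ord against 255 is just one way to express "above 255.chr", and ALLOWED is again a hypothetical approximation of H_BASIC and H_EXTENDED.

```ruby
# Rough, hypothetical approximation of H_BASIC + H_EXTENDED (not stupidedi's constants).
ALLOWED = ("A".."Z").to_a + ("a".."z").to_a + ("0".."9").to_a +
          " !\"&'()*+,-./:;?=%@[]_{}|<>~^".chars

# Hypothetical revised predicate: code points 0..255 keep the old rule
# (control unless allowed); anything above 255 is never a control character.
def control_character?(char)
  char.ord <= 255 && !ALLOWED.include?(char)
end

control_character?("\n") # => true
control_character?("A")  # => false
control_character?("あ") # => false
"ACME\n東京".each_char.reject { |c| control_character?(c) }.join # => "ACME東京"
```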
@kputnam Thanks very much for the detailed reply! If X12 does not support UTF-8, then I think that's enough to convince me not to use it. Actually, our partner asked us if we could send it in non-UTF-8 characters, so I think we'll have to do that.
Now I just have to think of a way to map UTF-8 (Japanese) addresses to the corresponding English addresses, which is a slightly different problem.
Good luck!
For anybody who encounters this problem, geocoder is your friend:
result = Geocoder.search("東京都武蔵野市吉祥寺本町二丁目5番10号 いちご吉祥寺ビル").first
result.address
=> "2 Chome-5-10 Kichijōji Honchō, Musashino-shi, Tōkyō-to 180-0004, Japan"