Comments (14)
This would be great. How painful did this change look to be? I might be able to contribute if it's not huge.
from jsonnet.
My plan was to avoid having a dependency on ICU: store everything internally as wstring and assume that wchar is a Unicode code point. Then we just need to tweak the lexer to parse UTF-8 in string literals and the string output function to render it back as UTF-8. It shouldn't be too hard, as I left some placeholders and TODOs in there. You're very welcome to have a try at it.
I suggest (1) modifying the internal string representation in state.h, (2) modifying the output code to encode UTF-8 and testing it with std.char(x) for x > 127, and (3) modifying the lexer to parse UTF-8. It would be possible to run all tests and commit upstream at each intermediate point.
from jsonnet.
Great, thanks for the info @sparkprime. I'll update here if I get a chance to try it; I need more emoji in my json 🍻
I really love Jsonnet BTW. My team is using it along with ApiDoc to create API documentation that doubles as a mock API server for developing apps against APIs that aren't finished yet.
from jsonnet.
Glad you like it!
I did some reading and it seems wstring is not what we want, because wchar_t is 16 bits wide (UTF-16 behavior) on Windows. So we probably need to do something like
typedef std::basic_string<char32_t> JsonnetString;
with functions to convert from UTF8-encoded std::string to that and back.
There are a bunch of places where the HeapString internal representation leaks out into other places as well, e.g. field names, std.extVar() keys, filenames (from std.thisFile) etc.
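For illustration, here is a minimal sketch of what such conversion functions might look like. The names `to_utf8`, `from_utf8` and `kReplacement` are assumptions made for this example, not the identifiers used in the Jsonnet source, and the decoder is deliberately loose: it substitutes U+FFFD for malformed input but does not reject overlong encodings.

```cpp
#include <cstddef>
#include <string>

// Alias following the typedef suggested above.
using JsonnetString = std::basic_string<char32_t>;

// Hypothetical constant: U+FFFD REPLACEMENT CHARACTER, used for bad input.
constexpr char32_t kReplacement = 0xFFFD;

// Append one code point to `out` as 1 to 4 UTF-8 bytes.
void encode_utf8(char32_t cp, std::string &out)
{
    // Surrogates and values past U+10FFFF are not valid code points.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = kReplacement;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Encode a whole code point string as UTF-8.
std::string to_utf8(const JsonnetString &s)
{
    std::string out;
    for (char32_t cp : s) encode_utf8(cp, out);
    return out;
}

// Decode UTF-8 into code points. Each byte that cannot be decoded
// yields one U+FFFD and decoding resumes at the next byte.
JsonnetString from_utf8(const std::string &s)
{
    JsonnetString out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (c0 < 0x80)                { len = 1; cp = c0; }
        else if ((c0 & 0xE0) == 0xC0) { len = 2; cp = c0 & 0x1F; }
        else if ((c0 & 0xF0) == 0xE0) { len = 3; cp = c0 & 0x0F; }
        else if ((c0 & 0xF8) == 0xF0) { len = 4; cp = c0 & 0x07; }
        bool ok = len != 0 && i + len <= s.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;  // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out += kReplacement; ++i; continue; }
        out += cp;
        i += len;
    }
    return out;
}
```

With these, `to_utf8(from_utf8(s))` should return `s` unchanged for any well-formed UTF-8 input, which is the kind of round trip the field-name, extVar and filename paths mentioned above would rely on.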
from jsonnet.
@hotdog929 you may be interested
from jsonnet.
I'm going to have a go at this because I think it's probably harder / more work than I originally thought.
from jsonnet.
That was a productive 4 hours ;)
from jsonnet.
Wow @sparkprime, way to kill it!!
from jsonnet.
Nice! :D
Perhaps I should also add a jsonnet_test Bazel rule since it is possible to write tests in Jsonnet, such as the unicode.jsonnet test you just added. :)
from jsonnet.
Looks like normal Unicode characters (like “ ” ‘ ’ etc.) are working fine, but longer sequences for emoji (like 🚀 -- "\xF0\x9F\x9A\x80") always become the sequence "\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD".
I'm suspicious of the encode_utf8 method, but I'm struggling to understand what all the bit masking and shifting is doing.
from jsonnet.
I think I have a fix; it looks like a typo on this line:
} else if ((c0 & 0xF8) == 0xF) { //11110zzz 10zzyyyy 10yyyyxx 10xxxxxx
Changing that to the following seems right:
} else if ((c0 & 0xF8) == 0xF0) { //11110zzz 10zzyyyy 10yyyyxx 10xxxxxx
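To spell out what the masking does, here is a small standalone sketch (the function names are made up for this example and are not taken from the Jsonnet source). Masking the first byte with 0xF8 keeps its top five bits and clears the low three, so the result can equal 0xF0 (11110000) but never 0x0F (00001111). That would explain why three-byte characters such as the curly quotes decoded correctly while four-byte emoji fell through to the error path.

```cpp
#include <cstdint>

// Length of a UTF-8 sequence judged from its first byte alone.
// Returns 0 for continuation bytes (10xxxxxx) and for 0xF8..0xFF.
// This is a loose classifier: it does not reject other invalid
// lead bytes such as 0xC0, 0xC1 or 0xF5..0xF7.
constexpr int utf8_sequence_length(std::uint8_t c0)
{
    return (c0 & 0x80) == 0x00 ? 1   // 0xxxxxxx
         : (c0 & 0xE0) == 0xC0 ? 2   // 110xxxxx
         : (c0 & 0xF0) == 0xE0 ? 3   // 1110xxxx
         : (c0 & 0xF8) == 0xF0 ? 4   // 11110xxx
         : 0;
}

// The comparison from the typo. (c0 & 0xF8) always has its low three
// bits cleared, while 0x0F has them set, so this is false for every byte.
constexpr bool typo_branch_matches(std::uint8_t c0)
{
    return (c0 & 0xF8) == 0x0F;
}

// 0xF0 is the first byte of the rocket emoji, 0xE2 of the curly quotes.
static_assert(utf8_sequence_length(0xF0) == 4, "emoji lead byte is 4-byte");
static_assert(utf8_sequence_length(0xE2) == 3, "curly quote lead byte is 3-byte");
static_assert(!typo_branch_matches(0xF0), "typo branch never fires");
```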
from jsonnet.
Submitted a fix as #78.
I didn't see an easy way to test this, as the \u escape sequence only supports 4 hex digits (i.e. up to character code 0xFFFF). So adding this is invalid:
std.assertEqual("\u1F680", "🚀") &&
One solution for testing could be to add support for the ECMAScript 6 code point escapes (like \u{1F680}).
If you have another idea for testing, I'd love to hear it!
from jsonnet.
Thanks for tracking this down!
I suppose you can do things like "🚀🚀🚀"[1] which should == "🚀".
\u{XXX} should be a no-brainer though, it could be added in the lexer quite easily.
I have been worried for a long time about the limitation of \u, and whether it's necessary to support things like this as well: https://bugs.launchpad.net/zorba/+bug/1024448
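As a rough idea of the lexer side, a braced escape could be parsed along these lines. This is only a sketch under assumptions: `parse_braced_codepoint` is a hypothetical helper, not an existing Jsonnet function, and the limits it enforces (1 to 6 hex digits, no surrogates, nothing above U+10FFFF) are one reasonable choice rather than settled behavior.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: parse the "{1F680}" part of a "\u{1F680}" escape.
// `pos` must index the '{'. On success, stores the code point in `cp`,
// moves `pos` just past the '}' and returns true. On failure, returns
// false and leaves `pos` and `cp` untouched.
bool parse_braced_codepoint(const std::string &src, std::size_t &pos, char32_t &cp)
{
    if (pos >= src.size() || src[pos] != '{') return false;
    std::size_t i = pos + 1;
    char32_t value = 0;
    int digits = 0;
    for (; i < src.size() && src[i] != '}'; ++i) {
        char c = src[i];
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return false;                 // not a hex digit
        if (++digits > 6) return false;    // more digits than any code point needs
        value = value * 16 + static_cast<char32_t>(d);
    }
    if (i >= src.size() || digits == 0) return false;  // missing '}' or empty braces
    if (value > 0x10FFFF) return false;                 // beyond the Unicode range
    if (value >= 0xD800 && value <= 0xDFFF) return false;  // lone surrogate
    cp = value;
    pos = i + 1;
    return true;
}
```

The lexer would call this after seeing `\u` followed by `{`, and fall back to the existing four-digit form otherwise.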
from jsonnet.
No problem! It was enlightening to learn more about the inner workings of unicode
from jsonnet.