Comments (3)
A question though: if a TOML file contains non-standard (UTF-8) spaces (such as a zero-width space), should TOML parsing succeed or fail?
Right now it fails.
If I understand and recall it correctly:
String values must always be quoted, so the file ought to parse as long as any non-standard whitespace sits inside quotes (all else being well).
In TOML 1.0.0, no whitespace of any kind is allowed in a bare (unquoted) key.
unquoted-key = 1*( ALPHA / DIGIT / %x2D / %x5F ) ; A-Z / a-z / 0-9 / - / _
So if the file contains unquoted non-standard whitespace, the correct behaviour of a "strict-mode" TOML 1.0.0 parser is to raise an error. But I think one of the test suites lets the tester choose to allow input like this to be parsed anyway.
toml = expression *( newline expression )
expression = ws [ comment ]
expression =/ ws keyval ws [ comment ]
expression =/ ws table ws [ comment ]
;; Whitespace
ws = *wschar
wschar = %x20 ; Space
wschar =/ %x09 ; Horizontal tab
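The two grammar fragments quoted above can be sketched as a small check in Python. This is only an illustration of the TOML 1.0.0 rules as quoted here; the names `BARE_KEY_RE`, `TOML_WS`, `is_bare_key`, and `is_toml_ws` are made up for the example and are not part of any TOML library.

```python
import re

# unquoted-key = 1*( ALPHA / DIGIT / %x2D / %x5F ) in TOML 1.0.0
BARE_KEY_RE = re.compile(r"[A-Za-z0-9_-]+")

# wschar = %x20 / %x09 in TOML 1.0.0 (space and horizontal tab only)
TOML_WS = frozenset(" \t")


def is_bare_key(text):
    """Return True if text is a valid TOML 1.0.0 bare (unquoted) key."""
    return BARE_KEY_RE.fullmatch(text) is not None


def is_toml_ws(ch):
    """Return True only for the two whitespace characters the grammar allows."""
    return ch in TOML_WS


ZWSP = "\u200b"  # ZERO WIDTH SPACE
NBSP = "\u00a0"  # NO-BREAK SPACE

print(is_bare_key("server-name_1"))        # True
print(is_bare_key("server" + ZWSP))        # False
print(is_toml_ws(ZWSP), is_toml_ws(NBSP))  # False False
```

Under these rules a zero-width space is neither a legal bare-key character nor legal whitespace between tokens, which is why a strict parser would be expected to reject it outside a quoted string.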
But TOML is still a language under active development. In the latest WIP, even emoji could be legal in bare keys. I'm not familiar enough with ABNF notation or Unicode ranges to say for sure what the ranges below contain, but I believe the intention was still to exclude every kind of whitespace from bare keys.
;; Unquoted key
unquoted-key = 1*unquoted-key-char
unquoted-key-char = ALPHA / DIGIT / %x2D / %x5F ; a-z A-Z 0-9 - _
unquoted-key-char =/ %xB2 / %xB3 / %xB9 / %xBC-BE ; superscript digits, fractions
unquoted-key-char =/ %xC0-D6 / %xD8-F6 / %xF8-37D ; non-symbol chars in Latin block
unquoted-key-char =/ %x37F-1FFF ; exclude GREEK QUESTION MARK, which is basically a semi-colon
unquoted-key-char =/ %x200C-200D / %x203F-2040 ; from General Punctuation Block, include the two tie symbols and ZWNJ, ZWJ
unquoted-key-char =/ %x2070-218F / %x2460-24FF ; include super-/subscripts, letterlike/numberlike forms, enclosed alphanumerics
unquoted-key-char =/ %x2C00-2FEF / %x3001-D7FF ; skip arrows, math, box drawing etc, skip 2FF0-3000 ideographic up/down markers and spaces
unquoted-key-char =/ %xF900-FDCF / %xFDF0-FFFD ; skip D800-DFFF surrogate block, E000-F8FF Private Use area, FDD0-FDEF intended for process-internal use (unicode)
unquoted-key-char =/ %x10000-EFFFF ; all chars outside BMP range, excluding Private Use planes (F0000-10FFFF)
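One way to see what the WIP ranges quoted above actually contain is to copy them into a table and test individual code points. The sketch below does that in Python; `WIP_BARE_KEY_RANGES`, `is_wip_bare_key_char`, and `whitespace_in_wip_ranges` are hypothetical names for this example, and the helper that scans for whitespace relies on Python's own `str.isspace()` definition, which is an assumption and not something the TOML grammar defines.

```python
# Inclusive code point ranges copied from the WIP unquoted-key-char rules.
WIP_BARE_KEY_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A),          # ALPHA
    (0x30, 0x39),                        # DIGIT
    (0x2D, 0x2D), (0x5F, 0x5F),          # - and _
    (0xB2, 0xB3), (0xB9, 0xB9), (0xBC, 0xBE),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x37D),
    (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x203F, 0x2040),
    (0x2070, 0x218F), (0x2460, 0x24FF),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]


def is_wip_bare_key_char(ch):
    """Return True if the single character ch falls inside one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in WIP_BARE_KEY_RANGES)


def whitespace_in_wip_ranges():
    """List allowed characters that Python's str.isspace() treats as whitespace."""
    return [
        chr(cp)
        for lo, hi in WIP_BARE_KEY_RANGES
        for cp in range(lo, hi + 1)
        if chr(cp).isspace()
    ]


print(is_wip_bare_key_char("\u200b"))      # False: ZERO WIDTH SPACE (U+200B)
print(is_wip_bare_key_char("\u200c"))      # True: ZERO WIDTH NON-JOINER (U+200C)
print(is_wip_bare_key_char(" "))           # False: plain space
print(is_wip_bare_key_char("\U0001F600"))  # True: an emoji outside the BMP
print([f"U+{ord(ch):04X}" for ch in whitespace_in_wip_ranges()])
```

The last line prints whichever whitespace code points, by Python's definition, fall inside the ranges. That result depends on the Unicode database shipped with the interpreter, so it is best treated as a starting point for checking the grammar and not as a verdict on it.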
toml-lang/toml#891
https://github.com/toml-lang/toml/blob/23c3fb79f3f54ebc01110b963d7119006d91facc/toml.abnf#L55
from toml.
Well done enumerating all the possibilities. I hope the devs see fit to address it and give this one (and all the others) the attention they deserve.
In the meantime, while you wait for a fix, there are plenty of other great options. Don't feel that you too need to fork your own TOML reader and writer library like I did.
Why don't you formalise your findings and add a test to https://github.com/uiri/toml/blob/master/tests/test_api.py as a PR?
You'll face the same problem I did: the PR adds a test that fails, so the CI pipeline fails.
However, that is because of two underlying problems: firstly, the code in toml is broken, and secondly, the existing test masks this, so it is broken too.
When code is broken, it should fail a test. Writing a test that fails is step 1 in Test Driven Development.
Just as the lack of tests does not imply code is correct, passing all the tests does not imply it either when the only test covering an area is itself broken.
from toml.
A question though: if a TOML file contains non-standard (UTF-8) spaces (such as a zero-width space), should TOML parsing succeed or fail?
Right now it fails.
from toml.