Comments (3)

JamesParrott commented on June 12, 2024

A question though - if a TOML file contains non-standard (UTF-8) spaces (such as a zero-width space), should TOML parsing succeed or fail?

Now it fails.

If I understand it and recall it correctly:

String values must always be quoted, so the file ought to be parsed as long as non-standard white space is quoted (all else being well).

In TOML 1.0.0, no whitespace of any kind is allowed in a bare (unquoted) key.

unquoted-key = 1*( ALPHA / DIGIT / %x2D / %x5F ) ; A-Z / a-z / 0-9 / - / _

So if the file contains unquoted non-standard whitespace, the correct behaviour of a "strict-mode" TOML 1.0.0 parser is to raise an error. But I think one of the test suites lets the tester choose to allow input like this to be parsed anyway.

toml = expression *( newline expression )

expression =  ws [ comment ]
expression =/ ws keyval ws [ comment ]
expression =/ ws table ws [ comment ]

;; Whitespace

ws = *wschar
wschar =  %x20  ; Space
wschar =/ %x09  ; Horizontal tab

https://github.com/toml-lang/toml/blob/8eae5e1c005bc5836098505f85a7aa06568999dd/toml.abnf#L18C1-L28C33
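The two TOML 1.0.0 rules quoted above are small enough to check by hand. A minimal sketch, assuming only the `unquoted-key` and `wschar` rules as quoted; `is_bare_key` and `is_toml_whitespace` are hypothetical helper names, not part of any TOML library.

```python
import re

# unquoted-key = 1*( ALPHA / DIGIT / %x2D / %x5F )
# The explicit ASCII ranges keep Unicode letters and digits out.
BARE_KEY = re.compile(r"[A-Za-z0-9_-]+")

# wschar = %x20 / %x09  (space and horizontal tab only)
WSCHARS = frozenset(" \t")


def is_bare_key(text: str) -> bool:
    """Return True if text is a valid TOML 1.0.0 bare key."""
    return BARE_KEY.fullmatch(text) is not None


def is_toml_whitespace(ch: str) -> bool:
    """Return True if ch is whitespace as far as the TOML 1.0.0 grammar is concerned."""
    return ch in WSCHARS


ZWSP = "\u200b"  # zero-width space

print(is_bare_key("my-key_1"))           # True
print(is_bare_key("my" + ZWSP + "key"))  # False: not ALPHA / DIGIT / - / _
print(is_toml_whitespace(ZWSP))          # False: neither %x20 nor %x09
```

Under these two rules a zero-width space is neither a key character nor skippable whitespace, which is why a strict parser has nothing to do with it outside a quoted string except raise an error.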

But TOML is still a language under active development. In the latest WIP, even emoji could be legal in bare keys. I'm not familiar enough with ABNF notation or Unicode ranges to say for sure what the ranges below contain, but I believe the intention was still to exclude any type of whitespace from bare keys.

;; Unquoted key

unquoted-key = 1*unquoted-key-char
unquoted-key-char = ALPHA / DIGIT / %x2D / %x5F         ; a-z A-Z 0-9 - _
unquoted-key-char =/ %xB2 / %xB3 / %xB9 / %xBC-BE       ; superscript digits, fractions
unquoted-key-char =/ %xC0-D6 / %xD8-F6 / %xF8-37D       ; non-symbol chars in Latin block
unquoted-key-char =/ %x37F-1FFF                         ; exclude GREEK QUESTION MARK, which is basically a semi-colon
unquoted-key-char =/ %x200C-200D / %x203F-2040          ; from General Punctuation Block, include the two tie symbols and ZWNJ, ZWJ
unquoted-key-char =/ %x2070-218F / %x2460-24FF          ; include super-/subscripts, letterlike/numberlike forms, enclosed alphanumerics
unquoted-key-char =/ %x2C00-2FEF / %x3001-D7FF          ; skip arrows, math, box drawing etc, skip 2FF0-3000 ideographic up/down markers and spaces
unquoted-key-char =/ %xF900-FDCF / %xFDF0-FFFD          ; skip D800-DFFF surrogate block, E000-F8FF Private Use area, FDD0-FDEF intended for process-internal use (unicode)
unquoted-key-char =/ %x10000-EFFFF                      ; all chars outside BMP range, excluding Private Use planes (F0000-10FFFF)

toml-lang/toml#891
https://github.com/toml-lang/toml/blob/23c3fb79f3f54ebc01110b963d7119006d91facc/toml.abnf#L55
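What the WIP ranges contain can be checked mechanically. A minimal sketch, assuming the ranges are exactly as quoted above (ALPHA and DIGIT expanded to their ASCII ranges); `is_unquoted_key_char` and `whitespace_allowed_in_keys` are hypothetical names used only for illustration.

```python
# Inclusive code point ranges copied from the WIP unquoted-key-char rule.
UNQUOTED_KEY_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A), (0x30, 0x39),  # ALPHA, DIGIT
    (0x2D, 0x2D), (0x5F, 0x5F),                # - and _
    (0xB2, 0xB3), (0xB9, 0xB9), (0xBC, 0xBE),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x37D),
    (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x203F, 0x2040),
    (0x2070, 0x218F), (0x2460, 0x24FF),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]


def is_unquoted_key_char(ch: str) -> bool:
    """Return True if ch falls inside any of the quoted WIP ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UNQUOTED_KEY_RANGES)


def whitespace_allowed_in_keys() -> list[str]:
    """List code points that Python's str.isspace() treats as whitespace
    and that the quoted ranges would still accept in a bare key."""
    return [
        f"U+{cp:04X}"
        for lo, hi in UNQUOTED_KEY_RANGES
        for cp in range(lo, hi + 1)
        if chr(cp).isspace()
    ]


print(is_unquoted_key_char("\u200b"))      # zero-width space
print(is_unquoted_key_char("\u200c"))      # zero-width non-joiner
print(is_unquoted_key_char("\U0001F600"))  # an emoji outside the BMP
print(whitespace_allowed_in_keys())
```

On this reading, the zero-width space U+200B falls outside every range, so it would still be rejected in a bare key, while ZWNJ and ZWJ (U+200C, U+200D) are explicitly included. One caveat: U+1680 (OGHAM SPACE MARK) sits inside %x37F-1FFF, so the ranges as quoted may not exclude every Unicode space character.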

JamesParrott commented on June 12, 2024

Well done enumerating all the possibilities. I hope the devs see fit to address it and give this one (and all the others) the attention they deserve.

In the meantime, while you wait for a fix, there are plenty of other great options. Don't feel that you too need to fork your own TOML reader and writer library like I did....

Why don't you formalise your findings, and add a test to: https://github.com/uiri/toml/blob/master/tests/test_api.py as a PR?

You'll face the same problem I did - you'll submit a PR whose new test makes the CI pipeline fail.

However, that failure has two underlying causes: firstly, the code in toml is broken, and secondly, the existing test masks the problem, so it is broken too.

When code is broken, it should fail a test. Writing a test that fails is step 1 in Test Driven Development.

Just as a lack of tests does not imply the code is correct, neither does passing every test when one test is broken and no other test covers that area.

Warchant commented on June 12, 2024

A question though - if a TOML file contains non-standard (UTF-8) spaces (such as a zero-width space), should TOML parsing succeed or fail?

Now it fails.
