kenkundert / nestedtext

Human readable and writable data interchange format

Home Page: https://nestedtext.org

License: MIT License

Shell 0.28% Python 99.72%

configuration nestedtext data json toml yaml serialization configuration-files config yaml-alternative

nestedtext's Introduction

NestedText — A Human Friendly Data Format

Authors: Ken & Kale Kundert
Version: 3.7
Released: 2024-04-27
Documentation: nestedtext.org
Please post all questions, suggestions, and bug reports to GitHub.

NestedText is a file format for holding structured data. It is similar in concept to JSON, except that NestedText is designed to make it easy for people to enter, edit, or view the data directly. It organizes the data into a nested collection of name-value pairs, lists, and strings. The syntax is intended to be very simple and intuitive for most people.

A unique feature of this file format is that it only supports one scalar type: strings. As such, quoting strings is unnecessary, and without quoting there is no need for escaping. While the decision to forgo other types (integers, reals, Booleans, etc.) may seem counterproductive, it leads to simpler data files and applications that are more robust.

NestedText is convenient for configuration files, data journals, address books, account information, and the like. Here is an example of a file that contains a few addresses:

# Contact information for our officers

Katheryn McDaniel:
    position: president
    address:
        > 138 Almond Street
        > Topeka, Kansas 20697
    phone:
        cell: 1-210-555-5297
        home: 1-210-555-8470
            # Katheryn prefers that we always call her on her cell phone.
    email: [email protected]
    additional roles:
        - board member

Margaret Hodge:
    position: vice president
    address:
        > 2586 Marigold Lane
        > Topeka, Kansas 20682
    phone: 1-470-555-0398
    email: [email protected]
    additional roles:
        - new membership task force
        - accounting task force

Typical Applications

Configuration

Configuration files are an attractive application for NestedText. NestedText configuration files tend to be simple, clean, and unambiguous. Plus, they handle hierarchy much better than alternatives such as INI and TOML.

Structured Code

One way to build tools to tackle difficult and complex tasks is to provide an application-specific language. That can be a daunting challenge. However, in certain cases, such as specifying complex configurations, NestedText can help make the task much easier. NestedText conveys the structure of the data, leaving the end application to interpret the data itself. It can do so with a collection of small parsers that are tailored to the specific piece of data to which they are applied. This generally results in a simpler specification since each piece of data can be given in its natural format, which might otherwise confuse a shared parser. In this way, rather than building one large, very general language and parser, a series of much smaller and simpler parsers is needed. These smaller parsers can be as simple as splitters or partitioners, value checkers, or converters for numbers in special forms (numbers with units, times or dates, GPS coordinates, etc.). Or they could be full-blown expression evaluators or mini-languages. Structured code provides a nice middle ground between data and code, and its use is growing in popularity.
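As a rough illustration of the "collection of small parsers" idea, the sketch below assumes the data has already been loaded into plain Python strings and then applies one tiny converter per field. The field names (`timeout`, `start date`, `hosts`), the unit table, and the converter functions are all invented for this example; none of them are part of NestedText or its Python package.

```python
from datetime import date

# Multipliers for the unit suffixes this sketch understands (an assumption).
UNITS = {"s": 1, "m": 60, "h": 3600}

def to_seconds(text):
    """Convert a string such as '90s' or '5m' to a number of seconds."""
    text = text.strip()
    number, unit = text[:-1], text[-1]
    return float(number) * UNITS[unit]

def to_date(text):
    """Convert an ISO date string such as '2024-04-27' to a date object."""
    return date.fromisoformat(text.strip())

def to_list(text):
    """Split a comma-separated string into a list of trimmed items."""
    return [item.strip() for item in text.split(",") if item.strip()]

# One small parser per field, each tailored to that field's natural format.
CONVERTERS = {"timeout": to_seconds, "start date": to_date, "hosts": to_list}

def interpret(raw):
    """Apply the matching converter to each value; leave other strings alone."""
    return {key: CONVERTERS.get(key, str)(value) for key, value in raw.items()}

# Stand-in for what a loader would return: every leaf value is a string.
raw = {"timeout": "5m", "start date": "2024-04-27", "hosts": "alpha, beta", "name": "demo"}
settings = interpret(raw)
print(settings)
```

Each converter sees only the one string it was written for, so a value like `5m` never has to be disambiguated from a date or a list by a shared parser.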

An example of structured code is provided by GitHub with its workflow specification files. They use YAML. Unfortunately, the syntax of the code snippets held in the various fields can be confused with YAML syntax, which leads to unnecessary errors, confusion, and complexity (see YAML issues). JSON suffers from similar problems. NestedText excels for these applications as it holds code snippets without any need for quoting or escaping. NestedText provides simple unambiguous rules for defining the structure of your data and when these rules are followed there is no way for any syntax or special characters in the values of your data to be confused with NestedText syntax. In fact, it is possible for NestedText to hold NestedText snippets without conflict.

Another example of structured code is provided by the files that contain the test cases used by Parametrize From File, a PyTest plugin. Parametrize From File simplifies the task of specifying test cases for PyTest by separating the test cases from the test code. Here it is being applied to test a command line program. Its response is checked using regular expressions. Each entry includes a shell command to run the program and a regular expression that must match the output for the test to pass:

-
    cmd: emborg version
    expected: emborg version: \d+\.\d+(\.\d+(\.?\w+\d+)?)?  \(\d\d\d\d-\d\d-\d\d\)
    expected type: regex
-
    cmd: emborg --quiet files -D
    expected:
        > Archive: home-\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d
        > \d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d.\d\d\d\d\d\d configs/subdir/(file|)
        > \d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d.\d\d\d\d\d\d configs/subdir/(file|)
            # Unfortunately, we cannot check the order as they were both 
            # created at the same time.
    expected type: regex
-
    cmd: emborg due --backup-days 1 --message "{elapsed} since last {action}"
    expected: home: (\d+(\.\d)? (seconds|minutes)) since last backup\.
    expected type: regex

Notice that the regular expressions are given clean, without any quoting or escaping.
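To make the role of the `expected` and `expected type` fields concrete, here is a minimal sketch of how a test harness might check one such entry. It does not run emborg and is not the Parametrize From File implementation: the `output_matches` helper, the use of a full match, and the sample output strings are assumptions made for illustration. The dictionary stands in for one loaded entry, in which every value is a plain string.

```python
import re

# Stand-in for one loaded test case: every value arrives as a plain string.
# The raw string literal is only needed because this is Python source code;
# in the data file the pattern is written as-is.
case = {
    "cmd": 'emborg due --backup-days 1 --message "{elapsed} since last {action}"',
    "expected": r"home: (\d+(\.\d)? (seconds|minutes)) since last backup\.",
    "expected type": "regex",
}

def output_matches(case, output):
    """Return True if the program output satisfies the test case."""
    if case.get("expected type") == "regex":
        # Assumption: the whole output must match the pattern.
        return re.fullmatch(case["expected"], output.strip()) is not None
    return output.strip() == case["expected"]

print(output_matches(case, "home: 3.5 minutes since last backup.\n"))
print(output_matches(case, "home: 2 days since last backup.\n"))
```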

Composable Utilities

Another attractive use-case for NestedText is command line programs whose output is meant to be consumed by either people or other programs. This is another growing trend. Many programs do this by supporting a --json command-line flag that indicates the output should be computer readable rather than human readable. But with NestedText it is not necessary to make people choose. Just output the result in NestedText and it can be read by people or computers. For example, consider a program that reads your address list and outputs particular fields on demand:

> address --email
Katheryn McDaniel: [email protected]
Margaret Hodge: [email protected]

This output could be fed directly into another program that accepts NestedText as input:

> address --email | mail-to-list
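As a sketch of what the consuming side could look like, the snippet below reads output of the shape shown above and turns it into a dictionary. It is not the reference implementation: `read_flat_dict` is a hypothetical helper that handles only flat `key: value` lines (no nesting, no quoted keys, no empty values), and the email addresses are made up. A real `mail-to-list` would pass `sys.stdin` instead of the sample list.

```python
def read_flat_dict(lines):
    """Parse 'key: value' lines into a dict, skipping blanks and comments."""
    result = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # Split at the first colon followed by a space.
        key, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        result[key.strip()] = value
    return result

# Made-up sample of what `address --email` might print.
sample_output = [
    "Katheryn McDaniel: katheryn@example.com\n",
    "Margaret Hodge: margaret@example.com\n",
]
recipients = read_flat_dict(sample_output)
for name, email in recipients.items():
    print(f"would send to {name} <{email}>")
```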

Contributing

This package contains a Python reference implementation of NestedText and a test suite. Implementation in many languages is required for NestedText to catch on widely. If you like the format, please consider contributing additional implementations.

Also, please consider using NestedText for any applications you create.

nestedtext's People

kalekundert edwardbetts greg2git fortyonehertz george-hopkins lewisgaul carl-j-ones erikw yoonsikp preuss gcxfd mayhemheroes

nestedtext's Issues

New implementation

Just an FYI, I did an implementation of the spec in the Janet programming language:

https://github.com/andrewchambers/janet-nested-text

Info: Implementation in Go (Golang) available

Hi Ken,

I just wanted to point you to an implementation of NestedText in Go, available on GitHub at
https://github.com/npillmayer/nestext.
Would you be so kind as to include a link to it on NestedText's GitHub page and nestedtext.org?

Thank you for the great work!

--Norbert

Issues with the official test submodule

I want to try my hand at JavaScript and C# ports (in that order), and am currently working on rigging a test suite based on the test submodule.

However, firstly, I noticed a number of JSON and nt files which simply have file names as their text content (e.g. .../dump_in.json having the content load_out.json, which is obviously not valid JSON), and secondly, there are stale issues and PRs in that repo which are small enough to be easy to address even if this library is a low priority.

I feel this needs to be handled for me to continue without too much friction. I could just copy the test suite, but that wouldn't feel proper in light of what seems to be the intention of a separate shared repo.

(I posted the issue in this repo as it seems to be more active)

The description of YAML in the README is inaccurate

The YAML project started in the Spring of 2001. We were unaware of JSON until at least 2005, possibly 2006. By that time, there were YAML frameworks in many languages and the Ruby language was already shipping with YAML included.

The first draft spec of YAML came out in 2001. The early origins of YAML were a reaction to XML, not JSON. The YAML 1.0 and 1.1 specs make no mention of JSON. Sometime after 1.1 spec came out we realized that YAML was very nearly a superset of JSON (a complete coincidence). The YAML 1.2 spec was mostly about making 3 or 4 very minor adjustments so that YAML would in fact be a complete superset of JSON.

Feel free to ask clarifying questions if you want, but I'd ask you change your documentation to not imply that YAML's design decisions had anything to do with JSON's (or vice versa).

Cheers!

Consider adding inline collection literals?

Hi! This is more of an experience report, rather than an issue :) I've tried converting moderately complex TOML documents to NestedText:

https://gist.github.com/matklad/8e7502a83e5492040c4436b716922bf5

Overall, I like the low ceremony and simplicity of NestedText, and I especially enjoy the fact that there's only one scalar data type -- a string. For my TOML files, this is especially nice, because having to quote semantic versions always irked me.

However, I find the converted documents harder to read, because they are long and narrow. The second one has roughly twice as many lines. I do think this significantly affects readability, as there's simply too little info on a line to get many important bits at a glance (naturally, the flip side of this is that this format is easier to diff, but I tend to prefer plain readability to diff readability).

I think the main culprit here is that, as far as I understand it, there's no way to specify short collections inline.

members = [ "crates/*", "xtask/" ] turns into

    members:
     - crates/*
     - xtask/

dictionary nested in a list item

The documentation seems to suggest that a list item can contain a nested dictionary:

A value for a dictionary or list item may be a rest-of-line string as shown above,
or it may be a nested dictionary, list or a multiline string.
(https://nestedtext.org/en/stable/how_to_write.html#nesting)

The example doesn't show an example of a dictionary nested into a list item, though.

The definition of a list item suggests that the value can only be a string:

A list item is introduced with a dash at the start of a line. Anything that follows
the space after the dash is the value and is treated as a string.
(https://nestedtext.org/en/stable/how_to_write.html#lists)

Is it possible to nest a dictionary inside a list item? An example would be very helpful.

Consider different approach to multiline strings?

Hi there! I'm a big fan of nested text, but as I went through the docs I was surprised with your choice for multiline strings:

address:
    > 138 Almond Street  
    > Topeka, Kansas 20697

Perhaps you'd consider an alternative way to specify this? I find that needing to add the > character on every line isn't ideal, especially for long pieces of text, HTML templates, etc.!

My guess is you can use your indentation rules pretty well to find the end of the string without a character on each line. I immediately was hoping for this syntax:

address:
    138 Almond Street  
    Topeka, Kansas 20697

But I think this is also clear:

address: >
    138 Almond Street  
    Topeka, Kansas 20697

Or even this if you prefer it as the first character

address: 
    > 138 Almond Street  
    Topeka, Kansas 20697

To my surprise YAML also supports all of these directly. Here's a playground to look at (they have perhaps too many options):
https://yaml-multiline.info

Additionally, here are a couple of examples where having this as an option would significantly improve the NestedText:

website:
    product page template: >
        <h4>Product Info: {{name}}</h4>
        <ul>
              <li>Product: {{name}}</li>
              <li>Color: {{color}}</li>
              <li>Price: ${{price}}</li>
        </ul>

In the current syntax it would have to be this, which is quite hard to type and pretty confusing!

website:
    product page template: 
        > <h4>Product Info: {{name}}</h4>
        > <ul>
        >      <li>Product: {{name}}</li>
        >      <li>Color: {{color}}</li>
        >      <li>Price: ${{price}}</li>
        > </ul>
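For what it is worth, the prefixes in the current syntax do not have to be typed by hand. The helper below is a minimal sketch that adds them to an existing block of text; `as_multiline_string` and its `indent` parameter are invented here, and writing an empty line as a bare `>` is an assumption about what the format accepts.

```python
def as_multiline_string(text, indent=8):
    """Prefix each line of text with '> ' so it can be pasted as a multi-line string."""
    pad = " " * indent
    return "\n".join(
        pad + ("> " + line if line else ">")   # an empty line becomes a bare '>'
        for line in text.splitlines()
    )

template = "<h4>Product Info: {{name}}</h4>\n<ul>\n     <li>Product: {{name}}</li>\n</ul>"
print(as_multiline_string(template))
```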

The other situation is larger embedded text (readme, documentation), and again a character on each line would be very hard to read:

screenplay:
   name: Groundhog Day
   contents: >
      INT. BREAKFAST ROOM 
      
      Phil enters the old library of the house and finds everything
      exactly as it was the day before. Mrs. Lancaster spots Phil as
      she comes out of the kitchen with the fresh pot of coffee.
      
	      MRS. LANCASTER
	      Did you sleep well, Mr. Connors?
      
	      PHIL
	      (completely confused)
	      Did I? I don't know--
      
	      MRS. LANCASTER
	      Would you like some coffee?
      
	      PHIL
	      Yes, thank you. I'm feeling a
	      little strange.
      
	      MRS. LANCASTER
	      (as she pours)
	      I wonder what the weather's going
	      to be like for all the festivities.
      
	      PHIL
	      Did you ever have deja vu, Mrs. Lancaster?

As for parsing, obviously it's more tricky because multi-line strings could contain a colon. The key, I think, is either to require the > character at the start (easiest), or to look at the first line. If it doesn't contain a colon or start with a - then it's a multi-line string, and subsequent lines with the same indentation are appended.

Thanks for considering. Very cool project!

Canonical form

Any idea if we can have a defined canonical form, similar to "canonical JSON" and protobufs? It might be contrary to the design of NestedText, which allows a variable amount of white space per line/indentation, but it would be good for hashing and diffs.

Expectation on encoding -> decoding of `None`

Hello,

I'm well on my way to a Ruby implementation of NestedText (erikw/nestedtext-ruby, all official decode tests are passing!) and I'm currently writing unit tests for various edge-case inputs. Using the Python implementation, I'm unsure what the expectation is when encoding Python's None (and Ruby's nil in my case). I see that None is treated in different ways.

Using this base-program and just changing the obj = ... line in the different examples:

    import nestedtext as nt

    obj = ...

    dumped = nt.dumps(obj)
    print("dumped:")
    print(repr(dumped))

    loaded = nt.loads(dumped)
    print("loaded:")
    print(repr(loaded))

Just `None`

    obj =  None

gives the output:

dumped:
''
loaded:
{}

▶️ None encodes to empty string, but is decoded back as empty inline dict

`None` in list

    obj =  [None]

gives the output:

dumped:
'-'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "__main__.py", line 82, in main
    test_dump()
  File "__main__.py", line 73, in test_dump
    loaded = nt.loads(dumped)
  File "[...]/nestedtext.py", line 1088, in loads
    loader = NestedTextLoader(lines, top, source, on_dup, keymap)
  File "[...]/nestedtext.py", line 765, in __init__
    report('content must start with key or brace ({{).', lines.get_next())
  File "[...]/nestedtext.py", line 258, in report
    raise NestedTextError(template=message, *args, **kwargs)
  File "<string>", line 0
nestedtext.NestedTextError

▶️ None encodes to empty string, but can't be decoded back to Python

`None` as dict value

    obj = {"key": None}

gives the output:

dumped:
'key:'
loaded:
{'key': ''}

▶️ None encodes to empty string, but is decoded back to empty string and not None

`None` as dict key

    obj = {None: "value"}

gives the output:

dumped:
'None: value'
loaded:
{'None': 'value'}

▶️ None encodes to the string "None", and decodes back to the string "None"

Thus, None is treated differently in each of the cases above. The encoding from Python is of course not part of the specification for the NT data format, but it would still be nice to know the rules for None, or for None to be treated the same way in all cases :).

Is the output above expected? I suspect that in the case with None

- in a list, it should be possible to decode back to Python;
- as a key in a dict, it should maybe render to an empty string instead of the string "None", to be more consistent with the other cases.

I'm happy to hear your thoughts!

Files Fail To Load When Indented With Tabs

To make this format truly human friendly, I suggest allowing indentation on new lines with tabs.

Zig implementation progress/thoughts

I'll comment out the pointer to zig-nestedtext for now. Let me know when it is ready.

Originally posted by @KenKundert in #20 (comment)

@KenKundert zig-nestedtext is looking pretty good now, I think it's handling everything in the spec except for quotes around object keys (and the error reporting is extremely minimal). My next job will be to hook it up to your suite of testcases to iron out these remaining bits.

The main aspect of the language spec I'm questioning at this point is the fact that keys may be quoted - is there really a need to allow keys to start with - or >, or contain : or whitespace? This feels like unnecessary complexity for a language spec that strives to be simple, and indeed leads to more than two thirds of the text in the file format spec for dictionary lines:

The key must be quoted if it:

starts with a list-item or string-item tag,

contains a dict-item tag,

starts with a quote character, or

has leading or trailing spaces or tabs.

A key is quoted by delimiting it with matching single or double quote characters, which are discarded. Unlike traditional programming languages, a quoted key delimited with single quote characters may contain additional single quote characters. Similarly, a quoted key delimited with double quote characters may contain additional double quote characters. Also, backslash is not used as an escape character; backslash has no special meaning anywhere in NestedText.

A quoted key starts with the leading quote character and ends when the matching quote character is found along with a trailing colon (there may be white space between the closing quote and the colon). A key is invalid if it contains two or more instances of a quote character separated from :␣ by zero or more space characters where the quote character in one is a single quote and the quote character in another is the double quote. In this case the key cannot be quoted with either character so that the separator from the key and value can be identified unambiguously.

Here's an example of it working (modulo spaces in object keys and object field ordering):

$./zig-cache/bin/nt-cli -f samples/employees.nt | jq
{
  "treasurer": [
    {
      "email": "[email protected]",
      "name": "Fumiko Purvis",
      "address": "3636 Buffalo Ave\nTopeka, Kansas 20692\n",
      "phone": "1-268-555-0280",
      "additional-roles": [
        "accounting task force"
      ]
    },
    {
      "email": "[email protected]",
      "name": "Merrill Eldridge",
      "phone": "1-268-555-3602"
    }
  ],
  "vice-president": {
    "email": "[email protected]",
    "name": "Margaret Hodge",
    "address": "2586 Marigold Lane\nTopeka, Kansas 20682\n",
    "phone": "1-470-555-0398",
    "additional-roles": [
      "new membership task force",
      "accounting task force"
    ]
  },
  "president": {
    "email": "[email protected]",
    "name": "Katheryn McDaniel",
    "address": "138 Almond Street\nTopeka, Kansas 20697\n",
    "phone": {
      "cell": "1-210-555-5297",
      "home": "1-210-555-8470"
    },
    "additional-roles": [
      "board member"
    ]
  }
}
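Returning to the quoting rules quoted earlier in this issue: the four conditions are compact enough to express as a single predicate, which may help when judging how much complexity they really add. This is only a sketch of those four bullet points, not code from zig-nestedtext or the Python reference implementation. The function name `needs_quoting` is invented, and it assumes each tag is the tag character followed by a space (`- `, `> `, `: `).

```python
def needs_quoting(key):
    """Return True if the four rules quoted above require this key to be quoted."""
    return (
        key.startswith(("- ", "> "))     # starts with a list-item or string-item tag
        or ": " in key                   # contains a dict-item tag
        or key.startswith(("'", '"'))    # starts with a quote character
        or key != key.strip(" \t")       # has leading or trailing spaces or tabs
    )

for key in ["name", "additional roles", "- item", "a: b", '"quoted', " padded", "tab\t"]:
    print(repr(key), needs_quoting(key))
```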

[Request] There seems to be no info about file extension for nestedtext documents

According to some examples it's ".nt", but I wasn't able to find that info anywhere else.

Editor modes?

I just started working with a tool that uses Nested Text format configuration files. I saw there were vim and Visual Studio syntax files. Does anyone have an emacs mode file for this or any suggestions?

FYI: Highlight.js grammar for NestedText

I really like what you all are doing here with the idea of a dead SIMPLE spec... so I'm adding a grammar to Highlight.js:

highlightjs/highlight.js#3114

We're in between major releases right now so it might be a while before it's released, but just thought I'd let you know.

Encoding object with cyclic references uncaught RecursionError error

This program

import nestedtext

a = []
b = [a]
a.append(b)

nestedtext.dumps(b)

will give

RecursionError: maximum recursion depth exceeded in comparison

As I just constructed these simple objects, it is obvious what this error means. But for a more complex object, one that I did not write just now or even myself, the user of NestedText might be confused.

Solution ideas:

Detect cyclic references and quit early with error message
- While still keeping the possibility of using the same references multiple times i.e. this program should still work:
```
  a = []
  b = [a, a]
  nestedtext.dumps(b)
```
Catch RecursionError and raise an error explaining that the object to be dumped has cyclic references and thus cannot be dumped.
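The first idea can be sketched as follows. This is an illustration of the technique, not how nestedtext is implemented: `has_cycle` is a hypothetical helper that tracks only the containers on the current path from the root, so an object that appears twice side by side is accepted while an object that contains itself is reported.

```python
def has_cycle(obj, _path=None):
    """Return True if obj contains itself somewhere along a chain of containers."""
    if _path is None:
        _path = set()
    if not isinstance(obj, (dict, list, tuple)):
        return False                # strings and other leaves cannot form cycles
    if id(obj) in _path:
        return True                 # obj is its own ancestor: a genuine cycle
    _path.add(id(obj))
    children = obj.values() if isinstance(obj, dict) else obj
    found = any(has_cycle(child, _path) for child in children)
    _path.discard(id(obj))          # leaving this branch, so siblings may reuse obj
    return found

# The cyclic structure from the report above.
a = []
b = [a]
a.append(b)
print(has_cycle(b))

# The same reference used twice, which should remain legal.
c = []
d = [c, c]
print(has_cycle(d))
```

A dumper could call such a check up front and raise a descriptive error instead of letting the recursion limit be hit.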

Impossible to represent empty lists/objects

Currently it is impossible to represent empty lists/objects in NestedText.

Relatedly, the language reference says the following about an empty document: "An empty document corresponds to an empty value of unknown type.".

The reason for this is that list/object items can only be represented by the presence of lines in a certain format, and the 'absence' of lines (e.g. blank lines) is not something that can be interpreted as a particular type.

The difference with YAML here is that inline collections (a.k.a. 'flow style') are not supported. The justification for this is it allows the simple statement that all values are interpreted as strings (just like how 'null' is always treated as a string).

Given the proposal to add multi-line keys (#23) based on a desire to make NestedText 'completely general', I'm wondering if it's been considered to add a way to express empty containers? This could be done by allowing 'flow-style'-like syntax but only permitting its use for empty containers, and requiring they be placed on their own line:

foo:
  []
bar:
  {}
baz:

This has the following nice properties:

Already valid YAML syntax
Backwards compatible change (the meaning of all previously valid syntax remains unchanged)
Maintains the property of every line type being identifiable without context of other lines
Could potentially provide a way to disambiguate the meaning of an empty file (make it an empty string, given the new way to represent empty collections)

Problems:

What to do with a file containing only [] or {}?
- I was going to propose that this should provide a way for a file to represent an empty collection, but actually this would be backwards incompatible as it's currently interpreted as a string.
- Note, however, that this wouldn't prevent a file from corresponding to those strings, since > [] or > {} could be used.

Related discussion about removal of flow-style in strictyaml: https://hitchdev.com/strictyaml/why/flow-style-removed/ (see Counterarguments).

Add links to more NestedText implementations?

While I was browsing GitHub I discovered two more implementations that are currently not listed at
https://nestedtext.org/en/latest/related_projects.html,

that both pass the official tests. Maybe those can be included in the Related Projects page as well?

[question] can nestedtext do id reference like YAML?

Example in YAML

bill-to: &id001
    given  : Chris
    family : Dumars
ship-to: *id001

Convention for expressing the name of the schema used

An IDE can determine what "support package" to use for a text file based on its file extension.

An XML editor can determine the same thing based on namespaces or DTD declarations.

I propose that NestedText could document a convention for bootstrapping grammar-driven tools like IDEs, linters, validators, etc.

I offer two suggestions:

Files could have double-extensions of the form config.sls.nt vs config.ans.nt
Files could have a leading comment-line in a predictable format. Arguably it could be a URL which points to the documentation or schema file, which is how it works in XML.

Either way, it should be a universal convention so that tools know where to look. I guess I lean mildly towards the URL strategy because I think it is a good habit to document your grammars and putting the pointer to the documentation in the file makes it easier for both humans and machines to interpret the file.
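To show how little tooling the comment-line strategy would need, here is a minimal sketch. The exact form of the line (`# schema: <URL>` as the first non-blank line) is purely hypothetical, as are the helper name `find_schema` and the example URL; no such convention is defined by NestedText.

```python
import re

# Hypothetical convention: the first non-blank line is "# schema: <URL>".
SCHEMA_LINE = re.compile(r"#\s*schema:\s*(\S+)")

def find_schema(text):
    """Return the schema URL declared in the leading comment, or None."""
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip leading blank lines
        # Only the first non-blank line is examined.
        match = SCHEMA_LINE.fullmatch(line.strip())
        return match.group(1) if match else None
    return None

doc = "# schema: https://example.com/schemas/config.html\nname: demo\n"
print(find_schema(doc))
print(find_schema("name: demo\n"))
```

Because the pointer is an ordinary comment, a file carrying it would still load unchanged in parsers that know nothing about the convention.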

Language spec changelog?

Hi! I came across this project the other day and it seems great, I agree with the motivations and design decisions on the whole. I decided to have a go at implementing a NestedText parser in Zig - zig-nestedtext (as suggested in https://github.com/KenKundert/nestedtext#contributing!).

I'd like to link to a fixed language specification version that I'm implementing (1.3.0?), but the docs only seem to have 'latest' (e.g. no docs at https://nestedtext.org/en/1.3.0/). This should be easy to set up with RTD.

It would also be good to have a changelog (in the docs and the repo root).

No schema enforced

Proposed changes to NestedText that are not backward compatible

We are considering deprecating quoted keys. This will be a change that is not backward compatible. You can see the discussion that triggered this decision here. To summarize, the feeling is that:

Quoted keys add considerable complexity to both the implementation and support (ex: they added considerable complexity to implementation of Vim syntax highlighting) that is not consistent with the design philosophy of NestedText.
The approach taken is unique and unfamiliar to everyone that encounters it.
Distinguishing the key from the value can be difficult in some cases.
The approach taken is not in keeping with the other concepts of NestedText.
Even with quoting, there are some strings that cannot be used as keys.
Quoted keys provide too little value given the above issues.

Eliminating quoted keys further limits the strings that can be used as keys. We are considering adding multi-line keys to replace quoted keys. It is felt that multi-line keys are more in keeping with the style of NestedText than quoted keys were, and they allow NestedText to accept any string as a key.

Multi-line keys are patterned after multi-line strings, except the string tag >␣ is replaced by the dict tag :␣ and a trailing indented value is required. For example:

: this is the first line of a multi-line key
: this is the second line
    > this is the value

This would be interpreted as:

{
    "this is the first line of a multi-line key\nthis is the second line": "this is the value"
}

Multi-line keys are not expected to be commonly used, but they are being considered because they fit naturally in the language and they make NestedText completely general, meaning that with multi-line keys NestedText can handle any combination of lists, dictionaries, and strings, where the leaf values are all strings. We could not say that previously.
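To make the proposed interpretation concrete, here is a narrow sketch that handles only the shape of the example above: one or more `: ` key lines followed by one or more indented `> ` value lines. It is not a NestedText parser and not the reference implementation; `read_multiline_key_item` is a name invented for this illustration.

```python
def read_multiline_key_item(lines):
    """Parse one multi-line key followed by an indented multi-line string value."""
    key_parts, value_parts = [], []
    for line in lines:
        if line.startswith(": ") and not value_parts:
            key_parts.append(line[2:])               # key line: drop the ': ' tag
        elif line.startswith(" ") and line.lstrip().startswith("> ") and key_parts:
            value_parts.append(line.lstrip()[2:])    # value line: drop indent and '> ' tag
        else:
            raise ValueError(f"unexpected line: {line!r}")
    if not key_parts or not value_parts:
        raise ValueError("a multi-line key needs at least one key line and a value")
    # Key lines and value lines are each joined with newlines.
    return {"\n".join(key_parts): "\n".join(value_parts)}

example = [
    ": this is the first line of a multi-line key",
    ": this is the second line",
    "    > this is the value",
]
item = read_multiline_key_item(example)
print(item)
```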

Comments?

Patch version 2.0.1 - update 'v2.0' tag

I'm not sure what was changed in v2.0.1 (since nothing is mentioned at https://nestedtext.org/en/latest/releases.html or https://nestedtext.org/en/latest/changelog.html) but the v2.0 tag seems to still point at the old 2.0.0 commit, meaning https://nestedtext.org/en/v2.0/ still goes to v2.0.0.

Could this be updated (and perhaps something in the CI be tweaked to ensure this isn't missed in the future)?

Handling of carriage return and other control characters

Is there a preferred way to handle carriage returns and other control characters (e.g. form feed \f)? Especially on Windows, CR LF line endings are quite common. Other characters (such as the aforementioned \f) are probably not intended but might sneak in if data was converted to NestedText from another source.
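The question matters in practice because common line-splitting helpers disagree about these characters. The sketch below uses only the Python standard library and a made-up input string. It does not say which behavior a NestedText implementation should adopt; it only shows that the choice changes the result. `split_on_newlines` is a helper written for this illustration.

```python
import re

# Made-up input: a form feed inside a value, one CR LF and one LF line ending.
text = "key: a\fb\r\nother: c\n"

# str.splitlines() treats form feed (and several other control characters)
# as a line boundary, so the first value is cut in two.
print(text.splitlines())

def split_on_newlines(text):
    """Split only on CR LF, LF, or CR, keeping other control characters."""
    lines = re.split(r"\r\n|\n|\r", text)
    if lines and lines[-1] == "":
        lines.pop()                      # drop the empty piece after a final newline
    return lines

# Here the form feed stays inside the first value.
print(split_on_newlines(text))
```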

Add stub files for type hints

PEP 561 compliant stub files support you during development, as they let IDEs and checkers like MyPy detect type inconsistencies early on. Therefore, it would be nice if the package provided stub files.

just a backlink: I have added to reddit r/tree_notations a link to this repo

https://www.reddit.com/r/tree_notations/comments/niioe1/kenkundertnestedtext_human_readable_and_writable/

Parsing error if the document starts with a dash

Hello! I get a NestedTextError exception when I try to parse an example from the documentation starting with a dash:

-
    cmd: emborg version
    expected: emborg version: \d+\.\d+(\.\d+(\.?\w+\d+)?)?  \(\d\d\d\d-\d\d-\d\d\)
    expected_type: regex
-
    cmd: emborg --quiet list
    expected: home-\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d
    expected_type: regex

My code:

import nestedtext as nt

n = nt.load('test.nt')

print(n)

I have found out experimentally that this error appears whenever the dash is the first character in the document. Even such a document causes this error:

- a
- b

But this one parses without errors:

a:
 - b
 - c

It looks like a bug.

New Go implementation

I have created an implementation of NestedText in Go.

https://git.sr.ht/~torresjrjr/go-nestedtext

It's in early development, and provides only an executable which converts NestedText to JSON for now. I wish to create a library too.

Comments and critiques welcome.

Thoughts on a Sentinel Value to Denote File Begin / End?

First of all, thanks for taking the time to publish this repo.

A common complaint about YAML-esque markup languages is that incomplete subsections of the file can be validly parsed, leading to data loss in some situations.

Would you consider extending nestedtext markup to include a sentinel value denoting the beginning and end of a file? I imagine the slight reduction in the initial usability of the markup language would be worth the gains in file integrity (for those users who choose not to implement their own validation / checksums).

YAML's main issue, semantic whitespace, still a problem

Subject says it all.

Some weird but valid input is converted to invalid NestedText -- value becomes part of key

I encountered this in the following roundabout way:

trouble.xml:

<?xml version="1.0" encoding="UTF-8"?>
<kcfg xmlns="http://www.kde.org/standards/kcfg/1.0"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.kde.org/standards/kcfg/1.0
                          http://www.kde.org/standards/kcfg/1.0/kcfg.xsd" >
    <kcfgfile name=""/>
    <group name="">
        <entry name="Duration" type="UInt">
            <default>250</default>
        </entry>
        <entry name="ExcludedWindowClasses" type="String">
            <default>krunner,yakuake</default>
        </entry>
    </group>
</kcfg>

I used xmljson to turn it into trouble.json:

{
  "{http://www.kde.org/standards/kcfg/1.0}kcfgfile": null,
  "{http://www.kde.org/standards/kcfg/1.0}group": {
    "{http://www.kde.org/standards/kcfg/1.0}entry": [
      {
        "{http://www.kde.org/standards/kcfg/1.0}default": 250
      },
      {
        "{http://www.kde.org/standards/kcfg/1.0}default": "krunner,yakuake"
      }
    ]
  }
}

This doesn't look great to me but it does seem to be valid JSON.

Then in Python:

In [1]: from pathlib import Path
In [2]: from json import loads as j_loads
In [3]: data = j_loads(Path('./trouble.json').read_text())
In [4]: from pprint import pprint
In [5]: pprint(data)
{'{http://www.kde.org/standards/kcfg/1.0}group': {'{http://www.kde.org/standards/kcfg/1.0}entry': [{'{http://www.kde.org/standards/kcfg/1.0}default': 250},
                                                                                                   {'{http://www.kde.org/standards/kcfg/1.0}default': 'krunner,yakuake'}]},
 '{http://www.kde.org/standards/kcfg/1.0}kcfgfile': None}
In [6]: from nestedtext import dumps as nt_dumps, loads as nt_loads
In [7]: print(nt_dumps(data))
: {http://www.kde.org/standards/kcfg/1.0}kcfgfile
: {http://www.kde.org/standards/kcfg/1.0}group
    : {http://www.kde.org/standards/kcfg/1.0}entry
        -
            : {http://www.kde.org/standards/kcfg/1.0}default250
        -
            : {http://www.kde.org/standards/kcfg/1.0}default
                > krunner,yakuake
In [8]: nt_loads(nt_dumps(data))
Unexpected exception formatting exception. Falling back to standard exception
Traceback (most recent call last):
  File "/home/andy/.local/share/venvs/765b9c89d8e269a433a53f3fcbacba89/venv/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-f65fd089d142>", line 1, in <module>
    nt_loads(dumps(data))
  File "/home/andy/.local/share/venvs/765b9c89d8e269a433a53f3fcbacba89/venv/lib/python3.10/site-packages/nestedtext/nestedtext.py", line 1204, in loads
    loader = NestedTextLoader(lines, top, source, on_dup, keymap, normalize_key)
  File "/home/andy/.local/share/venvs/765b9c89d8e269a433a53f3fcbacba89/venv/lib/python3.10/site-packages/nestedtext/nestedtext.py", line 821, in __init__
    self.values, self.keymap = self._read_value(0, ())
  File "/home/andy/.local/share/venvs/765b9c89d8e269a433a53f3fcbacba89/venv/lib/python3.10/site-packages/nestedtext/nestedtext.py", line 901, in _read_value
    return self._read_dict(depth, keys)
  File "/home/andy/.local/share/venvs/765b9c89d8e269a433a53f3fcbacba89/venv/lib/python3.10/site-packages/nestedtext/nestedtext.py", line 987, in _read_dict
    value, loc = self._read_value(depth_of_next, new_keys)
  File "/home/andy/.local/share/venvs/765b9c89d8e269a433a53f3fcbacba89/venv/lib/python3.10/site-packages/nestedtext/nestedtext.py", line 901, in _read_value
    return self._read_dict(depth, keys)
  File "/home/andy/.local/share/venvs/765b9c89d8e269a433a53f3fcbacba89/venv/lib/python3.10/site-packages/nestedtext/nestedtext.py", line 987, in _read_dict
    value, loc = self._read_value(depth_of_next, new_keys)
  File "/home/andy/.local/share/venvs/765b9c89d8e269a433a53f3fcbacba89/venv/lib/python3.10/site-packages/nestedtext/nestedtext.py", line 899, in _read_value
    return self._read_list(depth, keys)
  File "/home/andy/.local/share/venvs/765b9c89d8e269a433a53f3fcbacba89/venv/lib/python3.10/site-packages/nestedtext/nestedtext.py", line 931, in _read_list
    value, loc = self._read_value(depth_of_next, new_keys)
  File "/home/andy/.local/share/venvs/765b9c89d8e269a433a53f3fcbacba89/venv/lib/python3.10/site-packages/nestedtext/nestedtext.py", line 901, in _read_value
    return self._read_dict(depth, keys)
  File "/home/andy/.local/share/venvs/765b9c89d8e269a433a53f3fcbacba89/venv/lib/python3.10/site-packages/nestedtext/nestedtext.py", line 993, in _read_dict
    report("multiline key requires a value.", line, None, colno=depth)
  File "/home/andy/.local/share/venvs/765b9c89d8e269a433a53f3fcbacba89/venv/lib/python3.10/site-packages/nestedtext/nestedtext.py", line 276, in report
    raise NestedTextError(template=message, *args, **kwargs)
nestedtext.nestedtext.NestedTextError: 5: multiline key requires a value.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/andy/.local/share/venvs/765b9c89d8e269a433a53f3fcbacba89/venv/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 2166, in showtraceback
    stb = value._render_traceback_()
TypeError: 'NoneType' object is not callable

Looks like the value became part of the key:

: {http://www.kde.org/standards/kcfg/1.0}default250
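
The values involved here are the non-string scalars from the JSON (`250` and `null`). As a possible workaround, and not a fix for the dumper itself, the data could be reduced to dicts, lists, and strings before dumping. The helper below is a minimal sketch; the name `stringify` is hypothetical, and mapping `None` to an empty string is an assumption about what the caller wants.

```python
def stringify(obj):
    """Recursively convert scalars to strings, keeping dicts and lists."""
    if isinstance(obj, dict):
        return {str(key): stringify(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [stringify(item) for item in obj]
    if obj is None:
        return ""  # assumption: represent a missing value as an empty string
    return str(obj)
```

The result of `stringify(data)` contains only the types NestedText describes, so it could then be passed to `nt_dumps` in place of `data`.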

Handling of tabs after unquoted keys

While implementing a parser in Rust, I discovered that nt.loads(nt.dumps({'a\t': 'b'})) returns {'a': 'b'}. dumps() does not quote the key, which is correct as far as I understand the specification; however, the tab (→) is stripped when a→: b is loaded again. Is this a bug in the implementation, or should tabs after unquoted keys be removed?
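
Until the intended behavior is settled, a caller could at least detect keys that are at risk before dumping. The following is a minimal sketch using only the standard library; the function name `keys_with_edge_whitespace` is hypothetical, and it flags any leading or trailing whitespace rather than tabs alone.

```python
def keys_with_edge_whitespace(obj, path=()):
    """Yield the path to every dict key with leading or trailing whitespace."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if isinstance(key, str) and key != key.strip():
                yield path + (key,)
            # Keep walking so nested dicts are checked as well.
            yield from keys_with_edge_whitespace(value, path + (key,))
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from keys_with_edge_whitespace(item, path + (index,))
```

For the example above, `list(keys_with_edge_whitespace({'a\t': 'b'}))` yields the single path `('a\t',)`.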

Empty collections cannot be round-tripped

>>> import json
>>> import nestedtext as nt
>>> import yaml

>>> empty_dict = {}
>>> json.loads(json.dumps(empty_dict)) == empty_dict
True
>>> yaml.safe_load(yaml.dump(empty_dict)) == empty_dict
True
>>> nt.loads(nt.dumps(empty_dict)) == empty_dict
False

>>> empty_list = []
>>> json.loads(json.dumps(empty_list)) == empty_list
True
>>> yaml.safe_load(yaml.dump(empty_list)) == empty_list
True
>>> nt.loads(nt.dumps(empty_list)) == empty_list
False


>>> sample_data = {"links": {}, "tags": []}
>>> nt.dumps(sample_data)
'links:\n\ntags:\n'
>>> nt.loads(_)
{'links': '', 'tags': ''}

Very interested in this as a yaml alternative, but this would be a blocker for me.
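
Given the behavior shown above, where empty collections come back as empty strings, one application-side workaround is to restore them after loading from a description of the expected shape. This is a sketch under assumptions: the name `restore_empty` and the form of `expected` (keys mapped to `dict`, `list`, or a nested mapping of the same form) are hypothetical, and the approach cannot tell a genuinely empty string from an empty collection.

```python
def restore_empty(data, expected):
    """Replace '' with an empty container wherever `expected` names one."""
    restored = dict(data)
    for key, kind in expected.items():
        if key not in restored:
            continue
        value = restored[key]
        if isinstance(kind, dict):
            # Nested description: '' stands for {}, otherwise recurse.
            if value == "":
                restored[key] = {}
            elif isinstance(value, dict):
                restored[key] = restore_empty(value, kind)
        elif value == "":
            restored[key] = kind()  # kind is the dict or list type itself
    return restored
```

Applied to the loaded value `{'links': '', 'tags': ''}` with `expected = {"links": dict, "tags": list}`, this returns the original `sample_data`.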

Suggestion of clarification in the language reference documentation

Please excuse me for posting this as an issue rather than a PR, but I find it hard to justify a fork just for minor changes to the documentation.

As I read through the document, the Line-type tags section led me to imagine a "tag" as a single character, because all the examples are one character long. Later sections describe tags more directly as a character followed by a space or line break, but I would like to suggest the following clarifications:

Under the Line-type tags section, change the numbered list:

1. the first of these specific characters when followed immediately by a space or line break: dash (-␣, -↵), colon (:␣, :↵), or greater-than symbol (>␣, >↵), or

2. the first non-white space character on a line being one of a hash (#), left bracket ([), or left brace ({).

Replace each occurrence of the phrase (for some <char>):

by a space (<char>␣) or a line break

With:

by a space or a line break (<char>␣, <char>↵)
