toml-lang / toml Goto Github PK

Tom's Obvious, Minimal Language

Home Page: https://toml.io

License: MIT License

toml's Introduction

TOML

Tom's Obvious, Minimal Language.

By Tom Preston-Werner, Pradyun Gedam, et al.

This repository contains the in-development version of the TOML specification. You can find the released versions at https://toml.io.

Objectives

TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics. TOML is designed to map unambiguously to a hash table. TOML should be easy to parse into data structures in a wide variety of languages.

Example

# This is a TOML document.

title = "TOML Example"

[owner]
name = "Tom Preston-Werner"
dob = 1979-05-27T07:32:00-08:00 # First class dates

[database]
server = "192.168.1.1"
ports = [ 8000, 8001, 8002 ]
connection_max = 5000
enabled = true

[servers]

  # Indentation (tabs and/or spaces) is allowed but not required
  [servers.alpha]
  ip = "10.0.0.1"
  dc = "eqdc10"

  [servers.beta]
  ip = "10.0.0.2"
  dc = "eqdc10"

[clients]
data = [ ["gamma", "delta"], [1, 2] ]

# Line breaks are OK when inside arrays
hosts = [
  "alpha",
  "omega"
]
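To illustrate the "maps unambiguously to a hash table" objective, here is a sketch of the nested hash table that the example document above should decode to, written as a plain Python dict. The variable name `expected` is only for illustration. A real decoder (Python 3.11+ ships one as `tomllib`) should produce an equivalent structure.

```python
from datetime import datetime, timedelta, timezone

# The hash table the example TOML document maps to.
expected = {
    "title": "TOML Example",
    "owner": {
        "name": "Tom Preston-Werner",
        # 1979-05-27T07:32:00-08:00 is a date-time with a UTC-8 offset.
        "dob": datetime(1979, 5, 27, 7, 32, 0,
                        tzinfo=timezone(timedelta(hours=-8))),
    },
    "database": {
        "server": "192.168.1.1",
        "ports": [8000, 8001, 8002],
        "connection_max": 5000,
        "enabled": True,
    },
    # [servers.alpha] and [servers.beta] nest inside [servers];
    # the indentation in the source has no effect on the result.
    "servers": {
        "alpha": {"ip": "10.0.0.1", "dc": "eqdc10"},
        "beta": {"ip": "10.0.0.2", "dc": "eqdc10"},
    },
    "clients": {
        "data": [["gamma", "delta"], [1, 2]],
        "hosts": ["alpha", "omega"],
    },
}
```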

Comparison with Other Formats

TOML shares traits with other file formats used for application configuration and data serialization, such as YAML and JSON. TOML and JSON both are simple and use ubiquitous data types, making them easy to code for or parse with machines. TOML and YAML both emphasize human readability features, like comments that make it easier to understand the purpose of a given line. TOML differs in combining these, allowing comments (unlike JSON) but preserving simplicity (unlike YAML).

Because TOML is explicitly intended as a configuration file format, parsing it is easy, but it is not intended for serializing arbitrary data structures. TOML always has a hash table at the top level of the file, which can easily have data nested inside its keys, but it doesn't permit top-level arrays or floats, so it cannot directly serialize some data. There is also no standard identifying the start or end of a TOML file, which can complicate sending it through a stream. These details must be negotiated on the application layer.
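Since TOML defines no start or end markers, one application-layer option for streaming several documents is to length-prefix each one. The sketch below assumes a 4-byte big-endian length header; the framing scheme and the function names `frame` and `unframe` are hypothetical and not part of TOML.

```python
import struct
from typing import Iterator

def frame(doc: str) -> bytes:
    """Prefix a UTF-8 encoded TOML document with its byte length."""
    payload = doc.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload

def unframe(stream: bytes) -> Iterator[str]:
    """Split a byte stream of framed documents back into strings."""
    offset = 0
    while offset < len(stream):
        if offset + 4 > len(stream):
            raise ValueError("truncated length header")
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        end = offset + length
        if end > len(stream):
            raise ValueError("truncated document")
        yield stream[offset:end].decode("utf-8")
        offset = end
```

Because the length is explicit, an empty document is still a distinct frame, which keeps the framing independent of whatever the documents contain.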

INI files are frequently compared to TOML for their similarities in syntax and use as configuration files. However, there is no standardized format for INI and they do not gracefully handle more than one or two levels of nesting.

Get Involved

Documentation, bug reports, pull requests, and all other contributions are welcome!

Wiki

We have an Official TOML Wiki that catalogs the following:

Projects using TOML
Implementations
Validators
Language-agnostic test suite for TOML decoders and encoders
Editor support
Encoders
Converters

Please take a look if you'd like to view or add to that list. Thanks for being a part of the TOML community!

toml's Issues

Spec fails to specify what occurs at a newline

The spec does not clearly specify how the following should be parsed:

key = "value
  that spans across a newline"

key = [
  1, 2, 3
]

For a spec designed to be unambiguous, it should clearly document whether a newline terminates a primitive or whether parsing of a primitive can cross newlines.

Versioning Parsers

I think we should add the commit hashes as versions to each listed parser on the README, so that people can actually go and check what spec it conforms to, as most aren't up to date at the current bleeding edge stage of spec development.

arrays can mix multiple data types

I believe

No, you can't mix data types, that's stupid.

is at odds with

[ [ 1, 2 ], ["a", "b", "c"] ]

Namely, the above is an array with two elements: the first element is an array of integers and the second element is an array of strings. This is a mixing of data types.

Practically, this makes it awkward for languages whose native lists are homogeneous to read TOML correctly.

TOML allows mixing of data types at the level of the hash table. I propose that TOML not allow the mixing of data types at the array level.
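The disagreement turns on what counts as a "type". Here is a sketch of the two readings. A shallow check treats every nested array as the single type "array", so `[ [ 1, 2 ], ["a", "b", "c"] ]` passes. A deep check compares element types all the way down, so the same array fails. The helper names are hypothetical.

```python
from typing import Any

def shallow_type(value: Any) -> str:
    # Every list counts as plain "array", whatever it contains.
    return "array" if isinstance(value, list) else type(value).__name__

def deep_type(value: Any) -> str:
    # A list's type includes the (single) type of its elements.
    if isinstance(value, list):
        inner = {deep_type(v) for v in value}
        if len(inner) > 1:
            return "mixed"
        return "array<" + (inner.pop() if inner else "empty") + ">"
    return type(value).__name__

def is_homogeneous(array: list, type_of=shallow_type) -> bool:
    """True if all elements share one type under the given reading."""
    types = {type_of(v) for v in array}
    return len(types) <= 1 and "mixed" not in types
```

Under the deep reading, a language with homogeneous native lists can map every TOML array to a single concrete list type.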

What characters are whitespace?

The spec says that a key ends at the last non-whitespace character before an equals sign. What characters are whitespace? Is that purely the space character, or are \n, \r, \f and \v also considered spaces? How about Unicode spaces?
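The difference matters in practice. In Python, for example, `str.strip()` with no arguments removes every Unicode whitespace character, whereas stripping only space and tab leaves a no-break space (U+00A0) in place. The "space and tab only" rule below is an assumption for illustration, not something the spec stated at the time.

```python
key_text = "name\u00a0 \t"  # key followed by NBSP, space, tab

# Strip every character Python considers whitespace (Unicode-aware).
unicode_trimmed = key_text.strip()

# Strip only ASCII space (0x20) and tab (0x09).
ascii_trimmed = key_text.strip(" \t")

print(repr(unicode_trimmed))  # 'name'
print(repr(ascii_trimmed))    # 'name\xa0'
```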

Bro, do you even RFC?

BNF or parser bugs for dayz.

Raw strings (no escapes)

It’s not uncommon to need backslashes in configuration files (e.g. a regular expression).

I propose single-quoted strings be “raw strings” where backslash is always literal.

The only special character sequence inside raw strings should be the single quote: when followed by whitespace or a newline, it terminates the string; when followed by a second single quote, it is literal (and the second single quote is eaten). All other cases are invalid (although one could treat them as literal).

Example:

str = 'we can use '' inside raw strings' # single quotes in single quotes
regexp = '\b\d{1,3}(\.\d{1,3}){3}\b' # match IP address
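A minimal sketch of the proposed rule, read literally: backslashes are always literal, `''` yields one quote, and a quote followed by whitespace, a newline, or end of input closes the string. The function name `parse_raw_string` is hypothetical.

```python
from typing import Tuple

def parse_raw_string(src: str, start: int) -> Tuple[str, int]:
    """Parse a raw string whose opening quote is at src[start].

    Returns (value, index just past the closing quote).
    """
    if src[start] != "'":
        raise ValueError("raw string must start with a single quote")
    out = []
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == "'":
            nxt = src[i + 1] if i + 1 < len(src) else ""
            if nxt == "'":
                out.append("'")          # '' is a literal single quote
                i += 2
                continue
            if nxt == "" or nxt.isspace():
                return "".join(out), i + 1  # closing quote
            raise ValueError(f"invalid character after quote at index {i}")
        out.append(ch)                   # backslash and everything else: literal
        i += 1
    raise ValueError("unterminated raw string")
```

Read this strictly, the rule also rejects a quote followed directly by `,` or `]`, so raw strings inside arrays would need a space before the separator.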

Reference to a hash

I think there needs to be a way to reference a hash in the same document, like in YAML.

A common practice in Rails is to have the root hash of a YAML config file map environment names to another hash containing the actual configuration vars, which really benefits from referencing when you want the production and staging environments to have the same configuration save for one or two keys.

Proposed format:

[production]
key = "value"
key2 = "value2"

[production.database]
host = "123.123.123"
port = 234
database = "prod"

[staging]
&production
key2 = "othervalue2"

[staging.database]
&production.database
database = "stage"

This is a little different from YAML in that there is no anchor and that we're using the ampersand for reference instead of anchor, but I like it this way because the reference can be read as "and other_hash", which makes sense because we're defining the hash to have values X, Y and Z and those from the other hash.

The syntax is just an ampersand followed by the full key for the hash to reference. You should never reference the root hash, and without anchors you can't.

Just to clear something up: production.database was copied to staging.database, but when we then define staging.database we don't want to override the whole hash, just one key, so we reference production.database again.
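The proposed `&` semantics amount to "deep-copy the referenced table, then apply the local keys on top". A minimal sketch of that reading, using the example's values; `resolve_reference` is a hypothetical helper, not part of any TOML implementation.

```python
import copy
from typing import Any, Dict

def resolve_reference(doc: Dict[str, Any], ref: str,
                      overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Deep-copy the table at dotted path `ref`, then apply local keys."""
    table: Any = doc
    for part in ref.split("."):
        table = table[part]
    result = copy.deepcopy(table)   # edits to the copy must not touch the source
    result.update(overrides)        # local keys win, key by key
    return result

doc = {
    "production": {
        "key": "value",
        "key2": "value2",
        "database": {"host": "123.123.123", "port": 234, "database": "prod"},
    }
}

# [staging] &production, then key2 = "othervalue2"
doc["staging"] = resolve_reference(doc, "production", {"key2": "othervalue2"})
# [staging.database] &production.database, then database = "stage"
doc["staging"]["database"] = resolve_reference(
    doc, "production.database", {"database": "stage"})
```

The second reference is needed because `update` replaces whole values: without it, `staging.database` would be replaced rather than merged.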

Idea: Inline includes?

In Python-land, it's not uncommon to include subsections of an INI file from another INI file. (Especially in Paste-style INI files.)

So I could have

# development.ini
[app]
database = "sqlite://whatevs.db"
# [... lots of other things]

And in my unit testing environment, I'd override just one key...

# testing.ini
[app]
use = config:development.ini#app
database = "sqlite:///:memory:"

TOML proposal

As for how this would look in TOML if we wanted to support it, perhaps some variation of...

# Assign to section?
[app] = development.ini#app
database = "override value"

Or perhaps...

# Non-assignments are treated as includes?
[app]
development.ini#app
database = "override value"

# Could even support multiple includes
[app]
app_defaults.ini
development.ini#app
somekey = "foo"
someotherkey = "bar"
override_keys.ini#app

Quite possibly out of scope for TOML, but an interesting thought experiment.

Anonymous hashes

Unless I missed something, there's no way to have an arbitrary sized array of hashes.

Should TOML be considered a configuration or serialization format?

Fundamentally this is a question of whether TOML documents should be treated as secure to parse or not. In practical terms: in languages like Ruby with a Symbol class which is insecure to create from arbitrary strings, should keys become Symbols or Strings?

Format and Syntax check tool

As you want there to be one way to do everything, you could consider providing a simple tool, tomlfmt, that reformats TOML files in a canonical style (like gofmt for Go) to prevent bikeshedding over minor formatting/styling issues, and perhaps later to detect syntax errors and emit friendly messages on how to fix them.

What size are floats and integers?

Can we assume IEEE-754 doubles and 64-bit signed two's complement integers? What does the spec require in terms of size for these types?
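If the answer is "64-bit signed integers and IEEE-754 doubles", a decoder needs a range check for integers, and it is worth knowing that not every 64-bit integer fits exactly in a double. A sketch; `fits_int64` is a hypothetical helper.

```python
import sys

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1

def fits_int64(n: int) -> bool:
    """True if n is representable as a 64-bit two's complement integer."""
    return INT64_MIN <= n <= INT64_MAX

print(fits_int64(INT64_MAX), fits_int64(INT64_MAX + 1))  # True False

# On IEEE-754 platforms a Python float is a binary64 double:
print(sys.float_info.mant_dig)  # 53 significand bits
# ...so integers above 2**53 can lose precision as doubles:
print(float(2 ** 53 + 1) == float(2 ** 53))  # True
```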

Allow keys with no (i.e. empty) value

For example:

[section]
mykey =

This is used, for example, in Mercurial's configuration files to indicate that a certain config key has been defined, in cases where you do not need an actual value (e.g. to enable an extension).

Extension?

What would be the standard extension to use for toml files?

Are newlines lf, crlf, or all of the above?

Proposal: use spaces instead of periods for hierarchical separation

The .gitconfig uses spaces instead of periods to specify multiple levels of hierarchy, so instead of:

[parent.child]
key = 5

It would be

[parent child]
key = 5

One benefit of this is it would allow multi-word keys (if that was desirable).

[parent "child with spaces"]
key = 5

for instance.

Ambiguities/Lack of Specification—Non Printable characters, Unicode, Doubles, Keys, Keygroups.

Is the format designed to allow non-printable characters inside it? The special characters pick a small set of ASCII control codes to escape, but ignore the other C0/Unicode control codes.

I.e., why is tab escaped, but not vertical tab? Additionally, the Unicode line breaks \u2028 (line separator) and \u2029 (paragraph separator) aren't escaped, but the ASCII CR and LF are.

Since strings are in UTF-8, you wouldn't normally expect a BOM, but the spec doesn't say whether TOML files should/must/may have one. It should also be clear about whether any Unicode normalisation should be applied to, or expected of, the strings inside or the key names.

What should a compliant parser do when it encounters an invalid utf-8 byte sequence? Does it skip the line? Should it error? Interoperability means parsers should have the same failure behaviours.
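One possible answer is "error out": decode the whole document strictly and reject it at the first invalid byte sequence, rather than skipping or replacing bytes. The sketch below assumes that policy and also assumes a leading UTF-8 BOM is tolerated and removed (Python's `utf-8-sig` codec does this); neither choice comes from the spec. `decode_document` is hypothetical.

```python
def decode_document(raw: bytes) -> str:
    """Decode a TOML document as strict UTF-8.

    Assumed policy: a leading BOM is removed ("utf-8-sig"), and any
    invalid byte sequence aborts the parse instead of being skipped
    or replaced.
    """
    try:
        return raw.decode("utf-8-sig")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        raise ValueError("document is not valid UTF-8") from exc
```

Failing the whole document gives every parser the same observable behaviour, which is the interoperability point raised above.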

Floats (doubles) don't represent the full spectrum of acceptable IEEE values: -Infinity, +Infinity and NaN are missing.

What should a compliant parser do when it encounters a float it can't represent? Similarly, there is no rounding mechanism specified for when a decimal float value doesn't fit neatly into a double.

Note: If precision is required, perhaps C99 hex-formatted floats should be allowed (in C terms, scanf("%a")).

Datetimes: You may also want to refer to RFC 3339 over ISO8601. The RFC is more explicit about handling leap seconds and what information is required in a timestamp, and demands uppercase 'Z' and 'T' in datetimes, which toml doesn't mandate.

You ask for "Full Zulu" but your example skips the fractional seconds part of an ISO datetime, which is optional in the RFC above, but it isn't clear if a compliant parser should support them.
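A sketch of the stricter RFC 3339 profile suggested above: uppercase `T` and `Z` are required, fractional seconds are optional, and an offset (`Z` or `±hh:mm`) is mandatory. The regular expression is an illustration, not TOML's grammar. It does not validate leap seconds or day-of-month ranges, and it caps fractions at six digits because Python's `%f` accepts at most six.

```python
import re
from datetime import datetime

RFC3339 = re.compile(
    r"\d{4}-\d{2}-\d{2}"        # full-date
    r"T\d{2}:\d{2}:\d{2}"       # uppercase T, then partial-time
    r"(?:\.\d{1,6})?"           # optional fractional seconds
    r"(?:Z|[+-]\d{2}:\d{2})"    # uppercase Z or numeric offset
)

def parse_rfc3339(text: str) -> datetime:
    """Validate against the profile above, then parse (Python 3.7+)."""
    if not RFC3339.fullmatch(text):
        raise ValueError(f"not an RFC 3339 date-time: {text!r}")
    fmt = "%Y-%m-%dT%H:%M:%S.%f%z" if "." in text else "%Y-%m-%dT%H:%M:%S%z"
    return datetime.strptime(text, fmt)
```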

Keygroup and key names have leading and trailing tabs/spaces trimmed, and it is implied that these names will not contain any whitespace (0x09 or 0x20) characters, but, as mentioned, Unicode whitespace (including vertical tab) is neither forbidden nor stripped. Note: Unicode defines 26 whitespace characters.

Keys and Keygroup names have a whole load of ambiguity and I have a lot of questions:

Can escape sequences appear inside of key/keygroup names?

Can a key/group have a space in the name? Is "[foo bar]" valid? is "foo bar = 1" valid?

Can key groups have []'s in their name? Is "[[foo]]" a valid keygroup name, or "[[foo]", "[foo]]" ?

How do you parse "foo = bar = 1"? is this the key "foo = bar" or assigning to two keys "foo" and "bar"? The spec doesn't say which equals sign to use when there is more than one, or if more than one is an error.

Can keygroups have empty names? i.e is [] a valid keygroup.

Can keygroups have nothing between dots? is [foo..bar] the keygroup for "foo", "", "bar".

Can keys be empty? Does "=1" parse as with the key '' and the value 1.

Can keys have the characters '[' or ']' inside them? What if you have "[foo] = blah"?

Can keys have '.' in the name? Would "foo.blah = 1" be the key 'foo.blah' or would it work like a keygroup?

Can keys be numbers or dates? Can keygroup names be numbers or dates?

What happens if you use the name 'true' or 'false' as a keygroup or a key? Does it use the string value or the implied boolean value?

Test Suite?

Do we need a test suite to test a parser's compliance level?

"Front matter"

You can parse any file as YAML if you start it the right way. Will TOML be similar?

Code back-ticks in the test files.

This is strange, and so may not be an issue. I go to mojombo/toml/tests and click on example.toml, where I can see three back-ticks at the bottom of the file. I click Raw, and they are still there in the file, making me wonder if the back-ticks are part of the file grammar.

Here's where it gets strange. I hit the back button and I'm on the update from C0mkid, when I started on the update from rossipedia. Meanwhile, the preview of example.toml doesn't have the three back-ticks, but the raw file does?

In the end, all I really wonder is whether these back-ticks are supposed to be part of the grammar, possibly signalling EOF?

Round-trippable doubles

You make a point of saying "64-bit double precision", which is dandy. But they don't round-trip in all cases, which sucks if you want to use this format for interchange instead of just config.

Most of the solutions to this suck. printf/scanf define a %a conversion for a hex representation. It's not brilliant for readability, but at least it works. Think about making it natively recognised as a number by the parser (in JSON I had to put it in a string with some extra notation so that I could tell it wasn't a string. That's poo).
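Python exposes the same C99 `%a`-style hexadecimal form through `float.hex()` and `float.fromhex()`, which makes the round-trip property easy to check. The sketch contrasts it with decimal output: 15 significant digits can lose information, while 17 significant digits (or Python's shortest round-tripping `repr`) always recover the same double.

```python
x = 0.1

# Hexadecimal (C99 %a-style) representation: exact, but hard to read.
hex_form = x.hex()
print(hex_form)                        # 0x1.999999999999ap-4
assert float.fromhex(hex_form) == x    # exact round trip

y = 0.1 + 0.2
print(f"{y:.15g}")                     # 0.3 (15 significant digits)
assert float(f"{y:.15g}") != y         # lossy
assert float(f"{y:.17g}") == y         # 17 significant digits round-trip
assert float(repr(y)) == y             # shortest round-tripping decimal
```

So decimal text can round-trip too, provided writers emit enough digits; the hex form simply makes exactness obvious.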

: instead of [,] and =

    [owner]
    name = "Tom Preston-Werner"
    organization = "GitHub"
    bio = "GitHub Cofounder & CEO\nLikes tater tots and beer."
    dob = 1979-05-27T07:32:00Z # First class dates? Why not?

why not

    owner:
    name: "Tom Preston-Werner"
    organization:  "GitHub"
    bio: "GitHub Cofounder & CEO\nLikes tater tots and beer."
    dob: 1979-05-27T07:32:00Z # First class dates? Why not?

Is a leading or trailing zero required in a float

Are .3 or 3. legal representation of a float (as they are in Python)?

"No, JSON doesn't count. You know why."

Actually, no, I don't know why. Why? (not trolling, serious)

No way to go back to global

While trying to see how I would write some of the JSON I have lying around, I noticed that one restriction TOML has is that it doesn't allow going back to the global hash.

name = "dotset"
version = "2013.02.24"

[engines]
node = ">=0.4.0"
[scripts]
test = "node js/test"
# You don't want the following key to be part of "scripts"
main = "js/set.js"

Obviously, we can move it back to the start of the document,
but that breaks the promise that you can move things around
(which is the reason we don't have [.nested-keygroup]).
Also, it means that we cannot put related elements together
unless they're both one level deep.

Proposal: use [].

name = "dotset"
version = "2013.02.24"

[engines]
node = ">=0.4.0"
[scripts]
test = "node js/test"
# You don't want the following key to be part of "scripts"
[]
main = "js/set.js"
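To make the proposal concrete, here is a toy line-based reader in which a bare `[]` header resets the current table to the root. It only understands `key = "string"` pairs, comments, and `[dotted.table]` headers; `read_config` is a hypothetical name and this is not a TOML parser.

```python
from typing import Any, Dict

def read_config(text: str) -> Dict[str, Any]:
    root: Dict[str, Any] = {}
    current = root
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1].strip()
            current = root                # "[]" stops here: back to global
            for part in filter(None, name.split(".")):
                current = current.setdefault(part, {})
            continue
        key, _, value = line.partition("=")  # split at the first "="
        current[key.strip()] = value.strip().strip('"')
    return root
```

Because every header starts from the root, `[]` needs no special case: it is simply a table path with zero parts.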

boolean

We use these kinds of files for configuration, right?

Looking through a random configuration file, I found that about half of it was booleans.

I know "yes" can be truthy, but I'd much rather have a definitive true/false than something like:

enableSymmetricUpgradableAdapter = "true"
enableNonVolatileDataStreamProtocol = "TRUE"
enableSolidStateMechanicsArchive = "yes"
enableFuzzyDataStreamVariable = "sure why not"
enableAsynchronousRigidSignal = 1
enableDigitalFloatingPointProcedure = 3.14159265359

I propose true/false because it directly translates to JSON without any coercion.

Buzzwords from some random generator.

Steal from JSON

TOML is not JSON, and TOML is necessary: JSON is not a configuration language. Humans need comments, and newlines, and maybe don't always want to type a quote mark. But that doesn't mean we can't love the shit out of JSON. Crockford's RFC for JSON is the most elegant, readable spec you'll ever see, with well-nuanced choices between simplicity and expressiveness.

More immediately: the more that is stolen from JSON, the more code we can re-use from its many fast (and safe) parsers.

Strings

Consider adopting exactly the same definition and syntax of a JSON string. That would mean these changes:

Follow JSON and require that all control characters must be escaped (U+0000 through U+001F are not allowed in raw form within a string). Note specifically: that means tabs are disallowed within a string. (They'd still be OK outside of a string in TOML)
Allow for \u encoding of characters. This is extremely important, as you can't trust text editors and browsers to faithfully preserve naked unicode (they often apply normalization).

Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: a reverse solidus, followed
by the lowercase letter u, followed by four hexadecimal digits that
encode the character's code point. The hexadecimal letters A though
F can be upper or lowercase. So, for example, a string containing
only a single reverse solidus character may be represented as
"\u005C".
To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a twelve-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E".
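A sketch of those rules as an escaper: `"`, `\` and every control character U+0000 to U+001F are escaped (the mandatory part of the proposal), and characters outside the Basic Multilingual Plane are written as a UTF-16 surrogate pair to reproduce the G clef example. Escaping non-BMP characters is optional in JSON and is done here only for illustration; `json_escape` is a hypothetical name.

```python
import json

def json_escape(s: str) -> str:
    """Escape a string body following RFC 4627's string rules."""
    short = {'"': '\\"', "\\": "\\\\", "\b": "\\b", "\f": "\\f",
             "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in s:
        cp = ord(ch)
        if ch in short:
            out.append(short[ch])
        elif cp < 0x20:
            out.append(f"\\u{cp:04X}")            # other control characters
        elif cp > 0xFFFF:
            v = cp - 0x10000                      # 20 bits to split in two
            hi = 0xD800 + (v >> 10)               # high surrogate
            lo = 0xDC00 + (v & 0x3FF)             # low surrogate
            out.append(f"\\u{hi:04X}\\u{lo:04X}")
        else:
            out.append(ch)
    return "".join(out)

# Any JSON decoder should read the escaped text back unchanged.
sample = 'tab\t"quote" \x01 \U0001D11E'
assert json.loads('"' + json_escape(sample) + '"') == sample
```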

Numbers

JSON's numbers:

A number contains an integer component that may be prefixed with
an optional minus sign, which may be followed by a fraction part
and/or an exponent part. Octal and hex forms are not allowed.
Leading zeros are not allowed. A fraction part is a decimal
point followed by one or more digits.
An exponent part begins with the letter E in upper or lowercase,
which may be followed by a plus or minus sign. The E and
optional sign are followed by one or more digits. Numeric values
that cannot be represented as sequences of digits
(such as Infinity and NaN) are not permitted.

JSON doesn't allow NaN or Infinity. This is a good idea; those are fiddly to handle.
There are no octal or hex strings. Hex strings are a pretty common need in a config file, so might be a justifiable addition. Octal is inessential if hex is provided, and so TOML should not allow octal either way. If hex is allowed, there should be syntax for hexadecimal integers, and one for hexadecimal floating-point numbers, and in both cases only what you can express in the 64-bit signed form of that quantity. You'd then also have to decide whether a hex-encoded NaN or Infinity is OK.

Encoding

JSON allows any Unicode encoding. TOML should require the entire document be in UTF-8 only.
A JSON sequence must have two octets ([] or {}), which is also a secret message to the parser about what encoding it uses. Since there's only one encoding allowed for TOML, we're free to allow a blank file to be a valid TOML file.

Encoding

The very first line in the file should specify the encoding. If absent, UTF-8 should be assumed.

-*- Shift_JIS -*-

Or something like that. Some Shift JIS characters do not have UTF-8 code points.

Simple Perl implementation

I created a simple Perl implementation, which passes the tests in the tests/ directory and has minimal external requirements.

Normalize space inside keys?

An offshoot of #65:

The current spec makes keys "space sensitive", i.e. foo bar, foo\tbar and foo  bar (actually two spaces, coalesced in the HTML output) are all different. In some cases, the first and the second are visually indistinguishable, and it is easy to overlook the extra space in the third.

It would make sense to normalize embedded white space to a single space.

`Bar  \t bar` ==> `Bar bar`
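A sketch of the proposed normalization, assuming only space and tab count as whitespace (other Unicode spaces would need to be added to the character class). `normalize_key` is hypothetical.

```python
import re

def normalize_key(key: str) -> str:
    """Trim ends, then collapse each run of spaces/tabs to one space."""
    return re.sub(r"[ \t]+", " ", key.strip(" \t"))

print(normalize_key("Bar  \t bar"))  # Bar bar
```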

[RFC] adding a Schema validation

A great feature of XML is that you can validate a given XML file against another file.
In that file you can define the schema of the document, with default values, optional values, children, data types, etc.

In some environments a schema is a killer feature; its use should be suggested as a best practice.

The schema file could be called:
TOSF - Schema File
TOSD - as XSD
TOLO - Liuggio Obvious

Support "special" numeric values such as NaN, +Inf and -Inf

This would be necessary to cover all possible double64 IEEE 754 floating point numbers.

Strings without double quotes

It would be great if you supported strings without double quotes as well. This way TOML could handle most config files well.

{brace} yourself

How 'bout braces instead of brackets for the key[groups]? That will help to differentiate between arrays and keys and increase awesome.

Where awesome correlates directly with readability and at-a-glanciness.

Allow `_` underbar separator within numbers (eg 5_000)

TOML should allow _ underbar separators within numbers: 100_000, 3.141_593, etc. Numbers may not start with an underbar. Underbars are removed from numbers before parsing.

Readability saves lives:

[nukes]
launch_confirmation_delay_ms = 30000   # thought I'd typed five minutes, oops
launch_confirmation_delay_ms = 30_000  # clearly not five minutes
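A sketch of the proposal for simple decimal integers and floats. It is slightly stricter than the text above: besides forbidding a leading underbar, it requires a digit on each side of every underbar (so `5_`, `1__0` and `1_.5` are rejected). That extra rule is an assumption. `parse_number` is hypothetical.

```python
import re

# Digits with single underbars between them; no leading/trailing "_".
_DIGITS = r"\d+(?:_\d+)*"
_NUMBER = re.compile(rf"[+-]?{_DIGITS}(?:\.{_DIGITS})?")

def parse_number(text: str):
    """Validate underbar placement, remove underbars, then convert."""
    if not _NUMBER.fullmatch(text):
        raise ValueError(f"invalid number: {text!r}")
    cleaned = text.replace("_", "")
    return float(cleaned) if "." in cleaned else int(cleaned)

print(parse_number("30_000"))     # 30000
print(parse_number("3.141_593"))  # 3.141593
```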

Symbols vs Strings

Not all languages have a string-vs-symbol distinction, but when they do, it'd be nice to have a way to distinguish the difference for both keys and values.

Redis's #new takes a hash of symbol -> value, and it would be really nice if TOML could directly generate this symbol-keyed hash without any conversions.

Redis.new(cfg)

Also, it is often desirable to have symbols as values.

Perhaps issue #65 points at a syntax for this (unquoted identifiers). Or perhaps a prefix is in order.

Because this is a configuration language and not a communication language, the fact that symbols can be created should not be a security issue.

`nil` or `null` values

(moved discussion from #11)

It seems like nil or null values must be allowed. For example,

# is this equivalent to {"foo":null,"bar":{"baz":null}} or {} ?
[foo]
[bar.baz]

in this case, it seems like it would make sense to be able to set them with the normal key = value syntax. Here are some alternatives for thought:

key = nil
key = null
key = # empty value ala bash

Use semantic versioning on the spec

Since the spec may have breaking changes, it would be nice to be able to use semantic versioning so we know which implementations will be compatible.

Allow using scientific notation for floats (i.e. allow exponents)

It should be possible to specify float values using a base-exponent such as:

-123.2835E-21

to represent -123.2835 * 10 ^ -21
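Written out, the proposed E notation is ordinary base-10 scientific notation; normalizing the example's mantissa gives the same value:

```latex
-123.2835\,\mathrm{E}{-21} = -123.2835 \times 10^{-21} = -1.232835 \times 10^{-19}
```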

are negative numbers allowed?

what about negative numbers?

-35
-1.2

I'll throw in another question just because I haven't seen it yet -- what about nil?

Implicit comma

eg.
["a" "super" "string"] ↔ ["a", "super", "string"]

↕
key = [0, 1, 2, 3, 4, 5, 6, 7, 8]

Why:

easy typing
???

Rule:
Automagically insert a comma between two values in an array. (or something that achieves the same goal)

Keys should not allow arbitrary characters: they should be `[a-zA-Z_]\w*` only

I'm surprised that keys are allowed to be arbitrary characters. What is the configuration file use case for this? Can those cases be just as well handled as a literal hash, or as an array of pairs? Allowing funny characters seems to go against the "simple as possible" ethos.

I think of keys as living in the control path, not data path, and thus predictability should win over expressiveness (no matter how fun it would be to configure ☃.♛=シ). I can't say for sure what would go wrong, but allowing nulls, vertical tab characters, funny unicode spaces and so forth sounds like an eventual security flaw. Unicode opens a lot of "there should only be one way to do anything" holes: for just one example, the strings dīáçṙïtĭč and dīáçṙïtĭč are semantically equivalent but not byte-comparable (one has combining diacritics, the other precomposed). Are two keys equal if they are character-comparable, or only if they are byte-comparable?

The spec should require keys to be identifier-like: they must start with a letter or underbar, and contain only letters, underbars or digits. That is: [a-zA-Z_]\w*.
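A sketch of the proposed key rule, plus the normalization problem it sidesteps. Note that in Python (as in several other regex engines) `\w` matches non-ASCII letters by default, so `re.ASCII` is needed to get the ASCII-only meaning intended here. `is_valid_key` is hypothetical.

```python
import re
import unicodedata

# ASCII-only identifier; re.ASCII keeps \w from matching non-ASCII letters.
KEY_RE = re.compile(r"[A-Za-z_]\w*", re.ASCII)

def is_valid_key(key: str) -> bool:
    return KEY_RE.fullmatch(key) is not None

# Precomposed vs. combining form of the same visible text.
precomposed = "\u00e9"      # é as one code point
combining = "e\u0301"       # e + COMBINING ACUTE ACCENT
print(precomposed == combining)                                # False
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
```

With ASCII-only keys, key equality is plain byte comparison and no normalization form has to be chosen.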

some templating

I'm sure this will get shot down as adding too much complexity but hear me out as this was done in Grunt and makes Gruntfiles that much more awesome and readable. It's like SASS/LESS for config.

Format: <%= parent.child %>
Inserts: The value(s) with some intelligence as to not break anything.

[database]
name = "buckeye"
server = "192.168.1.1"
ports = [ "8001", "8001", "8002" ]
connection_max = 5000
enabled = true

[database.qa]
name = "<%= database.name%> QA"
server = "<%= database.server %>"
ports = [ <%= database.ports %>, "80003", "80004" ]
connection_max = <%= database.connection_max %>
enabled = <%= database.enabled %>

[database.stage]
<%= database.qa %>
name = "<%= database.name%> Stage"
enabled = false
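A minimal sketch of the `<%= parent.child %>` substitution, done purely as text before parsing. It looks up dotted paths in an already-loaded table and writes booleans in TOML's lowercase form. It does not attempt the "intelligence" the proposal asks for with arrays or whole-table inclusion such as `<%= database.qa %>`. `render`, `lookup` and `to_text` are hypothetical names.

```python
import re
from typing import Any, Dict

PLACEHOLDER = re.compile(r"<%=\s*([\w.]+)\s*%>")

def lookup(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as "database.name"."""
    value: Any = doc
    for part in path.split("."):
        value = value[part]
    return value

def to_text(value: Any) -> str:
    # Python's str(True) is "True"; TOML spells it "true".
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)

def render(template: str, doc: Dict[str, Any]) -> str:
    """Replace each <%= dotted.path %> with the referenced scalar value."""
    return PLACEHOLDER.sub(lambda m: to_text(lookup(doc, m.group(1))), template)

db = {"database": {"name": "buckeye", "connection_max": 5000, "enabled": True}}
print(render('name = "<%= database.name%> QA"', db))  # name = "buckeye QA"
```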

Use { for arrays?

Hello,

I was pondering writing a parser for TOML (in Go; I just saw one that was created by someone else), and what struck me was the different semantics attached to [ and ]. They are used for keys and also to start/stop arrays. Wouldn't it be even easier to parse TOML if arrays were started by { and stopped with }?

Regards,
Miek

"Special Characters" is undefined.

Please formally define what you mean by "special characters".

kthnksbye

pls clarify datetime

ISO 8601 is very broad.

should datetimes in toml include durations? should it be possible to omit the time?

maybe it is useful to restrict 8601 to e.g. the same profile as the W3C:
http://www.w3.org/TR/NOTE-datetime

Trailing commas

Please let me do:

a = [
    1,
    2,
    3,
]

Otherwise my diffs look like arse.

Allow hexadecimal, binary and octal integer representations

In many areas (such as science and engineering), integer configuration values written in hexadecimal, binary or octal are very common. Having TOML support them would make the format a little more general and less web-centric.
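A sketch of what parsing could look like, assuming C/Python-style lowercase prefixes (`0x`, `0o`, `0b`) and the 64-bit signed range discussed elsewhere; both are assumptions, since the proposal names no syntax. Digits are validated explicitly because Python's `int(s, base)` would also accept signs, spaces and underscores. `parse_int` is hypothetical.

```python
_BASES = {"0x": 16, "0o": 8, "0b": 2}
_VALID = {2: "01", 8: "01234567", 10: "0123456789",
          16: "0123456789abcdefABCDEF"}

def parse_int(text: str) -> int:
    """Parse a decimal, 0x (hex), 0o (octal) or 0b (binary) integer."""
    sign, body = 1, text
    if body[:1] in ("+", "-"):
        sign = -1 if body[0] == "-" else 1
        body = body[1:]
    base = _BASES.get(body[:2], 10)
    digits = body if base == 10 else body[2:]
    if not digits or any(c not in _VALID[base] for c in digits):
        raise ValueError(f"invalid integer: {text!r}")
    value = sign * int(digits, base)
    if not -(2 ** 63) <= value <= 2 ** 63 - 1:   # 64-bit signed range
        raise ValueError(f"{text!r} does not fit in 64 bits")
    return value

print(parse_int("0xFF"), parse_int("0o755"), parse_int("0b1010"))  # 255 493 10
```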

Arrays and keys?

Is it correct to understand arrays as integer indexed keys?
If so, wouldn't the two documents below be equivalent, with only the latter conforming to the "No, you can't mix data types, that's stupid." rule?

array = [ [1,2], ["a", "b"] ]

[array]
0 = [1,2]
1 = ["a", "b"]
