tomlc99's Introduction

tomlc99

TOML in c99; v1.0 compliant.

If you are looking for a C++ library, you might try this wrapper: https://github.com/cktan/tomlcpp.

Usage

Please see the toml.h file for details. The following is a simple example that parses this config file:

[server]
	host = "www.example.com"
	port = [ 8080, 8181, 8282 ]

These are the usual steps for getting values from a file:

  1. Parse the TOML file.
  2. Traverse and locate a table in TOML.
  3. Extract values from the table.
  4. Free up allocated memory.

Below is an example of parsing the values from the example table.

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include "toml.h"

static void error(const char* msg, const char* msg1)
{
    fprintf(stderr, "ERROR: %s%s\n", msg, msg1?msg1:"");
    exit(1);
}


int main()
{
    FILE* fp;
    char errbuf[200];

    // 1. Read and parse toml file
    fp = fopen("sample.toml", "r");
    if (!fp) {
        error("cannot open sample.toml - ", strerror(errno));
    }

    toml_table_t* conf = toml_parse_file(fp, errbuf, sizeof(errbuf));
    fclose(fp);

    if (!conf) {
        error("cannot parse - ", errbuf);
    }

    // 2. Traverse to a table.
    toml_table_t* server = toml_table_in(conf, "server");
    if (!server) {
        error("missing [server]", "");
    }

    // 3. Extract values
    toml_datum_t host = toml_string_in(server, "host");
    if (!host.ok) {
        error("cannot read server.host", "");
    }

    toml_array_t* portarray = toml_array_in(server, "port");
    if (!portarray) {
        error("cannot read server.port", "");
    }

    printf("host: %s\n", host.u.s);
    printf("port: ");
    for (int i = 0; ; i++) {
        toml_datum_t port = toml_int_at(portarray, i);
        if (!port.ok) break;
        printf("%d ", (int)port.u.i);
    }
    printf("\n");

    // 4. Free memory
    free(host.u.s);
    toml_free(conf);
    return 0;
}

Accessing Table Content

TOML tables are dictionaries where lookups are done using string keys. In general, all access functions on tables are named toml_*_in(...).

In the normal case, you know the key and its content type, and retrievals can be done using one of these functions:

toml_string_in(tab, key);
toml_bool_in(tab, key);
toml_int_in(tab, key);
toml_double_in(tab, key);
toml_timestamp_in(tab, key);
toml_table_in(tab, key);
toml_array_in(tab, key);

You can also interrogate the keys in a table using an integer index:

toml_table_t* tab = toml_parse_file(...);
for (int i = 0; ; i++) {
    const char* key = toml_key_in(tab, i);
    if (!key) break;
    printf("key %d: %s\n", i, key);
}

Accessing Array Content

TOML arrays can be dereferenced using integer indices. In general, all access functions on arrays are named toml_*_at(...).

To obtain the size of an array:

int size = toml_array_nelem(arr);

To obtain the content of an array, use a valid index and call one of these functions:

toml_string_at(arr, idx);
toml_bool_at(arr, idx);
toml_int_at(arr, idx);
toml_double_at(arr, idx);
toml_timestamp_at(arr, idx);
toml_table_at(arr, idx);
toml_array_at(arr, idx);

toml_datum_t

Some toml_*_at and toml_*_in functions return a toml_datum_t structure. The ok flag in the structure indicates if the function call was successful. If so, you may proceed to read the value corresponding to the type of the content.

For example:

toml_datum_t host = toml_string_in(tab, "host");
if (host.ok) {
	printf("host: %s\n", host.u.s);
	free(host.u.s);   /* FREE applies to string and timestamp types only */
}

IMPORTANT: if the accessed value is a string or a timestamp, you must call free(datum.u.s) or free(datum.u.ts) respectively after use.

Building and installing

A normal make suffices. You can also simply include the toml.c and toml.h files in your project.

Invoking make install will install the header and library files into /usr/local/{include,lib}.

Alternatively, specify make install prefix=/a/file/path to install into /a/file/path/{include,lib}.

Testing

To test against the standard test set provided by toml-lang/toml-test:

% make
% cd test1
% bash build.sh   # do this once
% bash run.sh     # this will run the test suite

To test against the standard test set provided by iarna/toml:

% make
% cd test2
% bash build.sh   # do this once
% bash run.sh     # this will run the test suite

tomlc99's People

Contributors

arp242, bamchoh, cktan, derekcresswell, dvdx, fridfinnurm, iain-anderson, kgiszczak, miniwoffer, moorereason, mweshahy, narru19, obreitwi, ownesis, theakman2, vladh, xchgeax

tomlc99's Issues

Request - include print capability in core library

I'm using your library to parse TOML without trouble, thanks.

I notice toml_cat.c contains logic for printing a parsed TOML file. Could you add that functionality into toml.c/h so that printing to a file pointer is supported?

My app will read and write the config file, so the ability to write is needed. But you have 95% of what is needed already.

BurntSushi Tests Fail due to deprecation in go get

When running test1/build.sh, go outputs the following notice:

$ bash build.sh 
go get: installing executables with 'go get' in module mode is deprecated.
	Use 'go install pkg@version' instead.
	For more information, see https://golang.org/doc/go-get-install-deprecation
	or run 'go help get' or 'go help install'.

As expected, running run.sh after this fails:

$ bash run.sh 
run.sh: line 5: ./toml-test: No such file or directory

some BurntSushi invalid input files parse without error

The following "invalid" test input files are parsed without error by tomlc99:

datetime-malformed-no-leads.toml

no-leads = 1987-7-05T17:45:00Z

datetime-malformed-no-secs.toml

no-secs = 1987-07-05T17:45Z

datetime-malformed-no-t.toml

no-t = 1987-07-0517:45:00Z

datetime-malformed-with-milli.toml

with-milli = 1987-07-5T17:45:00.12Z

float-no-leading-zero.toml

answer = .12345
neganswer = -.12345

float-no-trailing-digits.toml

answer = 1.
neganswer = -1.
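Both float cases above come down to one rule in the TOML spec: a decimal point must have at least one digit on each side. The following is a minimal sketch of such a check, not code from tomlc99. The helper name float_dots_ok is hypothetical, and it inspects only the dots, not the rest of the literal.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of tomlc99.
 * Returns 1 if every '.' in the literal has a digit immediately before
 * and immediately after it, otherwise 0. */
static int float_dots_ok(const char* s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] != '.') continue;
        /* No digit before the dot: covers ".12345" and "-.12345". */
        if (i == 0 || !isdigit((unsigned char)s[i - 1])) return 0;
        /* No digit after the dot: covers "1." and "-1.". */
        if (!isdigit((unsigned char)s[i + 1])) return 0;
    }
    return 1;
}
```

A parser could run a check like this on the raw token before handing it to a conversion routine that is more lenient than TOML.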

MSVC W4 warnings and strndup not found

When compiling toml.c on Windows x64 with the Microsoft Visual Studio compiler at warning level 4 (/W4), there are many implicit cast warnings (int64 or int not explicitly cast to char, etc.).

Another unrelated topic: the Windows standard libraries do not seem to have a definition for strndup. It's possible I'm missing something here, though.

tomlerrors

Get line number of toml table or value in file

First off, thank you so much for making this library :)

Is there a way to get the line number of a table within a file? If not, is it a good idea to implement? I'd be happy to do the pull request if it is!

For example, if a config file has an invalid value (like there is a string value that must be one of a set number of values), someone could use this function to print out "invalid value at line X" to help find and fix their config file for their app.

toml_utf8_to_ucs() returns incorrect results

We have tomlc99 included in another project that monitors test coverage as part of its continuous integration testing. toml_ucs_to_utf8() and toml_utf8_to_ucs() were flagged as uncovered, so I thought I'd create some simple unit tests using libtap.

I cross checked multibyte values obtained from this online conversion tool. No probs with toml_ucs_to_utf8():

    /* Check boundary values against results from:
     *   http://www.ltg.ed.ac.uk/~richard/utf-8.cgi
     */

    ok (toml_ucs_to_utf8 (0x80, buf) == 2 && !memcmp (buf, "\xc2\x80", 2),
        "ucs_to_utf8: 0x80 converted to 2-char UTF8");
    ok (toml_ucs_to_utf8 (0x7ff, buf) == 2 && !memcmp (buf, "\xdf\xbf", 2),
        "ucs_to_utf8: 0x7ff converted to 2-char UTF8");

    ok (toml_ucs_to_utf8 (0x800, buf) == 3 && !memcmp (buf, "\xe0\xa0\x80", 3),
        "ucs_to_utf8: 0x800 converted to 3-char UTF8");
    ok (toml_ucs_to_utf8 (0xfffd, buf) == 3 && !memcmp (buf, "\xef\xbf\xbd", 3),
        "ucs_to_utf8: 0xfffd converted to 3-char UTF8");

    ok (toml_ucs_to_utf8 (0x10000, buf) == 4
        && !memcmp (buf, "\xf0\x90\x80\x80", 4),
        "ucs_to_utf8: 0x10000 converted to 4-char UTF8");
    ok (toml_ucs_to_utf8 (0x1fffff, buf) == 4
        && !memcmp (buf, "\xf7\xbf\xbf\xbf", 4),
        "ucs_to_utf8: 0x1fffff converted to 4-char UTF8");

    ok (toml_ucs_to_utf8 (0x200000, buf) == 5
        && !memcmp (buf, "\xf8\x88\x80\x80\x80", 5),
        "ucs_to_utf8: 0x200000 converted to 5-char UTF8");
    ok (toml_ucs_to_utf8 (0x3ffffff, buf) == 5
        && !memcmp (buf, "\xfb\xbf\xbf\xbf\xbf", 5),
        "ucs_to_utf8: 0x3ffffff converted to 5-char UTF8");

    ok (toml_ucs_to_utf8 (0x4000000, buf) == 6
        && !memcmp (buf, "\xfc\x84\x80\x80\x80\x80", 6),
        "ucs_to_utf8: 0x4000000 converted to 6-char UTF8");
    ok (toml_ucs_to_utf8 (0x7fffffff, buf) == 6
        && !memcmp (buf, "\xfd\xbf\xbf\xbf\xbf\xbf", 6),
        "ucs_to_utf8: 0x7fffffff converted to 6-char UTF8");

However, attempting to reverse these values through toml_utf8_to_ucs() failed in all multibyte cases:

    ok (toml_utf8_to_ucs ("\xc2\x80", 2, &code) == 2 && code == 0x80,
        "utf8_to_ucs: 0x80 converted from 2-char UTF8");
    ok (toml_utf8_to_ucs ("\xdf\xbf", 2, &code) == 2 && code == 0x7ff,
        "utf8_to_ucs: 0x7ff converted from 2-char UTF8");

    ok (toml_utf8_to_ucs ("\xe0\xa0\x80", 3, &code) == 3 && code == 0x800,
        "utf8_to_ucs: 0x800 converted from 3-char UTF8");
    ok (toml_utf8_to_ucs ("\xef\xbf\xbd", 3, &code) == 3 && code == 0xfffd,
        "utf8_to_ucs: 0xfffd converted from 3-char UTF8");

    ok (toml_utf8_to_ucs ("\xf0\x90\x80\x80", 4, &code) == 4 && code == 0x10000,
        "utf8_to_ucs: 0x10000 converted from 4-char UTF8");
    ok (toml_utf8_to_ucs ("\xf7\xbf\xbf\xbf", 4, &code) == 4 && code == 0x1fffff,
        "utf8_to_ucs: 0x1fffff converted from 4-char UTF8");

    ok (toml_utf8_to_ucs ("\xf8\x88\x80\x80\x80", 5, &code) == 5 && code == 0x200000,
        "utf8_to_ucs: 0x200000 converted from 5-char UTF8");
    ok (toml_utf8_to_ucs ("\xfb\xbf\xbf\xbf\xbf", 5, &code) == 5 && code == 0x3ffffff,
        "utf8_to_ucs: 0x3ffffff converted from 5-char UTF8");

    ok (toml_utf8_to_ucs ("\xfc\x84\x80\x80\x80\x80", 6, &code) == 6 && code == 0x4000000,
        "utf8_to_ucs: 0x4000000 converted from 6-char UTF8");
    ok (toml_utf8_to_ucs ("\xfd\xbf\xbf\xbf\xbf\xbf", 6, &code) == 6 && code == 0x7fffffff,
        "utf8_to_ucs: 0x7fffffff converted from 6-char UTF8");

I may just go ahead and comment out the function in our copy since we are not using it, but thought you might want to be aware of the results.
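For reference, a decoder that satisfies the round-trip tests above can be fairly small. The following is a sketch under stated assumptions, not tomlc99's implementation: the name utf8_decode_sketch and its return convention (bytes consumed, or -1 on error) are hypothetical, and it accepts the historical 5- and 6-byte forms only so that it can reverse the values used in the tests. It does not reject overlong encodings or surrogates.

```c
#include <stdint.h>

/* Hypothetical decoder, not part of tomlc99.
 * Decodes one code point from s, where len bytes are available.
 * Returns the number of bytes consumed, or -1 on malformed or
 * truncated input. */
static int utf8_decode_sketch(const char* s, int len, int64_t* out)
{
    const unsigned char* p = (const unsigned char*)s;
    int n;          /* total length of the sequence in bytes */
    int64_t code;   /* payload bits taken from the lead byte */

    if (len < 1) return -1;
    if (p[0] < 0x80)                { n = 1; code = p[0]; }
    else if ((p[0] & 0xE0) == 0xC0) { n = 2; code = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { n = 3; code = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { n = 4; code = p[0] & 0x07; }
    else if ((p[0] & 0xFC) == 0xF8) { n = 5; code = p[0] & 0x03; }
    else if ((p[0] & 0xFE) == 0xFC) { n = 6; code = p[0] & 0x01; }
    else return -1;  /* stray continuation byte, or 0xFE/0xFF */

    if (len < n) return -1;  /* sequence is cut short */
    for (int i = 1; i < n; i++) {
        if ((p[i] & 0xC0) != 0x80) return -1;  /* must be 10xxxxxx */
        code = (code << 6) | (p[i] & 0x3F);    /* append 6 payload bits */
    }
    *out = code;
    return n;
}
```

Each continuation byte contributes six bits, so the lead-byte mask alone determines both the sequence length and how many payload bits the first byte carries.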

Missing Values

Hello,

I have the following .toml file:

restart = true

with a key named restart that must have a boolean value. Also, it should be legal to omit this value from the config file, i.e. it is optional.

I use the following code to read the value:

toml_datum_t datum = toml_bool_in(root_table, "restart");
if (!datum.ok) {
    printf("Key 'restart' is either missing or not a boolean.\n");
} else {
    // do something
}

Is there a way to distinguish between a value missing in the config file or having the incorrect type? E.g. when the config file would look like this:

restart = "a string instead of a boolean"

Thanks a lot!

Add save functionality

For applications that want to save modified settings to TOML besides just reading them, it would be useful to have something along the lines of toml_dump and toml_dumps.

Type checking

Looking at the API I don't think this library actually knows what type a value in a table is. I'd like to iterate over all keys in a table and determine their type directly. This isn't possible right now, is it?

How to free string and timestamp return by this lib?

struct toml_datum_t {
  int ok;
  union {
    toml_timestamp_t *ts; /* ts must be freed after use */
    char *s;              /* string value. s must be freed after use */
    int b;                /* bool value */
    int64_t i;            /* int value */
    double d;             /* double value */
  } u;
};

It seems that s and ts should be freed by the user. But xfree is internal, and toml_free(toml_table_t *tab) is only for tables. How, then, can I free these returned values?

Also, it seems the returned values of toml_array_in and toml_table_in need not be freed?

suggestion

Are there any plans to develop an interface that writes back to the file after modification of the toml file configuration?

bad input file causes segfault

I wrote a little TAP test suite for a project that is importing tomlc99. It attempts to parse the BurntSushi invalid and valid input files. I ran across one input file that caused a segfault. It is table-array-implicit.toml and looks like this:

# This test is a bit tricky. It should fail because the first use of
# `[[albums.songs]]` without first declaring `albums` implies that `albums`
# must be a table. The alternative would be quite weird. Namely, it wouldn't
# comply with the TOML spec: "Each double-bracketed sub-table will belong to 
# the most *recently* defined table element *above* it."
#
# This is in contrast to the *valid* test, table-array-implicit where
# `[[albums.songs]]` works by itself, so long as `[[albums]]` isn't declared
# later. (Although, `[albums]` could be.)
[[albums.songs]]
name = "Glory Days"

[[albums]]
name = "Born in the USA"

Here's a backtrace from gdb

Program received signal SIGSEGV, Segmentation fault.
parse_select (ctx=0x7fffffffd120) at toml.c:1143
1143		if (arr->kind == 0) arr->kind = 't';
(gdb) bt full
#0  parse_select (ctx=0x7fffffffd120) at toml.c:1143
        arr = 0x0
        dest = <optimized out>
        count_lbracket = 2
        z = {tok = <optimized out>, lineno = 13, 
          ptr = 0x60bf5c "albums]]\nname = \"Born in the USA\"\n", len = 6, 
          eof = <optimized out>}
#1  toml_parse (
    conf=conf@entry=0x60bd10 "# This test is a bit tricky. It should fail because the first use of\n# `[[albums.songs]]` without first declaring `albums` implies that `albums`\n# must be a table. The alternative would be quite weird"..., 
    errbuf=errbuf@entry=0x7fffffffd4a0 "", errbufsz=errbufsz@entry=255)
    at toml.c:1258
        tok = {tok = <optimized out>, lineno = <optimized out>, 
          ptr = <optimized out>, len = <optimized out>, eof = 0}
        ctx = {
          start = 0x60bd10 "# This test is a bit tricky. It should fail because the first use of\n# `[[albums.songs]]` without first declaring `albums` implies that `albums`\n# must be a table. The alternative would be quite weird"..., 
          stop = 0x60bf7e "", errbuf = 0x7fffffffd4a0 "", errbufsz = 255, 
          jmp = {{__jmpbuf = {140737488343328, 7620089810268263470, 
                140737488344224, 255, 3001, 6339856, -7620090080383921106, 
                7620089533462025262}, __mask_was_saved = 0, __saved_mask = {
                __val = {0 <repeats 16 times>}}}}, tok = {tok = RBRACKET, 
            lineno = 13, ptr = 0x60bf62 "]]\nname = \"Born in the USA\"\n", 
            len = 1, eof = 0}, root = 0x60a010, curtab = 0x60a010, tpath = {
            top = 0, key = {0x612790 "", 0x6127b0 "P\242`", 0x0, 0x0, 0x0, 
              0x0, 0x0, 0x0, 0x0, 0x0}, tok = {{tok = STRING, lineno = 13, 
                ptr = 0x60bf5c "albums]]\nname = \"Born in the USA\"\n", 
                len = 6, eof = 0}, {tok = STRING, lineno = 10, 
                ptr = 0x60bf3d "songs]]\nname = \"Glory Days\"\n\n[[albums]]\nname = \"Born in the USA\"\n", len = 5, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}}}}
#2  0x0000000000404b4f in toml_parse_file (fp=fp@entry=0x60ae80, 
    errbuf=errbuf@entry=0x7fffffffd4a0 "", errbufsz=errbufsz@entry=255)
    at toml.c:1316
        bufsz = 2000
        buf = 0x60bd10 "# This test is a bit tricky. It should fail because the first use of\n# `[[albums.songs]]` without first declaring `albums` implies that `albums`\n# must be a table. The alternative would be quite weird"...
        off = 622
        ret = <optimized out>
#3  0x0000000000401698 in parse_bad_file (path=<optimized out>, 
    errbuf=0x7fffffffd4a0 "", errsize=255) at toml_tap.c:99
        fp = 0x60ae80
        conf = 0x0
#4  0x000000000040185a in parse_bad_input () at toml_tap.c:156
        errbuf = "\000ine 1: unterminated quote\000key", '\000' <repeats 224 times>
        name = 0x60a06b "table-array-implicit.toml"
        reason = 0x406702 "parses"
        blacklisted = <optimized out>
        pattern = "./BurntSushi_input/invalid/*.toml", '\000' <repeats 1127 times>...
        results = {gl_pathc = 40, gl_pathv = 0x60b170, gl_offs = 0, 
          gl_flags = 256, gl_closedir = 0x0, gl_readdir = 0x0, 
          gl_opendir = 0x0, gl_lstat = 0x0, gl_stat = 0x0}
        i = <optimized out>
#5  0x0000000000401211 in main (argc=<optimized out>, argv=<optimized out>)
    at toml_tap.c:213
No locals.
(gdb) 

TOML encode with TOMLC99

Hi.
I was looking for tools to use TOML in C and found this library.
It seems that it can read TOML. Can it also write TOML, and if so, how is that done in C?

Better support for nostdlib (no Glibc) environments

Thank you @cktan for the great TOML library!

We would like to use your TOML parser to parse configuration files in the Graphene framework (https://github.com/oscarlab/graphene). However, Graphene is a very low-level Linux emulation framework and thus sits below the Standard C Library (Glibc/Musl) and is built with -nostdlib .

Graphene provides some utility functions like malloc/calloc/memcpy/free/strcmp and so on (basically a small subset of memory management and string utilities). However, your TOML implementation currently relies on:

  1. ctype.h functions: toupper(), isalnum(), isdigit(). These can be easily replaced with actual implementations.
  2. strtoll() -- on 64-bit systems, this is just the alias to strtol(), easy via macros.
  3. realloc() -- Graphene doesn't have realloc (complicated reasons...), but your TOML code seems to use the same simple extend-memory pattern, so easy to replace with malloc + memcpy + free.
  4. setjmp()/longjmp() -- Graphene doesn't have these functions. These are the hardest ones to change; it took me significant modifications of your code to revert to the "classical C" style of propagating error codes to the caller.

I forked your TOML repo and introduced the changes described above: https://github.com/dimakuv/tomlc99/commits/master.

With these changes, your TOML library becomes very self-contained and requires only a handful of standard functions. You can review the changes via my link above.

I wonder if this is something you'd be willing to merge? We in the Graphene community would be happy to download-and-use the master of your repository (maybe with minimal patching). But right now the changes are significant. I could prepare a sequence of PRs for merge if you think my changes are reasonable.
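To illustrate item 3 in the list above, the malloc + memcpy + free pattern can be sketched as follows. This is an assumption about the general shape of such a replacement, not the code from the fork. The name grow_buffer is hypothetical, and unlike realloc() the caller has to pass the old size explicitly.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for realloc() in an environment that offers only
 * malloc, memcpy and free.
 * Returns the new buffer, or NULL on allocation failure, in which case
 * the old buffer is left untouched (as with realloc). */
static void* grow_buffer(void* old, size_t old_size, size_t new_size)
{
    void* p = malloc(new_size);
    if (!p) return NULL;
    if (old) {
        /* Copy whichever is smaller so shrinking is safe as well. */
        memcpy(p, old, old_size < new_size ? old_size : new_size);
        free(old);
    }
    return p;
}
```

The cost is one extra copy per growth step, which is acceptable when buffers grow rarely or geometrically.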

Getting strings prints quotation marks

I've got something like this :

toml_array_t* profileNames = toml_array_in(table, "profileNames");
const char* nameOne = toml_raw_at(profileNames, 0);

Where profile names is :

profileNames = [
    "personal",
    "work"
]

In this case when I print out nameOne I get "personal" when I would expect personal. It seems like when getting a string literal it should not include the first and last quotation marks.

Re-defined strdup in toml.c

Hi!

I've included this library in a project, and when compiling on a somewhat outdated OS the build shows the following:

make[2]:
../src/config/toml.c:73: warning: "strdup" redefined
   73 | #define strdup(x) error:do-not-use---use-STRDUP-instead
      | 
In file included from /usr/include/string.h:630,
                 from ../src/config/toml.c:39:
/usr/include/bits/string2.h:1291: note: this is the location of the previous definition
 1291 | #   define strdup(s) __strdup (s)

Since it seems this is a system header, perhaps this solution would be a good workaround:

#undef strdup
#define strdup(x)	error:do-not-use---use-STRDUP-instead
static char* STRDUP(const char* s)
{
	int len = strlen(s);
	char* p = MALLOC(len+1);
	if (p) {
		memcpy(p, s, len);
		p[len] = 0;
	}
	return p;
}

#undef strndup
#define strndup(x)	error:do-not-use---use-STRNDUP-instead
static char* STRNDUP(const char* s, size_t n)
{
	size_t len = strnlen(s, n);
	char* p = MALLOC(len+1);
	if (p) {
		memcpy(p, s, len);
		p[len] = 0;
	}
	return p;
}

int64_t not found

Building with clang on amd64, I had to include stdint.h before including toml.h. int64_t is not a built-in typedef.

clang --std=gnu99 does not change this.

toml 0.5.0 support

A newer version of the TOML spec (0.5.0) has been released. Do you intend to update tomlc99 at some point?

malloc-free

Hi,

Would you be OK with wrapping all use of malloc into a macro which can be externally defined?
That would be the quickest way of using this lib in a non-malloc environment.
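A minimal sketch of what such a macro wrapper might look like is shown below. The macro names TOML_MALLOC and TOML_FREE and the helper copy_string are assumptions made for illustration; they are not names confirmed by the library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical override points. A build can define these before this
 * code is compiled (for example with -DTOML_MALLOC=my_alloc) to route
 * every allocation through its own allocator. */
#ifndef TOML_MALLOC
#define TOML_MALLOC(n) malloc(n)
#endif
#ifndef TOML_FREE
#define TOML_FREE(p) free(p)
#endif

/* Example of library-side code that allocates only through the macros. */
static char* copy_string(const char* s)
{
    size_t len = strlen(s);
    char* p = TOML_MALLOC(len + 1);
    if (p) memcpy(p, s, len + 1);   /* copy including the terminator */
    return p;
}
```

The #ifndef guards keep the default behaviour unchanged for users who do not define anything.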

toml encoding?

Is there any plan to add TOML encoding?
This appears to be the most up-to-date library, and I'd love to continue using it :D

toml_datum_t distinguish between non-present keys and invalid conversions

Hi!
I updated the library in my project, which was using toml_raw_t. I noticed that it is now deprecated in favor of the toml_datum_t interface.

However, in my case I assume missing keys in the configuration are fine (the configuration file does not need to be complete), but type conversion errors are reported to the user. When using toml_datum_t, I cannot distinguish between a missing key and a type conversion problem. Is there a canonical way to check if a key is present in a table with the new interface, or do I have to resort to using toml_raw_in(..) != NULL?

Thanks in advance!
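The distinction asked about here is a three-way result rather than a single ok flag. The sketch below shows that idea in isolation, assuming a raw lookup that yields the unparsed value text, or NULL when the key is absent (the toml_raw_in(..) != NULL check mentioned above relies on the same assumption). The names opt_status and classify_bool are hypothetical and not part of tomlc99.

```c
#include <string.h>

/* Hypothetical three-way result for an optional boolean key. */
typedef enum { OPT_MISSING, OPT_WRONG_TYPE, OPT_OK } opt_status;

/* raw is the unparsed text of the value, or NULL if the key is absent.
 * On OPT_OK, *out receives 1 for true and 0 for false. */
static opt_status classify_bool(const char* raw, int* out)
{
    if (!raw) return OPT_MISSING;                      /* key not present */
    if (strcmp(raw, "true") == 0)  { *out = 1; return OPT_OK; }
    if (strcmp(raw, "false") == 0) { *out = 0; return OPT_OK; }
    return OPT_WRONG_TYPE;                             /* present, not a bool */
}
```

With this shape, an application can silently apply a default on OPT_MISSING and report an error to the user only on OPT_WRONG_TYPE.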

toml_parse_file including outside quotes in value

When parsing a file like this:

[acronym]
aa = "acronym add"
ae = "acronym edit"

[git]
ga = "git add -A"
gC = "git commit"
gc = "git commit -m"

The values in the tables acronym and git still have the outer double quotes. Is there something I'm missing when parsing files? The following excerpt is the code I'm running.

FILE *fp = fopen(toml_fname, "r");
if (!fp)
    return 0;
char errbuf[128];
toml_table_t *t = toml_parse_file(fp, errbuf, sizeof(errbuf));

toml-spec-tests/values/qa-array-inline-nested-1000 and qa-table-inline-nested-1000 are broken on master

tomlc99 at cdbb9de. toml-spec-tests at 64effd796

make; cd test2; bash build.sh; bash run.sh|grep -v OK
cc -std=c99 -Wall -Wextra -fpic -O2 -DNDEBUG   -c -o toml.o toml.c
ar -rcs libtoml.a toml.o
cc -shared -o libtoml.so toml.o
cc -std=c99 -Wall -Wextra -fpic -O2 -DNDEBUG    toml_json.c libtoml.a   -o toml_json
cc -std=c99 -Wall -Wextra -fpic -O2 -DNDEBUG    toml_cat.c libtoml.a   -o toml_cat
Cloning into 'toml-spec-tests'...
remote: Enumerating objects: 119, done.
remote: Counting objects: 100% (119/119), done.
remote: Compressing objects: 100% (102/102), done.
remote: Total 402 (delta 17), reused 112 (delta 11), pack-reused 283
Receiving objects: 100% (402/402), 98.65 KiB | 0 bytes/s, done.
Resolving deltas: 100% (37/37), done.
toml-spec-tests/values/qa-array-inline-nested-1000  ... [FAILED]
toml-spec-tests/values/qa-table-inline-nested-1000  ... [FAILED]

toml_cat hang

This input hangs toml_cat:

thevoid = [[[[[M]]]]

To reproduce:

  • Create file with above text
  • Run toml_cat FILENAME

Hangs indefinitely

Is this intended behavior or a bug?

Value of “0000000” isn’t caught as invalid syntax.

The spec says “Leading zeros are not allowed.” But if I create a file fg.toml:

SSL_DEFAULT_KEY_TYPE='system'
STARTDATE=0000000000

… and feed it to toml_cat, I get:

> ./toml_cat fg.toml
{
  SSL_DEFAULT_KEY_TYPE = "system",
ERROR: unable to decode value in table
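The leading-zero rule quoted above can be checked on the raw token before conversion. The following is a minimal sketch, not code from tomlc99. The helper name has_leading_zero is hypothetical, and it is meant for integer and float literals only, since a local time such as 07:32:00 legitimately starts with a zero.

```c
#include <ctype.h>

/* Hypothetical helper, not part of tomlc99.
 * Returns 1 if the integer part of a numeric literal starts with a
 * forbidden leading zero, otherwise 0. A lone "0", a prefixed form such
 * as "0x1F", and a float such as "0.5" are all accepted. */
static int has_leading_zero(const char* s)
{
    if (*s == '+' || *s == '-') s++;   /* skip an optional sign */
    return s[0] == '0' && (isdigit((unsigned char)s[1]) || s[1] == '_');
}
```

With a check like this, a value such as 0000000000 could be rejected with a specific message instead of a generic decoding error.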

Mixed-type arrays not supported

First of all, thanks for this great library. I'm writing some Kotlin Native bindings for it and so far it's working really well.

I just had a minor issue when writing a test for mixed-type arrays. The library produces an "array type mismatch while processing array of values" error.

According to the array spec, mixed-type arrays are allowed:

# Mixed-type arrays are allowed
numbers = [ 0.1, 0.2, 0.5, 1, 2, 5 ]
contributors = [
  "Foo Bar <foo@example.com>",
  { name = "Baz Qux", email = "bazqux@example.com", url = "https://example.com/bazqux" }
]

This has no impact on my application, as tables (or arrays of tables) work just fine, but it would be nice for supporting 100% of the spec.

Crash with malformed input

Hello,

The following input file (as produced by AFL) causes a NULL pointer dereference in libtomlc99:

[0]
0.0=[]
+.

For example, with toml_cat:

% ./toml_cat ~/fuzzing/sessions/tomlc99/crash.toml                                                                                                                          [*][master]
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1218247==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7efe0688e07b bp 0x7ffdcd28b3b0 sp 0x7ffdcd28ab40 T0)
==1218247==The signal is caused by a READ memory access.
==1218247==Hint: address points to the zero page.
    #0 0x7efe0688e07b in __interceptor_strcmp ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:442
    #1 0x5581a2f29478 in toml_table_in /home/kali/fuzzing/tomlc99/toml.c:1772
    #2 0x5581a2f2c2bf in parse_keyval /home/kali/fuzzing/tomlc99/toml.c:1066
    #3 0x5581a2f2d5ec in toml_parse /home/kali/fuzzing/tomlc99/toml.c:1378
    #4 0x5581a2f2f13d in toml_parse_file /home/kali/fuzzing/tomlc99/toml.c:1459
    #5 0x5581a2f24a63 in cat /home/kali/fuzzing/tomlc99/toml_cat.c:158
    #6 0x5581a2f24417 in main /home/kali/fuzzing/tomlc99/toml_cat.c:188
    #7 0x7efe06666d09 in __libc_start_main ../csu/libc-start.c:308
    #8 0x5581a2f24579 in _start (/home/kali/fuzzing/tomlc99/toml_cat+0x4579)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:442 in __interceptor_strcmp
==1218247==ABORTING

or without ASAN in GDB:

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x940 ('@\t')
RBX: 0x55555555f960 --> 0x55555555f980 --> 0x55555555f940 --> 0x30 ('0')
RCX: 0x0
RDX: 0x0
RSI: 0x55555555f940 --> 0x30 ('0')
RDI: 0x0
RBP: 0x0
RSP: 0x7fffffffe358 --> 0x5555555573e0 (<toml_table_in+64>:     test   eax,eax)
RIP: 0x7ffff7f3f00a (<__strcmp_avx2+26>:        vmovdqu ymm1,YMMWORD PTR [rdi])
R8 : 0x0
R9 : 0x7fffffffe190 --> 0x0
R10: 0xfffffffffffff287
R11: 0x7ffff7f3eff0 (<__strcmp_avx2>:   mov    eax,edi)
R12: 0x55555555f980 --> 0x55555555f940 --> 0x30 ('0')
R13: 0x55555555f968 --> 0x0
R14: 0x55555555e48b --> 0xa2e2b ('+.\n')
R15: 0x1
EFLAGS: 0x10287 (CARRY PARITY adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x7ffff7f3effa <__strcmp_avx2+10>:   and    eax,0xfff
   0x7ffff7f3efff <__strcmp_avx2+15>:   cmp    eax,0xf80
   0x7ffff7f3f004 <__strcmp_avx2+20>:   jg     0x7ffff7f3f360 <__strcmp_avx2+880>
=> 0x7ffff7f3f00a <__strcmp_avx2+26>:   vmovdqu ymm1,YMMWORD PTR [rdi]
   0x7ffff7f3f00e <__strcmp_avx2+30>:   vpcmpeqb ymm0,ymm1,YMMWORD PTR [rsi]
   0x7ffff7f3f012 <__strcmp_avx2+34>:   vpminub ymm0,ymm0,ymm1
   0x7ffff7f3f016 <__strcmp_avx2+38>:   vpcmpeqb ymm0,ymm0,ymm7
   0x7ffff7f3f01a <__strcmp_avx2+42>:   vpmovmskb ecx,ymm0
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe358 --> 0x5555555573e0 (<toml_table_in+64>:    test   eax,eax)
0008| 0x7fffffffe360 --> 0x7fffffffe430 --> 0x55555555e480 ("[0]\n0.0=[]\n+.\n")
0016| 0x7fffffffe368 --> 0x7fffffffe430 --> 0x55555555e480 ("[0]\n0.0=[]\n+.\n")
0024| 0x7fffffffe370 --> 0x7fffffffe430 --> 0x55555555e480 ("[0]\n0.0=[]\n+.\n")
0032| 0x7fffffffe378 --> 0x55555555f900 --> 0x55555555f8c0 --> 0x30 ('0')
0040| 0x7fffffffe380 --> 0x3
0048| 0x7fffffffe388 --> 0x555555558299 (<parse_keyval+217>:    mov    r8,QWORD PTR [rsp+0x8])
0056| 0x7fffffffe390 --> 0x55555555f8c0 --> 0x30 ('0')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
__strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:101

This can be prevented by checking if key == NULL in toml_table_in() before attempting to dereference it.
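The guard suggested above can be sketched as a small comparison helper. This is an illustration of the idea only, not the actual body of toml_table_in(); the name key_matches is hypothetical.

```c
#include <string.h>

/* Hypothetical helper, not part of tomlc99.
 * Compares two keys for equality, treating a NULL on either side as
 * "no match" instead of passing it to strcmp(). */
static int key_matches(const char* stored, const char* wanted)
{
    if (!stored || !wanted) return 0;
    return strcmp(stored, wanted) == 0;
}
```

A lookup built on such a helper returns "not found" for a NULL key, which the parser can then report as an ordinary syntax error.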

AUR Package

Not an issue per se; I just wanted to let you know that I've packaged tomlc99 and submitted it to the AUR. The package name is tomlc99-git.

Thanks for making such a simple, useful library!

Makefile install prefix missing for libtoml.pc

It seems the install command for the $(PCFILE) is missing the prefix variable. Is this normal?

install: all
    install -d ${prefix}/include ${prefix}/lib
    install toml.h ${prefix}/include
    install $(LIB) ${prefix}/lib
    install $(LIB_SHARED) ${prefix}/lib
    install $(PCFILE) /usr/lib/pkgconfig # ${prefix} missing here

RFE: slow to parse large TOML files

https://gist.githubusercontent.com/feeeper/2197d6d734729625a037af1df14cf2aa/raw/2f22b120e476d897179be3c1e2483d18067aa7df/config.toml

^^ When I try to parse that file with this library, it takes quite a while … the pure-Perl parser TOML::Tiny is actually noticeably faster.

I dug in a bit and see that it’s doing a lot of strchr …

~/code/tomlc99 $ google-pprof --text a.out ./prof.out
Using local file a.out.
Using local file ./prof.out.
Total: 1154 samples
    1120  97.1%  97.1%     1120  97.1% strchr
      12   1.0%  98.1%       12   1.0% memcpy
       7   0.6%  98.7%        7   0.6% next_token
       2   0.2%  98.9%        2   0.2% 0xffff0fe0
       2   0.2%  99.0%        2   0.2% scan_string
       2   0.2%  99.2%        2   0.2% scan_time
       1   0.1%  99.3%        1   0.1% __libc_malloc
       1   0.1%  99.4%        1   0.1% check_key
       1   0.1%  99.5%        1   0.1% normalize_key
       1   0.1%  99.6%        1   0.1% operator delete[]
       1   0.1%  99.7%        1   0.1% parse_select
       1   0.1%  99.7%        1   0.1% set_token
       1   0.1%  99.8%        1   0.1% tcmalloc::CentralFreeList::Populate
       1   0.1%  99.9%        2   0.2% tcmalloc::CentralFreeList::RemoveRange
       1   0.1% 100.0%        1   0.1% toml_array_in
       0   0.0% 100.0%        1   0.1% tcmalloc::CentralFreeList::FetchFromOneSpansSafe
       0   0.0% 100.0%        2   0.2% tcmalloc::ThreadCache::FetchFromCentralCache

Does that maybe offer any hint of how this might be improved?

tomlc99 is much faster than TOML::Tiny for small & medium TOML files; I’d like to see if it can be faster for large ones, too.

Thank you!

Why if(x) in xfree?

free is a no-op if given a NULL pointer. Why is if(x) in xfree? It seems unnecessary.
