cktan / tomlc99

TOML C library

License: Other

Makefile 1.23% Shell 1.78% C 92.63% ReScript 4.36%

toml toml-parser c

tomlc99's Issues

Mixed-type arrays not supported

First of all, thanks for this great library. I'm writing some Kotlin Native bindings for it and so far it's working really well.

I just had a minor issue when writing a test for mixed-type arrays. The library produces an "array type mismatch while processing array of values" error.

According to the array spec, mixed-type arrays are allowed:

# Mixed-type arrays are allowed
numbers = [ 0.1, 0.2, 0.5, 1, 2, 5 ]
contributors = [
  "Foo Bar <foo@example.com>",
  { name = "Baz Qux", email = "bazqux@example.com", url = "https://example.com/bazqux" }
]

This has no impact on my application, as tables (or arrays of tables) work just fine, but it would be nice to support 100% of the spec.

BurntSushi Tests Fail due to deprecation in go get

When running test1/build.sh, go outputs the following notice:

$ bash build.sh 
go get: installing executables with 'go get' in module mode is deprecated.
	Use 'go install pkg@version' instead.
	For more information, see https://golang.org/doc/go-get-install-deprecation
	or run 'go help get' or 'go help install'.

As expected, running run.sh after this fails:

$ bash run.sh 
run.sh: line 5: ./toml-test: No such file or directory

Add save functionality

For applications that want to save modified settings to TOML besides just reading them, it would be useful to have something along the lines of toml_dump and toml_dumps.

toml_cat hang

This input hangs toml_cat:

thevoid = [[[[[M]]]]

To reproduce:

Create file with above text
Run toml_cat FILENAME

Hangs indefinitely

Is this intended behavior or a bug?

AUR Package

Not an issue per se, I just wanted to let you know that I've packaged tomlc99 and submitted it to the AUR. The package name is tomlc99-git.

Thanks for making such a simple, useful library!

bad input file causes segfault

I wrote a little TAP test suite for a project that is importing tomlc99. It attempts to parse the BurntSushi invalid and valid input files. I ran across one input file that caused a segfault. It is table-array-implicit.toml and looks like this:

# This test is a bit tricky. It should fail because the first use of
# `[[albums.songs]]` without first declaring `albums` implies that `albums`
# must be a table. The alternative would be quite weird. Namely, it wouldn't
# comply with the TOML spec: "Each double-bracketed sub-table will belong to 
# the most *recently* defined table element *above* it."
#
# This is in contrast to the *valid* test, table-array-implicit where
# `[[albums.songs]]` works by itself, so long as `[[albums]]` isn't declared
# later. (Although, `[albums]` could be.)
[[albums.songs]]
name = "Glory Days"

[[albums]]
name = "Born in the USA"

Here's a backtrace from gdb

Program received signal SIGSEGV, Segmentation fault.
parse_select (ctx=0x7fffffffd120) at toml.c:1143
1143		if (arr->kind == 0) arr->kind = 't';
(gdb) bt full
#0  parse_select (ctx=0x7fffffffd120) at toml.c:1143
        arr = 0x0
        dest = <optimized out>
        count_lbracket = 2
        z = {tok = <optimized out>, lineno = 13, 
          ptr = 0x60bf5c "albums]]\nname = \"Born in the USA\"\n", len = 6, 
          eof = <optimized out>}
#1  toml_parse (
    conf=conf@entry=0x60bd10 "# This test is a bit tricky. It should fail because the first use of\n# `[[albums.songs]]` without first declaring `albums` implies that `albums`\n# must be a table. The alternative would be quite weird"..., 
    errbuf=errbuf@entry=0x7fffffffd4a0 "", errbufsz=errbufsz@entry=255)
    at toml.c:1258
        tok = {tok = <optimized out>, lineno = <optimized out>, 
          ptr = <optimized out>, len = <optimized out>, eof = 0}
        ctx = {
          start = 0x60bd10 "# This test is a bit tricky. It should fail because the first use of\n# `[[albums.songs]]` without first declaring `albums` implies that `albums`\n# must be a table. The alternative would be quite weird"..., 
          stop = 0x60bf7e "", errbuf = 0x7fffffffd4a0 "", errbufsz = 255, 
          jmp = {{__jmpbuf = {140737488343328, 7620089810268263470, 
                140737488344224, 255, 3001, 6339856, -7620090080383921106, 
                7620089533462025262}, __mask_was_saved = 0, __saved_mask = {
                __val = {0 <repeats 16 times>}}}}, tok = {tok = RBRACKET, 
            lineno = 13, ptr = 0x60bf62 "]]\nname = \"Born in the USA\"\n", 
            len = 1, eof = 0}, root = 0x60a010, curtab = 0x60a010, tpath = {
            top = 0, key = {0x612790 "", 0x6127b0 "P\242`", 0x0, 0x0, 0x0, 
              0x0, 0x0, 0x0, 0x0, 0x0}, tok = {{tok = STRING, lineno = 13, 
                ptr = 0x60bf5c "albums]]\nname = \"Born in the USA\"\n", 
                len = 6, eof = 0}, {tok = STRING, lineno = 10, 
                ptr = 0x60bf3d "songs]]\nname = \"Glory Days\"\n\n[[albums]]\nname = \"Born in the USA\"\n", len = 5, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}, {tok = INVALID, lineno = 0, 
                ptr = 0x0, len = 0, eof = 0}}}}
#2  0x0000000000404b4f in toml_parse_file (fp=fp@entry=0x60ae80, 
    errbuf=errbuf@entry=0x7fffffffd4a0 "", errbufsz=errbufsz@entry=255)
    at toml.c:1316
        bufsz = 2000
        buf = 0x60bd10 "# This test is a bit tricky. It should fail because the first use of\n# `[[albums.songs]]` without first declaring `albums` implies that `albums`\n# must be a table. The alternative would be quite weird"...
        off = 622
        ret = <optimized out>
#3  0x0000000000401698 in parse_bad_file (path=<optimized out>, 
    errbuf=0x7fffffffd4a0 "", errsize=255) at toml_tap.c:99
        fp = 0x60ae80
        conf = 0x0
#4  0x000000000040185a in parse_bad_input () at toml_tap.c:156
        errbuf = "\000ine 1: unterminated quote\000key", '\000' <repeats 224 times>
        name = 0x60a06b "table-array-implicit.toml"
        reason = 0x406702 "parses"
        blacklisted = <optimized out>
        pattern = "./BurntSushi_input/invalid/*.toml", '\000' <repeats 1127 times>...
        results = {gl_pathc = 40, gl_pathv = 0x60b170, gl_offs = 0, 
          gl_flags = 256, gl_closedir = 0x0, gl_readdir = 0x0, 
          gl_opendir = 0x0, gl_lstat = 0x0, gl_stat = 0x0}
        i = <optimized out>
#5  0x0000000000401211 in main (argc=<optimized out>, argv=<optimized out>)
    at toml_tap.c:213
No locals.
(gdb)

malloc-free

Hi,

would you be Ok with wrapping all use of malloc into a macro which can be externally defined?
That would be the quickest way of using this lib in a non-malloc environment.
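
A minimal sketch of what such a hook could look like follows. The macro names `TOML_MALLOC` and `TOML_FREE` and the helper `dup_bytes` are hypothetical, chosen only for illustration; they are not taken from tomlc99.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical hook names (not tomlc99's own): a build can predefine
 * these, e.g. -DTOML_MALLOC=my_alloc, to route allocations elsewhere. */
#ifndef TOML_MALLOC
#define TOML_MALLOC(n) malloc(n)
#endif
#ifndef TOML_FREE
#define TOML_FREE(p) free(p)
#endif

/* Copy len bytes of src into a fresh NUL-terminated buffer obtained
 * through the hook. Returns NULL if the allocation fails. */
static char *dup_bytes(const char *src, size_t len)
{
    char *p = TOML_MALLOC(len + 1);
    if (p) {
        memcpy(p, src, len);
        p[len] = '\0';
    }
    return p;
}
```

Every allocation site in the library would then go through the two macros, so a non-malloc environment only has to supply its own pair.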

Get line number of toml table or value in file

First off, thank you so much for making this library :)

Is there a way to get the line number of a table within a file? If not, is it a good idea to implement? I'd be happy to do the pull request if it is!

For example, if a config file has an invalid value (like there is a string value that must be one of a set number of values), someone could use this function to print out "invalid value at line X" to help find and fix their config file for their app.
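
As a rough illustration of the idea, the sketch below maps a byte offset in the config text to a line number. It assumes the caller still holds the original buffer and somehow knows the offset of the table or value; whether tomlc99 can report such an offset is not stated here, and `line_of_offset` is a hypothetical name.

```c
#include <stddef.h>

/* Return the 1-based line number of byte offset `off` within `buf`
 * (of length `len`), counting '\n' as the line terminator.
 * Offsets past the end are clamped to the end of the buffer. */
static int line_of_offset(const char *buf, size_t len, size_t off)
{
    int line = 1;
    if (off > len) off = len;
    for (size_t i = 0; i < off; i++) {
        if (buf[i] == '\n') line++;
    }
    return line;
}
```

An application could pass the result straight into a message such as "invalid value at line X".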

toml_parse_file including outside quotes in value

When parsing a file like this:

[acronym]
aa = "acronym add"
ae = "acronym edit"

[git]
ga = "git add -A"
gC = "git commit"
gc = "git commit -m"

The values in the tables acronym and git still have the outer double quotes. Is there something I'm missing when parsing files? The following excerpt is the code I'm running.

FILE *fp = fopen(toml_fname, "r");
if (!fp)
    return 0;
char errbuf[128];
toml_table_t *t = toml_parse_file(fp, errbuf, sizeof(errbuf));

Type checking

Looking at the API I don't think this library actually knows what type a value in a table is. I'd like to iterate over all keys in a table and determine their type directly. This isn't possible right now, is it?

toml 0.5.0 support

A newer version of the TOML spec (0.5.0) has been released. Do you intend to update tomlc99 at some point?

Request - include print capability in core library

I'm using your library to parse TOML without trouble, thanks.

I notice toml_cat.c contains logic for printing a parsed TOML file. Could you add that functionality into toml.c/h so that printing to a file pointer is supported?

My app will read and write the config file, so the ability to write is needed. But you have 95% of what is needed already.

some BurntSushi invalid input files parse without error

The following "invalid" test input files are parsed without error from tomlc99:

datetime-malformed-no-leads.toml

no-leads = 1987-7-05T17:45:00Z

datetime-malformed-no-secs.toml

no-secs = 1987-07-05T17:45Z

datetime-malformed-no-t.toml

no-t = 1987-07-0517:45:00Z

datetime-malformed-with-milli.toml

with-milli = 1987-07-5T17:45:00.12Z

float-no-leading-zero.toml

answer = .12345
neganswer = -.12345

float-no-trailing-digits.toml

answer = 1.
neganswer = -1.
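
The two float cases share one rule from the TOML spec: a decimal point needs at least one digit on each side. The sketch below checks only that rule; `dot_has_digits_both_sides` is a hypothetical helper, not part of tomlc99, and it is not a full float validator.

```c
#include <ctype.h>

/* Return 1 if every '.' in s has a digit immediately before and
 * immediately after it (or if s contains no '.'), else 0. */
static int dot_has_digits_both_sides(const char *s)
{
    for (const char *p = s; *p; p++) {
        if (*p != '.') continue;
        if (p == s || !isdigit((unsigned char)p[-1])) return 0;
        if (!isdigit((unsigned char)p[1])) return 0;
    }
    return 1;
}
```

With this check, `.12345`, `-.12345`, `1.` and `-1.` are all rejected, while `3.14` passes.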

How to free string and timestamp return by this lib?

struct toml_datum_t {
  int ok;
  union {
    toml_timestamp_t *ts; /* ts must be freed after use */
    char *s;              /* string value. s must be freed after use */
    int b;                /* bool value */
    int64_t i;            /* int value */
    double d;             /* double value */
  } u;
};

It seems that s and ts should be freed by the user. But xfree is internal, toml_free(toml_table_t *tab) only for table. Then how can I free these returned value?

Also, it seems the values returned by toml_array_in and toml_table_in do not need to be freed?

toml.c and toml.h contain a mix of tabs and spaces

Unfortunately, this makes the files harder to view.

TOML encode with TOMLC99

Hi.
I was looking for tools to use TOML in C and found something like this.
It seems that this library can read TOML. Is there also a way to write TOML from C?

Re-defined strdup in toml.c

Hi!

I've included this library in a project and when trying to compile in a somewhat outdated OS it shows the following:

make[2]:
../src/config/toml.c:73: warning: "strdup" redefined
   73 | #define strdup(x) error:do-not-use---use-STRDUP-instead
      | 
In file included from /usr/include/string.h:630,
                 from ../src/config/toml.c:39:
/usr/include/bits/string2.h:1291: note: this is the location of the previous definition
 1291 | #   define strdup(s) __strdup (s)

Since it seems this is a system header, perhaps this solution would be a good workaround:

#undef strdup
#define strdup(x)	error:do-not-use---use-STRDUP-instead
static char* STRDUP(const char* s)
{
	int len = strlen(s);
	char* p = MALLOC(len+1);
	if (p) {
		memcpy(p, s, len);
		p[len] = 0;
	}
	return p;
}

#undef strndup
#define strndup(x)	error:do-not-use---use-STRNDUP-instead
static char* STRNDUP(const char* s, size_t n)
{
	size_t len = strnlen(s, n);
	char* p = MALLOC(len+1);
	if (p) {
		memcpy(p, s, len);
		p[len] = 0;
	}
	return p;
}

Value of “0000000” isn’t caught as invalid syntax.

The spec says “Leading zeros are not allowed.” But if I create a file fg.toml:

SSL_DEFAULT_KEY_TYPE='system'
STARTDATE=0000000000

… and feed it to toml_cat, I get:

> ./toml_cat fg.toml
{
  SSL_DEFAULT_KEY_TYPE = "system",
ERROR: unable to decode value in table
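
For illustration, a parse-time check for this rule could be as small as the sketch below. `has_leading_zero` is a hypothetical helper, not a tomlc99 function, and it looks only at the first characters of the value.

```c
#include <ctype.h>

/* Return 1 if s begins (after an optional sign) with a '0' that is
 * immediately followed by another digit, e.g. "007" or "-00".
 * A lone "0", "0.5" and "0x1F" all return 0. */
static int has_leading_zero(const char *s)
{
    if (*s == '+' || *s == '-') s++;
    return s[0] == '0' && isdigit((unsigned char)s[1]);
}
```

Running such a check while scanning the value would let the parser report the problem as a syntax error instead of failing later at decode time.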

Why if(x) in xfree?

free is a no-op if given a NULL pointer. Why is if(x) in xfree? It seems unnecessary.

MSVC W4 warnings and strndup not found

When compiling toml.c on Windows x64 with the Microsoft Visual Studio compiler at warning level 4 (/W4), there are a lot of implicit cast warnings (int64 or int not explicitly cast to char, etc.).

Another unrelated topic: the Windows standard libraries do not seem to have a definition for strndup. It's possible I'm missing something here though?

suggestion

Are there any plans to develop an interface that writes back to the file after modification of the toml file configuration?

Writing to TOML file?

Hello, where could I find examples of writing to a TOML file with this library?

question: reason for "len" in toml_utf8_to_ucs?

What is the reason for the "len" variable in toml_utf8_to_ucs? Is it to prevent parsing beyond the null terminator? Thanks.

toml-spec-tests/values/qa-array-inline-nested-1000 and qa-table-inline-nested-1000 are broken on master

tomlc99 at cdbb9de. toml-spec-tests at 64effd796

make; cd test2; bash build.sh; bash run.sh|grep -v OK
cc -std=c99 -Wall -Wextra -fpic -O2 -DNDEBUG   -c -o toml.o toml.c
ar -rcs libtoml.a toml.o
cc -shared -o libtoml.so toml.o
cc -std=c99 -Wall -Wextra -fpic -O2 -DNDEBUG    toml_json.c libtoml.a   -o toml_json
cc -std=c99 -Wall -Wextra -fpic -O2 -DNDEBUG    toml_cat.c libtoml.a   -o toml_cat
Cloning into 'toml-spec-tests'...
remote: Enumerating objects: 119, done.
remote: Counting objects: 100% (119/119), done.
remote: Compressing objects: 100% (102/102), done.
remote: Total 402 (delta 17), reused 112 (delta 11), pack-reused 283
Receiving objects: 100% (402/402), 98.65 KiB | 0 bytes/s, done.
Resolving deltas: 100% (37/37), done.
toml-spec-tests/values/qa-array-inline-nested-1000  ... [FAILED]
toml-spec-tests/values/qa-table-inline-nested-1000  ... [FAILED]

millisecond wrongly read as octal value if the first digit is '0'

This is a result of how strtol works if the conversion base is indicated as 0. I think it should be forced as 10 in that case
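
The difference is easy to see in isolation. The two helpers below are hypothetical wrappers written for this note, not code from toml.c; they call the standard `strtol` with base 10 and base 0 respectively.

```c
#include <stdlib.h>

/* Parse a run of decimal digits such as the "045" in ".045".
 * Base 10 is passed explicitly so a leading '0' is not taken
 * as an octal prefix. */
static long parse_decimal_digits(const char *s)
{
    return strtol(s, NULL, 10);
}

/* Same input with base 0, shown for contrast: strtol then infers
 * the base from the prefix, so "045" is read as octal. */
static long parse_auto_base(const char *s)
{
    return strtol(s, NULL, 0);
}
```

With base 0, "045" comes back as 37, and "089" comes back as 0 because parsing stops at the first non-octal digit. With base 10 the same strings give 45 and 89.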

toml_utf8_to_ucs() returns incorrect results

We have tomlc99 included in another project that monitors test coverage as part of its continuous integration testing. toml_ucs_to_utf8() and toml_utf8_to_ucs() were flagged as uncovered, so I thought I'd create some simple unit tests using libtap

I cross checked multibyte values obtained from this online conversion tool. No probs with toml_ucs_to_utf8():

    /* Check boundary values against results from:
     *   http://www.ltg.ed.ac.uk/~richard/utf-8.cgi
     */

    ok (toml_ucs_to_utf8 (0x80, buf) == 2 && !memcmp (buf, "\xc2\x80", 2),
        "ucs_to_utf8: 0x80 converted to 2-char UTF8");
    ok (toml_ucs_to_utf8 (0x7ff, buf) == 2 && !memcmp (buf, "\xdf\xbf", 2),
        "ucs_to_utf8: 0x7ff converted to 2-char UTF8");

    ok (toml_ucs_to_utf8 (0x800, buf) == 3 && !memcmp (buf, "\xe0\xa0\x80", 3),
        "ucs_to_utf8: 0x800 converted to 3-char UTF8");
    ok (toml_ucs_to_utf8 (0xfffd, buf) == 3 && !memcmp (buf, "\xef\xbf\xbd", 3),
        "ucs_to_utf8: 0xfffd converted to 3-char UTF8");

    ok (toml_ucs_to_utf8 (0x10000, buf) == 4
        && !memcmp (buf, "\xf0\x90\x80\x80", 4),
        "ucs_to_utf8: 0x10000 converted to 4-char UTF8");
    ok (toml_ucs_to_utf8 (0x1fffff, buf) == 4
        && !memcmp (buf, "\xf7\xbf\xbf\xbf", 4),
        "ucs_to_utf8: 0x1fffff converted to 4-char UTF8");

    ok (toml_ucs_to_utf8 (0x200000, buf) == 5
        && !memcmp (buf, "\xf8\x88\x80\x80\x80", 5),
        "ucs_to_utf8: 0x200000 converted to 5-char UTF8");
    ok (toml_ucs_to_utf8 (0x3ffffff, buf) == 5
        && !memcmp (buf, "\xfb\xbf\xbf\xbf\xbf", 5),
        "ucs_to_utf8: 0x3ffffff converted to 5-char UTF8");

    ok (toml_ucs_to_utf8 (0x4000000, buf) == 6
        && !memcmp (buf, "\xfc\x84\x80\x80\x80\x80", 6),
        "ucs_to_utf8: 0x4000000 converted to 6-char UTF8");
    ok (toml_ucs_to_utf8 (0x7fffffff, buf) == 6
        && !memcmp (buf, "\xfd\xbf\xbf\xbf\xbf\xbf", 6),
        "ucs_to_utf8: 0x7fffffff converted to 6-char UTF8");

However, attempting to reverse these values through toml_utf8_to_ucs() failed in all multibyte cases:

    ok (toml_utf8_to_ucs ("\xc2\x80", 2, &code) == 2 && code == 0x80,
        "utf8_to_ucs: 0x80 converted from 2-char UTF8");
    ok (toml_utf8_to_ucs ("\xdf\xbf", 2, &code) == 2 && code == 0x7ff,
        "utf8_to_ucs: 0x7ff converted from 2-char UTF8");

    ok (toml_utf8_to_ucs ("\xe0\xa0\x80", 3, &code) == 3 && code == 0x800,
        "utf8_to_ucs: 0x800 converted from 3-char UTF8");
    ok (toml_utf8_to_ucs ("\xef\xbf\xbd", 3, &code) == 3 && code == 0xfffd,
        "utf8_to_ucs: 0xfffd converted from 3-char UTF8");

    ok (toml_utf8_to_ucs ("\xf0\x90\x80\x80", 4, &code) == 4 && code == 0x10000,
        "utf8_to_ucs: 0x10000 converted from 4-char UTF8");
    ok (toml_utf8_to_ucs ("\xf0\x90\x80\x80", 4, &code) == 4 && code == 0x10000,
        "utf8_to_ucs: 0x10000 converted from 4-char UTF8");

    ok (toml_utf8_to_ucs ("\xf8\x88\x80\x80\x80", 5, &code) == 5 && code == 0x200000,
        "utf8_to_ucs: 0x200000 converted from 5-char UTF8");
    ok (toml_utf8_to_ucs ("\xfb\xbf\xbf\xbf\xbf", 5, &code) == 5 && code == 0x3ffffff,
        "utf8_to_ucs: 0x3ffffff converted from 5-char UTF8");

    ok (toml_utf8_to_ucs ("\xfc\x84\x80\x80\x80\x80", 6, &code) == 6 && code == 0x4000000,
        "utf8_to_ucs: 0x4000000 converted from 6-char UTF8");
    ok (toml_utf8_to_ucs ("\xfd\xbf\xbf\xbf\xbf\xbf", 6, &code) == 6 && code == 0x7fffffff,
        "utf8_to_ucs: 0x7fffffff converted from 6-char UTF8");

I may just go ahead and comment out the function in our copy since we are not using it, but thought you might want to be aware of the results.
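
For comparison, here is a standalone decoder sketch that produces the values expected by the tests above. It is not tomlc99's implementation; `utf8_decode` is a hypothetical name, it accepts the legacy 1 to 6 byte forms only because the tests use them, and it makes no attempt to reject overlong encodings.

```c
#include <stdint.h>

/* Decode one UTF-8 sequence from s[0..len). Returns the number of
 * bytes consumed and stores the code point in *out, or returns -1
 * on malformed or truncated input. */
static int utf8_decode(const char *s, int len, int64_t *out)
{
    const unsigned char *p = (const unsigned char *)s;
    int n;          /* total sequence length in bytes */
    int64_t code;   /* accumulated code point */

    if (len < 1) return -1;
    if (p[0] < 0x80)                { n = 1; code = p[0]; }
    else if ((p[0] & 0xE0) == 0xC0) { n = 2; code = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { n = 3; code = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { n = 4; code = p[0] & 0x07; }
    else if ((p[0] & 0xFC) == 0xF8) { n = 5; code = p[0] & 0x03; }
    else if ((p[0] & 0xFE) == 0xFC) { n = 6; code = p[0] & 0x01; }
    else return -1;  /* stray continuation byte, or 0xFE/0xFF */

    if (len < n) return -1;
    for (int i = 1; i < n; i++) {
        /* every trailing byte must look like 10xxxxxx */
        if ((p[i] & 0xC0) != 0x80) return -1;
        code = (code << 6) | (p[i] & 0x3F);
    }
    *out = code;
    return n;
}
```

The lead byte fixes the sequence length and contributes the high bits; each continuation byte then adds six more.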

Crash with malformed input

Hello,

The following input file (as produced by AFL) causes a NULL pointer dereference in libtomlc99:

[0]
0.0=[]
+.

For example with toml_cat:

% ./toml_cat ~/fuzzing/sessions/tomlc99/crash.toml
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1218247==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7efe0688e07b bp 0x7ffdcd28b3b0 sp 0x7ffdcd28ab40 T0)
==1218247==The signal is caused by a READ memory access.
==1218247==Hint: address points to the zero page.
    #0 0x7efe0688e07b in __interceptor_strcmp ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:442
    #1 0x5581a2f29478 in toml_table_in /home/kali/fuzzing/tomlc99/toml.c:1772
    #2 0x5581a2f2c2bf in parse_keyval /home/kali/fuzzing/tomlc99/toml.c:1066
    #3 0x5581a2f2d5ec in toml_parse /home/kali/fuzzing/tomlc99/toml.c:1378
    #4 0x5581a2f2f13d in toml_parse_file /home/kali/fuzzing/tomlc99/toml.c:1459
    #5 0x5581a2f24a63 in cat /home/kali/fuzzing/tomlc99/toml_cat.c:158
    #6 0x5581a2f24417 in main /home/kali/fuzzing/tomlc99/toml_cat.c:188
    #7 0x7efe06666d09 in __libc_start_main ../csu/libc-start.c:308
    #8 0x5581a2f24579 in _start (/home/kali/fuzzing/tomlc99/toml_cat+0x4579)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:442 in __interceptor_strcmp
==1218247==ABORTING

or without ASAN in GDB:

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x940 ('@\t')
RBX: 0x55555555f960 --> 0x55555555f980 --> 0x55555555f940 --> 0x30 ('0')
RCX: 0x0
RDX: 0x0
RSI: 0x55555555f940 --> 0x30 ('0')
RDI: 0x0
RBP: 0x0
RSP: 0x7fffffffe358 --> 0x5555555573e0 (<toml_table_in+64>:     test   eax,eax)
RIP: 0x7ffff7f3f00a (<__strcmp_avx2+26>:        vmovdqu ymm1,YMMWORD PTR [rdi])
R8 : 0x0
R9 : 0x7fffffffe190 --> 0x0
R10: 0xfffffffffffff287
R11: 0x7ffff7f3eff0 (<__strcmp_avx2>:   mov    eax,edi)
R12: 0x55555555f980 --> 0x55555555f940 --> 0x30 ('0')
R13: 0x55555555f968 --> 0x0
R14: 0x55555555e48b --> 0xa2e2b ('+.\n')
R15: 0x1
EFLAGS: 0x10287 (CARRY PARITY adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x7ffff7f3effa <__strcmp_avx2+10>:   and    eax,0xfff
   0x7ffff7f3efff <__strcmp_avx2+15>:   cmp    eax,0xf80
   0x7ffff7f3f004 <__strcmp_avx2+20>:   jg     0x7ffff7f3f360 <__strcmp_avx2+880>
=> 0x7ffff7f3f00a <__strcmp_avx2+26>:   vmovdqu ymm1,YMMWORD PTR [rdi]
   0x7ffff7f3f00e <__strcmp_avx2+30>:   vpcmpeqb ymm0,ymm1,YMMWORD PTR [rsi]
   0x7ffff7f3f012 <__strcmp_avx2+34>:   vpminub ymm0,ymm0,ymm1
   0x7ffff7f3f016 <__strcmp_avx2+38>:   vpcmpeqb ymm0,ymm0,ymm7
   0x7ffff7f3f01a <__strcmp_avx2+42>:   vpmovmskb ecx,ymm0
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe358 --> 0x5555555573e0 (<toml_table_in+64>:    test   eax,eax)
0008| 0x7fffffffe360 --> 0x7fffffffe430 --> 0x55555555e480 ("[0]\n0.0=[]\n+.\n")
0016| 0x7fffffffe368 --> 0x7fffffffe430 --> 0x55555555e480 ("[0]\n0.0=[]\n+.\n")
0024| 0x7fffffffe370 --> 0x7fffffffe430 --> 0x55555555e480 ("[0]\n0.0=[]\n+.\n")
0032| 0x7fffffffe378 --> 0x55555555f900 --> 0x55555555f8c0 --> 0x30 ('0')
0040| 0x7fffffffe380 --> 0x3
0048| 0x7fffffffe388 --> 0x555555558299 (<parse_keyval+217>:    mov    r8,QWORD PTR [rsp+0x8])
0056| 0x7fffffffe390 --> 0x55555555f8c0 --> 0x30 ('0')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
__strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:101

This can be prevented by checking if key == NULL in toml_table_in() before attempting to dereference it.
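
The guard being suggested can be sketched on a simplified lookup. The `key_list` struct and `find_key` function below are hypothetical stand-ins and do not reflect tomlc99's internal table layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a table's key list; not tomlc99's layout. */
struct key_list {
    const char *const *keys;
    int nkeys;
};

/* Return the index of `key`, or -1 if absent. A NULL key (or NULL
 * list) is treated as "not found" instead of reaching strcmp. */
static int find_key(const struct key_list *kl, const char *key)
{
    if (!kl || !key) return -1;
    for (int i = 0; i < kl->nkeys; i++) {
        if (kl->keys[i] && strcmp(kl->keys[i], key) == 0) return i;
    }
    return -1;
}
```

The early return turns the crashing case into an ordinary "not found" result that the caller already has to handle.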

RFE: slow to parse large TOML files

https://gist.githubusercontent.com/feeeper/2197d6d734729625a037af1df14cf2aa/raw/2f22b120e476d897179be3c1e2483d18067aa7df/config.toml

^^ When I try to parse that file with this library, it takes quite a while … the pure-Perl parser TOML::Tiny is actually noticeably faster.

I dug in a bit and see that it’s doing a lot of strchr …

~/code/tomlc99 $ google-pprof --text a.out ./prof.out
Using local file a.out.
Using local file ./prof.out.
Total: 1154 samples
    1120  97.1%  97.1%     1120  97.1% strchr
      12   1.0%  98.1%       12   1.0% memcpy
       7   0.6%  98.7%        7   0.6% next_token
       2   0.2%  98.9%        2   0.2% 0xffff0fe0
       2   0.2%  99.0%        2   0.2% scan_string
       2   0.2%  99.2%        2   0.2% scan_time
       1   0.1%  99.3%        1   0.1% __libc_malloc
       1   0.1%  99.4%        1   0.1% check_key
       1   0.1%  99.5%        1   0.1% normalize_key
       1   0.1%  99.6%        1   0.1% operator delete[]
       1   0.1%  99.7%        1   0.1% parse_select
       1   0.1%  99.7%        1   0.1% set_token
       1   0.1%  99.8%        1   0.1% tcmalloc::CentralFreeList::Populate
       1   0.1%  99.9%        2   0.2% tcmalloc::CentralFreeList::RemoveRange
       1   0.1% 100.0%        1   0.1% toml_array_in
       0   0.0% 100.0%        1   0.1% tcmalloc::CentralFreeList::FetchFromOneSpansSafe
       0   0.0% 100.0%        2   0.2% tcmalloc::ThreadCache::FetchFromCentralCache

Does that maybe offer any hint of how this might be improved?

tomlc99 is much faster than TOML::Tiny for small & medium TOML files; I’d like to see if it can be faster for large ones, too.

Thank you!

Makefile install prefix missing for libtoml.pc

It seems the install command for the $(PCFILE) is missing the prefix variable. Is this normal?

install: all
    install -d ${prefix}/include ${prefix}/lib
    install toml.h ${prefix}/include
    install $(LIB) ${prefix}/lib
    install $(LIB_SHARED) ${prefix}/lib
    install $(PCFILE) /usr/lib/pkgconfig # ${prefix} missing here

Memory leak in `toml_rtos`

There seems to be a double allocation going on here:

https://github.com/cktan/tomlc99/blob/master/toml.c#L2123-L2128

Memory is first being allocated for *ret, but then *ret is immediately overwritten with the allocation from norm_lit_str (which always allocates) without the initial memory being freed.

The *ret = STRNDUP(sp, sq - sp); line should be removed since it doesn't seem to be doing anything.
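
In isolation, the corrected shape looks roughly like the sketch below. Both function names are hypothetical; `alloc_copy` merely stands in for a routine that always allocates its own result, which is how the report describes norm_lit_str.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: always returns a freshly allocated,
 * NUL-terminated copy of src[0..len), or NULL on failure. */
static char *alloc_copy(const char *src, size_t len)
{
    char *p = malloc(len + 1);
    if (p) {
        memcpy(p, src, len);
        p[len] = '\0';
    }
    return p;
}

/* Store a copy of src[0..len) in *ret using exactly one allocation.
 * Assigning an earlier copy to *ret and then overwriting it here
 * would leak that earlier copy. Returns 0 on success, -1 on failure. */
static int store_literal(const char *src, size_t len, char **ret)
{
    *ret = alloc_copy(src, len);
    return *ret ? 0 : -1;
}
```

The caller frees the single result once, and nothing else is left allocated.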

Better support for nostdlib (no Glibc) environments

Thank you @cktan for the great TOML library!

We would like to use your TOML parser to parse configuration files in the Graphene framework (https://github.com/oscarlab/graphene). However, Graphene is a very low-level Linux emulation framework and thus sits below the Standard C Library (Glibc/Musl) and is built with -nostdlib.

Graphene provides some utility functions like malloc/calloc/memcpy/free/strcmp and so on (basically a small subset of memory management and string utilities). However, your TOML implementation currently relies on:

ctype.h functions: toupper(), isalnum(), isdigit(). These can be easily replaced with actual implementations.
strtoll() -- on 64-bit systems, this is just the alias to strtol(), easy via macros.
realloc() -- Graphene doesn't have realloc (complicated reasons...), but your TOML code seems to use the same simple extend-memory pattern, so easy to replace with malloc + memcpy + free.
setjmp()/longjmp() -- Graphene doesn't have these functions. These are the hardest ones to change, it took me significant modifications of your code to revert to the "classical C" style of propagating error codes to the caller.
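
The malloc + memcpy + free replacement mentioned in the realloc() item can be sketched as follows. `grow_block` is a hypothetical name and this is not the code from the fork linked below; unlike realloc, the caller has to pass the old size.

```c
#include <stdlib.h>
#include <string.h>

/* Grow a heap block without realloc: allocate the new size, copy the
 * old contents, release the old block. On allocation failure the old
 * block is left untouched and NULL is returned, as with realloc. */
static void *grow_block(void *old, size_t old_size, size_t new_size)
{
    void *p = malloc(new_size);
    if (!p) return NULL;
    if (old) {
        memcpy(p, old, old_size < new_size ? old_size : new_size);
        free(old);
    }
    return p;
}
```

Because the old block survives a failed allocation, callers can keep the usual pattern of assigning the result to a temporary before replacing their pointer.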

I forked your TOML repo and introduced the changes described above: https://github.com/dimakuv/tomlc99/commits/master.

With these changes, your TOML library becomes very self-contained and requires only a handful of standard functions. You can review the changes via my link above.

I wonder if this is something you'd be willing to merge? We in the Graphene community would be happy to download-and-use the master of your repository (maybe with minimal patching). But right now the changes are significant. I could prepare a sequence of PRs for merge if you think my changes are reasonable.

toml encoding?

is there any plan in regards to adding toml encoding?
this appears to be the most up to date library, and I'd love to continue using it :D

use after free in e_key_exists_error()

After updating our project's copy of tomlc99 to current master, we noticed some use-after-free errors when running the BurntSushi invalid inputs through a unit test under valgrind and ASan. It looks like the newkey arg is freed, then passed to e_key_exists_error() in several places.

I'll put a PR up shortly for a fix.

tl;dr:
flux-framework/flux-security#98
flux-framework/flux-core#2630

int64_t not found

Building with clang on amd64, I had to include stdint.h before including toml.h, because int64_t is not a built-in typedef.

clang --std=gnu99 does not change this.

Getting strings prints quotation marks

I've got something like this:

toml_array_t* profileNames = toml_array_in(table, "profileNames");
const char* nameOne = toml_raw_at(profileNames, 0);

Where profileNames is:

profileNames = [
    "personal",
    "work"
]

In this case when I print out nameOne I get "personal" when I would expect personal. It seems like when getting a string literal it should not include the first and last quotation marks.

Could you give an example of iterating over a config whose contents are not known in advance?

Missing Values

Hello,

I have the following .toml file:

restart = true

with a key named restart that must have a boolean value. Also, it should be legal to omit this value from the config file, i.e. it is optional.

I use the following code to read the value:

toml_datum_t datum = toml_bool_in(root_table, "restart");
if (!datum.ok) {
    printf("Key 'restart' is either missing or not a boolean.\n");
} else {
    // do something
}

Is there a way to distinguish between a value missing in the config file or having the incorrect type? E.g. when the config file would look like this:

restart = "a string instead of a boolean"

Thanks a lot!

toml_datum_t distinguish between non-present keys and invalid conversions

Hi!
I updated the library in my project, which was using toml_raw_t. I noticed that it is now deprecated in favor of the toml_datum_t interface.

In my case, missing keys in the configuration are fine (the configuration file does not need to be complete), but type conversion errors are reported to the user. When using toml_datum_t, I cannot distinguish between a missing key and a type conversion problem. Is there a canonical way to check if a key is present in a table with the new interface, or do I have to resort to using toml_raw_in(..) != NULL?

Thanks in advance!
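
The distinction being asked for amounts to a three-state result: missing, present with the wrong type, or OK. The sketch below shows that shape on a hypothetical in-memory table of raw values. None of these names come from tomlc99; with the library itself, the presence check would be the `toml_raw_in(..) != NULL` test mentioned above.

```c
#include <stddef.h>
#include <string.h>

enum lookup_status { LOOKUP_MISSING, LOOKUP_WRONG_TYPE, LOOKUP_OK };

/* Hypothetical key/raw-text pair standing in for a parsed table. */
struct raw_entry { const char *key; const char *raw; };

/* Look up `key` and try to read it as a boolean. Presence is checked
 * first on the raw text, so an absent key and a present key of the
 * wrong type produce different results. */
static enum lookup_status get_bool(const struct raw_entry *tab, int n,
                                   const char *key, int *out)
{
    const char *raw = NULL;
    for (int i = 0; i < n; i++) {
        if (strcmp(tab[i].key, key) == 0) { raw = tab[i].raw; break; }
    }
    if (!raw) return LOOKUP_MISSING;
    if (strcmp(raw, "true") == 0)  { *out = 1; return LOOKUP_OK; }
    if (strcmp(raw, "false") == 0) { *out = 0; return LOOKUP_OK; }
    return LOOKUP_WRONG_TYPE;
}
```

A caller can then apply a default on LOOKUP_MISSING and report an error to the user only on LOOKUP_WRONG_TYPE.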
