rapidyaml's Introduction

Rapid YAML

Or ryml, for short. ryml is a C++ library to parse and emit YAML, and do it fast, on everything from x64 to bare-metal chips without an operating system. (If you want to use a YAML tree as a configuration tree with override facilities in your programs, take a look at c4conf.)

ryml parses both read-only and in-situ source buffers; the resulting data nodes hold only views to sub-ranges of the source buffer. No string copies or duplications are done, and no virtual functions are used. The data tree is a flat index-based structure stored in a single array. Serialization happens only at your direct request, after parsing / before emitting. Internally, the data tree representation stores only string views and has no knowledge of types, but of course, every node can have a YAML type tag. ryml makes it easy and fast to read and modify the data tree.
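To make the idea of a flat, index-based tree of string views concrete, here is a minimal sketch in standard C++17. The FlatNode/FlatTree types, their members and the NO_NODE sentinel are hypothetical and much simpler than ryml's actual node layout; they only show how nodes can live in one array, refer to each other by index, and hold views into the source buffer instead of copies.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Sentinel for "no node" (hypothetical; chosen for this sketch only).
constexpr std::size_t NO_NODE = static_cast<std::size_t>(-1);

// One node of a hypothetical flat tree. Links are indices into the
// tree's node array, and key/val are non-owning views into the source.
struct FlatNode {
    std::string_view key;
    std::string_view val;
    std::size_t parent = NO_NODE;
    std::size_t first_child = NO_NODE;
    std::size_t last_child = NO_NODE;
    std::size_t next_sibling = NO_NODE;
};

struct FlatTree {
    std::vector<FlatNode> nodes;  // a single array; a node's id is its position

    std::size_t add_root() {
        nodes.push_back(FlatNode{});
        return nodes.size() - 1;
    }

    // Appends a child of `parent`, linking it to its siblings by index only.
    std::size_t append_child(std::size_t parent, std::string_view key, std::string_view val) {
        assert(parent < nodes.size());
        FlatNode n;
        n.key = key;
        n.val = val;
        n.parent = parent;
        nodes.push_back(n);  // may reallocate the array; index links stay valid
        const std::size_t id = nodes.size() - 1;
        if (nodes[parent].last_child == NO_NODE)
            nodes[parent].first_child = id;
        else
            nodes[nodes[parent].last_child].next_sibling = id;
        nodes[parent].last_child = id;
        return id;
    }

    // Linear scan over the children of `parent`; returns NO_NODE if absent.
    std::size_t find_child(std::size_t parent, std::string_view key) const {
        assert(parent < nodes.size());
        for (std::size_t c = nodes[parent].first_child; c != NO_NODE; c = nodes[c].next_sibling)
            if (nodes[c].key == key)
                return c;
        return NO_NODE;
    }
};
```

Because links are indices rather than pointers, growing the node array (which may reallocate it) leaves every link valid; and because keys and values are views, the source buffer must outlive the tree.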

ryml is available as a single header file, or it can be used as a simple library with cmake -- both separately (ie build->install->find_package()) or together with your project (ie with add_subdirectory()). (See below for examples).

ryml can use custom global and per-tree memory allocators and error handler callbacks, and is exception-agnostic. ryml provides a default implementation for the allocator (using std::malloc()) and error handlers (using either exceptions, longjmp() or std::abort()), but you can opt out and provide your own memory allocation and, eg, exception-throwing callbacks.
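The exception-agnostic design can be sketched as a table of function pointers that the library calls instead of throwing by itself. The Callbacks struct, its members and the helper functions below are hypothetical and only illustrate the pattern; refer to ryml's API documentation for its real callback types and how to install them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical callback table (NOT ryml's actual type): the library only
// allocates memory and reports errors through these pointers.
struct Callbacks {
    void* (*allocate)(std::size_t len, void* user_data);
    void  (*deallocate)(void* mem, std::size_t len, void* user_data);
    void  (*error)(const char* msg, std::size_t len, void* user_data);  // must not return
    void* user_data;
};

// Defaults in the spirit described above: std::malloc() and std::abort().
inline void* default_allocate(std::size_t len, void*) { return std::malloc(len); }
inline void default_deallocate(void* mem, std::size_t, void*) { std::free(mem); }
[[noreturn]] inline void default_error(const char* msg, std::size_t len, void*) {
    std::fprintf(stderr, "%.*s\n", static_cast<int>(len), msg);
    std::abort();
}

// A user-provided alternative that throws instead of aborting.
[[noreturn]] inline void throwing_error(const char* msg, std::size_t len, void*) {
    throw std::runtime_error(std::string(msg, len));
}

inline Callbacks default_callbacks() {
    return Callbacks{default_allocate, default_deallocate, default_error, nullptr};
}

// "Library" code: it never throws by itself, so it stays exception-agnostic.
// Whether an error throws, longjmps or aborts is up to the installed callback.
inline int parse_digit(char c, const Callbacks& cb) {
    if (c < '0' || c > '9') {
        static const char msg[] = "expected a digit";
        cb.error(msg, sizeof(msg) - 1, cb.user_data);
        return -1;  // only reached if a misbehaving callback returns
    }
    return c - '0';
}
```

Swapping only the error pointer (eg `cb.error = throwing_error;`) changes the failure behavior without touching the library code.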

ryml does not depend on the STL (ie, it does not use any std container as part of its data structures), but it can serialize and deserialize these containers into the data tree with the use of optional headers. ryml ships with c4core, a small multiplatform library of C++ utilities.
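Serializing user types typically goes through a pair of to_chars()/from_chars()-style functions, which the quickstart comments also mention. The sketch below uses hypothetical names (Vec2, vec2_to_chars, vec2_from_chars, vec2_to_string) and plain pointer/length and std::string_view parameters rather than ryml's own string types, so treat the signatures as assumptions. It follows a common convention: the serializer returns the size it needs, so the caller can grow the buffer and retry.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical user type.
struct Vec2 { int x = 0; int y = 0; };

// Writes "(x,y)" into buf[0..cap) and returns the number of chars the full
// serialization needs. If the return value exceeds cap, nothing is written
// and the caller should grow the buffer and call again.
inline std::size_t vec2_to_chars(char* buf, std::size_t cap, const Vec2& v) {
    char tmp[32];  // enough for "(" + two ints of up to 11 chars + "," + ")"
    char* p = tmp;
    *p++ = '(';
    p = std::to_chars(p, tmp + sizeof(tmp), v.x).ptr;
    *p++ = ',';
    p = std::to_chars(p, tmp + sizeof(tmp), v.y).ptr;
    *p++ = ')';
    const std::size_t needed = static_cast<std::size_t>(p - tmp);
    if (needed <= cap)
        std::memcpy(buf, tmp, needed);
    return needed;
}

// Parses exactly "(x,y)"; returns false on any other input.
inline bool vec2_from_chars(std::string_view s, Vec2* v) {
    assert(v != nullptr);
    if (s.size() < 5 || s.front() != '(' || s.back() != ')')
        return false;
    const char* last = s.data() + s.size() - 1;  // points at ')'
    auto r1 = std::from_chars(s.data() + 1, last, v->x);
    if (r1.ec != std::errc() || r1.ptr == last || *r1.ptr != ',')
        return false;
    auto r2 = std::from_chars(r1.ptr + 1, last, v->y);
    return r2.ec == std::errc() && r2.ptr == last;
}

// Caller side of the "return the needed size" convention.
inline std::string vec2_to_string(const Vec2& v) {
    std::string out(4, '\0');  // deliberately small first guess
    std::size_t needed = vec2_to_chars(&out[0], out.size(), v);
    if (needed > out.size()) {  // too small: grow and retry once
        out.resize(needed);
        needed = vec2_to_chars(&out[0], out.size(), v);
    }
    out.resize(needed);
    return out;
}
```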

ryml is written in C++11, and compiles cleanly with:

  • Visual Studio 2015 and later
  • clang++ 3.9 and later
  • g++ 4.8 and later
  • Intel Compiler

ryml's API documentation is available at ReadTheDocs.

ryml is extensively unit-tested in Linux, Windows and MacOS. The tests cover x64, x86, wasm (emscripten), arm, aarch64, ppc64le and s390x architectures, and include analysing ryml with:

  • valgrind
  • clang-tidy
  • gcc/clang sanitizers:
    • memory
    • address
    • undefined behavior

ryml also runs on bare-metal targets and on RISC-V architectures. Continuous validation of both through CI actions is still pending, but ryml has been proven to work there.

ryml is available in Python, and can very easily be compiled to JavaScript through emscripten (see below).

See also the changelog and the roadmap.




Is it rapid?

You bet! On an i7-6800K CPU @3.40GHz:

  • ryml parses YAML at about 150MB/s on Linux and about 100MB/s on Windows (vs2017).
  • ryml parses JSON at about ~450MB/s on Linux, faster than sajson (didn't try yet on Windows).
  • compared against the other existing YAML libraries for C/C++:
    • ryml is in general between 2 and 3 times faster than libyaml
    • ryml is in general between 10 and 70 times faster than yaml-cpp, and in some cases as much as 100x and even 200x faster.

Here's the benchmark. Using different approaches within ryml (in-situ/read-only vs. with/without reuse), a YAML / JSON buffer is repeatedly parsed, and compared against other libraries.

Comparison with yaml-cpp

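The first result set is for Windows, and uses an appveyor.yml config file. A comparison of these results is summarized in the table below (in the compared column, the first figure is ryml's speedup over yaml-cpp, and the second is yaml-cpp's read rate as a percentage of ryml's):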

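+-----------------------------+-------+---------+------------+
| Read rates (MB/s)           | ryml  | yamlcpp | compared   |
+-----------------------------+-------+---------+------------+
| appveyor / vs2017 / Release | 101.5 |     5.3 | 20x / 5.2% |
| appveyor / vs2017 / Debug   |   6.4 |  0.0844 | 76x / 1.3% |
+-----------------------------+-------+---------+------------+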

The next set of results was taken on Linux, comparing g++ 8.2 and clang++ 7.0.1 parsing either a YAML buffer from a travis.yml config file or a JSON buffer from a compile_commands.json file. You can see the full results here. Summarizing:

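+----------------------------+-------+---------+----------+
| Read rates (MB/s)          | ryml  | yamlcpp | compared |
+----------------------------+-------+---------+----------+
| json / clang++ / Release   | 453.5 |    15.1 | 30x / 3% |
| json / g++ / Release       | 430.5 |    16.3 | 26x / 4% |
| json / clang++ / Debug     |  61.9 |    1.63 | 38x / 3% |
| json / g++ / Debug         |  72.6 |    1.53 | 47x / 2% |
| travis / clang++ / Release | 131.6 |    8.08 | 16x / 6% |
| travis / g++ / Release     | 176.4 |    8.23 | 21x / 5% |
| travis / clang++ / Debug   |  10.2 |    1.08 | 9x / 11% |
| travis / g++ / Debug       |  12.5 |    1.01 | 12x / 8% |
+----------------------------+-------+---------+----------+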
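The compared column can be reproduced from the two read rates: the first figure is the ratio of ryml's rate to yaml-cpp's, and the second is yaml-cpp's rate as a percentage of ryml's. A small helper (the function names are illustrative) makes the arithmetic explicit:

```cpp
#include <cassert>
#include <cmath>

// How many times faster ryml is than the other library.
inline double speedup(double ryml_mbps, double other_mbps) {
    assert(other_mbps > 0.0);
    return ryml_mbps / other_mbps;
}

// The other library's read rate as a percentage of ryml's read rate.
inline double relative_rate_percent(double ryml_mbps, double other_mbps) {
    assert(ryml_mbps > 0.0);
    return 100.0 * other_mbps / ryml_mbps;
}
```

For example, for json / clang++ / Release, 453.5 / 15.1 is about 30, and 15.1 / 453.5 is about 3.3%.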

The 450MB/s read rate for JSON puts ryml squarely in the same ballpark as RapidJSON and other fast json readers (data from here). Even parsing full YAML is at ~150MB/s, which is still in that performance ballpark, albeit at its lower end. This is something to be proud of, as the YAML specification is much more complex than JSON: 23449 vs 1969 words.

Performance reading JSON

So how does ryml compare against other JSON readers? Well, it's one of the fastest!

The benchmark is the same as above, and it reads the compile_commands.json file. The _arena suffix denotes parsing a read-only buffer (so buffer copies are performed), while the _inplace suffix means that the source buffer is parsed in place. The _reuse suffix means that the data tree and/or parser are reused on each benchmark repeat.
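The difference between the _inplace, _arena and _reuse variants can be illustrated with a toy splitter standing in for a parser. Everything below is hypothetical and far simpler than YAML parsing: split_commas() produces views into its input (in place), split_commas_arena() first copies a read-only input into a caller-owned buffer, and passing the same output vector across calls mimics reuse, since clear() keeps the capacity that was already allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// "In place": the results are views into `src` itself; no string data is copied.
inline void split_commas(std::string_view src, std::vector<std::string_view>& out) {
    out.clear();  // drops the elements but keeps the allocated capacity (reuse)
    std::size_t start = 0;
    for (std::size_t i = 0; i <= src.size(); ++i) {
        if (i == src.size() || src[i] == ',') {
            out.push_back(src.substr(start, i - start));
            start = i + 1;
        }
    }
}

// "Arena": the source is read-only or short-lived, so first copy it into a
// buffer owned by the caller, then produce views into that copy.
inline void split_commas_arena(std::string_view src, std::string& arena,
                               std::vector<std::string_view>& out) {
    arena.assign(src.data(), src.size());
    split_commas(arena, out);
}
```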

Here's what we get with g++ 8.2:

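+---------------------+----------------+--------------+
| Benchmark           | Release (MB/s) | Debug (MB/s) |
+---------------------+----------------+--------------+
| rapidjson_arena     |          509.9 |         43.4 |
| rapidjson_inplace   |         1329.4 |         68.2 |
| sajson_inplace      |          434.2 |        176.5 |
| sajson_arena        |          430.7 |        175.6 |
| jsoncpp_arena       |          183.6 |    187.9 (?) |
| nlohmann_json_arena |          115.8 |         21.5 |
| yamlcpp_arena       |           16.6 |          1.6 |
| libyaml_arena       |          113.9 |         35.7 |
| libyaml_arena_reuse |          114.6 |         35.9 |
| ryml_arena          |          388.6 |         36.9 |
| ryml_inplace        |          393.7 |         36.9 |
| ryml_arena_reuse    |          446.2 |         74.6 |
| ryml_inplace_reuse  |          457.1 |         74.9 |
+---------------------+----------------+--------------+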

You can verify that (at least for this test) the reuse variants of ryml beat most JSON parsers at their own game in Release, the only exception being rapidjson. In Debug, rapidjson is slower than ryml's reuse variants, while sajson is faster than every ryml variant. jsoncpp is unclear: its Debug result being faster than its Release result is suspicious and still needs to be scrutinized.

Performance emitting

Emitting benchmarks also show similar speedups over the existing libraries, as also anecdotally reported by some users (eg, here's a user reporting a 25x speedup over yaml-cpp). In some cases (eg, block folded multiline scalars), the speedup is as high as 200x (eg, 7.3MB/s -> 1.416GB/s).

CI results and request for files

While a more effective way of showing the benchmark results is not available yet, you can browse through the runs of the benchmark workflow in the CI to scroll through the results for yourself.

Also, if you have a case where ryml behaves very nicely or not as nicely as claimed above, we would definitely like to see it! Please open an issue, or submit a pull request adding the file to bm/cases, or just send us the files.


Quick start

If you're wondering whether ryml's speed comes at a usage cost, you need not: with ryml, you can have your cake and eat it too. Being rapid is definitely NOT the same as being impractical, so ryml was written with easy AND efficient usage in mind, and comes with a two-level API for accessing and traversing the data tree.

The following snippet is a very quick overview taken from the quickstart sample (see it on doxygen / on github). After cloning ryml (don't forget the --recursive flag for git), you can very easily build and run this executable using any of the build samples, eg the add_subdirectory() sample (see the relevant section).

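// Includes needed by this snippet (ryml_std.hpp provides the std::string
// interop used below); CHECK() here is a simple assert-based stand-in:
#include <ryml.hpp>
#include <ryml_std.hpp>
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>
#define CHECK(cond) assert(cond)
// NOTE: the statements below are meant to run inside a function body, eg main().

// Parse YAML code in place, potentially mutating the buffer: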
char yml_buf[] = "{foo: 1, bar: [2, 3], john: doe}";
ryml::Tree tree = ryml::parse_in_place(yml_buf);

// ryml has a two-level API:
//
// The lower level index API is based on the indices of nodes,
// where the node's id is the node's position in the tree's data
// array. This API is very efficient, but somewhat difficult to use:
size_t root_id = tree.root_id();
size_t bar_id = tree.find_child(root_id, "bar"); // need to get the index right
CHECK(tree.is_map(root_id)); // all of the index methods are in the tree
CHECK(tree.is_seq(bar_id));  // ... and receive the subject index

// The node API is a lightweight abstraction sitting on top of the
// index API, but offering a much more convenient interaction:
ryml::ConstNodeRef root = tree.rootref();  // a const node reference
ryml::NodeRef wroot = tree.rootref();      // a mutable node reference, used below to modify the tree
ryml::ConstNodeRef bar = tree["bar"];
CHECK(root.is_map());
CHECK(bar.is_seq());

// The resulting tree stores only string views to the YAML source buffer.
CHECK(root["foo"] == "1");
CHECK(root["foo"].key().str == yml_buf + 1);
CHECK(bar[0] == "2");
CHECK(root["john"] == "doe");

//------------------------------------------------------------------
// To get actual values, you need to deserialize the nodes.
// Deserializing: use operator>>
{
    int foo = 0, bar0 = 0, bar1 = 0;
    std::string john_str;
    std::string bar_str;
    root["foo"] >> foo;
    root["bar"][0] >> bar0;
    root["bar"][1] >> bar1;
    root["john"] >> john_str; // requires from_chars(std::string). see API doc.
    root["bar"] >> ryml::key(bar_str); // to deserialize the key, use the tag function ryml::key()
    CHECK(foo == 1);
    CHECK(bar0 == 2);
    CHECK(bar1 == 3);
    CHECK(john_str == "doe");
    CHECK(bar_str == "bar");
}

//------------------------------------------------------------------
// To modify existing nodes, use operator= or operator<<.

// operator= assigns an existing string to the receiving node.
// The contents are NOT copied, and this pointer will be in effect
// until the tree goes out of scope! So BEWARE to only assign from
// strings outliving the tree.
wroot["foo"] = "says you";
wroot["bar"][0] = "-2";
wroot["bar"][1] = "-3";
wroot["john"] = "ron";
// Now the tree is _pointing_ at the memory of the strings above.
// In this case it is OK because those are static strings and will
// outlive the tree.
CHECK(root["foo"].val() == "says you");
CHECK(root["bar"][0].val() == "-2");
CHECK(root["bar"][1].val() == "-3");
CHECK(root["john"].val() == "ron");
// But WATCH OUT: do not assign from temporary objects:
// {
//     std::string crash("will dangle");
//     wroot["john"] = ryml::to_csubstr(crash);
// }
// CHECK(root["john"] == "dangling"); // CRASH! the string was deallocated

// operator<< first serializes the input to the tree's arena, then
// assigns the serialized string to the receiving node. This avoids
// constraints with the lifetime, since the arena lives with the tree.
CHECK(tree.arena().empty());
wroot["foo"] << "says who";  // requires to_chars(). see serialization samples below.
wroot["bar"][0] << 20;
wroot["bar"][1] << 30;
wroot["john"] << "deere";
CHECK(root["foo"].val() == "says who");
CHECK(root["bar"][0].val() == "20");
CHECK(root["bar"][1].val() == "30");
CHECK(root["john"].val() == "deere");
CHECK(tree.arena() == "says who2030deere"); // the result of serializations to the tree arena


//------------------------------------------------------------------
// Adding new nodes:

// adding a keyval node to a map:
CHECK(root.num_children() == 3); // foo, bar, john
wroot["newkeyval"] = "shiny and new"; // using these strings
wroot.append_child() << ryml::key("newkeyval (serialized)") << "shiny and new (serialized)"; // serializes and assigns the serialization
CHECK(root.num_children() == 5);
CHECK(root["newkeyval"].key() == "newkeyval");
CHECK(root["newkeyval"].val() == "shiny and new");
CHECK(root["newkeyval (serialized)"].key() == "newkeyval (serialized)");
CHECK(root["newkeyval (serialized)"].val() == "shiny and new (serialized)");


//------------------------------------------------------------------
// Emitting:

ryml::csubstr expected_result = R"(foo: says who
bar:
  - 20
  - 30
john: deere
newkeyval: shiny and new
newkeyval (serialized): shiny and new (serialized)
)";

// emit to a FILE*
ryml::emit_yaml(tree, stdout);
// emit to a stream
std::stringstream ss;
ss << tree;
std::string stream_result = ss.str();
// emit to a buffer:
std::string str_result = ryml::emitrs_yaml<std::string>(tree);
// can emit to any given buffer:
char buf[1024];
ryml::csubstr buf_result = ryml::emit_yaml(tree, buf);
// now check
CHECK(buf_result == expected_result);
CHECK(str_result == expected_result);
CHECK(stream_result == expected_result);

//------------------------------------------------------------------
// UTF8
ryml::Tree langs = ryml::parse_in_arena(R"(
en: Planet (Gas)
fr: Planète (Gazeuse)
ru: Планета (Газ)
ja: 惑星(ガス)
zh: 行星(气体)
# UTF8 decoding only happens in double-quoted strings,
# as per the YAML standard
decode this: "\u263A \xE2\x98\xBA"
and this as well: "\u2705 \U0001D11E"
not decoded: '\u263A \xE2\x98\xBA'
neither this: '\u2705 \U0001D11E'
)");
// in-place UTF8 just works:
CHECK(langs["en"].val() == "Planet (Gas)");
CHECK(langs["fr"].val() == "Planète (Gazeuse)");
CHECK(langs["ru"].val() == "Планета (Газ)");
CHECK(langs["ja"].val() == "惑星(ガス)");
CHECK(langs["zh"].val() == "行星(气体)");
// and \x \u \U codepoints are decoded, but only when they appear
// inside double-quoted strings, as dictated by the YAML
// standard:
CHECK(langs["decode this"].val() == "☺ ☺");
CHECK(langs["and this as well"].val() == "✅ 𝄞");
CHECK(langs["not decoded"].val() == "\\u263A \\xE2\\x98\\xBA");
CHECK(langs["neither this"].val() == "\\u2705 \\U0001D11E");


//------------------------------------------------------------------
// Getting the location of nodes in the source:
//
// Location tracking is opt-in:
ryml::Parser parser(ryml::ParserOptions().locations(true));
// Now the parser will start by building the accelerator structure:
ryml::Tree tree2 = parser.parse_in_arena("expected.yml", expected_result);
// ... and use it when querying
ryml::Location loc = parser.location(tree2["bar"][1]);
CHECK(parser.location_contents(loc).begins_with("30"));
CHECK(loc.line == 3u);
CHECK(loc.col == 4u);

Using ryml in your project

Package managers

ryml is available in most package managers (thanks to all the contributors!) and linux distributions. But please be aware: those packages are maintained downstream of this repository, so if you have issues with the package, file a report with the respective maintainer.

Here's a quick roundup (not maintained):

Although package managers are very useful for quickly getting up to speed, the advised way is still to bring ryml as a submodule of your project, building both together. This makes it easy to track any upstream changes in ryml. Also, ryml is small and quick to build, so there's not much of a cost for building it with your project.

Single header file

ryml is provided chiefly as a cmake library project, but it can also be used as a single header file, and there is a tool to amalgamate the code into a single header file. The amalgamated header file is provided with each release, but you can also generate a customized file suiting your particular needs (or a particular commit):

[user@host rapidyaml]$ python3 tools/amalgamate.py -h
usage: amalgamate.py [-h] [--c4core | --no-c4core] [--fastfloat | --no-fastfloat] [--stl | --no-stl] [output]

positional arguments:
  output          output file. defaults to stdout

optional arguments:
  -h, --help      show this help message and exit
  --c4core        amalgamate c4core together with ryml. this is the default.
  --no-c4core     amalgamate c4core together with ryml. the default is --c4core.
  --fastfloat     enable fastfloat library. this is the default.
  --no-fastfloat  enable fastfloat library. the default is --fastfloat.
  --stl           enable stl interop. this is the default.
  --no-stl        enable stl interop. the default is --stl.

The amalgamated header file contains all the function declarations and definitions. To use it in the project, #include the header at will in any header or source file in the project, but in one source file, and only in that one source file, #define the macro RYML_SINGLE_HDR_DEFINE_NOW before including the header. This will enable the function definitions. For example:

// foo.h
#include <ryml_all.hpp>

// foo.cpp
// ensure that foo.h is not included before this define!
#define RYML_SINGLE_HDR_DEFINE_NOW
#include <ryml_all.hpp>

If you wish to package the single header into a shared library, then you will need to define the preprocessor symbol RYML_SHARED during compilation.

As a library

The single header file is a good approach to quickly try the library, but if you wish to make good use of CMake and its tooling ecosystem (and get better compile times), then ryml has you covered.

As with any other cmake library, you have the option to integrate ryml into your project's build setup, thereby building ryml together with your project, or -- prior to configuring your project -- you can have ryml installed either manually or through package managers.

Currently cmake is required to build ryml; we recommend a recent cmake version, at least 3.13.

Note that ryml uses submodules. Take care to use the --recursive flag when cloning the repo, to ensure ryml's submodules are checked out as well:

git clone --recursive https://github.com/biojppm/rapidyaml

If you omit --recursive, after cloning you will have to do git submodule update --init --recursive to ensure ryml's submodules are checked out.

Quickstart samples

These samples show different ways of getting ryml into your application. All the samples use the same quickstart executable source, but are built in different ways, showing several alternatives to integrate ryml into your project. We also encourage you to refer to the quickstart source itself, which extensively covers most of the functionality that you may want out of ryml.

Each sample brings a run.sh script with the sequence of commands required to successfully build and run the application (this is a bash script and runs in Linux and MacOS, but it is also possible to run in Windows via Git Bash or the WSL). Click on the links below to find out more about each sample:

  • singleheader: ryml is part of the build, brought as a single header file, not as a library. Files: CMakeLists.txt, run.sh.
  • singleheaderlib: ryml is part of the build, brought as a library but from the single header file. Files: CMakeLists.txt, run_shared.sh (shared library), run_static.sh (static library).
  • add_subdirectory: ryml is part of the build. Files: CMakeLists.txt, run.sh.
  • fetch_content: ryml is part of the build. Files: CMakeLists.txt, run.sh.
  • find_package: ryml is not part of the build; it needs a prior install or package. Files: CMakeLists.txt, run.sh.

CMake build settings for ryml

The following cmake variables can be used to control the build behavior of ryml:

  • RYML_WITH_TAB_TOKENS=ON/OFF. Enable/disable support for tabs as valid container tokens after : and -. Defaults to OFF, because this may cost up to 10% in processing time.
  • RYML_DEFAULT_CALLBACKS=ON/OFF. Enable/disable ryml's default implementation of error and allocation callbacks. Defaults to ON.
  • RYML_DEFAULT_CALLBACK_USES_EXCEPTIONS=ON/OFF. Enable/disable the same-named macro, which makes the default error handler provided by ryml throw a std::runtime_error exception.
  • RYML_USE_ASSERT. Enable assertions in the code regardless of build type. This is disabled by default. Failed assertions trigger a call to the error callback.
  • RYML_STANDALONE=ON/OFF. ryml uses c4core, a C++ library with low-level multi-platform utilities for C++. When RYML_STANDALONE=ON, c4core is incorporated into ryml as if it is the same library. Defaults to ON.

If you're developing ryml or just debugging problems with ryml itself, the following cmake variables can be helpful:

  • RYML_DEV=ON/OFF: a bool variable which enables development targets such as unit tests, benchmarks, etc. Defaults to OFF.
  • RYML_DBG=ON/OFF: a bool variable which enables verbose prints from parsing code; can be useful to figure out parsing problems. Defaults to OFF.

Forcing ryml to use a different c4core version

ryml is strongly coupled to c4core, and this is reinforced by the fact that c4core is a submodule of the current repo. However, it is still possible to use a c4core version different from the one in the repo (of course, only if there are no incompatibilities between the versions). You can find out how to achieve this by looking at the custom_c4core sample.


Other languages

One of the aims of ryml is to provide an efficient YAML API for other languages. JavaScript is fully available, and there is already a cursory implementation for Python using only the low-level API. After ironing out the general approach, other languages are likely to follow (all of this is possible because we're using SWIG, which makes it easy to do so).

JavaScript

A JavaScript+WebAssembly port is available, compiled through emscripten.

Python

(Note that this is a work in progress. Additions will be made and things will be changed.) With that said, here's an example of the Python API:

import ryml

# ryml cannot accept strings because it does not take ownership of the
# source buffer; only bytes or bytearrays are accepted.
src = b"{HELLO: a, foo: b, bar: c, baz: d, seq: [0, 1, 2, 3]}"

def check(tree):
    # for now, only the index-based low-level API is implemented
    assert tree.size() == 10
    assert tree.root_id() == 0
    assert tree.first_child(0) == 1
    assert tree.next_sibling(1) == 2
    assert tree.first_sibling(5) == 1
    assert tree.last_sibling(1) == 5
    # use bytes objects for queries
    assert tree.find_child(0, b"foo") == 2
    assert tree.key(2) == b"foo"
    assert tree.val(2) == b"b"
    assert tree.find_child(0, b"seq") == 5
    assert tree.is_seq(5)
    # to loop over children:
    for i, ch in enumerate(ryml.children(tree, 5)):
        assert tree.val(ch) == [b"0", b"1", b"2", b"3"][i]
    # to loop over siblings:
    for i, sib in enumerate(ryml.siblings(tree, 5)):
        assert tree.key(sib) == [b"HELLO", b"foo", b"bar", b"baz", b"seq"][i]
    # to walk over all elements
    visited = [False] * tree.size()
    for n, indentation_level in ryml.walk(tree):
        # just a dumb emitter
        left = "  " * indentation_level
        if tree.is_keyval(n):
            print("{}{}: {}".format(left, tree.key(n), tree.val(n)))
        elif tree.is_val(n):
            print("{}- {}".format(left, tree.val(n)))
        elif tree.is_keyseq(n):
            print("{}{}:".format(left, tree.key(n)))
        visited[n] = True
    assert False not in visited
    # NOTE about encoding!
    k = tree.get_key(5)
    print(k)  # '<memory at 0x7f80d5b93f48>'
    assert k == b"seq"               # ok, as expected
    assert k != "seq"                # not ok - NOTE THIS! 
    assert str(k) != "seq"           # not ok
    assert str(k, "utf8") == "seq"   # ok again

# parse immutable buffer
tree = ryml.parse_in_arena(src)
check(tree) # OK

# parse mutable buffer.
# requires bytearrays or objects offering writeable memory
mutable = bytearray(src)
tree = ryml.parse_in_place(mutable)
check(tree) # OK

As expected, the performance results so far are encouraging. In a timeit benchmark against PyYaml and ruamel.yaml, ryml parses generally 100x faster, and up to 400x faster:

+----------------------------------------+-------+----------+----------+-----------+
| style_seqs_blck_outer1000_inner100.yml | count | time(ms) | avg(ms)  | avg(MB/s) |
+----------------------------------------+-------+----------+----------+-----------+
| parse:RuamelYamlParse                  |     1 | 4564.812 | 4564.812 |     0.173 |
| parse:PyYamlParse                      |     1 | 2815.426 | 2815.426 |     0.280 |
| parse:RymlParseInArena                 |    38 |  588.024 |   15.474 |    50.988 |
| parse:RymlParseInArenaReuse            |    38 |  466.997 |   12.289 |    64.202 |
| parse:RymlParseInPlace                 |    38 |  579.770 |   15.257 |    51.714 |
| parse:RymlParseInPlaceReuse            |    38 |  462.932 |   12.182 |    64.765 |
+----------------------------------------+-------+----------+----------+-----------+

(Note that the parse timings above are somewhat biased towards ryml, because it does not perform any type conversions in Python-land: return types are merely memoryviews to the source buffer, possibly copied to the tree's arena).

As for emitting, the improvement can be as high as 3000x:

+----------------------------------------+-------+-----------+-----------+-----------+
| style_maps_blck_outer1000_inner100.yml | count |  time(ms) |  avg(ms)  | avg(MB/s) |
+----------------------------------------+-------+-----------+-----------+-----------+
| emit_yaml:RuamelYamlEmit               |     1 | 18149.288 | 18149.288 |     0.054 |
| emit_yaml:PyYamlEmit                   |     1 |  2683.380 |  2683.380 |     0.365 |
| emit_yaml:RymlEmitToNewBuffer          |    88 |   861.726 |     9.792 |    99.976 |
| emit_yaml:RymlEmitReuse                |    88 |   437.931 |     4.976 |   196.725 |
+----------------------------------------+-------+-----------+-----------+-----------+

YAML standard conformance

ryml is feature complete with regard to the YAML specification. All the YAML features are well covered in the unit tests, and are expected to work, except for the limitations noted below.

Of course, there are many dark corners in YAML, and there certainly can appear cases which ryml fails to parse. Your bug reports or pull requests are very welcome.

See also the roadmap for a list of future work.

Known limitations

ryml deliberately makes no effort to follow the standard in the following situations:

  • Containers are not accepted as mapping keys: keys must be scalars.
  • Tab characters after : and - are not accepted tokens, unless ryml is compiled with the macro RYML_WITH_TAB_TOKENS. This requirement exists because checking for tabs introduces branching into the parser's hot code and in some cases costs as much as 10% in parsing time.
  • Anchor names must not end with a terminating colon: eg &anchor: key: val.
  • Non-unique map keys are allowed. Enforcing key uniqueness in the parser or in the tree would cause log-linear parsing complexity (for root children on a mostly flat tree), and would increase code size through added structural, logical and cyclomatic complexity. So enforcing uniqueness in the parser would hurt users who may not care about it (they may not care either because non-uniqueness is OK for their use case, or because it is impossible to occur). On the other hand, any user who requires uniqueness can easily enforce it by doing a post-parse walk through the tree. So choosing to not enforce key uniqueness adheres to the spirit of "don't pay for what you don't use".
  • %YAML directives have no effect and are ignored.
  • %TAG directives are limited to a default maximum of 4 instances per Tree. To increase this maximum, define the preprocessor symbol RYML_MAX_TAG_DIRECTIVES to a suitable value. This arbitrary limit reflects the usual practice of having at most 1 or 2 tag directives; also, be aware that this feature is under consideration for removal in YAML 1.3.
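As noted above, users who need unique keys can enforce them with a post-parse walk. The following sketch runs over a hypothetical flat array of nodes (KeyNode is not ryml's Tree type) and tracks, per parent, the keys seen so far in a hash set, so the check costs expected linear time and is only paid by those who call it:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical minimal node: only what the uniqueness check needs.
struct KeyNode {
    std::size_t parent;     // index of the parent node (ignored for the root)
    bool has_key;           // false for the root and for sequence elements
    std::string_view key;
};

// Returns the index of the first node whose key repeats a sibling's key,
// or nodes.size() if all keys are unique within their parent.
inline std::size_t find_duplicate_key(const std::vector<KeyNode>& nodes) {
    std::unordered_map<std::size_t, std::unordered_set<std::string_view>> seen;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        if (!nodes[i].has_key)
            continue;
        assert(nodes[i].parent < nodes.size());
        // insert() reports false in .second when the key was already present
        if (!seen[nodes[i].parent].insert(nodes[i].key).second)
            return i;
    }
    return nodes.size();
}
```

Keys are only compared among siblings, so `{a: 1, b: {a: 2}}` passes while `{a: 1, a: 3}` is flagged.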

Also, where the YAML standard dictates there should be an error, ryml tends to be on the permissive side and will often tolerate the input. This may be good or bad, but in any case it is being improved on (meaning ryml will grow progressively less tolerant of YAML errors in the coming releases). So we strongly suggest staying away from those dark corners of YAML which are generally a source of problems, which is good practice anyway.

If you do run into trouble and would like to investigate conformance of your YAML code, beware of existing online YAML linters, many of which are not fully conformant; instead, try using https://play.yaml.io, an amazing tool which lets you dynamically input your YAML and continuously see the results from all the existing parsers (kudos to @ingydotnet and the people from the YAML test suite). And of course, if you detect anything wrong with ryml, please open an issue so that we can improve.

Test suite status

As part of its CI testing, ryml uses the YAML test suite. This is an extensive set of reference cases covering the full YAML spec. Each of these cases has several subparts:

  • in-yaml: mildly, plainly or extremely difficult-to-parse YAML
  • in-json: equivalent JSON (where possible/meaningful)
  • out-yaml: equivalent standard YAML
  • emit-yaml: equivalent standard YAML
  • events: reference results (ie, expected tree)

When testing, ryml parses each of the 4 yaml/json parts, then emits the parsed tree, then parses the emitted result and verifies that emission is idempotent, ie that the emitted result is semantically the same as its input without any loss of information. To ensure consistency, this happens over four levels of parse/emission pairs. To ensure correctness, each of the stages is compared against the events spec from the test, which constitutes the reference. The tests also check for equality between the reference events in the test case and the events emitted by ryml from the data tree parsed from the test case input. All of this is then carried out combining several variations (unix \n vs windows \r\n line endings; emitting to string, file or streams), which results in ~250 tests per case part. With multiple parts per case and ~400 reference cases in the test suite, this amounts to several hundred thousand individual tests to which ryml is subjected. These come on top of ryml's own unit tests, which employ the same extensive combinatorial approach.
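Taking the figures in the paragraph above at face value, a rough upper bound for the suite-driven test count (assuming every case had all four parts, which is not always so) is:

```latex
N_{\text{tests}} \lesssim \underbrace{250}_{\text{variations per part}} \times \underbrace{4}_{\text{parts per case}} \times \underbrace{400}_{\text{cases}} = 400\,000
```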

Also, note that, in their own words, the tests from the YAML test suite contain a lot of edge cases that don't play such an important role in real world examples. And yet, despite the extreme focus of the test suite, ryml currently fails only a minor fraction of the test cases, mostly related to the deliberate limitations noted above. Other than those limitations, by far the main issue with ryml is that several standard-mandated parse errors fail to materialize. For the up-to-date list of ryml failures in the test suite, refer to the list of known exceptions from ryml's test suite runner, which is used as part of ryml's CI process.


Alternative libraries

Why this library? Because none of the existing libraries was quite what I wanted. When I started this project in 2018, I was aware of these two alternative C/C++ libraries:

  • libyaml. This is a bare C library. It does not create a representation of the data tree, so I don't see it as practical. My initial idea was to wrap parsing and emitting around libyaml's convenient event handling, but to my surprise I found out it makes heavy use of allocations and string duplications when parsing. I briefly pondered sending PRs to reduce these allocation needs, but not having a permanent tree to store the parsed data was too much of a downside.
  • yaml-cpp. This library may be full of functionality, but is heavy on the use of node-pointer-based structures like std::map, allocations, string copies, polymorphism and slow C++ stream serializations. This is generally a sure way of making your code slower, and strong evidence of this can be seen in the benchmark results above.

Recently libfyaml appeared. This is a newer C library, fully conformant to the YAML standard with an amazing 100% success in the test suite; it also offers the tree as a data structure. As a downside, it does not work on Windows, and it is also several times slower at parsing and emitting.

When performance and low latency are important, it is of central importance to use contiguous structures (for better cache behavior, and to prevent the library from trampling the caches), to parse in place, and to use non-owning strings. Hence this Rapid YAML library which, with minimal compromise, bridges the gap between efficiency and usability. This library takes inspiration from RapidJSON and RapidXML.


License

ryml is permissively licensed under the MIT license.

rapidyaml's People

Contributors

1099255210, aviktorov, biojppm, captain-yoshi, cburgard, costashatz, dancingbug, dmachaj, ericcitaire, gei0r, jppm, leoetlino, lgtm-migrator, litghost, mboutet, mbs-c, mithro, musicinmybrain, neko-box-coder, pig208, pixelparas, simu, tbleher, whiteabelincoln, xtvaser


rapidyaml's Issues

Parse error on very simple json document

As I understand it, YAML is a superset of JSON, but rapidyaml throws an error when trying to parse this very simple document:

#include <ryml.hpp>
int main() {
    ryml::Tree t = ryml::parse("{\"a\":\"b\"}");
}

Output:

ERROR parsing yml: parse error
line 1: '{"a":"b"}' (sz=9)
             ^~~~~  (cols 5-10)
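One plausible culprit is that `"a":"b"` has no space after the colon, and the parser did not accept that form. This is an assumption; the issue itself does not confirm it. The `preprocess_json()` function that appears in a later issue in this list is applied to JSON input of this kind. The following self-contained sketch illustrates the general idea and is not ryml's implementation. The hypothetical `json_to_yaml_spacing()` copies the input and inserts a space after every `:` or `,` that lies outside a double-quoted string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not ryml's API): returns a copy of `json` with a space
// inserted after every ':' and ',' that appears outside a double-quoted string,
// so that flow mappings like {"a":"b"} become {"a": "b"}.
std::string json_to_yaml_spacing(const std::string& json) {
    std::string out;
    out.reserve(json.size() + json.size() / 4);
    bool in_string = false;
    bool escaped = false;
    for (std::size_t i = 0; i < json.size(); ++i) {
        char c = json[i];
        out.push_back(c);
        if (in_string) {
            if (escaped) escaped = false;          // this char was escaped
            else if (c == '\\') escaped = true;    // next char is escaped
            else if (c == '"') in_string = false;  // closing quote
        } else if (c == '"') {
            in_string = true;                       // opening quote
        } else if ((c == ':' || c == ',') && i + 1 < json.size() && json[i + 1] != ' ') {
            out.push_back(' ');                     // separator needs a following space
        }
    }
    return out;
}
```

Colons inside quoted keys such as `"k:1"` are left untouched, because the scan tracks whether it is inside a string.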

error in README example

The example tree emission in the README doesn't currently work (error message attached as error.txt). The offending line is:
std::string tmp = "child5"; s.append_child() << tmp;

It boils down to a missing to_chars() overload for std::string. It is possible to work around this by using
std::string tmp = "child5"; s.append_child() << tmp.c_str();
in the example, but that is not very nice. Tested with gcc 6.2.0.

ryml accepts setting a VAL node to a MAP node without removing the val or triggering an error

meas:
  createParameterList:
    - Lumi
      value: 1
      relErr: 0.1
    - mu
    - alpha_syst1
    - mu
      value: 1
      low: 0
      high: 3
      const: 0
ERROR parsing yml: parse error - indentation should not increase at this point
line 4: '      value: 1' (sz=14)
         ^~~~~~~~~~~~~~  (cols 1-15)

The document was written with rapidyaml, so irrespective of whether it's valid YAML or not, I'd hope it would at least be able to parse back what it emitted :-)

Originally posted by @cburgard in #28 (comment)

Fails to parse plain flow (unquoted) scalars

I tried to parse my file, but ryml throws:

ERROR parsing yml: parse error - indentation should not increase at this point
line 996

I am not sure if the file respects the YAML specification.
I tried yaml-cpp too and it did work.
yaml-cpp took 52 seconds with 8% CPU usage on an i9 9900K.

You can download / check:

https://developers.eveonline.com/resource/resources

Download the file "sde-TRANQUILITY.zip" or use this link:

https://eve-static-data-export.s3-eu-west-1.amazonaws.com/tranquility/sde.zip

Then try to parse sde/fsd/typeIDs.yaml.

src/c4/yml/detail/atof.c

src/c4/yml/detail/atof.c declares double atof(char* p) but it's guarded by #ifdef FLOATINGPT and that doesn't seem to be defined anywhere. src/c4/yml/detail/strtod.hpp also defines atof. Is atof.c junk?

span / sub naming is confusing

I've been trying to introduce other people to the API, and the span concept was a bit confusing at first, but not too bad. The renamed sub and csub though seem to be a regression.

While from our POV they might seem obvious, the (admittedly anecdotal) first question I got was "sub what?" - is it sub as in below something? Subscription? Even the C++ STL didn't go shorter than substr.

Looking at the STL for inspiration, why can't spans be replaced with std::string_view if C++17 is an acceptable target, or with a custom ryml::string_view if compatibility with earlier C++ standards is required? The name is a lot more obvious, not too verbose, and we have using declarations now. I think this would make it much easier for someone new to the API to understand what a piece of code is doing without having to read comments or headers, which is always A Good Thing.

Example Error

Can't get the example from the readme to work.

Expected:
Regular output, no errors.

Got:
Some functions no longer exist, so I modified my code a little:

#include <ryml.hpp>

// not needed by ryml, just for these examples (and below)
#include <iostream>
#include <cassert>

// convenience functions to print a node
void show_keyval(ryml::NodeRef n)
{
    assert(n.has_key());
    assert(n.has_val());
    std::cout << n.key() << ": " << n.val() << "\n";
}
void show_val(ryml::NodeRef n)
{
//    std::cout << typeid(n).name() << std::endl;
    assert(n.has_val());
    std::cout << n.val() << "\n";
}

int main()
{
    // ryml can parse in situ (and read-only buffers too):
    char src[] = "{foo: 1, bar: [a: 2, b: 3]}";
    c4::substr srcview = src; // a mutable view to the source buffer
    // there are also overloads for reusing the tree and parser
    ryml::Tree tree = ryml::parse(srcview);

    // get a reference to the "foo" node
    ryml::NodeRef node = tree["foo"];

    show_keyval(node);  // "foo: 1"
//    std::cout << tree["bar"][0];
    show_val(tree["bar"][0]);  // "2"
    show_val(tree["bar"][1]);  // "3"

    // deserializing:
    int foo;
    node >> foo; // now foo == 1
}

I get an error in show_val(tree["bar"][0]); // "2"
yml_tests: /home/user/CLionProjects/yml_tests/main.cpp:17: void show_val(c4::yml::NodeRef): Assertion `n.has_val()' failed.

If I try show_val(node["bar"][0]); // "2" (as in the example), the error is:
ERROR: ERROR: expected true: _p(node)->is_map()

parse error

The test was committed in 8416731.

Here's the log:

---------------
videos:
  - UQxRibHKEDI:
    - UQxRibHKEDI.640x210.mp4
    - UQxRibHKEDI.1280x418.mp4
    - UQxRibHKEDI.1920x628.mp4
    - UQxRibHKEDI.2560x838.mp4
    - UQxRibHKEDI.3840x1256.webm
  - DcYsg8VFdC0:
    - DcYsg8VFdC0.640x164.mp4
    - DcYsg8VFdC0.1280x326.mp4
    - DcYsg8VFdC0.1920x490.mp4
    - DcYsg8VFdC0.2560x652.mp4
    - DcYsg8VFdC0.3840x978.webm
  - Yt3ymqZXzLY:
    - Yt3ymqZXzLY.640x118.mp4
    - Yt3ymqZXzLY.1280x236.mp4
    - Yt3ymqZXzLY.1920x354.mp4
    - Yt3ymqZXzLY.3840x706.webm
---------------

-----------
parse.cpp:
line 1: 'videos:' (sz=7)
         ^~~~~~~  (cols 1-8)
top state: RTOP|RUNK
parse.cpp:227: handle_unk
parse.cpp:238: got a scalar
parse.cpp:1450: line[1] (8 cols) progressed by 6:  col 1 --> 7   offset 0 --> 6
parse.cpp:1382: scalar was 'videos'
parse.cpp:1763: state[0]: storing scalar 'videos' (flag: 0) (old scalar='')
parse.cpp:2509: state[0]: adding flags SSCL: before=RTOP|RUNK after=RTOP|RUNK|SSCL

-----------
parse.cpp:
line 1: 'videos:' (sz=7)
               ^  (cols 7-8)
top state: RTOP|RUNK|SSCL
parse.cpp:227: handle_unk
parse.cpp:301: there's a stored scalar: 'videos'
parse.cpp:340: got a ':' -- it's a map (as_child=0)
parse.cpp:1580: start_map (as child=0)
parse.cpp:2522: state[0]: adding flags RMAP|RVAL / removing flags RUNK|RKEY: before=RTOP|RUNK|SSCL after=RTOP|RMAP|RVAL|SSCL
parse.cpp:1607: start_map: id=0
parse.cpp:1450: line[1] (8 cols) progressed by 1:  col 7 --> 8   offset 6 --> 7
parse.cpp:1459: line[1] (8 cols) ended! offset 7 --> 8

-----------
parse.cpp:
line 2: '  - UQxRibHKEDI:' (sz=16)
         ^~~~~~~~~~~~~~~~  (cols 1-17)
top state: RTOP|RMAP|RVAL|SSCL
parse.cpp:844: handle_map_impl: node_id=0  level=0
parse.cpp:1874: larger indentation (2 > 0)!!!
parse.cpp:2522: state[0]: adding flags RKEY / removing flags RVAL: before=RTOP|RMAP|RVAL|SSCL after=RTOP|RMAP|RKEY|SSCL
parse.cpp:1535: start_unk
parse.cpp:1479: pushing level! currnode=0  currlevel=0
parse.cpp:2497: state[1]: setting flags to RUNK: before=RTOP|RMAP|RKEY|SSCL
parse.cpp:1498: pushing level: now, currlevel=1
parse.cpp:1787: moving scalar 'videos' from state[0] to state[1] (overwriting 'videos')
parse.cpp:2509: state[1]: adding flags SSCL: before=RUNK after=RUNK|SSCL
parse.cpp:2535: state[0]: removing flags SSCL: before=RTOP|RMAP|RKEY|SSCL after=RTOP|RMAP|RKEY
parse.cpp:1450: line[2] (17 cols) progressed by 2:  col 1 --> 3   offset 8 --> 10
parse.cpp:1473: state[1]: saving indentation: 2

-----------
parse.cpp:
line 2: '  - UQxRibHKEDI:' (sz=16)
           ^~~~~~~~~~~~~~  (cols 3-17)
top state: RUNK|SSCL
parse.cpp:227: handle_unk
parse.cpp:254: it's a seq (as_child=1)
parse.cpp:1479: pushing level! currnode=-1  currlevel=1
parse.cpp:1483: pushing level! actually no, current node is null
parse.cpp:1633: start_seq (as child=1)
parse.cpp:2522: state[1]: adding flags RSEQ|RVAL / removing flags RUNK: before=RUNK|SSCL after=RSEQ|RVAL|SSCL
parse.cpp:1771: state[1]: consuming scalar 'videos' (flag: 512))
parse.cpp:2535: state[1]: removing flags SSCL: before=RSEQ|RVAL|SSCL after=RSEQ|RVAL
parse.cpp:1647: start_seq: id=1 name='videos'
parse.cpp:1473: state[1]: saving indentation: 2
parse.cpp:1450: line[2] (17 cols) progressed by 2:  col 3 --> 5   offset 10 --> 12

-----------
parse.cpp:
line 2: '  - UQxRibHKEDI:' (sz=16)
             ^~~~~~~~~~~~  (cols 5-17)
top state: RSEQ|RVAL
parse.cpp:481: handle_seq_impl: node_id=1 level=1
parse.cpp:547: it's a scalar
parse.cpp:1282: RSEQ|RVAL
parse.cpp:1450: line[2] (17 cols) progressed by 11:  col 5 --> 16   offset 12 --> 23
parse.cpp:1382: scalar was 'UQxRibHKEDI'
parse.cpp:552: actually, the scalar is the first key of a map, and it opens a new scope
parse.cpp:2522: state[1]: adding flags RNXT / removing flags RVAL: before=RSEQ|RVAL after=RSEQ|RNXT
parse.cpp:1479: pushing level! currnode=1  currlevel=1
parse.cpp:2497: state[2]: setting flags to RUNK: before=RSEQ|RNXT
parse.cpp:1498: pushing level: now, currlevel=2
parse.cpp:1580: start_map (as child=1)
parse.cpp:2522: state[2]: adding flags RMAP|RVAL / removing flags RUNK|RKEY: before=RUNK after=RMAP|RVAL
parse.cpp:1598: start_map: id=2
parse.cpp:1763: state[2]: storing scalar 'UQxRibHKEDI' (flag: 0) (old scalar='')
parse.cpp:2509: state[2]: adding flags SSCL: before=RMAP|RVAL after=RMAP|RVAL|SSCL
parse.cpp:1473: state[2]: saving indentation: 4
parse.cpp:2522: state[2]: adding flags RVAL / removing flags RKEY: before=RMAP|RVAL|SSCL after=RMAP|RVAL|SSCL
parse.cpp:1450: line[2] (17 cols) progressed by 1:  col 16 --> 17   offset 23 --> 24
parse.cpp:1459: line[2] (17 cols) ended! offset 24 --> 25

-----------
parse.cpp:
line 3: '    - UQxRibHKEDI.640x210.mp4' (sz=29)
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~  (cols 1-30)
top state: RMAP|RVAL|SSCL
parse.cpp:844: handle_map_impl: node_id=2  level=2
parse.cpp:1771: state[2]: consuming scalar 'UQxRibHKEDI' (flag: 512))
parse.cpp:2535: state[2]: removing flags SSCL: before=RMAP|RVAL|SSCL after=RMAP|RVAL
parse.cpp:1724: append keyval: 'UQxRibHKEDI' '' to parent id=2 (level=2)
parse.cpp:1727: append keyval: id=3 name='UQxRibHKEDI' val=''
parse.cpp:2522: state[2]: adding flags RKEY / removing flags RVAL: before=RMAP|RVAL after=RMAP|RKEY
parse.cpp:1450: line[3] (30 cols) progressed by 4:  col 1 --> 5   offset 25 --> 29

-----------
parse.cpp:
line 3: '    - UQxRibHKEDI.640x210.mp4' (sz=29)
             ^~~~~~~~~~~~~~~~~~~~~~~~~  (cols 5-30)
top state: RMAP|RKEY
parse.cpp:844: handle_map_impl: node_id=2  level=2
parse.cpp:1218: .... tag was empty

parse.cpp:
line 3: '    - UQxRibHKEDI.640x210.mp4' (sz=29)
             ^~~~~~~~~~~~~~~~~~~~~~~~~  (cols 5-30)
top state: RMAP|RKEY

Clarification for child access complexity

Hi, I need some clarification about the complexity of child lookups (O(1) vs O(n)).

The README says:

the complexity of operator[] is linear on the number of children of the node on which it is invoked.
[...]
with a node index, a lookup is O(1)

So which one is it?

On a related note, Tree::lookup_path() returns a size_t target. How can I efficiently get the looked up node?
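The two statements can be reconciled under one assumption: children are stored as a linked list of sibling indices inside the flat node array, as in the hypothetical model below, which is not ryml's code. Under that model, finding a child by key means walking the siblings, which is linear in the number of children. Jumping to a node whose index you already hold is a plain array access, which is O(1). If the `size_t` returned by `Tree::lookup_path()` is such a node index, it could be kept and reused to reach the node directly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

const std::size_t NONE = static_cast<std::size_t>(-1);

// Hypothetical flat node: children form a singly linked list of indices.
struct Node {
    std::string key;
    std::size_t first_child;
    std::size_t next_sibling;
};

// O(number of children): walks the sibling list comparing keys.
std::size_t find_child(const std::vector<Node>& nodes, std::size_t parent, const std::string& key) {
    for (std::size_t c = nodes[parent].first_child; c != NONE; c = nodes[c].next_sibling) {
        if (nodes[c].key == key) return c;
    }
    return NONE;
}

// O(1): a node index is just a position in the array.
const Node& node_at(const std::vector<Node>& nodes, std::size_t id) {
    return nodes[id];
}
```

Looking up the same key repeatedly therefore pays the linear walk each time. Caching the returned index avoids that cost.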

Header include order with ryml_std.hpp

From #52 (comment):

Generally you have to include the std header before any other headers that use functions from it. This approach has served me well every time. For example, see this example or this other example.

But I am confused by your report, which goes against what I just put above. Can you confirm?

What you said above makes sense and that is also what I expected. However, I can confirm that I get compiler errors with the following file:

#include <ryml.hpp>
#include <ryml_std.hpp>
#include <c4/yml/preprocess.hpp>

int main()
{
    std::string in = R"({"a":"b","c":null,"d":"e"})";
    std::string yaml = ryml::preprocess_json<std::string>(c4::to_csubstr(in));
}

Error:

[ 50%] Building CXX object CMakeFiles/ryml_ausprobieren.dir/main2.cpp.obj

In file included from D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/yml.hpp:8,
                 from D:/User/Adrian/Programmierung/ryml/install/include/ryml.hpp:4,
                 from D:/User/Adrian/Programmierung/ryml/ausprobieren/main2.cpp:1:
D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/preprocess.hpp: In instantiation of 'c4::substr c4::yml::detail::preprocess_into_container(c4::csubstr, CharContainer*) [with size_t (* PP)(c4::csubstr, c4::substr) = c4::yml::preprocess_json; CharContainer = std::__cxx11::basic_string<char>; c4::substr = c4::basic_substring<char>; c4::csubstr = c4::basic_substring<const char>]':
D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/preprocess.hpp:77:62:   required from 'c4::substr c4::yml::preprocess_json(c4::csubstr, CharContainer*) [with CharContainer = std::__cxx11::basic_string<char>; c4::substr = c4::basic_substring<char>; c4::csubstr = c4::basic_substring<const char>]'
D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/preprocess.hpp:86:20:   required from 'CharContainer c4::yml::preprocess_json(c4::csubstr) [with CharContainer = std::__cxx11::basic_string<char>; c4::csubstr = c4::basic_substring<const char>]'
D:/User/Adrian/Programmierung/ryml/ausprobieren/main2.cpp:8:77:   required from here
D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/preprocess.hpp:33:36: error: no matching function for call to 'to_substr(std::__cxx11::basic_string<char>&)'
   33 |     size_t sz = PP(input, to_substr(*out));
      |                           ~~~~~~~~~^~~~~~
In file included from D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/common.hpp:5,
                 from D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/tree.hpp:6,
                 from D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/yml.hpp:4,
                 from D:/User/Adrian/Programmierung/ryml/install/include/ryml.hpp:4,
                 from D:/User/Adrian/Programmierung/ryml/ausprobieren/main2.cpp:1:
D:/User/Adrian/Programmierung/ryml/install/include/c4/substr.hpp:1464:15: note: candidate: 'c4::substr c4::to_substr(char*)'
 1464 | inline substr to_substr(char *s)
      |               ^~~~~~~~~
D:/User/Adrian/Programmierung/ryml/install/include/c4/substr.hpp:1464:31: note:   no known conversion for argument 1 from 'std::__cxx11::basic_string<char>' to 'char*'
 1464 | inline substr to_substr(char *s)
      |                         ~~~~~~^
D:/User/Adrian/Programmierung/ryml/install/include/c4/substr.hpp:1513:15: note: candidate: 'c4::substr c4::to_substr(c4::substr)'
 1513 | inline substr to_substr(substr s)
      |               ^~~~~~~~~
D:/User/Adrian/Programmierung/ryml/install/include/c4/substr.hpp:1513:32: note:   no known conversion for argument 1 from 'std::__cxx11::basic_string<char>' to 'c4::substr' {aka 'c4::basic_substring<char>'}
 1513 | inline substr to_substr(substr s)
      |                         ~~~~~~~^
In file included from D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/yml.hpp:8,
                 from D:/User/Adrian/Programmierung/ryml/install/include/ryml.hpp:4,
                 from D:/User/Adrian/Programmierung/ryml/ausprobieren/main2.cpp:1:
D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/preprocess.hpp:39:33: error: no matching function for call to 'to_substr(std::__cxx11::basic_string<char>&)'
   39 |         sz = PP(input, to_substr(*out));
      |                        ~~~~~~~~~^~~~~~
In file included from D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/common.hpp:5,
                 from D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/tree.hpp:6,
                 from D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/yml.hpp:4,
                 from D:/User/Adrian/Programmierung/ryml/install/include/ryml.hpp:4,
                 from D:/User/Adrian/Programmierung/ryml/ausprobieren/main2.cpp:1:
D:/User/Adrian/Programmierung/ryml/install/include/c4/substr.hpp:1464:15: note: candidate: 'c4::substr c4::to_substr(char*)'
 1464 | inline substr to_substr(char *s)
      |               ^~~~~~~~~
D:/User/Adrian/Programmierung/ryml/install/include/c4/substr.hpp:1464:31: note:   no known conversion for argument 1 from 'std::__cxx11::basic_string<char>' to 'char*'
 1464 | inline substr to_substr(char *s)
      |                         ~~~~~~^
D:/User/Adrian/Programmierung/ryml/install/include/c4/substr.hpp:1513:15: note: candidate: 'c4::substr c4::to_substr(c4::substr)'
 1513 | inline substr to_substr(substr s)
      |               ^~~~~~~~~
D:/User/Adrian/Programmierung/ryml/install/include/c4/substr.hpp:1513:32: note:   no known conversion for argument 1 from 'std::__cxx11::basic_string<char>' to 'c4::substr' {aka 'c4::basic_substring<char>'}
 1513 | inline substr to_substr(substr s)
      |                         ~~~~~~~^
In file included from D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/yml.hpp:8,
                 from D:/User/Adrian/Programmierung/ryml/install/include/ryml.hpp:4,
                 from D:/User/Adrian/Programmierung/ryml/ausprobieren/main2.cpp:1:
D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/preprocess.hpp:41:21: error: no matching function for call to 'to_substr(std::__cxx11::basic_string<char>&)'
   41 |     return to_substr(*out).first(sz);
      |            ~~~~~~~~~^~~~~~
In file included from D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/common.hpp:5,
                 from D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/tree.hpp:6,
                 from D:/User/Adrian/Programmierung/ryml/install/include/c4/yml/yml.hpp:4,
                 from D:/User/Adrian/Programmierung/ryml/install/include/ryml.hpp:4,
                 from D:/User/Adrian/Programmierung/ryml/ausprobieren/main2.cpp:1:
D:/User/Adrian/Programmierung/ryml/install/include/c4/substr.hpp:1464:15: note: candidate: 'c4::substr c4::to_substr(char*)'
 1464 | inline substr to_substr(char *s)
      |               ^~~~~~~~~
D:/User/Adrian/Programmierung/ryml/install/include/c4/substr.hpp:1464:31: note:   no known conversion for argument 1 from 'std::__cxx11::basic_string<char>' to 'char*'
 1464 | inline substr to_substr(char *s)
      |                         ~~~~~~^
D:/User/Adrian/Programmierung/ryml/install/include/c4/substr.hpp:1513:15: note: candidate: 'c4::substr c4::to_substr(c4::substr)'
 1513 | inline substr to_substr(substr s)
      |               ^~~~~~~~~
D:/User/Adrian/Programmierung/ryml/install/include/c4/substr.hpp:1513:32: note:   no known conversion for argument 1 from 'std::__cxx11::basic_string<char>' to 'c4::substr' {aka 'c4::basic_substring<char>'}
 1513 | inline substr to_substr(substr s)
      |                         ~~~~~~~^
make[2]: *** [CMakeFiles/ryml_ausprobieren.dir/build.make:83: CMakeFiles/ryml_ausprobieren.dir/main2.cpp.obj] Error 1
make[1]: *** [CMakeFiles/Makefile2:96: CMakeFiles/ryml_ausprobieren.dir/all] Error 2
make: *** [Makefile:104: all] Error 2

`Tree['...'].key().str` returns an incompletely parsed string value

OS: Windows 7 x64
Compiler: MSVC 2017

yaml:

headers:
  xml1: |
    <?xml version='1.0' encoding='UTF-8'?>
    <mydoc>
  xml2: |
    <?xml version='1.0' encoding='UTF-8'?>

footers:
  xml1: |
    </mydoc>
  xml2: ""

schema:
  - xpath: [mydoc/blabla, blabla]
ryml::Tree myyaml = ryml::parse(c4::csubstr{ myyaml_str.c_str(), myyaml_str.length() });
std::string a = myyaml["schema"].key().str;

After parsing, the schema key is not "schema". It also contains all the text after the colon character: schema:\n ...
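The symptom is consistent with the key being a pointer+length view into the source buffer rather than a NUL-terminated string. ryml's csubstr has `str` and `len` members, as used in the snippet above. `std::string a = ....key().str;` passes only the pointer, so the copy runs until the next NUL, which here means the end of the buffer. The self-contained sketch below shows the difference using a hypothetical `CView` stand-in. With ryml, the analogous fix would be to pass both members, for example `std::string(k.str, k.len)`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a non-owning view such as c4::csubstr:
// it points into a larger buffer and is NOT NUL-terminated at `len`.
struct CView {
    const char* str;
    std::size_t len;
};

// Wrong: std::string(const char*) reads until the first '\0',
// so it copies everything from the key to the end of the buffer.
std::string copy_wrong(CView v) { return std::string(v.str); }

// Right: bound the copy by the view's length.
std::string copy_right(CView v) { return std::string(v.str, v.len); }
```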

Emitting styles (and removing extra new lines)

Is it possible to configure the emitting style (block/flow and newlines)?

Currently, I have a bunch of existing YAML documents that look like this:

enemy:
- actors:
  - {name: Enemy_Bokoblin_Junior, value: 4.0} 
  - {name: Enemy_Bokoblin_Middle, value: 16.0} 
  - {name: Enemy_Bokoblin_Senior, value: 32.0} 
  - {name: Enemy_Bokoblin_Dark, value: 48.0} 
  species: BokoblinSeries 
# ...

Nice and readable, with the flow (inline) style being used for simple maps that don't contain nested maps (which is the PyYAML default).

Emitting the same document with ryml currently results in the following:

enemy:
  -
    actors:
      -
        name: Enemy_Bokoblin_Junior
        value: 4
      -
        name: Enemy_Bokoblin_Middle
        value: 16
      -
        name: Enemy_Bokoblin_Senior
        value: 32
      -
        name: Enemy_Bokoblin_Dark
        value: 48
    species: BokoblinSeries

which isn't bad, but it's comparatively harder to read and also a bit too long.

So to get back to the questions:

  • Is it possible to get rid of the newline that is emitted for maps that are sequence items? For some reason, this only affects maps that are sequence items.

  • Does ryml make it possible to set the flow style (block/inline) for maps and sequences? I was unable to find any such functionality in the API but it's possible I've missed something.

Incorrect handling of quoted scalars in some cases

Thanks for fixing #32. I ran some parsing tests and it looks like there are issues with the handling of quoted scalars:

The problematic document is as follows:

-  MessageID: 'MapRegion_HyrulePrairie '

Note that the string contains a trailing whitespace, which is incidentally why it has to be quoted.

Now, rapidyaml appears to be having trouble with such scalars:

>>> parse("-  MessageID:          'MapRegion_HyrulePrairie'").v[0]
Byml(Map({'MessageID': Byml("MapRegion_HyrulePrairie'o\x00\x00\x00r\x00\x00\x00y")}))
# incorrect: uninitialised memory?

>>> parse("-  MessageID:          'MapRegion_HyrulePrairie '").v[0]
Byml(Map({'MessageID': Byml("MapRegion_HyrulePrairie 'FM\x02\x00\x00\x00\x00\x01\x00")}))
# incorrect: uninitialised memory?

>>> parse("-  MessageID: 'MapRegion_HyrulePrairie '").v[0]
Byml(Map({'MessageID': Byml("MapRegion_HyrulePrairie '")}))
# incorrect: for some reason the ' is included in the string

>>> parse("- MessageID: 'MapRegion_HyrulePrairie '").v[0]
Byml(Map({'MessageID': Byml("MapRegion_HyrulePrairie '")}))
# incorrect: same issue

>>> parse("- MessageID: 'MapRegion_HyrulePrairie'").v[0]
Byml(Map({'MessageID': Byml("MapRegion_HyrulePrairie'")}))
# incorrect: still has the trailing quote

>>> parse("- key: true\n MessageID:          'MapRegion_HyrulePrairie '").v[0]
# *segfault*

>>> parse("- key: true\n  MessageID:          'MapRegion_HyrulePrairie '").v[0]
Byml(Map({'MessageID': Byml('MapRegion_HyrulePrairie '), 'key': Byml(True)}))
# the string is parsed correctly in this case

leoetlino@00c665e prevents the issue from manifesting, but obviously that is not the right fix, since it probably breaks multi-line scalars.
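For reference, a YAML single-quoted scalar ends at the first `'` that is not immediately followed by another `'`. A doubled `''` stands for one literal quote, and whitespace inside the quotes is preserved, including the trailing space in `'MapRegion_HyrulePrairie '`. The sketch below applies that rule to a single-line scalar only. `read_single_quoted()` is a hypothetical helper, not ryml's parser, and it deliberately ignores multi-line folding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: `s[pos]` must be the opening '\''. On success, stores the
// unescaped content in `out`, sets `end` to the index just past the closing quote,
// and returns true. Returns false if the closing quote is missing.
bool read_single_quoted(const std::string& s, std::size_t pos, std::string& out, std::size_t& end) {
    if (pos >= s.size() || s[pos] != '\'') return false;
    out.clear();
    for (std::size_t i = pos + 1; i < s.size(); ++i) {
        if (s[i] != '\'') {
            out.push_back(s[i]);          // ordinary character, kept verbatim
        } else if (i + 1 < s.size() && s[i + 1] == '\'') {
            out.push_back('\'');          // '' is an escaped single quote
            ++i;                          // skip the second quote
        } else {
            end = i + 1;                  // closing quote found
            return true;
        }
    }
    return false;                         // unterminated scalar
}
```

The returned content never includes the closing quote, and nothing past `end` is read, so no bytes after the scalar can leak into the value.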

Build system macro complaints

A strict build system complains about macros such as "_C4_foo" because, according to the C++ standard,
"identifiers beginning with an underscore followed immediately by an uppercase letter [are implementation reserved]".

Deserializing "false" leads to NOT IMPLEMENTED error on Windows

Hi, me again 😉

I suspect this is multiple errors in one. On Windows, the following code yields:

ext/c4core/src/c4/error.cpp:168: ERROR: NOT IMPLEMENTED
#include <ryml.hpp>

int main()
{
    char input[] = "a: false";   // this works:  "a: 0"

    ryml::Tree tree = ryml::parse(c4::substr(input));
    ryml::NodeRef a = tree["a"];

    bool result;   // std::string works (also need to include ryml_std.hpp)
    a >> result;
}

The C4_NOT_IMPLEMENTED() call comes from is_debugger_attached(), which has no Windows implementation. But I don't know why that function is triggered in the first place.
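Separately from the debugger check, the report shows that `false` failed when read into a `bool`, while `0` worked. The following is a minimal sketch of a bool reader that accepts both spellings and works on a pointer+length view, like ryml's strings. `read_bool()` is hypothetical and not ryml's API. The accepted words follow the YAML 1.2 core schema (`true`/`True`/`TRUE`, `false`/`False`/`FALSE`), plus `1` and `0`.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical reader (not ryml's API): parses a bool from a non-NUL-terminated
// view [s, s+len). Returns false if the text is not a recognized spelling.
bool read_bool(const char* s, std::size_t len, bool* out) {
    static const char* const truthy[] = {"true", "True", "TRUE", "1"};
    static const char* const falsy[]  = {"false", "False", "FALSE", "0"};
    for (const char* t : truthy) {
        if (std::strlen(t) == len && std::memcmp(s, t, len) == 0) { *out = true; return true; }
    }
    for (const char* f : falsy) {
        if (std::strlen(f) == len && std::memcmp(s, f, len) == 0) { *out = false; return true; }
    }
    return false;
}
```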

convenience helpers for `std::vector`

I've found it a bit cumbersome to read/write std::vectors of things that already support a << and >> operator, hence I came up with these helpers for my own purposes (not very efficient coding on my side, but you get the message):

template<class T> void read(NodeRef const& n, std::vector<T> *v){
  for(size_t i=0; i<n.num_children(); ++i){
    T e;
    n[i]>>e;
    v->push_back(e);
  }
}

template<class T> void write(NodeRef *n, std::vector<T> const& v){
  *n |= c4::yml::SEQ;
  for(auto const& e : v){
    n->append_child() << e;
  }
}

Any chance of adding them to node.hpp, or are there reservations against including vector?

include could not find load file: ..../c4Project.cmake

I am a new CMake user.
I want to build a .lib for Visual Studio 2015 on Windows.
When I run:
cmake -S source_dir -B build_dir -DCMAKE_INSTALL_PREFIX=install_dir

it finally shows this error:
CMake Error at CMakeLists.txt:4 (include):
include could not find load file:
./ext/c4core/cmake/c4Project.cmake

CMake Error at CMakeLists.txt:6 (c4_declare_project):
Unknown CMake command "c4_declare_project".

How do I build and copy c4core so that it can be incorporated into ryml?
I need help, thank you ~

Fails to parse other map keys after a sequence if the sequence items are not indented

First, let me thank you for making this library! I am currently in the process of switching from PyYAML to ryml for performance reasons, and so far the initial results are very encouraging. ryml is typically 20-25x faster for emitting and one file that took 2s to emit with PyYAML or yaml-cpp only takes ~90ms with ryml.

However, parsing existing documents that were emitted by PyYAML (or libyaml) is proving to be a bit more problematic.

Currently any documents that look like this fail to parse with ryml:

enemy: #L1
- actors: #L2
  - {name: Enemy_Bokoblin_Junior, value: 4.0} #L3
  - {name: Enemy_Bokoblin_Middle, value: 16.0} #L4
  - {name: Enemy_Bokoblin_Senior, value: 32.0} #L5
  - {name: Enemy_Bokoblin_Dark, value: 48.0} #L6
  species: BokoblinSeries #L7
# ...
/home/leo/projects/oead/src/lib/rapidyaml/src/c4/yml/parse.cpp:597: ERROR parsing yml: parse error
line 7: '  species: BokoblinSeries' (sz=25)
           ^~~~~~~~~~~~~~~~~~~~~~~  (cols 3-26)
top state: RSEQ|RNXT

If the actors array is indented as follows:

enemy:
- actors:
    - {name: Enemy_Bokoblin_Junior, value: 4.0}
    - {name: Enemy_Bokoblin_Middle, value: 16.0}
    - {name: Enemy_Bokoblin_Senior, value: 32.0}
    - {name: Enemy_Bokoblin_Dark, value: 48.0}
  species: BokoblinSeries

then the file is parsed correctly. It seems that the lack of extra indentation for the array is throwing ryml off and causing it to miss the end of the sequence?

warnings: empty expression statement

Joining @dkeeney:
tons of warnings like "empty expression statement has no effect; remove unnecessary ';' to silence this warning" may be fixed with macros like this:

#define ev_idle_init(ev,cb)  do { ev_init ((ev), (cb)); ev_idle_set ((ev)); } while (0)

Originally posted by @parihaaraka in #12 (comment)
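This warning typically comes from a macro that expands to nothing, so that `MACRO(x);` leaves a lone `;` behind. Wrapping the body in `do { ... } while (0)` turns every use into exactly one statement. That removes the empty statement and also keeps unbraced `if`/`else` chains intact. The sketch below uses hypothetical macro names (`MY_CHECK`, `MY_ENABLE_CHECKS`), not c4core's.

```cpp
#include <cstdio>

#ifdef MY_ENABLE_CHECKS
// Enabled: a body with its own `if` that still behaves like one statement.
#define MY_CHECK(cond) \
    do { if (!(cond)) std::fprintf(stderr, "check failed: %s\n", #cond); } while (0)
#else
// Disabled: expands to a statement with no effect instead of to nothing,
// so `MY_CHECK(x);` does not leave an empty `;` statement behind.
#define MY_CHECK(cond) do { } while (0)
#endif

// Because MY_CHECK(...) is a single statement, it can sit in an unbraced if/else.
int sign(int x) {
    if (x == 0)
        MY_CHECK(x == 0);
    else
        return x > 0 ? 1 : -1;
    return 0;
}
```

If the macro expanded to a bare `{ ... }` block instead, the `;` after it would end the `if`, and the following `else` would fail to compile.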

Parsing can become accidentally quadratic because of sscanf

While investigating a performance issue in a library I am working on, I noticed that parsing a particular document (warning: very large file) causes the program to spend a lot of time in memchr (roughly 70% of its time). Pausing execution in gdb gives the following stack trace: (this is an image to keep the colors)

[Image: gdb stack trace]

This slows things down to the point that parsing is only ~3x faster than doing it with PyYAML, whereas the speedup I get for other files is >10x.

It looks like the problem is that reading floats is done using c4::from_chars -> c4::atof -> sscanf, which ultimately calls _IO_str_init_static_internal in the glibc implementation. As far as I understand, this causes strlen to be called on the string that rapidyaml passes to it, which is not terminated at the end of the float. Doing this once per float makes the total work O(n^2).

Since there are a lot of floats throughout the entire document, parsing appears to become accidentally quadratic.

Not sure how to fix this. Even using the standard atof might not help since that function doesn't take the length either...
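One possible workaround rests on the assumption that numeric tokens are short. Copy the token into a small NUL-terminated local buffer before calling `std::strtod`, so that the C library only ever sees the token and never the rest of the document. The cost per float is then bounded by the token length rather than by the remaining buffer size. `parse_double()` and its 64-byte limit are hypothetical choices, not ryml's fix. Standard libraries that implement the C++17 floating-point `std::from_chars` avoid the copy altogether.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: parses a double from the view [p, p+len) without
// letting the C library scan past the token. Returns false if the token is
// empty, too long, or not entirely consumed by the conversion.
bool parse_double(const char* p, std::size_t len, double* out) {
    char buf[64];
    if (len == 0 || len >= sizeof(buf)) return false;
    std::memcpy(buf, p, len);
    buf[len] = '\0';                       // terminate the copy, not the source
    char* end = nullptr;
    double v = std::strtod(buf, &end);
    if (end != buf + len) return false;    // trailing garbage or no digits
    *out = v;
    return true;
}
```

Note that `std::strtod` honours the current C locale's decimal separator, which the default "C" locale sets to `.`.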

segfault in destructor of Parser

Hi!
Not sure if I'm doing anything stupid, but I get a segfault in the destructor of the parser. It seems to be related to m_stack being NULL, but that's about where my understanding stops.

Compile and run with

g++ -o test test.cpp -DNDEBUG -lryml
./test

test.cpp

#include <sstream>
#include <iostream>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

#include <ryml.hpp>
#include <c4/yml/std/map.hpp>
#include <c4/yml/std/string.hpp>
#include <c4/yml/common.hpp>

size_t count_nlines(c4::csubstr src) {
  // helper function to count the lines
  size_t n = (src.len > 0);
  while(src.len > 0)
  {
    n += (src.begins_with('\n') || src.begins_with('\r'));
    src = src.sub(1);
  }
  return n;
}

c4::yml::Tree makeTree(std::istream& is){
  if(!is.good()) throw std::runtime_error("invalid input!");

  std::cout << "reading string" << std::endl;
  std::string s(std::istreambuf_iterator<char>(is), {});
  auto src = c4::to_csubstr(s.c_str());
  size_t nlines = count_nlines(src);
  std::cout << "creating tree" << std::endl;
  c4::yml::Tree tree(nlines,s.size());
  std::cout << "parsing" << std::endl;
  c4::yml::Parser np;
  np.parse({}, tree.copy_to_arena(src), &tree);
  std::cout << "returning" << std::endl;  
  return tree;
}

int main(){
  std::ifstream is("myfile.json");
  auto t = makeTree(is);
  std::cout << "all good!" << std::endl;
  return 1;
}

myfile.json:

{
    "Gaussian": {
        "class": "RooGaussian",
        "arguments": [
            "x",
            "mean",
            "sigma"
        ]
    },
    "Poisson": {
        "class": "RooPoisson",
        "arguments": [
            "x",
            "mean"
        ]
    },
    "prod": {
        "class": "RooProduct",
        "arguments": [
            "factors"
        ]
    },
    "interpolation0d": {
        "class": "RooStats::HistFactory::FlexibleInterpVar",
        "arguments": [
            "vars",
            "nom",
            "low",
            "high"
        ]
    },
    "sumpdf": {
        "class": "RooRealSumPdf",
        "arguments": [
            "samples",
            "coefficients"
        ]
    },
    "paramhist": {
        "class": "ParamHistFunc",
        "arguments": [
            "observables",
            "parameters"
        ]
    },
    "ARGUS": {
        "class": "RooArgusBG",
        "arguments": [
            "mass",
            "resonance",
            "slope",
            "power"
        ]
    }
}

CMake build fail

The current master of rapidyaml/c4core does not build out-of-the-box.

  add_subdirectory given source "/c4core" which is not an existing directory.
Call Stack (most recent call first):
  ext/c4core/cmake/c4Project.cmake:541 (c4_add_subproj)
  CMakeLists.txt:22 (c4_require_subproject)


-- ryml: -----> target ryml PUBLIC incorporating lib c4core
CMake Error at ext/c4core/cmake/c4Project.cmake:291 (get_target_property):
  get_target_property() called with non-existent target "c4core".
Call Stack (most recent call first):
  ext/c4core/cmake/c4Project.cmake:313 (_c4_get_tgt_prop)
  ext/c4core/cmake/c4Project.cmake:279 (_c4_append_tgt_prop)
  ext/c4core/cmake/c4Project.cmake:1051 (c4_append_target_prop)
  ext/c4core/cmake/c4Project.cmake:1034 (_c4_incorporate_lib)
  ext/c4core/cmake/c4Project.cmake:951 (_c4_link_with_libs)
  ext/c4core/cmake/c4Project.cmake:802 (c4_add_target)
  CMakeLists.txt:25 (c4_add_library)


CMake Error at CMakeLists.txt:101 (c4_add_dev_targets):
  Unknown CMake command "c4_add_dev_targets".

I suppose that there's a problem with the RYML_EXT_DIR variable, which seems to be unset by default. However, even when setting it, I end up with

  Unknown CMake command "c4_add_dev_targets".

Any suggestions?

emit formatting

E.g., #29 or #30. Some thoughts:

  • implement custom EmitFormatters:
    • block (the current emitter)
    • single-line flow
    • multi-line flow
    • semi-intelligent: a general emitter that uses heuristics, such as counting columns, to break lines and to select block or flow style as appropriate (e.g., for nested containers)
  • implement per-node format flags? i.e., add flags forcing format selection?
  • add type-based implicit tagging? e.g., implicitly tag bools so that they are emitted as "true" rather than "1"
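To make the "semi-intelligent" option more concrete, here is a minimal C++ sketch of one possible heuristic. It emits a container in single-line flow style only if all of its children are scalars and the flow rendering fits within a column limit. The Node type, flow_width(), prefer_flow() and the column limit are hypothetical and are not part of ryml. Empty containers are not modeled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal node type (NOT ryml's API): a leaf holds a scalar,
// a sequence holds children. Empty sequences are not modeled here.
struct Node {
    std::string scalar;
    std::vector<Node> children;
    bool is_container() const { return !children.empty(); }
};

// Width of the node when emitted in single-line flow style, e.g. "[a, bb]".
std::size_t flow_width(const Node& n) {
    if (!n.is_container()) return n.scalar.size();
    std::size_t w = 2;                      // the surrounding brackets
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i > 0) w += 2;                  // the ", " separator
        w += flow_width(n.children[i]);
    }
    return w;
}

// Heuristic: prefer flow style only for containers of scalars whose flow
// rendering, starting at column `indent`, fits within `max_cols` columns.
bool prefer_flow(const Node& n, std::size_t indent, std::size_t max_cols) {
    if (!n.is_container()) return false;
    for (const Node& c : n.children)
        if (c.is_container()) return false; // nested containers: use block style
    return indent + flow_width(n) <= max_cols;
}
```

A real emitter would also have to account for quoting and escaping, because both change a scalar's width.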

Reflection-capable YAML parsing?

Hi!

I'm considering adopting YAML as the serialization format in my apps, and I've been looking at all the available C/C++ YAML libraries. To my surprise, all of them assume that your program knows (or decides) the "schema" (data types, data structures, etc.).

However, I want the "schema", i.e. the data types and structures, to be owned by the YAML file and not by the program. Of course, programs are written with certain data structures in mind, but I want them to be able to read the YAML file and only then check whether the YAML schema matches the program's structures.

In other words, I was hoping to find a YAML library that handles the data types for you. You would write data=load("thisfile.yaml"); and the library could then tell you something like "data contains 3 objects of structure car and 2 objects of structure bike. Structure car has fields for brand (string), model (string), price (float), available (boolean)... etc".

Can this be done with your library? Do you have any example/demo that does this?

Thanks a lot!
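As the introduction notes, ryml's tree stores only string views and has no knowledge of types. A schema like the one described would therefore have to be inferred by application code. Below is a minimal C++ sketch of one building block, assuming the YAML 1.2 core schema rules for plain (unquoted) scalars. It classifies a scalar as null, bool, int, float or string. infer_type() is a hypothetical helper, not a ryml function. Quoted scalars should always be treated as strings, and YAML 1.1 spellings such as yes/no are not treated as booleans here.

```cpp
#include <regex>
#include <string>

// Classify a *plain* (unquoted) scalar following the YAML 1.2 core schema.
// Quoted scalars ('...' or "...") are always strings and must not be passed here.
std::string infer_type(const std::string& s) {
    static const std::regex null_re(R"re(~|null|Null|NULL)re");
    static const std::regex bool_re(R"re(true|True|TRUE|false|False|FALSE)re");
    static const std::regex int_re(R"re([-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+)re");
    static const std::regex float_re(
        R"re([-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?|[-+]?\.(inf|Inf|INF)|\.(nan|NaN|NAN))re");
    if (s.empty() || std::regex_match(s, null_re)) return "null";
    if (std::regex_match(s, bool_re)) return "bool";
    if (std::regex_match(s, int_re)) return "int";     // checked before float: "24" matches both
    if (std::regex_match(s, float_re)) return "float";
    return "str";
}
```

If this classifier were applied to every scalar of a parsed tree, it would give per-field types such as price as float and available as bool. Grouping maps by their key sets would be a possible next step toward "structures".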

emit json example

Could you please explain how to create and emit this kind of JSON:
{"1": null}
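JSON follows two rules that matter here. Every object key must be a string, so the key 1 has to be emitted as "1". A key without a value is written as the JSON literal null. The hypothetical Entry struct and to_json_object() below illustrate only those two rules, in self-contained C++. They are not ryml's JSON emitter, and they do not escape special characters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical map entry: has_val == false models a key with no value,
// which a JSON emitter writes as the literal null.
struct Entry {
    std::string key;
    bool has_val;
    std::string val;
};

// JSON object keys must be strings, so the key 1 is written as "1".
// Values are written as JSON strings here; escaping is omitted for brevity.
std::string to_json_object(const std::vector<Entry>& entries) {
    std::string out = "{";
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (i > 0) out += ", ";
        out += '"' + entries[i].key + "\": ";
        out += entries[i].has_val ? '"' + entries[i].val + '"' : std::string("null");
    }
    out += "}";
    return out;
}
```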

Cannot dereference a yaml reference

I am trying to parse some YAML files that use anchors and references of different types throughout. However, I am unable to get the dereferenced value. Maybe I am doing something wrong, or there is some magic method that will do this for me that I do not know about. How can I dereference a YAML reference without having to traverse the whole YAML tree to find the corresponding anchor?

One of the yaml files I am trying to parse is:

collection:
    col_id: &col_id 'col_0'
    col_num: &col_num 3

test:
    id: *col_id
    num: *col_num

The sample code to parse it is:

#include <iostream>
#include <string>
#include <ryml.hpp>
#include <ryml_std.hpp>

namespace rapidyaml = ryml;

int main()
{
        std::string s =
R"(
collection:
    col_id: &col_id 'col_0'
    col_num: &col_num 3

test:
    id: *col_id
    num: *col_num
)";
        rapidyaml::Tree yaml = rapidyaml::parse(c4::to_csubstr(s));
        std::cout << yaml << std::endl;

        std::string collection_id;
        int collection_num;
        std::string test_id;
        int test_num;

        yaml["collection"]["col_id"] >> collection_id;
        std::cout << collection_id << std::endl;
        std::cout << yaml["collection"]["col_id"].val() << std::endl;
        std::cout << std::endl;

        yaml["collection"]["col_num"] >> collection_num;
        std::cout << collection_num << std::endl;
        std::cout << yaml["collection"]["col_num"].val() << std::endl;
        std::cout << std::endl;

        yaml["test"]["id"] >> test_id; // expected 'col_0' obtained '*col_id'
        std::cout << test_id << std::endl;
        std::cout << yaml["test"]["id"].val() << std::endl;
        std::cout << yaml["test"]["id"].val_ref() << std::endl; // expected, maybe, 'col_0' obtained 'col_id'
        std::cout << std::endl;

        yaml["test"]["num"] >> test_num; // assertion failed, but int(3) expected
        std::cout << test_num << std::endl;
        std::cout << yaml["test"]["num"].val() << std::endl;
        std::cout << yaml["test"]["num"].val_ref() << std::endl;
        std::cout << std::endl;

        return 0;
}
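Newer ryml versions provide a Tree::resolve() member that resolves the aliases in the tree. Whether it exists in the version used here should be checked in the headers. The underlying idea can be sketched without the library. A YAML alias always refers to the most recent preceding node with that anchor, so a single pass in document order is enough, with no search through the whole tree. The flattened Scalar record and resolve_aliases() below are hypothetical, and only scalar values are handled.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical flattened scalar entry (NOT ryml's node layout):
//   col_id: &col_id 'col_0'  ->  {"col_id", "col_id", false, "col_0"}
//   id: *col_id              ->  {"id", "", true, "col_id"}
struct Scalar {
    std::string key;
    std::string anchor;  // empty if the value has no &anchor
    bool is_alias;       // true if the value is *alias
    std::string val;     // the scalar text, or the alias name if is_alias
};

// Single pass in document order: anchors are always seen before the aliases
// that refer to them. Returns false if an alias names an undefined anchor.
bool resolve_aliases(std::vector<Scalar>& nodes) {
    std::map<std::string, std::string> anchors;
    for (Scalar& n : nodes) {
        if (n.is_alias) {
            std::map<std::string, std::string>::const_iterator it = anchors.find(n.val);
            if (it == anchors.end()) return false;
            n.val = it->second;       // replace the alias with the anchored value
            n.is_alias = false;
        } else if (!n.anchor.empty()) {
            anchors[n.anchor] = n.val; // a later anchor with the same name overrides
        }
    }
    return true;
}
```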

Which headers to package?

I'm trying to write an Arch Linux package (PKGBUILD) for rapidyaml. I successfully built libryml.a and see that it includes the object files from libc4core.a.

Which include files should I package? In particular, which parts of extern/c4core/src and extern/c4core/extern/* are used in rapidyaml's interface?

I'm not very familiar with CMake, but at least the generated Makefile doesn't have an install target.

Custom allocator support

rapidYAML sounds like a great project! Do you support, or plan to support, custom memory allocators?
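The introduction states that ryml accepts custom global and per-tree memory allocators through callbacks. To illustrate what such a callback might look like, here is a minimal C++ sketch of a fixed-size bump (arena) allocator reached through an opaque user-data pointer. The Arena type and both function signatures are assumptions made for illustration only. The callback types ryml actually expects should be taken from its headers.

```cpp
#include <cstddef>

// Hypothetical arena state, passed to the callbacks as opaque user data.
struct Arena {
    unsigned char* buf;   // caller-owned storage, aligned to alignof(std::max_align_t)
    std::size_t cap;      // size of buf in bytes
    std::size_t used;     // bytes handed out so far
};

// Hypothetical allocate callback: 'user_data' carries the Arena.
void* arena_allocate(std::size_t len, void* /*hint*/, void* user_data) {
    Arena* a = static_cast<Arena*>(user_data);
    const std::size_t align = alignof(std::max_align_t);
    const std::size_t start = (a->used + align - 1) / align * align; // round up
    if (start > a->cap || len > a->cap - start) return nullptr;      // out of space
    a->used = start + len;
    return a->buf + start;
}

// Hypothetical free callback: a bump allocator cannot release single blocks;
// the whole arena is reclaimed at once by resetting 'used' to 0.
void arena_free(void* /*mem*/, std::size_t /*len*/, void* /*user_data*/) {}
```

Ignoring individual frees suits a parse-then-discard workload. It is only safe to reset the arena once nothing refers to its memory any more.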

Compilation error "no matching function for call to 'from_chars'" for size_t

Hi! Here is a compilation issue in the function
size_t Tree::_next_node(lookup_result * r, bool modify, _lookup_path_token *parent)

Full log

[  0%] Building CXX object ThirdParty/rapidyaml/CMakeFiles/ryml.dir/src/c4/yml/tree.cpp.o
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/src/c4/yml/tree.cpp:1430:15: error: no matching function for call to 'from_chars'
        if( ! from_chars(tk, &idx))
              ^~~~~~~~~~
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:732:1: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'void **__restrict' for 2nd argument
_C4_DEFINE_TO_FROM_CHARS(void*   , "p"             , "p"             )
^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:715:13: note: expanded from macro '_C4_DEFINE_TO_FROM_CHARS'
inline bool from_chars(csubstr buf, ty * C4_RESTRICT v)                 \
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:733:1: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'float *__restrict' for 2nd argument
_C4_DEFINE_TO_FROM_CHARS_TOA(   float, f)
^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:627:13: note: expanded from macro '_C4_DEFINE_TO_FROM_CHARS_TOA'
inline bool from_chars(csubstr buf, ty *C4_RESTRICT v)          \
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:734:1: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'double *__restrict' for 2nd argument
_C4_DEFINE_TO_FROM_CHARS_TOA(  double, d)
^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:627:13: note: expanded from macro '_C4_DEFINE_TO_FROM_CHARS_TOA'
inline bool from_chars(csubstr buf, ty *C4_RESTRICT v)          \
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:735:1: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'int8_t *__restrict' (aka 'signed char *__restrict') for 2nd argument
_C4_DEFINE_TO_FROM_CHARS_TOA(  int8_t, i)
^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:627:13: note: expanded from macro '_C4_DEFINE_TO_FROM_CHARS_TOA'
inline bool from_chars(csubstr buf, ty *C4_RESTRICT v)          \
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:736:1: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'int16_t *__restrict' (aka 'short *__restrict') for 2nd argument
_C4_DEFINE_TO_FROM_CHARS_TOA( int16_t, i)
^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:627:13: note: expanded from macro '_C4_DEFINE_TO_FROM_CHARS_TOA'
inline bool from_chars(csubstr buf, ty *C4_RESTRICT v)          \
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:737:1: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'int32_t *__restrict' (aka 'int *__restrict') for 2nd argument
_C4_DEFINE_TO_FROM_CHARS_TOA( int32_t, i)
^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:627:13: note: expanded from macro '_C4_DEFINE_TO_FROM_CHARS_TOA'
inline bool from_chars(csubstr buf, ty *C4_RESTRICT v)          \
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:738:1: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'int64_t *__restrict' (aka 'long long *__restrict') for 2nd argument
_C4_DEFINE_TO_FROM_CHARS_TOA( int64_t, i)
^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:627:13: note: expanded from macro '_C4_DEFINE_TO_FROM_CHARS_TOA'
inline bool from_chars(csubstr buf, ty *C4_RESTRICT v)          \
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:739:1: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'uint8_t *__restrict' (aka 'unsigned char *__restrict') for 2nd argument
_C4_DEFINE_TO_FROM_CHARS_TOA( uint8_t, u)
^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:627:13: note: expanded from macro '_C4_DEFINE_TO_FROM_CHARS_TOA'
inline bool from_chars(csubstr buf, ty *C4_RESTRICT v)          \
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:740:1: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'uint16_t *__restrict' (aka 'unsigned short *__restrict') for 2nd argument
_C4_DEFINE_TO_FROM_CHARS_TOA(uint16_t, u)
^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:627:13: note: expanded from macro '_C4_DEFINE_TO_FROM_CHARS_TOA'
inline bool from_chars(csubstr buf, ty *C4_RESTRICT v)          \
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:741:1: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'uint32_t *__restrict' (aka 'unsigned int *__restrict') for 2nd argument
_C4_DEFINE_TO_FROM_CHARS_TOA(uint32_t, u)
^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:627:13: note: expanded from macro '_C4_DEFINE_TO_FROM_CHARS_TOA'
inline bool from_chars(csubstr buf, ty *C4_RESTRICT v)          \
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:742:1: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'uint64_t *__restrict' (aka 'unsigned long long *__restrict') for 2nd argument
_C4_DEFINE_TO_FROM_CHARS_TOA(uint64_t, u)
^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:627:13: note: expanded from macro '_C4_DEFINE_TO_FROM_CHARS_TOA'
inline bool from_chars(csubstr buf, ty *C4_RESTRICT v)          \
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:773:13: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'bool *__restrict' for 2nd argument
inline bool from_chars(csubstr buf, bool * C4_RESTRICT v)
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:804:13: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'char *__restrict' for 2nd argument
inline bool from_chars(csubstr buf, char * C4_RESTRICT v)
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:833:13: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'c4::csubstr *__restrict' (aka 'basic_substring *__restrict') for 2nd argument
inline bool from_chars(csubstr buf, csubstr *C4_RESTRICT v)
            ^
/jenkins/workspace/fonline_dev/ThirdParty/rapidyaml/extern/c4core/src/c4/to_chars.hpp:862:13: note: candidate function not viable: no known conversion from 'size_t *' (aka 'unsigned long *') to 'c4::substr *__restrict' (aka 'basic_substring *__restrict') for 2nd argument
inline bool from_chars(csubstr buf, substr * C4_RESTRICT v)
            ^
1 error generated.
make[3]: *** [ThirdParty/rapidyaml/CMakeFiles/ryml.dir/src/c4/yml/tree.cpp.o] Error 1
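The log shows the root cause. On this platform size_t is 'unsigned long' while uint64_t is 'unsigned long long', so a size_t* matches none of the fixed-width from_chars() overloads even though the two types have the same size. Below is a minimal, self-contained C++ sketch of one common workaround, which is not necessarily the fix adopted in c4core. It keeps the fixed-width overloads and adds a constrained template that parses into the same-sized fixed-width type and then copies the value. parse_uint() is a hypothetical stand-in for from_chars().

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

namespace detail {
// Parse an unsigned decimal number; no overflow check, to keep the sketch short.
template <class U>
bool parse_dec(const std::string& s, U* v) {
    if (s.empty()) return false;
    U r = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
        r = static_cast<U>(r * 10u + static_cast<U>(c - '0'));
    }
    *v = r;
    return true;
}
} // namespace detail

// Overloads for the fixed-width types only, like the candidates in the log.
inline bool parse_uint(const std::string& s, std::uint32_t* v) { return detail::parse_dec(s, v); }
inline bool parse_uint(const std::string& s, std::uint64_t* v) { return detail::parse_dec(s, v); }

// Fallback for other unsigned types (e.g. size_t as 'unsigned long' when
// uint64_t is 'unsigned long long'): parse into the same-sized fixed-width
// type, then copy the value. No pointer reinterpretation is needed.
template <class T>
typename std::enable_if<std::is_unsigned<T>::value &&
                        !std::is_same<T, bool>::value &&
                        !std::is_same<T, std::uint32_t>::value &&
                        !std::is_same<T, std::uint64_t>::value, bool>::type
parse_uint(const std::string& s, T* v) {
    typedef typename std::conditional<sizeof(T) == 8, std::uint64_t, std::uint32_t>::type U;
    static_assert(sizeof(T) == sizeof(U), "only 32- and 64-bit unsigned types are handled");
    U tmp = 0;
    if (!parse_uint(s, &tmp)) return false;
    *v = static_cast<T>(tmp);
    return true;
}
```

On platforms where size_t already is one of the fixed-width types, the template is disabled and the plain overload is chosen, so behavior there is unchanged.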

Cannot compile or install on MacOS

Platform: OS X; IDE: CLion; CMake minimum version: 3.16

Here are the logs:

/Applications/CLion.app/Contents/bin/cmake/mac/bin/cmake -DCMAKE_BUILD_TYPE=Debug -G "CodeBlocks - Unix Makefiles" /Users/lalit/CLionProjects/automator
-- ryml: using C++ standard: C++11
-- ryml: setting global C++ standard: 11
-- ryml: importing subproject c4core (SUBDIRECTORY)... /Users/lalit/clibs/rapidyaml/ext/c4core
-- c4core: using C++ standard: C++11
-- c4core: setting global C++ standard: 11
CMake Error at /Users/lalit/clibs/rapidyaml/ext/c4core/cmake/c4Project.cmake:1843 (message):
  not implemented
Call Stack (most recent call first):
  /Users/lalit/clibs/rapidyaml/ext/c4core/CMakeLists.txt:76 (c4_install_exports)

here is the CMakeLists.txt:

cmake_minimum_required(VERSION 3.16)
project(automator)

set(CMAKE_CXX_STANDARD 14)

add_library(rapidyaml a.cpp b.cpp)
add_subdirectory(/Users/lalit/clibs/rapidyaml ryml)
target_link_libraries(rapidyaml PUBLIC ryml)

add_executable(automator main.cpp Docs.cpp Docs.h)

target_link_libraries(automator rapidyaml)

Thanks :)

Support FIND_PACKAGE cmake command

For now, in order to set up the include directories, I need to add this to my CMakeLists.txt:

INCLUDE_DIRECTORIES("3d_party/linux/rapidyaml/src")
INCLUDE_DIRECTORIES("3d_party/linux/rapidyaml/ext/c4core/src")
INCLUDE_DIRECTORIES("3d_party/linux/rapidyaml/ext/c4core/ext")

even for a very simple use case. But this does not work:

LIST(APPEND CMAKE_PREFIX_PATH "3d_party/linux/rapidyaml")
FIND_PACKAGE(ryml REQUIRED)

CMake simply cannot find the package's config files.

Bundling of libyaml and yaml-cpp?

While I understand the bundling of googletest, I'm not sure that everyone who clones this repository to include it in a project wants yaml-cpp and libyaml too. It might be better to have a separate git repo that contains rapidYAML + libyaml + yaml-cpp for performance and correctness comparisons.

Or it might not, really. I don't know that much about your code and modus operandi so feel free to disregard this if it doesn't make much sense.

In situ parsing

Is parsing a YAML buffer in-situ a use case you plan to support? rapidJSON has that feature and it's great from a performance standpoint.
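The introduction states that ryml parses both read-only and in-situ buffers and that nodes hold only views into the source. In-situ parsing pays off because, for many transformations, the result can be written back over the source bytes and returned as a view into the same buffer. Below is a minimal C++ sketch of that idea for double-quoted scalars. For the escapes handled here, the unescaped text is never longer than the escaped text. MutView and unescape_in_place() are hypothetical names, only a few escapes are handled, and this is not ryml's implementation.

```cpp
#include <cstddef>

// Minimal (hypothetical) mutable view into a caller-owned buffer.
struct MutView {
    char* ptr;
    std::size_t len;
};

// Unescape the body of a double-quoted scalar *in place*: the write index w
// never overtakes the read index r, so the output overwrites the input and
// the result is a (possibly shorter) view into the same buffer. No allocation.
// Handles \n and \t; any other escaped character (e.g. \" or \\) maps to itself.
MutView unescape_in_place(MutView s) {
    std::size_t w = 0;
    for (std::size_t r = 0; r < s.len; ++r) {
        char c = s.ptr[r];
        if (c == '\\' && r + 1 < s.len) {
            const char e = s.ptr[++r];
            switch (e) {
                case 'n': c = '\n'; break;
                case 't': c = '\t'; break;
                default:  c = e;    break;
            }
        }
        s.ptr[w++] = c;
    }
    return MutView{s.ptr, w};
}
```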

c4_install_exports(DEPENDENCIES c4core) error

Ubuntu 18.04.1 64bit
commit 6c7a241

ryml: importing subproject c4core (SUBDIRECTORY)... /home/dev/100/projects/myproject/cooler/third_party/rapidyaml/extern/c4core
ryml: -----> target ryml PUBLIC incorporating lib c4core
CMake Error at third_party/rapidyaml/extern/c4core/cmake/c4Project.cmake:1489 (file):
  file attempted to create a directory:
  /home/dev/100/projects/myproject/cooler/third_party/rapidyaml/export_cases/lib//cmake/ryml
  into a source directory.
Call Stack (most recent call first):
  third_party/rapidyaml/extern/c4core/cmake/c4Project.cmake:1570 (__c4_install_exports)
  third_party/rapidyaml/CMakeLists.txt:71 (c4_install_exports)

ryml::parse complains about ambiguous call, reading file

Trying to use your example, I receive the following error:

"call of overloaded 'parse(char [7])' is ambiguous"

char str[] = "{test}";
ryml::Tree tree = ryml::parse(str);

I also tried loading a file:

std::ifstream in(path);
std::string contents((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
char* s = &contents[0];
ryml::Tree tree = ryml::parse(s);

But parse does not accept one argument.

Do you have a simple example of loading a large file (75.6 MB) that contains UTF-8 characters and iterating over it? yaml-cpp takes more than 10 seconds to read the file.
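The ambiguity above is probably because a char array converts equally well to ryml's mutable (substr) and read-only (csubstr) string views. Converting explicitly with ryml::to_substr() or ryml::to_csubstr() should select a single overload. The file-loading part can be done with the standard library alone. The sketch below reads the whole file into one contiguous std::string in binary mode, sized up front so that no reallocation happens. read_file() is a hypothetical helper. How the resulting buffer is then passed to ryml (for example via ryml::to_substr() for in-place parsing) depends on the ryml version in use.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Read an entire file into one contiguous buffer. Binary mode avoids newline
// translation on Windows; UTF-8 bytes are copied through unchanged.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0) throw std::runtime_error("cannot determine size of " + path);
    std::string contents(static_cast<std::size_t>(size), '\0');
    in.seekg(0, std::ios::beg);
    if (size > 0 && !in.read(&contents[0], size))
        throw std::runtime_error("cannot read " + path);
    return contents;
}
```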

incorrect indentation causes ryml to stop

Yep, which is why I mentioned it. It's invalid YAML, but for some reason, rather than error() being called (and my custom error handler throwing an exception), the entire program crashes. It's odd, because other parse errors do result in the error handler being called.

Originally posted by @leoetlino in #34 (comment)

parse error when have space(s) BEFORE colon (':') char

If there is a space BEFORE the colon, e.g.:

keyA :
    keyB : test value

ryml::parse() fails, but I've seen many websites and validation tools accept this formatting:
* example data : https://yaml.org/start.html
* JS-YAML demo. YAML JavaScript parser : http://nodeca.github.io/js-yaml/
* Online YAML Parser : http://yaml-online-parser.appspot.com/
* Minify YAML - Online YAML Tools : https://onlineyamltools.com/minify-yaml

I don't know which behavior is correct.
Could you provide an option to choose?
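YAML itself allows this formatting. A plain (unquoted) scalar cannot end in whitespace, so in "keyB : test value" the space before the ':' indicator is not part of the key, and the key is just keyB. Below is a minimal C++ sketch of splitting a single line under that rule. split_plain_keyval() is a hypothetical helper that handles plain scalars only, with no quotes, flow collections or comments. It is not how ryml's parser is structured.

```cpp
#include <string>

// Result of splitting one "key: value" line (hypothetical helper type).
struct KeyVal {
    std::string key;
    std::string val;
    bool ok;
};

// Split a single block-mapping line with a plain key. The ':' indicator must
// be followed by a space or end the line (so "http://x" is not split), and
// whitespace around the key is not part of the plain scalar, so it is trimmed.
KeyVal split_plain_keyval(const std::string& line) {
    std::string::size_type pos = 0;
    while ((pos = line.find(':', pos)) != std::string::npos) {
        if (pos + 1 == line.size() || line[pos + 1] == ' ') break;
        ++pos; // this ':' is part of a scalar; keep searching
    }
    if (pos == std::string::npos) return KeyVal{"", "", false};
    std::string key = line.substr(0, pos);
    key.erase(key.find_last_not_of(' ') + 1); // trailing spaces: "keyB " -> "keyB"
    key.erase(0, key.find_first_not_of(' ')); // leading indentation
    const std::string::size_type v = line.find_first_not_of(' ', pos + 1);
    return KeyVal{key, v == std::string::npos ? std::string() : line.substr(v), true};
}
```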

Release plans?

Thanks for the interesting work done on rapidyaml. We are looking at the available options for C++ YAML readers. Are there any plans for an upcoming release, even an unstable one (e.g. 0.1)? Thanks.

crash on tree destruction

I want to get the changed nodes when the config is reloaded.

static std::unique_ptr<c4::yml::Tree> _tree;
    ...

auto new_tree = std::make_unique<c4::yml::Tree>();
c4::yml::parse(c4::csubstr{_yml_content.data(), size}, new_tree.get());
if (delta)
    diff(_tree ? _tree->rootref() : c4::yml::NodeRef{}, new_tree->rootref(), *delta);
_tree.swap(new_tree);

On _tree destruction, I get a SIGABRT in m_alloc.free(m_buf, m_cap * sizeof(NodeData));
Is there any way to make it work?

Preserving scalar type information when emitting

(I hope you'll forgive me for opening three issues in a row...)

Consider the following document:

- actorType: WeaponSmallSword
  actors:
  - {name: Weapon_Sword_031, plus: -1, value: 15.0}
  - {name: Weapon_Sword_031, plus: 0, value: 20.0}
  - {name: Weapon_Sword_031, plus: 1, value: 24.0}
  not_rank_up: false
  series: DragonRoostSeries
# ...

Emitting it with ryml gives the following:

  -
    actorType: WeaponSmallSword
    actors:
      -
        name: Weapon_Sword_031
        plus: '-1'
        value: 15
      -
        name: Weapon_Sword_031
        plus: 0
        value: 20
      -
        name: Weapon_Sword_031
        plus: 1
        value: 24
    not_rank_up: 0
    series: DragonRoostSeries

Ignoring the strange newlines and the different flow styles (#29), something that's problematic for loading ryml-emitted documents with PyYAML -- or any other parser that relies on detecting types -- is that value types don't seem to be preserved.

In particular, -1 (an integer) gets turned into '-1', which would cause any such parser to load the value as a string and not as an integer.

The value that is associated with not_rank_up, which is supposed to be a boolean, seems to have been implicitly converted to an integer (0). Other than looking somewhat worse, this causes parsers to load it as an integer and not as a boolean.

Something similar happens for floats with integral values: 24.0 is emitted as 24, which causes the value to be loaded as an integer and not as a float.

I've tried overloading the c4::yml::write function for bool, but that doesn't seem to work as it's not called. More generally, do you have any ideas or suggestions for getting output that matches the existing document?
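Since ryml stores scalars as text, what gets emitted is presumably whatever text was written into each node. Type fidelity therefore depends on how values are formatted when they are serialized. Below is a minimal C++ sketch of formatting that keeps YAML 1.2 core-schema types recognizable. Booleans become true/false. Floats always keep a ".0", an exponent, or .inf/.nan, so that 24.0 does not read back as the integer 24. format_bool() and format_float() are hypothetical helpers, not ryml's serialization functions. Separately, a leading '-' only acts as a sequence indicator when followed by whitespace, so -1 does not need quotes.

```cpp
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>

// Booleans: emit the core-schema spellings instead of 1/0.
inline std::string format_bool(bool b) { return b ? "true" : "false"; }

// Floats: produce text that still resolves as a float when read back.
inline std::string format_float(double d) {
    if (std::isnan(d)) return ".nan";
    if (std::isinf(d)) return d > 0 ? ".inf" : "-.inf";
    char buf[40];
    if (d == std::floor(d) && std::fabs(d) < 1e15) {
        std::snprintf(buf, sizeof(buf), "%.1f", d);  // integral value: 24.0 -> "24.0"
        return buf;
    }
    // Shortest "%g" precision that round-trips back to the same double.
    for (int prec = 1; prec <= 17; ++prec) {
        std::snprintf(buf, sizeof(buf), "%.*g", prec, d);
        if (std::strtod(buf, nullptr) == d) break;
    }
    std::string s(buf);
    if (s.find_first_of(".eE") == std::string::npos) s += ".0"; // e.g. large integral values
    return s;
}
```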

emit JSON

There is no JSON emitter, as far as I can tell. Are there any plans to add one? It should be a fairly small addition.
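String quoting is a core part of any JSON emitter. Per RFC 8259, the quotation mark, the backslash and control characters below U+0020 must be escaped inside a JSON string. Every other byte, including UTF-8 multibyte sequences, may be copied through unchanged. Below is a minimal C++ sketch. json_quote() is a hypothetical helper, not part of ryml.

```cpp
#include <cstdio>
#include <string>

// Quote and escape a string as a JSON string literal (RFC 8259, section 7).
std::string json_quote(const std::string& s) {
    std::string out = "\"";
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {                 // other control characters
                    char buf[7];                // "\u00XX" plus terminator
                    std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c); // includes UTF-8 bytes
                }
        }
    }
    out += '"';
    return out;
}
```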

Failure to parse YAML

I have some YAML generated by yaml-cpp that, according to yamllint.com, is valid. yaml-cpp can parse it just fine, but rapidYAML seems to invoke the error handler when asked to parse it. A repro case is attached.

ryaml-repro.zip

Weird segfault when parsing a map in a sequence in some cases

The following document causes ryml to simply crash when parsing:

- 'a': 1
  b: 2

These ones work, though:

- a: 1
  b: 2

- b: 2
  'a': 1

- b: 2
  'a': 1
  c: 3

'a': 1
b: 2

b: 2
'a': 1

Using the inline flow style for the map in the problematic document also works:

- {'a': 1, b: 2}

It looks like something goes wrong when parsing a map in a sequence, but only if the first key-value pair has a quoted key, and only if the map uses the block style.

Serialize to JSON

It seems that this should be possible, but neither this:
ryml::Tree tree = ryml::parse(c4::to_csubstr(str.c_str()));
std::ostringstream jsonStream;
jsonStream << ryml::as_json(tree);

nor this works for me:
ryml::NodeRef node = tree["parameters"];
std::ostringstream jsonStream;
jsonStream << ryml::as_json(node);

The first fails with "no doc processing for JSON", the second with "_expected true: p(node)->is_map()".

Am I doing something incorrectly?

Thanks

Observations during our review

I am working with the open-source htm-community for Numenta on AI software algorithms. Our software is a set of Python apps with a C++ extension library. Using CMake, we build packages for this library for Windows (MSVC 2017), Linux (Ubuntu), and OS X, with C++11 or later compilers. Our CI builds and runs unit tests for each platform. One of those configurations also builds and tests in debug mode to make sure we don't miss any errors that debug mode might uncover.

We use YAML to specify algorithm parameters. We have been using yaml-cpp, but it can have an issue in debug mode on Ubuntu, so we were looking for an alternative. The features you list and the speed measurements were exactly what we were looking for. I made a list of the issues we found while evaluating RapidYaml.

  • We download dependencies automatically as part of the install. We found that we had to download 4 separate files and make sure they ended up in the right place in the source tree. I noticed that you had links in there, but the master.zip downloaded from GitHub does not follow them. I recommend that you consolidate these into one repository.
  • You do not have a released version, so we had to download directly from master. Master will change, and when it does, it could break our release. Having a release, even an incomplete one, would let our release remain stable while we verify that your next release still works with our code.
  • The downloaded package required 3 separate include directories. I figured out what they needed to be, but some documentation would have been helpful. Combining everything into one repository would help reduce this to a single include directory.
  • There is no convenient way to convert between std::string and your internal string format. I understand why you use the internal string format, and that is cool, because that is where you get your speed.
    But you might provide something like this for the rest of us who must use the STL:
// Convert to and from the string types used by ryml
// (avoid names like _S: a leading underscore + uppercase letter is reserved)
static csubstr S(const std::string &s) { return csubstr(s.data(), s.size()); }
static std::string S_(csubstr s) { return std::string(s.data(), s.size()); }
  • We need iterators (both const and non-const) for maps and sequences. I could not get your begin() and end() to compile, and could not find cbegin()/cend(). I was able to write my own using NodeRef::first_child(), last_child(), next_sibling(), etc. But you should check the iterators you provide.
  • Exceptions. This is the killer. We live and die by C++ exceptions, and an exception does not necessarily mean that the program needs to die. For example, our unit tests contain negative as well as positive checks, so they expect an exception in some cases. Stopping at a breakpoint is not an option, even in debug mode. I was able to turn on exception handling in your library. It created the exception OK, but then the program aborted. You also have a callback facility that can be called on an error, so I set up a callback handler and had it bootstrap back into C++, where it could throw an exception. That did not work either. I suspect that the internal structures may be in an undefined state at the point when the error is detected and the exception thrown. I tried this on Windows with MSVC 2019, which I know to be the pickiest about exceptions. I did not try it on the other two platforms.
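The hand-written iteration described above, using first_child() and next_sibling(), can be wrapped in a standard forward iterator so that range-based for loops work. Below is a minimal, self-contained C++ sketch over a hypothetical flat, index-based node array, loosely modeled on the single-array layout described in the introduction. FlatNode, ChildIter and children() are illustrative names, not ryml types, and only a const iterator is shown.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical flat node (NOT ryml's NodeData): links are array indices.
const std::size_t kNone = static_cast<std::size_t>(-1); // "no node"
struct FlatNode {
    std::string val;
    std::size_t first_child;
    std::size_t next_sibling;
};

// Const forward iterator over the children of one node.
class ChildIter {
public:
    typedef std::forward_iterator_tag iterator_category;
    typedef FlatNode value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const FlatNode* pointer;
    typedef const FlatNode& reference;

    ChildIter(const std::vector<FlatNode>* nodes, std::size_t id) : nodes_(nodes), id_(id) {}
    reference operator*() const { return (*nodes_)[id_]; }
    pointer operator->() const { return &(*nodes_)[id_]; }
    ChildIter& operator++() { id_ = (*nodes_)[id_].next_sibling; return *this; }
    ChildIter operator++(int) { ChildIter tmp = *this; ++*this; return tmp; }
    bool operator==(const ChildIter& o) const { return id_ == o.id_; }
    bool operator!=(const ChildIter& o) const { return id_ != o.id_; }
private:
    const std::vector<FlatNode>* nodes_;
    std::size_t id_;
};

// Range adaptor so that `for (const FlatNode& c : children(tree, id))` works.
struct ChildRange {
    ChildIter b, e;
    ChildIter begin() const { return b; }
    ChildIter end() const { return e; }
};
inline ChildRange children(const std::vector<FlatNode>& nodes, std::size_t parent) {
    return ChildRange{ChildIter(&nodes, nodes[parent].first_child), ChildIter(&nodes, kNone)};
}
```

A non-const variant would differ only in holding a non-const vector pointer and returning FlatNode&.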

So the conclusion is that we will have to wait until this package matures a little more. We also need to be sure that, when errors occur in your C code, action is taken to ensure that data structures are in a known state and that a status code can be returned or a C++ std::exception can be thrown.

At this point, thread safety is not an issue, but in the near future it may become important. I did not see any logic to ensure thread safety in your code. It is also likely that we will need to have more than one YAML tree open at the same time in one program. I did see a few globals that might cause problems in scenarios like this.

Our open-source repo is at https://github.com/htm-community/htm.core in case you are interested.

Parser doesn't accept space before colon

The ryml parser does not accept a space before the colon if the parent object is enclosed in braces.

#include <ryml.hpp>

int main()
{
    ryml::Tree t = ryml::parse("{a : b}");

    // This works:
    // ryml::Tree t = ryml::parse("a : b");
}

Output:

1:3: (2B):ERROR: ERROR parsing yml: parse error
line 1: '{a : b}' (sz=7)
           ^~~~~  (cols 3-8)
