
rapidjson's Introduction


A fast JSON parser/generator for C++ with both SAX and DOM style APIs

Tencent is pleased to support the open source community by making RapidJSON available.

Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.


Introduction

RapidJSON is a JSON parser and generator for C++. It was inspired by RapidXml.

  • RapidJSON is small but complete. It supports both SAX and DOM style APIs. The SAX parser is only about 500 lines of code.

  • RapidJSON is fast. Its performance can be comparable to strlen(). It also optionally supports SSE2/SSE4.2 for acceleration.

  • RapidJSON is self-contained and header-only. It does not depend on external libraries such as BOOST. It does not even depend on the STL.

  • RapidJSON is memory-friendly. Each JSON value occupies exactly 16 bytes on most 32/64-bit machines (excluding text strings). By default it uses a fast memory allocator, and the parser allocates memory compactly during parsing.

  • RapidJSON is Unicode-friendly. It supports UTF-8, UTF-16, UTF-32 (LE & BE), and their detection, validation and transcoding internally. For example, you can read a UTF-8 file and let RapidJSON transcode the JSON strings into UTF-16 in the DOM. It also supports surrogates and "\u0000" (null character).

More features are described in the project documentation.

JSON (JavaScript Object Notation) is a lightweight data-interchange format. RapidJSON should be in full compliance with RFC 7159/ECMA-404, with optional support for relaxed syntax. More information about JSON can be found in those specifications.

Highlights in v1.1 (2016-8-25)

For other changes, please refer to the change log.

Compatibility

RapidJSON is cross-platform. Platform/compiler combinations which have been tested include:

  • Visual C++ 2008/2010/2013 on Windows (32/64-bit)
  • GNU C++ 3.8.x on Cygwin
  • Clang 3.4 on Mac OS X (32/64-bit) and iOS
  • Clang 3.4 on Android NDK

Users can build and run the unit tests on their platform/compiler.

Installation

RapidJSON is a header-only C++ library. Just copy the include/rapidjson folder to your system or project include path.

Alternatively, if you are using the vcpkg dependency manager, you can download and install rapidjson with CMake integration in a single command:

  • vcpkg install rapidjson

RapidJSON uses the following software as its dependencies:

  • CMake as a general build tool
  • (optional) Doxygen to build documentation
  • (optional) googletest for unit and performance testing

To generate the user documentation and run the tests, proceed with the steps below:

  1. Execute git submodule update --init to get the files of the third-party submodules (googletest).
  2. Create a directory called build in the rapidjson source directory.
  3. Change to the build directory and run cmake .. to configure your build. Windows users can do the same with the cmake-gui application.
  4. On Windows, build the solution found in the build directory. On Linux, run make from the build directory.

After a successful build you will find the compiled test and example binaries in the bin directory. The generated documentation will be available in the doc/html directory of the build tree. To run the tests after the build finishes, run make test or ctest from your build tree. You can get detailed output with ctest -V.

The library can be installed system-wide by running make install from the build tree with administrative privileges. This installs all files according to system preferences. Once RapidJSON is installed, other CMake projects can use it by adding find_package(RapidJSON) to their CMakeLists.txt.
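For example, a minimal consuming project might look like the sketch below. It assumes the RAPIDJSON_INCLUDE_DIRS variable exported by the installed RapidJSONConfig.cmake; project and target names are hypothetical.

```cmake
# Hypothetical CMakeLists.txt for a project consuming an installed RapidJSON.
cmake_minimum_required(VERSION 3.5)
project(myapp CXX)

find_package(RapidJSON REQUIRED)

add_executable(myapp main.cpp)
# RapidJSON is header-only, so only an include path is needed;
# RAPIDJSON_INCLUDE_DIRS is set by the installed RapidJSONConfig.cmake.
target_include_directories(myapp PRIVATE ${RAPIDJSON_INCLUDE_DIRS})
```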

Usage at a glance

This simple example parses a JSON string into a document (DOM), makes a simple modification to the DOM, and finally stringifies the DOM back to a JSON string.

// rapidjson/example/simpledom/simpledom.cpp
#include "rapidjson/document.h"
#include "rapidjson/writer.h"
#include "rapidjson/stringbuffer.h"
#include <iostream>

using namespace rapidjson;

int main() {
    // 1. Parse a JSON string into DOM.
    const char* json = "{\"project\":\"rapidjson\",\"stars\":10}";
    Document d;
    d.Parse(json);

    // 2. Modify it by DOM.
    Value& s = d["stars"];
    s.SetInt(s.GetInt() + 1);

    // 3. Stringify the DOM
    StringBuffer buffer;
    Writer<StringBuffer> writer(buffer);
    d.Accept(writer);

    // Output {"project":"rapidjson","stars":11}
    std::cout << buffer.GetString() << std::endl;
    return 0;
}

Note that this example does not handle potential errors.

(Diagram omitted: the simpledom example's parse → modify → stringify flow.)

More examples are available:

  • DOM API

  • SAX API

    • simplereader: Dumps all SAX events while parsing a JSON by Reader.
    • condense: A command line tool to rewrite a JSON, with all whitespaces removed.
    • pretty: A command line tool to rewrite a JSON with indents and newlines by PrettyWriter.
    • capitalize: A command line tool to capitalize strings in JSON.
    • messagereader: Parse a JSON message with SAX API.
    • serialize: Serialize a C++ object into JSON with SAX API.
    • jsonx: Implements a JsonxWriter which stringifies SAX events into the JSONx format (a kind of XML). The example is a command line tool that converts input JSON into JSONx.
  • Schema

    • schemavalidator : A command line tool to validate a JSON with a JSON schema.
  • Advanced

    • prettyauto: A modified version of pretty to automatically handle JSON with any UTF encodings.
    • parsebyparts: Implements an AsyncDocumentParser which can parse JSON in parts, using C++11 thread.
    • filterkey: A command line tool to remove all values with a user-specified key.
    • filterkeydom: The same tool as above, but demonstrating how to use a generator to populate a Document.

Contributing

RapidJSON welcomes contributions. When contributing, please follow the guidelines below.

Issues

Feel free to submit issues and enhancement requests.

Please help us by providing minimal reproducible examples, since source code makes it much easier for others to understand what is happening. For crashes on specific platforms, please include a stack dump along with details of the OS, compiler, etc.

Please try breakpoint debugging first and tell us what you found, so we can start exploring with more information in hand.

Workflow

In general, we follow the "fork-and-pull" Git workflow.

  1. Fork the repo on GitHub
  2. Clone the project to your own machine
  3. Checkout a new branch on your fork, start developing on the branch
  4. Test the change before committing. Make sure the changes pass all tests, including unittest and perftest, and please add a test case for each new feature or bug fix where needed.
  5. Commit changes to your own branch
  6. Push your work back up to your fork
  7. Submit a Pull request so that we can review your changes

NOTE: Be sure to merge the latest from "upstream" before making a pull request!

Copyright and Licensing

You can copy and paste the license summary from below.

Tencent is pleased to support the open source community by making RapidJSON available.

Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.

Licensed under the MIT License (the "License"); you may not use this file except
in compliance with the License. You may obtain a copy of the License at

http://opensource.org/licenses/MIT

Unless required by applicable law or agreed to in writing, software distributed 
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
CONDITIONS OF ANY KIND, either express or implied. See the License for the 
specific language governing permissions and limitations under the License.

rapidjson's People

Contributors

abolz, alberthunggarmin, ardb-uk, bluehero, chwarr, corporateshark, drewnoakes, efidler, escherstair, esrrhs, jollyroger, kaitohh, kosta-github, lelit, lichray, lukedan, miloyip, mloskot, pah, pkasting, smhdfdl, spl, stilescrisis, sylveon, thebusytypist, thedrow, womsersap, yachoor, ylavic, yurikhan


rapidjson's Issues

Inconsistent Values

Hi,
I am writing a program using RapidJson.

I loop through a (rather large) file, reading it with getline one line at a time, expecting one JSON object per line as a string char* line.
Within the body of my loop, I invoke the following:

Document doc;
doc.Parse(line);
float receive_val = (float) doc["receive"].GetDouble();

However if I invoke within the loop:

printf("%f\n", receive_val);

I get a wildly different float value than if I call:

printf("%f\n", (float) doc["receive"].GetDouble());

The latter way produces the correct value as reflected in the file.

Attempting to reproduce the problem in a new standalone test file with a specific JSON input file, I discovered that both methods returned the wrong value in my test. The same thing happened with GetString() when I changed my JSON field to string values.

Any intuition or insight on why this is occurring and how to work around it would be appreciated.

Thanks!

Running with this commit
2e0b3de
On Mac OS X 10.9
Compiling with g++

Research on using custom stack instead of call-stack in Reader

Currently, ParseObject(), ParseArray() and ParseValue() recursively call other parse functions. This has a potential stack overflow problem if a JSON tree is very deep (perhaps a JSON synthesized for a security attack).

  • Research replacing these recursive calls with a custom stack.
  • Evaluate the performance impact.
  • Possibly add a configurable limit on tree depth.
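The idea can be illustrated outside RapidJSON with a toy scanner (MaxDepth is a hypothetical name; real JSON parsing must also skip brackets inside strings, which this sketch ignores): an explicit std::vector replaces the call stack, so hostile nesting depth can exhaust neither the stack nor, thanks to the cap, the heap.

```cpp
#include <cstddef>
#include <vector>

// Returns the maximum bracket-nesting depth of `json`, or -1 if it exceeds
// `limit`. An explicit stack replaces recursion, so a deeply nested
// (possibly hostile) input cannot overflow the call stack.
int MaxDepth(const char* json, std::size_t limit) {
    std::vector<char> stack;      // holds the open brackets seen so far
    std::size_t maxDepth = 0;
    for (const char* p = json; *p; ++p) {
        if (*p == '[' || *p == '{') {
            stack.push_back(*p);
            if (stack.size() > limit) return -1;  // configurable depth cap
            if (stack.size() > maxDepth) maxDepth = stack.size();
        } else if (*p == ']' || *p == '}') {
            if (!stack.empty()) stack.pop_back();
        }
    }
    return static_cast<int>(maxDepth);
}
```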

Writer/PrettyWriter conversion fails

Hi,
I discovered a small issue with converting a PrettyWriter to its Writer base class with the default allocator template argument.

GenericStringBuffer<UTF8<>> buffer;
Writer<GenericStringBuffer<UTF8<>>, UTF16<>>* writer = new PrettyWriter<GenericStringBuffer<UTF8<>>, UTF16<>>(buffer);

Visual Studio error description:
a value of type "rapidjson::PrettyWriter<rapidjson::GenericStringBuffer<rapidjson::UTF8<char>, rapidjson::CrtAllocator>, rapidjson::UTF16<wchar_t>, rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>> *" cannot be used to initialize an entity of type "rapidjson::Writer<rapidjson::GenericStringBuffer<rapidjson::UTF8<char>, rapidjson::CrtAllocator>, rapidjson::UTF16<wchar_t>, rapidjson::UTF8<char>, rapidjson::CrtAllocator> *"

The difference is the last template default parameter.
Writer -> rapidjson::CrtAllocator
PrettyWriter -> rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>

Probably because of this change in "writer.h":
change history: "Change Reader/Writer's stack allocator to CrtAllocator"
id: 2e23787

Clang -std=gnu++11: error: use of overloaded operator '!=' is ambiguous

Using -std=gnu++11 when building RapidJSON using Clang results in build errors like this:

In file included from ../../test/unittest/valuetest.cpp:22:
../../include/rapidjson/document.h:646:76: error: use of overloaded operator
      '!=' is ambiguous (with operand types 'const
      GenericValue<rapidjson::UTF8<char>,
      rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >' and 'const
      GenericValue<rapidjson::UTF8<char>,
      rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >')
  ...== rhs.MemberEnd() || lhsMemberItr->value != rhsMemberItr->value)
                           ~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~~

Access violation when saving JSON via Document::Accept()

My function is shown below; it loads data from msgpack and tries to save it as JSON.

void Data::load_msgpack(std::string& buf)
{
    msgpack::unpacked msg;
    msgpack::unpack(&msg, buf.data(), buf.size());

    rapidjson::Document doc;

    msg.get().convert(static_cast<rapidjson::Value*>(&doc));
    using namespace rapidjson;

    FILE* fp = fopen("D:/Users/xpol/Desktop/Dragon.mpack.json", "wb"); // non-Windows use "w"
    char writeBuffer[65536];
    FileWriteStream os(fp, writeBuffer, sizeof(writeBuffer));
    Writer<FileWriteStream> writer(os);
    doc.Accept(writer);
    fclose(fp);
}

The line msg.get().convert(static_cast<rapidjson::Value*>(&doc)); compiles because I have written an adapter for rapidjson::Value.

When doc.Accept(writer) is called, I get the following exception:

First-chance exception at 0x000ED9A8 in MyGame.exe: 0xC0000005: Access violation reading location 0xFEEEFEEE.
Unhandled exception at 0x000ED9A8 in MyGame.exe: 0xC0000005: Access violation reading location 0xFEEEFEEE.

Here is the stack trace:

rapidjson::GenericStringStream<rapidjson::UTF8<char> >::Peek() Line 443 C++
rapidjson::Writer<rapidjson::FileWriteStream,rapidjson::UTF8<char>,rapidjson::UTF8<char>,rapidjson::CrtAllocator>::WriteString(const char * str, unsigned int length) Line 252  C++
rapidjson::Writer<rapidjson::FileWriteStream,rapidjson::UTF8<char>,rapidjson::UTF8<char>,rapidjson::CrtAllocator>::String(const char * str, unsigned int length, bool copy) Line 126    C++
rapidjson::GenericValue<rapidjson::UTF8<char>,rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >::Accept<rapidjson::Writer<rapidjson::FileWriteStream,rapidjson::UTF8<char>,rapidjson::UTF8<char>,rapidjson::CrtAllocator> >(rapidjson::Writer<rapidjson::FileWriteStream,rapidjson::UTF8<char>,rapidjson::UTF8<char>,rapidjson::CrtAllocator> & handler) Line 1299  C++
...
...
...

I have tried StringBuffer as well; it also crashes at rapidjson::GenericStringStream<...>::Peek().

add real C++11 move semantics support

Add real C++11 move semantics support; e.g., for Value:

Value::Value(Value&& other) noexcept { ... }
Value& Value::operator=(Value&& other) noexcept { ... }

And also for moving a std::string into a Value:

Value& Value::SetString(std::string str) {
    internal_data.str = std::move(str); // or something equivalent...
    return *this;
}

With this you would be able to move strings into a Value without copying them again like this:

std::string hello = ...;
val.SetString(std::move(hello)); // string is moved, not copied!
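A self-contained sketch of what the proposed overload enables (MiniValue is a hypothetical stand-in for Value, not the RapidJSON API): taking the string by value lets the caller decide between copying and moving.

```cpp
#include <string>
#include <utility>

// Hypothetical value type demonstrating the requested move-enabled setter.
struct MiniValue {
    std::string str;
    // By-value parameter: callers pass an lvalue (one copy) or an rvalue
    // via std::move (no copy at all).
    MiniValue& SetString(std::string s) {
        str = std::move(s);  // steal the buffer instead of copying it
        return *this;
    }
};
```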

Terminate parsing from SAX handler

Currently there is no way to terminate the parsing process in GenericReader::ParseStream() from a handler.

I suggest that each handler function (e.g. StartObject(), String()) should return a bool indicating whether the event has been consumed successfully. If the return value is false, the parsing process should terminate immediately and stop reading further characters from the stream.

Document does not require this feature because it is always able to consume the event and build the DOM. However, a custom SAX handler that consumes JSON for a specific schema does need it.

Without this feature, the handler needs to keep track of error state and skip events after any error has occurred. This wastes processing power and reduces performance due to error-checking overhead in every handler function.

Another situation is when piping a Reader to a Writer, or a Document to a Writer: if the Writer fails to write (e.g. disk full), the source object will keep producing SAX events.

Adding this feature will change the API of the handler concept, though.
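The proposed contract can be sketched with a toy event driver (all names here are hypothetical, not the RapidJSON API): callbacks return false to refuse an event, and the driver stops immediately instead of pumping the rest of the stream.

```cpp
#include <vector>

// Hypothetical SAX-style event kinds.
enum Event { StartObject, Key, String, EndObject };

// A handler whose callback returns false to request early termination.
struct CountingHandler {
    int events = 0;
    int limit;
    explicit CountingHandler(int l) : limit(l) {}
    bool Handle(Event) { return ++events <= limit; }
};

// Returns true if every event was consumed; false if the handler aborted.
bool Drive(const std::vector<Event>& evts, CountingHandler& h) {
    for (Event e : evts)
        if (!h.Handle(e)) return false;  // terminate immediately
    return true;
}
```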

Update compliance to the current RFC 7159

RapidJSON currently implements RFC 4627, which was superseded by RFC 7159 in March 2014.

Among mostly editorial changes, the most important update is:

  • Changed the definition of "JSON text" so that it can be any JSON
    value, removing the constraint that it be an object or array.
    (This effectively closes #21 and issue80.)

RapidJSON already seems to be compliant with the other changes, AFAICS.

If you want to keep strict RFC4627-compliancy available, we could add a RAPIDJSON_RFC4627 macro to enforce the object/array top-level requirement. An AcceptAnyRoot API as proposed by #21 is then no longer needed.

SIMD instruction can cause seg-fault/UMR

https://code.google.com/p/rapidjson/issues/detail?id=104

According to Agner's SSE A_strlen() implementation (http://www.agner.org/optimize/asmlib.zip),

; Misaligned parts of the string are read from the nearest 16-bytes boundary
; and the irrelevant part masked out. It may read both before the begin of 
; the string and after the end, but will never load any unnecessary cache 
; line and never trigger a page fault for reading from non-existing memory 
; pages because it never reads past the nearest following 16-bytes boundary.
; It may, though, trigger any debug watch within the same 16-bytes boundary.

So it should be possible to modify RapidJSON's SkipWhitespace_SIMD() to work safely. Currently it reads 16 bytes unaligned from the start of the string, which may read past the end of a page boundary and cause a segmentation fault.

Please add license headers to source files

In order to safely track the copyright origins of the source files, e.g. for inclusion in third-party libraries, the source files should carry individual license headers. Something like:

// This file is part of RapidJSON, a fast JSON parser/generator.
// http://miloyip.github.io/rapidjson
// 
// Copyright (C) 2011-2014 Milo Yip and all Contributors
// 
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
// 
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
// 
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.

Especially the optimized algorithms for itoa (#80) and dtoa (#97), which are based on third-party implementations, should mention their origins (and licenses, if different) correctly to avoid licensing issues.

Building on Mac OS X: error: 'tmpnam' is deprecated

Building in a rather standard Mac OS X 10.9.4 install gives the following error message:

In file included from ../../test/unittest/documenttest.cpp:21:
../../test/unittest/unittest.h:71:16: error: 'tmpnam' is deprecated: This
      function is provided for compatibility reasons only. Due to security
      concerns inherent in the design of tmpnam(3), it is highly recommended
      that you use mkstemp(3) instead. [-Werror,-Wdeprecated-declarations]
    filename = tmpnam(filename);
               ^
/usr/include/stdio.h:274:7: note: 'tmpnam' declared here
char    *tmpnam(char *);
         ^
1 error generated.

Adding the compiler flag -Wno-deprecated-declarations gets past this (though the deprecation warning might also be worth addressing).

Member query performance

From the beginning of RapidJSON, the performance of FindMember() has been O(n). I jotted this down myself a long time ago.

https://code.google.com/p/rapidjson/issues/detail?id=5&can=1&q=performance

However, today I received another e-mail complaining about this.

I cannot think of a perfect solution, but I would like to write down my thoughts here for public discussion.

String equality test

When querying member by a key, it must involve string equality test operation.

Currently the string equality test is implemented as:

bool GenericValue::StringEqual(const GenericValue& rhs) const {
    return data_.s.length == rhs.data_.s.length &&
        (data_.s.str == rhs.data_.s.str // fast path for constant string
        || memcmp(data_.s.str, rhs.data_.s.str, sizeof(Ch) * data_.s.length) == 0);
}

There are three condition checks:

  1. Strings are unequal if their lengths are unequal. O(1), because the length is specified or pre-calculated.
  2. Constant-string shortcut for strings pointing to the same address (and having the same length). O(1).
  3. Compare memory in O(m), where m is the length of both strings. The worst case is when the two strings are equal.

An idea for improvement is to hash each string into a hash code. I noticed that a 32-bit hash code can be added without additional memory overhead, and I considered this in the design of the DOM:

class GenericValue {
// ...
    struct String {
        const Ch* str;
        SizeType length;
        unsigned hashcode;  //!< reserved
    };  // 12 bytes in 32-bit mode, 16 bytes in 64-bit mode
};

The hash code can be evaluated during parsing with a new option, and a new variation of SetString() or a constructor can evaluate the hash code of a string; that string can then be used to query members. We may initialize hashcode = 0 to represent an invalid hash code. The code may then become:

bool GenericValue::StringEqual(const GenericValue& rhs) const {
    if (data_.s.length != rhs.data_.s.length) return false;
    if (data_.s.str == rhs.data_.s.str) return true;  // fast path for constant string
    if (data_.s.hashcode == 0 || rhs.data_.s.hashcode == 0 || data_.s.hashcode == rhs.data_.s.hashcode)
        return memcmp(data_.s.str, rhs.data_.s.str, sizeof(Ch) * data_.s.length) == 0;
    return false;
}

Although the worst case of the test is still O(m), the test finishes in O(1) when the two strings have unequal hash codes.

For member queries, most string equality tests are expected to fail, so this may improve performance. However, there is an additional O(m) cost for evaluating the hash code of each string initially.
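The precheck can be sketched with any fast hash; this example uses 32-bit FNV-1a (an arbitrary choice, not necessarily what RapidJSON would adopt), with 0 reserved to mean "no hash computed":

```cpp
#include <cstdint>
#include <cstring>
#include <cstddef>

// 32-bit FNV-1a; 0 is reserved as "no hash yet", so remap it to 1.
std::uint32_t Hash(const char* s, std::size_t len) {
    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < len; ++i) { h ^= (unsigned char)s[i]; h *= 16777619u; }
    return h ? h : 1u;
}

struct HashedStr {
    const char* str;
    std::size_t len;
    std::uint32_t hash;  // 0 means "not computed"
};

bool Equal(const HashedStr& a, const HashedStr& b) {
    if (a.len != b.len) return false;           // O(1) length check
    if (a.str == b.str) return true;            // constant-string fast path
    if (a.hash && b.hash && a.hash != b.hash)   // O(1) rejection on hash mismatch
        return false;
    return std::memcmp(a.str, b.str, a.len) == 0;  // O(m) only when needed
}
```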

Associative Array

A JSON object is basically an associative array, mapping string keys to JSON values.
Currently members are stored in a std::vector-like manner. The order of members is controlled by the user, depending on the sequence of AddMember() calls or of the JSON being parsed. This is actually an important feature when users want to preserve the order of members.

However, of course, the query time is O(n) without additional data structure.

There are other possibilities for representing associative array:

  1. Sorted array.

    After parsing an object, the members are sorted by key in O(n log n). Binary search can then be used for queries, improving query time to O(log n). Adding a new member requires re-sorting later (amortized O(log n)). There is no additional space overhead.

    Besides, note that this requires comparing strings lexicographically (not just for equality), which is an O(m) operation where m is the minimum length of the two strings.

  2. Hash table.

    The simple way is open addressing, so the table is still a single buffer. The hash code can be computed as in the last section. Insertion, query and removal are O(1). When adding a new member pushes the load ratio (member count divided by capacity) over a constant, the hash table must be rebuilt in O(n). In addition, iterating over members is linear in the capacity, not the count.

|                | Current (Vector) | Sorted Array       | Hash Table     |
|----------------|------------------|--------------------|----------------|
| Custom order   | Yes              | No                 | No             |
| Sorted by key  | Maybe            | Yes                | No             |
| Initialization | O(n)             | O(n log n)         | O(n)           |
| AddMember      | Amortized O(1)   | Amortized O(log n) | Amortized O(1) |
| RemoveMember   | O(n)             | O(n)               | O(1)           |
| FindMember     | O(n)             | O(log n)           | O(1)           |
| IterateMember  | O(n)             | O(n)               | O(capacity)    |
| Resize         | O(n)             | O(n)               | O(capacity)    |
| ObjectEquality | O(n^2)           | O(n)               | O(n)           |
| Space          | O(capacity)      | O(capacity)        | O(capacity)    |

  • n = number of members
  • The hidden m (string length of the key) is not shown.
  • In the hash table, n <= capacity * load_ratio. In the others, n <= capacity.

If we want to support any or all of these data structures, we have two options: (1) use a template parameter to select one at compile time; or (2) use flags to select at run time, with overhead in all related operations.
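As a sketch of the sorted-array option, a sorted std::vector queried with std::lower_bound gives the O(log n) FindMember described above (toy types and names, not the RapidJSON API):

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Member = std::pair<std::string, int>;  // key -> toy "value"

// Assumes `members` is sorted by key (e.g. once after parsing, O(n log n)).
// Binary search then answers each lookup in O(log n) string comparisons.
const Member* FindMember(const std::vector<Member>& members, const std::string& key) {
    auto it = std::lower_bound(members.begin(), members.end(), key,
        [](const Member& m, const std::string& k) { return m.first < k; });
    return (it != members.end() && it->first == key) ? &*it : nullptr;
}
```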

Move performance tests into another repository

Performance tests, libraries and data are not needed by most users.

I suggest moving them to another repository dedicated to comparing performance among C/C++ JSON libraries, where everyone can add or update libraries.

The framework needs to be revised to get more precise results, and it should generate tables/charts automatically.

I think benchmarks are not just numbers; they help developers learn from each other and improve the code. That is precious in the open source community.

Floating-point reading is lossy

The floating-point value 0.9868011474609375 becomes 0.9868011474609376.

If this is a performance tradeoff, I think we should provide an option for users who prefer correctness over speed.

Use AutoUTFInputStream with StringStream

I need to support JSON files in different encodings but ultimately output a UTF-16 GenericDocument.

I can't use AutoUTFInputStream with FileReadStream directly because rapidjson is not able to filter out the JavaScript-style comments in the file. So I use another library to strip the comments and then parse the resulting string back into a GenericDocument.

Pseudo code is shown below:

rapidjson::GenericDocument<rapidjson::UTF16<> > doc;

std::string input;

// ...
// Read the input from file
// Use another library to minify the JSON string 'input'
// ...

rapidjson::StringStream ss(input.c_str());

// error C2039: 'Peek4' : is not a member of 'rapidjson::GenericStringStream<Encoding>'
rapidjson::AutoUTFInputStream<unsigned, rapidjson::StringStream> ais(ss);

doc.ParseStream<0, rapidjson::AutoUTF<unsigned>, rapidjson::AutoUTFInputStream<unsigned, rapidjson::StringStream> >(ais);

It generates error C2039 (shown in the comment above) while compiling.

Finally, I added a Peek4() function to GenericStringStream, like the one in FileReadStream.

const Ch* Peek4() const {
    return (src_ + 4 <= src_ + strlen(src_)) ? src_ : 0;
}

It works!!

However, is there another way to do this without adding custom code? Or, in other words, am I misunderstanding the usage of AutoUTFInputStream?

Writer object should be reusable and do validation by assertion

Writer should always produce well-formed JSON, and an invalid sequence of events should be detected via assertion. However, currently generating two or more JSON roots into the output stream does not trigger an assertion failure. For example,

Writer<Stream> writer(stream);
writer.StartObject();
writer.EndObject();
writer.StartObject();
writer.EndObject();

will output {}{} to the stream.

I suggest adding an explicit reset API:

Writer<Stream> writer(stream);
writer.StartObject();
writer.EndObject();
writer.Reset(newStream);
writer.StartObject();
writer.EndObject();

This way, the error of multiple root elements can be detected, and the Writer object can be reused. The advantage of reusing a Writer object is that it avoids reallocating the Writer's internal data structure (basically a stack).
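The proposed contract can be sketched with a toy writer (MiniWriter and its members are hypothetical, not the RapidJSON API): one root per output, enforced by assertion, with explicit reuse via Reset().

```cpp
#include <cassert>
#include <string>

// Hypothetical writer illustrating the proposed Reset() contract.
struct MiniWriter {
    std::string* out = nullptr;
    bool rootWritten = false;
    // Rebind to a new output and clear state, allowing reuse of the object.
    void Reset(std::string& newOut) { out = &newOut; rootWritten = false; }
    void WriteRoot(const std::string& json) {
        assert(!rootWritten && "second root without Reset()");  // validation
        *out += json;
        rootWritten = true;
    }
};
```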

Redo the documentation

IMHO, rapidjson has many good features and characteristics (fast, small, easy integration, ...) but it does not seem very popular. I think documentation is a main reason, and I hope we can do better.

The old documentation includes:

Issues:

  • The user guide is incomplete.
  • The writing style is quite verbose and uninteresting.
  • Users need to dig into the details to do some basic stuff.

I suggest redoing the documentation from the ground up. There are several decisions to be made:

  • What format should be used (Markdown?)
  • What tools can be used (https://readthedocs.org/ ?)
  • Can the tools generate API automatically?
  • What kind of documentation is needed? How to structure them?

I hope we can make some decisions and create a rough plan, and then do it step by step.

Suggestions are welcome.

Allow Reader to read only one/first JSON object

Slightly related to #66, a Reader instance should be configurable to read only one (i.e. the first) JSON object from a stream. This enables the extraction of JSON from mixed-content files and/or reusing the stream for multiple operations on a sequence of JSON objects.

Currently, the Reader reports an error (kParseErrorDocumentRootNotSingular) if the stream contains anything other than whitespace after the JSON object.

Also mentioned/requested in Google Code issue:
https://code.google.com/p/rapidjson/issues/detail?id=20#c2

This is also partly related to #36, as it may affect the error handling API.

provide a sample handler calculating a SHA1 hash for a document

I was playing a bit with rapidjson and came up with this simple Handler for calculating a SHA-1 hash value for a Document (with a little help from boost::uuids::detail::sha1):

#include <boost/uuid/sha1.hpp>

using namespace rapidjson;  // a using-directive is not allowed at class scope

struct HashCalculator {
  boost::uuids::detail::sha1 sha1;

  template<typename T>
  inline bool add_to_hash(T&& t) { return add_to_hash(&t, sizeof(t)); }
  inline bool add_to_hash(const void* str, size_t len) { sha1.process_bytes(str, len); return true; }

  inline std::string get_hash() {
    unsigned int digest[] = { 0, 0, 0, 0, 0 };
    sha1.get_digest(digest);
    return base64_encode( // use a base64 encoding lib at your liking...
      reinterpret_cast<const unsigned char*>(digest), sizeof(digest)
    );
  }

  inline bool Null()                                         { return add_to_hash('N'); }
  inline bool Bool(bool b)                                   { return add_to_hash(b ? 'T' : 'F'); }
  inline bool Int(int i)                                     { return add_to_hash(i); }
  inline bool Uint(unsigned u)                               { return add_to_hash(u); }
  inline bool Int64(int64_t i)                               { return add_to_hash(i); }
  inline bool Uint64(uint64_t u)                             { return add_to_hash(u); }
  inline bool Double(double d)                               { return add_to_hash(d); }
  inline bool String(const char* str, SizeType length, bool) { return add_to_hash(str, length); }
  inline bool StartObject()                                  { return add_to_hash('O'); }
  inline bool Key(const char* str, SizeType length, bool)    { return add_to_hash(str, length); }
  inline bool EndObject(SizeType memberCount)                { return add_to_hash(memberCount); }
  inline bool StartArray()                                   { return add_to_hash('A'); }
  inline bool EndArray(SizeType elementCount)                { return add_to_hash(elementCount); }
};

std::string calculate_hash(rapidjson::Document const& doc_) {
  HashCalculator hasher;
  doc_.Accept(hasher);
  return hasher.get_hash();
}

Maybe you find this useful and want to provide it as an additional example?

Push mirror of SVN repository to GitHub?

There are a bunch of rapidjson forks available on GitHub already, each with a different set of fixes and differing histories. See for instance (skipping the ones without any additional fixes/features beyond the current trunk):

Is there any chance to merge those repositories to be based on a single, "official" SVN mirror here at GitHub? Even if the current upstream development of rapidjson seems to have slowed down, maybe a more collaborative model could help to keep the project alive and to join forces among the current users.

Thanks for your work on rapidjson!

Strange stream object copying while parsing

https://code.google.com/p/rapidjson/issues/detail?id=78

In a rapidjson reader class there's some strange code:

template<typename Stream>
void SkipWhitespace(Stream& stream)
{
    Stream s = stream;  // Use a local copy for optimization
    while (s.Peek() == ' ' || s.Peek() == '\n' || s.Peek() == '\r' || s.Peek() == '\t')
        s.Take();
    stream = s;
}

Why is the stream object copied and reassigned? What optimization does this give compared to accessing it directly via reference? What if copying the stream object is itself a heavy operation that can also throw an exception? What if I want to create a stream class with non-copyable semantics? To me, all of this is a clear pessimization instead of an optimization.

Encoding conversion

From [email protected] on November 27, 2011 00:33:27

Currently, the input and output of Reader uses the same encoding.

It is often needed to read a stream in one encoding (e.g. UTF-8) and output strings in another encoding (e.g. UTF-16). Or, the other way around, stringify a DOM in one encoding (e.g. UTF-16) to an output stream in another encoding (e.g. UTF-8).

The simplest solution is converting the stream into a memory buffer of the other encoding first. This requires extra memory storage and memory accesses.

Another solution is to convert the input stream into the other encoding before sending it to the parser. However, only the characters inside JSON strings actually need to be converted; converting the other characters just wastes time.

The third solution is letting the parser distinguish the input and output encodings. It uses an encoding converter to convert the characters of JSON strings. However, since the output length may be longer than the original length, in situ parsing cannot be permitted.

Try to design a mechanism to generalize encoding conversion. And it should support UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE. It can also support automatic encoding detection with BOM, while incurring some overheads in dynamic dispatching.

Original issue: http://code.google.com/p/rapidjson/issues/detail?id=4

Potential endianness detection problem

Upstream issue https://code.google.com/p/rapidjson/issues/detail?id=100

Currently endianness is very important in Value because it relies on it for aligning 32-bit and 64-bit integers, in order to implement "no overhead integer conversion".

This "optimization" depends on the RAPIDJSON_ENDIAN macro. Currently it assumes little endian if GCC's __BYTE_ORDER__ is not defined. This is error-prone.

Several potential fixes:

  1. Better endianness detection with more compilers.
  2. If unable to detect, fail to compile and ask user to define the macro explicitly.
  3. Automatic validation of endianness in compile-time or run-time.

We shall check how other cross-platform libraries deal with this issue.

Strict/Relaxed JSON syntax

Currently RapidJSON tries to be as strict as possible to the JSON spec.
There have been several requests to add relaxed JSON syntax for parsing. Some possible extensions are listed here.

Root JSON type

  • A document with types other than object and array. (Done in #101)

Comment

  • Single line JS Comment (#443)
  • Multiple line JS Comment (#443)
  • Comment handler event
  • Comment stored in Value

Object

  • Keys without quotes (limited to a confined character set, e.g. (A-Za-z\-_)+[0-9A-Za-z\-_]*)
  • extra comma at the end (#584)

Array

  • extra comma at the end (#584)

String

  • Single quote pair

Number

  • Fractions beginning with ., e.g. .123
  • Numbers with '.' but without fraction digits, e.g. 0., 123.
  • Hexadecimal integer, e.g. 0xDEADBEEF
  • Infinity and NaN (#641)

These relaxed syntaxes may be useful for some applications. However, they may increase the complexity of the code base.

Would like to gather opinions on this.

Parse error for very small yet valid double numbers

Note - this is a different issue than the similar one solved last month:
https://code.google.com/p/rapidjson/issues/detail?id=75

I've found that numbers with an exponent smaller than -308 cause parse errors in rapidjson. This is weird since web browsers have no problem serializing these numbers during JSON.stringify. I've been trying to get to the bottom of what the various specs say about what the minimum representable number is.

Consider the following code:

JSON.stringify(Number.MIN_VALUE);

It outputs "5e-324" which is indeed the value of Number.MIN_VALUE.
According to:
http://en.wikipedia.org/wiki/Double-precision_floating-point_format
the minimum positive (denormal) double is approx 4.9406564584124654e-324 and the maximum is approx 1.7976931348623157e308, and this is presumably what motivated the values of Number.MIN_VALUE and Number.MAX_VALUE.

There's the following code in rapidjson's reader.h:

if (s.Peek() >= '0' && s.Peek() <= '9') {
    exp = s.Take() - '0';
    while (s.Peek() >= '0' && s.Peek() <= '9') {
        exp = exp * 10 + (s.Take() - '0');
        if (exp > 308)
            RAPIDJSON_PARSE_ERROR(kParseErrorNumberTooBig, s.Tell());
    }
}

Debugging my code shows that it's this exp > 308 check that's catching numbers smaller than -308. So it looks like this is correct for positive exponents but not for negative ones. Changing the code to the following seems to fix the issue in my code - does this look reasonable?

if (s.Peek() >= '0' && s.Peek() <= '9') {
    exp = s.Take() - '0';
    while (s.Peek() >= '0' && s.Peek() <= '9') {
        exp = exp * 10 + (s.Take() - '0');
        if (( (!expMinus) && (exp > 308) ) || (expMinus && (exp > 324) ))
            RAPIDJSON_PARSE_ERROR(kParseErrorNumberTooBig, s.Tell());
    }
}

Thanks!

Changing the Handler interface to differentiate between key and string

Would it be possible to introduce a slight interface breaking change to the Handler interface?

Consider this example from the SAX documentation:

{
    "hello": "world",
    "t": true ,
    "f": false,
    "n": null,
    "i": 123,
    "pi": 3.1416,
    "a": [1, 2, 3, 4]
}

which creates the following calling sequence for a Handler:

StartObject()
String("hello", 5, true)
String("world", 5, true)
String("t", 1, true)
Bool(true)
String("f", 1, true)
Bool(false)
String("n", 1, true)
Null()
String("i", 1, true)
Uint(123)
String("pi", 2, true)
Double(3.1416)
String("a", 1, true)
StartArray()
Uint(1)
Uint(2)
Uint(3)
Uint(4)
EndArray(4)
EndObject(7)

My problem with this is that the String() method is overloaded and used for two different purposes:

  1. for defining a string value, and
  2. for defining the key of a key-value pair inside the currently active object.

A custom Handler thus has to keep track of the alternating calling sequence of String() and X, where X can be Bool(), Uint(), ... StartObject(), StartArray(), and even String() itself, and it must do so even for nested objects!

This could be solved by introducing an additional method for specifying the key of a key-value pair within an active object, e.g.:

bool Key(const char* k, SizeType length, bool copy);

The calling sequence from above would change to this:

StartObject()
Key("hello", 5, true)
String("world", 5, true)
Key("t", 1, true)
Bool(true)
Key("f", 1, true)
Bool(false)
Key("n", 1, true)
Null()
Key("i", 1, true)
Uint(123)
Key("pi", 2, true)
Double(3.1416)
Key("a", 1, true)
StartArray()
Uint(1)
Uint(2)
Uint(3)
Uint(4)
EndArray(4)
EndObject(7)

This would allow much simpler Handler implementations.

But I recognize that this would be a rather breaking API change...

AddMember() may be incorrectly used.

In upstream issue 66,

a user misused the AddMember() API:

doc.AddMember(lvl.c_str(), score, doc.GetAllocator());

lvl.c_str() points into a temporary string; it needs to be duplicated, otherwise the pointer becomes invalid once the string is destroyed.

For this situation, the user should call

rapidjson::Value name(lvl.c_str(), doc.GetAllocator()); // copy the string
rapidjson::Value value(score);
doc.AddMember(name, value, doc.GetAllocator());

But this may be confusing for users.

All overloads of AddMember(const char* name, ...) assume the name is a string literal, so no copy needs to be made. But the API may be misused as shown above.

Any suggestion to improve the API for DOM manipulation?

Reader.ParseNumberHandler fails on 32-bit

The test Reader.ParseNumberHandler fails on some 32-bit configurations both with GCC and Clang in release mode, see the build for #103:

[ RUN      ] Reader.ParseNumberHandler
../../test/unittest/readertest.cpp:170: Failure
Value of: h.step_
Actual: 0
Expected: 1u
Which is: 1
../../test/unittest/readertest.cpp:170: Failure
Value of: h.actual_
Actual: 0
Expected: 1E308
Which is: 1e+308
[ FAILED ] Reader.ParseNumberHandler (0 ms)

The Double handler is not even called in release mode for the corner-case value of 10...0 (with 308 zeros); instead, parsing (silently) fails with kParseErrorNumberTooBig.

Better compatibility of 64-bit integer literals

There are compiler differences in parsing 64-bit integer literals.

Some C++ compilers generate a warning if the ULL suffix is used, while others complain when it is missing.

https://code.google.com/p/rapidjson/issues/detail?id=10

GCC 4.2 on MacOS: integer constant is to large for 'long' type

The standard C way is to use the UINT64_C() macro and friends from <stdint.h>. But in C++, __STDC_CONSTANT_MACROS=1 must be defined before including this header.

Currently RapidJSON includes this file in rapidjson.h. But it will become a problem if the user includes <stdint.h> before rapidjson.h. A workaround is to define the macro in the compiler settings, which is quite annoying. See https://code.google.com/p/rapidjson/issues/detail?id=65#c8

I propose removing the dependency on the UINT64_C() macro by defining a macro like:

#define RAPIDJSON_UINT64(high32, low32) ((uint64_t(high32) << 32) | (low32))

Migrate Premake to CMake

Due to the long-delayed official premake release, many problems cannot be solved easily. For example, VC2012/2013 are not supported yet, Mavericks (OS X 10.9) has problems with the linking options, Xcode for iOS, etc.

I would like to migrate the build system to CMake, which is more active and widely used.

Compilation error on gcc 3.4.6

Hello,

I hit some trouble using the latest version on an old RedHat:

In file included from JavaScriptRuntime.cc:126:
/home/albert/albert/3albert/include/arch_independent/rapidjson/document.h: In destructor 'rapidjson::GenericValue<Encoding, Allocator>::~GenericValue()':
/home/albert/albert/3albert/include/arch_independent/rapidjson/document.h:489: error: expected class-name before '(' token
/home/albert/albert/3albert/include/arch_independent/rapidjson/document.h: In member function 'typename rapidjson::GenericMemberIterator<false, Encoding, Allocator>::Iterator rapidjson::GenericValue<Encoding, Allocator>::RemoveMember(typename rapidjson::GenericMemberIterator<false, Encoding, Allocator>::Iterator)':
/home/albert/albert/3albert/include/arch_independent/rapidjson/document.h:924: error: expected class-name before '(' token

The error comes from:
m->~GenericMember();

gcc -v
Reading specs from /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-java-awt=gtk --host=x86_64-redhat-linux
Thread model: posix
gcc version 3.4.6 20060404 (Red Hat 3.4.6-9)

Of course, it builds without problems on the same host with gcc 4.4.7.

Any clue?
Thanks

Make fails to build on OSX

Greetings! I was interested in testing your demos out to see how the library performs, but it seems there is some issue with the *.make files, as the bin/ directory is devoid of most of the executables. Here is a copy of my shell log. Hope it helps.

lylemoffitt% git clone https://github.com/miloyip/rapidjson.git
Cloning into 'rapidjson'...
remote: Counting objects: 3606, done.
remote: Compressing objects: 100% (225/225), done.
remote: Total 3606 (delta 132), reused 0 (delta 0)
Receiving objects: 100% (3606/3606), 5.70 MiB | 1.41 MiB/s, done.
Resolving deltas: 100% (2173/2173), done.
Checking connectivity... done.
lylemoffitt% cd rapidjson
lylemoffitt% ls
bin         build       doc         example     include     license.txt readme.md   test        thirdparty
lylemoffitt% cd build
lylemoffitt% ls
Doxyfile          premake.bat       premake.sh        premake4.lua      travis-doxygen.sh
lylemoffitt% premake4 gmake --os=macosx
Building configurations...
Running action 'gmake'...
Generating gmake/test.make...
Generating gmake/gtest.make...
Generating gmake/unittest.make...
Generating gmake/perftest.make...
Generating gmake/example.make...
Generating gmake/capitalize.make...
Generating gmake/condense.make...
Generating gmake/messagereader.make...
Generating gmake/pretty.make...
Generating gmake/prettyauto.make...
Generating gmake/serialize.make...
Generating gmake/simpledom.make...
Generating gmake/simplereader.make...
Generating gmake/simplewriter.make...
Generating gmake/tutorial.make...
Done.
lylemoffitt% cd gmake
lylemoffitt% ls
capitalize.make    example.make       messagereader.make pretty.make        serialize.make     simplereader.make  test.make          unittest.make
condense.make      gtest.make         perftest.make      prettyauto.make    simpledom.make     simplewriter.make  tutorial.make
lylemoffitt% gnumake -f *.make all
gnumake: Nothing to be done for `condense.make'.
gnumake: Nothing to be done for `example.make'.
gnumake: Nothing to be done for `gtest.make'.
gnumake: Nothing to be done for `messagereader.make'.
gnumake: Nothing to be done for `perftest.make'.
gnumake: Nothing to be done for `pretty.make'.
gnumake: Nothing to be done for `prettyauto.make'.
gnumake: Nothing to be done for `serialize.make'.
gnumake: Nothing to be done for `simpledom.make'.
gnumake: Nothing to be done for `simplereader.make'.
gnumake: Nothing to be done for `simplewriter.make'.
gnumake: Nothing to be done for `test.make'.
gnumake: Nothing to be done for `tutorial.make'.
gnumake: Nothing to be done for `unittest.make'.
Creating ../../intermediate/debug/gmake/capitalize/x32
capitalize.cpp
Linking capitalize
ld: warning: directory not found for option '-L/usr/lib32'
lylemoffitt% ls ../../bin
capitalize_debug_x32_gmake data                       encodings                  jsonchecker
