Tencent is pleased to support the open source community by making RapidJSON available.
Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
- RapidJSON GitHub
- RapidJSON Documentation
RapidJSON is a JSON parser and generator for C++. It was inspired by RapidXml.
- RapidJSON is small but complete. It supports both SAX and DOM style APIs. The SAX parser is only about half a thousand lines of code.
- RapidJSON is fast. Its performance can be comparable to `strlen()`. It also optionally supports SSE2/SSE4.2 for acceleration.
- RapidJSON is self-contained and header-only. It does not depend on external libraries such as BOOST. It does not even depend on STL.
- RapidJSON is memory-friendly. Each JSON value occupies exactly 16 bytes on most 32/64-bit machines (excluding text string). By default it uses a fast memory allocator, and the parser allocates memory compactly during parsing.
- RapidJSON is Unicode-friendly. It supports UTF-8, UTF-16 and UTF-32 (LE & BE), and their detection, validation and transcoding internally. For example, you can read a UTF-8 file and let RapidJSON transcode the JSON strings into UTF-16 in the DOM. It also supports surrogates and "\u0000" (the null character).
More features can be read here.
JSON (JavaScript Object Notation) is a lightweight data exchange format. RapidJSON should be in full compliance with RFC7159/ECMA-404, with optional support of relaxed syntax. More information about JSON can be obtained at
- Introducing JSON
- RFC7159: The JavaScript Object Notation (JSON) Data Interchange Format
- Standard ECMA-404: The JSON Data Interchange Format
- Added JSON Pointer
- Added JSON Schema
- Added relaxed JSON syntax (comment, trailing comma, NaN/Infinity)
- Iterating array/object with C++11 Range-based for loop
- Reduce memory overhead of each `Value` from 24 bytes to 16 bytes on x86-64 architecture.
For other changes please refer to the change log.
RapidJSON is cross-platform. Some platform/compiler combinations which have been tested are shown as follows.
- Visual C++ 2008/2010/2013 on Windows (32/64-bit)
- GNU C++ 3.8.x on Cygwin
- Clang 3.4 on Mac OS X (32/64-bit) and iOS
- Clang 3.4 on Android NDK
Users can build and run the unit tests on their platform/compiler.
RapidJSON is a header-only C++ library. Just copy the `include/rapidjson` folder to the system or your project's include path.
Alternatively, if you are using the vcpkg dependency manager you can download and install rapidjson with CMake integration in a single command:
- vcpkg install rapidjson
RapidJSON uses the following software as its dependencies:
- CMake as a general build tool
- (optional) Doxygen to build documentation
- (optional) googletest for unit and performance testing
To generate user documentation and run tests please proceed with the steps below:
- Execute `git submodule update --init` to get the files of thirdparty submodules (google test).
- Create a directory called `build` in the rapidjson source directory.
- Change to the `build` directory and run the `cmake ..` command to configure your build. Windows users can do the same with the cmake-gui application.
- On Windows, build the solution found in the build directory. On Linux, run `make` from the build directory.
On successful build you will find the compiled test and example binaries in the `bin` directory. The generated documentation will be available in the `doc/html` directory of the build tree. To run tests after a finished build, run `make test` or `ctest` from your build tree. You can get detailed output using the `ctest -V` command.
It is possible to install the library system-wide by running `make install` from the build tree with administrative privileges. This will install all files according to system preferences. Once RapidJSON is installed, it is possible to use it from other CMake projects by adding the `find_package(RapidJSON)` line to your CMakeLists.txt.
This simple example parses a JSON string into a document (DOM), makes a simple modification of the DOM, and finally stringifies the DOM back to a JSON string.
```cpp
// rapidjson/example/simpledom/simpledom.cpp
#include "rapidjson/document.h"
#include "rapidjson/writer.h"
#include "rapidjson/stringbuffer.h"
#include <iostream>

using namespace rapidjson;

int main() {
    // 1. Parse a JSON string into DOM.
    const char* json = "{\"project\":\"rapidjson\",\"stars\":10}";
    Document d;
    d.Parse(json);

    // 2. Modify it by DOM.
    Value& s = d["stars"];
    s.SetInt(s.GetInt() + 1);

    // 3. Stringify the DOM
    StringBuffer buffer;
    Writer<StringBuffer> writer(buffer);
    d.Accept(writer);

    // Output {"project":"rapidjson","stars":11}
    std::cout << buffer.GetString() << std::endl;
    return 0;
}
```
Note that this example does not handle potential parse errors.
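Checking for them is straightforward; the following is a minimal sketch using the `HasParseError()`/`GetParseError()` API and the English message table in `rapidjson/error/en.h`:

```cpp
#include "rapidjson/document.h"
#include "rapidjson/error/en.h"
#include <cstdio>

int main() {
    const char* json = "{\"project\":\"rapidjson\",\"stars\":}";  // deliberately malformed
    rapidjson::Document d;
    d.Parse(json);
    if (d.HasParseError()) {
        // GetParseError() returns a ParseErrorCode; GetParseError_En() maps it to text.
        std::fprintf(stderr, "Parse error at offset %zu: %s\n",
                     d.GetErrorOffset(),
                     rapidjson::GetParseError_En(d.GetParseError()));
        return 1;
    }
    return 0;
}
```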
More examples are available:

- DOM API
  - tutorial: Basic usage of DOM API.
- SAX API
  - simplereader: Dumps all SAX events while parsing a JSON by `Reader`.
  - condense: A command line tool to rewrite a JSON, with all whitespaces removed.
  - pretty: A command line tool to rewrite a JSON with indents and newlines by `PrettyWriter`.
  - capitalize: A command line tool to capitalize strings in JSON.
  - messagereader: Parse a JSON message with SAX API.
  - serialize: Serialize a C++ object into JSON with SAX API.
  - jsonx: Implements a `JsonxWriter` which stringifies SAX events into JSONx (a kind of XML) format. The example is a command line tool which converts input JSON into JSONx format.
- Schema
  - schemavalidator: A command line tool to validate a JSON with a JSON schema.
- Advanced
  - prettyauto: A modified version of pretty to automatically handle JSON with any UTF encodings.
  - parsebyparts: Implements an `AsyncDocumentParser` which can parse JSON in parts, using a C++11 thread.
  - filterkey: A command line tool to remove all values with a user-specified key.
  - filterkeydom: Same tool as above, but it demonstrates how to use a generator to populate a `Document`.
RapidJSON welcomes contributions. When contributing, please follow the guidelines below.
Feel free to submit issues and enhancement requests.
Please help us by providing minimal reproducible examples; minimal source code makes it easier for others to understand what is happening. For crash problems on certain platforms, please include a stack dump along with details of the OS, compiler, etc.
Please try breakpoint debugging first and tell us what you found, so we can start exploring from the information you have prepared.
In general, we follow the "fork-and-pull" Git workflow.
- Fork the repo on GitHub
- Clone the project to your own machine
- Checkout a new branch on your fork, start developing on the branch
- Test the change before committing. Make sure the changes pass all the tests, including `unittest` and `perftest`. Please add a test case for each new feature or bug-fix if needed.
- Commit changes to your own branch
- Push your work back up to your fork
- Submit a Pull request so that we can review your changes
NOTE: Be sure to merge the latest from "upstream" before making a pull request!
You can copy and paste the license summary from below.
Tencent is pleased to support the open source community by making RapidJSON available.
Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
Licensed under the MIT License (the "License"); you may not use this file except
in compliance with the License. You may obtain a copy of the License at
http://opensource.org/licenses/MIT
Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
rapidjson's Issues
Inconsistent Values
Hi,
I am writing a program using RapidJSON.
I loop through a (rather large) file, parsing it one line at a time with getline first, expecting one JSON object per line as a string `char *line`.
Within the body of my loop, I invoke the following:
Document d;
doc.Parse(line);
float receive_val = (float) doc["receive"].GetDouble();
However if I invoke within the loop:
printf("%f\n", receive_val);
I get a wildly different float value than if I call:
printf("%f\n", (float) doc["receive"].GetDouble());
The latter way produces the correct value as reflected in the file.
Attempting to reproduce the problem in a new standalone test file with a specific json input file, I discovered both methods were returning the wrong value in my test. I found this also occurred with GetString() when I changed my json field to string values.
Any intuition or insight on why this is occurring and how to work around it would be appreciated.
Thanks!
Running with this commit
2e0b3de
On Mac OS X 10.9
Compiling with g++
Object's member access is O(n)
From [email protected] on November 30, 2011 13:31:34
Object member access, i.e. Value::FindMember(), is in O(n) by linear searching.
Design a faster approach for member access.
Original issue: http://code.google.com/p/rapidjson/issues/detail?id=5
Why not add XML support too to replace RapidXML?
Since RapidXML is gathering dust, wouldn't it be cool to use the same cool techniques from RapidJSON and apply them to support XML (SAX) too? 👯
Research on using custom stack instead of call-stack in Reader
Currently, ParseObject(), ParseArray() and ParseValue() will recursively call other parse functions. This has a potential stack overflow problem if a JSON tree is very deep (perhaps a JSON synthesized for a security attack).
- Research changing these recursive calls to use a custom stack.
- Evaluate the performance impact
- May add a configurable limit of tree depth.
Writer/PrettyWriter conversion fails
Hi,
I discovered a tiny "issue" with the conversion of the PrettyWriter to Writer base class with default allocator template argument.
GenericStringBuffer<UTF8<>> buffer;
Writer<GenericStringBuffer<UTF8<>>, UTF16<>>* writer = new PrettyWriter<GenericStringBuffer<UTF8<>>, UTF16<>>(buffer);
Visual Studio error description:
a value of type "rapidjson::PrettyWriter<rapidjson::GenericStringBuffer<rapidjson::UTF8<char>, rapidjson::CrtAllocator>, rapidjson::UTF16<wchar_t>, rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>> *" cannot be used to initialize an entity of type "rapidjson::Writer<rapidjson::GenericStringBuffer<rapidjson::UTF8<char>, rapidjson::CrtAllocator>, rapidjson::UTF16<wchar_t>, rapidjson::UTF8<char>, rapidjson::CrtAllocator> *"
The difference is the last template default parameter.
Writer -> rapidjson::CrtAllocator
PrettyWriter -> rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator>
Probably because of this change in "writer.h":
change history: "Change Reader/Writer's stack allocator to CrtAllocator"
id: 2e23787
Clang -std=gnu++11: error: use of overloaded operator '!=' is ambiguous
Using -std=gnu++11 when building RapidJSON using Clang results in build errors like this:
In file included from ../../test/unittest/valuetest.cpp:22:
../../include/rapidjson/document.h:646:76: error: use of overloaded operator
'!=' is ambiguous (with operand types 'const
GenericValue<rapidjson::UTF8<char>,
rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >' and 'const
GenericValue<rapidjson::UTF8<char>,
rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >')
...== rhs.MemberEnd() || lhsMemberItr->value != rhsMemberItr->value)
~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~
Access violation when saving JSON via Document::Accept()
My function is like this; it loads data from msgpack and tries to save it as JSON.
void Data::load_msgpack(std::string& buf)
{
msgpack::unpacked msg;
msgpack::unpack(&msg, buf.data(), buf.size());
rapidjson::Document doc;
msg.get().convert(static_cast<rapidjson::Value*>(&doc));
using namespace rapidjson;
FILE* fp = fopen("D:/Users/xpol/Desktop/Dragon.mpack.json", "wb"); // non-Windows use "w"
char writeBuffer[65536];
FileWriteStream os(fp, writeBuffer, sizeof(writeBuffer));
Writer<FileWriteStream> writer(os);
doc.Accept(writer);
fclose(fp);
}
The line `msg.get().convert(static_cast<rapidjson::Value*>(&doc));` compiles because I have written an adapter for `rapidjson::Value`.
When `doc.Accept(writer)` is called, I get the following exception:
First-chance exception at 0x000ED9A8 in MyGame.exe: 0xC0000005: Access violation reading location 0xFEEEFEEE.
Unhandled exception at 0x000ED9A8 in MyGame.exe: 0xC0000005: Access violation reading location 0xFEEEFEEE.
Here is the stack trace:
rapidjson::GenericStringStream<rapidjson::UTF8<char> >::Peek() Line 443 C++
rapidjson::Writer<rapidjson::FileWriteStream,rapidjson::UTF8<char>,rapidjson::UTF8<char>,rapidjson::CrtAllocator>::WriteString(const char * str, unsigned int length) Line 252 C++
rapidjson::Writer<rapidjson::FileWriteStream,rapidjson::UTF8<char>,rapidjson::UTF8<char>,rapidjson::CrtAllocator>::String(const char * str, unsigned int length, bool copy) Line 126 C++
rapidjson::GenericValue<rapidjson::UTF8<char>,rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >::Accept<rapidjson::Writer<rapidjson::FileWriteStream,rapidjson::UTF8<char>,rapidjson::UTF8<char>,rapidjson::CrtAllocator> >(rapidjson::Writer<rapidjson::FileWriteStream,rapidjson::UTF8<char>,rapidjson::UTF8<char>,rapidjson::CrtAllocator> & handler) Line 1299 C++
...
...
...
I have also tried `StringBuffer`; it likewise crashes at `rapidjson::GenericStringStream<...>::Peek()`.
add real C++11 move semantics support
Add real C++11 move semantics support; e.g., for `Value`:
Value::Value(Value&& other) noexcept { ... }
Value& Value::operator=(Value&& other) noexcept { ... }
And also for moving a `std::string` into a `Value`:
Value& Value::SetString(std::string str) {
internal_data.str = std::move(str); // or something equivalent...
return *this;
}
With this you would be able to move strings into a `Value` without copying them again, like this:
std::string hello = ...;
val.SetString(std::move(hello)); // string is moved, not copied!
Terminate parsing from SAX handler
Currently there is no way to terminate the parsing process in GenericReader::ParseStream()
from a handler.
I suggest that each handler function (e.g. `StartObject()`, `String()`) should return a `bool`, indicating whether the event has been consumed successfully. If the return value is false, the parsing process should be terminated immediately and stop reading further characters from the stream.
`Document` does not require this feature because it is always able to consume the event and build the DOM. However, custom SAX handlers which consume JSON for a specific schema need this.
Without this feature, the handler needs to keep track of an error state and skip events after any error has occurred. This wastes processing power and reduces performance due to error checking overheads in every handler function.
Another situation is when directing `Reader` to `Writer` or `Document` to `Writer`: if the `Writer` fails to write (e.g. disk full), the source object will continue producing SAX events.
Adding this feature will change the API of the handler concept, though.
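The bool-returning handler interface was eventually adopted; as an illustration, a handler that stops parsing early might look like the following sketch (assuming the current `Reader::Parse`/`ParseResult` API, with termination reported as `kParseErrorTermination`; the handler name and limit logic are made up):

```cpp
#include "rapidjson/reader.h"
#include <cstdio>

// Accepts at most `limit` string events, then asks the reader to stop.
struct LimitedStringHandler
    : rapidjson::BaseReaderHandler<rapidjson::UTF8<>, LimitedStringHandler> {
    explicit LimitedStringHandler(size_t limit) : remaining(limit) {}
    bool String(const char* str, rapidjson::SizeType length, bool /*copy*/) {
        std::printf("string: %.*s\n", static_cast<int>(length), str);
        return remaining-- > 0;  // returning false terminates parsing immediately
    }
    size_t remaining;
};

int main() {
    const char* json = "[\"a\", \"b\", \"c\", \"d\"]";
    rapidjson::StringStream ss(json);
    rapidjson::Reader reader;
    LimitedStringHandler handler(2);
    rapidjson::ParseResult ok = reader.Parse(ss, handler);
    if (!ok)  // handler-requested termination shows up as a parse error
        std::printf("parsing stopped at offset %zu\n", ok.Offset());
    return 0;
}
```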
Remove setjmp()/longjmp()
https://code.google.com/p/rapidjson/issues/detail?id=79
As reported by Anton.Breusov, there are potential issues with setjmp()/longjmp().
Try removing them and evaluate the performance loss.
"pretty" example is much slower than YAJL's reformat
From [email protected] on November 22, 2011 11:31:26
https://github.com/lloyd/yajl_vs_rapidjson This contradicts the results shown in https://code.google.com/p/rapidjson/wiki/Performance
Original issue: http://code.google.com/p/rapidjson/issues/detail?id=1
Update compliancy to current RFC7159
RapidJSON currently implements RFC4627, which has been superseded by RFC7159 in March 2014.
Among mostly editorial changes, the most important update is:
- Changed the definition of "JSON text" so that it can be any JSON
value, removing the constraint that it be an object or array.
(This effectively closes #21 and issue80.)
RapidJSON seems to be compliant with the other changes already, AFAICS.
If you want to keep strict RFC4627 compliance available, we could add a `RAPIDJSON_RFC4627` macro to enforce the object/array top-level requirement. An `AcceptAnyRoot` API as proposed by #21 is then no longer needed.
SIMD instruction can cause seg-fault/UMR
https://code.google.com/p/rapidjson/issues/detail?id=104
According to Agner's SSE `A_strlen()` implementation (http://www.agner.org/optimize/asmlib.zip):
; Misaligned parts of the string are read from the nearest 16-bytes boundary
; and the irrelevant part masked out. It may read both before the begin of
; the string and after the end, but will never load any unnecessary cache
; line and never trigger a page fault for reading from non-existing memory
; pages because it never reads past the nearest following 16-bytes boundary.
; It may, though, trigger any debug watch within the same 16-bytes boundary.
So it should be possible to modify RapidJSON's `SkipWhitespace_SIMD()` to make it work safely. Currently it reads 16 bytes unaligned from the start of the string, which may read bytes past the end of a page boundary and cause a segmentation fault.
Please add license headers to source files
In order to safely track the copyright origins of the source files, e.g. for inclusion in third-party libraries, the source files should carry individual license headers. Something like:
// This file is part of RapidJSON, a fast JSON parser/generator.
// http://miloyip.github.io/rapidjson
//
// Copyright (C) 2011-2014 Milo Yip and all Contributors
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
Especially the optimized algorithms for `itoa` (#80) and `dtoa` (#97), which are based on third-party implementations, should mention their origins (and licenses, if different) correctly to avoid licensing issues.
Write JSON with all Unicode character escaped
Original upstream issue https://code.google.com/p/rapidjson/issues/detail?id=40
By escaping all Unicode characters (U+0080 and above) in JSON strings into escaped character sequences ("\u0080" and so on), the output is in ASCII encoding.
This may be useful when some (external) applications can only deal with ASCII encoding.
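One way to achieve this in current RapidJSON is the `ASCII<>` target encoding of `Writer`; the following is a minimal sketch (the encoding names come from the library's encodings.h, everything else is illustrative):

```cpp
#include "rapidjson/document.h"
#include "rapidjson/writer.h"
#include "rapidjson/stringbuffer.h"
#include <iostream>

int main() {
    rapidjson::Document d;
    d.Parse("{\"text\":\"\xE4\xBD\xA0\xE5\xA5\xBD\"}");  // UTF-8 bytes of a non-ASCII string

    rapidjson::StringBuffer buffer;
    // Using ASCII<> as the target encoding makes the writer emit \uXXXX escapes
    // for every character above U+007F, so the output is pure ASCII.
    rapidjson::Writer<rapidjson::StringBuffer, rapidjson::UTF8<>, rapidjson::ASCII<>> writer(buffer);
    d.Accept(writer);
    std::cout << buffer.GetString() << std::endl;  // {"text":"\u4F60\u597D"}
    return 0;
}
```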
compiler error (preprocessor) while using std::string
Hi,
at document.h (line 1268) there is a slight typo:
`#if RAPIDJSON_HAS_STDSTRING`
should be
`#ifdef RAPIDJSON_HAS_STDSTRING`
commit id: 744b485
Building on Mac OS X: error: 'tmpnam' is deprecated
Building in a rather standard Mac OS X 10.9.4 install gives the following error message:
In file included from ../../test/unittest/documenttest.cpp:21:
../../test/unittest/unittest.h:71:16: error: 'tmpnam' is deprecated: This
function is provided for compatibility reasons only. Due to security
concerns inherent in the design of tmpnam(3), it is highly recommended
that you use mkstemp(3) instead. [-Werror,-Wdeprecated-declarations]
filename = tmpnam(filename);
^
/usr/include/stdio.h:274:7: note: 'tmpnam' declared here
char *tmpnam(char *);
^
1 error generated.
Adding the compiler flag `-Wno-deprecated-declarations` gets past this (but the deprecation warning might also be worth following).
Tag for each release
Hi,
Please provide tags for every release.
Thank you!
Member query performance
From the beginning of RapidJSON, the performance of `FindMember()` has been O(n). I personally also jotted this down a long time ago:
https://code.google.com/p/rapidjson/issues/detail?id=5&can=1&q=performance
However, today I got e-mail complaining on this again.
I cannot think of a good solution for this. But I would like to write down the thoughts here for public discussion.
String equality test
When querying a member by a key, a string equality test operation must be involved.
Currently the string equality test is implemented as:
bool GenericValue::StringEqual(const GenericValue& rhs) const {
return data_.s.length == rhs.data_.s.length &&
(data_.s.str == rhs.data_.s.str // fast path for constant string
|| memcmp(data_.s.str, rhs.data_.s.str, sizeof(Ch) * data_.s.length) == 0);
}
There are three condition checks:
- Strings are unequal if their lengths are unequal. O(1) because length is specified or pre-calculated.
- Constant string shortcut for strings pointed to same address (and have same length). O(1)
- Compare memory for O(m), where m is the length of both strings. The worst case is when two strings are equal.
An idea for improvement is to hash the string into a hash code. I noticed that it is possible to add 32-bit hash code without additional memory overhead, and have considered this in the design of the DOM:
class GenericValue {
// ...
struct String {
const Ch* str;
SizeType length;
unsigned hashcode; //!< reserved
}; // 12 bytes in 32-bit mode, 16 bytes in 64-bit mode
};
The hashcode can be evaluated during parsing with a new option. Then a new variation of `SetString()` or a constructor can evaluate the hash code of a string. Finally, use that string to query the member. We may initialize `hashcode = 0` to represent an invalid hash code. Then the code may become:
bool GenericValue::StringEqual(const GenericValue& rhs) const {
if (data_.s.length != rhs.data_.s.length) return false;
if (data_.s.str == rhs.data_.s.str) return true; // fast path for constant string
if (data_.s.hashcode == 0 || rhs.data_.s.hashcode == 0 || data_.s.hashcode == rhs.data_.s.hashcode)
return memcmp(data_.s.str, rhs.data_.s.str, sizeof(Ch) * data_.s.length) == 0;
return false;
}
Although the worst case of the test is still O(m), the test can be finished in O(1) when two strings are unequal (thus their hash codes are unequal).
For member query, most string equal tests should be false, so this may improve performance. However, there will be additional O(m) costs for evaluating hash code of each string initially.
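For reference, a hash that could fill the reserved `hashcode` slot during parsing might be as simple as FNV-1a; the following is only an illustrative sketch (RapidJSON does not ship such a helper):

```cpp
#include <cstdint>
#include <cstddef>

// FNV-1a: a small, fast, non-cryptographic string hash. It could be computed
// while parsing a key and stored in the reserved `hashcode` field described above.
inline uint32_t Fnv1a32(const char* str, std::size_t length) {
    uint32_t hash = 2166136261u;              // FNV offset basis
    for (std::size_t i = 0; i < length; ++i) {
        hash ^= static_cast<unsigned char>(str[i]);
        hash *= 16777619u;                    // FNV prime
    }
    return hash != 0 ? hash : 1u;             // keep 0 reserved for "no hash computed"
}
```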
Associative Array
A JSON object is basically an associative array, which maps string keys to JSON values.
Currently members are stored in a `std::vector`-like manner. The order of members is controlled by the user, depending on the sequence of `AddMember()` calls or the JSON being parsed. This is actually an important feature when the user wants to maintain the order of members.
However, of course, the query time is O(n) without additional data structure.
There are other possibilities for representing an associative array:
- Sorted array. After parsing an object, the members are sorted by their keys in O(n log n). Binary search can be used in queries, improving query time to O(log n). Adding a new member needs sorting again later (amortized O(log n)). No additional space overhead. Besides, note that this requires comparing strings lexicographically (not just for equality), which is an O(m) operation for m equal to the minimal length of the two strings.
- Hash table. The simple way is using open addressing, so that it will still be a single buffer. Hash codes can be computed as in the last section. Insert, query and remove are O(1). When adding a new member and the load ratio (member count divided by capacity) goes over a constant, the hash table needs to be rebuilt in O(n). In addition, iterating members will be linear in the capacity, not the count.
|                | Current (Vector) | Sorted Array       | Hash Table     |
|----------------|------------------|--------------------|----------------|
| Custom Order   | Yes              | No                 | No             |
| Sorted by Key  | May be           | Yes                | No             |
| Initialization | O(n)             | O(n log n)         | O(n)           |
| AddMember      | Amortized O(1)   | Amortized O(log n) | Amortized O(1) |
| RemoveMember   | O(n)             | O(n)               | O(1)           |
| FindMember     | O(n)             | O(log n)           | O(1)           |
| IterateMember  | O(n)             | O(n)               | O(capacity)    |
| Resize         | O(n)             | O(n)               | O(capacity)    |
| ObjectEquality | O(n^2)           | O(n)               | O(n)           |
| Space          | O(capacity)      | O(capacity)        | O(capacity)    |
- n = number of member
- The hidden m (string length of key) is not shown.
- In hash table, n <= capacity * load_ratio. In others, n <= capacity.
If we want to support either or all of these data structures, we have two options: (1) use template parameter to specify which to use in compile-time; or (2) use flags to specify in run-time with overheads in all related operations.
Move performance tests into another repository
Performance tests, libraries and data are not needed for most users.
I suggest moving them to another repository dedicated to comparing performance among C/C++ JSON libraries, so that everyone can add or update libraries.
The framework needs to be revised to get more precise results, and to generate tables/charts automatically.
I think benchmarks are not just numbers; they help developers learn from each other and improve the code. That is precious in the open source community.
stack.h warning because pointer alignment is not the same
Hi.
I was wondering if some changes to the source code could be useful to reduce the number of warnings that gcc produces.
Below is a diff of the proposed changes:
https://gist.github.com/crespo2014/8a40a9053b2c21fbc840
Many thanks for considering the suggestion.
Custom number parser
From [email protected] on November 26, 2011 23:17:34
Parse expressions with custom variables and functions.
Original issue: http://code.google.com/p/rapidjson/issues/detail?id=3
Floating point reading is lossy.
The floating point value 0.9868011474609375 becomes 0.9868011474609376.
If this is a performance tradeoff, I think we should provide an option for users who prefer correctness over speed.
Use AutoUTFInputStream with StringStream
I need to support differently encoded JSON files but finally output a UTF-16 GenericDocument.
I can't use AutoUTFInputStream with FileReadStream directly because rapidjson is not able to filter out the JavaScript comments in the file. So I use another library to trim out the comments and then parse the stream back into a GenericDocument.
There is pseudo code below:
rapidjson::GenericDocument<rapidjson::UTF16<> > doc;
std::string input;
// ...
// Read the input from file
// Use another library to minified the JSON string 'input'
// ...
rapidjson::StringStream ss(input.c_str());
// error C2039: 'Peek4' : is not a member of 'rapidjson::GenericStringStream<Encoding>'
rapidjson::AutoUTFInputStream<unsigned, rapidjson::StringStream> ais(ss);
doc.ParseStream<0, rapidjson::AutoUTF<unsigned>, rapidjson::AutoUTFInputStream<unsigned, rapidjson::StringStream> >(ais);
It generates error C2039 (as in the comment above) while compiling.
Finally, I added a Peek4() function to GenericStringStream, like the one in FileReadStream.
const Ch* Peek4() const {
return (src_ + 4 <= src_ + strlen(src_)) ? src_ : 0;
}
It works!!
However, is there another way to do this without adding custom code? Or in other words, am I misunderstanding the usage of AutoUTFInputStream?
Writer object should be reusable and do validation by assertion
`Writer` should always produce well-formed JSON. Invalid sequences of events should be detected via assertion. However, currently generating two or more JSONs into an output stream does not trigger an assertion failure. For example,
Writer<Stream> writer(stream);
writer.StartObject();
writer.EndObject();
writer.StartObject();
writer.EndObject();
will output `{}{}` to the stream.
I suggest that an explicit reset API should be added:
Writer<Stream> writer(stream);
writer.StartObject();
writer.EndObject();
writer.Reset(newStream);
writer.StartObject();
writer.EndObject();
That way, the error of multiple root elements can be detected, and the `Writer` object can also be reused. The advantage of reusing a `Writer` object is to avoid reallocating memory for the `Writer`'s internal data structure (basically a stack).
Are the docs for this completely obsolete?
I don't even see a Document class.
Redo the documentation
IMHO, rapidjson has many good features and characteristics (fast, small, easy integration, ...) but it seems not so popular. I think documentation is a main issue. I hope we can do it better.
The old documentation includes:
- Readme
- https://code.google.com/p/rapidjson/wiki/UserGuide
- https://code.google.com/p/rapidjson/wiki/Performance
- Doxygen API
Issues:
- The user guide is incomplete.
- The writing style is quite verbose and uninteresting.
- Users need to dig into details to do some basic stuff.
I suggest redoing the documentation from the ground up. There are several decisions to be made:
- What format should be used (Markdown?)
- What tools can be used (https://readthedocs.org/ ?)
- Can the tools generate API automatically?
- What kind of documentation is needed? How to structure them?
I hope we can make some decisions and create a rough plan, and then do it step by step.
Suggestions are welcome.
Allow Reader to read only one/first JSON object
Slightly related to #66, a `Reader` instance should be configurable to read only one (i.e. the first) JSON object from a stream. This enables the extraction of JSON from mixed-content files and/or reusing the stream for multiple operations on a sequence of JSON objects.
Currently, the `Reader` creates an error (`kParseErrorDocumentRootNotSingular`) if the stream contains anything other than whitespace after the JSON object.
Also mentioned/requested in Google Code issue:
https://code.google.com/p/rapidjson/issues/detail?id=20#c2
This is also partly related to #36, as it may affect the error handling API.
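This was eventually made possible by the `kParseStopWhenDoneFlag` parse flag; the following sketch reads several concatenated JSON values from one buffer (assuming that flag and the `ParseStream` API, with the exact stopping behaviour an assumption of this example):

```cpp
#include "rapidjson/document.h"
#include "rapidjson/stringbuffer.h"
#include "rapidjson/writer.h"
#include <iostream>

int main() {
    // Two JSON objects back to back in one buffer.
    rapidjson::StringStream ss("{\"a\":1} {\"b\":2}");
    while (true) {
        rapidjson::Document d;
        // kParseStopWhenDoneFlag stops after the first complete value instead of
        // raising kParseErrorDocumentRootNotSingular, leaving the stream positioned
        // at the start of the next value.
        d.ParseStream<rapidjson::kParseStopWhenDoneFlag>(ss);
        if (d.HasParseError())
            break;  // stream exhausted (or a real error)
        rapidjson::StringBuffer sb;
        rapidjson::Writer<rapidjson::StringBuffer> writer(sb);
        d.Accept(writer);
        std::cout << sb.GetString() << std::endl;
    }
    return 0;
}
```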
provide a sample handler calculating a SHA1 hash for a document
I was playing a bit with rapidjson and came up with this simple `Handler` for calculating a SHA1 hash value for a `Document` (with a little bit of help from `boost::uuids::detail::sha1`):
#include <boost/uuid/sha1.hpp>
#include <string>
using namespace rapidjson;
struct HashCalculator {
boost::uuids::detail::sha1 sha1;
template<typename T>
inline bool add_to_hash(T&& t) { return add_to_hash(&t, sizeof(t)); }
inline bool add_to_hash(const void* str, size_t len) { sha1.process_bytes(str, len); return true; }
inline std::string get_hash() {
unsigned int digest[] = { 0, 0, 0, 0, 0 };
sha1.get_digest(digest);
return base64_encode( // use a base64 encoding lib at your liking...
reinterpret_cast<const unsigned char*>(digest), sizeof(digest)
);
}
inline bool Null() { return add_to_hash('N'); }
inline bool Bool(bool b) { return add_to_hash(b ? 'T' : 'F'); }
inline bool Int(int i) { return add_to_hash(i); }
inline bool Uint(unsigned u) { return add_to_hash(u); }
inline bool Int64(int64_t i) { return add_to_hash(i); }
inline bool Uint64(uint64_t u) { return add_to_hash(u); }
inline bool Double(double d) { return add_to_hash(d); }
inline bool String(const char* str, SizeType length, bool) { return add_to_hash(str, length); }
inline bool StartObject() { return add_to_hash('O'); }
inline bool Key(const char* str, SizeType length, bool) { return add_to_hash(str, length); }
inline bool EndObject(SizeType memberCount) { return add_to_hash(memberCount); }
inline bool StartArray() { return add_to_hash('A'); }
inline bool EndArray(SizeType elementCount) { return add_to_hash(elementCount); }
};
std::string calculate_hash(rapidjson::Document const& doc_) {
HashCalculator hasher;
doc_.Accept(hasher);
return hasher.get_hash();
}
Maybe you find that useful and want to provide it as an additional example?
UTF-8/16 string validation
From [email protected] on November 22, 2011 22:40:20
Strings in JSON are not validated to be UTF-8 or UTF-16. The parser does not detect whether a string contains invalid encoding.
Both YAJL and ultrajson support UTF-8 validation. YAJL can turn it on/off, while ultrajson makes it mandatory.
Add a parsing option to validate UTF-8/UTF-16.
Original issue: http://code.google.com/p/rapidjson/issues/detail?id=2
Segmentation fault - document.h line 618
Push mirror of SVN repository to GitHub?
There are a bunch of rapidjson forks available on GitHub already, each with a different set of fixes and with differing histories. See for instance (skipping the ones without any additional fixes/features beyond the current `trunk`):
- chrismanning/rapidjson (by @chrismanning, with C++11 integration)
- kanma/rapidjson (by @Kanma, several additions)
- pah/rapidjson (my personal fork, see the `svn/trunk` branch for SVN history)
- rjeczalik/rapidjson (by @rjeczalik, CMake integration, Git submodules)
Is there any chance to merge those repositories to be based on a single, "official" SVN mirror here at GitHub? Even if the current upstream development of rapidjson seems to have slowed down, maybe a more collaborative model could help to keep the project alive and to join forces among the current users.
Thanks for your work on rapidjson!
Value::MemberCount() or Value::ObjectSize()
I think it is necessary to get the size of an Object value, because other formats like msgpack need to know the number of members before packing.
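Until such an API exists, the count can be obtained by walking the object; a small sketch (the helper name is made up, only `MemberBegin()`/`MemberEnd()` are the library's API):

```cpp
#include "rapidjson/document.h"

// Counts the members of an object value by iteration; O(n), but it works
// without a dedicated MemberCount()/ObjectSize() API.
inline rapidjson::SizeType CountMembers(const rapidjson::Value& object) {
    rapidjson::SizeType n = 0;
    for (rapidjson::Value::ConstMemberIterator it = object.MemberBegin();
         it != object.MemberEnd(); ++it)
        ++n;
    return n;
}
```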
Strange stream object copying while parsing
https://code.google.com/p/rapidjson/issues/detail?id=78
In a rapidjson reader class there's some strange code:
template<typename Stream>
void SkipWhitespace(Stream& stream)
{
Stream s = stream; // Use a local copy for optimization
while (s.Peek() == ' ' || s.Peek() == '\n' || s.Peek() == '\r' || s.Peek() == '\t')
s.Take();
stream = s;
}
Why is the stream object reassigned twice, and what optimization does this give compared to accessing it directly via reference? What if copying the stream object is a heavy process by itself and can also throw an exception? What if I want to create a stream class with non-copyable semantics? To me all of this is a clear pessimization instead of an optimization.
Decimal separator follows the locale
Upstream https://code.google.com/p/rapidjson/issues/detail?id=84
A locale may output "3,1416" instead of "3.1416" due to the locale setting. JSON should always use `.` as the decimal separator. We also need to check other similar issues such as digit grouping, e.g. "123,456.789" is an invalid JSON number.
Encoding conversion
From [email protected] on November 27, 2011 00:33:27
Currently, the input and output of Reader uses the same encoding.
It is often needed to read a stream of one encoding (e.g. UTF-8), and output string of another encoding (e.g. UTF-16). Or in the other way, stringify a DOM from one encoding (e.g. UTF-16) to an output stream of another encoding (e.g. UTF-8)
The most simple solution is converting the stream into a memory buffer of another encoding. This requires more memory storage and memory access.
Another solution is to convert the input stream into another encoding before sending it to the parser. However, only characters in JSON string type are really the ones necessary to be converted. Conversion of other characters just wastes time.
The third solution is letting the parser distinguish between the input and output encodings. It uses an encoding converter to convert characters of the JSON string type. However, since the output length may be longer than the original length, in situ parsing cannot be permitted.
Try to design a mechanism to generalize encoding conversion. And it should support UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE. It can also support automatic encoding detection with BOM, while incurring some overheads in dynamic dispatching.
Original issue: http://code.google.com/p/rapidjson/issues/detail?id=4
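As an illustration of the third approach (the parser transcodes strings on the fly), current RapidJSON exposes it through encoded streams; a sketch that reads a UTF-8 file into a UTF-16 DOM (assuming the `EncodedInputStream`/`FileReadStream` API; the file name is hypothetical):

```cpp
#include "rapidjson/document.h"
#include "rapidjson/encodedstream.h"
#include "rapidjson/filereadstream.h"
#include <cstdio>

int main() {
    FILE* fp = std::fopen("input.json", "rb");  // hypothetical UTF-8 input file
    if (!fp) return 1;
    char buffer[4096];
    rapidjson::FileReadStream bis(fp, buffer, sizeof(buffer));
    // Wrap the byte stream with the source encoding...
    rapidjson::EncodedInputStream<rapidjson::UTF8<>, rapidjson::FileReadStream> eis(bis);
    // ...and let the parser transcode JSON strings into a UTF-16 DOM.
    rapidjson::GenericDocument<rapidjson::UTF16<> > d;
    d.ParseStream<0, rapidjson::UTF8<> >(eis);
    std::fclose(fp);
    return d.HasParseError() ? 1 : 0;
}
```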
Potential endianness detection problem
Upstream issue https://code.google.com/p/rapidjson/issues/detail?id=100
Currently endianness is very important in `Value` because it is used for aligning 32-bit and 64-bit integers, in order to implement "no overhead integer conversion".
This "optimization" depends on the `RAPIDJSON_ENDIAN` macro. Currently it assumes little endian if gcc's `__BYTE_ORDER__` is not defined. This is error-prone.
Several potential fixes:
- Better endianness detection with more compilers.
- If unable to detect, fail to compile and ask user to define the macro explicitly.
- Automatic validation of endianness in compile-time or run-time.
Shall check how other cross-platform libraries deal with this issue.
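As a point of reference for the run-time validation idea, a check might look like this (an illustrative sketch only, not RapidJSON code):

```cpp
#include <cstdint>
#include <cstring>

// Returns true if the machine stores the least significant byte first.
inline bool IsLittleEndian() {
    const uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof(probe)];
    std::memcpy(bytes, &probe, sizeof(probe));
    return bytes[0] == 0x04;
}

// A library could assert at startup that its compile-time endianness guess
// matches what this function reports at run time.
```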
FindMember() return value equal to null pointer when member is not found.
This behavior differs from `std::map::find()`.
I propose to modify it to return `MemberEnd()` instead when the member is not found.
But this may affect those who were using this API and could generate _runtime_ errors.
Strict/Relaxed JSON syntax
Currently RapidJSON tries to be as strict as possible to the JSON spec.
There are several requests for adding relaxed JSON syntax for parsing. Here are some possible ones.
Root JSON type
- A document with types other than object and array. (Done in #101)
Comment
- Single line JS comment (#443)
- Multiple line JS comment (#443)
- Comment handler event
- Comment stored in `Value`
Object
- Key without quotes (limited to a sequence of confined character sets, e.g. `(A-Za-z\-_)+[0-9A-Za-z\-_]*`)
- Extra comma at the end (#584)
Array
- Extra comma at the end (#584)
String
- Single quote pair
Number
- Fractions beginning with `.`, e.g. `.123`
- Numbers with '.' but without fraction digits, e.g. `0.`, `123.`
- Hexadecimal integers, e.g. `0xDEADBEEF`
- `Infinity` and `NaN` (#641)
These relaxed syntax extensions may be useful for some applications. However, this may increase the complexity of the code base.
Would like to gather opinions on this.
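Several of these were eventually shipped behind parse flags (the change log above mentions relaxed syntax for comments, trailing commas and NaN/Infinity); a sketch of enabling them together, where the exact flag names are an assumption of this example:

```cpp
#include "rapidjson/document.h"
#include <iostream>

int main() {
    // A comment, a trailing comma and NaN are all rejected by the default
    // (strict) parser, but accepted with the relaxed-syntax parse flags.
    const char* json = "{ /* comment */ \"pi\": 3.1416, \"bad\": NaN, }";
    rapidjson::Document d;
    d.Parse<rapidjson::kParseCommentsFlag |
            rapidjson::kParseTrailingCommasFlag |
            rapidjson::kParseNanAndInfFlag>(json);
    std::cout << (d.HasParseError() ? "rejected" : "accepted") << std::endl;
    return 0;
}
```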
Optimize integer-to-string conversion in GenericWriter
Current implementation is a basic algorithm which costs one division per decimal digit.
Compare with other faster implementations and adopt suitable algorithm.
Parse error for very small yet valid double numbers
Note - this is a different issue than the similar one solved last month:
https://code.google.com/p/rapidjson/issues/detail?id=75
I've found that numbers with an exponent smaller than -308 cause parse errors in rapidjson. This is weird since web browsers have no problem serializing these numbers during JSON.stringify. I've been trying to get to the bottom of what the various specs say about what the minimum representable number is.
Consider the following code:
JSON.stringify(Number.MIN_VALUE);
It outputs "5e-324" which is indeed the value of Number.MIN_VALUE.
According to:
http://en.wikipedia.org/wiki/Double-precision_floating-point_format
the minimum double is approx 4.9406564584124654e-324 and the maximum is approx 2.2250738585072009e308 and this is presumably what motivated the values of Number.MIN_VALUE and Number.MAX_VALUE.
There's the following code in rapidjson's reader.h:
if (s.Peek() >= '0' && s.Peek() <= '9') {
exp = s.Take() - '0';
while (s.Peek() >= '0' && s.Peek() <= '9') {
exp = exp * 10 + (s.Take() - '0');
if (exp > 308)
RAPIDJSON_PARSE_ERROR(kParseErrorNumberTooBig, s.Tell());
}
}
Debugging my code shows that it's this exp > 308 that's catching numbers smaller than -308. So it looks like this is correct for +ve exponents but not correct for -ve exponents. Changing the code to the following seems to fix the issue in my code - does this look reasonable?
if (s.Peek() >= '0' && s.Peek() <= '9') {
exp = s.Take() - '0';
while (s.Peek() >= '0' && s.Peek() <= '9') {
exp = exp * 10 + (s.Take() - '0');
if (( (!expMinus) && (exp > 308) ) || (expMinus && (exp > 324) ))
RAPIDJSON_PARSE_ERROR(kParseErrorNumberTooBig, s.Tell());
}
}
Thanks!
Lacks functions for removing elements in JSON array.
I suggest to add:
ValueIterator Erase(ValueIterator pos);
ValueIterator Erase(ValueIterator first, ValueIterator last);
which mimics std::vector.
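Such overloads were later provided; usage would mirror std::vector, as in this sketch (assuming `Erase(ValueIterator)` exists on array values; the helper function is made up):

```cpp
#include "rapidjson/document.h"

// Removes the second element of a JSON array, std::vector style.
void RemoveSecondElement(rapidjson::Value& array) {
    if (array.IsArray() && array.Size() >= 2)
        array.Erase(array.Begin() + 1);  // erase by iterator, like std::vector::erase
}
```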
Changing the Handler interface to differentiate between key and string
Would it be possible to introduce a slight interface-breaking change to the `Handler` interface?
Consider this example from the SAX documentation:
{
"hello": "world",
"t": true ,
"f": false,
"n": null,
"i": 123,
"pi": 3.1416,
"a": [1, 2, 3, 4]
}
which creates the following calling sequence for a `Handler`:
BeginObject()
String("hello", 5, true)
String("world", 5, true)
String("t", 1, true)
Bool(true)
String("f", 1, true)
Bool(false)
String("n", 1, true)
Null()
String("i", 1, true)
UInt(123)
String("pi", 2, true)
Double(3.1416)
String("a", 1, true)
BeginArray()
Uint(1)
Uint(2)
Uint(3)
Uint(4)
EndArray(4)
EndObject(7)
My problem with this is that the `String()` method is overloaded and used for two different purposes:
- for defining a
string
property, and - for defining the
key
for akay-value
-pair inside of the currently activeobject
.
Writing your own Handler
needs to keep track of the alternating calling sequence of String()
and X
where X
can be Bool()
, UInt()
, ... BeginObject()
, BeginArray()
, and also String()
and this even for nested objects!
This could be improved and solved by using another additional method for specifying the key
of a key-value
-pair within an active object
, .e.g:
bool Key(const char* k, SizeType length, bool copy);
The calling sequence from above would change to this:
BeginObject()
Key("hello", 5, true)
String("world", 5, true)
Key("t", 1, true)
Bool(true)
Key("f", 1, true)
Bool(false)
Key("n", 1, true)
Null()
Key("i", 1, true)
UInt(123)
Key("pi", 2, true)
Double(3.1416)
Key("a", 1, true)
BeginArray()
Uint(1)
Uint(2)
Uint(3)
Uint(4)
EndArray(4)
EndObject(7)
This would allow writing much simpler `Handler` implementations.
But I recognize that this would be a rather breaking API change...
AddMember() may be incorrectly used.
In upstream issue 66,
user misuse the AddMember()
API:
doc.AddMember(lvl.c_str(), score, doc.GetAllocator());
lvl.c_str()
is a temp string, it needs to be duplicated, otherwise when the function returns the pointer become invalid.
For this situation, the user should call
rapidjson::Value name(lvl.c_str(), doc.GetAllocator()); // copy the string
rapidjson::Value value(score);
doc.AddMember(name, value, doc.GetAllocator());
But it may be confusing for user.
All overloads of AddMember(const char* name,... )
assume the name being literal string, no need for making copies. But it may be misused like this.
Any suggestion to improve the API for DOM manipulation?
Provide error code in addition to text message
https://code.google.com/p/rapidjson/issues/detail?id=79
As mentioned by Anton.Breusov, currently the parsing error is provided as a text string, which cannot easily be identified by code and makes localization difficult.
Provide an error code instead. Users can obtain the error message provided by rapidjson but can also localize the message themselves.
Reader.ParseNumberHandler fails on 32-bit
The test `Reader.ParseNumberHandler` fails on some 32-bit configurations, both with GCC and Clang, in release mode; see the build for #103:
[ RUN ] Reader.ParseNumberHandler
../../test/unittest/readertest.cpp:170: Failure
Value of: h.step_
Actual: 0
Expected: 1u
Which is: 1
../../test/unittest/readertest.cpp:170: Failure
Value of: h.actual_
Actual: 0
Expected: 1E308
Which is: 1e+308
[ FAILED ] Reader.ParseNumberHandler (0 ms)
The `Double` handler is not even called in release mode for the corner case value of `10...0` (with 308 zeros); instead parsing (silently) fails with `kParseErrorNumberTooBig`.
Better compatibility of 64-bit integer literals
There are compiler differences in parsing 64-bit integer literals.
Some C++ compilers generate a warning if the `ULL` suffix is used, while others complain about the opposite.
https://code.google.com/p/rapidjson/issues/detail?id=10
GCC 4.2 on MacOS: integer constant is to large for 'long' type
The C standard way is to use the `UINT64_C()` et al. macros in `<stdint.h>`. But in C++ one needs to define `__STDC_CONSTANT_MACROS=1` before including this header.
Currently RapidJSON includes this file in rapidjson.h. But it will become a problem if the user includes `<stdint.h>` before including rapidjson.h. A workaround is to define the macro in compiler settings, which is quite annoying. See https://code.google.com/p/rapidjson/issues/detail?id=65#c8
I propose to remove the dependency on the `UINT64_C()` macro, by defining some macros like:
#define RAPIDJSON_UINT64(high32, low32) ((uint64_t(high32) << 32) | (low32))
Migrate Premake to CMAKE
Due to the late official releases of premake, many problems cannot be solved easily. For example, VC2012/2013 are not supported yet, Mavericks (OS X 10.9) has problems with the linking options, Xcode for iOS, etc.
I would like to migrate the build system to CMake, which is more active and widely used.
Compilation error on gcc 3.4.6
Hello,
I hit some trouble using the latest version on an old RedHat:
In file included from JavaScriptRuntime.cc:126:
/home/albert/albert/3albert/include/arch_independent/rapidjson/document.h: In destructor rapidjson::GenericValue<Encoding, Allocator>::~GenericValue()': /home/albert/albert/3albert/include/arch_independent/rapidjson/document.h:489: error: expected class-name before '(' token /home/albert/albert/3albert/include/arch_independent/rapidjson/document.h: In member function
typename rapidjson::GenericMemberIterator<false, Encoding, Allocator>::Iterator rapidjson::GenericValue<Encoding, Allocator>::RemoveMember(typename rapidjson::GenericMemberIterator< false, Encoding, Allocator>::Iterator)':
/home/albert/albert/3albert/include/arch_independent/rapidjson/document.h:924: error: expected class-name before '(' token
The error comes from:
m->~GenericMember();
gcc -v
Reading specs from /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-java-awt=gtk --host=x86_64-redhat-linux
Thread model: posix
gcc version 3.4.6 20060404 (Red Hat 3.4.6-9)
Of course it builds without problems on the same host with gcc-4.4.7.
Any clue ?
Thanks
Make fails to build on OSX
Greetings! I was interested in testing your demos out to see how the library performs, but it seems there is some issue with the `*.make` files, as the `bin/` directory is devoid of most of the executables. Here is a copy of my shell log. Hope it helps.
lylemoffitt% git clone https://github.com/miloyip/rapidjson.git
Cloning into 'rapidjson'...
remote: Counting objects: 3606, done.
remote: Compressing objects: 100% (225/225), done.
remote: Total 3606 (delta 132), reused 0 (delta 0)
Receiving objects: 100% (3606/3606), 5.70 MiB | 1.41 MiB/s, done.
Resolving deltas: 100% (2173/2173), done.
Checking connectivity... done.
lylemoffitt% cd rapidjson
lylemoffitt% ls
bin build doc example include license.txt readme.md test thirdparty
lylemoffitt% cd build
lylemoffitt% ls
Doxyfile premake.bat premake.sh premake4.lua travis-doxygen.sh
lylemoffitt% premake4 gmake --os=macosx
Building configurations...
Running action 'gmake'...
Generating gmake/test.make...
Generating gmake/gtest.make...
Generating gmake/unittest.make...
Generating gmake/perftest.make...
Generating gmake/example.make...
Generating gmake/capitalize.make...
Generating gmake/condense.make...
Generating gmake/messagereader.make...
Generating gmake/pretty.make...
Generating gmake/prettyauto.make...
Generating gmake/serialize.make...
Generating gmake/simpledom.make...
Generating gmake/simplereader.make...
Generating gmake/simplewriter.make...
Generating gmake/tutorial.make...
Done.
lylemoffitt% cd gmake
lylemoffitt% ls
capitalize.make example.make messagereader.make pretty.make serialize.make simplereader.make test.make unittest.make
condense.make gtest.make perftest.make prettyauto.make simpledom.make simplewriter.make tutorial.make
lylemoffitt% gnumake -f *.make all
gnumake: Nothing to be done for `condense.make'.
gnumake: Nothing to be done for `example.make'.
gnumake: Nothing to be done for `gtest.make'.
gnumake: Nothing to be done for `messagereader.make'.
gnumake: Nothing to be done for `perftest.make'.
gnumake: Nothing to be done for `pretty.make'.
gnumake: Nothing to be done for `prettyauto.make'.
gnumake: Nothing to be done for `serialize.make'.
gnumake: Nothing to be done for `simpledom.make'.
gnumake: Nothing to be done for `simplereader.make'.
gnumake: Nothing to be done for `simplewriter.make'.
gnumake: Nothing to be done for `test.make'.
gnumake: Nothing to be done for `tutorial.make'.
gnumake: Nothing to be done for `unittest.make'.
Creating ../../intermediate/debug/gmake/capitalize/x32
capitalize.cpp
Linking capitalize
ld: warning: directory not found for option '-L/usr/lib32'
lylemoffitt% ls ../../bin
capitalize_debug_x32_gmake data encodings jsonchecker