Nesting levels for JSON parsers

Documenting how JSON parsers of several programming languages deal with deeply nested structures.

Introduction

Many JSON parsers (and many parsers in general) use recursion to parse nested structures. This is very convenient when writing the parser, but it limits what the parser can accept: the size of the call stack is usually several orders of magnitude smaller than the available RAM, so a program that recurses too deeply will fail.
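
CPython's standard json module is one such recursive parser. As a rough illustration (assuming CPython with its default recursion limit; the helper name parses is made up for this sketch), an input of only about 2 MB is enough to abort parsing with a RecursionError, long before memory runs out:

```python
import json


def parses(depth: int) -> bool:
    """Hypothetical helper: does json.loads accept `depth` nested arrays?"""
    document = "[" * depth + "]" * depth  # depth 3 gives "[[[]]]"
    try:
        json.loads(document)
    except RecursionError:
        # Each nesting level costs one level of recursion inside the
        # parser, and the interpreter refuses to go past its limit.
        return False
    return True


print(parses(10))         # shallow: parses normally
print(parses(1_000_000))  # about 2 MB of input, but far too deep
```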

The two most recent JSON standards, RFC 8259 and RFC 7159, both say "An implementation may set limits on the maximum depth of nesting". However, the ECMA-404 specification doesn't contain any limit on how deeply nested JSON structures can be.

This means that there is no nesting level that is correct or incorrect with regard to the JSON specification, and JSON parsers may differ when parsing nested structures.

Some recursive parser libraries implement a safety check in order to avoid crashing the calling program: they artificially limit the maximum depth they accept (often making that limit configurable), hoping that the stack space already in use when they are called, plus the space needed to reach the artificial limit, will always be smaller than the total stack size. This limit is an arbitrary choice of the library implementer, and it explains all of the lower values in the comparison below.
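
A minimal sketch of such a safety check, assuming a toy grammar that only contains nested empty arrays and Python 3.9 or later. The names parse_nested, NestingTooDeep and max_depth are hypothetical, and the default of 128 is just as arbitrary as the library limits discussed here:

```python
class NestingTooDeep(ValueError):
    """Clean, catchable error raised instead of overflowing the stack."""


def parse_nested(text: str, pos: int = 0, depth: int = 1,
                 max_depth: int = 128) -> tuple[list, int]:
    """Recursively parse nested empty arrays such as "[[[]]]".

    Returns the parsed value and the index just past it. `max_depth` is
    the artificial, configurable limit, checked on entry to every level,
    so the parser fails cleanly instead of exhausting the call stack.
    """
    if depth > max_depth:
        raise NestingTooDeep(f"nesting deeper than {max_depth} levels")
    if pos >= len(text) or text[pos] != "[":
        raise ValueError(f"expected '[' at position {pos}")
    pos += 1
    if pos < len(text) and text[pos] == "]":
        return [], pos + 1  # innermost empty array
    inner, pos = parse_nested(text, pos, depth + 1, max_depth)
    if pos >= len(text) or text[pos] != "]":
        raise ValueError(f"expected ']' at position {pos}")
    return [inner], pos + 1


parse_nested("[" * 128 + "]" * 128)  # exactly at the limit: accepted
try:
    parse_nested("[" * 129 + "]" * 129)
except NestingTooDeep as err:
    print("rejected:", err)  # rejected: nesting deeper than 128 levels
```

Raising max_depth well above what the interpreter's recursion limit allows (say, to 10**6) does not make this parser accept deeper input: a deep enough document then raises RecursionError instead of the clean NestingTooDeep.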

Some parsers do not use the operating system stack at all to parse nested structures (they usually implement a state machine instead). These can usually accept arbitrarily deeply nested structures. Of course, a non-streaming parser cannot physically be given an infinitely large input, and so cannot produce an infinitely large output.
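
For contrast, here is a non-recursive sketch for the same toy grammar of nested empty arrays. It is not a state machine as such: it keeps an explicit stack (a Python list) of arrays that are still open, so the nesting depth is bounded by heap memory rather than by the call stack. The name parse_nested_iterative is hypothetical:

```python
def parse_nested_iterative(text: str) -> list:
    """Parse nested empty arrays such as "[[[]]]" without recursion."""
    stack: list[list] = []  # arrays opened but not yet closed
    result = None
    for pos, char in enumerate(text):
        if result is not None:
            raise ValueError(f"trailing data at position {pos}")
        if char == "[":
            stack.append([])
        elif char == "]":
            if not stack:
                raise ValueError(f"unmatched ']' at position {pos}")
            finished = stack.pop()
            if stack:
                stack[-1].append(finished)  # attach to the enclosing array
            else:
                result = finished  # the outermost array is complete
        else:
            raise ValueError(f"unexpected {char!r} at position {pos}")
    if result is None:
        raise ValueError("unexpected end of input")
    return result


value = parse_nested_iterative("[" * 100_000 + "]" * 100_000)
depth = 0
while isinstance(value, list):  # walk down with a loop, not recursion
    depth += 1
    value = value[0] if value else None
print(depth)  # 100000
```

Other recursive code, such as comparing or printing the resulting deeply nested lists, can still hit the recursion limit even though parsing itself did not.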

You should note that parsers that set an arbitrary limit on the input nesting level are not safer and do not provide any more memory consumption guarantees than parsers that can handle arbitrarily nested input: they still consume an amount of resources proportional to the size of their input.

This repository contains tools to measure the nesting limits of JSON parsers of different languages.

How to use

This repository contains a script called test_parser.py that takes a JSON parser, uses binary search to find the smallest JSON structure it fails to parse, and prints that structure's nesting level.

The JSON parser must be a program that reads JSON on its standard input and exits with a status of 0 if it managed to parse it, or with any other status if an error occurred.
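
This contract is deliberately small, so almost any parser can be wrapped in a tiny program. The sketch below shows the calling side under that contract: it runs a parser command with a document on its standard input and treats exit status 0 as success. The one-line Python parser command (PYTHON_JSON) and the accepts helper are illustrative assumptions, not part of the repository:

```python
import subprocess
import sys

# Example parser command: a one-line Python program that reads JSON on
# stdin. An uncaught exception makes Python exit with status 1.
PYTHON_JSON = [sys.executable, "-c", "import json, sys; json.load(sys.stdin)"]


def accepts(command: list[str], document: str) -> bool:
    """Run `command` with `document` on stdin; exit status 0 means success."""
    completed = subprocess.run(
        command,
        input=document,
        text=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,  # tracebacks are noise here
    )
    # Any non-zero status (including death by a signal, which subprocess
    # reports as a negative return code) counts as a failure to parse.
    return completed.returncode == 0


print(accepts(PYTHON_JSON, "[[[]]]"))  # True: well-formed
print(accepts(PYTHON_JSON, "[[[]]"))   # False: truncated
```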

How it works

test_parser.py constructs JSON structures composed solely of nested arrays and gives them to the program it tests. For instance, for a depth of 3, it builds the following JSON: [[[]]]. This makes it possible to create a structure of only 2n bytes that has n nesting levels. It uses binary search to find the smallest structure for which the program fails.
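
The generation and search steps could look roughly like the sketch below. It is not the repository's test_parser.py: deep_array and smallest_failing_depth are hypothetical names, and parses stands for any predicate that reports whether the tested parser accepted a document (for example, one that runs the parser program and checks its exit status). The search assumes failures are monotonic: once a depth fails, every deeper input fails too.

```python
from typing import Callable, Optional


def deep_array(depth: int) -> str:
    """`depth` nested empty arrays in 2 * depth bytes, e.g. 3 gives "[[[]]]"."""
    return "[" * depth + "]" * depth


def smallest_failing_depth(parses: Callable[[str], bool],
                           upper: int = 1 << 22) -> Optional[int]:
    """Binary-search the smallest depth at which `parses` returns False.

    Assumes failures are monotonic (if depth n fails, so does every depth
    above n). Returns None if even `upper` levels are accepted.
    """
    if parses(deep_array(upper)):
        return None
    low, high = 1, upper  # invariant: the answer lies in [low, high]
    while low < high:
        mid = (low + high) // 2
        if parses(deep_array(mid)):
            low = mid + 1  # mid is accepted, so the first failure is deeper
        else:
            high = mid  # mid fails, so the first failure is mid or shallower
    return low


# Stand-in predicate for illustration: accepts at most 100 levels.
print(smallest_failing_depth(lambda doc: len(doc) <= 200))  # 101
```

Each probe is a full parse, so the search needs only about log2(upper) runs: roughly 22 for the default bound of 2**22 levels, whose document is about 8 MB.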

Results

The various implementations in this repository are continuously tested by Travis CI on a virtual machine running Ubuntu 18.04, with 8 GB of RAM and a maximum stack size of 8192 KB.
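
To compare your own machine with this setup, the soft stack limit can be read with Python's resource module (POSIX only). It is the same limit that ulimit -s reports in units of 1024 bytes; the helper name stack_limit_bytes is hypothetical:

```python
import resource  # POSIX only; not available on Windows
from typing import Optional


def stack_limit_bytes() -> Optional[int]:
    """Soft RLIMIT_STACK in bytes, or None if the stack is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft


limit = stack_limit_bytes()
print("unlimited" if limit is None else f"{limit // 1024} KiB")
```

This limit governs the main thread of a native process. Runtimes that manage their own stacks behave differently, as the go encoding/json row below shows with its 1000000000-byte goroutine stack limit.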

Here are the results we found, sorted from least nesting allowed by default to the most:

| language | JSON library | nesting level | file size | notes |
|---|---|---|---|---|
| C# | System.Text.Json | 65 | 130 bytes | configurable (JsonSerializerOptions.MaxDepth) * |
| ruby | json | 101 | 202 bytes | configurable (:max_nesting) * |
| rust | serde_json | 128 | 256 bytes | disableable (disable_recursion_limit) * |
| shell | jq | 257 | 514 bytes | undocumented |
| php | json_decode | 512 | 1.0 KB | configurable ($depth) * |
| perl | JSON::PP | 513 | 1.0 KB | configurable (max_depth) * |
| swift | JSONDecoder | 514 | 1.0 KB | undocumented |
| python3 | json | 995 | 2.0 KB | configurable (sys.setrecursionlimit) *, undocumented |
| C | jansson | 2049 | 4.0 KB | |
| javascript | JSON.parse | 5712 | 11.4 KB | Node.js 8 LTS |
| java | Gson | 6100 | 12 KB | |
| java | Jackson | 6577 | 13 KB | |
| go | json-iterator | 10002 | 20 KB | configurable (Config.MaxDepth) * |
| PostgreSQL | json type | 11887 | 23 KB | configurable (max_stack_depth), undocumented |
| D | std.json | 37370 | 74.7 KB | segfaults |
| C++ | RapidJSON | 87266 | 175 KB | segfaults |
| Nim | json | 104769 | 209 KB | segfaults |
| OCaml | yojson | 130380 | 260 KB | |
| go | encoding/json | 1973784 | 3.9 MB | fatal error, goroutine stack exceeds 1000000000-byte limit |
| C++ | JSON for Modern C++ | ∞ | ∞ | segfault fixed in v3.7.2 |
| C# | Newtonsoft.Json | ∞ | ∞ | |
| ruby | Oj | ∞ | ∞ | |
| Haskell | Aeson | ∞ | ∞ | |

* Note that "configurable" and "disableable" only mean that the parser's built-in depth check can be configured or disabled, not that the parser can be made to accept any nesting depth. If the limit is disabled or raised too far, a deep enough input will crash the calling program instead of producing a clean error.

Remarks

I tried to test the most popular JSON library of each language. If you want to add a new language or a new library, feel free to open a pull request. All parameters were left at their default values.

Contributors

azihassan, bbrks, dependabot[bot], iffy, jwilk, lovasoa, madvikinggod, mwllgr, nlohmann, pmdhaussy, pulkomandy, robertdober, stolendata, tristan971, verdie-g, wilg, yottster

Issues

.NET JSON parser methodology

Newtonsoft.Json can parse JSON in different ways, both with and without going through a String representation, and this will give you different results. For example, a String cannot exceed 1,073,741,824 characters (because of the 2 GiB single-object size limit and because String always uses UTF-16), so that is an upper limit when using JsonConvert.DeserializeObject<T>(String). However, you should be able to read an input stream exceeding that limit by using a JsonTextReader and passing it to JsonSerializer.Deserialize.

Additionally, the first-party JSON parser that shipped with WCF 3.5 (System.Runtime.Serialization.Json) is built on the XML parser, which means it inherits the XML parser's configurable nesting depth limit. I've seen that this is not well understood in the .NET community, so I'm just throwing that out there.

Suggestion: Stata

Hi,

I'm not a Stata user myself, so I'm unable to add it, but my service outputs JSON and Stata users (and only Stata users) often have problems because the language's built-in JSON parser can't deal with values that contain single quotes. For example, valid JSON like:

"suburb": "Earl's Court"

causes the parser to die. Very, very annoying.

Shell scripts do not work correctly on macOS

While attempting to add Objective-C (NSJSONSerialization) tests, I found that the test scripts do not work correctly on macOS; the output is always 2. The first time I ran it, I got a number of Abort Traps from broken pipes between the scripts:

    ./utils/binary_search.sh: line 1: 65239 Broken pipe: 13   ./utils/deep_json_array.sh $n
         65240 Abort trap: 6   | $json_parsing_command 2> /dev/null > /dev/null

repeated several times with different process numbers, followed by

    ./utils/binary_search.sh: line 1: 65292 Done   ./utils/deep_json_array.sh $n
         65293 Abort trap: 6   | $json_parsing_command 2> /dev/null > /dev/null

repeated several times with different process numbers, followed by

    2

Subsequent runs just output 2.

(Running the tests manually, NSJSONSerialization tops out at 512 levels of nesting.)

macOS 10.14.6, GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin18)

serde_json can disable the recursion limit

FYI, serde_json has a recursion limit to protect against malicious clients sending a deeply nested structure to DoS a server. However, it has an optional feature, unbounded_depth, that makes it possible to disable this protection. It still uses the stack, though, so it will eventually overflow the stack instead of using all available memory.
