
jsonparser's Introduction


Alternative JSON parser for Go (up to 10x faster than the standard library)

It does not require you to know the structure of the payload (e.g. create structs), and allows accessing fields by providing a path to them. It is up to 10 times faster than the standard encoding/json package (depending on payload size and usage) and allocates no memory. See benchmarks below.

Rationale

Originally I made this for a project that relies on a lot of 3rd party APIs that can be unpredictable and complex. I love simplicity and prefer to avoid external dependencies. encoding/json requires you to know your data structures exactly, and if you prefer to use map[string]interface{} instead, it will be very slow and hard to manage. I investigated what's on the market and found that most libraries are just wrappers around encoding/json; there are a few options with their own parsers (ffjson, easyjson), but they still require you to create data structures.

The goal of this project is to push the JSON parser to its performance limits without sacrificing compliance or developer experience.

Example

For the given JSON, our goal is to extract the user's full name, number of GitHub followers and avatar.

import "github.com/buger/jsonparser"

...

data := []byte(`{
  "person": {
    "name": {
      "first": "Leonid",
      "last": "Bugaev",
      "fullName": "Leonid Bugaev"
    },
    "github": {
      "handle": "buger",
      "followers": 109
    },
    "avatars": [
      { "url": "https://avatars1.githubusercontent.com/u/14009?v=3&s=460", "type": "thumbnail" }
    ]
  },
  "company": {
    "name": "Acme"
  }
}`)

// You can specify key path by providing arguments to Get function
jsonparser.Get(data, "person", "name", "fullName")

// There are `GetInt` and `GetBoolean` helpers if you know the key's data type exactly
jsonparser.GetInt(data, "person", "github", "followers")

// When you request an object, it returns a []byte slice pointing into the original data
// For `company` it will be `{"name": "Acme"}`
jsonparser.Get(data, "company")

// If the key doesn't exist it returns an error
var size int64
if value, err := jsonparser.GetInt(data, "company", "size"); err == nil {
  size = value
}

// You can use `ArrayEach` helper to iterate items [item1, item2 .... itemN]
jsonparser.ArrayEach(data, func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
	fmt.Println(jsonparser.Get(value, "url"))
}, "person", "avatars")

// Or you can access fields by index!
jsonparser.GetString(data, "person", "avatars", "[0]", "url")

// You can use `ObjectEach` helper to iterate objects { "key1":object1, "key2":object2, .... "keyN":objectN }
jsonparser.ObjectEach(data, func(key []byte, value []byte, dataType jsonparser.ValueType, offset int) error {
        fmt.Printf("Key: '%s'\n Value: '%s'\n Type: %s\n", string(key), string(value), dataType)
	return nil
}, "person", "name")

// The most efficient way to extract multiple keys is `EachKey`

paths := [][]string{
  []string{"person", "name", "fullName"},
  []string{"person", "avatars", "[0]", "url"},
  []string{"company", "url"},
}
jsonparser.EachKey(data, func(idx int, value []byte, vt jsonparser.ValueType, err error){
  switch idx {
  case 0: // []string{"person", "name", "fullName"}
    ...
  case 1: // []string{"person", "avatars", "[0]", "url"}
    ...
  case 2: // []string{"company", "url"},
    ...
  }
}, paths...)

// For more information see docs below

Reference

The library API is really simple: you only need the Get method to perform any operation; the rest are helpers around it.

You can also view the API at godoc.org

Get

func Get(data []byte, keys ...string) (value []byte, dataType jsonparser.ValueType, offset int, err error)

Receives a data structure and a key path to extract the value from.

Returns:

  • value - Pointer into the original data structure containing the key's value, or an empty slice if nothing was found or an error occurred
  • dataType - Can be: NotExist, String, Number, Object, Array, Boolean or Null
  • offset - Offset in the provided data structure where the key's value ends. Used mostly internally, for example by the ArrayEach helper.
  • err - If the key is not found, or on any other parsing issue, an error is returned. If the key is not found, dataType is also set to NotExist

Accepts multiple keys to specify the path to a JSON value (for querying nested structures). If no keys are provided, it will try to extract the closest JSON value (a simple one or an object/array), which is useful for reading streams or arrays; see the ArrayEach implementation.

Note that keys can be array indexes: jsonparser.GetInt(data, "person", "avatars", "[0]", "url"), pretty cool, yeah?

GetString

func GetString(data []byte, keys ...string) (val string, err error)

Returns strings, properly handling escaped and unicode characters. Note that this causes additional memory allocations.

GetUnsafeString

If you need a string in your app and are ready to sacrifice support for escaped symbols in favor of speed, this returns a string mapped onto the existing byte slice memory, without any allocations:

s, _ := jsonparser.GetUnsafeString(data, "person", "name", "title")
switch s {
case "CEO":
  ...
case "Engineer":
  ...
}

Note that unsafe here means that your string exists only until the GC frees the underlying byte slice. In most cases this means you can use the string only in the current context, and should not pass it anywhere externally: through channels or any other way.

GetBoolean, GetInt and GetFloat

func GetBoolean(data []byte, keys ...string) (val bool, err error)

func GetFloat(data []byte, keys ...string) (val float64, err error)

func GetInt(data []byte, keys ...string) (val int64, err error)

If you know the key's type, you can use the helpers above. If the key's data type does not match, an error is returned.

ArrayEach

func ArrayEach(data []byte, cb func(value []byte, dataType jsonparser.ValueType, offset int, err error), keys ...string)

Needed for iterating arrays, accepts a callback function with the same return arguments as Get.

ObjectEach

func ObjectEach(data []byte, callback func(key []byte, value []byte, dataType ValueType, offset int) error, keys ...string) (err error)

Needed for iterating objects; accepts a callback function. Example:

handler := func(key []byte, value []byte, dataType jsonparser.ValueType, offset int) error {
	// do stuff here
	return nil
}
jsonparser.ObjectEach(myJson, handler)

EachKey

func EachKey(data []byte, cb func(idx int, value []byte, dataType jsonparser.ValueType, err error), paths ...[]string)

When you need to read multiple keys and are not afraid of a low-level API, EachKey is your friend. It reads the payload only once and calls the callback function as each path is found. In contrast, when you call Get multiple times, it has to process the payload each time you call it. Depending on the payload, EachKey can be several times faster than Get. Paths can use nested keys as well!

paths := [][]string{
	[]string{"uuid"},
	[]string{"tz"},
	[]string{"ua"},
	[]string{"st"},
}
var data SmallPayload

jsonparser.EachKey(smallFixture, func(idx int, value []byte, vt jsonparser.ValueType, err error){
	switch idx {
	case 0:
		data.Uuid, _ = jsonparser.ParseString(value)
	case 1:
		v, _ := jsonparser.ParseInt(value)
		data.Tz = int(v)
	case 2:
		data.Ua, _ = jsonparser.ParseString(value)
	case 3:
		v, _ := jsonparser.ParseInt(value)
		data.St = int(v)
	}
}, paths...)

Set

func Set(data []byte, setValue []byte, keys ...string) (value []byte, err error)

Receives existing data structure, key path to set, and value to set at that key. This functionality is experimental.

Returns:

  • value - Pointer to original data structure with updated or added key value.
  • err - If there is any parsing issue, an error is returned.

Accepts multiple keys to specify path to JSON value (in case of updating or creating nested structures).

Note that keys can be array indexes: jsonparser.Set(data, []byte("http://github.com"), "person", "avatars", "[0]", "url")

Delete

func Delete(data []byte, keys ...string) (value []byte)

Receives existing data structure, and key path to delete. This functionality is experimental.

Returns:

  • value - Pointer to original data structure with key path deleted if it can be found. If there is no key path, then the whole data structure is deleted.

Accepts multiple keys to specify the path to the JSON value (for nested structures).

Note that keys can be array indexes: jsonparser.Delete(data, "person", "avatars", "[0]", "url")

What makes it so fast?

  • It does not rely on encoding/json, reflection or interface{}, the only real package dependency is bytes.
  • Operates on the JSON payload at byte level, providing pointers into the original data structure: no memory allocations.
  • No automatic type conversions: by default everything is a []byte, but it provides the value type, so you can convert it yourself (there are a few helpers included).
  • Does not parse the full record, only the keys you specified.

Benchmarks

There are 3 benchmark types, trying to simulate real-life usage for small, medium and large JSON payloads. For each metric, lower is better. Time/op is in nanoseconds. Values better than the standard encoding/json are marked in bold. Benchmarks were run on a standard Linode 1024 box.


TLDR

If you want to skip the next sections, we have 2 winners: jsonparser and easyjson. jsonparser is up to 10 times faster than the standard encoding/json package (depending on payload size and usage), and almost infinitely (literally) better in memory consumption, because it operates on the data at byte level and provides direct slice pointers. easyjson wins on CPU in the medium tests, and frankly I'm impressed with this package: remarkable results considering that it is almost a drop-in replacement for encoding/json (it requires some code generation).

It's hard to fully compare jsonparser with easyjson (or ffjson): they are true parsers and fully process the record, unlike jsonparser, which parses only the keys you specify.

If you are searching for a replacement for encoding/json while keeping structs, easyjson is an amazing choice. If you want to process dynamic JSON, have memory constraints, or want more control over your data, you should try jsonparser.

jsonparser performance heavily depends on usage, and it works best when you do not need to process the full record, only some keys. The more calls you need to make, the slower it gets; in contrast, easyjson (or ffjson, encoding/json) parses the record only once, and then you can make as many lookups as you want.

With great power comes great responsibility! :)

Small payload

Each test processes 190 bytes of an HTTP log entry as a JSON record and reads multiple fields. https://github.com/buger/jsonparser/blob/master/benchmark/benchmark_small_payload_test.go

Library                          time/op  bytes/op  allocs/op
encoding/json struct                7879       880         18
encoding/json interface{}           8946      1521         38
Jeffail/gabs                       10053      1649         46
bitly/go-simplejson                10128      2241         36
antonholmquist/jason               27152      7237        101
github.com/ugorji/go/codec          8806      2176         31
mreiferson/go-ujson                 7008      1409         37
a8m/djson                           3862      1249         30
pquerna/ffjson                      3769       624         15
mailru/easyjson                     2002       192          9
buger/jsonparser                    1367         0          0
buger/jsonparser (EachKey API)       809         0          0

The winners are ffjson, easyjson and jsonparser, where jsonparser is up to 9.8x faster than encoding/json, 4.6x faster than ffjson, and slightly faster than easyjson. Looking at memory allocation, jsonparser has no rivals, as it makes no data copies and operates on raw []byte structures and pointers into them.

Medium payload

Each test processes a 2.4 KB JSON record (based on the Clearbit API), reading multiple nested fields and 1 array.

https://github.com/buger/jsonparser/blob/master/benchmark/benchmark_medium_payload_test.go

Library                          time/op  bytes/op  allocs/op
encoding/json struct               57749      1336         29
encoding/json interface{}          79297     10627        215
Jeffail/gabs                       83807     11202        235
bitly/go-simplejson                88187     17187        220
antonholmquist/jason               94099     19013        247
github.com/ugorji/go/codec        114719      6712        152
mreiferson/go-ujson                56972     11547        270
a8m/djson                          28525     10196        198
pquerna/ffjson                     20298       856         20
mailru/easyjson                    10512       336         12
buger/jsonparser                   15955         0          0
buger/jsonparser (EachKey API)      8916         0          0

The difference between ffjson and jsonparser in CPU usage is smaller, while the difference in memory consumption is growing. On the other hand, easyjson shows remarkable performance for the medium payload.

gabs, go-simplejson and jason are based on encoding/json and map[string]interface{} and are really just helpers for unstructured JSON; their performance correlates with encoding/json interface{}, so they skip the next round. go-ujson, while it has its own parser, shows the same performance as encoding/json and also skips the next round. The same goes for ugorji/go/codec, which showed unexpectedly bad performance for complex payloads.

Large payload

Each test processes a 24 KB JSON record (based on the Discourse API), reading 2 arrays and, for each item in an array, a few fields. Basically it means processing a full JSON file.

https://github.com/buger/jsonparser/blob/master/benchmark/benchmark_large_payload_test.go

Library                     time/op  bytes/op  allocs/op
encoding/json struct         748336      8272        307
encoding/json interface{}   1224271    215425       3395
a8m/djson                    510082    213682       2845
pquerna/ffjson               312271      7792        298
mailru/easyjson              154186      6992        288
buger/jsonparser              85308         0          0

jsonparser is now the winner, but do not forget that it is a much more lightweight parser than ffjson or easyjson: they have to parse all the data, while jsonparser parses only what you need. ffjson, easyjson and jsonparser all have their own parsing code and do not depend on encoding/json or interface{}; that's one of the reasons why they are so fast. easyjson also uses a bit of the unsafe package to reduce memory consumption (in theory this could lead to unexpected GC issues, but I have not tested it enough).

Also, the last benchmark did not include an EachKey test, because in this particular case we need to read a lot of array values, and using ArrayEach is more efficient.

Questions and support

All bug reports and suggestions should go through GitHub Issues.

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Added some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

Development

All my development happens in Docker, and the repo includes some Make tasks to simplify development.

  • make build - builds the docker image; usually only needs to be called once
  • make test - run tests
  • make fmt - run go fmt
  • make bench - run benchmarks (if you need to run only a single benchmark, modify the BENCHMARK variable in the Makefile)
  • make profile - runs benchmarks and generates 3 files - cpu.out, mem.mprof and the benchmark.test binary - which can be used with go tool pprof
  • make bash - enter the container (I use it for running go tool pprof above)



jsonparser's Issues

EachKey with nested object at same level and same keys

If you have a json like this:

{
    "nested": {
        "a": "test"
    },
    "nested2": {
        "a": "test2"
    }
}

and you want to use EachKey with the following paths:

paths := [][]string{
    []string{"nested", "a"},
    []string{"nested2", "a"},
}

it returns the correct value only for the first path. For the second one, it returns an empty byte slice.

Few additional tests

(Disclaimer: trying to keep suggestions concise and to the point, add IMHO to everything:) )

I'm too lazy to fork etc.; anyway, validate the values:

func TestMoreInvalid(t *testing.T) {
    if _, _, e := GetBoolean([]byte(`{"c": txyz}`), "c"); e == nil {
        t.Errorf("Invalid value")
    }
    if _, _, e := GetBoolean([]byte(`{"c": fxyz}`), "c"); e == nil {
        t.Errorf("Invalid value")
    }
    if v, _, _, _ := Get([]byte(`{"c": "15\u00f8C"}`), "c"); !bytes.Equal(v, []byte("15øC")) {
        t.Errorf("Invalid value %s", v)
    }
    if v, _, _, _ := Get([]byte(`{"c": "\\\""}`), "c"); !bytes.Equal(v, []byte(`\"`)) {
        t.Errorf("Invalid value %s", v)
    }
}

Make the tests table-driven; it will be easier to add new ones, e.g.

    type T struct {
        In   string
        Path []string
        Out  string
    }

    var tests = []T{
        {`{"a": { "b": 1}, "c": 2}`, []string{"a", "b"}, `1`},
        {`{"a": { "b":{"c":"d" }}}`, []string{"a", "b", "c"}, `d`},
        // ...
    }

    for i, test := range tests {
        v, _, _, err := Get([]byte(test.In), test.Path...)
        if err != nil {
            t.Errorf("%02d expected %v got error %v", i, test.Out, err)
            continue
        }
        if !bytes.Equal(v, []byte(test.Out)) {
            t.Errorf("%02d expected %v got %v", i, test.Out, string(v))
        }
    }

PS: Run go vet on your code; your tests have invalid uses of t.Errorf.

Exposing jsonparser.ValueType

Is there any way to access the dataType's at all?

jsonparser.ObjectEach(value, func(key []byte, value []byte, dataType jsonparser.ValueType, offset int) error {

}

I want to be able to switch on the dataType but the ValueType doesn't appear to be exported?

Respect errors in cb function in ArrayEach and EachKey

Suppose we have following code:

    result := []Foo{}
    err := jsonparser.ArrayEach(data, func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
        if err != nil {
            return
        }
        bars, err := extractBars(value)
        if err != nil {
            return
        }
        fieldOne, _ := jsonparser.GetString(value, "target")
        result = append(result, Foo{FieldOne: fieldOne, Bars: bars})
    })

We have no way to stop iterating over the data, even if our function extractBars returns an error.

I think there are two possible ways of fixing this issue.

1. If we don't care about backward compatibility
Just change cb signature to cb func(value []byte, dataType ValueType, offset int, err error) error.

2. If we respect backward compatibility
Then we could add counterparts for both of these functions and name them something like IterArray, IterKeys.

P.S.: In both cases I could contribute with PR.

EachKey panics if more than 64 keys in path

I'm dealing with a large, flat json payload that has 100+ keys in a map.

I hit panics when using EachKey with the full list of json keys:

goroutine 1 [running]:
panic(0x4e8ac0, 0xc42000a0f0)
    /home/niek/go17/src/runtime/panic.go:500 +0x1a1
github.com/buger/jsonparser.EachKey(0xc421480000, 0x11ac, 0x11ac, 0xc420050e30, 0xc4200da000, 0xaf, 0x100, 0x0)
    /home/niek/workspace/src/github.com/buger/jsonparser/parser.go:237 +0x6a6
  ...snipped...

The problem code seems here:
https://github.com/buger/jsonparser/blob/master/parser.go#L236

If I'm reading it right, there is a limit of 64 keys per EachKey lookup before the int64 bitmask overflows.

Limit should probably be documented and code should return an error rather than panic.

pre-allocate for Set() new component for lower memory usage

Extra allocations are done when using an empty bytes.Buffer here in createInsertComponents(). bytes.Buffer uses a normal grow strategy of successively doubling in size. This could be avoided by counting the length of the needed keys and the final set value and starting with a Buffer that is about the right size.
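The suggestion can be sketched with a plain bytes.Buffer.Grow; `insertComponent` below is a simplified stand-in for the library's createInsertComponents, not the actual implementation:

```go
package main

import (
	"bytes"
	"fmt"
)

// insertComponent renders `"k1":{"k2":value}` for a key path, sizing the
// buffer up front so the buffer never has to grow by doubling.
func insertComponent(keys []string, value []byte) []byte {
	// Rough upper bound: each key contributes quotes, a colon and a brace pair.
	size := len(value)
	for _, k := range keys {
		size += len(k) + 6
	}

	var buf bytes.Buffer
	buf.Grow(size) // one allocation instead of successive doublings

	for i, k := range keys {
		buf.WriteByte('"')
		buf.WriteString(k)
		buf.WriteString(`":`)
		if i < len(keys)-1 {
			buf.WriteByte('{')
		}
	}
	buf.Write(value)
	for i := 0; i < len(keys)-1; i++ {
		buf.WriteByte('}')
	}
	return buf.Bytes()
}

func main() {
	fmt.Println(string(insertComponent([]string{"a", "b"}, []byte("1"))))
}
```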

Get array element by its index-id

It is not very efficient to fetch array elements using EachKey if you know the index of the element you want to fetch.

Not sure how that is possible in current implementation, would be good to complete this lib

Can not use unsafe on Google App Engine

I would like to use the lib for a project hosted on Google App Engine. Unfortunately, Google App Engine does not support the unsafe package.

Is it possible to refactor it so it does not use the unsafe package? This would open jsonparser up to developers on Google App Engine.

Thanks!

Set values

It would be really nice if this package provided a Set(fieldName string, value []byte) or Set(fieldName string, value json.RawMessage) function. Right now we are processing a huge number of big JSON-lines files. Each line has to be unmarshalled as map[string]json.RawMessage, then just one field is updated, and finally it is marshalled again. It would be really nice if we could update a value without unmarshalling!
What do you think?

jsonparser.Get() returns 4 values?

I tried to run token := jsonparser.Get(responseData, "token") but got an error says "multiple-value jsonparser.Get() in single-value context".
I had to run token, _, _, err := jsonparser.Get(responseData, "token") for it to work.

Get number of Keys in Object (or Array)

I would like to know the number of keys in an Object. Currently I use ObjectEach and a counter, which is of course very slow. Is there a better way of doing it?

Add GetInt helper (and make GetNumber faster)?

First, thank you for this package!

What do you think about adding a new helper function: GetInt(data []byte, keys ...string) (val int64, offset int, err error)?
I think this is a frequent use-case, so this function would be quite useful.

Also to increase the performances of GetNumber and GetInt, you could copy the strconv package from the stdlib into your package and update the signatures of ParseFloat and ParseInt to use []byte instead of string.
It would avoid allocating memory and make your package even faster.

I think this is the kind of optimization that is acceptable in a performance-oriented package like yours.

What do you think?

Can not handle special characters

Hi,
I am getting the following panic when dealing with JSON containing special characters.

Example json:
{"name":"test","sql":"select `name`\ from tables"}

Panic:
Value looks like Number/Boolean/None, but can't find its end: ',' or '}' symbol

Unify names and declarations

Currently we have :

EachKey
ArrayEach
ObjectEach

ArrayEach and ObjectEach do almost the same thing and take the same parameters, but their function declarations differ a bit.

Suggest renaming EachKey to KeyEach (or renaming the others to EachObject and EachArray, whichever sounds more logical) and making the function declarations/parameters as close as possible.

Would be great to have this marked as a to-do, so whenever I have time I'll get back to it.

Errors in parsing simple JSON

@buger hello, nice package idea, keep up the good work!

Just tried the example from the readme and got a couple of errors; here is code to reproduce:

package main

import (
    "fmt"

    "github.com/buger/jsonparser"
)

func main() {

    data := []byte(`{
  "person": {
    "married": true
  },
  "company": {
    "name": "Acme",
    "size": 109
  }
}`)

    // If the key doesn't exist it will throw an error
    var size float64 = 0
    if value, _, err := jsonparser.GetNumber(data, "company", "size"); err == nil {
        size = value
    } else {
        fmt.Println(err.Error())
    }
    fmt.Println(size)

    var married bool
    if value, _, err := jsonparser.GetBoolean(data, "person", "married"); err == nil {
        married = value
    } else {
        fmt.Println(err.Error())
    }
    fmt.Println(married)
}

Output:

strconv.ParseFloat: parsing "109\n  ": invalid syntax
0
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/buger/jsonparser.GetBoolean(0xc820058070, 0x61, 0x70, 0xc82003bf28, 0x2, 0x2, 0x0, 0x26, 0x0, 0x0)
    /private/var/www/different/go/gopath/src/github.com/buger/jsonparser/parser.go:286 +0x28a
main.main()
    /private/var/www/different/go/gopath/src/github.com/centrifugal/parse.go:30 +0x2a9
exit status 2

Seems like

  1. some values need proper trimming
  2. I think there must be v[0] on this line

Also the example is broken; instead of

// If the key doesn't exist it will throw an error
size := 0
if value, _, err := jsonparser.GetNumber(data, "company", "size"); err != nil {
  size = value
}

There must be something like this, I suppose:

// If the key doesn't exist it will throw an error
var size float64 = 0
if value, _, err := jsonparser.GetNumber(data, "company", "size"); err == nil {
  size = value
}

Also I found a confusing return in the tests

Sorry for not sending pr!

Internal panic on bad input

The following code panics. It should err.

All credit for finding the bug goes to the excellent go-fuzz library.

package main

import (
	"fmt"

	"github.com/buger/jsonparser"
)

func main() {
	data := "{}\"\":"
	_, err := jsonparser.GetString([]byte(data), "type")
	if err != nil {
		fmt.Println(err.Error())
	}
}

Bug in Get API while fetching non-existing key

Hi, according to the documentation the Get API should return an error when a key is not found. But I found that this is not the case for the following example:

package main

import "fmt"
import "github.com/buger/jsonparser"

func main() {
        data := []byte(`{
  "person": {
    "avatars": [
      { "url": "https://a.b.com", "type": "thumbnail" }
    ]
  }
}`)

       // ask for non-existing key in avatars sub-structure
        keys := []string{"person", "avatars", "[0]", "bla"}
        val, dType, _, err := jsonparser.Get(data, keys...)
        fmt.Println(string(val), dType, err)
}

it returns

{ "url": "https://a.b.com", "type": "thumbnail" } object <nil>

Could you please fix the problem.
Thanks,
Valentin.

So... What about msgpack?

Could you add msgpack support?

Just a suggestion, it would be awesome:
the same functionality but for msgpack ;)

Can't parse Wikipedia response

Here is a code to reproduce

package main

import (
	"github.com/buger/jsonparser"
	"fmt"
)

func main() {
	body := []byte(`["api",["Application programming interface"],["In computer programming, an application programming interface (API) is a set of subroutine definitions, protocols, and tools for building application software."],["https://en.wikipedia.org/wiki/Application_programming_interface"]]`)
	jsonparser.ArrayEach(body, func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
		a, err := jsonparser.GetString(value, "[0]")
		if err == nil {
			fmt.Println(a)
		} else {
			fmt.Println(err)
		}
	})
}

The JSON comes directly from Wikipedia. The first element of the array is not an array, so I get why "Key path not found" is correct in that case, but the other 3 are arrays with a single element, which is a string. I get "Unknown value type" for those.

Issues parsing array getting by index

Hello,

I have the following JSON and I want to be able to parse it using jsonparser. The standard JSON unmarshalling is crazy slow!

["A9DDB0",{"altitude":14675,"heading":105,"vert_rate":-1408,"speed":328},{"sharecode":"newclientdev","local_ip":"192.168.1.100"},1481276460]

I can't seem to get to the elements by index at all.

I've tried using

jsonparser.ArrayEach(rawJSON, func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
		fmt.Printf("%v:%v \n", offset, dataType)
})

Which does iterate them all, but it doesn't seem to report the index; am I missing something there? Also, how would I then access, say, the altitude from the second element?

ValueType Number: float or int?

I'm using jsonparser to parse a JSON structure that is more or less unknown. I have no way of knowing if a ValueType of Number is a float or an int, and thus whether I should call ParseFloat or ParseInt.

Would it make sense to split the ValueType Number into Integer and Float?

For now I will solve this by running through the byte array and checking for a "." char, but since the tokenizer has already gone over this, it would be nice if this were supported by the lib itself.
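The workaround described above can be sketched with the standard library alone (a JSON number is integral only if it contains no '.', 'e' or 'E'); `parseNumber` is a hypothetical helper, not part of jsonparser, and it assumes its input is already a valid JSON number such as a Number value returned by Get:

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// parseNumber decides between int64 and float64 by scanning the raw bytes
// for characters that only appear in non-integral JSON numbers.
func parseNumber(raw []byte) (interface{}, error) {
	if bytes.ContainsAny(raw, ".eE") {
		return strconv.ParseFloat(string(raw), 64)
	}
	return strconv.ParseInt(string(raw), 10, 64)
}

func main() {
	for _, raw := range [][]byte{[]byte("42"), []byte("3.14"), []byte("1e3")} {
		v, err := parseNumber(raw)
		fmt.Printf("%s -> %v (%T, err=%v)\n", raw, v, v, err)
	}
}
```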

Compliance test

We need to make sure that it parses any JSON.

Does anyone know a good source of JSON specs?

Get() returns error when key is simply not found

When Get() is called on a key that does not exist in the JSON, it currently returns NotExist and an error. I personally would much prefer it return NotExist and no error. That way, one can differentiate between the three possible cases: key found, key not found, and malformed JSON/other error. It doesn't seem like simply not finding a key is really an "error".

I can make a PR updating all of the tests. Would that be OK?

Get() key search can bleed through levels of JSON hierarchy

I want to first thank you, @buger, for your work on this library. Looking up a few JSON key paths in large JSON blobs is a significant bottleneck in a project I'm working on, and your library could give us a big speedup without changing our data format.

Unfortunately, I've discovered an issue in Get(): when searching for a key, Get() may locate that key outside the current JSON object. Here is an example test case that breaks (written using check.v1):

package jsonparser_test

import (
    "github.com/buger/jsonparser"
    . "gopkg.in/check.v1"
    "testing"
)

func (s *JsonParserTests) TestJsonParserSearchBleed(c *C) {
    killer := []byte(`{
      "parentkey": {
        "childkey": {
          "grandchildkey": 123
        },
        "otherchildkey": 123
      }
    }`)

    var jtype jsonparser.ValueType

    _, jtype, _, _ = jsonparser.Get(killer, "childkey")
    c.Assert(jtype, Equals, jsonparser.NotExist) // fails, returns data parentkey.childkey

    _, jtype, _, _ = jsonparser.Get(killer, "parentkey", "childkey", "otherchildkey")
    c.Assert(jtype, Equals, jsonparser.NotExist) // fails, returns data parentkey.otherchildkey
}

// Boilerplate
func Test(t *testing.T) { TestingT(t) }
type JsonParserTests struct{}
var _ = Suite(&JsonParserTests{})

The issue is that Get() uses bytes.Index() to find the next key it's looking for, but only validates it by checking that it is surrounded by double quotes and followed by a colon. In particular, it does not check whether it has crossed an unmatched sequence of braces, which would indicate transitioning into another JSON object level.

I don't have a great suggestion as to how to fix this, sadly. Best of luck.

How to parse this?

Hi,

how I can get the value of this key "result" from
[[{"result":true}]]

thank you!

escape html

Hello, I was wondering if there is a way to remove HTML-escaped characters while parsing JSON, as the standard library does. I can't find this in the documentation.

package main

import (
	"encoding/json"
	"fmt"

	"github.com/buger/jsonparser"
)

func main() {
	e := []byte(`{ "url": "https:\/\/example.com\/" }`)
	url, _, _, _ := jsonparser.Get(e, "url")
	var parsed map[string]interface{}
	json.Unmarshal(e, &parsed)

	fmt.Println("with standard json:", parsed["url"].(string))
	fmt.Println("with jsonparser:", string(url))
}

output

with standard json: https://example.com/
with jsonparser: https:\/\/example.com\/

Deletes and Sets on keys of the same document yield invalid results

Hello,

A quick example below.

package main

import (
	"strconv"
	"fmt"

	"github.com/buger/jsonparser"
)

func main() {
	var err error
	js := []byte(`{"key1":1, "key2":2,"key3":3,"key4":4}`)

	for n := 0; n < 3; n++ {
		nstr := strconv.FormatInt(int64(n), 10)
		key := "key"+nstr
		js = jsonparser.Delete(js, key)
		js, err = jsonparser.Set(js, []byte(nstr), key)
		if err != nil {
			panic(err)
		}
	}

	// prints { "key3":3,"key4":4,"key0":0,"key1":1,"key2":2}
	fmt.Println(string(js))

	for n := 0; n < 3; n++ {
		js = jsonparser.Delete(js, "key4")
		js, err = jsonparser.Set(js, []byte("four"), "key4")
		if err != nil {
			panic(err)
		}
	}

	// prints { "key3":3,"key0":0,"key1":1,"key2":2fourfour,"key4":four}
	fmt.Println(string(js))
}

EachKey panics on invalid JSON payload

Following code would first result in an err, then would panic:

package main

import (
	"fmt"
	"github.com/buger/jsonparser"
)

func main() {
	data := `{"id":`

	jsonparser.EachKey([]byte(data), func(idx int, value []byte, vt jsonparser.ValueType, currentErr error) {
		if currentErr != nil {
			fmt.Printf("Error: %v\n", currentErr)
			return
		}
		fmt.Printf("idx: %d value: %s\n", idx, string(value))
	}, []string{"id"}, []string{"method"}, []string{"params"})
}

Maybe this is similar to #100?

Improve support for array indexing in Set

Indexing with Set when the path does not already exist is not supported. For example, if you attempt the example in the docs:

jsonparser.Set([]byte(`{}`), []byte(`"http://github.com"`), "person", "avatars", "[0]", "url")

returns

{"person":{"avatars":{"[0]":{"url":"http://github.com"}}}}

Accessing top-level array ...

[
  {
    "first_name": "Admin",
    "last_name": "Account",
    "role": "org_user",
    "id": 1
  },
  {
    "first_name": "Joe",
    "last_name": "Smith",
    "role": "admin",
    "id": 2
  }
]

I've tried ArrayEach and EachKey but [0] doesn't find anything and I can't seem to find a clear example.

Any pointers would be greatly appreciated. If push comes to shove I'll wrap the response in a dictionary.

jsonparser is 10 times slower than built-in json

I'm working on allegro/marathon-consul#140. I need to handle JSON events ranging in size from 200 B to 8 MB. There are 30 types of events, but I'm interested in only two of them, so I need a fast way to extract one field (eventType). I thought I could do it quickly with bytes.Contains, but it turns out to be slower than JSON parsing. Then I tried jsonparser, and it's also slower.

Here are my benchmarks: https://gist.github.com/janisz/9db4d50fc09f2eba81781af8dbc26a03
Maybe I'm doing something wrong.

Accessing via index not working

I see support was added for this, that's great and thanks.

I can't however seem to get it to work properly.

Consider the following JSON.

{"metadata":["5666473ds52ca",50.11,-2.12,117.476220,false,false,"3.6.185",8]}

I want to read out the 5666473ds52ca value and am using the following code but it returns "Key path not found". Any idea what I am doing wrong here?

code, err := jsonparser.GetString(packet.payload, "metdata", "[0]")
if err != nil {
	log.Println(err)
}

Internal panic while processing invalid JSON

When trying to parse a slightly malformed JSON object, accessing a value with a path of more than one component triggers an apparently unrecoverable runtime panic inside the parser code.
I would have expected to get a parse error and have an opportunity to handle it gracefully in my own code.

Example code:

package main

import (
	"bufio"
	"fmt"
	"os"

	"github.com/buger/jsonparser"
)

type Entry struct {
	Timestamp    string
	PktsToClient int64
}

var evekeys = [][]string{
	[]string{"timestamp"},
	[]string{"flow", "pkts_toclient"},
}

func main() {

	e := Entry{}
	scanner := bufio.NewScanner(os.Stdin)

	for scanner.Scan() {
		json := scanner.Bytes()
		jsonparser.EachKey(json, func(idx int, value []byte, vt jsonparser.ValueType, err error) {
			switch idx {
			case 0:
				if err == nil {
					e.Timestamp = string(value[:])
				}
			case 1:
				if err == nil {
					e.PktsToClient, _ = jsonparser.ParseInt(value[:])
				}
			}
		}, evekeys...)

		fmt.Println(e)
	}
}

Example input:

{"timestamp":"2017-03-06T09:10:09.002473+0000","flow_id":42,"event_type":"flow","src_ip":"0.0.0.0","src_port":23,"dest_ip":"0.0.0.0","dest_port":23,"proto":"UDP","aemuse":1638400,"reassembly_memuse":12332832},"detect":{"alert":0},"app_layer":{"flow":{"http":0,"ftp":0,"smtp":0,"tls":0,"ssh":0,"imap":0,"msn":0,"smb":0,"dcerpc_tcp":0,"dns_tcp":0,"failed_tcp":0,"dcerpc_udp":0,"dns_udp":0,"failed_udp":0},"tx":{"http":0,"smtp":0,"tls":0,"dns_tcp":0,"dns_udp":0}},"flow_mgr":{"closed_pruned":0,"new_pruned":0,"est_pruned":0,"bypassed_pruned":0,"flows_checked":0,"flows_notimeout":0,"flows_timeout":0,"flows_timeout_inuse":0,"flows_removed":0,"rows_checked":65536,"rows_skipped":65536,"rows_empty":0,"rows_busy":0,"rows_maxlen":0},"dns":{"memuse":0,"memcap_state":0,"memcap_global":0},"http":{"memuse":0,"memcap":0}}}

Example run:

$ cat problem.json  | ./jsonparser-bug
panic: runtime error: index out of range

goroutine 1 [running]:
panic(0x49f920, 0xc42000a140)
	/usr/lib/go-1.7/src/runtime/panic.go:500 +0x1a1
github.com/buger/jsonparser.EachKey(0xc42008a000, 0x32d, 0x1000, 0xc42003de78, 0x517e80, 0x2, 0x2, 0x200)
	/home/satta/golang/src/github.com/buger/jsonparser/parser.go:292 +0xe66
main.main()
	/home/satta/golang/src/github.com/satta/jsonparser-bug/main.go:39 +0x139

Benchmark time unit?

Is the benchmark Time/op unit in ms? microseconds? nanoseconds? It's not clear from the README.
Anyway, great job in providing a good JSON parser for Go 👍

Set for each entry in array

Hello!

Would it be possible to add support for setting the value of each element in an array, like this:
jsonparser.Set(jsonBody, []byte("\"\""), "issuesData", "issues", "[]", "extraFields", "[]", "html")

Thanks in advance,
Nighthawk

EachKey can miss keys

I think I have the root cause outlined in the comments below. Here is a nice repro case.

This code should print entries for both the "potato_id" and "created" keys:

package main

import (
        "fmt"
        "github.com/buger/jsonparser"
)

func main() {

        json := []byte(`{"potato_id": "6", "created": "2016-07-19 15:23:00"}`)

        paths := [][]string{
                []string{"potato_id"},
                []string{"created"},
        }

        jsonparser.EachKey(json, func(pathIdx int, value []byte, dtype jsonparser.ValueType, err error) {
                fmt.Printf("found %s -> %s\n", paths[pathIdx], string(value))
        }, paths...)
}

But I only see potato_id:

nieksand$ ./bugrepro 
found [potato_id] -> 6

But if I comment out the "potato_id" path, then I suddenly see the "created" key.

...snip...
        paths := [][]string{
//                []string{"potato_id"},
                []string{"created"},
        }
...snip...
nieksand$ ./bugrepro 
found [created] -> 2016-07-19 15:23:00

Unless I'm misunderstanding the purpose of EachKey, I would expect the original program to print both entries.

Get Structure?

Would it be possible to use the parser to get the JSON's Structure?

I know there are existing packages out there, but all that I tried can't keep the original JSON structure's order. I.e., the output structures are all sorted by keys, which is both a blessing and a curse.

Since this parser keeps the original order, it seems to be the only candidate that is able to do it.

Thanks

ArrayEach ignores parse errors (not the same as issue #53)

The ArrayEach function will not return an error if parsing one of the array elements fails.
The parse error is passed to the callback function, and that's it. (As stated in #53, the callback function is powerless to act on this error.)

parser.go#L585

if e != nil {
    break
}

It just breaks out of the iteration loop and returns as usual. I would expect the ArrayEach function to also signal the error in this case.

EDIT: yes, there are ways to work around this in your own code, but it still seems like a bug.

can make a PR if required...

Wrong Delete() method

jsonparser.Delete([]byte(`{"a": {"b": 1}, "b": 2}`), "b")

Expect:
output: {"a": {"b": 1}}

Actual:
output: {"a": {}

License

Under what license this project is?

Get fails to return correct result

Not sure what is going on, but Get doesn't work for such simple JSON:

package main

import (
	"fmt"

	"github.com/buger/jsonparser"
)

func main() {
	json := []byte(`{"fiz":"fuz","foo":{"bar":"baz"}}`)

	value, _, _, err := jsonparser.Get(json, "fiz", "bar")

	if err != nil {
		panic(err)
	}

	fmt.Println(string(value))
	// baz
}

It should return an error instead

Cannot parse "[ ]" (whitespace in array breaks parsing)

Consider "[\n]". This is a valid empty array in JSON, but it is parsed as having one element, with dataType Unknown.

The following test demonstrates the problem:

func TestParseArrayWithWhitespace(t *testing.T) {
	data := []byte("[\n]")
	_, _ = ArrayEach(data, func(value []byte, dataType ValueType, offset int, err error) {
		t.Errorf("ArrayEach([]byte(%q), ...) called callback with an empty array: value=[]byte(%q), dataType=%v, offset=%d, err=%v", data, value, dataType, offset, err)
	})
}

This test should pass, but instead it fails:

--- FAIL: TestParseArrayWithWhitespace (0.00s)
	parser_test.go:1130: ArrayEach([]byte("[\n]"), ...) called callback with an empty array: value=[]byte(""), dataType=unknown, offset=2, err=Unknown value type
