robohack / yajl
A fast streaming JSON parsing library in C. This variant uses BSDMake to build and includes various fixes and enhancements.
Home Page: http://robohack.github.io/yajl
License: Other
# Welcome to Yet Another JSON Library (YAJL)

## NOTE: This is a variant of the original [YAJL][LLOYD] by Lloyd Hilaiel.

This variant started as a fork of Lloyd's original.

This YAJL uses BSDMake for building, Cxref for documentation, and it includes a few minor bug fixes and other enhancements. See the Git history. Further fixes or enhancements are welcome.

This YAJL is maintained in [robohack's GitHub][GHRY] by Greg A. Woods.

Motto: Write Portable C without complicating the build!

See, e.g.: https://nullprogram.com/blog/2017/03/30/

Also, and perhaps more importantly: https://queue.acm.org/detail.cfm?id=2349257

## Why does the world need another C library for parsing JSON?

Good question. In a review of current C JSON parsing libraries I was unable to find one that satisfies my requirements. Those are,

0. written in Plain Standard ANSI C (C99!)
1. i.e. portable
2. robust -- as close to "crash proof" as possible
3. data representation independent
4. fast
5. generates verbose, useful error messages including context of where the error occurs in the input text.
6. can parse JSON data off a stream, incrementally
7. simple to use
8. tiny
9. can use a custom memory allocator

Numbers 3, 5, 6, and 7 were particularly hard to find, and were what caused me to ultimately create YAJL. This document is a tour of some of the more important aspects of YAJL.

## YAJL is Free.

Permissive licensing means you can use it in open source and commercial products alike without any fees. My request beyond the licensing is that if you find bugs drop me an email, or better yet, fork and fix.

Porting YAJL should be trivial; the implementation is ANSI C. If you port to new systems I'd love to hear of it and integrate your patches.

## YAJL is data representation independent.

BYODR! Many JSON libraries impose a structure-based data representation on you. This is a benefit in some cases and a drawback in others. YAJL uses callbacks to remain agnostic of the in-memory representation. So if you wish to build up an in-memory representation, you may do so using YAJL, but you must bring the code that defines and populates the in-memory structure.

This also means that YAJL can be used by other (higher level) JSON libraries if so desired.

## YAJL supports stream parsing

This means you do not need to hold the whole JSON representation in textual form in memory. This makes YAJL ideal for filtering projects, where you're converting JSON from one form to another (e.g. to XML). The included JSON pretty printer is an example of such a filter program.

## YAJL is fast

Minimal memory copying is performed. YAJL, when possible, returns pointers into the client provided text (i.e. for strings that have no embedded escape chars, hopefully the common case). I've put a lot of effort into profiling and tuning performance, but I have ignored a couple possible performance improvements to keep the interface clean, small, and flexible. My hope is that YAJL will perform comparably to the fastest JSON parser out there.

YAJL should impose both minimal CPU and memory requirements on your application.

## YAJL is tiny.

Fat free. No whip. Now truly so with the elimination of CMake!

enjoy,

Lloyd - July, 2007

Greg - April, 2024

[GHRY]: https://github.com/robohack/yajl/
[LLOYD]: https://github.com/lloyd/yajl/
After a successful build and bmake install, the yajl.pc file does not appear in the pkgconfig directory.
Version: release-2.2
Visual C++, in debug mode, redefines alloc functions (to detect memory leaks).
The syntax used in this project is incompatible with the redefinition.
Would it be possible to prefix the functions with 'yajl_' in the yajl_alloc_funcs struct?
Thus yajl_malloc(), yajl_free(), yajl_realloc().
It's trivial but it makes a huge difference when you need to compile with Visual C++.
Thanks
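A minimal sketch of the requested renaming, assuming a simplified stand-in for the allocator struct. The names `alloc_funcs_sketch`, `sketch_malloc`, `sketch_realloc`, `sketch_free`, `sketch_default_funcs`, and the macro bodies shown here are illustrative and are not copied from the YAJL sources. A function-like macro such as a debug-mode `malloc(...)` only expands where the name is directly followed by `(`, so a member call spelled `afs->malloc(...)` is affected while `afs->yajl_malloc(...)` is not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for yajl_alloc_funcs with prefixed member names. */
typedef struct {
    void *(*yajl_malloc)(void *ctx, size_t sz);
    void *(*yajl_realloc)(void *ctx, void *ptr, size_t sz);
    void  (*yajl_free)(void *ctx, void *ptr);
    void  *ctx;
} alloc_funcs_sketch;

/* Thin wrappers around the C library; ctx is unused here. */
static void *sketch_malloc(void *ctx, size_t sz) { (void)ctx; return malloc(sz); }
static void *sketch_realloc(void *ctx, void *ptr, size_t sz) { (void)ctx; return realloc(ptr, sz); }
static void sketch_free(void *ctx, void *ptr) { (void)ctx; free(ptr); }

/* Call sites never spell "malloc(", "realloc(" or "free(" on a member. */
#define YA_MALLOC(afs, sz)       (afs)->yajl_malloc((afs)->ctx, (sz))
#define YA_REALLOC(afs, ptr, sz) (afs)->yajl_realloc((afs)->ctx, (ptr), (sz))
#define YA_FREE(afs, ptr)        (afs)->yajl_free((afs)->ctx, (ptr))

static const alloc_funcs_sketch sketch_default_funcs = {
    sketch_malloc, sketch_realloc, sketch_free, NULL
};

/* Usage: allocate, grow, and release through the prefixed members. */
static void prefixed_members_demo(void)
{
    const alloc_funcs_sketch *afs = &sketch_default_funcs;
    char *p = YA_MALLOC(afs, 8);
    assert(p != NULL);
    char *q = YA_REALLOC(afs, p, 64);
    assert(q != NULL);
    YA_FREE(afs, q);
}
```

The wrappers still call the plain C library functions, so any debug-heap redefinition of those names continues to apply inside them.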
In yajl_alloc(), there is no NULL pointer check after the YA_MALLOC:
hand = (yajl_handle) YA_MALLOC(afs, sizeof(struct yajl_handle_t));
+if (!hand) return NULL;
Same in yajl_lex_alloc():
yajl_lexer lxr = (yajl_lexer) YA_MALLOC(alloc, sizeof(struct yajl_lexer_t));
+if (!lxr) return NULL;
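The pattern behind both proposed checks can be sketched in isolation. This is an illustration under stated assumptions, not the YAJL code: `alloc_sketch`, `handle_sketch`, `handle_sketch_alloc`, and the helper allocators are hypothetical names. An allocator that always fails is used to exercise the error path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified allocator interface (not the YAJL definition). */
typedef struct {
    void *(*alloc)(void *ctx, size_t sz);
    void  (*release)(void *ctx, void *ptr);
    void  *ctx;
} alloc_sketch;

/* Hypothetical handle type standing in for struct yajl_handle_t. */
struct handle_sketch {
    const alloc_sketch *afs;
    size_t bytes_consumed;
};

/* An allocator that always fails, plus one backed by the C library. */
static void *failing_alloc(void *ctx, size_t sz) { (void)ctx; (void)sz; return NULL; }
static void *libc_alloc(void *ctx, size_t sz) { (void)ctx; return malloc(sz); }
static void libc_release(void *ctx, void *ptr) { (void)ctx; free(ptr); }

/* Returns NULL on allocation failure instead of dereferencing NULL. */
static struct handle_sketch *handle_sketch_alloc(const alloc_sketch *afs)
{
    struct handle_sketch *hand = afs->alloc(afs->ctx, sizeof *hand);
    if (hand == NULL)
        return NULL;            /* the check the report asks for */
    memset(hand, 0, sizeof *hand);
    hand->afs = afs;
    return hand;
}

/* Usage: the failing allocator yields NULL rather than a crash. */
static void handle_sketch_demo(void)
{
    const alloc_sketch failing = { failing_alloc, libc_release, NULL };
    assert(handle_sketch_alloc(&failing) == NULL);
}
```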
In yajl_parser.c, on line 258, we pass yajl_buf_data(hand->decodeBuf) to the callback instead of the usual buffer "buf". As this points to another memory location, the callback can receive pointers into two different buffers.
Concrete problem: in ModSecurity, we use the callback to get the decoded value of the string, and we calculate the offset of a variable's value in order to mask it in the log. When the string has been decoded, the callback receives a location outside the original input, so we cannot calculate the offset.
We could perform this trivial change:
-yajl_string_decode(hand->decodeBuf, buf, bufLen);
-_CC_CHK(hand->callbacks->yajl_string(
hand->ctx, yajl_buf_data(hand->decodeBuf),
yajl_buf_len(hand->decodeBuf)));
+if (yajl_string_decode(hand->decodeBuf, buf, bufLen) < 0) return yajl_status_error;
+bufLen = yajl_buf_len(hand->decodeBuf);
+strcpy((char*)buf, (char*)yajl_buf_data(hand->decodeBuf));
+_CC_CHK(hand->callbacks->yajl_string(hand->ctx, buf, bufLen));
The same applies on line 397.
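To illustrate why the pointer's origin matters to a caller like the one described above, here is a minimal sketch of the offset calculation. The helper `offset_in_input` is hypothetical and not part of YAJL. It yields an offset only when the pointer handed to the callback lies inside the original input buffer, which is not the case for a string delivered from a separate decode buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the offset of p within input[0..len), or -1
 * if p does not point into that buffer. Addresses are compared as integers
 * because relational comparison of pointers into different objects is
 * undefined in C.
 */
static long offset_in_input(const unsigned char *input, size_t len,
                            const unsigned char *p)
{
    uintptr_t base = (uintptr_t)input;
    uintptr_t addr = (uintptr_t)p;

    if (addr < base || addr - base >= len)
        return -1;
    return (long)(addr - base);
}

static void offset_in_input_demo(void)
{
    static const unsigned char json[] = "{\"k\":\"a\\nb\"}";
    unsigned char decoded[] = "a\nb";  /* stands in for the decode buffer */

    /* A pointer into the input text yields a usable offset. */
    assert(offset_in_input(json, sizeof json - 1, json + 6) == 6);
    /* A pointer into a separate buffer does not. */
    assert(offset_in_input(json, sizeof json - 1, decoded) == -1);
}
```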
For enhanced performance in prod:
#ifdef NDEBUG
# define YA_MALLOC(afs, sz) malloc(sz)
# define YA_FREE(afs, ptr) free(ptr)
# define YA_REALLOC(afs, ptr, sz) realloc(ptr, sz)
#else
# define YA_MALLOC(afs, sz) (afs)->malloc((afs)->ctx, (sz))
# define YA_FREE(afs, ptr) (afs)->free((afs)->ctx, (ptr))
# define YA_REALLOC(afs, ptr, sz) (afs)->realloc((afs)->ctx, (ptr), (sz))
#endif
In yajl_tree.c, on line 299, we use ctx->stack->value->type, but at this location ctx->stack->value may be NULL.
yajl_buf yajl_buf_alloc(yajl_alloc_funcs * yajl_alloc)
{
yajl_buf b = YA_MALLOC(yajl_alloc, sizeof(struct yajl_buf_t));
memset((void *) b, 0, sizeof(struct yajl_buf_t));
Should be
yajl_buf b = YA_MALLOC(yajl_alloc, sizeof(struct yajl_buf_t));
if (!b) return NULL;
There was a PR that fixes an undetected memory allocation failure:
https://github.com/lloyd/yajl/pull/151/files
This should really be incorporated.
Some memory pool APIs (like Apache APR) don't implement a realloc function, so it's not possible to use these APIs with yajl. And it's impossible to develop a generic one.
A (quite simple) possibility is to allow the realloc function to be left unspecified and, in that case, to use an internal function. We can do that because in all calls to realloc we know the old size (which an external function cannot know).
Concretely:
Remove existing tests "afs->yajl_realloc == NULL"
Create the internal "extended" realloc:
void* yajl_realloc2(yajl_alloc_funcs* afs, void* previous, size_t sz, size_t oldsz)
{
void* new = afs->yajl_malloc(afs->ctx, sz);
if (!new) return NULL;
if (!previous) return new;
if (oldsz) memcpy(new, previous, oldsz);
afs->yajl_free(afs->ctx, previous);
return new;
}
Extend the definition of YA_REALLOC:
#define YA_REALLOC(afs, ptr, sz, oldsz) ((afs)->yajl_realloc ? (afs)->yajl_realloc((afs)->ctx, (ptr), (sz)) : yajl_realloc2((afs), (ptr), (sz), (oldsz)))
I attached a complete diff, tested with mod_security2 (APR).
For info, this speeds up the parsing by 250% on big JSON.
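The fallback idea can be tried out without YAJL or APR. The sketch below is an illustration under stated assumptions: `afs_sketch`, `pool_sketch`, `pool_malloc`, `pool_free`, and `realloc_with_oldsize` are hypothetical names, and the toy bump-pointer pool merely stands in for a pool API that has no realloc. Unlike the proposal above, it copies only the smaller of the old and new sizes, so it would also tolerate a shrinking request.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allocator table; realloc_fn may be NULL (e.g. for a pool). */
typedef struct {
    void *(*malloc_fn)(void *ctx, size_t sz);
    void *(*realloc_fn)(void *ctx, void *ptr, size_t sz);
    void  (*free_fn)(void *ctx, void *ptr);
    void  *ctx;
} afs_sketch;

/* Toy bump-pointer pool: no realloc, and free is a no-op. */
typedef struct {
    unsigned char mem[256];
    size_t used;
} pool_sketch;

static void *pool_malloc(void *ctx, size_t sz)
{
    pool_sketch *pool = ctx;
    size_t rounded = (sz + 15u) & ~(size_t)15u;  /* round size up to a multiple of 16 */
    if (rounded < sz || rounded > sizeof pool->mem - pool->used)
        return NULL;
    void *p = pool->mem + pool->used;
    pool->used += rounded;
    return p;
}

static void pool_free(void *ctx, void *ptr) { (void)ctx; (void)ptr; }

/* Fallback: allocate, copy the bytes that fit, release the old block. */
static void *realloc_with_oldsize(const afs_sketch *afs, void *prev,
                                  size_t sz, size_t oldsz)
{
    if (afs->realloc_fn != NULL)
        return afs->realloc_fn(afs->ctx, prev, sz);

    void *fresh = afs->malloc_fn(afs->ctx, sz);
    if (fresh == NULL)
        return NULL;            /* old block stays valid, as with realloc */
    if (prev != NULL) {
        memcpy(fresh, prev, oldsz < sz ? oldsz : sz);
        afs->free_fn(afs->ctx, prev);
    }
    return fresh;
}

/* Usage: grow a buffer through a pool that offers no realloc. */
static void realloc_with_oldsize_demo(void)
{
    pool_sketch pool = { {0}, 0 };
    const afs_sketch afs = { pool_malloc, NULL, pool_free, &pool };

    char *buf = realloc_with_oldsize(&afs, NULL, 4, 0);
    assert(buf != NULL);
    memcpy(buf, "abc", 4);

    char *grown = realloc_with_oldsize(&afs, buf, 32, 4);
    assert(grown != NULL && grown != buf);
    assert(strcmp(grown, "abc") == 0);
}
```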