Code Monkey home page Code Monkey logo

scala-json-benchmark's Introduction

A Scala JSON parsing benchmark experiment

Introduction

JSON data reading and writing is an "exciting" subject in Scala, because there are many competing libraries, and it seems all of them refuse to die. This project seeks to make another benchmark from these projects, more specifically from Rapture, Lift, Spray, Play and Json4s.

This project also demonstrates how to use these different JSON libraries to deserialize a Scala case class. Of course, we also provide the code to run the benchmark and to generate test data, that might be of use.ja

About the libraries

Some of these libraries (Rapture and Json4s) use Jackson, the Java JSON-parsing library. Lift has a custom native parser. Spray was based on Parboiled, but moved in late 2014 to a custom native parser too, acquiring a huge performance improvement. All of them offer similar JSON ASTs to handle the parsed data, usually with some way to query the objects, and also some way to generate and extract JSON objects from a case class definition.

It seems from the documentation that Play took their time to offer a nice automatic way to extract case classes instead of forcing you to write code to pick the values from each field. But now all these libraries work in similar ways, although there is still some room for choosing one of them to your project based on taste or need. They have different subtle restrictions on what kind of implicit you have to define in your classes for the extraction to work. And there are other practical concerns, of course. If you are using Play, Lift or Spray you probably would like to stick to their native JSON handling libraries, and some libraries are not available in just any platform.

Experiment

In our study here we were only concerned with a "big data" scenario. We have a large file with thousands of JSON objects, one per text line, and we just wanted to see which library could read them into case classes faster.

The ScalaJsonBenchmark.scala file contains classes with JSON parsers created using each of the tested libraries. That file also carries out the experiment, reading the input file to memory first as an Array[String]. This array is then processed 11 times with each parser. The parsers are executed in sequence at each of these iterations, and not just one parser at a time.

The time taken in the exeution was measured using System.currentTimeMillis(), which is not the perfect method, but was good enough to provide some interesting results. This experiment was performed in April 2015 with an old lousy notebook computer (4GB RAM + Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz ) and with Scala 2.11.6, using a jar file created with sbt-assembly.

Results from set 1

These are the test results from birds.dat, a file with roughly 10 megabytes. (We used to call that a "box of diskettes" back in the day...) This file is provided in this project for you to reproduce the experiment. We also provided the program used to generate this data from the model classes (Bird and Place) with a bunch of random data generating functions. The tables below show the minimum, median and maximum times in milliseconds taken by each library to parse all the data in the 11 iterations.

First run:

libraryminmedmax
Play 759 982 4906
Lift 563 813 2125
Json4s 667 746 3012
Rapture 472 583 2501
Spray 440 544 2537

Second run:

libraryminmedmax
Lift 575 1028 3090
Play 782 892 3693
Json4s 607 806 2462
Rapture 495 707 2668
Spray 439 539 1753
Results from set 2

We have also performed the same test with another file with 4MB and about 10.000 lines. The objects were similar in having a lot of fields with different types, some of them optional, and some of them were lists of objects.

First run:

libraryminmedmax
Rapture111317544229
Json4s 438 8932221
Play 471 6123318
Spray 516 5822048
Lift 310 3751882

Second run:

libraryminmedmax
Rapture 1014 1566 4452
Spray 470 661 1659
Play 507 645 2490
Json4s 390 520 3924
Lift 340 444 2374

Conclusion

This experiment was quite surprising because while the performance from the libraries was consistent in different runs on the same test file, they performed quite differently on the two test cases. More specifically, Lift was kind of bad at parsing the birds.dat, but it kicked ass at the secret data file. At the same time Rapture showed the opposite performance. We haven't figured out yet what difference in the models might have caused this effect.

As for the other libraries, Spray, Play and Json4s all exhibited a roughly similar performance.

This study has demonstrated the great level of maturity being reached by all of these projects. It is pretty hard today to defend any of the based solely on speed. And also there is no obvious answer to which might be the fastest in any case. Performance may differ, but it may depend on the specific application.

Jury is still out on which library provides the best interface. But after this experiment I intend to stop caring about any library having a much worse performance, which was indeed a problem back in the day โ€” the Scala native parser still seems to be horrid. We should just concern ourselves with the code now, as the necessary performance improvements have apparently been applied to these projects already.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.