thrift-protobuf-compare's Issues

Different data is serialized for different serializers

The different serializers do not all serialize the same data.

In ThriftSerializer::create(), this line appears:
Image image2 = new Image("http://javaone.com/keynote_thumbnail.jpg", "Javaone Keynote", -1, -1, Size.SMALL);

None of the other create() methods have the two "-1" values.

Also, in StdMediaSerializer::create(), these lines appear:
Image image1 = new Image(0, "Javaone Keynote", "A", 0, Image.Size.LARGE);
Image image2 = new Image(0, "Javaone Keynote", "B", 0, Image.Size.SMALL);

Note that the URIs are "A" and "B", which are much shorter than the URIs used
in the ThriftSerializer and the ProtobufSerializer.

These and any other inconsistencies should be corrected so the tests can be
truly apples-to-apples.


Original issue reported on code.google.com by [email protected] on 13 May 2009 at 8:27

Avro should reuse DatumReader and DatumWriter instances

In Avro, the DatumReader and DatumWriter implementations have no internal
state and are expected to be reused, potentially shared by many threads. 
SpecificDatumReader in particular is currently expensive to create
(although that could be improved in Avro).  The attached patch vastly
improves avro-specific times, and is a better example of typical Avro usage.

After this simple patch, I see:

avro-generic            ,      5528.11364,      6779.77250,      5587.13150,     17895.01764,        211
avro-specific           ,      4073.22960,      4960.44250,      4712.67500,     13746.34710,        211
protobuf                ,      1753.46137,     12065.28600,      7595.99600,     21414.74337,        217
thrift                  ,      1648.36542,      8599.77650,     11522.16550,     21770.30742,        314
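
For reference, a minimal sketch of the reuse pattern (written against the current Avro API rather than the 2009-era one; MediaContent stands for the Avro-generated specific record class):

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;

public class AvroSpecificSerializer
{
    // Created once and reused for every call; DatumWriter/DatumReader keep no
    // per-call state, and SpecificDatumReader is expensive to construct.
    private final SpecificDatumWriter<MediaContent> writer =
            new SpecificDatumWriter<MediaContent>(MediaContent.class);
    private final SpecificDatumReader<MediaContent> reader =
            new SpecificDatumReader<MediaContent>(MediaContent.class);

    public byte[] serialize(MediaContent content) throws IOException
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(content, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    public MediaContent deserialize(byte[] data) throws IOException
    {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        return reader.read(null, decoder);
    }
}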

Original issue reported on code.google.com by [email protected] on 29 May 2009 at 11:24

Attachments:

Simplify and improve performance of ProtobufSerializer.serialize

Patch says it all:

Index: src/serializers/ProtobufSerializer.java
===================================================================
--- src/serializers/ProtobufSerializer.java (revision 44)
+++ src/serializers/ProtobufSerializer.java (working copy)
@@ -19,8 +19,7 @@

     public byte[] serialize(MediaContent content, ByteArrayOutputStream baos) throws IOException
     {
-        content.writeTo(baos);
-        return baos.toByteArray();
+      return content.toByteArray();
     }

   public MediaContent create()



Original issue reported on code.google.com by ismaelj on 25 Mar 2009 at 7:45

total time column misleading for CheckingObjectSerializer

The "total time" column always includes timeDeserializeNoFieldAccess as the
deserialization component.

CheckingObjectSerializers were introduced to show how lazy deserializers
perform when, on the deserialized instances,
 - no fields are accessed,
 - only the topmost fields are accessed, or
 - all fields are accessed.

To make a fair comparison with "eager" deserializers, we should calculate
"total time" by using timeDeserializeAndCheckAllFields instead of
timeDeserializeNoFieldAccess for CheckingObjectSerializers.
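
A minimal sketch of that change, using the timing and interface names from this report (the actual BenchmarkRunner code may be structured differently):

// For lazy (checking) serializers, charge the cost of actually reading every
// field so the comparison with eager deserializers is fair.
static double totalTime(Object serializer,
                        double timeCreate,
                        double timeSerializeDifferentObjects,
                        double timeDeserializeNoFieldAccess,
                        double timeDeserializeAndCheckAllFields)
{
    double deserTime = (serializer instanceof CheckingObjectSerializer)
            ? timeDeserializeAndCheckAllFields
            : timeDeserializeNoFieldAccess;
    return timeCreate + timeSerializeDifferentObjects + deserTime;
}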

Original issue reported on code.google.com by [email protected] on 6 Jan 2010 at 10:03

Protobuf serializer should reuse Builder()

It can be achieved by adding the private field

private final MediaContent.Builder builder = MediaContent.newBuilder();

and then, in create(), replacing the builder initialization with:

builder.clear();
return builder.setMedia(...).addImage(...).addImage(...).build();

Original issue reported on code.google.com by [email protected] on 6 May 2013 at 5:04

The Thrift tests should reuse Transport and Protocol instances

Thrift is at a significant disadvantage in these tests because the setup of
the test does not allow Thrift to reuse its Transport and Protocol instances.
This is because the tests' mode of operation does not match real RPC behavior
well: they serialize many times in a row, more or less ignoring the output,
and then deserialize repeatedly from a single serialized buffer.

It would be good to find a way to restructure the tests so that Thrift is not
handicapped by this somewhat artificial usage scenario and can instead keep a
standing Transport and Protocol instance for serialization and deserialization
(as is commonly the case in real-world RPC over Thrift), rather than creating
new instances on each invocation.
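
A minimal sketch of such reuse, using Thrift's TSerializer/TDeserializer helpers, which hold a standing transport and protocol internally (MediaContent is the Thrift-generated struct; this is an illustration, not the project's actual code):

import org.apache.thrift.TDeserializer;
import org.apache.thrift.TException;
import org.apache.thrift.TSerializer;
import org.apache.thrift.protocol.TBinaryProtocol;

public class ThriftSerializerReuse
{
    private final TSerializer serializer;
    private final TDeserializer deserializer;

    public ThriftSerializerReuse() throws TException
    {
        // Both helpers keep their transport and protocol for the lifetime of the object.
        serializer = new TSerializer(new TBinaryProtocol.Factory());
        deserializer = new TDeserializer(new TBinaryProtocol.Factory());
    }

    public byte[] serialize(MediaContent content) throws TException
    {
        return serializer.serialize(content);
    }

    public MediaContent deserialize(byte[] data) throws TException
    {
        MediaContent content = new MediaContent();
        deserializer.deserialize(content, data);
        return content;
    }
}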


Original issue reported on code.google.com by [email protected] on 25 Jun 2009 at 7:42

update to Avro 1.3.0

Here's a patch to update to Avro 1.3.0.

On my box Avro generic is now a bit faster in this benchmark, but Avro
specific is a bit slower, for reasons as yet unclear to me.


Original issue reported on code.google.com by [email protected] on 1 Mar 2010 at 11:54

Attachments:

XStreamSerializer.java error in registerConverters()

There's a copy/paste error on line 80 -- Image.class should be Media.class:

public void registerConverters() throws Exception
  {
    xstream.alias("im", Image.class);
    xstream.registerConverter(new ImageConverter());

    xstream.alias("md", Image.class);    // This line in error.
    xstream.registerConverter(new MediaConverter());

    xstream.alias("mc", MediaContent.class);
    xstream.registerConverter(new MediaContentConverter());
  }
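
With the fix applied, those two lines would read:

    xstream.alias("md", Media.class);
    xstream.registerConverter(new MediaConverter());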

Original issue reported on code.google.com by [email protected] on 12 Dec 2009 at 3:04

Regression: BenchmarkRunner fails to run

In changeset r104 (http://code.google.com/p/thrift-protobuf-compare/source/detail?r=104)
the Jackson dependency was updated from version 1.0.1 to 1.3.1.

Unfortunately this breaks Avro, which needs 1.0.1. The symptom is:

Exception in thread "main" java.lang.NoSuchMethodError: org.codehaus.jackson.JsonFactory.setCodec(Lorg/codehaus/jackson/ObjectCodec;)V
    at org.apache.avro.Schema.<clinit>(Schema.java:65)
    at serializers.avro.AvroGenericSerializer.<clinit>(AvroGenericSerializer.java:19)
    at serializers.BenchmarkRunner.main(BenchmarkRunner.java:38)

Original issue reported on code.google.com by [email protected] on 6 Jan 2010 at 10:31

javaExt benchmark way too slow

Hi, 

I just stumbled upon this project and was wondering why the manual
externalization ("java (externalizable)") does so poorly in the benchmark. I
had done some benchmarks of my own with protobuf and Java externalization
before deciding to implement my own framework, and the numbers were quite
different.

So I dove into the code, only to find that it is far from efficient. Attached
is a patch which remedies the deficiencies. javaExt now needs only 8.0s total
on my machine, where it needed 30.0s before; in comparison, protobuf needs 9.3s.

While javaExt is now one of the fastest externalization methods in the
benchmark, it is far from the best choice: no forward or backward
compatibility, no cross-language support, and it is error-prone to implement.

The patch might need some polish, since I don't always check for null, and the
source formatting could use a cleanup. But you get the idea.

Bye,
   Michael

Original issue reported on code.google.com by [email protected] on 6 Jan 2010 at 7:13

Attachments:

KryoOptimizedSerializer makes unfair optimizations?

I was looking through the code for KryoOptimizedSerializer and it looks
like it's making unfair optimizations by specializing things for the exact
data value used in the benchmark.  For example, it sets
Image.fieldsCanBeNull(false) simply because in the benchmark the fields of
Image are never null.

Conversely, I think the regular KryoSerializer could be sped up by telling
it about the fields that actually can never be null (Image.uri, Media.uri,
etc.).

I tried making a configuration of the Kryo serializer that only used fair
optimizations.  It's attached.  (I don't really know Kryo, so I may have
done things incorrectly.)


Original issue reported on code.google.com by kannan%[email protected] on 22 Feb 2010 at 3:45

Attachments:

Check in JsonFormat.java

JsonFormat.java needs to be checked in to fix the failing import in
ProtobufJsonSerializer. This issue was once mentioned in
http://groups.google.com/group/java-serialization-benchmarking/browse_thread/thread/9d4e9ef73ce2a2eb

The file was downloaded from:
http://protobuf.googlecode.com/issues/attachment?aid=5532802976046734193&name=JsonFormat.java

Original issue reported on code.google.com by [email protected] on 27 Oct 2009 at 12:36

Attachments:

Avro test doesn't take optionality into account. Unfair?

The Avro schema doesn't account for the optionality of fields. Since most of
the fields are defined to be optional, this yields significant space and speed
benefits (since Avro's binary format, unlike Thrift's and Protobuf's, doesn't
include per-field tags).

When I use a schema that supports optionality, the serialized data size
increases by about 10% and serialization/deserialization time increases by
about 50%.
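
For illustration, an optional field in an Avro schema is expressed as a union with "null", which adds a branch index to the encoding and a branch check when reading. A small sketch using the current Schema.Parser API (the field names are illustrative, not the project's actual schema):

import org.apache.avro.Schema;

public class OptionalImageSchema
{
    // Each ["null", ...] union costs an extra branch index on the wire and a
    // branch check at read time, which is where the size/time increase comes from.
    static final Schema IMAGE = new Schema.Parser().parse(
        "{\"type\": \"record\", \"name\": \"Image\", \"fields\": ["
      + "  {\"name\": \"uri\", \"type\": \"string\"},"
      + "  {\"name\": \"title\", \"type\": [\"null\", \"string\"], \"default\": null},"
      + "  {\"name\": \"width\", \"type\": [\"null\", \"int\"], \"default\": null},"
      + "  {\"name\": \"height\", \"type\": [\"null\", \"int\"], \"default\": null}"
      + "]}");
}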

Original issue reported on code.google.com by kannan%[email protected] on 22 Feb 2010 at 5:10

Many StdMediaSerializer subclasses are not valid for testing serialization/deserialization

There are some significant issues around serialization and deserialization
for many of the StdMediaSerializer subclasses.

Essentially, they are not really robust in the face of what should be
legitimate changes in the data.

If, for example, I add a third image in StdMediaConverter::create(), the
following serializers that used to pass the correctness test no longer pass:

WARN: serializer 'json (jackson)' failed round-trip test (ser+deser produces Object different from input)
WARN: serializer 'xstream (xpp with conv)' failed round-trip test (ser+deser produces Object different from input)
WARN: serializer 'xstream (stax with conv)' failed round-trip test (ser+deser produces Object different from input)
WARN: serializer 'javolution xmlformat' failed round-trip test (ser+deser produces Object different from input)

Looking closer at the json implementation, it is clear why this is happening.
Although the images are supposed to be a list, the serialize() and
deserialize() methods simply hard-code two image elements. This situation is
pretty pervasive -- these two methods generally hard-code the exact data that
is generated in the create() method.

This means that the benchmarks for these serializers are basically invalid
since they are not properly implementing correct
serialization/deserialization semantics. 
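
The fix is to drive serialization off the actual list instead of a fixed element count. A sketch of what that looks like for the json (jackson) case (the accessor and field names are assumptions, not the project's actual API):

import java.io.IOException;

import org.codehaus.jackson.JsonGenerator;

public class JsonImageWriter
{
    // Writes however many images the content actually holds, instead of
    // hard-coding exactly two array elements.
    static void writeImages(JsonGenerator generator, MediaContent content) throws IOException
    {
        generator.writeFieldName("images");
        generator.writeStartArray();
        for (Image image : content.getImages())
        {
            generator.writeStartObject();
            generator.writeFieldName("uri");
            generator.writeString(image.getUri());
            generator.writeFieldName("title");
            generator.writeString(image.getTitle());
            generator.writeEndObject();
        }
        generator.writeEndArray();
    }
}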



Original issue reported on code.google.com by [email protected] on 13 May 2009 at 8:36

total time column measures instance creation twice

The "total time" column is implemented with

timeCreate + timeSerializeDifferentObjects + timeDeserializeNoFieldAccess

in trunk. Since timeSerializeDifferentObjects already contains calls to
serializer.create(), we measure instance creation twice.

This is probably the reason why avro, which is slow at instance creation,
was faster in v1 than in trunk...
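
One possible fix, mirroring the expression above, is simply to drop the duplicated creation term:

timeSerializeDifferentObjects + timeDeserializeNoFieldAccess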


Original issue reported on code.google.com by [email protected] on 6 Jan 2010 at 9:55

Deserialize and Check timings are incorrect for plain ObjectSerializers.

Tests that implement plain ObjectSerializer have an unfair advantage over
tests that implement CheckingObjectSerializer.  Even for non-lazy parsers,
doing the field checks takes a significant amount of time.  BenchmarkRunner
assumes that, for ObjectSerializers, this time is zero.

Currently, Protobuf is getting screwed by this (~30% time increase on the
"Deserialization Time" graph on the "Benchmarking" page).  Most other
non-lazy parser tests do not implement CheckingObjectSerializer.

Two ways to fix this problem:
1. Convert all the tests to CheckingObjectSerializer.
2. Do not report "Deserialize and Check" times for plain ObjectSerializer tests.

A quick way to do (2) is to go to BenchmarkRunner.java, lines 259-260 and
replace them with:

   double timeDeserializeAndCheckAllFields = -1;
   double timeDeserializeAndCheckMediaField = -1;

This'll cause "-1" to appear in the output.  No, it's not pretty, but at
least the results are no longer misleading.  I think it's a good temporary
fix until someone puts in the time to do (1) or something better.

Original issue reported on code.google.com by kannan%[email protected] on 22 Feb 2010 at 6:20

Protobuf should be optimised for speed

What steps will reproduce the problem?
1. Add optimize_for = SPEED to media.proto
2. Rebuild Java file
3. Rerun tests

What is the expected output? What do you see instead?
After the change, I expect to see protobufs running rather faster :)

Original issue reported on code.google.com by jonathan.skeet on 18 Nov 2008 at 9:14

MessagePack

It would be good to include another interesting project in the comparison:
http://msgpack.org/


Original issue reported on code.google.com by [email protected] on 29 Mar 2011 at 6:31

ProtobufSerializer skips optional fields

ProtobufSerializer doesn't set values for many fields. This causes those
fields not to be serialized, resulting in protobuf executing more quickly and
producing fewer bytes. It could be argued that this is a protobuf feature;
however, the test leaves many fields at their default values, which makes for
misleading results. Besides, in real usage there will usually be few or no
fields left at their default values.

Original issue reported on code.google.com by [email protected] on 29 Sep 2009 at 4:01

Add the ActiveMQ Protobuf Implementation to the benchmark

A while back I created a custom protobuf compiler and implementation
tailored for the usage patterns of the ActiveMQ Message broker.

You can find it at:
http://svn.apache.org/repos/asf/activemq/sandbox/activemq-flow/activemq-protobuf/

This protobuf implementation has several optimizations and usage differences
from the standard Google implementation, and I would like to see it included
as part of the standard benchmarks.

I've created a patch against the V1 release.  To apply it, unzip into the
project directory:

cd thrift-protobuf-compare
jar -xvf tpc-activemq-protobuf.zip


Then apply the included tpc-activemq-protobuf.patch:
patch -p 0 < tpc-activemq-protobuf.patch



Original issue reported on code.google.com by [email protected] on 18 Sep 2009 at 4:48

Attachments:
