9

Would you recommend Google Protocol Buffers or Caucho Hessian for a cross-language over-the-wire binary format? Or anything else, for that matter - Facebook Thrift for example?

Vihung

8 Answers

9

We use Caucho Hessian because of the reduced integration costs and simplicity. Its performance is very good, so it's perfect for most cases.

For a few apps where cross-language integration is not that important, there's an even faster library called Kryo that can squeeze out more performance. Unfortunately it's not that widely used, and its protocol is not a quasi-standard like Hessian's.

A. Ionescu
5

Depends on the use case. PB is much more tightly coupled and is best used internally between closely coupled systems; it is not good for shared/public interfaces (that is, interfaces shared between more than two specific systems). Hessian is a bit more self-descriptive and has nice performance on Java: better than PB in my tests, but I'm sure that depends on the use case. PB seems to have trouble with textual data; perhaps it has been optimized for integer data.

I don't think either is particularly good for public interfaces, but given you want binary format, that is probably not a big problem.

EDIT: Hessian performance is actually not all that good, per the jvm-serializers benchmark. And PB is pretty fast as long as you make sure to add the flag that forces use of the fast options on Java. And if PB is not good for public interfaces, what is? IMO, open formats like JSON are superior externally, and more often than not fast enough that performance does not matter a lot.
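
As a rough illustration of the integer-versus-text remark above: the documented protobuf wire format stores integers as base-128 varints, so small numbers take a single byte, while strings are written as a length prefix followed by their UTF-8 bytes. The sketch below re-implements that idea by hand in plain Java. The class and method names are made up for this example and are not part of the protobuf library, and it only shows encoded sizes; it does not explain any particular benchmark result.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names are hypothetical, not the protobuf API.
final class VarintSketch {

    // Base-128 varint: 7 payload bits per byte, low-order group first,
    // high bit set on every byte except the last one.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Length-delimited text: varint byte count followed by the raw UTF-8 bytes.
    static byte[] encodeString(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] length = encodeVarint(utf8.length);
        byte[] result = new byte[length.length + utf8.length];
        System.arraycopy(length, 0, result, 0, length.length);
        System.arraycopy(utf8, 0, result, length.length, utf8.length);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encodeVarint(1).length);        // 1 byte
        System.out.println(encodeVarint(300).length);      // 2 bytes (0xAC 0x02)
        System.out.println(encodeString("hello").length);  // 6 bytes: 1 length + 5 text
    }
}
```

Small integers shrink well under this scheme, whereas text is copied through at full size after a UTF-8 conversion, which may be part of why integer-heavy messages look better.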

StaxMan
  • 1
    Update -- since my first tests, I have found PB to be faster than Hessian; as long as one sets defaults to use "fast mode" (older versions at least defaulted to slow mode, no idea why). – StaxMan Nov 02 '10 at 05:13
  • 1
    I disagree about not using PB for public interfaces. You can publish the protocol files and have other people generate their own reader/writers. I've done this for both public APIs as well as distributing collected data. – Abe Schneider Jul 22 '11 at 02:11
  • I personally find PB's (and similar mandatory-schema) brittle strict format unsuitable for public interfaces, and would not use PB, Thrift or Avro outside of a single entity (company). That is, good for closely-coupled systems, bad for loose coupling. But it is not something specific to protobuf, but rather applies to most binary formats (not all; BSON is open-ended for example) – StaxMan Jul 22 '11 at 04:07
  • 1
    I hadn't heard of BSON before, it looks interesting. But I think PB forcing a schema is actually a huge advantage for sharing with the public. It makes it extremely easy to implement someone else's protocol without having to roll any of your own code. What part of the schema is brittle? The constraints it enforces ensure you won't get any errors when sending/receiving data, while still allowing for extensions to be added. – Abe Schneider Jul 22 '11 at 15:04
  • Brittleness from inability to evolve schema (ability to strictly add fields is better than nothing I guess); but also from code generation. I guess especially code generation; it is rather difficult to actually evolve classes, end points, something that is rather easy for more flexible formats (JSON for example). BSON has its issues -- it is NOT really JSON, as it adds n+1 extensions; it's verbose for a binary format -- but it is at least flexible. My favorite really is just JSON as it can be easily data-bound to existing objects (at least on Java), extensible, human readable. – StaxMan Jul 22 '11 at 17:39
  • 1
    The problem is if you don't allow fields to be removed then you don't really have a standard. How do you guarantee that a field isn't entirely essential for operation? (And PB does allow for the 'optional' keyword.) Also, you can easily compose messages from other messages, which is another means of schema evolution. If you keep your messages small you will usually not have to change them. As for the generated code -- it's because the generated classes should only be used for reading/writing. You should make your own classes for your representation of the data (though potentially a pain). – Abe Schneider Jul 22 '11 at 18:13
  • I am fine with disagreeing here; but only thing I'll mention is that I like not having to replicate stuff with DTOs, but rather work with POJOs all the way. – StaxMan Jul 23 '11 at 00:01
  • 1
    I was initially unhappy with replicating stuff as well. However, one major problem with POJO is that your actual data can change with your class definition. If you do any type of long-term storage, you have to be very careful about changing code. If you separate the two, then you know specifically when you are changing your data. Also, it's possible to map message definitions onto class definitions (e.g. in Java you can use annotations to accomplish this). – Abe Schneider Jul 26 '11 at 15:05
  • True, there is no silver bullet for compatibility. So one needs to take care of artifacts used for data interchange, whether it's schema or pojos used directly. – StaxMan Jul 26 '11 at 17:08
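
The "strictly add fields" style of schema evolution discussed in the comments above can be sketched with a toy tag/length/value layout: an older reader skips any tag it does not know, using the length prefix to step over the payload. This is a hand-rolled illustration in plain Java, not the actual wire format of protobuf, Thrift or Hessian; the class name, the fixed 4-byte tag and length, and the `highestKnownTag` parameter are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; the layout and names are hypothetical.
final class TaggedFieldsSketch {

    // Each field is written as: tag (int), payload length (int), UTF-8 payload.
    static byte[] write(Map<Integer, String> fields) {
        int size = 0;
        for (String value : fields.values()) {
            size += 8 + value.getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        for (Map.Entry<Integer, String> field : fields.entrySet()) {
            byte[] payload = field.getValue().getBytes(StandardCharsets.UTF_8);
            buffer.putInt(field.getKey()).putInt(payload.length).put(payload);
        }
        return buffer.array();
    }

    // A reader that only understands tags up to highestKnownTag.
    // Fields with higher tags (added by a newer writer) are skipped, not rejected.
    static Map<Integer, String> read(byte[] message, int highestKnownTag) {
        Map<Integer, String> known = new LinkedHashMap<>();
        ByteBuffer buffer = ByteBuffer.wrap(message);
        while (buffer.hasRemaining()) {
            int tag = buffer.getInt();
            int length = buffer.getInt();
            if (tag <= highestKnownTag) {
                byte[] payload = new byte[length];
                buffer.get(payload);
                known.put(tag, new String(payload, StandardCharsets.UTF_8));
            } else {
                buffer.position(buffer.position() + length); // step over unknown field
            }
        }
        return known;
    }

    public static void main(String[] args) {
        Map<Integer, String> newerMessage = new LinkedHashMap<>();
        newerMessage.put(1, "name");
        newerMessage.put(2, "email");
        newerMessage.put(3, "field added in a later schema version");
        // An old reader that knows tags 1 and 2 still decodes the message.
        System.out.println(read(write(newerMessage), 2));
    }
}
```

Adding a field is harmless to old readers under this scheme; removing or repurposing a tag that old readers depend on is where the compatibility concerns raised in the comments come in.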
3

For me, Caucho Hessian is the best.

It is very easy to get started, and the performance is good. I tested it locally and the latency is about 3 ms; on a LAN you can expect about 10 ms.

With Hessian you don't have to write a separate file to define the model (we are using Java on both ends). It saves a lot of time in development and maintenance.

PeiSong
  • Alas, Hessian seems REALLY slow in jvm-serializers benchmark (https://github.com/eishay/jvm-serializers/wiki) -- not sure why such discrepancy – StaxMan Jul 22 '11 at 04:12
  • http://daniel.gredler.net/2008/01/07/java-remoting-protocol-benchmarks/ have a look at the performance comparison. – PeiSong Aug 02 '11 at 02:15
  • I don't think I would trust a benchmark from 3.5 years ago; the alternatives there are rather outdated. It could of course be that the test at jvm-serializers is for a use case that Hessian is not designed to handle efficiently (relatively small objects). Numbers like 3ms or 10ms tell very little without other data or comparison to alternatives. For small messages most serialization libraries can convert in a fraction of a millisecond; and then LAN/WAN overhead dominates speed. – StaxMan Aug 02 '11 at 17:30
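
To separate serialization cost from network latency, as the last comment suggests, one rough approach is to time serialization alone, in memory. The sketch below uses the JDK's built-in `java.io` serialization only because it needs no extra libraries; it stands in for Hessian, PB or Kryo and says nothing about their actual speeds. The `Ping` class and the iteration count are invented for the example, and a serious comparison would use a proper harness such as the jvm-serializers project linked above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Illustrative sketch only; not a rigorous benchmark.
final class SerializationTimingSketch {

    // Hypothetical small message, invented for this example.
    static final class Ping implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final String text;

        Ping(int id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // Serialize to an in-memory buffer so that no network or disk I/O is measured.
    static byte[] serialize(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Average nanoseconds per serialize() call, measured after a warm-up pass
    // that gives the JIT compiler a chance to optimize the code path.
    static long averageNanos(Serializable value, int iterations) {
        for (int i = 0; i < iterations; i++) {
            serialize(value);
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            serialize(value);
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        long nanos = averageNanos(new Ping(1, "hello"), 10_000);
        System.out.println("average serialize time: " + nanos + " ns");
    }
}
```

Comparing a number like this with a measured round trip over the LAN shows how much of the 3 ms or 10 ms figure is attributable to the serializer at all.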
2

If you need to interconnect apps written in many languages/platforms, then Hessian is the best. If you use only Java, then Kryo is even faster.

Adrian A.
  • 1
    For what it's worth, there's "jvm-serializers" benchmark (https://github.com/eishay/jvm-serializers/wiki) that gives one data point for relative speeds. Kryo seems indeed fast (although I have had some problems with it crashing -- but fast it sure is) – StaxMan Dec 21 '10 at 02:45
1

I'm looking into this myself. No good conclusions so far, but I found http://dewpoint.snagdata.com/2008/10/21/google-protocol-buffers/ summarizing all the options.

0

I would say that ProtocolBuffers, Thrift and Hessian are fairly similar as far as their binary formats are concerned, in that they all provide cross-language serialization support. The serialization itself might show some small performance differences between them (size/space tradeoffs), but this is not the most important thing. ProtocolBuffers is certainly a well-performing, IDL-defined format with extensibility features that make it attractive.

HOWEVER, the use of "over-the-wire" in the question implies the use of a communications library. Here Google has provided an interface definition for protobuf RPC, which amounts to a specification where all implementation details are left to the implementer. This is unfortunate because it means there is de facto NO cross-language implementation, unless you can find one, probably among those listed at http://code.google.com/p/protobuf/wiki/ThirdPartyAddOns. I have seen some RPC implementations which support Java and C, or C and C++, or Python and C, etc., but here you just have to find a library which satisfies your concrete requirements and evaluate it; otherwise you're likely to be disappointed. (At least I was disappointed enough to write protobuf-rpc-pro.)

Kryo is a serialization format like protobuf, but Java only. KryoNet is a Java-only RPC implementation using Kryo messages. So it's not a good choice for cross-language communication.

Today it would seem that ICE http://www.zeroc.com/, and Thrift which provides an RPC implementation out of the box, are the best cross-language RPC implementations out there.

pjklauser
0

I tried Google Protocol Buffers. It works with C++/MFC, C#, PHP and more languages (see: http://code.google.com/p/protobuf/wiki/ThirdPartyAddOns) and works really well regardless of transport and disk save/loading.

buttercup
0

Muscle has a binary message transport. Sorry that I can't comment on the others as I haven't tried them.

Joel Lucsy