69

We're looking into transport/protocol solutions and are about to run various performance tests, so I thought I'd check whether the community has already done this:

Has anyone done server performance tests for simple echo services, as well as serialization/deserialization for various message sizes, comparing EJB3, Thrift, and Protocol Buffers on Linux?

The primary languages will be Java, C/C++, Python, and PHP.

Update: I'm still very interested in this; if anyone has done any further benchmarks, please let me know. Also, there's a very interesting benchmark showing compressed JSON performing similarly to, or better than, Thrift / Protocol Buffers, so I'm throwing JSON into this question as well.
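For reference, this is roughly the shape of harness I have in mind. It's a minimal sketch (the Payload class, its fields, and the iteration count are made up for illustration), shown here with plain JDK object serialization, with the idea that the serialize/deserialize functions get swapped out for Thrift, Protocol Buffers, or a JSON library:

```java
import java.io.*;
import java.util.function.Function;

public class SerializationBench {

    // Example message; replace with whatever payload shapes you want to test.
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        int id;
        String name;
        double[] values;
        Payload(int id, String name, double[] values) {
            this.id = id; this.name = name; this.values = values;
        }
    }

    // Times 'rounds' serialize+deserialize round trips and reports wire size and throughput.
    static void bench(String label, Payload msg, int rounds,
                      Function<Payload, byte[]> serialize,
                      Function<byte[], Payload> deserialize) {
        byte[] bytes = serialize.apply(msg);          // also gives us the wire size
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            deserialize.apply(serialize.apply(msg));
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%s: %d bytes, %.0f round trips/sec%n",
                label, bytes.length, rounds / seconds);
    }

    // JDK object serialization as a baseline implementation.
    static byte[] javaSerialize(Payload p) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(p);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    static Payload javaDeserialize(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Payload) ois.readObject();
        } catch (IOException e) { throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        Payload small = new Payload(1, "ping", new double[4]);
        bench("JDK serialization / small message", small, 100_000,
              SerializationBench::javaSerialize, SerializationBench::javaDeserialize);
    }
}
```

Something like JMH would be more appropriate for publishable numbers; this is only meant to show the shape of the test (round-trip time plus wire size per format).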

Dave Jarvis
Parand
  • Thanks. I'd love to see [Fast Infoset](http://en.wikipedia.org/wiki/Fast_Infoset) (ITU-T Rec. X.891 | ISO/IEC 24824-1) and [EXI](http://en.wikipedia.org/wiki/Efficient_XML_Interchange) (W3C) in there also. – nealmcb Feb 29 '12 at 06:45
  • From https://code.google.com/p/thrift-protobuf-compare/wiki/BeyondNumbers it seems that the JSON benchmark is just manually writing abbreviated strings to the output. – Audrius Meskauskas Feb 10 '15 at 10:54
  • I'm currently working daily with protobufs, and my experience has shown me that the benchmarks say nothing about the actual data somebody has to serialize or deserialize, or about in-process memory consumption. One example is the OSI Open Simulation Interface, which is a complex net of messages and arrays. If you tried to serialize that and compared it to any other protocol, the situation would be different. What I'm trying to say is that you have to experiment: build the same system with different protocols, then compare for your case and decide. This is especially true if you are trying… – Marko Bencik Jul 24 '18 at 08:42

8 Answers

56

The latest comparison is available here, at the thrift-protobuf-compare project wiki. It includes many other serialization libraries.

vladaman
Eishay Smith
16

I'm in the process of writing some code for an open source project named thrift-protobuf-compare, comparing Protocol Buffers and Thrift. For now it covers a few serialization aspects, but I intend to cover more. The results (for Thrift and Protobuf) are discussed on my blog; I'll add more when I get to it. You may look at the code to compare the APIs, description languages, and generated code. I'll be happy to have contributions toward a more rounded comparison.
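To give a quick feel for the Protocol Buffers side of that API comparison, the generated Java classes follow a builder pattern roughly like the sketch below. The `Person` message and its fields are hypothetical stand-ins for whatever your .proto defines, so this won't compile without running protoc first:

```java
import com.google.protobuf.InvalidProtocolBufferException;

// 'Person' is a hypothetical protoc-generated class; it does not ship with the
// protobuf library -- you get it by compiling your own .proto file.
public class ProtobufApiSketch {
    public static void main(String[] args) throws InvalidProtocolBufferException {
        Person original = Person.newBuilder()   // generated builder
                .setId(42)
                .setName("Alice")
                .build();                       // messages are immutable once built

        byte[] wire = original.toByteArray();   // serialize to the compact binary format
        Person parsed = Person.parseFrom(wire); // deserialize
        System.out.println(parsed.getName());
    }
}
```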

eishay
    I've just added an issue to that - you're using the default options for protocol buffers, which mean "optimise for small code size". This has a *huge* impact on performance (but does lead to much smaller code). You should do a comparison with optimize_for = SPEED turned on. – Jon Skeet Nov 18 '08 at 09:15
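For reference, the option Jon mentions is a one-line change in the .proto file itself (the messages then need to be regenerated with protoc):

```proto
// In the .proto file whose messages are being benchmarked:
option optimize_for = SPEED;   // instead of the CODE_SIZE / LITE_RUNTIME modes
```

With SPEED, protoc emits dedicated parsing/serialization code per message instead of relying on the smaller reflection-based code path.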
8

You may be interested in this question: "Biggest differences of Thrift vs Protocol Buffers?"

user38123
8

I did test the performance of PB against a number of other data formats (XML, JSON, default Java object serialization, Hessian, one proprietary one) and libraries (JAXB, Fast Infoset, hand-written) for data binding tasks (both reading and writing), but Thrift's format(s) were not included. Performance for formats with multiple converters (like XML) had very high variance, from very slow to pretty darn fast. The correlation between the authors' claims and the measured performance was rather weak, especially for the packages that made the wildest claims.

For what it is worth, I found PB performance to be a bit over-hyped (usually not by its authors, but by others who only know who wrote it). With default settings it did not beat the fastest textual XML alternative. With the optimized mode (why is this not the default?) it was a bit faster, comparable with the fastest JSON package. Hessian was rather fast, and so was textual JSON. The proprietary binary format (no name here, it was company-internal) was the slowest. Java object serialization was fast for larger messages, less so for small objects (i.e. it has high fixed per-operation overhead). With PB the message size was compact, but given all the trade-offs you have to make (data is not self-descriptive: if you lose the schema, you lose the data; there are field indexes and value types, of course, but you would have to reverse-engineer your way back to field names from what you have), I personally would only choose it for specific use cases: size-sensitive, closely coupled systems where the interface/format never (or very, very rarely) changes.
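To make the "fixed per-operation overhead" point concrete, here is a small sketch (the Point class is made up) that compares the size of a tiny object written with default Java serialization against the same fields written directly with a DataOutputStream; the difference is mostly the stream header and class descriptor that Java serialization emits, and it is paid on every small message:

```java
import java.io.*;

public class OverheadDemo {

    // A tiny, made-up message: two ints is all the actual data there is.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) throws IOException {
        Point p = new Point(3, 4);

        // Default Java object serialization: stream header + class descriptor + fields.
        ByteArrayOutputStream javaBytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(javaBytes)) {
            oos.writeObject(p);
        }

        // Hand-rolled encoding of the same fields: just the payload.
        ByteArrayOutputStream rawBytes = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(rawBytes)) {
            dos.writeInt(p.x);
            dos.writeInt(p.y);
        }

        System.out.println("Java serialization: " + javaBytes.size() + " bytes");
        System.out.println("Raw fields only:    " + rawBytes.size() + " bytes");
        // The Java-serialized form is several times larger for an object this small,
        // and that fixed overhead shrinks in relative terms as messages grow.
    }
}
```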

My opinion on this is that (a) the implementation often matters more than the specification (of the data format), and (b) end-to-end, the differences between best-of-breed implementations (of different formats) are usually not big enough to dictate the choice. That is, you may be better off choosing the format + API/lib/framework you like using most (or that has the best tool support), finding its best implementation, and seeing if that works fast enough. If (and only if!) it doesn't, consider the next best alternative.

ps. Not sure what EJB3 would mean here. Maybe just plain old Java serialization?

StaxMan
  • Perhaps you could post the results in a blog post? I'd certainly be interested in seeing the details, particularly around the XML testing. – Parand Mar 10 '09 at 01:12
  • 1
    Ok. The core of it lives under the "StaxBind" module in the Woodstox (http://woodstox.codehaus.org) repository at Codehaus; that's just for convenience -- nothing is Woodstox-specific. I will try to get the results published; it's frustrating if no one can reproduce them. – StaxMan Mar 11 '09 at 22:54
5

If raw network performance is the target, then nothing beats IIOP (see RMI/IIOP). It has the smallest possible footprint -- only binary data, no markup at all. Serialization/deserialization is very fast too.

Since it's IIOP (that is CORBA), almost all languages have bindings.

But I presume the performance is not the only requirement, right?

Vladimir Dyuzhev
  • 1
    Performance is definitely not the only requirement. The other requirements we have a handle on or can evaluate fairly easily; performance is the one I was looking for feedback on. – Parand Nov 17 '08 at 23:12
  • 3
    "Only binary data" doesn't mean it's necessarily the smallest possible footprint. For instance, you can transmit an Int32 as either "just 4 bytes" or with an encoding which reduces the transmission size of small values at the cost of using more data for large values. – Jon Skeet Nov 18 '08 at 09:11
  • 3
    In my experience, it's cheaper to not worry about tight bit-packing protocols and just zlib-stream your data. Those 0's from bits you don't need compress great (assuming you zero-init the bufs). This usually beats manual bit-packing and is a ton easier to debug. Assuming zlib is an option, anyway. – scobi Feb 03 '09 at 02:25
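To illustrate the variable-length-encoding point from the comments above: a fixed-width int always costs 4 bytes on the wire, while a protobuf-style base-128 varint spends 1 byte on small values and up to 5 bytes on the largest ones. A rough sketch (not the actual protobuf implementation):

```java
import java.io.ByteArrayOutputStream;

public class VarintDemo {

    // Protobuf-style base-128 varint: 7 data bits per byte, high bit = "more bytes follow".
    static byte[] encodeVarint(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Treat the int as unsigned by using >>> for the shift.
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] samples = { 1, 300, 1_000_000, -1 };   // -1 is the worst case: 5 bytes
        for (int v : samples) {
            System.out.println(v + " -> " + encodeVarint(v).length
                    + " varint byte(s) vs 4 fixed bytes");
        }
    }
}
```

The zlib suggestion above is the other end of the same trade-off: instead of hand-packing bits, let a general-purpose compressor squeeze out the redundancy.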
4

One of the things near the top of my "to-do" list for PBs is to port Google's internal Protocol Buffer performance benchmark - it's mostly a case of taking confidential message formats and turning them into entirely bland ones, and then doing the same for the data.

When that's been done, I'd imagine you could build the same messages in Thrift and then compare the performance.

In other words, I don't have the data for you yet - but hopefully in the next couple of weeks...

Jon Skeet
  • The thrift-protobuf-compare project (http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking) would be a good home for this, if you have done something? It'd be great to see different use cases -- the current one deals with very small messages, which is just one area. – StaxMan Apr 24 '09 at 18:30
  • 1
    I have a benchmarking framework now, but it's *mostly* aimed at benchmarking different implementations of Protocol Buffers and different messages. See http://code.google.com/p/protobuf-csharp-port/wiki/ProtoBench – Jon Skeet Apr 24 '09 at 19:01
3

To back up Vladimir's point about IIOP, here's an interesting performance test that should give some additional info beyond the Google benchmarks, since it compares Thrift and CORBA. (Performance_TIDorb_vs_Thrift_morfeo.pdf // link no longer valid) To quote from the study:

  • Thrift is very efficient with small data (basic types as operation arguments)
  • Thrift's transports are not as efficient as CORBA's with medium and large data (structs and complex types > 1 kilobyte).

Another odd limitation, not having to do with performance, is that Thrift is limited to returning multiple values only as a struct -- although this, like performance, can surely be improved.

It is interesting that the Thrift IDL closely matches the CORBA IDL -- nice. I haven't used Thrift, but it looks interesting, especially for smaller messages, and one of its design goals was a less cumbersome install, so these are other advantages of Thrift. That said, CORBA has a bad rap, yet there are many excellent implementations out there -- omniORB, for example, which has bindings for Python that are easy to install and use.

Edited: The Thrift and CORBA link is no longer valid, but I did find another useful paper from CERN. They evaluated replacements for their CORBA system and, while they evaluated Thrift, they eventually went with ZeroMQ. Although Thrift performed fastest in their performance tests, at 9000 msg/sec vs. 8000 (ZeroMQ) and 7000+ (RDA, CORBA-based), they chose not to test Thrift further because of other issues, notably:

It is still an immature product with a buggy implementation

michaelok
2

For my job, I have done a study of Spring Boot integration with mappers (manual, Dozer, and MapStruct), Thrift, REST, SOAP, and Protocol Buffers.

The server side: https://github.com/vlachenal/webservices-bench

The client side: https://github.com/vlachenal/webservices-bench-client

It is not finished and has only been run on my personal computers (I have to ask for servers to complete the tests) ... but the results can be consulted on:

As a conclusion:

  • Thrift offers the best performance and is easy to use
  • A RESTful web service with JSON content type is pretty close to Thrift's performance, is "browser ready to use", and is quite elegant (from my point of view)
  • SOAP has very poor performance but offers the best data control
  • Protocol Buffers has good performance ... until 3 simultaneous calls ... and I don't know why. It is very difficult to use: I have given up (for now) on making it work with MapStruct, and I did not try with Dozer.

The projects can be completed through pull requests (either fixes or additional results).

  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/low-quality-posts/17837641) – mx0 Nov 04 '17 at 16:31
  • OK sorry for my previous post. I added my conclusion. If more details are needed, I will add them. –  Dec 14 '17 at 20:53