
Working on a pet project (Cassandra, Spark, Hadoop, Kafka), I need a data serialization framework. Checking out the three common frameworks - namely Thrift, Avro and Protocol Buffers - I noticed that most of them seem barely alive, with at most two minor releases a year.

This leaves me with two assumptions:

  • They are as complete as such a framework needs to be and simply rest in maintenance mode as long as no new features are required
  • There is no reason for such frameworks to exist - though it is not obvious to me why. If so, what alternatives are out there?

If anyone can give me a hint about my assumptions, any input is welcome.

dominik

4 Answers


Protocol Buffers is a very mature framework, having been first introduced nearly 15 years ago at Google. It's certainly not dead: Nearly every service inside Google uses it. But after so much usage, there probably isn't much that needs to change at this point. In fact, they did a major release (3.0) this year, but the release was as much about removing features as adding them.
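To give a feel for everyday usage, here is a minimal Java sketch. The Person message and its fields are a made-up example, not anything from the question; the Person class would be generated by protoc from a .proto file:

    // Hypothetical schema (proto3 syntax):
    //   syntax = "proto3";
    //   message Person {
    //     string name = 1;
    //     int32 id = 2;
    //   }
    // Compiled with: protoc --java_out=. person.proto

    Person person = Person.newBuilder()   // generated builder API
        .setName("Ada")
        .setId(42)
        .build();

    byte[] bytes = person.toByteArray();      // serialize to the compact binary wire format
    Person parsed = Person.parseFrom(bytes);  // parse it back
    // (checked InvalidProtocolBufferException omitted for brevity)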

Protobuf's associated RPC system, gRPC, is relatively new and has had much more activity recently. (However, it is based on Google's internal RPC system which has seen some 12 years of development.)
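As a sketch of what a gRPC call looks like from Java, assuming the canonical Greeter hello-world service has been compiled by the gRPC protoc plugin (the service and message names come from that example, not from the question):

    import io.grpc.ManagedChannel;
    import io.grpc.ManagedChannelBuilder;

    // GreeterGrpc, HelloRequest and HelloReply are generated from the
    // hello-world .proto by the gRPC Java plugin.
    ManagedChannel channel = ManagedChannelBuilder
        .forAddress("localhost", 50051)
        .usePlaintext()                 // no TLS, for local experiments only
        .build();

    GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);
    HelloReply reply = stub.sayHello(
        HelloRequest.newBuilder().setName("world").build());

    channel.shutdown();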

I don't know as much about Thrift or Avro, but they have been around a while too.

Kenton Varda

The advantage of Thrift compared to Protobuf is that Thrift offers a complete RPC and serialization framework. Plus, Thrift supports more than 20 target languages, and that number is still growing. We are about to include .NET Core, and there will be Rust support in the not-so-far future.
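For illustration, a rough sketch of the generated Java client side, assuming a made-up one-method service (the Calculator name, the IDL and the port are example values of my own, not part of the question):

    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.protocol.TProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    // Hypothetical IDL:  service Calculator { i32 add(1: i32 a, 2: i32 b) }
    // Compiled with:     thrift --gen java calculator.thrift
    TTransport transport = new TSocket("localhost", 9090);
    transport.open();

    TProtocol protocol = new TBinaryProtocol(transport);
    Calculator.Client client = new Calculator.Client(protocol);

    int sum = client.add(1, 2);  // plain RPC call; serialization and transport are handled by the framework
    transport.close();
    // (checked TException/TTransportException omitted for brevity)

Both the RPC plumbing and the wire serialization come out of the same code generator here - there is no separate add-on to pull in.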

The fact that there have not been many Thrift releases in recent months is certainly something that needs to be addressed, and we are fully aware of it. On the other hand, the overall stability of the codebase is quite good, so one may also fork the GitHub repository and cut a branch from the current master - with the usual quality measures, of course.

The main difference between Avro and Thrift is that Thrift is statically typed, while Avro uses a more dynamic approach. In most cases a static approach fits the needs quite well; in that case, Thrift lets you benefit from the better performance of generated code. If that is not the case, Avro might be more suitable.
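A small sketch of what the dynamic side looks like in Java, using Avro's GenericRecord with a schema parsed at runtime (the User schema is a made-up example):

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    // The schema is parsed from JSON at runtime -- no generated classes required.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"id\",\"type\":\"int\"}]}");

    GenericRecord user = new GenericData.Record(schema);
    user.put("name", "Ada");  // fields are resolved by name at runtime, not at compile time
    user.put("id", 42);

With Thrift (or with Avro's optional code generation), the equivalent would be a generated class with typed setters, checked at compile time.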

Also, it is worth mentioning that besides Thrift, Protobuf and Avro there are some more solutions on the market, such as Cap'n Proto or BOLT.

JensG
    "Thrift offers a complete RPC and serialization framework." -- So does Protobuf: http://grpc.io – Kenton Varda Dec 06 '16 at 06:55
  • But that's an add-on, not core protobuf. With Thrift you get that OOTB. – JensG Dec 07 '16 at 11:45
  • I disagree. gRPC and Protobuf were very much designed and built together. The fact that they are properly modular and separable -- so that you can avoid the bloat of the RPC system if you don't need it -- is a feature, not a bug. – Kenton Varda Dec 08 '16 at 05:25
  • "*gRPC and Protobuf were very much designed and built together*" -- [pbuf ~ 2008 (or older)](http://stackoverflow.com/a/40989644/499466) vs. [gRPC ~ 2015](https://heise.de/-2862634). Well, at least it was in the same century. – JensG Dec 08 '16 at 15:20
  • Yes, I'm quite aware that Protobuf was open sourced in 2008, since I was the one who led that project, thanks. Protobuf was first conceived in ~2001 and the RPC system in ~2004. Unfortunately, when I open sourced Protobuf I did not have the resources to prepare the RPC implementation for public release, so that didn't actually happen until later. Nevertheless, the fact stands that they were developed together. – Kenton Varda Dec 10 '16 at 05:37
  • @JensG I suspect the "pet project [using] (cassandra, spark, hadoop, kafka)" and many like them have no need nor want for an RPC spec OOTB. – dlamblin May 25 '17 at 00:37
  • @pdxleif: "strongly typed" != "typed". Nobody claimed there aren't any types at all. – JensG Mar 17 '18 at 01:27
  • @JensG Could you explain what you mean by the "The main difference between Avro and Thrift is that Thrift is statically typed" paragraph? This is the description language for Thrift: https://thrift.apache.org/docs/idl Are you saying that is somehow fundamentally different than what is expressed in the Avro schema? Avro uses the type information to achieve more efficient data encoding than Thrift. "Untagged data: Since the schema is present when data is read, considerably less type information need be encoded with data, resulting in smaller serialization size." – pdxleif Mar 18 '18 at 06:13
  • You can generate code from the schema: https://avro.apache.org/docs/1.8.2/gettingstartedjava.html#Compiling+the+schema Or you can generate the schema from the types in your code. Are you thinking of the GenericRecord interface? Offering a dynamic view of a static object does not mean the underlying object is static. You can always convert the fields in an object into a Map - you could offer the same for Thrift-generated objects, if you wish. – pdxleif Mar 18 '18 at 06:16
  • @pdxleif: If you want to open a lengthy discussion, please do that elsewhere (e.g. mailing lists). I made my point and this is just the wrong place. The whole question should have been closed from the beginning as "opinion-based" anyway. – JensG Mar 18 '18 at 18:50

Concerning Thrift: as far as I am aware, it is alive and kicking. We use it for serialization and internal APIs where I work, and it works fine for that.

Missing things like connection multiplexing and more user-friendly clients have been added through projects such as Twitter's Finagle.

Though I would characterize our use of it as semi-intensive only (i.e., we don't look at performance first: it should be easy to use and bug-free before anything else), we have not run into any issues so far.

So, regarding Thrift, I'd say it falls into your first category.[*]

Protocol Buffers is an alternative for Thrift's serialization part, but it does not provide the RPC toolbox Thrift offers.

I'm not aware of any other project that blends RPC and serialization into such a simple-to-use and complete single package.

[*]Anyway, once you start using it and see all the benefits, it's hard to put it into your second category :)

Shastick

They are all very much in use at plenty of places, so I'd say your first assumption holds. I don't know what your expectation of a release schedule is, but it seems normal to me for libraries of that size and maturity. Heck, Avro 1.8.0 came out at the start of 2016, and most things still use Avro 1.7.7 (e.g. Spark, Hadoop). https://avro.apache.org/releases.html

pdxleif
  • Does Avro let you serialize/deserialize your existing classes? The "Getting Started" guide covers only two scenarios: 1. model classes are generated from a schema, 2. no code generation, but the only classes serialized/deserialized are GenericRecord. My scenario is not covered: hundreds of existing classes serialized using the JDK, but we want something faster. It would seem that Avro wants to regenerate all my classes from scratch, which won't work because they aren't anemic - i.e. it's a fully OO model. I also read a post where someone had issues with inherited classes - it seems strange for a Java framework to have such issues. – Volksman Oct 21 '19 at 02:53