
I see Thrift and Protocol Buffers mentioned a lot, but I don't really understand what they're used for. From my limited understanding, they're basically used when you want to do cross-language serialization, i.e., when you have some data structures in one language that you want to send off to another program written in another language.

Is this correct? Are they used for anything else?

(From my again limited understanding, I think Thrift and Protocol Buffers are basically two different versions of the same thing -- feel free to correct me or elaborate.)

grautur
  • They are "typed" message formats for efficiently binary-encoding a set of primitive data types (without needing an explicit custom encoder, as they employ message generators). Useful as either an exchange or a storage (serialization) mechanism. Since the format is well defined, it can be shared across languages (given an implementation exists), shared by remote processes that use the same language, or used for serialization (or whatnot) within a single process. And yes, they are effectively fighting for the same market (as well as Avro and others). –  Oct 12 '11 at 21:31
  • I think these should be linked: [Biggest differences of Thrift vs Protocol Buffers?](http://stackoverflow.com/q/69316/320399) – blong Mar 28 '14 at 03:18

1 Answer


They are serialization protocols, primarily. Any time you need to transfer data between machines or processes, or store it on disk, etc., it needs to be serialized.

XML / JSON / etc. work OK, but they have certain overheads that make them undesirable: in addition to limited features, they are relatively large and computationally expensive to process in either direction. Size can be improved by compression, but that adds yet more to the processing cost. They do have the advantage of being human-readable, but most data is not read by humans.
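To make the size point concrete, here's a small Python sketch (not protobuf itself) comparing a JSON encoding of a record against a hand-packed binary layout; the record and its fields are invented for illustration:

```python
import json
import struct

# A hypothetical record: id (int), temperature (float), ok flag (bool).
record = {"id": 12345, "temperature": 21.5, "ok": True}

# Text encoding: self-describing and human-readable, but it repeats the
# field names and punctuation in every single message.
as_json = json.dumps(record).encode("utf-8")

# Binary encoding: a fixed layout both sides must agree on in advance.
# "<if?" = little-endian int32 + float32 + bool -- 9 bytes total.
as_binary = struct.pack("<if?", record["id"], record["temperature"], record["ok"])

print(len(as_json), len(as_binary))  # the binary form is several times smaller
```

The binary form wins on size and decode cost, but notice the catch the answer describes: the `"<if?"` layout is exactly the kind of hand-rolled, non-portable contract that protobuf/Thrift let you avoid writing yourself.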

Now, people could spend ages manually writing tedious, bug-ridden, sub-optimal, non-portable formats that are less verbose, or they can use well-tested, general-purpose serialization formats that are well documented, cross-platform, cheap to process, and designed by people who spend far too long worrying about serialization in order to be friendly - for example, version tolerant. Ideally, such a format would also offer a platform-neutral description layer (think "wsdl" or "mex") that lets you easily say "here's what the data looks like" to any other dev (without knowing what tools/language/platform they are using), and have them consume the data painlessly without writing a new serializer/deserializer from scratch.

That is where protobuf and thrift come in.
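As a sketch of what that description layer looks like in practice, here is a hypothetical protobuf schema (the message and field names are invented for illustration):

```protobuf
// person.proto -- a hypothetical schema; names are illustrative only.
syntax = "proto3";

message Person {
  int32 id = 1;              // field numbers, not names, go on the wire
  string name = 2;
  repeated string emails = 3;
}
```

You hand this one file to any other team, run the `protoc` compiler against it, and it generates the serializer/deserializer for each target language, so nobody writes encoding code by hand. Because only the field numbers appear on the wire, a reader built from an older schema can skip fields it doesn't know about, which is what makes the format version tolerant. Thrift works along the same lines with its own IDL.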

In most cases volume-wise, I would actually expect both ends to be in the same technology in the same company: simply, they need to get data from A to B with the minimum of fuss and overhead, or they need to store it and load it back later (for example, we use protobuf inside redis blobs as a secondary cache).
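The cache scenario above can be sketched like this in Python. A plain dict stands in for redis here, and a hand-packed `struct` stands in for a generated protobuf message (in real use, the blob would come from the message's `SerializeToString()` and would be stored via a redis client's SET/GET); the key scheme and field layout are assumptions for the sketch:

```python
import struct

# Stand-in for a redis instance: key -> opaque blob of bytes.
cache = {}

def save_user(user_id: int, age: int, score: float) -> None:
    # Serialize to a compact blob and store it under a key.
    cache[f"user:{user_id}"] = struct.pack("<iif", user_id, age, score)

def load_user(user_id: int):
    # Fetch the blob and deserialize it; None on a cache miss.
    blob = cache.get(f"user:{user_id}")
    return None if blob is None else struct.unpack("<iif", blob)

save_user(7, 34, 99.5)
print(load_user(7))  # (7, 34, 99.5)
```

The point of the pattern: the cache never interprets the payload, so the serialization format only has to be agreed on by the writers and readers - exactly the job protobuf and Thrift are built for.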

Marc Gravell