
I know there are a couple of posts on StackOverflow about REST and Thrift for HBase, but I would like to focus on the question of performance.

I have been playing with the following libraries in Node.js to connect to an HBase instance:

After some trouble figuring out why I was not getting responses from the Thrift gateway, I finally got both scripts running, with the following results (each output line corresponds to 1000 completed operations):

┌─[mt@Marcs-MacBook-Pro]─[~/Sources/node-hbase]
└──╼ node hbase.js 
hbase-write: 99ms
hbase-write: 3412ms
hbase-write: 3854ms
hbase-write: 3924ms
hbase-write: 3808ms
hbase-write: 9035ms
hbase-read: 216ms
hbase-read: 4676ms
hbase-read: 3908ms
hbase-read: 3498ms
hbase-read: 4139ms
hbase-read: 3781ms
completed
┌─[mt@Marcs-MacBook-Pro]─[~/Sources/node-hbase]
└──╼ node thrift.js 
hbase-write: 4ms
hbase-write: 931ms
hbase-write: 1061ms
hbase-write: 988ms
hbase-write: 839ms
hbase-write: 807ms
hbase-read: 2ms
hbase-read: 435ms
hbase-read: 562ms
hbase-read: 414ms
hbase-read: 427ms
hbase-read: 423ms
completed
┌─[mt@Marcs-MacBook-Pro]─[~/Sources/node-hbase]
└──╼ 

The scripts used can be found here: https://github.com/stelcheck/node-hbase-vs-thrift
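
The linked scripts are not reproduced here. As a rough illustration of how per-batch timings like the ones above could be collected, here is a minimal sketch. It assumes operations run one after another, and `fakeOp` is a hypothetical stand-in for a real HBase put or get, so the numbers it prints say nothing about either gateway.

```javascript
// Sketch only: `timeBatches` and `fakeOp` are illustrative names, not taken
// from the linked repository.
const { performance } = require('perf_hooks');

// Runs `total` operations sequentially and records the elapsed time (in ms)
// for every `batchSize` completed operations.
async function timeBatches(label, op, total, batchSize) {
  const timings = [];
  let start = performance.now();
  for (let i = 1; i <= total; i++) {
    await op(i);
    if (i % batchSize === 0) {
      const elapsed = performance.now() - start;
      timings.push(elapsed);
      console.log(`${label}: ${Math.round(elapsed)}ms`);
      start = performance.now();
    }
  }
  return timings;
}

// Hypothetical stand-in for a network round trip to a gateway.
const fakeOp = () => new Promise((resolve) => setImmediate(resolve));

timeBatches('hbase-write', fakeOp, 5000, 1000)
  .then(() => console.log('completed'));
```

Lowering `batchSize` (for example to 100) makes it easier to see whether latency is uniform or arrives in periodic spikes.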

My question is: has anyone else noticed this big a difference between REST and Thrift for HBase (or even in general, for any application or language)?

Marc Trudel
  • According to benchmarks linked in http://stackoverflow.com/questions/11025901/is-there-any-performance-benchmark-for-thrift-on-hbase, Thrift is indeed very fast... but what about REST? It seems to me there shouldn't be that much of a difference after all... – Marc Trudel May 24 '13 at 09:49

2 Answers


The REST gateway returns either XML or JSON, so the schema travels with the data itself. Thrift doesn't do this: its payload is just a sequence of bytes that can only be deserialized against a generated entity (based on the Thrift IDL definition).

So regardless of how the data is compressed, Thrift is bound to be faster, since it carries no schema with it, at the "cost" of depending on other objects to interpret the binary data.
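
To make the size difference concrete, here is a minimal sketch comparing a self-describing JSON encoding of one cell with a schema-less, length-prefixed binary encoding. Neither is the exact wire format of the HBase REST or Thrift gateways: the JSON shape (base64-encoded fields under `Row`/`Cell`) is only an assumption loosely modeled on a REST-style row, and the binary layout is a simplified stand-in for a compact protocol. All function names are hypothetical.

```javascript
// Sketch only: simplified encodings, not the real REST or Thrift formats.
const cell = { row: 'row-0001', column: 'cf:name', value: 'hello world' };

// Self-describing form: field names travel with every cell, and binary-safe
// fields are base64-encoded, which grows them by roughly a third.
function encodeSelfDescribing({ row, column, value }) {
  const b64 = (s) => Buffer.from(s, 'utf8').toString('base64');
  const doc = {
    Row: [{ key: b64(row), Cell: [{ column: b64(column), $: b64(value) }] }],
  };
  return Buffer.from(JSON.stringify(doc), 'utf8');
}

// Schema-less form: each field is a 4-byte big-endian length followed by raw
// bytes. The reader must already know the field order (row, column, value).
function encodeLengthPrefixed({ row, column, value }) {
  const chunks = [];
  for (const field of [row, column, value]) {
    const bytes = Buffer.from(field, 'utf8');
    const len = Buffer.alloc(4);
    len.writeUInt32BE(bytes.length, 0);
    chunks.push(len, bytes);
  }
  return Buffer.concat(chunks);
}

// Decoding only works because the field order is agreed on out of band.
function decodeLengthPrefixed(buf) {
  const fields = [];
  let offset = 0;
  while (offset < buf.length) {
    const len = buf.readUInt32BE(offset);
    fields.push(buf.toString('utf8', offset + 4, offset + 4 + len));
    offset += 4 + len;
  }
  const [row, column, value] = fields;
  return { row, column, value };
}

const jsonBytes = encodeSelfDescribing(cell).length;
const binaryBytes = encodeLengthPrefixed(cell).length;
console.log({ jsonBytes, binaryBytes });
```

Payload size is only one factor; parsing cost and per-request HTTP overhead also differ between the two approaches, and this sketch does not measure those.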

davek
  • That must explain some of the difference, but surely not all of it? Besides, if I lower the logging interval from every 1000 requests to, say, every 100, I clearly see that it runs fast for a while, then stalls... basically, it spikes every 200-300 requests. – Marc Trudel May 27 '13 at 05:27

You may want to try this one: https://github.com/alibaba/node-hbase-client

It connects directly to the region servers and ZooKeeper.

Simon