7

I need to keep data in Aerospike. The engine supports 'bins' (a 'bin' is like a column in a row or a field in a record). Alternatively, I can store my records as serialized blobs. Records are always read from the database as a whole: I never need to fetch individual 'columns' of a record, only the entire record.

The question is: what is the most efficient way to store data for this scenario in terms of performance? Keep it unserialized and use a 'bin' for each of the record's fields, or store it as a serialized blob in a single bin?

elgris

2 Answers

6

If you are sure that your only use case is to fetch the full record, and never individual bins, it is better to store it as a single bin value. (Internally, multiple bins need multiple mallocs beyond a size limit.) In fact, you can set the namespace config option 'single-bin true', which will optimize things further. Be aware that once you set this option it can never be unset, even with a node restart; you have to clean the drives if you want to change it. If the namespace is in-memory, this restriction obviously does not apply.

If there is a possibility that you will need to access a subset of the bins in the future, storing the fields as separate bins is better, because fetching only the bins you need saves network I/O, and that saving far outweighs the malloc overhead.
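
As a rough illustration of the single-bin approach, the Java sketch below packs a whole record into one byte array and unpacks it again. `UserRecord` and its fields are hypothetical, and the hand-rolled `ByteBuffer` layout is just one possible encoding, not something Aerospike prescribes; the resulting `byte[]` is what you would store as the value of the single bin.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical record type, used only for illustration.
final class UserRecord {
    final long id;
    final String name;
    final int age;

    UserRecord(long id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    // Pack every field into one byte array so the record fits in a single bin.
    byte[] toBlob() {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(
                Long.BYTES + Integer.BYTES + Integer.BYTES + nameBytes.length);
        buf.putLong(id);
        buf.putInt(age);
        buf.putInt(nameBytes.length); // length prefix for the variable-size field
        buf.put(nameBytes);
        return buf.array();
    }

    // Rebuild the record from the blob; fields are read in the order they were written.
    static UserRecord fromBlob(byte[] blob) {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        long id = buf.getLong();
        int age = buf.getInt();
        byte[] nameBytes = new byte[buf.getInt()];
        buf.get(nameBytes);
        return new UserRecord(id, new String(nameBytes, StandardCharsets.UTF_8), age);
    }
}
```

The trade-off is the one described above: the client always transfers and decodes the whole blob, so this only pays off if individual fields are never needed on their own.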

sunil
4

Just to add: if you store them as a BLOB, choosing a better serialization mechanism can further optimize operations in terms of network I/O.

In one of our use cases, we switched from default Java serialization to Kryo serialization. As a result, the data size was reduced to one-third and the Aerospike response time at the client was halved, because less data was being transferred.
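
To get a feel for why the serializer matters, here is a small Java sketch that compares the size of default Java serialization with a compact hand-written encoding. It does not use Kryo: the `ByteBuffer` encoding is only a stand-in for "a more compact serializer", and `Event` with its fields is a hypothetical record type. The actual ratio you get depends entirely on your data and on the serializer you pick.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SerializedSizeDemo {
    // Hypothetical record type, used only for illustration.
    static final class Event implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final String label;

        Event(long id, String label) {
            this.id = id;
            this.label = label;
        }
    }

    // Default Java serialization: also writes class and field metadata.
    static byte[] javaSerialize(Serializable value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Compact stand-in encoding: field values only, with a length prefix for the string.
    static byte[] compactSerialize(Event e) {
        byte[] label = e.label.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + label.length);
        buf.putLong(e.id);
        buf.putInt(label.length);
        buf.put(label);
        return buf.array();
    }

    public static void main(String[] args) {
        Event e = new Event(7L, "login");
        System.out.println("java serialization: " + javaSerialize(e).length + " bytes");
        System.out.println("compact encoding:   " + compactSerialize(e).length + " bytes");
    }
}
```

For small records the class metadata written by default Java serialization dominates the payload, which is consistent with the kind of size reduction described above.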

Ankur Choudhary