
I am curious about the impact of the number of shards in Elasticsearch. In particular, I am looking for the pros and cons of having a large versus a small number of shards.

For example, I have a two-node cluster. Assuming the replica count is one, should I create an index with two shards, spread across these two nodes? Or should I use the default, i.e., 5 shards? My thinking is the first option.

It seems to me there is no reason to have more than one shard per node per index, as one Lucene instance can cache more effectively than several separate Lucene instances.

Edit: Let's say I have only one node and want to create one index. How does the number of shards affect performance in this case? My thinking is that I should have only one shard. Is that right?
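To make the trade-off in the question concrete, here is a small sketch of the shard arithmetic: each primary shard gets one copy per replica, and every copy is a separate Lucene instance that some node must host. The numbers below mirror the two-node example above; this is only an illustration of the counting, not of how Elasticsearch actually allocates shards.

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies for an index: each primary plus its replicas."""
    return primaries * (1 + replicas)

def copies_per_node(primaries: int, replicas: int, nodes: int) -> float:
    """Average shard copies (i.e. Lucene instances) each node hosts."""
    return total_shard_copies(primaries, replicas) / nodes

# Two primaries, one replica, two nodes: 4 copies total,
# so each node hosts one primary and one replica.
print(copies_per_node(primaries=2, replicas=1, nodes=2))  # 2.0

# Default 5 primaries, one replica, two nodes: 10 copies total,
# so each node hosts 5 Lucene instances for this one index.
print(copies_per_node(primaries=5, replicas=1, nodes=2))  # 5.0
```

With a single node and one replica, the replica copies can never be allocated anyway (a replica is not placed on the same node as its primary), which is another argument for keeping the setup minimal in the single-node case.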

  • Splitting one index into multiple shards lets you use multiple machines' CPUs, as well as their disk space; it's for scaling, I guess. – user218867 Feb 17 '16 at 17:44
  • I strongly suggest you have a look at this great answer: http://stackoverflow.com/questions/15694724/shards-and-replicas-in-elasticsearch/15705989#15705989 – Val Feb 18 '16 at 04:39
  • @Val thanks for pointing me to that thread. I am fully aware of the concepts of shards and replicas. What I am not clear about is whether it makes sense to have 5 shards if I have only one machine. Or, to put it another way: does it make sense to have more shards than machines? – helloworld Feb 18 '16 at 05:50
  • I understand; however, the real answer will depend on how much data you want to index and what the hardware on your machine looks like. Those are two important variables that are missing in order to give you a real answer. But yes, one shard per index per node is **conceptually** OK. – Val Feb 18 '16 at 06:46
  • Here is a [good article](https://signalfx.com/scaling-elasticsearch-sharding-availability-hundreds-millions-documents/) by SignalFx explaining how they handle their sharding. – Val Feb 19 '16 at 08:52

1 Answer


You can find the answer in the Elasticsearch documentation here.

– avr