2

Could someone please tell me how to choose the number of shards and replicas for an Elasticsearch index?

I have configured the size of the index to 20GB, but I don't know how to choose the number of shards and replicas.

NB: I have 5 nodes: 3 master nodes (for hot data) and 2 data nodes (one for warm and the second for cold data).

Thanks for your help

abdelhalim

2 Answers

6

Elasticsearch splits an index into multiple pieces called shards, and lets you keep one or more copies of each shard, called replicas. Please refer to this SO answer for a detailed explanation of shards and replicas.

To set the number of shards and replicas when creating an index:

PUT /my-index
{
  "settings": {
    "index": {
      "number_of_shards": 6,
      "number_of_replicas": 2
    }
  }
}

If you have an index with 3 primary shards, and each has 2 replicas, then there are 9 shards in total (3 primaries + 6 replicas). Writes go to the primary shard first and are then copied to its replicas, while searches can be served by either a primary or a replica. If shards are not allocated sensibly across the nodes, this can cause performance issues in the cluster.
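The shard arithmetic above can be written out as a small sketch. The function name `total_shards` is only an illustration and is not part of any Elasticsearch client; it assumes every primary shard gets the same number of replicas, which is how `number_of_replicas` behaves.

```python
def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Return the total number of shard copies (primaries plus replicas)."""
    if primaries < 1 or replicas_per_primary < 0:
        raise ValueError("need at least 1 primary and a non-negative replica count")
    # Each primary exists once, plus one copy per configured replica.
    return primaries * (1 + replicas_per_primary)


if __name__ == "__main__":
    # 3 primaries with 2 replicas each: 3 primaries + 6 replicas.
    print(total_shards(3, 2))
    # The create-index request above: 6 primaries with 2 replicas each.
    print(total_shards(6, 2))
```

With the settings from the create-index request (6 shards, 2 replicas), the cluster would have to place 18 shard copies in total, which is worth comparing against the number of data nodes available.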

Some important tips for choosing the number of shards and replicas:

  1. The number of primary shards is fixed when an index is created and cannot be changed through a settings update. If you later need a different number, you will have to reindex the documents into a new index (or use the split or shrink APIs, which also create a new index).

  2. To decide on the number of shards, choose a starting point and then find the optimal number by testing with your own data and queries.

  3. Replicas tend to improve search performance (though not always). In any case, it is recommended to have at least 1 replica, so that data is preserved if a node suffers a hardware failure.

  4. Refer to this Medium article, which states that the number of nodes and the number of shards (primary shards + replicas) should be proportional to each other. This helps Elasticsearch balance the load evenly across the nodes.

  5. As stated in this article, it is recommended to keep the number of shards per node below 20 per GB of heap configured on that node.

  6. According to this blog, when planning for capacity, try to allocate shards at a rate of 150% to 300% (or about double) of the number of nodes you had when initially configuring your datasets.
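Tips 5 and 6 are plain arithmetic, so a rough sketch may help when applying them. The helper names below are hypothetical, the 8 GB heap is a made-up value for illustration, and the 5 nodes come from the question. Treat the results as starting points for testing, not as fixed rules.

```python
import math
from typing import Tuple


def max_shards_per_node(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Upper bound on shards per node under the '20 shards per GB of heap' guideline."""
    return int(heap_gb * shards_per_gb)


def shard_count_range(node_count: int, low: float = 1.5, high: float = 3.0) -> Tuple[int, int]:
    """Suggested shard count range: 150% to 300% of the number of nodes."""
    # Round the lower bound up and the upper bound down to stay inside the range.
    return math.ceil(node_count * low), math.floor(node_count * high)


if __name__ == "__main__":
    # Assumed heap size of 8 GB per node (not stated in the question).
    print(max_shards_per_node(8))
    # The 5-node cluster from the question.
    print(shard_count_range(5))
```

For the 5-node cluster this gives a range of 8 to 15 shards in total, counting primaries and replicas together.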

Martijn Pieters
ESCoder
  • 1
    Thanks, your answer helped me understand a little better how to choose the number of shards and replicas – abdelhalim Sep 21 '20 at 12:37
1

Here are a couple of options for how to set the number of shards and replicas.

1. Using templates (if you want to apply the same settings to multiple indices):

Index template

PUT _template/my_template
{   
    "order": 0,
    "index_patterns": [
      "<your-index1>","<your-index2>"
    ],
    "settings": {
      "index": {
        "number_of_shards": "2",
        "number_of_replicas": "1"
      }
    },
    "mappings": {},
    "aliases": {}
}

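2. Update the settings of a single index:

Update Index Settings API

PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}

Note that number_of_shards is a static setting: it can only be set when the index is created, so only number_of_replicas can be changed on an existing index this way.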

Also, take a look at this article and at "How many shards should I have in my Elasticsearch cluster?"

Assael Azran