Elasticsearch: Correct way to restart a cluster without the unassigned shards issue

Question

I have an Elasticsearch (0.90.3) cluster with 4 nodes and 5 shards. After a restart, 4 of the 5 shards are unassigned and the cluster status is red, so I assume the restart was not done correctly. Each node was sent a kill (SIGKILL) at 30-second intervals: one node was killed, 30 seconds later another of the remaining 3 was killed, and so on.

I tried this solution to have the shards reassigned, but nothing worked until I manually assigned a primary shard to the cluster using this approach. However, manually assigning a primary shard resets the data for that shard, which results in data loss.

How do I avoid getting into the unassigned shard problem? And if I am stuck with that problem, how do I recover without data loss?

Instead of calling a kill command on the process, I would typically shut the node down. That is less likely to upset the system, as it follows a shutdown procedure - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html – Nathan Smith, Apr 10 '14 at 09:53
@Nate Thanks, that is something I am planning to do. Can you also shed some light on why the unassigned shards problem happens? I honestly do not know what caused it. SIGKILL might have caused it, but I cannot see what happened after the SIGKILL to lead to this. – Prasanna, Apr 10 '14 at 18:12
If you're trying to avoid downtime, shouldn't you be restarting each node before shutting down the next one? – Avish, Apr 10 '14 at 20:44
@Avish You are right, but unfortunately that is not what was done this time. The lesson learnt is to update one node at a time. I am still really curious to know what happened inside Elasticsearch that prevented it from forming the cluster again on startup. – Prasanna, Apr 10 '14 at 20:53

Wilfred Hughes · Answer 1 · 2015-01-13T11:01:32.233


The correct way to restart a cluster is to do a rolling restart using the shutdown API.

The procedure is:

1. Disable shard allocation.
2. Restart one node (the cluster goes yellow).
3. Wait until it rejoins the cluster.
4. Re-enable shard allocation.
5. Wait until the shards are reallocated (the cluster goes green).
6. Repeat on the other nodes.
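The steps above can be sketched as a small Python script. This is only an illustration under several assumptions, not a tested operational tool: `ES_URL`, `rolling_restart`, `http_json`, and the `restart_node` callback are hypothetical names, and the timeouts are arbitrary. How a node is actually stopped and started is left to `restart_node` (for example, a service restart over SSH), because the `_shutdown` API is not available in later versions. The setting `cluster.routing.allocation.enable` is the name used from Elasticsearch 1.0 onwards; on 0.90.x the equivalent is, as far as I know, `cluster.routing.allocation.disable_allocation`, so check the documentation for your version.

```python
import json
import urllib.request

# Assumption: this address reaches a node that stays up during the restart
# (for example a load balancer), not the node currently being restarted.
ES_URL = "http://localhost:9200"


def allocation_settings(enabled):
    """Body for PUT /_cluster/settings that toggles shard allocation (ES 1.0+ name)."""
    value = "all" if enabled else "none"
    return {"transient": {"cluster.routing.allocation.enable": value}}


def http_json(method, url, body=None):
    """Send a JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def rolling_restart(nodes, restart_node, send=http_json, base_url=ES_URL):
    """Restart nodes one at a time, waiting for green between nodes."""
    for node in nodes:
        # Step 1: disable shard allocation.
        send("PUT", f"{base_url}/_cluster/settings", allocation_settings(False))
        # Step 2: restart the node (hypothetical callback supplied by the caller).
        restart_node(node)
        # Step 3: wait until the expected number of nodes is back in the cluster.
        send("GET", f"{base_url}/_cluster/health?wait_for_nodes={len(nodes)}&timeout=5m")
        # Step 4: re-enable shard allocation.
        send("PUT", f"{base_url}/_cluster/settings", allocation_settings(True))
        # Step 5: wait until the cluster is green before touching the next node.
        health = send("GET", f"{base_url}/_cluster/health?wait_for_status=green&timeout=30m")
        if health.get("status") != "green":
            raise RuntimeError(f"cluster not green after restarting {node}: {health}")
```

Stopping as soon as the cluster fails to return to green is deliberate: continuing to the next node while shards are still unassigned is what risks ending up with no live copy of a shard.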

You may want to increase indices.recovery.max_bytes_per_sec and cluster.routing.allocation.node_concurrent_recoveries to speed up step 5. While the cluster is yellow, some shards will be unassigned (because they were on the node that was restarted), but this is not a problem: yellow means every primary shard is still assigned, so reads and writes will work as normal.
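As a rough sketch, the two recovery settings mentioned above could be raised temporarily through the cluster settings API. The values `100mb` and `4` are purely illustrative assumptions, not recommendations, and `recovery_settings` and `apply_settings` are hypothetical helper names. Transient settings are used so the change does not survive a full cluster restart.

```python
import json
import urllib.request


def recovery_settings(max_bytes_per_sec="100mb", concurrent_recoveries=4):
    """Body for PUT /_cluster/settings; the default values here are illustrative only."""
    return {
        "transient": {
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
            "cluster.routing.allocation.node_concurrent_recoveries": concurrent_recoveries,
        }
    }


def apply_settings(body, base_url="http://localhost:9200"):
    """Send the settings body to the cluster (base_url is an assumed address)."""
    req = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `apply_settings(recovery_settings())` before the restart would apply both values. Higher values put more load on disks and network during recovery, so they are worth lowering again afterwards.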

edited Jan 13 '15 at 11:01

answered Jan 13 '15 at 10:56



It seems this is no longer relevant as the 5.5 documentation https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html#_rolling_restart_of_nodes_full_cluster_restart says `The _shutdown API has been removed` – vkats Aug 22 '17 at 13:08
