ElasticSearch Cluster Design Help - Data Nodes

Question

I have been reading up on ES Cluster design and have started to design the cluster we need. Please can someone clarify some of the things that are still not clear to me?

So we want to start off with 3 servers.

At the beginning all three will be Master, Data and Ingest nodes, with a minimum of two master nodes. This basically means we are sticking to the defaults.

Question 1: What are data nodes exactly? Is the full index replicated across the other data nodes? So if one goes down, in our case the third one should be promoted to master server and the cluster should keep functioning.

Found this link Shards and replicas in Elasticsearch and it explains what data nodes are. So basically if our index has 12 shards, it might be that ES will store 4 primary shards on each data node and 8 replicas. Is this correct?

Question 2: With this as a starting point, can we add more servers to function as data nodes, ingest nodes, etc.?

Question 3: We have set up a load balancer in front of the ES nodes. Is this the recommended way of accessing ES clusters over 9200? When ingesting, should this address be used, so that requests are randomly routed to an ingest node? When querying, it should route to a random ES node that can handle searches.

See my response below. I saw you edited question 1. Per index: try to start basic so you can get the full grasp, with 1 primary and 1 replica in each machine. This is something that you can set at index creation. Also read about "allocation". — panchicore, Jul 06 '18 at 13:26
Please don't use strike-out here. Questions here are to be written for the benefit of future readers, and so sub-questions should always be available for reading. If you have found the answer to a sub-question, or do not wish to have it answered, you can remove it, as long as it has not been responded to in an existing answer. — halfer, Jul 06 '18 at 17:34

score 1 · Accepted Answer · answered Jul 06 '18 at 13:23

What are data nodes exactly?

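Disks for the shards: data nodes are the nodes that hold the shards and run the data-related operations on them, such as indexing, search and aggregations.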

Is full index replicated across other data nodes?

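Yes, as long as the index has replicas configured. A replica is a copy of a primary shard and is never placed on the same node as that primary, so replicas give you availability as well. Understanding the concept of shards is key to not getting confused here.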
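To make the shard arithmetic from the question concrete, here is a minimal sketch in Python. It only counts shard copies and assumes Elasticsearch spreads them perfectly evenly, which it tries to do but does not guarantee for primaries. The function name and the returned keys are made up for this illustration.

```python
def shard_copies_per_node(primaries: int, replicas: int, data_nodes: int) -> dict:
    """Count shard copies for one index, assuming a perfectly even spread.

    `replicas` is the number of replica copies per primary
    (the index setting number_of_replicas).
    """
    if primaries < 1 or replicas < 0 or data_nodes < 1:
        raise ValueError("primaries and data_nodes must be >= 1, replicas >= 0")
    total_copies = primaries * (1 + replicas)
    return {
        "total_copies": total_copies,
        "copies_per_node": total_copies / data_nodes,
        "primaries_per_node": primaries / data_nodes,
        "replicas_per_node": primaries * replicas / data_nodes,
        # A node holds at most one copy of a given shard, so every node
        # holds the whole index only if there are at least as many
        # copies of each shard as there are data nodes.
        "every_node_has_full_index": (1 + replicas) >= data_nodes,
    }


if __name__ == "__main__":
    print(shard_copies_per_node(primaries=12, replicas=2, data_nodes=3))
    print(shard_copies_per_node(primaries=12, replicas=1, data_nodes=3))
```

With 12 primaries and 2 replicas on three nodes this works out to 4 primaries and 8 replicas per node, which is the layout described in the question. With only 1 replica each node holds 8 shard copies and no single node has the full index, although the cluster as a whole still survives the loss of one node.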

in our case the third one should be promoted to master server and the cluster should function.

Yes. If the node that went down was the elected master, one of the remaining master-eligible nodes is elected in its place. Read about the green, yellow and red statuses: in this case the cluster will turn from green to yellow, which means it is still functioning but needs attention. Also read about "master eligibility", and avoid split brain, which is very important. https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#master-node
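The usual rule for avoiding split brain is a strict majority of the master-eligible nodes, that is (N / 2) + 1 with integer division. A small sketch of that rule follows, assuming a version of Elasticsearch (6.x or earlier) where you set it yourself through `discovery.zen.minimum_master_nodes`; from 7.x on the cluster manages the quorum on its own. The function names are only illustrative.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: a strict majority."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(master_eligible: int, failed: int) -> bool:
    """True if the surviving master-eligible nodes still form a quorum."""
    return master_eligible - failed >= minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    for failed in range(3):
        print(failed, "failed ->", can_elect_master(3, failed))
```

For three master-eligible nodes the quorum is two, which matches the "minimum two master" in the question: the cluster can lose one node and still elect a master, but not two.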

With this as starting point, can we add more servers to function as data nodes, ingest nodes etc.

As many as you want. What is the app requirement: high read and low write, vice versa, or about equal? Decide how you want to grow the cluster depending on the use case.
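One thing to keep in mind when growing: the number of primary shards is fixed when the index is created, while `number_of_replicas` can be changed on a live index through the update index settings API (`PUT /<index>/_settings`). As a rough sketch, the request body could be built like this; the helper name is made up, and sending the request is left out on purpose.

```python
import json


def replica_settings_body(replicas: int) -> str:
    """Build the JSON body for PUT /<index>/_settings to change replicas."""
    if replicas < 0:
        raise ValueError("number_of_replicas cannot be negative")
    return json.dumps({"index": {"number_of_replicas": replicas}})


if __name__ == "__main__":
    # For example, after adding data nodes to a read-heavy cluster.
    print(replica_settings_body(2))
```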

Question 3: We have setup a load balancer in front of the ES nodes, is this the recommended way of accessing ES Clusters over 9200. When ingesting, should this address be used and it will randomly be routed to an ingest node. When querying it should route to a random ES node that can handle searches.

If it is, for instance, nginx, it works; I have done it. Do get a clear understanding of the node roles, though: for example, the "coordinating node" handles part of the processing flow that some requests require, and nginx is not aware of that.
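What a plain load balancer does here amounts to rotating requests over the nodes. Whichever node receives a request acts as the coordinating node for it, and a node without the ingest role forwards pipeline requests to one that has it, so with all three nodes holding every role any of them can take any request. A toy sketch of the rotation, with hypothetical host names:

```python
from itertools import cycle

# Hypothetical node addresses, for illustration only.
NODES = [
    "http://es-node-1:9200",
    "http://es-node-2:9200",
    "http://es-node-3:9200",
]


def make_picker(nodes):
    """Return an iterator that yields node URLs in round-robin order."""
    if not nodes:
        raise ValueError("need at least one node")
    return cycle(nodes)


if __name__ == "__main__":
    picker = make_picker(NODES)
    for _ in range(4):
        print(next(picker))
```

Once you split roles across dedicated nodes, the list behind the balancer should contain only the nodes you want to receive client traffic.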

IMO, now that you have the instances, it is a great opportunity to learn by doing and experiment with them. Change the configs, try to reproduce the problems your app might have and see what happens. That is where the aha moments come and where you get the full grasp.

Hi, Thanks for your answer. I will play around with the various scenarios using our new cluster before committing to a design. With regards to the app requirements, I would say we have more read than writes. We plan on doing some load testing and capacity planning to ensure we are production ready. — Amar Singh, Jul 10 '18 at 10:02
If you are thinking about a "high read" architecture, replicas will play an important role. Start with 1 shard per index by default. Some tips here: https://www.loggly.com/blog/nine-tips-configuring-elasticsearch-for-high-performance/ — panchicore, Jul 10 '18 at 10:12
