12

I'm learning Elasticsearch, and there's still a lot I don't get, but one thing I can't figure out (or find much about) is when to use one index and when to use more. Part of the problem is that I don't really understand what, exactly, an Elasticsearch index is.

Can you explain what an Elasticsearch index is, when you should use just one for all your data, and when you should split your data across multiple indexes?

Bonus points / alternatively: how can I tell when I need to split my data into multiple indexes, and how should I then decide how to divide the data among the new indexes?

Narfanator

3 Answers

15

You can think of an index as a schema in a SQL database.

A schema contains the data for a given use case; likewise, an index holds the data for a given use case.

The cool thing is that a search can run against multiple indices in a single request.
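As a minimal sketch of what "multiple indices in one request" looks like: the search endpoint accepts a comma-separated list of index names in the URL path. The helper name search_path and the index names in the usage note are hypothetical; the function only builds the path and does not contact a cluster.

```python
from typing import Iterable


def search_path(indices: Iterable[str]) -> str:
    """Build the URL path for a search over one or more indices.

    Index names are joined with commas, which is how the search
    endpoint addresses several indices at once. With no names given,
    the path targets every index.
    """
    names = [name for name in indices if name]
    if not names:
        return "/_search"
    return "/" + ",".join(names) + "/_search"
```

For example, search_path(["logs-2013.11", "logs-2013.12"]) gives a single path that covers both monthly indices.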

It's hard to tell you more without any information about the use case. It depends on many factors: do you need to remove some data after a period (let's say every year)? How many documents will you index, and how large is each document?

For example, let's say that you want to index logs and keep 3 months of logs online. You would create one index per month and one alias on top of the 3 current months.

When a month is over, create a new index for the new month, update the alias, and remove the oldest index. Removing a whole index is efficient in terms of both performance and disk space!

So in that case I would recommend using more than one index.
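The monthly rotation described above can be sketched as follows. This is only an illustration under assumptions: the index prefix logs, the alias name logs-current, and the helper names are all hypothetical. The function computes the request body for the alias update (the add and remove actions of the aliases endpoint) and names the index to create and the index to delete; it does not talk to a cluster.

```python
from datetime import date


def month_index(prefix: str, year: int, month: int) -> str:
    """Name of the index holding one month of data, e.g. logs-2014.01."""
    return f"{prefix}-{year:04d}.{month:02d}"


def shift_month(year: int, month: int, delta: int) -> tuple:
    """Move (year, month) forward or backward by delta months."""
    total = year * 12 + (month - 1) + delta
    return total // 12, total % 12 + 1


def rotation_plan(today: date, prefix: str = "logs",
                  alias: str = "logs-current", keep: int = 3) -> dict:
    """Plan the start-of-month rotation for an alias spanning `keep` months."""
    new_index = month_index(prefix, today.year, today.month)
    old_year, old_month = shift_month(today.year, today.month, -keep)
    old_index = month_index(prefix, old_year, old_month)
    aliases_body = {
        "actions": [
            # Start searching the new month through the alias.
            {"add": {"index": new_index, "alias": alias}},
            # Stop searching the month that just fell out of the window.
            {"remove": {"index": old_index, "alias": alias}},
        ]
    }
    return {"create": new_index, "aliases_body": aliases_body, "delete": old_index}
```

Called on the first day of January 2014 with the defaults, the plan creates logs-2014.01, swaps it into the alias in place of logs-2013.10, and marks logs-2013.10 for deletion.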

Imagine another situation. Let's say you are launching a game and you don't know whether it will be successful. Start with an index named index1 with only one shard, and create an alias named index on top of it. After launch, you find that you need more power (more machines) because your response times are increasing dramatically. Create a new index, index2, with two shards and add it to the alias.

This way you can scale out easily.
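A rough sketch of that scale-out step, expressed as the two requests involved: creating the new index with a shard count in its settings, then adding it to the existing alias. The helper names are hypothetical, and the requests are only built as plain dictionaries so the example runs without a cluster.

```python
def create_index_request(name: str, shards: int) -> dict:
    """Request that creates an index with the given number of primary shards."""
    if shards < 1:
        raise ValueError("an index needs at least one shard")
    return {
        "method": "PUT",
        "path": f"/{name}",
        "body": {"settings": {"number_of_shards": shards}},
    }


def add_to_alias_request(alias: str, index: str) -> dict:
    """Request that makes an existing alias also point at `index`."""
    return {
        "method": "POST",
        "path": "/_aliases",
        "body": {"actions": [{"add": {"index": index, "alias": alias}}]},
    }


def scale_out_plan(alias: str, new_index: str, shards: int) -> list:
    """Ordered requests: create the bigger index, then expose it via the alias."""
    return [
        create_index_request(new_index, shards),
        add_to_alias_request(alias, new_index),
    ]
```

With scale_out_plan("index", "index2", 2), searches through the alias cover both index1 and index2. One caveat worth checking against your version: an alias that points at several indices is mainly useful for searching, so new documents are usually written to the concrete new index.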

The key point here is, IMHO, aliases. Search through aliases from the start of your project; it will help you a lot in the future.

Another use case could be that you are working for different customers. Customers don't want their data mixed with other customers' data, so in that case you may need to create one index per customer.

The fact is that Elasticsearch is very flexible and lets you design your architecture as you need.
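If you go the one-index-per-customer route, you need a predictable mapping from customer to index name. Here is a small sketch, assuming a hypothetical customer- prefix and a hypothetical helper name. It lowercases the customer identifier and replaces anything other than letters and digits, since index names must be lowercase and cannot contain spaces or several punctuation characters.

```python
import re


def customer_index(customer_id: str, prefix: str = "customer") -> str:
    """Derive a per-customer index name from a free-form customer identifier."""
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", customer_id.lower()).strip("-")
    if not slug:
        raise ValueError("customer_id must contain at least one letter or digit")
    return f"{prefix}-{slug}"
```

Note that two different identifiers can collapse to the same slug (for example "Acme Corp." and "acme corp"), so a real system would more likely key indices on a unique customer ID.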

Hope this helps.

dadoonet
  • I was thinking it sounded a lot like that, where each type is a table, but that still doesn't help me decide when to have more than one, since I'd pretty much never decide to have more than one set of tables in a single database. – Narfanator Dec 18 '13 at 06:51
  • Thanks! This is the kind of info I was looking for. – Narfanator Dec 18 '13 at 18:59
1

The largest single unit of data in Elasticsearch is an index. Indexes are logical and physical partitions of documents within Elasticsearch.

Elasticsearch indexes are most similar to the database abstraction in the relational world. An Elasticsearch index is a fully partitioned universe within a single running server instance. Documents and type mappings are scoped per index, making it safe to reuse names and IDs across indexes. Indexes also have their own settings for cluster replication, sharding, custom text analysis, and many other concerns.

For reference: Shards and replicas in Elasticsearch

Community
Roopendra
  • Thanks! Can you go into more detail about when you'd want to split data across multiple indexes, even if that data has the same schema? – Narfanator Dec 18 '13 at 08:15
  • You can't split data into multiple indices. An index is divided into multiple shards; you can define the number of shards for an index in the configuration file. – Roopendra Dec 18 '13 at 08:45
  • [Please refer to this forum thread](http://elasticsearch-users.115913.n3.nabble.com/increasing-shards-and-then-nodes-td2288848.html#a2289760). It may help you understand when to increase shards and then nodes. – Roopendra Dec 18 '13 at 08:54
0

The index is the main data storage unit of Elasticsearch.

There are two common ways to organize your data:

Partition: Let's say you have an index that keeps growing and never stops (e.g. Facebook/Twitter data or any kind of logging). The best way to store this type of data is to partition it into several indexes. A common way to do this is by time interval, which could be monthly, weekly, or daily. Then, when new data arrives, check its timestamp and send it to the corresponding index.

No partition: If your index is not growing that fast, you can use a single index. This is useful for small tables.

There are numerous ways to manage your data that you can learn while exploring Elasticsearch.
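The "check the timestamp and send it to the corresponding index" step from the partition approach can be sketched like this. The events prefix, the naming patterns, and the function name are assumptions for illustration; any consistent naming scheme works as long as the writer and the reader agree on it.

```python
from datetime import datetime


def index_for(timestamp: datetime, interval: str = "monthly",
              prefix: str = "events") -> str:
    """Pick the time-partitioned index a document belongs to."""
    if interval == "monthly":
        return f"{prefix}-{timestamp:%Y.%m}"
    if interval == "daily":
        return f"{prefix}-{timestamp:%Y.%m.%d}"
    if interval == "weekly":
        # ISO week numbering: the ISO year can differ from the calendar
        # year for dates in the first or last days of a year.
        iso_year, iso_week = timestamp.isocalendar()[:2]
        return f"{prefix}-{iso_year:04d}.w{iso_week:02d}"
    raise ValueError(f"unknown interval: {interval!r}")
```

A document stamped 18 December 2013 would go to events-2013.12 under monthly partitioning and to events-2013.12.18 under daily partitioning.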

shyos