Wrote up a blog post about how I handled reindexing with no downtime recently. Takes some time to figure out all the little things that need to be in place to do so. Hope this helps!
https://summera.github.io/infrastructure/2016/07/04/reindexing-elasticsearch.html
To summarize:
Step 1: Prepare New Index
Create your new index with your new mapping. This can be on the same instance of Elasticsearch or on a brand new instance.
Step 2: Keep Indexes Up To Date
While you're reindexing you want to keep both your new and old indexes up to date. For a write operation, this can be done by sending the write operation to a background worker on both the new and old index.
Deletes are a bit trickier because there is a race condition between deleting and reindexing the record into the new index. So, you'll want to keep track of the records that need to be deleted during your reindex and process these when you are finished. If you aren't performing many deletes, another way would be to eliminate the possibility of a delete during your reindex.
Step 3: Perform Reindexing
You’ll want to use a scrolled search for reading the data and bulk API for inserting. Since after Step 2 you'll be writing new and updated documents to the new index in the background, you want to make sure you do NOT update existing documents in the new index with your bulk API requests.
This means that the operation you want for your bulk API requests is create, not index. From the documentation: “create will fail if a document with the same index and type exists already, whereas index will add or replace a document as necessary”. The main point here is you do not want old data from the scrolled search snapshot to overwrite new data in the new index.
There's a great script on github to help you with this process: es-reindex.
Step 4: Switch Over
Once you’re finished reindexing, it’s time to switch your search over to the new index. You’ll want to turn deletes back on or process the enqueued delete jobs for the new index. You may notice that searching the new index is a bit slow at first. This is because Elasticsearch and the JVM need time to warm up.
Perform any code changes you need so your application starts searching the new index. You can continue writing to the old index incase you run into problems and need to rollback. If you feel this is unnecessary, you can stop writing to it.
Step 5: Clean Up
At this point you should be completely transitioned to the new index. If everything is going well, perform any necessary cleanup such as:
- Delete the old index host if it’s different from the new
- Remove serialization code related to your old index