How to Perform an Elasticsearch Index Migration Using Aliases
By: Ryan Hyllestad | May 29, 2019
As a member of the Syncrofy back-end development team, I can attest that we go to great lengths to ensure customers have access to reliable and responsive service at all times. A big part of what makes that possible is Elasticsearch, the search/analytics engine which provides the underlying data for all dashboards, document views, and much more.
Problem & Requirements
Elasticsearch is powerful, but it can also come with a laundry list of complications for simple problems. While preparing for our recent Elasticsearch upgrade, the team discovered that our oldest search indices were no longer compatible with the new version. There are a number of resources available on migrating indices for such an upgrade, but we found all of these solutions unsuitable. For example, the Elastic documentation suggests standing up an identical cluster and migrating the documents from one to the other. This is an incredibly simple operation, but it comes with a staggering infrastructural cost. We wanted a solution that would require few physical resources while going unnoticed by our customers, meaning limited performance impact and absolutely no downtime. As such, we had to get creative, and we wanted to share our solution so others can take advantage of it.
Our migration solution makes use of an Elasticsearch feature called index aliases. By associating one or more indices with an alias, you can search across multiple indices simultaneously without having to name the indices themselves. This feature lets us change the behavior of our search API without deploying changes to the source code. The migration itself is very simple: we associate a new index with our existing index alias, designating it as the write index, scroll through the entire old index, and batch-index each document into the new index while deleting it from the old one. Because both indices are associated with the application alias, no document will ever be missing from the (combined) index alias, and your application will never skip a beat!
First, a few notes on setup. You'll need your Elasticsearch cluster running on a version that still supports your soon-to-be-outdated index or indices. It will be very clear if it isn't, as Elasticsearch won't complete a rolling upgrade while the old index is present. You'll also need to associate your outdated index with an alias, and update your application to search and aggregate against that alias rather than the index itself. If your application already uses aliases and the index is already associated with one, you don't have to worry about this step.
The first step is to create the new index. Make sure you create it with the correct settings, as it will eventually be replacing the outdated index.
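As a minimal sketch, here is what the create-index request body might look like in Python. The index name `my-index-v2` and the settings and mappings shown are placeholders, not our actual configuration:

```python
import json

# Hypothetical settings and mappings for the replacement index; adjust to
# your own data. This is the moment to fix anything that can't be changed
# after index creation, such as shard counts.
new_index_body = {
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1,
    },
    "mappings": {
        "properties": {
            "created_at": {"type": "date"},
            "message": {"type": "text"},
        }
    },
}

# Sending this body with PUT /my-index-v2 creates the index, e.g.:
#   curl -XPUT localhost:9200/my-index-v2 \
#        -H 'Content-Type: application/json' -d @body.json
print(json.dumps(new_index_body, indent=2))
```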
Next, you'll want to associate the new index with your application's alias and set it as the alias's write index, using the is_write_index flag. It's important to do this in one operation; an alias covering two or more indices without a write index will be unable to index any documents. As soon as this is done, you should see incoming documents being written to the new index, and they should all appear in your application alongside the old documents.
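The alias change can be expressed as a single _aliases request so that all of its actions apply atomically. A sketch, with `my-index-v1`, `my-index-v2`, and `my-alias` as placeholder names:

```python
# One atomic _aliases request (Elasticsearch 6+): add the new index as the
# write index, and explicitly mark the old index as non-write in case it
# was never flagged.
alias_update = {
    "actions": [
        {"add": {"index": "my-index-v2", "alias": "my-alias",
                 "is_write_index": True}},
        {"add": {"index": "my-index-v1", "alias": "my-alias",
                 "is_write_index": False}},
    ]
}

# Sending this body with POST /_aliases applies both actions in one step,
# so there is never a moment where the alias has no write index.
```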
Note for Version 5.x and Lower: The is_write_index setting was introduced in version 6. As such, below version 6, any write request to an alias referencing more than one index will automatically fail. The only way we've found to circumvent this problem is to update your application code to write directly to the new index rather than the alias. Once you've migrated to the new version, you can set is_write_index on the new index and safely point your application's write operations back at the alias.
Once your indices and aliases are set up, you can begin migration. Collect the to-be-migrated documents by requesting a query-less scroll response from the old index. Then, like Indiana Jones swapping out the golden idol for a bag of sand, create a bulk operation request with 2 requests per document—an insert into the new index and a delete from the old one. And just like that, you’re done! Hopefully with minimal boulder traps.
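The idol swap can be sketched as a small function that turns one page of scroll hits into a _bulk request body. The index names and document shape here are illustrative, not our production schema:

```python
import json

def build_bulk_body(hits, old_index, new_index, delete_old=True):
    """Turn one page of scroll hits into a _bulk request body (NDJSON).

    Each document gets an index action into the new index (preserving its
    _id) and, unless delete_old is False, a delete from the old index.
    """
    lines = []
    for hit in hits:
        lines.append(json.dumps({"index": {"_index": new_index,
                                           "_id": hit["_id"]}}))
        lines.append(json.dumps(hit["_source"]))
        if delete_old:
            lines.append(json.dumps({"delete": {"_index": old_index,
                                                "_id": hit["_id"]}}))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline

# Example page of hits, as found under response["hits"]["hits"]:
hits = [{"_id": "1", "_source": {"message": "hello"}}]
body = build_bulk_body(hits, "my-index-v1", "my-index-v2")
```

Each resulting NDJSON string would be sent to POST /_bulk; looping over scroll pages until the scroll is exhausted completes the migration. Passing delete_old=False gives the dry-run behavior described below.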
Because your data is important, you might want to consider test runs of your migration code before committing to the real deal. This can be done easily by adjusting the steps above: just skip associating the test index with the alias, and remove the deletion requests from the bulk operations. After this, you should have all the old documents in the new index, with the exception of a small delta of documents written while the scroll was running. Once you're satisfied with the results, you can associate the index and migrate again. Note that this only works if you've chosen to preserve the _id of each document across the migration; if not, you'll have to start the migration fresh.
Many Elasticsearch index features cannot be changed once the index has been created: sharding parameters, type mappings, and search analyzers can be particularly stubborn. While planning our migration, we realized that this operation was a great opportunity to fix many of the mistakes we'd made over the years. Because we're creating a new index, we can change essentially any of these settings during the migration. Just apply your changes to the new index when you create it, and add any transformation you need to perform before re-indexing each document.
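For example, suppose an old index stored timestamps as strings of epoch milliseconds while the new mapping uses a proper date field. A hypothetical per-document fix-up, applied to each _source before it goes into the bulk request, might look like:

```python
def transform(doc):
    """Hypothetical per-document fix-up applied before re-indexing.

    Converts a created_at field stored as a string of epoch milliseconds
    into an integer, which an Elasticsearch date field accepts natively.
    All other documents pass through unchanged.
    """
    doc = dict(doc)  # avoid mutating the caller's copy
    created = doc.get("created_at")
    if isinstance(created, str) and created.isdigit():
        doc["created_at"] = int(created)
    return doc

migrated = transform({"created_at": "1559088000000", "message": "hello"})
```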
Using index aliases, we were able to migrate more than 100 million documents to the new index without any hiccups. Hopefully, the productive team behind Elasticsearch will eventually implement simple tools to migrate old indices automatically. But until then, we’ll be staying creative with index aliases and the other tools we have at our disposal.
That's just one of the ways we're working to ensure Syncrofy customers continue to receive the high-quality experience they've come to expect every time they log in. To learn more about Syncrofy, click here.