I've got a Cassandra cluster (3 nodes, all nodes deployed to AWS) that I am trying to migrate over to a DataStax cluster. It's simply time to stop managing these nodes myself.
I have multiple producers and consumers reading and writing data to my Cassandra cluster all day long. I don't have the option of putting an app/service/proxy in front of the cluster and then cleanly flipping a switch so that all reads and writes go to DataStax instead of my Cassandra nodes. So there's no clean way to migrate the tables one at a time. I'm also trying to achieve zero (or near-zero) downtime for all producers/consumers of the data. One hard requirement: the migration cannot be lossy. No lost data!
I'm thinking the best strategy here is a four-step process:
- Somehow, configure DataStax to be a replica of my Cassandra cluster, effectively creating streaming replication to DataStax
- Once DataStax is totally "caught up" with the nodes in my Cassandra cluster, keep the producers writing to my current Cassandra cluster, but cut the consumers/readers over to DataStax (that is, reconfigure them to connect to DataStax, and then restart them). Not zero downtime, but I can probably live with a simple restart. (Again, zero-downtime solutions are greatly preferred.)
- Cut the producers over to DataStax. Again, only near-zero-downtime, as this involves reconfiguring the producers to point to DataStax, and then requires a restart to pick up the new configs. Zero-downtime solutions would be preferred.
- Once replication traffic from the "old" Cassandra cluster drains to zero, there is no remaining "new" data that my non-DataStax nodes still need to send to DataStax. Kill those nodes with fire.
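To make the ordering of these four steps concrete, here is a rough sketch of the gating I have in mind, written as a tiny state machine. Everything in it is hypothetical: `Phase`, `Migration`, and the `caught_up` / `drained` flags are illustration names of mine, not anything from Cassandra or DataStax, and how those two signals would actually be obtained is exactly what I don't know yet.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    REPLICATING = 1    # step 1: DataStax is catching up with the old cluster
    READERS_MOVED = 2  # step 2: consumers read from DataStax
    WRITERS_MOVED = 3  # step 3: producers write to DataStax
    OLD_RETIRED = 4    # step 4: old nodes are shut down


@dataclass
class Migration:
    phase: Phase = Phase.REPLICATING
    caught_up: bool = False  # hypothetical signal: target holds all existing data
    drained: bool = False    # hypothetical signal: no replication traffic pending

    def advance(self) -> Phase:
        """Move to the next phase only if its precondition holds."""
        if self.phase is Phase.REPLICATING:
            if not self.caught_up:
                raise RuntimeError("target not caught up; readers would see stale data")
            self.phase = Phase.READERS_MOVED
        elif self.phase is Phase.READERS_MOVED:
            # No extra gate: producers can move once the readers are already on the target.
            self.phase = Phase.WRITERS_MOVED
        elif self.phase is Phase.WRITERS_MOVED:
            if not self.drained:
                raise RuntimeError("replication not drained; retiring now could lose data")
            self.phase = Phase.OLD_RETIRED
        else:
            raise RuntimeError("migration already complete")
        return self.phase
```

The point of the sketch is only that two transitions are gated: readers must not move before the target is caught up, and the old nodes must not be retired before replication has drained.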
This solution is the least invasive, closest-to-zero-downtime approach I can come up with, but it leaves a few open questions:
- Perhaps it is not possible to treat DataStax like an extra node that can be replicated to (yes/no?)
- Perhaps Cassandra and/or DataStax have some magical features/capabilities that I don't know about, that can handle migrations better than this solution; or perhaps there are 3rd party (ideally open source) tools that could handle this better
- I have no idea how I would monitor replication "traffic" coming from the "old" Cassandra nodes into DataStax. I would need to know how to do this before I could safely shut down + kill the old nodes (again, can't lose data).
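Lacking a way to watch replication traffic directly, the fallback I can imagine for the "no lost data" check is comparing the rows in both clusters before retiring anything. Below is a minimal sketch of that comparison in plain Python. It assumes the rows have already been fetched from each cluster by some driver and are presented as dictionaries; `row_digest`, `index_rows`, and `diff_clusters` are hypothetical helper names, not part of any Cassandra or DataStax API.

```python
import hashlib
from typing import Dict, Iterable, Mapping, Set, Tuple

Row = Mapping[str, object]


def row_digest(row: Row) -> str:
    """Hash a row's columns in sorted order so column ordering does not matter."""
    canonical = repr(sorted(row.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def index_rows(rows: Iterable[Row], key_columns: Tuple[str, ...]) -> Dict[tuple, str]:
    """Map each row's primary-key tuple to the digest of the full row."""
    return {tuple(row[c] for c in key_columns): row_digest(row) for row in rows}


def diff_clusters(
    old_rows: Iterable[Row],
    new_rows: Iterable[Row],
    key_columns: Tuple[str, ...],
) -> Tuple[Set[tuple], Set[tuple]]:
    """Return (keys missing from the new cluster, keys whose contents differ)."""
    old = index_rows(old_rows, key_columns)
    new = index_rows(new_rows, key_columns)
    missing = set(old) - set(new)
    mismatched = {k for k in old.keys() & new.keys() if old[k] != new[k]}
    return missing, mismatched


# Example: row 3 never arrived, row 2 arrived with different contents.
old = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
new = [{"v": "a", "id": 1}, {"id": 2, "v": "B"}]
missing, mismatched = diff_clusters(old, new, ("id",))
print(missing, mismatched)
```

This loads everything into memory, so for real tables it would have to run over one slice of the data at a time, and the result is only meaningful once writes to the old cluster have stopped (or for rows older than some cutoff). An empty `missing` and `mismatched` would be the signal I'm after before shutting the old nodes down.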
I guess I'm wondering if this strategy is: (1) doable/feasible, and (2) optimal; and if there are any features/tools in the Cassandra/DataStax ecosystem that I could leverage to make this any better (faster and with zero downtime).