
Update - Short version:
The PropertyFileSnitch cassandra-topology.properties on the first 3 nodes (Racks 1-3) states that only these nodes are in DC1 and that all other nodes are in DC2, via the default value default=DC2:r1. When the cluster was scaled up by adding nodes 4 and 5, the PropertyFileSnitch on those nodes was configured to place them in DC1 as well, in Racks 4 and 5, but the file on the first 3 nodes was never updated, and as a result the cluster is in this inconsistent state.

My question is whether this cluster can be rebalanced (fixed). Would it be enough to fix cassandra-topology.properties and then do a full cluster restart?
Please advise on how I can safely rebalance the cluster.

Longer version:

I am new to Cassandra and started working on an already-built cluster.
I have 5 nodes in the same data center, on different racks, running Cassandra version 3.0.5 with vnodes (num_tokens: 256) and a keyspace with replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true.
Historically there were only 3 nodes, and the cluster was later scaled up with 2 additional nodes. I have an automatic repair script that runs nodetool repair with the options parallelism: parallel, primary range: false, incremental: true, job threads: 1.
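
For reference, those options roughly correspond to an invocation like the sketch below. The keyspace name is a placeholder, and in 3.0 incremental and parallel are already the defaults, so only the job-thread count is passed explicitly.

```
# Sketch of the repair the script presumably runs (my_keyspace is a placeholder).
# In Cassandra 3.0 repairs are incremental and parallel by default; -pr is omitted
# because primary range is false, and -j sets the number of job threads.
nodetool repair -j 1 my_keyspace
```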

After a large amount of data was inserted, the problems started to appear. When the repair script runs on node 4 or 5, node 2 gets overloaded: CPU usage stays at 100%, the MutationStage queue grows, and GC pauses take at least 1 s until the Cassandra process finally dies. The repair usually fails with the error Stream failed (progress: 0%).
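
This is roughly how I have been observing the symptoms on node 2 (standard nodetool and log inspection; the system.log path assumes a package install and may differ on your setup):

```
# MutationStage pending tasks climb steadily on node 2 while the repair runs on node 4 or 5.
nodetool tpstats | grep -E 'Pool Name|MutationStage'

# Cassandra logs long GC pauses via GCInspector in system.log (path assumes a package install).
grep GCInspector /var/log/cassandra/system.log | tail
```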

When running the nodetool status command on nodes 1, 2 or 3 I get the following output:

Datacenter: DC2
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID    Rack
UN  10.0.0.13    10.68 GB   256     0.0%              75e17b8a   r1
UN  10.0.0.14    9.43 GB    256     0.0%              21678ddb   r1
Datacenter: DC1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID    Rack
UN  10.0.0.10    16.14 GB   256     100.0%            cf9d327f   Rack1
UN  10.0.0.11    22.83 GB   256     100.0%            e725441e   Rack2
UN  10.0.0.12    19.66 GB   256     100.0%            95b5c8e3   Rack3

But when running the nodetool status command on nodes 4 or 5 I get the following output:

Datacenter: DC1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID    Rack
UN  10.0.0.13   10.68 GB   256     58.9%             75e17b8a   Rack4
UN  10.0.0.14   9.43 GB    256     61.1%             21678ddb   Rack5
UN  10.0.0.10   16.14 GB   256     60.3%             cf9d327f   Rack1
UN  10.0.0.11   22.83 GB   256     61.4%             e725441e   Rack2
UN  10.0.0.12   19.66 GB   256     58.3%             95b5c8e3   Rack3

After further investigation it seems that the PropertyFileSnitch cassandra-topology.properties was not updated on nodes 1, 2 and 3 (which are also the seeds for this cluster) after the cluster was scaled up.
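
Roughly, the topology files look like this; the exact contents are reconstructed from the situation described above, and the file path assumes a package install:

```
# Approximate contents of the stale cassandra-topology.properties on nodes 1-3
# (only the original three nodes are mapped; everything else falls through to
# the default entry and therefore shows up as DC2):
#
#   10.0.0.10=DC1:Rack1
#   10.0.0.11=DC1:Rack2
#   10.0.0.12=DC1:Rack3
#   default=DC2:r1
#
# A consistent file, deployed identically on all five nodes, would map every
# node explicitly (path assumes a package install; adjust as needed):
cat > /etc/cassandra/cassandra-topology.properties <<'EOF'
10.0.0.10=DC1:Rack1
10.0.0.11=DC1:Rack2
10.0.0.12=DC1:Rack3
10.0.0.13=DC1:Rack4
10.0.0.14=DC1:Rack5
# fallback for any node not listed above (keeping it in DC1 is a choice, not a given)
default=DC1:r1
EOF
```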

Thanks!

2 Answers


After searching several online resources I found some possible solutions. I'll post them here so they are accessible to everyone.

From Practical Cassandra: A Developer's Approach:

Ring View Differs between Nodes
When the ring view differs between nodes, it is never a good thing. There is also no easy way to recover from this state. The only way to recover is to do a full cluster restart. A rolling restart won’t work because the Gossip protocol from the bad nodes will inform the newly booting good nodes of the bad state. A full cluster restart and bringing the good nodes up first should enable the cluster to come back up in a good state.

The same solution can also be found in the DataStax docs: View of ring differs between some nodes
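
A rough sketch of that procedure as I understand it (service name, file path and ordering are assumptions, not a tested runbook):

```
# On every node: flush memtables and stop Cassandra, so the whole cluster is down at once.
nodetool drain
sudo service cassandra stop

# With everything down, deploy the same corrected cassandra-topology.properties to
# every node (path assumes a package install).
sudo cp cassandra-topology.properties /etc/cassandra/cassandra-topology.properties

# Bring the seed nodes (1, 2 and 3 here) back up first, then the remaining nodes,
# waiting for each one to come up cleanly before starting the next.
sudo service cassandra start

# Verify that every node now reports the same single-datacenter view of the ring.
nodetool status
nodetool describecluster
```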

I also found a similar question in the Apache Cassandra community. The answer on the users thread is:

What has happened is that you have now two datacenters in your cluster. The way they replicate information will depend on your keyspace settings. Regarding your process I don't think it is safe to do it that way. I'd start off by decommissioning nodes 4 and 5 so that your cluster is back to 1 datacenter with 3 nodes and then add them sequentially again making sure the configuration in the Snitch is the proper one.
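
A sketch of what that suggestion might look like in practice, one node at a time; the addresses come from the printouts above, while the data paths and cleanup steps are assumptions:

```
# On node 4 (10.0.0.13), then later on node 5 (10.0.0.14):
nodetool decommission        # streams its data to the remaining nodes and leaves the ring
nodetool netstats            # watch until streaming has finished

# Still on the decommissioned node: fix cassandra-topology.properties, wipe the old
# data so the node bootstraps fresh, then start it again (paths assume the default
# package layout).
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
sudo service cassandra start

# Once both nodes have rejoined, drop data that no longer belongs to the original nodes:
nodetool cleanup             # run on nodes 1, 2 and 3
```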


I can't tell if what you suggested is sufficient without access to the system, but I have a few observations. Ownership should be distributed between all nodes in the cluster, meaning that the total of the values in the "Owns" column for all 5 nodes should be equal to 100 if they are forming one cluster. Having several nodes each owning 100% of the cluster doesn't look right; it indicates that each node is acting in standalone mode and is not joined to the cluster.
I see address 10.40.0.10 in the first printout while it is 10.0.0.10 in the second, which looks like a misconfiguration. Additionally, check that each node can reach all the other nodes' IP addresses. I also see that 10.0.0.13 belongs to 'r1' in the first printout while it belongs to 'Rack4' in the second.
For simplicity and ease of configuration, you can configure one datacenter (e.g. DC1) and one rack (e.g. Rack1) for all 5 nodes regardless of their physical distribution.
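
If you go that route, a minimal topology file, deployed identically on every node, could look something like the sketch below; the file path is assumed and the entries are taken from the addresses in your printouts:

```
# One datacenter and one rack for all five nodes (simplification sketch;
# path assumes a package install).
cat > /etc/cassandra/cassandra-topology.properties <<'EOF'
10.0.0.10=DC1:Rack1
10.0.0.11=DC1:Rack1
10.0.0.12=DC1:Rack1
10.0.0.13=DC1:Rack1
10.0.0.14=DC1:Rack1
default=DC1:Rack1
EOF
```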

  • You're right, **it's a misconfiguration**. The *PropertyFileSnitch* `cassandra-topology.properties` for the first 3 nodes (Racks 1-3) states that only these nodes are in DC1 and that the others are in DC2 by specifying the default value `default=DC2:r1`. When the cluster was scaled up by adding nodes 4 and 5, the *PropertyFileSnitch* for these nodes was configured to add them to DC1 as well, in Racks 4 and 5, but the snitch on the first 3 nodes remained unchanged, and as a result the cluster is in this inconsistent state. I am trying to find out how I can **safely** reconfigure it. – alien5 Mar 21 '17 at 20:21
  • Thank you for pointing out the IP mistake, it slipped in while I was formatting the post for width. The nodes are in the same data center and have access to one another. They can see each other's load, but the load is not distributed equally because of the misconfigured snitch. – alien5 Mar 21 '17 at 20:27