
I'm running a Cassandra 2.2.3 cluster. It currently has 3 nodes: two are seeds and one is a normal node.

When I start a repair on each node (command: nodetool repair -tr my_keyspace), the command blocks on every node. I redirected the trace output to a log and found many errors like the following:

Session completed with the following error: org.apache.cassandra.exceptions.RepairException: [repair #5717bb00-e685-11e5-801e-c71692f88562 on my_keyspace/node, (4856831381680181267,4878966233072304148]] Validation failed in /10.16.170.20

Has anyone seen this error before? Can nodetool repair be run in parallel on every node?

Elias Mårtenson
zhibo fu
  • Can you provide any exceptions from the system.log? – Stefan Podkowinski Mar 11 '16 at 10:20
  • I found many exceptions recorded in system.log. Most of them look like the following: WARN [RepairJobTask:6] 2016-03-10 22:01:22,757 RepairJob.java:162 - [repair #8ac54e8f-e74e-11e5-abdf-417f87165ecc] node sync failed – zhibo fu Mar 14 '16 at 02:54
  • `ERROR [RepairJobTask:6] 2016-03-10 22:01:22,767 RepairSession.java:290 - [repair #8ac54e8f-e74e-11e5-abdf-417f87165ecc] Session completed with the following error org.apache.cassandra.exceptions.RepairException: [repair #8ac54e8f-e74e-11e5-abdf-417f87165ecc on my_keyspaces/node, (-5727290568361773337,-5702819924840199489]] Validation failed in /10.16.170.20` – zhibo fu Mar 14 '16 at 02:56
  • `ERROR [ValidationExecutor:5] 2016-03-10 22:01:22,770 CassandraDaemon.java:185 - Exception in thread Thread[ValidationExecutor:5,1,main] java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables` – zhibo fu Mar 14 '16 at 02:56
  • `ERROR [Repair#1:1] 2016-03-10 22:01:22,779 CassandraDaemon.java:185 - Exception in thread Thread[Repair#1:1,5,RMI Runtime] java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down` – zhibo fu Mar 14 '16 at 02:57
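
To see which of the errors quoted in these comments dominates a larger system.log, the lines can be grouped by message. The following is only a rough sketch: the match strings are copied from the excerpts above, while the category labels and the function name `summarize_repair_errors` are made up for illustration and are not Cassandra terminology.

```python
import re
from collections import Counter

# Substrings copied from the log excerpts quoted in this thread.
# The keys are illustrative labels chosen for this sketch.
PATTERNS = {
    "sync_failed": "sync failed",
    "validation_failed": "Validation failed in",
    "overlapping_sessions": "Cannot start multiple repair sessions over the same sstables",
    "executor_shut_down": "ThreadPoolExecutor has shut down",
}

# Matches the session id in fragments such as "[repair #8ac54e8f-...".
REPAIR_ID = re.compile(r"\[repair #([0-9a-f-]+)")


def summarize_repair_errors(lines):
    """Count known repair error messages and collect the repair session ids seen."""
    counts = Counter()
    sessions = set()
    for line in lines:
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
        match = REPAIR_ID.search(line)
        if match:
            sessions.add(match.group(1))
    return counts, sessions
```

Calling it with an open file, for example `summarize_repair_errors(open(path))` where `path` points at your system.log (its location depends on the installation), returns the per-category counts and the set of repair session ids. A high `overlapping_sessions` count would suggest that several repair sessions were competing for the same sstables.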

2 Answers


I also found that if I run nodetool repair without "-pr", the repair completes successfully more often; it seems that "-pr" is not recommended after 2.2.x. :) I also found that running nodetool repair without "-pr" in parallel on each node works well. However, there was no new data being updated.

zhibo fu

My guess is that you should run repair node by node, that is, start the repair on the next node only after the previous one has finished. Also, in Cassandra 2.2 and above, incremental repair is the default. There has been some discussion about using -pr together with incremental repair, and the conclusion is that using them together is not recommended (references: http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/tools/toolsRepair.html and https://groups.google.com/forum/#!topic/nosql-databases/qzdbVLGFrD8). Hope this helps!
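
The node-by-node approach could be scripted roughly as follows. This is a minimal sketch under several assumptions: nodetool is on the PATH, each node accepts remote JMX connections so that `nodetool -h <host>` works, and nodetool exits with a non-zero status when a repair fails (worth verifying on your version). The function names and the default `-full` flag (which requests a full rather than incremental repair in 2.2) are choices made for this example, not something prescribed by the referenced docs.

```python
import subprocess


def build_repair_command(host, keyspace, flags=("-full",)):
    """Return the nodetool argument list for repairing one node."""
    return ["nodetool", "-h", host, "repair", *flags, keyspace]


def repair_nodes_sequentially(hosts, keyspace, flags=("-full",), run=subprocess.run):
    """Repair hosts one at a time, stopping at the first failure."""
    finished = []
    for host in hosts:
        command = build_repair_command(host, keyspace, flags)
        # subprocess.run blocks until nodetool exits, so the next node
        # is only started after this one is done.
        result = run(command)
        if result.returncode != 0:
            raise RuntimeError(f"repair failed on {host}; finished: {finished}")
        finished.append(host)
    return finished
```

For example, `repair_nodes_sequentially(["node1", "node2", "node3"], "my_keyspace")` would repair three nodes in order, where the host names are placeholders. Because each call blocks, no two repair sessions started by this script overlap.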

Zhong Hu