In OpsCenter I see that one of the nodes is orange. It seems to be working on a compaction. I ran nodetool compactionstats several times, and the Completed and percentage values stay the same, even with hours in between. I currently don't see any CPU load from Cassandra on that node, so it seems stuck (somewhere in the mid-60% range). Some other nodes are also compacting the same column family, and I don't see any progress there either.
EDIT: All of a sudden I see a change again. However, the progress didn't go up: it went from over 60% down to 50.95%, so it seems the compaction restarted.
EDIT2: It seems it actually finished all of a sudden, and I had confused two similarly named column families.
EDIT3: The compaction that finished was on another node that also seemed stuck. One of the nodes is still in this "stuck" state: it is drained and the Java process is not using any CPU.
WARN [RMI TCP Connection(554)-192.168.0.68] 2015-11-09 17:18:13,677 ColumnFamilyStore.java (line 2101) Unable to cancel in-progress compactions for usage_record_ptd. Probably there is an unusually large row in progress somewhere. It is also possible that buggy code left some sstables compacting after it was done with them
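Warnings like the one above can be located and counted programmatically. Below is a minimal Python sketch, assuming every log entry follows the layout of the two log lines quoted in this post (LEVEL [thread] date time Source.java (line N) message); a different logging pattern would need a different regex, and parse_line and cancel_failures are hypothetical names. The thread name is kept because an "RMI TCP Connection" thread suggests the call came in over JMX, which is how nodetool talks to Cassandra.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Assumed layout (matches the lines quoted in this post):
#   LEVEL [thread] yyyy-MM-dd HH:mm:ss,SSS Source.java (line N) message
LOG_RE = re.compile(
    r"^\s*(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+)\s+\(line (?P<line>\d+)\)\s+(?P<msg>.*)$"
)
CANCEL_RE = re.compile(r"Unable to cancel in-progress compactions for (\w+)")


class Entry(NamedTuple):
    level: str
    thread: str
    when: datetime
    source: str
    line: int
    message: str


def parse_line(text: str) -> Optional[Entry]:
    m = LOG_RE.match(text)
    if m is None:
        return None  # continuation lines (stack traces etc.) don't match
    when = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return Entry(m["level"], m["thread"], when, m["source"], int(m["line"]), m["msg"])


def cancel_failures(lines: Iterable[str]) -> List[Tuple[datetime, str, str]]:
    """(timestamp, column family, thread) for each 'Unable to cancel' warning."""
    hits = []
    for raw in lines:
        entry = parse_line(raw)
        if entry is None:
            continue
        m = CANCEL_RE.search(entry.message)
        if m:
            hits.append((entry.when, m.group(1), entry.thread))
    return hits


warning = ("WARN [RMI TCP Connection(554)-192.168.0.68] 2015-11-09 17:18:13,677 "
           "ColumnFamilyStore.java (line 2101) Unable to cancel in-progress "
           "compactions for usage_record_ptd. Probably there is an unusually "
           "large row in progress somewhere.")
for when, table, thread in cancel_failures([warning]):
    print(when, table, thread)
```

For a real log, an open file object can be passed instead of the list, since iterating a file yields its lines.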
- How can I make sure that nothing is happening?
- Is it recommended to disable compaction above a certain data size? (I believe there is 25 GB on each node.)
- Can I stop this compaction? nodetool stop compaction doesn't seem to work.
- Is stopping the compaction dangerous?
- Is killing the Cassandra process while it is compacting dangerous? (I did run nodetool drain on one node.)
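To check whether a compaction is really making no progress, one option is to capture nodetool compactionstats twice, some time apart, and compare the completed counters. Below is a minimal Python sketch of that comparison. It assumes the Cassandra 2.0-era row layout (compaction type, keyspace, table, completed, total, unit, progress); other versions print different columns, so the parser is an assumption to adapt. Because this output has no task id, tasks are matched on keyspace, table and total bytes, which is only a heuristic when several compactions on the same table run at once. The names parse_rows and stalled are hypothetical, and the sample capture uses made-up numbers.

```python
from typing import Dict, List, NamedTuple, Tuple


class Task(NamedTuple):
    kind: str
    keyspace: str
    table: str
    completed: int
    total: int


def parse_rows(output: str) -> List[Task]:
    """Extract data rows from one capture of nodetool compactionstats."""
    tasks = []
    for line in output.splitlines():
        parts = line.split()
        # Assumed layout: <type...> <keyspace> <table> <completed> <total> <unit> <progress>%
        # Data rows end in a percentage; header and summary lines do not.
        if len(parts) < 7 or not parts[-1].endswith("%"):
            continue
        kind = " ".join(parts[:-6])  # some operation type names contain spaces
        keyspace, table = parts[-6], parts[-5]
        completed, total = int(parts[-4]), int(parts[-3])
        tasks.append(Task(kind, keyspace, table, completed, total))
    return tasks


def stalled(before: str, after: str) -> List[Task]:
    """Tasks present in both captures whose completed count did not change."""
    # No task id in this output, so (keyspace, table, total) is a heuristic key.
    seen: Dict[Tuple[str, str, int], int] = {
        (t.keyspace, t.table, t.total): t.completed for t in parse_rows(before)
    }
    return [t for t in parse_rows(after)
            if seen.get((t.keyspace, t.table, t.total)) == t.completed]


# Illustrative capture with made-up numbers; imagine it taken twice, hours apart.
snapshot = """pending tasks: 1
          compaction type    keyspace           table     completed         total   unit  progress
               Compaction  mykeyspace  mycolumnfamily    6062000000   10000000000  bytes    60.62%
Active compaction remaining time :   0h00m00s
"""
for t in stalled(snapshot, snapshot):
    print(f"no progress: {t.keyspace}.{t.table} {t.completed}/{t.total} bytes")
```

In practice the two strings would come from redirecting nodetool compactionstats into one file, waiting, and redirecting it into a second file.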
Any other remarks? Thanks a lot in advance!
This is the output of nodetool compactionstats, grepped for the keyspace that seems stuck:
4e48f940-86c6-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447062197972 52321301 16743606 {1:2, 4:248}
94acec50-86c8-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447063175061 48992375 13420862 {3:3, 4:245}
3210c9b0-8707-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447090067915 52763216 17732003 {1:2, 4:248}
24f96fe0-86ce-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447065564638 44909171 17029440 {1:2, 3:39, 4:209}
06d58370-86ef-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447079687463 53570365 17873962 {1:2, 3:2, 4:246}
f7aa5fa0-86c7-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447062911642 47701016 13291915 {3:2, 4:246}
806a4380-86f7-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447083327416 52644411 17363023 {1:2, 2:1, 4:247}
c845b900-86c5-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447061973136 48944530 16698191 {1:2, 3:6, 4:242}
bb44a0b0-8718-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447097599547 48768463 13518523 {2:2, 3:5, 4:242}
f2c17ea0-86c3-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447061185418 90367799 13904914 {5:4, 6:7, 7:52, 8:185}
1aae6590-86ce-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447065547369 53190698 17228121 {1:2, 4:248}
d7ca8d00-86d5-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447068871120 52422499 16995963 {1:2, 3:3, 4:245}
6e890290-86df-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447072989497 45218168 17174468 {1:2, 3:21, 4:227}
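The columns in these rows appear to match the layout of nodetool compactionhistory (id, keyspace_name, columnfamily_name, compacted_at, bytes_in, bytes_out, rows_merged), i.e. finished compactions rather than in-progress ones. Assuming that layout, a small Python sketch can turn the epoch-millisecond timestamps into readable UTC times and show how much each compaction shrank the data. Reading the trailing map as {number of SSTables merged: partition count} is an assumption, and HistoryRow, parse_merged and parse_history_row are hypothetical names. Three of the rows above are used as sample input.

```python
from datetime import datetime, timezone
from typing import Dict, List, NamedTuple

# Rows copied from the output above.
HISTORY = """\
4e48f940-86c6-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447062197972 52321301 16743606 {1:2, 4:248}
f2c17ea0-86c3-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447061185418 90367799 13904914 {5:4, 6:7, 7:52, 8:185}
bb44a0b0-8718-11e5-96be-dd3c9e46ec74 mykeyspace mycolumnfamily 1447097599547 48768463 13518523 {2:2, 3:5, 4:242}
"""


class HistoryRow(NamedTuple):
    task_id: str
    keyspace: str
    table: str
    compacted_at: datetime       # converted from epoch milliseconds, UTC
    bytes_in: int
    bytes_out: int
    rows_merged: Dict[int, int]  # assumed: {SSTables merged: partition count}

    @property
    def ratio(self) -> float:
        """Output size as a fraction of input size."""
        return self.bytes_out / self.bytes_in


def parse_merged(text: str) -> Dict[int, int]:
    """Parse a map such as '{1:2, 4:248}' into {1: 2, 4: 248}."""
    merged = {}
    for pair in text.strip("{}").split(","):
        if pair.strip():
            key, value = pair.split(":")
            merged[int(key)] = int(value)
    return merged


def parse_history_row(line: str) -> HistoryRow:
    task_id, ks, table, at_ms, b_in, b_out, merged = line.split(maxsplit=6)
    when = datetime.fromtimestamp(int(at_ms) / 1000, tz=timezone.utc)
    return HistoryRow(task_id, ks, table, when, int(b_in), int(b_out), parse_merged(merged))


rows: List[HistoryRow] = [parse_history_row(line) for line in HISTORY.splitlines() if line.strip()]
for row in sorted(rows, key=lambda r: r.compacted_at):
    print(f"{row.compacted_at:%Y-%m-%d %H:%M:%S} UTC  {row.table}  "
          f"{row.bytes_in:>10,} -> {row.bytes_out:>10,} bytes ({row.ratio:.0%})")
```

Applied to all thirteen rows above, the compacted_at values fall on 2015-11-09 between about 09:26 and 19:33 UTC, which is consistent with a series of separate, completed compactions spread over the day.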
I also frequently see lines like this in system.log:
WARN [Native-Transport-Requests:11935] 2015-11-09 20:10:41,886 BatchStatement.java (line 223) Batch of prepared statements for [billing.usage_record_by_billing_period, billing.metric] is of size 53086, exceeding specified threshold of 5120 by 47966.
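The numbers in this warning are byte counts: 53086 - 5120 = 47966, so the batch is roughly ten times the threshold. 5120 bytes is 5 KB, which is the default value of batch_size_warn_threshold_in_kb in cassandra.yaml (assuming it was not changed here). To see which table combinations trigger the warning most often, a hypothetical Python helper can tally the lines; the regex assumes the exact message wording quoted above.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumes the message wording quoted above; other versions may phrase it differently.
BATCH_RE = re.compile(
    r"Batch of prepared statements for \[(?P<tables>[^\]]*)\] is of size "
    r"(?P<size>\d+), exceeding specified threshold of (?P<threshold>\d+) "
    r"by (?P<over>\d+)"
)


def summarize_batch_warnings(lines: Iterable[str]) -> Dict[Tuple[str, ...], Tuple[int, int]]:
    """Map each set of tables to (number of warnings, largest batch in bytes)."""
    stats: Dict[Tuple[str, ...], List[int]] = defaultdict(lambda: [0, 0])
    for line in lines:
        m = BATCH_RE.search(line)
        if m is None:
            continue
        # Sort so the same tables listed in a different order count together.
        tables = tuple(sorted(t.strip() for t in m["tables"].split(",")))
        count_and_max = stats[tables]
        count_and_max[0] += 1
        count_and_max[1] = max(count_and_max[1], int(m["size"]))
    return {tables: (count, biggest) for tables, (count, biggest) in stats.items()}


sample = ("WARN [Native-Transport-Requests:11935] 2015-11-09 20:10:41,886 "
          "BatchStatement.java (line 223) Batch of prepared statements for "
          "[billing.usage_record_by_billing_period, billing.metric] is of size 53086, "
          "exceeding specified threshold of 5120 by 47966.")
m = BATCH_RE.search(sample)
size, threshold, over = int(m["size"]), int(m["threshold"]), int(m["over"])
print(size - threshold == over, round(size / threshold, 1))
print(summarize_batch_warnings([sample]))
```

Keying on the sorted table tuple treats a batch spanning the same tables as one group regardless of the order in which the log lists them.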