
In OpsCenter I see that one of the nodes is orange. It seems to be working on a compaction. I used nodetool compactionstats, and whenever I ran it the Completed value and the percentage stayed the same (even with hours in between). I currently see no CPU load from Cassandra on that node, so the compaction seems stuck (somewhere around 60%). Some other nodes also have a compaction running on the same column family, and I don't see any progress there either.
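This is roughly how I compared the samples (a sketch; the interval is arbitrary):

    nodetool compactionstats > /tmp/compaction.before
    sleep 3600                                        # wait an hour
    nodetool compactionstats > /tmp/compaction.after
    diff /tmp/compaction.before /tmp/compaction.after # no diff = no progress at all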

EDIT: all of a sudden I see a change again. However, the progress didn't go up: it came from 60+% and is now at 50.95%, so it looks like the compaction restarted. EDIT 2: it seems it actually finished all of a sudden, and I had confused two similarly named column families. EDIT 3: that finish was on another node that also seemed stuck. One of the nodes is still in this "stuck" state: drained, and the Java process is not using any CPU.

 WARN [RMI TCP Connection(554)-192.168.0.68] 2015-11-09 17:18:13,677 ColumnFamilyStore.java (line 2101) Unable to cancel in-progress compactions for usage_record_ptd.  Probably there is an unusually large row in progress somewhere.  It is also possible that buggy code left some sstables compacting after it was done with them
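To check whether such an unusually large row exists, cfstats reports the largest compacted row per column family (a sketch; the exact labels differ slightly between Cassandra versions):

    # show the largest compacted row per column family (value is in bytes)
    nodetool cfstats | grep -i -e 'column family:' -e 'table:' -e 'compacted row maximum size'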
  • How can I make sure that nothing is actually happening?
  • Is it recommended to disable compaction above a certain data size? (I believe we have about 25 GB on each node.)
  • Can I stop this compaction? nodetool stop COMPACTION doesn't seem to work (see the sketch after this list).
  • Is stopping the compaction dangerous?
  • Is killing the Cassandra process dangerous while it is compacting? (I did run nodetool drain on one node first.)
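For reference, this is roughly what I have to work with (a sketch; <pid> is the Cassandra process id, and disableautocompaction only exists on newer nodetool versions):

    # is anything really running?
    nodetool compactionstats                       # active/pending compactions
    nodetool tpstats                               # CompactionExecutor active/pending counts
    jstack <pid> | grep -A 15 CompactionExecutor   # what the compaction threads are doing

    # attempts to stop or slow it down
    nodetool stop COMPACTION                       # ask Cassandra to abort active compactions
    nodetool setcompactionthroughput 1             # throttle hard to 1 MB/s (0 means unthrottled!)
    nodetool disableautocompaction mykeyspace mycolumnfamily   # 2.1+: stop scheduling minor compactions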

Any other remarks? Thanks a lot in advance!

This is the output of nodetool compactionhistory, grepped for the keyspace that seems stuck (columns: id, keyspace, column family, compacted_at, bytes in, bytes out, rows merged).

4e48f940-86c6-11e5-96be-dd3c9e46ec74     mykeyspace            mycolumnfamily             1447062197972             52321301       16743606       {1:2, 4:248}
94acec50-86c8-11e5-96be-dd3c9e46ec74     mykeyspace            mycolumnfamily             1447063175061             48992375       13420862       {3:3, 4:245}
3210c9b0-8707-11e5-96be-dd3c9e46ec74     mykeyspace            mycolumnfamily             1447090067915             52763216       17732003       {1:2, 4:248}
24f96fe0-86ce-11e5-96be-dd3c9e46ec74     mykeyspace            mycolumnfamily             1447065564638             44909171       17029440       {1:2, 3:39, 4:209}
06d58370-86ef-11e5-96be-dd3c9e46ec74     mykeyspace            mycolumnfamily             1447079687463             53570365       17873962       {1:2, 3:2, 4:246}
f7aa5fa0-86c7-11e5-96be-dd3c9e46ec74     mykeyspace            mycolumnfamily             1447062911642             47701016       13291915       {3:2, 4:246}
806a4380-86f7-11e5-96be-dd3c9e46ec74     mykeyspace            mycolumnfamily             1447083327416             52644411       17363023       {1:2, 2:1, 4:247}
c845b900-86c5-11e5-96be-dd3c9e46ec74     mykeyspace            mycolumnfamily             1447061973136             48944530       16698191       {1:2, 3:6, 4:242}
bb44a0b0-8718-11e5-96be-dd3c9e46ec74     mykeyspace            mycolumnfamily             1447097599547             48768463       13518523       {2:2, 3:5, 4:242}
f2c17ea0-86c3-11e5-96be-dd3c9e46ec74     mykeyspace            mycolumnfamily             1447061185418             90367799       13904914       {5:4, 6:7, 7:52, 8:185}
1aae6590-86ce-11e5-96be-dd3c9e46ec74     mykeyspace            mycolumnfamily             1447065547369             53190698       17228121       {1:2, 4:248}
d7ca8d00-86d5-11e5-96be-dd3c9e46ec74     mykeyspace            mycolumnfamily             1447068871120             52422499       16995963       {1:2, 3:3, 4:245}
6e890290-86df-11e5-96be-dd3c9e46ec74     mykeyspace            mycolumnfamily             1447072989497             45218168       17174468       {1:2, 3:21, 4:227}
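As a sanity check on those lines (saved to history.txt; hypothetical file name), summing the bytes-in and bytes-out columns shows these compactions did complete and shrink the data; a sketch:

    # columns: id ks cf compacted_at bytes_in bytes_out rows_merged
    awk '{bi += $5; bo += $6}
         END {printf "bytes_in=%d bytes_out=%d reduction=%.1f%%\n", bi, bo, 100*(1 - bo/bi)}' history.txt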

I also frequently see lines like this in system.log:

WARN [Native-Transport-Requests:11935] 2015-11-09 20:10:41,886 BatchStatement.java (line 223) Batch of prepared statements for [billing.usage_record_by_billing_period, billing.metric] is of size 53086, exceeding specified threshold of 5120 by 47966.
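The 5120 in that warning is a size in bytes, coming from this cassandra.yaml setting (a sketch of the default; raising it only silences the warning, it does not remove the load on the coordinator):

    # cassandra.yaml
    batch_size_warn_threshold_in_kb: 5   # 5 KB = the 5120-byte threshold in the warning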
  • about your last warning: it means your batch is too large (53086 bytes, versus the 5120-byte threshold). This puts load on the coordinator node. Instead, it's recommended to use small batches grouped by the same partition key, or even no batch at all and let a token-aware driver do the work (see the CQL sketch after these comments). – treehouse Nov 09 '15 at 20:45
  • for the compaction question I suggest you ask on the mailing list: mailto:user-subscribe@cassandra.apache.org – treehouse Nov 09 '15 at 20:47
  • In the node's system.log, do you find any hint of compaction activity? – Adpi2 Nov 10 '15 at 11:47
  • I see very few messages about compaction. I had a few messages about large rows (the biggest was 230 MB, IIRC), and between those there are literally hours. I ran strace and it showed "resource unavailable" and some segmentation faults. The "resource unavailable" is not due to ulimit settings; maybe it is I/O that is taking a while. – th3penguinwhisperer Nov 10 '15 at 13:56
  • @th3penguinwhisperer did you find a solution for this? nodetool stop -- COMPACTION didn't help. – Jigar Shah Jul 31 '17 at 14:05
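A minimal sketch of the batching treehouse suggests; the column names are made up, since the real schema isn't shown:

    -- one small batch per partition: every statement shares the same partition key,
    -- so the batch stays on one replica set instead of fanning out from the coordinator
    BEGIN UNLOGGED BATCH
      INSERT INTO billing.usage_record_by_billing_period (billing_period, record_id, amount)
      VALUES ('2015-11', 'r1', 10.0);
      INSERT INTO billing.usage_record_by_billing_period (billing_period, record_id, amount)
      VALUES ('2015-11', 'r2', 12.5);
    APPLY BATCH;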
