
I wish to delete a large number of rows from a particular table.

I did the following steps:

  1. Set gc_grace_seconds = 0 for the table
  2. Deleted a large number of rows (~1 million)
  3. Ran ./nodetool compact keyspace_name table_name

However, when I ran nodetool compact (step 3), nothing happened: no compaction started. Because of the large number of tombstones, most of my requests now time out as well.

The table has the following settings:

AND bloom_filter_fp_chance = 0.001
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'tombstone_threshold': '0.2', 'tombstone_compaction_interval': '86400', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 0
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

I wish to compact the table and purge the tombstones so that the unwanted data is actually removed.

I have two nodes in my cluster with replication factor 2. Since I did the deletes, the difference in size between the two nodes has increased; it is now about 700MB. I am using dsc-cassandra-2.1.10.

The cfstats output is shown below:

Keyspace: keyspace1
        Read Count: 16316
        Read Latency: 12.23892982348615 ms.
        Write Count: 11078808
        Write Latency: 0.6955001765532899 ms.
        Pending Flushes: 0
                Table: table1
                SSTable count: 92
                SSTables in each level: [1, 4, 38, 49, 0, 0, 0, 0, 0]
                Space used (live): 38247164244
                Space used (total): 38247164244
                Space used by snapshots (total): 26692664189
                Off heap memory used (total): 14695952
                SSTable Compression Ratio: 0.32499125289530584
                Number of keys (estimate): 2788
                Memtable cell count: 16632
                Memtable data size: 1839846
                Memtable off heap memory used: 0
                Memtable switch count: 93
                Local read count: 16316
                Local read latency: 12.239 ms
                Local write count: 11078808
                Local write latency: 0.696 ms
                Pending flushes: 0
                Bloom filter false positives: 331
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 10960
                Bloom filter off heap memory used: 10224
                Index summary off heap memory used: 3672
                Compression metadata off heap memory used: 14682056
                Compacted partition minimum bytes: 216
                Compacted partition maximum bytes: 3449259151
                Compacted partition mean bytes: 25823653
                Average live cells per slice (last five minutes): 405.3014160485502
                Maximum live cells per slice (last five minutes): 5002.0
                Average tombstones per slice (last five minutes): 0.0
                Maximum tombstones per slice (last five minutes): 0.0

1 Answer

The compaction strategy dictates the behavior of nodetool compact, and there are subtle differences in the API between versions:

http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsCompact.html vs https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCompact.html

To remove the data and tombstones:

  1. switch the compaction strategy to SizeTieredCompactionStrategy
  2. run a major compaction that will generate one sstable (that will not hold tombstones / data covered by tombstones)
  3. switch the compaction strategy back to LeveledCompactionStrategy
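
The steps above could be scripted roughly as follows. This is only a sketch under a few assumptions: `keyspace_name` and `table_name` are the placeholders from the question, `cqlsh` and `nodetool` are on the PATH and connect to the local node, and the installed `cqlsh` supports the `-e` option. The helper names `run` and `purge_tombstones` are made up for illustration. By default the script only prints the commands it would run; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch of the three steps: switch to size-tiered, major compaction, switch back.
# Dry run by default: commands are printed, not executed.
KEYSPACE="${KEYSPACE:-keyspace_name}"   # placeholder from the question
TABLE="${TABLE:-table_name}"            # placeholder from the question
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

purge_tombstones() {
    # 1. Switch the table to size-tiered compaction (a schema change).
    run cqlsh -e "ALTER TABLE $KEYSPACE.$TABLE WITH compaction = {'class': 'SizeTieredCompactionStrategy'};"

    # 2. Major compaction of the table on the local node.
    run nodetool compact "$KEYSPACE" "$TABLE"

    # 3. Restore leveled compaction with the tombstone options from the question.
    run cqlsh -e "ALTER TABLE $KEYSPACE.$TABLE WITH compaction = {'class': 'LeveledCompactionStrategy', 'tombstone_threshold': '0.2', 'tombstone_compaction_interval': '86400'};"
}

purge_tombstones
```

Note that `nodetool compact` only acts on the node it is run against, so on a two-node cluster step 2 would presumably need to be run on each node before switching back in step 3.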

Executing a major compaction and switching between compaction strategies are both I/O-intensive operations; please take that into account.

Shlomi Livne