In Cassandra, data isn’t deleted the same way it is in an RDBMS. Cassandra is designed for high write throughput and avoids read-before-write operations. It stores data in sstables, which are immutable once written. So a delete is actually an update, and updates are actually inserts (into new sstables). A “tombstone” marker is written to indicate that the data is now (logically) deleted. (See here for a good writeup about tombstones from The Last Pickle.)
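To make that concrete, here is a minimal sketch (the keyspace and table names are hypothetical, and it assumes a running node with cqlsh and nodetool on the path):

```shell
# A DELETE doesn't remove data in place; it writes a tombstone
# into the memtable that shadows any older value for that key.
cqlsh -e "DELETE FROM my_ks.my_table WHERE id = 42;"

# Flush the memtable to disk; the tombstone now lives in a brand-new,
# immutable sstable alongside the older sstables that still hold the data.
nodetool flush my_ks my_table
```

The sstabledump tool that ships with Cassandra/DSE can then be pointed at the new sstable to see the tombstone marker itself.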
Heavy deletes lead not only to extra disk space usage, but also to degraded read performance, since reads may have to scan through all those tombstones to find any “live” data. Over time, tombstones get cleaned up during compaction (the process that merges sstables and purges out-of-date records). However, how quickly this happens depends on many things – the compaction strategy, sstable sizes, gc_grace_seconds, whether the relevant rows sit in one sstable or are spread out over multiple sstables, etc.
People often ask how they can clean up unwanted tombstones, especially once tombstone warnings or errors start showing up in their logs. We can tune compaction to be more aggressive (using compaction subproperties), then sit back and wait. In some cases people resort to a forced major compaction, but this can lead to issues down the road (one big sstable that won’t get compacted again for some time), so it might also require using sstablesplit to break up the large resulting sstable. Another hack, workable for smaller tables, is to alter the table to use a different compaction strategy, then alter it back – forcing all sstables to be rewritten.
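The options above can be sketched roughly as follows (keyspace/table names and the threshold values are illustrative, not recommendations – tune for your own workload):

```shell
# 1. Tune compaction subproperties to purge tombstones more aggressively:
cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'tombstone_threshold': '0.2',
  'unchecked_tombstone_compaction': 'true'};"

# 2. Force a major compaction (beware: may leave one huge sstable) ...
nodetool compact my_ks my_table
# ... which sstablesplit can break back up, but only with the node stopped:
# sstablesplit -s 50 /var/lib/cassandra/data/my_ks/my_table-*/  # size in MB

# 3. Switch compaction strategy and back, forcing every sstable
#    to be rewritten (reasonable only for smaller tables):
cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction = {'class': 'LeveledCompactionStrategy'};"
cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction = {'class': 'SizeTieredCompactionStrategy'};"
```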
Recently, though, it was pointed out to me that DSE 5.1 (and Apache Cassandra 3.10) offer a new nodetool command, ‘nodetool garbagecollect’, introduced in CASSANDRA-7019. It was added for exactly this situation, and is designed to clean up a table’s droppable tombstones.
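A minimal invocation looks like this (again with a hypothetical keyspace and table):

```shell
# Rewrite the table's sstables, dropping tombstones and the data
# they shadow, without a full major compaction.
nodetool garbagecollect my_ks my_table

# The -g flag controls granularity: ROW (the default, cheaper) removes
# deleted rows; CELL (more thorough, more expensive) also removes
# overwritten or expired cells. -j sets how many sstables are
# rewritten concurrently.
nodetool garbagecollect -g CELL -j 2 my_ks my_table
```

With no keyspace/table arguments it runs over all tables, so scoping it to the affected table is usually the gentler choice.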
This looks to be a very useful update. I haven’t played with it much yet – if you have any good or bad experiences with it, I’d love to hear them!