I’ve a feeling we’re not in Kansas anymore – backups in Cassandra

(The sarcastic image above is from my older blog, Oracle2MySQL.)

In the spirit of my other blog, Oracle2MySQL, I was trying to think of things that caught me by surprise when I was exploring Cassandra. One thing that really got me was the backup mechanism.

I read about taking backups using snapshots. Sounded good so far; I’ve used LVM snapshots to back up databases, although they can cause a drag on performance while you copy them off.

But wait: I go to the next page, and it says the snapshot is taken by creating hard links to the data files. This is where I experienced an impedance mismatch. I figured something was wrong with the documentation, because it made no sense. A hard link doesn’t copy anything; it’s just another name for the existing file. How is that a point-in-time, consistent snapshot of the database?

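It took me a bit to shake myself out of it and remember the important fact: SSTables are immutable. Once a data file is written it never changes, so all a snapshot needs is a way to record which files exist at that point in time and to prevent their deletion. A hard link does both, because the file’s contents stay on disk until every link to it is removed. That was when I said to myself, “Now I… I know we’re not in Kansas.”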
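
To convince myself, here is a minimal sketch of the idea in Python. It is not how Cassandra itself implements snapshots, just the same trick on a toy directory: the file name `table-1-Data.db`, the directory layout, and the `snapshot` and `demo` helpers are all made up for illustration. It hard-links every file in a data directory into a snapshot directory, then deletes the original (roughly what happens to an old SSTable after compaction) and shows the data is still readable through the link.

```python
import os
import tempfile


def snapshot(data_dir, snapshot_dir):
    """Hard-link every regular file in data_dir into snapshot_dir.

    Returns the sorted list of file names that were linked.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):
            # No data is copied; this only adds a second name for the file.
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked


def demo():
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        snap_dir = os.path.join(root, "snapshots", "snap1")
        os.makedirs(data_dir)

        # Hypothetical immutable data file standing in for an SSTable.
        sstable = os.path.join(data_dir, "table-1-Data.db")
        with open(sstable, "w") as f:
            f.write("row1,row2")

        linked = snapshot(data_dir, snap_dir)

        # Simulate the database later deleting the original file.
        os.remove(sstable)

        # The snapshot link still points at the same, unchanged contents.
        with open(os.path.join(snap_dir, "table-1-Data.db")) as f:
            survived = f.read()
        return linked, survived


if __name__ == "__main__":
    print(demo())
```

The catch, of course, is that this only works because nothing ever rewrites the file in place. Try the same thing against a database that updates its data files and the "snapshot" would change right along with the original.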