I was recently working on a database migration project and needed to distribute some database snapshots, generated using xtrabackup, to a number of replication slaves in our VM cluster (dare I use the buzzword: cloud).

The size of the snapshot was approx. 110GB, which is fairly small, but I needed to reduce it as much as possible to speed up the transfer process, as well as to reduce my network bandwidth consumption.

Initially, I created a .tar archive, and then compressed the tar archive using gzip with its slowest/best compression level (-9).

tar -cf db.snapshot.2016-03-31.tar db.snapshot.2016-03-31
gzip -9 db.snapshot.2016-03-31.tar
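As a side note, the same result can be produced in a single pass by streaming tar straight into gzip, which skips writing the uncompressed .tar to disk first. A minimal sketch, assuming the same directory name as above:

# create the archive and pipe it directly into gzip -9, avoiding the intermediate .tar file
tar -cf - db.snapshot.2016-03-31 | gzip -9 > db.snapshot.2016-03-31.tar.gz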

This reduced the size of the snapshot from approx. 110GB down to approx. 22GB, which greatly reduced network traffic overhead. However, I wasn't satisfied, and knew I'd get better results if I used bzip2 instead.

tar -cf db.snapshot.2016-03-31.tar db.snapshot.2016-03-31
bzip2 -9 db.snapshot.2016-03-31.tar
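The same single-pass approach works with bzip2 as well; a minimal sketch, again assuming the directory name from my example:

# stream the archive straight into bzip2 -9 instead of compressing a separate .tar file
tar -cf - db.snapshot.2016-03-31 | bzip2 -9 > db.snapshot.2016-03-31.tar.bz2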

Sure enough, using bzip2 instead of gzip on the same snapshot archive, with the same maximum-compression setting (-9), reduced it even further, down to approx. 17GB (roughly a 6.5:1 ratio versus about 5:1 with gzip), which is about the best I could hope to achieve.

While it did take quite some time to compress the archive, it greatly reduced the time required to distribute the snapshot to all of the database replication slaves in the cluster. In the end, there seem to be plenty of people who favor bzip2 over gzip, and vice versa, much like the whole emacs vs. vim crowd, but I think both programs are perfectly acceptable, and I don't really care to debate compression algorithms with people.
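For what it's worth, the distribution step itself can be as simple as copying the .tar.bz2 to each slave and unpacking it there. A minimal sketch, where the hostname and target path are purely hypothetical placeholders:

# copy the compressed snapshot to a replication slave (hostname and path are hypothetical)
scp db.snapshot.2016-03-31.tar.bz2 db-slave-01:/var/lib/mysql-snapshots/

# then, on the slave, extract it (GNU tar's -j filters the archive through bzip2)
tar -xjf /var/lib/mysql-snapshots/db.snapshot.2016-03-31.tar.bz2 -C /var/lib/mysql-snapshots/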
