One-off system backup methods
December 1, 2009
Before installing a new system it is smart to back up important personal data. There are various ways to do this.
Yesterday I installed Fedora 12 on my laptop. I prefer complete reinstalls to distribution upgrades. I have the (false?) impression that a dist-upgrade leaves some stuff lingering around. A reinstall is also a good moment to make new choices about which packages to install and to weed out what I don't need.
Before an installation I back up several directories, mostly /etc/ for some configuration information and the home directories. I do these backups to another machine on the same network (100 Mbit, switched).
scp
The scp program (secure copy) is part of OpenSSH and is a secure replacement for rcp. It has an option (-r) to copy a directory tree recursively. While it is great for copying single files, it has a few drawbacks when used to back up a whole home directory:
- scp follows symbolic links. If there are symbolic links to things you don't want to back up, you might end up with a lot more data than you wanted
- it transfers files one at a time instead of streaming the whole tree in large blocks, which is slow when there are many small files
tar with ssh
Traditionally tar (tape archiver) has been used on Unix and Linux systems for backups. Since tar can write to standard output and read from standard input, it is trivial to tunnel it through ssh:
tar cf - . | ssh email@example.com 'cd /backup && tar xf -'
The advantages of this approach: tar reads data quickly, does not follow symlinks by default, is really easy to use, and does not require any daemons apart from OpenSSH, which is already running by default on my machines.
The disadvantage is the overhead of ssh: I could not get the full 100 Mbit/sec out of my network.
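The symlink behaviour can be checked locally without a second machine. The sketch below runs the same tar pipeline with the ssh hop replaced by a local subshell; the temporary directories and file names are made up for the illustration.

```shell
# Illustration only: scratch directories stand in for the laptop and the backup host.
src=$(mktemp -d)
dst=$(mktemp -d)

# One regular file and one symbolic link pointing at it.
echo "some data" > "$src/real.txt"
ln -s real.txt "$src/link.txt"

# Same idea as the ssh pipeline: tar writes the archive to standard output,
# a second tar reads it from standard input and unpacks it elsewhere.
(cd "$src" && tar cf - .) | (cd "$dst" && tar xf -)

# link.txt should show up as a symlink in the copy, not as a second copy of the data.
ls -l "$dst"
```

Because the link is archived as a link, a symlink that points at a huge directory elsewhere does not drag that directory into the backup.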
rsync
Another popular tool for making backups is rsync. It too can be tunneled over ssh if you want, but you can also just run a daemon, connect to it and read or write (depending on the configuration). In archive mode it also detects whether files have changed and does not copy things that have already been copied. This means you can interrupt it and resume at another time without having to redo everything.
When restoring my data at 1 AM I decided that the easiest way to get it transferred was to temporarily forget about encryption (local switched network, no one else logged in, wireless disabled, etcetera) and just go for raw performance. I quickly set up a very minimalistic rsync configuration (copied from the rsync man pages) just for the backups and transferred 13 GB within about 30 minutes, with a full 100 Mbit/sec for most of the time.
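A minimal daemon setup along these lines might look like the following sketch. The module name "backup", the path /backup, the port 8873 and the host name "backuphost" are placeholder assumptions, not the exact configuration used here; the parameters themselves are standard rsyncd.conf settings.

```shell
# Sketch only: write a throwaway rsync daemon config with one writable module.
# Module name, path and port are assumptions made for this example.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# No encryption and no authentication: trusted local network only.
use chroot = no

[backup]
    path = /backup
    comment = one-off backup area
    read only = no
EOF

# On the machine holding the backup: run the daemon in the foreground
# on an unprivileged port, using the config written above.
echo "rsync --daemon --no-detach --port=8873 --config=$conf"

# On the laptop: -a is archive mode, the double colon selects the daemon protocol.
echo "rsync -a --port=8873 backuphost::backup/home/ /home/"
```

The script only prints the two rsync commands instead of running them, so they can be reviewed first. Stop the daemon once the transfer is done, since anyone on the network can write to the module while it runs.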
Update: one reader commented on IRC that I cannot do proper math. He is right, since I am no mathematician. My intention with this article was just to say that the overhead from SSH is noticeable and that if you want to do quick data transfers at 1:30 AM (and you really want to go to bed) it is faster to do it with rsync.
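For the curious, the arithmetic in question can be redone in a few lines. This sketch assumes decimal units (1 GB = 10^9 bytes) and exactly 30 minutes; both are assumptions, since the figures above are approximate.

```shell
# Rough check of the transfer figures; units and duration are assumptions.
gb=13          # data transferred, in GB (10^9 bytes assumed)
minutes=30     # transfer time
link_mbit=100  # nominal link speed in Mbit/sec

# Average rate actually achieved, in Mbit/sec.
avg_mbit=$(awk -v gb="$gb" -v min="$minutes" \
    'BEGIN { printf "%.1f", gb * 1e9 * 8 / (min * 60) / 1e6 }')

# Time the same transfer would take at the full nominal link speed, in minutes.
ideal_min=$(awk -v gb="$gb" -v link="$link_mbit" \
    'BEGIN { printf "%.1f", gb * 1e9 * 8 / (link * 1e6) / 60 }')

echo "average rate: $avg_mbit Mbit/sec"
echo "time at $link_mbit Mbit/sec: $ideal_min minutes"
```

Under those assumptions the average comes out at about 58 Mbit/sec, and a link saturated at 100 Mbit/sec would have needed a little over 17 minutes.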