One off system backup methods

Armijn Hemel, December 1, 2009, 6024 views.

Before a new system install it is smart to back up important personal data. There are various ways to do this.


Yesterday I installed Fedora 12 on my laptop. I prefer a complete reinstall to a distribution upgrade: I have the (false?) impression that a dist-upgrade leaves some stuff lingering around. A reinstall is also a good moment to make new choices about which packages to install and to weed out what I don't need.

Before an installation I back up several directories, mostly /etc/ for some configuration information and the home directories. I make these backups to another machine on the same network (100 Mbit, switched).

scp

The scp program (secure copy) is part of OpenSSH and is a secure replacement for rcp. It has an option (-r) to recursively walk through a tree. While it's great for copying single files, there are a few drawbacks to using it to back up a whole home directory:

  • scp follows symbolic links. If there are symbolic links to things you don't want to back up, you might end up with a lot more data than you wanted
  • it handles files one at a time rather than streaming the tree in large blocks, which adds overhead when there are many small files
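To get a feeling for how much extra data the first drawback could pull in, it helps to list the symbolic links in a tree before copying it. The following is a minimal sketch that builds its own throwaway tree; the directory layout and the variable names are made up for illustration.

```shell
#!/bin/sh
# Build a small throwaway tree with one symlink pointing outside it.
tree=$(mktemp -d)
mkdir "$tree/docs"
echo "notes" > "$tree/docs/notes.txt"
ln -s /usr "$tree/usr-link"

# List every symlink with its target; scp -r would copy what each one points at.
find "$tree" -type l -exec ls -l {} \;

# Count them, so a script can decide whether scp is a sensible choice.
link_count=$(find "$tree" -type l | wc -l | tr -d ' ')
echo "symlinks found: $link_count"
```

On a real home directory the same find command, run from the directory you intend to copy, shows which links deserve a closer look.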

tar with ssh

Traditionally tar (tape archiver) has been used on Unix and Linux systems for backups. Since tar can read from standard input and write to standard output, it is trivial to tunnel it through ssh:

tar cf - . | ssh example@example.com 'cd /backup && tar xf -'

The advantages of this approach: tar reads things in pretty quickly, does not follow symlinks by default, is really easy to use, and does not require any daemons apart from OpenSSH, which is already running by default on my machines.
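The symlink behaviour can be checked without a second machine. This sketch runs the same kind of pipeline with a local subshell standing in for ssh; the temporary directories and file names are assumptions made up for the test.

```shell
#!/bin/sh
# Build a throwaway source tree with one regular file and one symlink.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "data" > "$src/file.txt"
ln -s /etc "$src/link-to-etc"

# Same idea as the ssh pipeline, but the receiving tar runs locally.
(cd "$src" && tar cf - .) | (cd "$dst" && tar xf -)

# The link should arrive as a link, not as a copy of /etc.
if [ -L "$dst/link-to-etc" ]; then
    echo "symlink preserved"
else
    echo "symlink was followed"
fi
```

If you do want tar to archive what the links point at, the -h option asks for that explicitly.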

The disadvantage is the overhead for ssh. I could not get the full 100 Mbit/sec on my network.

rsync

Another popular tool for making backups is rsync. It too can be tunneled over ssh if you want, but you can also just run a daemon, connect to it and read/write (depending on the configuration). In archive mode it also checks whether files have changed and does not copy things that have been copied before. This means you can interrupt it and resume at another time without having to redo everything.

When putting a backup of my data back at 1 AM I decided that the easiest way to get my data transferred was to temporarily forget about encryption (local switched network, no one else logged in, wireless disabled, etcetera) and just go for raw performance. I quickly set up a very minimalistic rsync configuration (copied from the rsync manpages) just for the backups and transferred 13 GB within about 30 minutes, with a full 100 Mbit/sec for most of the time.
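A configuration along these lines might look like the sketch below. It is not the exact file I used: the module name backup, the path /backup, the port 8873 and the host name backuphost are all placeholders. The script only writes the configuration file; the commands to start the daemon and to pull the data are shown as comments.

```shell
#!/bin/sh
# Write a throwaway daemon configuration (all names here are illustrative).
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Listen on an unprivileged port so the daemon does not need root.
port = 8873
use chroot = false

[backup]
    path = /backup
    comment = temporary backup area
    read only = false
EOF

# On the machine holding the backup, start the daemon with this file:
#   rsync --daemon --config=/path/to/this/file
# On the freshly installed machine, pull the data back (unencrypted!):
#   rsync -a --progress rsync://backuphost:8873/backup/home/ /home/
echo "wrote $conf"
```

Setting read only to false is what allows writing the backup to the module in the first place. Since nothing here is encrypted or authenticated, this only makes sense on a network you trust, and the daemon should be stopped afterwards.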

Update: One reader commented on IRC that I could not do proper math. He is right, since I am no mathematician. My intention with this article was just to say that the overhead from SSH is noticeable and that if you want to do quick data transfers at 1:30 AM (and you really want to go to bed) it is faster to do it with rsync.
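For those who want to check the numbers, here is a rough calculation. It assumes that 13 GB means 13 × 1000 MB and that the transfer took exactly 30 minutes, both of which are approximations.

```shell
#!/bin/sh
# Average rate for 13 GB in 30 minutes, in Mbit/s (assuming 1 GB = 1000 MB).
rate=$(awk 'BEGIN { printf "%.1f", 13 * 1000 * 8 / (30 * 60) }')

# Minutes needed for 13 GB at a sustained 100 Mbit/s.
minutes=$(awk 'BEGIN { printf "%.1f", 13 * 1000 * 8 / 100 / 60 }')

echo "average rate: $rate Mbit/s"
echo "time at line rate: $minutes minutes"
```

With these assumptions the average works out to roughly 58 Mbit/sec, and a transfer at a sustained 100 Mbit/sec would have finished in about 17 minutes.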


Talkback


Re: One off system backup methods (Jan van Haarst, 2010-01-07 09:46 CET)
If you want to speed up your backup with rsync, you can use two options:
1) Use '--compress-level=1'. This will compress your transferred data. I use level 1 because higher levels take too much CPU and thus slow things down.
2) Use '-e "ssh -c arcfour"' or '-e "ssh -c blowfish"'. This will change the cipher for the ssh link from slow aes to the faster arcfour or blowfish. Arcfour is insecure, but for local wired LAN syncs that doesn't matter.