The Lab Book Pages

An online collection of electronics information

http://www.labbookpages.co.uk

Dr. Andrew Greensted
Last modified: 18th November 2009


Backups

There's no need to go on about the importance of backing up your files; plenty of people have done that already. Instead, here are a few things to help you out when backing up your machine.


Using tar and ssh for a Network Backup

Tar is a very handy tool for grouping files together into a single bundle. Despite its roots as a tape archiving tool, it is now almost ubiquitously used for archiving files on machines without a tape drive.
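
As a quick refresher, the commands below create an archive of the directory dir, list the archive's contents, and extract it again (dir and bundle.tar are just placeholder names):

> tar -cf bundle.tar dir
> tar -tf bundle.tar
> tar -xf bundle.tar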

Tar can be combined with ssh to back up files across a network. The commands below show a few ways of doing this. The first command archives the local directory dir and writes the resulting tar file to the remote machine bkMachine. The username bob is provided for the ssh login, and the path and filename of the remote tar file are set by the of= option of the dd command.

> tar -cf - dir | ssh bob@bkMachine "dd of=/some/where/dir.tar"
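
To restore from that backup, the pipe can simply be reversed: dd reads the remote tar file back out, and a local tar unpacks it (using the same placeholder names as above):

> ssh bob@bkMachine "dd if=/some/where/dir.tar" | tar -xf -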

The next command, rather than just creating a tar file, recreates the directory at the remote end.

> tar -cf - dir | ssh bob@bkMachine "tar -C /some/where -xf -"

It is also possible to compress the data sent over the network. The following command uses the -z flag of tar to apply gzip compression. Note that -z has to be included in both the creating and the extracting tar commands.

> tar -zcf - dir | ssh bob@bkMachine "tar -C /some/where -zxf -"

Faster compression and data transfer is possible by using different compression tools and settings. This article on comparing compression tools gives various ways of speeding up network backups and is worth a read. WARNING: lzop seems to have problems with large files.
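
As a sketch of that approach, the compression can be moved out of tar into an explicit pipe stage, which makes it easy to swap in a different tool or compression level; gzip at its fastest setting is used here purely as an illustration:

> tar -cf - dir | gzip -1 | ssh bob@bkMachine "gzip -d | tar -C /some/where -xf -"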


Using rsync for a Network Backup

rsync is an exceptionally handy tool for backing up files over a network. Its project web page gives a full description of what it can be used for. Below is just one way in which it can be used.

The following command will copy the directory dir and its contents to the remote machine bkmachine. Thus, once finished, the remote machine will have a directory /some/where/dir/ whose contents match those on the local machine. Note that trailing slashes affect what rsync does: if dir had a trailing slash, only the contents of the directory would be copied, as shown in the variant below.

> rsync -va --progress --stats dir bob@bkmachine:/some/where/
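
For comparison, this is the trailing slash variant; copying dir/ transfers only the directory's contents, so the destination directory has to be named explicitly to achieve the same result:

> rsync -va --progress --stats dir/ bob@bkmachine:/some/where/dir/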

The table below gives a brief description of the command line flags. Take a look at the man page for more details.

Switch      Description
-v          Verbose output
-a          Archive mode, an abbreviation for a bunch of switches that recurse and preserve as much as possible
--progress  Show file transfer progress
--stats     Show statistics after the command has finished
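
Before committing to a large transfer, it can be worth adding rsync's -n (dry run) switch, which lists what would be copied without actually transferring anything:

> rsync -van --progress --stats dir bob@bkmachine:/some/where/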

Splitting and recombining tar files

If you've created a backup in the form of a large tar file, you can use the split command to divide it into smaller chunks that will fit onto a CD or DVD. You can then use the cat command later to recombine the pieces into a tar file.

The example below splits a large tar file into a number of chunks for writing onto a DVD. The chunks will be called backup.tar.chunkXX, where XX identifies each chunk using letters: the first chunk is called backup.tar.chunkaa, the second backup.tar.chunkab, and so on. The alphabetical suffixes are designed for easy reconstruction, as you'll see next.

> split -b 500m backup.tar backup.tar.chunk

In fact, it's easy to create the tar chunks without having to create a complete tar file first. You can do this using a pipe, as shown below: tar writes the archive stream to stdout, which split reads to create the chunks. This is handy if you have limited disk space.

> tar -cf - /home/me | split -b 700m - backup.tar.chunk

Reconstruction can be done with the cat command. Below is an example.

> cat backup.tar.chunk* > backup.tar
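
If the chunks only need extracting, the intermediate tar file can be skipped altogether by piping cat straight into tar:

> cat backup.tar.chunk* | tar -xf -

It's also worth verifying a reconstructed archive; tar's -t switch reads the whole file and will complain if anything is corrupt:

> tar -tf backup.tar > /dev/null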

Incremental Backup Script

The script below can be used to perform incremental backups of a list of files and directories. A text file, defined by the SRC_FILE variable (in this case called srcFile), declares which directories and files should be backed up. These files should exist within the directory specified by SRC_DIR. BACKUP_DIR specifies where the backups should be placed.

File: ~/bin/backup
#!/bin/bash

# Source and destination (No trailing slash needed)
BACKUP_DIR="/files/backups"
SRC_DIR="/home/andy"
SRC_FILE="$BACKUP_DIR/srcFile"
LOG_FILE="$BACKUP_DIR/backup.log"
LOCK_FILE="$BACKUP_DIR/backup.lock"

# Test that required commands are in the path
type date rsync readlink &> /dev/null
if [ $? != "0" ]; then
   echo "Required commands (date, rsync, readlink) not found in path" >> $LOG_FILE
   exit 1
fi

RUNDATE=$(date +%Y-%m-%d\(%T\))

echo "$RUNDATE: Started Backup" >> $LOG_FILE

if [ ! -e $SRC_FILE ] ; then
   echo "$RUNDATE: The source list file '$SRC_FILE' does not exist" >> $LOG_FILE
   exit 1
fi

if [ ! -d $BACKUP_DIR ] ; then
   echo "$RUNDATE: The backup directory '$BACKUP_DIR' does not exist" >> $LOG_FILE
   exit 1
fi

# The set -C (noclobber) prevents existing files being overwritten by redirection
# The : > basically touches a file (and zero lengths it)
# If the file does not exist, it will be created
(set -C; : > $LOCK_FILE) 2> /dev/null
if [ $? != "0" ]; then
   # A lock file exists so exit with an error
   echo "$RUNDATE: Lock File exists, check: $LOCK_FILE" >> $LOG_FILE
   exit 1
fi

# Add a trap so that the lock file is removed on exit or a ctrl-C
trap 'rm $LOCK_FILE' EXIT

# -----------------------------------------------------------------------------------------
# Do the Incremental backup
# -----------------------------------------------------------------------------------------

# Create the backup paths (RUNDATE is reused so the directory name matches the log entries)
BK_DIR=$BACKUP_DIR/$RUNDATE
BK_NEW=$BACKUP_DIR/bkNew
BK_OLD=$BACKUP_DIR/bkOld

# If bkOld exists, delete the target directory and the bkOld link
if [ -e $BK_OLD ] ; then
   BK_OLD_DIR=$(readlink $BK_OLD)   # BK_OLD_DIR is the target of the BK_OLD link

   rm -rf $BK_OLD_DIR               # Remove old backup directory
   rm $BK_OLD                       # Remove the bkOld link
fi

# If bkNew exists, change link from bkNew to bkOld
if [ -e $BK_NEW ] ; then
   BK_NEW_DIR=$(readlink $BK_NEW)   # BK_NEW_DIR is the target of the BK_NEW link

   ln -s $BK_NEW_DIR $BK_OLD        # Link bkOld to the current new directory
   rm $BK_NEW                       # Remove the bkNew link
   cp -al $BK_NEW_DIR $BK_DIR       # Copy, hard-linking files, into the new directory
fi


# Now do the actual backup
rsync -ar --delete --files-from=$SRC_FILE $SRC_DIR $BK_DIR/
RSYNC_CODE=$?

# Check rsync success
if [ $RSYNC_CODE != "0" ]; then
   echo "$RUNDATE: Rsync returned a failure code: $RSYNC_CODE" >> $LOG_FILE
   exit 1
fi

# Link bkNew to the new backup directory
ln -s $BK_DIR $BK_NEW

echo "$RUNDATE: Backup Finished" >> $LOG_FILE

# All done
exit 0
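
The script needs to be executable before the shell (or cron, below) can run it. A quick way to test things is to run the script by hand and check the log file (the paths here match those set at the top of the script):

> chmod +x ~/bin/backup
> ~/bin/backup
> tail /files/backups/backup.log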

Below is an example srcFile, the file used by rsync to determine which files and directories need backing up.

File: srcFile
bin
code
documents
projects
.vimrc
.bashrc
.bash_profile
.inputrc
.dir_colors

To make sure that the script does get run, it's worth setting up cron to run it at regular intervals. The crontab file below will perform the backup once every half hour. Note the use of the nice command to reduce the script's impact on the system.

File: ~/.crontab
# Min   Hour  DayOfMonth  Month  DayOfWeek  Command
  0,30  *     *           *      *          nice -n 19 /home/andy/bin/backup
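
Assuming the crontab source is kept in a file like this, it can be installed with the crontab command:

> crontab ~/.crontab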
