
Backup – rsnapshot and rdiff (multiple backups)

This is a very basic/simple guide on how to set up incremental and versioned backups of your Linux computers and Mac. 🙂

Initial problem:

    • Time Machine becomes unreliable after a while, and when you put your Mac to sleep, most of the time it complains because the USB drive wasn’t disconnected properly :@
    • I’d like to have a local incremental/versioned backup system BUT also have some critical files uploaded to the cloud [using some cron job and some cloud provider’s utility]
    • Time Machine on external drives uses the ‘sparsebundle’ storage format, which is complicated to open and extract files from on the Linux command line [I’ve previously set up Time Machine on the pi, and I was thinking of creating some sort of system to open the sparsebundle file and upload the files during the night, but this doesn’t seem easy or really reliable]
    • Backing up VMs with Time Machine takes ages, because even if only a little bit changes, the whole file gets copied over (space and time consuming)

So… I needed something that could:

  • Do incremental backups storing only the differences (for VMs), to avoid transferring GBs of data every time for small changes
  • Do versioning of small files (documents, videos, music, etc…) based on a custom schedule
  • Be accessible on the filesystem without tricky stuff (like opening a ‘sparsebundle’ file)
  • Be able to run on a Raspberry Pi and, most likely, be able to access Linux and Mac systems, so I have a centralised backup system.

Answer: a combination of rsnapshot and rdiff-backup… plus some sort of cloud provider’s utility to sync part of this content to the cloud (still a work in progress).
I found this nice article that explains the differences between the two tools, and it should clarify why I’ve chosen to use a combination of both rather than just one.
The main bit is this one:

rdiff-backup stores previous versions as compressed deltas to the current version similar to a version control system. rsnapshot uses actual files and hardlinks to save space. For small files, storage size is similar. For large files that change often, such as logfiles, databases, etc., rdiff-backup requires significantly less space for a given number of versions.

So, I’ve installed rsnapshot and rdiff-backup on my pi. Both packages are available via apt-get.
After that, I created one rsnapshot configuration file for each of my Linux machines (actually pi’s) and one for my Mac. rdiff-backup is called from within rsnapshot, in a post-exec script (cmd_postexec, a very handy option).

It’s clearly necessary to have SSH enabled on your Linux and Mac machines (with key-based, passwordless login, since the backups run unattended from cron). Also, in this particular case, I added the following line via visudo on the Mac, to allow the user to run pmset without a password:

user ALL=(ALL) NOPASSWD: /usr/bin/pmset

Configuration files

I’m posting 2 configuration examples: one for my pi (local backup), and the other one for my Mac (remote backup via ssh/rsync).
I’ve kept the original /etc/rsnapshot.conf just as a reference; I’m not actively using it at all.

Here are my custom configuration files:

/etc/default/rsnapshot

This is a file I created to hold the “default/general” parameters, which I include in each of the other custom files. Why? Just to avoid copying and pasting the same lines into every custom file 🙂

#####################################
# Default configuration parameters  #
#####################################
# just use include_conf <tab> file:
#include_conf /etc/default/rsnapshot
config_version 1.2
no_create_root 1
cmd_cp /bin/cp
cmd_rm /bin/rm
cmd_rsync /usr/bin/rsync
cmd_ssh /usr/bin/ssh
cmd_logger /usr/bin/logger
cmd_du /usr/bin/du
du_args -csh
link_dest 1
use_lazy_deletes 1
rsync_numtries 3
#stop_on_stale_lockfile 0

Pi configuration file (local backup)

pi1_rsnap.conf

# pi1 conf file
include_conf /etc/default/rsnapshot
snapshot_root /USB/backups/pi1/
#retain hourly 6
retain daily 7
retain weekly 4
retain monthly 12
logfile /var/log/rsnapshot/p1.log
lockfile /USB/backups/rsnapshot_run/pi1.pid
#sync_first 1
verbose 2
loglevel 5
use_lazy_deletes 1
backup /home/ files/
backup /etc/ files/
backup /var/spool/cron/ files/
backup_script /usr/bin/dpkg --get-selections > packages.txt installed-packages/

This configuration copies /home, /etc and the cron spool (/var/spool/cron) into /USB/backups/pi1/daily.0/files/.
The last line runs the dpkg command and stores its output file (packages.txt) in /USB/backups/pi1/daily.0/installed-packages/


The Mac configuration (remote backup).

This requires some extras.
What I’ve done is wrap the rsnapshot backup in a pre-exec and a post-exec script, in order to do the following:

  1. wake up the Mac via wake-on-lan (etherwake); this is possible because my Mac is also connected via ethernet
  2. connect via ssh
  3. send commands to keep the Mac and its disks awake (pmset sleep/disksleep 0), so they don’t go idle
  4. visually notify that the backup is about to run (in case someone is currently using the Mac)
  5. run the rsnapshot backup
  6. once finished, run rdiff-backup for the big files (VMs)
  7. once done, restore the previous sleep/disksleep settings (or, with the commented-out pmset noidle variant, kill the process that was keeping the disks on)
  8. send a visual notification to say that the backup has completed
  9. disconnect. If no one is using it, the Mac will go back to standby (if enabled).
  10. clean up old rdiff-backups

mac_rsnap.conf

# mac conf file
include_conf /etc/default/rsnapshot
snapshot_root /USB/backups/mac/
#retain hourly 6
#retain daily 7
retain weekly 4
retain monthly 12
logfile /var/log/rsnapshot/mac.log
lockfile /USB/backups/rsnapshot_run/mac.pid

#rsync_short_args -a
rsync_long_args --delete --numeric-ids --relative --delete-excluded --filter=". /etc/rsnapshot_configs/mac/rsync_selections"

#sync_first 1
verbose 1
loglevel 5
use_lazy_deletes 1

# Specify the path to a script (and any optional arguments) to run right
# before rsnapshot syncs files
cmd_preexec /etc/rsnapshot_configs/mac/pre-exec.sh

# Specify the path to a script (and any optional arguments) to run right
# after rsnapshot syncs files
cmd_postexec /etc/rsnapshot_configs/mac/rdiff_vms.sh

#Remote backup
backup user@mac:/ files/

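The following bash scripts have some parameters that need to be set manually: in pre-exec.sh the email address, the sendmail path, the Mac’s MAC address, the user, the host and the initial wait time (waitBeforeTry); in rdiff_vms.sh the user and the host.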

pre-exec.sh

#!/bin/bash

# --------------------------------------------- #
# This script wake up the mac box via ethernet
# using wake-on-lan, wait for ssh connection,
# connects and issue a command to keep the
# disks on for the following backup tasks.
#
# There is a timeout for number of tries. If
# reached, an email notification will be sent.
# --------------------------------------------- #

# Email parameters
EMAIL="[email protected]"
SENDMAIL=/usr/sbin/sendmail

# MAC details
MACADDR="xx:xx:xx:xx:xx:xx"
USER=user
HOST=mac

# Estimated amount of time to get ssh available
waitBeforeTry=40

# Retries parameters
sleepSecInterval=5
maxConnectionAttempts=10

# --------------------------------------------- #
emailnotification () {
echo -e "Subject:$1\n" | $SENDMAIL "$EMAIL"
logger "${BASH_SOURCE[0]} PID $$ - $1"
}

# Turn on your mac via Ethernet LAN
sudo /usr/sbin/etherwake $MACADDR

sleep $waitBeforeTry

index=1
while (( $index <= $maxConnectionAttempts ))
do
echo quit | telnet $HOST 22 2>/dev/null | grep -q Connected
if [ $? -ne 0 ] ; then
sleep $sleepSecInterval
((index+=1)) #; echo "DEBUG: $index"
else
break
fi
done

# Notify and abort if all attempts failed (the loop leaves index > maxConnectionAttempts)
MSG="Unable to connect to $USER@$HOST after $maxConnectionAttempts attempts."
if [ $index -gt $maxConnectionAttempts ] ; then
emailnotification "$MSG"
exit 1
fi

# Connect via ssh and disable sleep and disksleep
ssh $USER@$HOST 'sudo pmset sleep 0'
ssh $USER@$HOST 'sudo pmset disksleep 0'
#ssh $USER@$HOST 'nohup pmset noidle > /dev/null 2>&1 &'
ssh $USER@$HOST ' osascript -e '"'"'display notification "Starting Backup in few seconds" with title "Backup starts" sound name "default" '"'"' '

sleep 5
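If telnet isn’t installed on the pi (newer images often don’t ship it), the SSH wait loop above could be replaced by bash’s own /dev/tcp redirection. This is a sketch of an alternative, not what the scripts above use: wait_for_port is a hypothetical name, and it assumes bash plus GNU timeout (coreutils) to cap each attempt, since connecting to a sleeping host can otherwise hang for a long time.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to the telnet-based wait loop in pre-exec.sh.
# Assumes bash (for /dev/tcp) and GNU coreutils' `timeout`.

# wait_for_port HOST PORT ATTEMPTS INTERVAL_SECONDS
# Returns 0 as soon as a TCP connection to HOST:PORT succeeds,
# 1 if every attempt failed.
wait_for_port() {
  local host="$1" port="$2" attempts="$3" interval="$4" i
  for ((i = 1; i <= attempts; i++)); do
    # The child bash opens /dev/tcp/HOST/PORT, which only succeeds if
    # something is listening; `timeout 5` stops a hung connect attempt.
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
      return 0
    fi
    # Don't sleep after the last failed attempt.
    if ((i < attempts)); then
      sleep "$interval"
    fi
  done
  return 1
}

# Usage sketch with the variables defined in pre-exec.sh:
#   if ! wait_for_port "$HOST" 22 "$maxConnectionAttempts" "$sleepSecInterval"; then
#     emailnotification "Unable to connect to $USER@$HOST"
#     exit 1
#   fi
```

Returning a status instead of printing keeps the decision (notify and abort) in the calling script.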

rdiff_vms.sh

#!/bin/bash

# Script executed after rsnapshot
USER=user
HOST=mac

# ===================================================
rdiff-backup --exclude-symbolic-links $USER@$HOST::/Users/user/Documents/VMs/ /USB/backups/mac/VMs/

# All files should now be backed up

# Re-setting previous values for sleep and disksleep... and notify
ssh $USER@$HOST 'sudo pmset sleep 10'
ssh $USER@$HOST 'sudo pmset disksleep 10'
#ssh $USER@$HOST 'pkill pmset noidle'
ssh $USER@$HOST ' osascript -e '"'"'display notification "Backup has now completed." with title "Backup Finished" sound name "default" '"'"' '

# Putting the box to sleep - NOT REQUIRED
# sleep will happen automatically, with no risk of forcing sleep while I'm using it
#ssh $USER@$HOST 'sudo pmset sleepnow'

# Cleaning up old backups: remove backups older than 6 months
rdiff-backup --remove-older-than 6M --force /USB/backups/mac/VMs/

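The following file is used as a ‘filter‘ for rsync (via --filter in rsync_long_args), so it uses rsync’s filter-rule syntax. rsync checks the rules in order and the first matching rule wins, which is why the includes come before the broad excludes at the end.
To clarify, this backs up the Documents, Pictures, Movies and Music folders ONLY of the user called ‘user‘, excluding the ‘VMs‘ subfolder in Documents, all folders in Movies whose names start with ‘Season‘, any other folders in the ‘user’ home dir, everything else outside /Users/user, and, in ANY subfolder, any file/folder starting with .Spotlight or .Trash and any .DS_Store file.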

rsync_selections

+ Users/
+ Users/user/
+ Users/user/Documents/
+ Users/user/Pictures/
+ Users/user/Movies/
+ Users/user/Music/
- .Spotlight*
- .Trash*
- .DS_Store
- Users/user/Documents/VMs/
- Users/user/Movies/Season*/
- Users/user/*
- Users/*
- /*

/etc/cron.d/rsnapshot
This is the cron file that runs the backup jobs.
The ‘less frequent’ job needs to run before the ‘most frequent’ one. I explain this later in this post, but in short: the actual sync happens ONLY in the most frequent job, while the others are just rotations done with a ‘mv’ command. So it’s important to do the rotation BEFORE the sync.

###############
# >>> MAC <<< #
###############
# set to run only weekly at 10:30 am on Monday
30 10 * * 1 user /usr/bin/rsnapshot -c /etc/rsnapshot_configs/mac/mac_rsnap.conf weekly
# Monthly rotation at 10:00 am (1st every month)
0 10 1 * * user /usr/bin/rsnapshot -c /etc/rsnapshot_configs/mac/mac_rsnap.conf monthly
###############
# >>> PI <<< #
###############
# Daily 9:30am
30 9 * * * root /usr/bin/rsnapshot -c /etc/rsnapshot_configs/pi1_rsnap.conf daily
# Weekly 9:05am (Sunday)
5 9 * * 7 root /usr/bin/rsnapshot -c /etc/rsnapshot_configs/pi1_rsnap.conf weekly
# Monthly 9:00am (1st every month)
0 9 1 * * root /usr/bin/rsnapshot -c /etc/rsnapshot_configs/pi1_rsnap.conf monthly

Folders created:

/USB/                               [mount point of my external USB drive]
/USB/backups/                       [subfolder to keep all the backups]
/USB/backups/pi1/                   [folder for 'pi1' box]
/USB/backups/mac/                   [folder for 'mac']
/etc/rsnapshot_configs/             [where I keep all the conf files]
/var/log/rsnapshot/                 [log files - chmod 1777*]
/USB/backups/rsnapshot_run/         [dir for jobs' pids - chmod 1777*]

*Use chmod 1777 on the log and run folders if you want users other than root to run the backups and write log files.
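Because /etc/default/rsnapshot sets no_create_root 1, rsnapshot won’t create a missing snapshot_root (handy when the USB drive isn’t mounted), so these folders have to exist before the first run. A minimal sketch that creates them; make_layout is a hypothetical helper, the pi1 folder name follows snapshot_root in pi1_rsnap.conf, and the prefix argument lets you try it in a scratch directory first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the folder layout used in this post.
# $1 is a path prefix: "" for the real system (run as root), or a
# scratch directory to try it out.
make_layout() {
  local base="$1" d

  # Backup destinations and the config dir (mac configs live in a subfolder).
  for d in /USB/backups/pi1 /USB/backups/mac /etc/rsnapshot_configs/mac; do
    mkdir -p "$base$d" || return 1
  done

  # Log and lock/pid dirs: 1777 = writable by everyone, with the sticky
  # bit so users can only delete their own files (same idea as /tmp).
  for d in /var/log/rsnapshot /USB/backups/rsnapshot_run; do
    mkdir -p "$base$d" && chmod 1777 "$base$d" || return 1
  done
}

# Real run (as root, with the USB drive already mounted on /USB):
#   make_layout ""
```

Run it only with the drive mounted: otherwise mkdir -p would happily create /USB/backups on the SD card, which is exactly what no_create_root is meant to prevent.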


Let’s clarify some bits and pieces

sync_first 1

To make sure the first full backup completes properly, enable sync_first by setting it to 1. Once completed, remove it or comment it out.
To execute the first sync, run the following:

rsnapshot -c my_rsnapshot.conf sync

Basically, run the sync as many times as you want… and once you have finished, start invoking (with CRON) the daily, weekly, monthly… backups. REMEMBER to disable sync_first once finished, otherwise you won’t actually run any sync!

TABs no spaces!

IMPORTANT: do NOT use spaces to separate the fields in the rsnapshot configuration files, only TABS!!!
Copy and paste might turn tabs into spaces, so be sure to review all your configs. Use the -t flag every time to test whether the syntax is correct.
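A quick way to catch the most common copy-and-paste mistake is to look for config lines whose first separator is a space. This is a rough heuristic sketch (check_tabs is a hypothetical name): it only looks at the separator right after the parameter name, because values such as rsync_long_args legitimately contain spaces. rsnapshot’s own configtest command (rsnapshot -c my_rsnapshot.conf configtest) remains the real syntax check.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: flag rsnapshot config lines where the parameter
# name is followed by a space instead of a TAB.
# Prints the offending lines (with line numbers) and returns 1 if any exist.
check_tabs() {
  local conf="$1"
  if [ ! -r "$conf" ]; then
    echo "cannot read $conf" >&2
    return 2
  fi
  # ^[^#[:space:]]+ : a parameter name (comment and blank lines never match)
  # followed directly by a space character.
  if grep -nE '^[^#[:space:]]+ ' "$conf"; then
    echo "^ space after the parameter name in $conf (use a TAB)" >&2
    return 1
  fi
  return 0
}

# Check every config file given on the command line, e.g.
#   ./check_tabs.sh /etc/default/rsnapshot /etc/rsnapshot_configs/mac/mac_rsnap.conf
for f in "$@"; do
  check_tabs "$f"
done
```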

Test your configuration (-t)

rsnapshot -t -c my_rsnapshot.conf <sync|daily|weekly... >

The -t flag also displays exactly the commands that would be executed, without running them. Very handy! 🙂

Remote backups

Another thing to keep in mind is that ‘REMOTE’ backups (anything using user@host …) actually launch a command on the remote host, so rsync must be installed on the remote machine too (and rdiff-backup, if you use it). Versions should also match; if they don’t, rsync should at least be version 3 or newer.
To make this work on my Mac, for instance, I had to install rdiff-backup and a newer version of rsync, as the default one is 2.6.x. I used the Rudix packages. Easy easy 🙂
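Since the remote side needs a recent enough rsync, it’s worth checking both ends before the first backup. A minimal sketch with hypothetical helper names; it assumes the usual first line of rsync --version (“rsync  version X.Y.Z  protocol version N”) and passwordless SSH to user@mac. If the Mac still reports 2.6.x after installing a newer rsync, the non-interactive SSH session is probably finding /usr/bin/rsync first in its PATH; rsync’s --rsync-path option can point at the newer binary.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to compare rsync versions on both ends.

# Read `rsync --version` output on stdin and print the version number
# (third whitespace-separated field of the first line).
rsync_version() {
  awk 'NR == 1 { print $3; exit }'
}

# Succeed if the version string given as $1 is 3.x or newer.
rsync_is_v3_or_newer() {
  local major="${1%%.*}"
  [[ "$major" =~ ^[0-9]+$ ]] && ((major >= 3))
}

# Usage sketch (user@mac as in the configs above):
#   local_v=$(rsync --version | rsync_version)
#   remote_v=$(ssh user@mac 'rsync --version' | rsync_version)
#   echo "local: $local_v remote: $remote_v"
#   rsync_is_v3_or_newer "$remote_v" || echo "remote rsync is too old"
```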

Retain daily/weekly/monthly… sync… wtf?!

A very important thing to understand about rsnapshot, which drove me kinda mad for a few hours: the job that DOES the backup is the one at the top of the list (the most frequent).
So, if you have daily, weekly, monthly… set as ‘retain’ parameters in the rsnapshot conf file, the one that actually copies the files is ‘daily‘ (top of the list, most frequent). The other ones are JUST some sort of rotation of the folder tree. Literally a ‘mv’ command… that’s it. You can verify this by using the -t flag to see the commands.
So, don’t get confused 🙂

So, to summarise:

  • sync: the first, initial backup (only with sync_first enabled). This creates a .sync folder in snapshot_root.
  • daily: this is the one that does the copy (or rather, the ‘most frequent’ backup set: on the Mac, for example, I set only ‘weekly’ and ‘monthly’, so in that case weekly is the most frequent backup set and it’s the one that does the sync)
  • weekly/monthly… (less frequent backups): these are simply ‘mv’ commands.

To explain in more detail… the flow for my Mac…
You run the first sync (as many times as you want) with ‘sync_first‘ enabled.

rsnapshot -c my_rsnapshot.conf sync

This creates the backup in /USB/backups/mac/.sync/
Then you run the crons. Weekly will be the first to run:

rsnapshot -c my_rsnapshot.conf weekly

This will actually run this move, creating the first weekly folder:

mv /USB/backups/mac/.sync/ /USB/backups/mac/weekly.0/

Then DISABLE ‘sync_first’, and the next time the weekly cron runs, something like this will be executed, moving weekly.0 to weekly.1, hard linking the identical files and sync’ing the ones that have changed since:

mv /USB/backups/mac/weekly.0/ /USB/backups/mac/weekly.1/
/usr/bin/rsync -a --delete --numeric-ids --relative --delete-excluded \
    --link-dest=/USB/backups/mac/weekly.1/files/ /home/ \
    /USB/backups/mac/weekly.0/files/
[...]
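To see why rotating weekly.0 to weekly.1 and hard linking unchanged files costs almost no extra space, here’s a small local sketch using a scratch directory and a made-up file name (big.img). It creates the hard link by hand with ln, which is what --link-dest ends up doing for an unchanged file. du counts a hard-linked file only once per invocation, which is also why rsnapshot’s du command (it uses cmd_du and du_args from /etc/default/rsnapshot) reports realistic totals.

```shell
#!/usr/bin/env bash
# Sketch: two "snapshots" sharing one unchanged file through a hard link.
demo_root=$(mktemp -d)
trap 'rm -rf "$demo_root"' EXIT   # clean up the scratch dir on exit

mkdir -p "$demo_root/weekly.1/files" "$demo_root/weekly.0/files"

# 1 MiB file in the older snapshot (hypothetical name).
head -c 1048576 /dev/zero > "$demo_root/weekly.1/files/big.img"

# The file didn't change: link it into the new snapshot instead of copying.
ln "$demo_root/weekly.1/files/big.img" "$demo_root/weekly.0/files/big.img"

# -ef is true when both names refer to the same inode (same data on disk).
if [ "$demo_root/weekly.0/files/big.img" -ef "$demo_root/weekly.1/files/big.img" ]; then
  echo "same inode: the data is stored only once"
fi

# Per-snapshot sizes plus a grand total; the total stays around 1 MiB
# (plus directory overhead) instead of 2 MiB.
du -sck "$demo_root"/weekly.*
```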

Then, next time, weekly.2 and weekly.3 will be created: same method.
Until the LAST weekly backup is created (#3 in this case: a retention of 4, from 0 to 3), the monthly job won’t have any effect.
Once /USB/backups/mac/weekly.3/ exists and this is executed…

rsnapshot -c my_rsnapshot.conf monthly

… rsnapshot runs this move:
mv /USB/backups/mac/weekly.3/ /USB/backups/mac/monthly.0/

And so on…

A little note, sticking with the above example: you might start this backup in the middle of the month, so by the end of the month you won’t have reached 4 weekly backup sets, just 2 (#0 and #1). So… what happens with the ‘monthly’ job that runs on the 1st of the month?
Answer: nothing.
Basically, this time the monthly backup is skipped, as the weekly retention limit hasn’t been reached yet. Weekly backups keep rotating among themselves.
In the first week of the second month, the weekly backup reaches #2 (third backup): #1 => #2, #0 => #1 and the new backup stored in #0.
Second week: #3 (4th and last): #2 => #3, #1 => #2, #0 => #1 and the new backup stored in #0. The #3 (oldest) should be the one that rotates… but the monthly cron won’t run until the next month. There’s nothing to worry about, though. On the next weekly run, in the third week, #3 will be marked for deletion and a new #0 will be created. Same for the fourth week: oldest backup deleted, max limit reached.
And then we get into the new month, where the monthly backup is called BEFORE the weekly one: it moves weekly.3 into monthly.0 (#3 => monthly #0), freeing up ‘one space’. The next ‘weekly’ run then shifts the rest (#2 => #3, #1 => #2, #0 => #1) and stores the new backup in #0, and everything will be ‘in sync’ for the following months. 🙂

I hope this example clarifies. 🙂
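To make the walkthrough concrete, here’s a toy bash model of the same rotation. It is NOT rsnapshot’s code: it’s a simplified sketch assuming retain weekly 4 / monthly 12, where each snapshot is a directory holding a hypothetical label file (w1, w2, … one per weekly run), and rsync, hard links and lazy deletes are left out. It replays the scenario above: two weekly runs mid-month, a skipped monthly, four weekly runs, then the monthly rotation.

```shell
#!/usr/bin/env bash
# Toy model (hypothetical, NOT rsnapshot's source) of the rotation for
# "retain weekly 4" + "retain monthly 12". Snapshots are directories with a
# "label" file, so only the naming/rotation can be observed.
ROOT=$(mktemp -d)
trap 'rm -rf "$ROOT"' EXIT
WEEKLY_KEEP=4
MONTHLY_KEEP=12

# Most frequent interval: drop the oldest weekly, shift the others up by one,
# then create a new weekly.0 (where the real backup would be synced).
run_weekly() {
  local i
  rm -rf "$ROOT/weekly.$((WEEKLY_KEEP - 1))"
  for ((i = WEEKLY_KEEP - 2; i >= 0; i--)); do
    if [ -d "$ROOT/weekly.$i" ]; then
      mv "$ROOT/weekly.$i" "$ROOT/weekly.$((i + 1))"
    fi
  done
  mkdir "$ROOT/weekly.0"
  echo "$1" > "$ROOT/weekly.0/label"
}

# Less frequent interval: skip until the last weekly slot exists, otherwise
# shift the monthly sets and move weekly.3 into monthly.0. Nothing is synced.
run_monthly() {
  local i last="$ROOT/weekly.$((WEEKLY_KEEP - 1))"
  if [ ! -d "$last" ]; then
    echo "skip: weekly.$((WEEKLY_KEEP - 1)) not present yet"
    return 0
  fi
  rm -rf "$ROOT/monthly.$((MONTHLY_KEEP - 1))"
  for ((i = MONTHLY_KEEP - 2; i >= 0; i--)); do
    if [ -d "$ROOT/monthly.$i" ]; then
      mv "$ROOT/monthly.$i" "$ROOT/monthly.$((i + 1))"
    fi
  done
  mv "$last" "$ROOT/monthly.0"
}

# Print "dir=label" for every snapshot that currently exists.
show() {
  local d
  for d in "$ROOT"/*; do
    if [ -d "$d" ]; then
      printf '%s=%s ' "${d##*/}" "$(cat "$d/label")"
    fi
  done
  echo
}

# The scenario from the text: start mid-month with two weekly runs.
run_weekly w1; run_weekly w2
run_monthly                       # 1st of the month: weekly.3 missing, skipped
run_weekly w3; run_weekly w4      # weekly.0 .. weekly.3 now exist
run_weekly w5; run_weekly w6      # oldest weekly dropped each time
show
run_monthly                       # next 1st: weekly.3 moves to monthly.0
show
run_weekly w7                     # shifts 2=>3, 1=>2, 0=>1, new weekly.0
show
```

The last show lists monthly.0 holding w3 next to weekly.0 to weekly.3 holding w7 down to w4: the monthly job only moved the oldest weekly set, and the following weekly run did the shifting.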

NOTE:
If you decide one day to move your backups from one disk to another, MAKE SURE to rsync them preserving the hard links (-H, which -a does NOT include), otherwise your backup will rise like a cake in the oven! 🙂

Here is a sample command:

rsync -az -H --delete --numeric-ids /path/to/source server2:/path/to/dest/
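To double-check that the hard links survived the move, you can compare how many files have more than one link on each side: with -H the numbers should be roughly the same, while a copy made without -H will show (almost) none on the destination. A small sketch; count_hardlinked is a hypothetical helper, and find’s -links +1 matches files with more than one link.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count regular files with more than one hard link
# under a directory. An rsnapshot tree should have many; if the copy shows
# (almost) none, the hard links were not preserved.
count_hardlinked() {
  find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Usage sketch, one call per side (paths from the command above):
#   count_hardlinked /path/to/source
#   ssh server2 'find /path/to/dest/ -type f -links +1 | wc -l'
```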