MFH Lab : Linux Backup and Recovery

Use rsync for File Level Backups

rsync is a great utility for syncing directories of files either locally or to another server. rsync has been used for years by System Administrators, hence it is very refined for the purpose of backing up data. In the author’s opinion, one of the best features of sync is its ability to be scripted from the command line.

In this tutorial, we will discuss rsync in various ways −

  • Explore and talk about some common options
  • Create local backups
  • Create remote backups over SSH
  • Restore local backups

rsync is named for its purpose: Remote Sync and is both powerful and flexible in use.

Following is a basic rsync remote backup over ssh −

MiNi:~ rdc$ rsync -aAvz --progress ./Desktop/ImportantStuff/ 
rdc@ Documents/RemoteStuff/
rdc@'s password:
sending incremental file list
   6,148 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=23/25)
2017-02-14 16_26_47-002 - Veeam_Architecture001.png
   33,144 100%   31.61MB/s    0:00:00 (xfr#2, to-chk=22/25)
A Guide to the WordPress REST API | Toptal.pdf
   892,406 100%   25.03MB/s    0:00:00 (xfr#3, to-chk=21/25)
Rick Cardon Technologies, LLC..webloc
   77 100%    2.21kB/s    0:00:00 (xfr#4, to-chk=20/25)
   43,188,224   1%    4.26MB/s    0:08:29
sent 2,318,683,608 bytes  received 446 bytes  7,302,941.90 bytes/sec
total size is 2,327,091,863  speedup is 1.00
MiNi:~ rdc$

The following sync sent nearly 2.3GB of data across our LAN. The beauty of rsync is it works incrementally at the block level on a file-by-file basis. This means, if we change just two characters in a 1MB text file, only one or two blocks will be transferred across the lan on the next sync!

Furthermore, the incremental function can be disabled in favor of more network bandwidth used for less CPU utilization. This might prove advisable if constantly copying several 10MB database files every 10 minutes on a 1Gb dedicated Backup-Lan. The reasoning is: these will always be changing and will be transmitting incrementally every 10 minutes and may tax load of the remote CPU. Since the total transfer load will not exceed 5 minutes, we may just wish to sync the database files in their entirety.

Following are the most common switches with rsync −

rsync syntax:
rsync [options] [local path] [[remote host:remote path] or [target path
-aArchive mode and assumes -r, -p, -t, -g, -l
-dSync only directory tree, no files
-rRecursive into directory
-lCopy symlinks as symlinks
-pPreserve permissions
-gPreserve group
-vVerbose output
-zCompress over network link
-XPreserve extended attributes
-APreserve ACLs
-tPreserve timestamps
-WTransfer whole file, not incremental blocks
-uDo not overwrite files on target
–progressShow transfer progress
–deleteDelete older files on target
–max-size = XXXMax file size to sync

When to use rsync

My personal preference for rsync is when backing up files from a source host to a target host. For example, all the home directories for data recovery or even offsite and into the cloud for disaster recovery.

Local Backup With rsync

We have already seen how to transfer files from one host to another. The same method can be used to sync directories and files locally.

Let’s make a manual incremental backup of /etc/ in our root user’s directory.

First, we need to create a directory off ~/root for the synced backup −

[root@localhost rdc]# mkdir /root/etc_baks

Then, assure there is enough free disk-space.

[root@localhost rdc]# du -h --summarize /etc/ 
49M    /etc/
[root@localhost rdc]# df -h 
Filesystem           Size     Used     Avail    Use%     Mounted on 
/dev/mapper/cl-root   43G      15G        28G    35%         /

We are good for syncing our entire /etc/ directory −

rsync -aAvr /etc/ /root/etc_baks/

Our synced /etc/ directory −

[root@localhost etc_baks]# ls -l ./
total 1436
drwxr-xr-x.   3 root root      101 Feb  1 19:40 abrt
-rw-r--r--.   1 root root       16 Feb  1 19:51 adjtime
-rw-r--r--.   1 root root     1518 Jun  7  2013 aliases
-rw-r--r--.   1 root root    12288 Feb 27 19:06 aliases.db
drwxr-xr-x.   2 root root       51 Feb  1 19:41 alsa
drwxr-xr-x.   2 root root     4096 Feb 27 17:11 alternatives
-rw-------.   1 root root      541 Mar 31  2016 anacrontab
-rw-r--r--.   1 root root       55 Nov  4 12:29 asound.conf
-rw-r--r--.   1 root root        1 Nov  5 14:16 at.deny
drwxr-xr-x.   2 root root       32 Feb  1 19:40 at-spi2
--{ condensed output }--

Now let’s do an incremental rsync −

[root@localhost etc_baks]# rsync -aAvr --progress  /etc/ /root/etc_baks/
sending incremental file list

   0 100%    0.00kB/s    0:00:00 (xfer#1, to-check=1145/1282)
sent 204620 bytes  received 2321 bytes  413882.00 bytes/sec
total size is 80245040  speedup is 387.77

[root@localhost etc_baks]#

Only our test_incremental.txt file was copied.z

Remote Differential Backups With rsync

Let’s do our initial rsync full backup onto a server with a backup plan deployed. This example is actually backing up a folder on a Mac OS X Workstation to a CentOS server. Another great aspect of rsync is that it can be used on any platform rsync has been ported to.

MiNi:~ rdc$ rsync -aAvz Desktop/ImportanStuff/
rdc@'s password:
sending incremental file list
A Guide to the WordPress REST API | Toptal.pdf
Rick Cardon Tech LLC.webloc
sent 2,318,281,023 bytes  received 336 bytes  9,720,257.27 bytes/sec
total size is 2,326,636,892  speedup is 1.00
MiNi:~ rdc$

We have now backed up a folder from a workstation onto a server running a RAID6 volume with rotated disaster recovery media stored offsite. Using rsync has given us standard 3-2-1 backup with only one server having an expensive redundant disk array and rotated differential backups.

Now let’s do another backup of the same folder using rsync after a single new file named test_file.txt has been added.

MiNi:~ rdc$ rsync -aAvz Desktop/ImportanStuff/
rdc@'s password:  
sending incremental file list 

sent 814 bytes  received 61 bytes  134.62 bytes/sec
total size is 2,326,636,910  speedup is 2,659,013.61
MiNi:~ rdc$

As you can see, only the new file was delivered to the server via rsync. The differential comparison was made on a file-by-file basis.

A few things to note are: This only copies the new file: test_file.txt, since it was the only file with changes. rsync uses ssh. We did not ever need to use our root account on either machine.

Simple, powerful and effective, rsync is great for backing up entire folders and directory structures. However, rsync by itself doesn’t automate the process. This is where we need to dig into our toolbox and find the best, small, and simple tool for the job.

To automate rsync backups with cronjobs, it is essential that SSH users be set up using SSH keys for authentication. This combined with cronjobs enables rsync to be done automatically at timed intervals.

Use DD for Block-by-Block Bare Metal Recovery Images

DD is a Linux utility that has been around since the dawn of the Linux kernel meeting the GNU Utilities.

dd in simplest terms copies an image of a selected disk area. Then provides the ability to copy selected blocks of a physical disk. So unless you have backups, once dd writes over a disk, all blocks are replaced. Loss of previous data exceeds the recovery capabilities for even highly priced professional-level data-recovery.

The entire process for making a bootable system image with dd is as follows −

  • Boot from the CentOS server with a bootable linux distribution
  • Find the designation of the bootable disk to be imaged
  • Decide location where the recovery image will be stored
  • Find the block size used on your disk
  • Start the dd image operation

In this tutorial, for the sake of time and simplicity, we will be creating an ISO image of the master-boot record from a CentOS virtual machine. We will then store this image offsite. In case our MBR becomes corrupted and needs to be restored, the same process can be applied to an entire bootable disk or partition. However, the time and disk space needed really goes a little overboard for this tutorial.

It is encouraged for CentOS admins to become proficient in restoring a fully bootable disk/partition in a test environment and perform a bare metal restore. This will take a lot of pressure off when eventually one needs to complete the practice in a real life situation with Managers and a few dozen end-users counting downtime. In such a case, 10 minutes of figuring things out can seem like an eternity and make one sweat.

Note − When using dd make sure to NOT confuse source and target volumes. You can destroy data and bootable servers by copying your backup location to a boot drive. Or possibly worse destroy data forever by copying over data at a very low level with DD.

Following are the common command line switches and parameters for dd −

if=In file or source to be copied
of=Out file or the copy of the in file
bsSet both input and output block size
obsSet output file block size
ibsSet input file block size
countSet the number of blocks to copy
convExtra options to add for imaging
NoerrorDo not stop processing an error
syncPads unfitted input blocks in the event of error or misalignment

Note on block size − The default block size for dd is 512 bytes. This was the standard block size of lower density hard disk drives. Today’s higher density HDDs have increased to 4096 byte (4kB) block sizes to allow for disks ranging from 1TB and larger. Thus, we will want to check disk block size before using dd with newer, higher capacity hard disks.

For this tutorial, instead of working on a production server with dd, we will be using a CentOS installation running in VMWare. We will also configure VMWare to boot a bootable Linux ISO image instead of working with a bootable USB Stick.

First, we will need to download the CentOS image entitled: CentOS Gnome ISO. This is almost 3GB and it is advised to always keep a copy for creating bootable USB thumb-drives and booting into virtual server installations for trouble-shooting and bare metal images.

Other bootable Linux distros will work just as well. Linux Mint can be used for bootable ISOs as it has great hardware support and polished GUI disk tools for maintenance.

CentOS GNOME Live bootable image can be downloaded from:

Let’s configure our VMWare Workstation installation to boot from our Linux bootable image. The steps are for VMWare on OS X. However, they are similar across VMWare Workstation on Linux, Windows, and even Virtual Box.

Note − Using a virtual desktop solution like Virtual Box or VMWare Workstation is a great way to set up lab scenarios for learning CentOS Administration tasks. It provides the ability to install several CentOS installations, practically no hardware configuration letting the person focus on administration, and even save the server state before making changes.

First let’s configure a virtual cd-rom and attach our ISO image to boot instead of the virtual CentOS server installation −

ISO image

Now, set the startup disk −

Startup Disk

Now when booted, our virtual machine will boot from the CentOS bootable ISO image and allow access to files on the Virtual CentOS server that was previously configured.

Let’s check our disks to see where we want to copy the MBR from (condensed output is as follows).

MiNt ~ # fdisk -l
Disk /dev/sda: 60 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sdb: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

We have located both our physical disks: sda and sdb. Each has a block size of 512 bytes. So, we will now run the dd command to copy the first 512 bytes for our MBR on SDA1.

The best way to do this is −

[root@mint rdc]# dd if=/dev/sda bs=512 count=1  | gzip -c >
1+0 records in 
1+0 records out 
512 bytes copied, 0.000171388 s, 3.0 MB/s

[root@mint rdc]# ls /mnt/sdb/ 
[root@mint rdc]#

Just like that, we have full image of out master boot record. If we have enough room to image the boot drive, we could just as easily make a full system boot image −

dd if=/dev/INPUT/DEVICE-NAME-HERE conv=sync,noerror bs=4K | gzip -c >

The conv=sync is used when bytes must be aligned for a physical medium. In this case, dd may get an error if exact 4K alignments are not read (say… a file that is only 3K but needs to take minimum of a single 4K block on disk. Or, there is simply an error reading and the file cannot be read by dd.). Thus, dd with conv=sync,noerror will pad the 3K with trivial, but useful data to physical medium in 4K block alignments. While not presenting an error that may end a large operation.

When working with data from disks we always want to include: conv=sync,noerror parameter.

This is simply because the disks are not streams like TCP data. They are made up of blocks aligned to a certain size. For example, if we have 512 byte blocks, a file of only 300 bytes still needs a full 512 bytes of disk-space (possibly 2 blocks for inode information like permissions and other filesystem information).

Use gzip and tar for Secure Storage

gzip and tar are two utilities a CentOS administrator must become accustomed to using. They are used for a lot more than to simply decompress archives.

Using Gnu Tar in CentOS Linux

Tar is an archiving utility similar to winrar on Windows. Its name Tape Archive abbreviated as tar pretty much sums up the utility. tar will take files and place them into an archive for logical convenience. Hence, instead of the dozens of files stored in /etc. we could just “tar” them up into an archive for backup and storage convenience.

tar has been the standard for storing archived files on Unix and Linux for many years. Hence, using tar along with gzip or bzip is considered as a best practice for archives on each system.

Following is a list of common command line switches and options used with tar −

-cCreates a new .tar archive
-CExtracts to a different directory
-jUses bzip2 compression
-zUses gzip compression
-vVerbose show archiving progress
-tLists archive contents
-fFile name of the archive
-xExtracts tar archive

Following is the basic syntax for creating a tar archive.

tar -cvf [tar archive name]

Note on Compression mechanisms with tar − It is advised to stick with one of two common compression schemes when using tar: gzip and bzip2. gzip files consume less CPU resources but are usually larger in size. While bzip2 will take longer to compress, they utilize more CPU resources; but will result in a smaller end filesize.

When using file compression, we will always want to use standard file extensions letting everyone including ourselves know (versus guess by trial and error) what compression scheme is needed to extract archives.


When needing to possibly extract archives on a Windows box or for use on Windows, it is advised to use the .tar.tbz or .tar.gz as most the three character single extensions will confuse Windows and Windows only Administrators (however, that is sometimes the desired outcome)

Let’s create a gzipped tar archive from our remote backups copied from the Mac Workstation −

[rdc@mint Documents]$ tar -cvz -f RemoteStuff.tgz ./RemoteStuff/ 
./RemoteStuff/A Guide to the WordPress REST API | Toptal.pdf
./RemoteStuff/Rick Cardon Tech LLC.webloc
[rdc@mint Documents]$ ls -ld RemoteStuff.tgz
-rw-rw-r--. 1 rdc rdc 2317140451 Mar 12 06:10 RemoteStuff.tgz

Note − Instead of adding all the files directly to the archive, we archived the entire folder RemoteStuff. This is the easiest method. Simply because when extracted, the entire directory RemoteStuff is extracted with all the files inside the current working directory as ./currentWorkingDirectory/RemoteStuff/

Now let’s extract the archive inside the /root/ home directory.

[root@centos ~]# tar -zxvf RemoteStuff.tgz
./RemoteStuff/A Guide to the WordPress REST API | Toptal.pdf
./RemoteStuff/Rick Cardon Tech LLC.webloc
[root@mint ~]# ping

As seen above, all the files were simply extracted into the containing directory within our current working directory.

[root@centos ~]# ls -l 
total 2262872 
-rw-------.   1   root   root       1752   Feb   1   19:52   anaconda-ks.cfg 
drwxr-xr-x. 137   root   root       8192   Mar   9   04:42   etc_baks 
-rw-r--r--.   1   root   root       1800   Feb   2   03:14   initial-setup-ks.cfg 
drwxr-xr-x.   6   rdc    rdc        4096   Mar  10   22:20   RemoteStuff 
-rw-r--r--.   1   root   root 2317140451   Mar  12   07:12   RemoteStuff.tgz 
-rw-r--r--.   1   root   root       9446   Feb  25   05:09   ssl.conf [root@centos ~]#

Use gzip to Compress File Backups

As noted earlier, we can use either bzip2 or gzip from tar with the -j or -z command line switches. We can also use gzip to compress individual files. However, using bzip or gzip alone does not offer as many features as when combined with tar.

When using gzip, the default action is to remove the original files, replacing each with a compressed version adding the .gz extension.

Some common command line switches for gzip are −

-cKeeps files after placing into the archive
-lGet statistics for the compressed archive
-rRecursively compresses files in the directories
-1 thru 9Specifies the compression level on a scale of 1 thru 9

gzip more or less works on a file-by-file basis and not on an archive basis like some Windows O/S zip utilities. The main reason for this is that tar already provides advanced archiving features. gzip is designed to provide only a compression mechanism.

Hence, when thinking of gzip, think of a single file. When thinking of multiple files, think of tar archives. Let’s now explore this with our previous tar archive.

Note − Seasoned Linux professionals will often refer to a tarred archive as a tarball.

Let’s make another tar archive from our rsync backup.

[root@centos Documents]# tar -cvf RemoteStuff.tar ./RemoteStuff/
[root@centos Documents]# ls
RemoteStuff.tar RemoteStuff/

For demonstration purposes, let’s gzip the newly created tarball, and tell gzip to keep the old file. By default, without the -c option, gzip will replace the entire tar archive with a .gz file.

[root@centos Documents]# gzip -c RemoteStuff.tar > RemoteStuff.tar.gz
[root@centos Documents]# ls
RemoteStuff  RemoteStuff.tar  RemoteStuff.tar.gz
We now have our original directory, our tarred directory and finally our gziped tarball.

Let’s try to test the -l switch with gzip.

[root@centos Documents]# gzip -l RemoteStuff.tar.gz  
     compressed        uncompressed        ratio uncompressed_name 
     2317140467          2326661120        0.4% RemoteStuff.tar
[root@centos Documents]#

To demonstrate how gzip differs from Windows Zip Utilities, let’s run gzip on a folder of text files.

[root@centos Documents]# ls text_files/
 file1.txt  file2.txt  file3.txt  file4.txt  file5.txt
[root@centos Documents]#

Now let’s use the -r option to recursively compress all the text files in the directory.

[root@centos Documents]# gzip -9 -r text_files/

[root@centos Documents]# ls ./text_files/
file1.txt.gz  file2.txt.gz  file3.txt.gz  file4.txt.gz  file5.txt.gz
[root@centos Documents]#

See? Not what some may have anticipated. All the original text files were removed and each was compressed individually. Because of this behavior, it is best to think of gzip alone when needing to work in single files.

Working with tarballs, let’s extract our rsynced tarball into a new directory.

[root@centos Documents]# tar -C /tmp -zxvf RemoteStuff.tar.gz

As seen above, we extracted and decompressed our tarball into the /tmp directory.

[root@centos Documents]# ls /tmp 

Encrypt TarBall Archives

Encrypting tarball archives for storing secure documents that may need to be accessed by other employees of the organization, in case of disaster recovery, can be a tricky concept. There are basically three ways to do this: either use GnuPG, or use openssl, or use a third part utility.

GnuPG is primarily designed for asymmetric encryption and has an identity-association in mind rather than a passphrase. True, it can be used with symmetrical encryption, but this is not the main strength of GnuPG. Thus, I would discount GnuPG for storing archives with physical security when more people than the original person may need access (like maybe a corporate manager who wants to protect against an Administrator holding all the keys to the kingdom as leverage).

Openssl like GnuPG can do what we want and ships with CentOS. But again, is not specifically designed to do what we want and encryption has been questioned in the security community.

Our choice is a utility called 7zip. 7zip is a compression utility like gzip but with many more features. Like Gnu Gzip, 7zip and its standards are in the open-source community. We just need to install 7zip from our EHEL Repository (the next chapter will cover installing the Extended Enterprise Repositories in detail).

Install 7zip on Centos

7zip is a simple install once our EHEL repositories have been loaded and configured in CentOS.

[root@centos Documents]# yum -y install p7zip.x86_64 p7zip-plugins.x86_64
Loaded plugins: fastestmirror, langpacks
| 3.6 kB  00:00:00
|  13 kB  00:00:00
| 4.3 kB  00:00:00
| 3.4 kB  00:00:00
| 3.4 kB  00:00:00
(1/2): epel/x86_64/updateinfo
| 756 kB  00:00:04      
| 4.6 MB  00:00:18
Loading mirror speeds from cached hostfile
--> Running transaction check
---> Package p7zip.x86_64 0:16.02-2.el7 will be installed
---> Package p7zip-plugins.x86_64 0:16.02-2.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved

Simple as that, 7zip is installed and ready be used with 256-bit AES encryption for our tarball archives.

Now let’s use 7z to encrypt our gzipped archive with a password. The syntax for doing so is pretty simple −

7z a -p <output filename><input filename>

Where, a: add to archive, and -p: encrypt and prompt for passphrase

[root@centos Documents]# 7z a -p RemoteStuff.tgz.7z RemoteStuff.tar.gz

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,1 CPU Intel(R)
Core(TM) i5-4278U CPU @ 2.60GHz (40651),ASM,AES-NI)
Scanning the drive:
1 file, 2317140467 bytes (2210 MiB)

Creating archive: RemoteStuff.tgz.7z

Items to compress: 1

Enter password (will not be echoed):
Verify password (will not be echoed) :

Files read from disk: 1
Archive size: 2280453410 bytes (2175 MiB)
Everything is Ok
[root@centos Documents]# ls
RemoteStuff  RemoteStuff.tar  RemoteStuff.tar.gz  RemoteStuff.tgz.7z  slapD

[root@centos Documents]#

Now, we have our .7z archive that encrypts the gzipped tarball with 256-bit AES.

Note − 7zip uses AES 256-bit encryption with an SHA-256 hash of the password and counter, repeated up to 512K times for key derivation. This should be secure enough if a complex key is used.

The process of encrypting and recompressing the archive further can take some time with larger archives.

7zip is an advanced offering with more features than gzip or bzip2. However, it is not as standard with CentOS or amongst the Linux world. Thus, the other utilities should be used often as possible.