mbackup: how to store 100TB of backups on a 2TB disk

Backup software. Some of it is free, some is good, some is fast, some has lots of features. I could never find one that combined everything I wanted, so I wrote my own. The script was called mbackup and it had everything I wanted, no more, no less.

And then I lost it.

I have now resurrected it, because it can be very useful and several people asked me for something like it.

Jargon
To explain what is special about mbackup, let me first elaborate a bit on backup jargon.

The archive bit: this is a file attribute (like the ‘hidden’ bit and the ‘system’ bit) that tells backup software whether a file has changed since the last backup. If it is zero, the file is neither new nor changed. When a file is created or changed, Windows sets the bit to one, so backup software knows it must back the file up. The backup software then sets the archive bit back to zero to indicate that the file was backed up.

There are several backup types.

Full backup: a copy of all files, whatever the state of their archive bit. After backing up a file, its archive attribute is set to zero (= ‘ready for archival’ not checked).

Copy: a full backup that leaves the archive bit alone. Look at it as an out-of-cycle, extra backup.

Incremental: a backup of files that were created or changed (= they have their archive bit set to one) since the last full or incremental backup. Sets the archive bit back to zero.

Differential: a backup of files that were created or changed since the last full backup. To keep accumulating those changes, a differential backup honours but does not change the archive bit.
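
To make the difference between the four types concrete, here is a small Python simulation of the archive bit. It is only a sketch under the conventional definitions (full and copy take every file, incremental and differential take only flagged files); the names `File`, `run_backup` and `modify` are made up for this illustration and have nothing to do with mbackup itself.

```python
from dataclasses import dataclass


@dataclass
class File:
    name: str
    archive: bool = True  # a new file starts with the archive bit set


# backup type -> (only take files with the bit set?, clear the bit afterwards?)
BACKUP_TYPES = {
    "full": (False, True),
    "copy": (False, False),
    "incremental": (True, True),
    "differential": (True, False),
}


def run_backup(kind, files):
    """Return the names this backup type would copy, updating archive bits."""
    only_flagged, clears_bit = BACKUP_TYPES[kind]
    selected = [f for f in files if f.archive or not only_flagged]
    if clears_bit:
        for f in selected:
            f.archive = False
    return [f.name for f in selected]


def modify(f):
    """Simulate the filesystem setting the archive bit when a file is written."""
    f.archive = True


if __name__ == "__main__":
    files = [File("a.txt"), File("b.txt"), File("c.txt")]
    print("full:        ", run_backup("full", files))
    modify(files[0])
    print("differential:", run_backup("differential", files))
    modify(files[1])
    print("differential:", run_backup("differential", files))
    print("incremental: ", run_backup("incremental", files))
```

Because the differential run leaves the bit alone, the second differential still contains the file from the first one; the incremental run afterwards clears everything it copied.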

A backup system usually combines the above backup types. The most common example is this one:

Week 1
Monday: full backup
Tuesday: incremental backup
Wednesday: incremental backup
Thursday: incremental backup
Friday: incremental backup, take Monday’s tape (disk, stick, whatever) home.

Week 2
Monday: full backup
Tuesday: incremental backup
Wednesday: incremental backup
Thursday: incremental backup
Friday: incremental backup, take Monday’s tape (disk, stick, whatever) home.

Week 3
Monday: full backup
Tuesday: incremental backup
Wednesday: incremental backup
Thursday: incremental backup
Friday: incremental backup, take Monday’s tape (disk, stick, whatever) home.

Week 4
Monday: full backup
Tuesday: incremental backup
Wednesday: incremental backup
Thursday: incremental backup
Friday: incremental backup, take Monday’s tape (disk, stick, whatever) home and store it for archival, bring the other three Monday tapes back to overwrite.

Restoring
Restoring a full backup or a copy is easy: just copy all the files back to their original locations. However it can take a long time to create a full backup.

  • An incremental backup is quick to make, but to restore it you must first restore the full backup and then every subsequent incremental backup, in order.
  • Differential backups sit in between: they are quicker to make than full backups, and after restoring the full backup you only ever need to restore one differential backup, the most recent one.
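
The restore rules above can be written down as a few lines of Python. This is a sketch, not part of mbackup: `restore_chain` is a hypothetical helper, and it only models schedules that follow the full backup with either incrementals or differentials, not a mix of both.

```python
def restore_chain(history):
    """Return the indexes of the backups needed to restore the latest state.

    history is a chronological list of backup kinds:
    'full', 'incremental' or 'differential'.
    """
    fulls = [i for i, kind in enumerate(history) if kind == "full"]
    if not fulls:
        raise ValueError("nothing to restore without a full backup")
    start = fulls[-1]
    later = list(range(start + 1, len(history)))
    kinds = {history[i] for i in later}
    if kinds <= {"incremental"}:
        return [start] + later       # the full backup plus every incremental
    if kinds <= {"differential"}:
        return [start] + later[-1:]  # the full backup plus the newest differential
    raise ValueError("mixed schedules are not modelled in this sketch")


if __name__ == "__main__":
    week = ["full", "incremental", "incremental", "incremental", "incremental"]
    print(restore_chain(week))
    print(restore_chain(["full"] + ["differential"] * 4))
```

With the weekly schedule shown earlier, restoring Friday's state means touching all five backups; with differentials it would be two.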

I had a lot of data and backup tapes weren’t very cheap, so I used USB drives to store my backups. Hard drives back then weren’t very large, however…

The problem with third party software
I tried several commercial and free backup software packages, but none was simple to use, quick at making a backup and easy to restore from. Some got close, but my objection to them was that I needed to install the ‘third party’ software, store the installation files and hope that the software would still be around if the backup was ever needed.

Some backup packages create huge databases in which they store everything. I like to be able to verify my backups without using the software that created them, because you don’t know under which circumstances you will need to perform the restore.

Also, I like to be able to quickly and easily search through my backups.

The solution
I wanted a system that combined these features:

  • the easy to restore property of full backups
  • the quick to make property of incremental backups
  • uses no third party software
  • can store a lot of backups in a limited space

 

Hardlinks
Mbackup fulfills those requirements by using hardlinks. NTFS hardlinks are the Windows equivalent of UNIX hard links (not of symlinks, which are a separate kind of file that merely points to a path). The way the NTFS filesystem is designed allows for a powerful trick. Put simply, NTFS has a data layer and a presentation layer. The data layer contains – you guessed it – the data. It contains the files. The presentation layer is what we, the users, get to see. A file exists in the data layer and the presentation layer provides us with a pointer to the physical file.

There is nothing preventing us from creating a second (or third, fourth, etc.) pointer to the same file from different locations in the presentation layer! Such new pointers to ‘existing’ files (that is, files that already have a pointer to them) are called hardlinks.

If you make a hardlink, the file, for all intents and purposes, exists in both locations. If you delete one, you really only delete that pointer from the NTFS presentation layer, so the file in the data layer and the other pointer are left alone: the other file will still exist. If you change it, however, the ‘other’ file changes along with it. The data is stored only once, and therefore hardlinks do not require extra disk space.
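
You can watch this behaviour with a few lines of Python. The sketch below uses the standard `os.link` call in a temporary directory; the file names and the `hardlink_demo` function are made up for the illustration, and it assumes the temporary directory lives on a filesystem that supports hardlinks.

```python
import os
import tempfile


def hardlink_demo():
    """Create a file and a hardlink to it, then report what each name sees."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "report.txt")
        link = os.path.join(tmp, "report-link.txt")
        with open(original, "w") as fh:
            fh.write("version 1")

        os.link(original, link)  # a second name for the same data
        results["same_file"] = os.path.samefile(original, link)
        results["link_count"] = os.stat(original).st_nlink

        # Writing through one name changes what the other name shows.
        with open(link, "w") as fh:
            fh.write("version 2")
        with open(original) as fh:
            results["seen_via_original"] = fh.read()

        # Deleting one name leaves the data reachable through the other.
        os.remove(original)
        with open(link) as fh:
            results["after_delete"] = fh.read()
        results["link_count_after_delete"] = os.stat(link).st_nlink
    return results


if __name__ == "__main__":
    for key, value in hardlink_demo().items():
        print(f"{key}: {value}")
```

The part about changing a file matters for backups: overwriting a hardlinked file in place also changes every older backup that shares it.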

On a side note: deleting a file from Windows removes the NTFS presentation layer pointer to a file, not the file itself (unless the partition is on an SSD drive, but that’s a different tale…) which is why we are able to use undelete tools.

This is how mbackup works: the first time, it creates a full backup. Subsequent runs first duplicate the folder structure of the previous backup, without the files, and then create hardlinks to all files from the previous backup (provided they still exist in the source directory). Then the new backup is updated with new and changed files from the source.

The result is a backup containing folders of full backups at the cost of one full backup plus incrementals.

If you have a disk containing 1TB of data and you change 50MB of data every day, you can store ten years’ worth of FULL backups on a 2TB disk and have room to spare for a couple of zombie films.
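
The arithmetic behind that claim, as a Python sketch. It assumes decimal units (1TB = 1,000,000MB), 3,650 daily backups, a data set that stays at 1TB, and that the 50MB is the total size of the files that changed, since changed files are copied whole. `backup_footprint` is a name invented for this example.

```python
def backup_footprint(full_size_mb, daily_change_mb, days):
    """Return (physical_mb, apparent_mb) for a chain of hardlinked backups."""
    # One real full copy, plus only the changed files for every daily run.
    physical = full_size_mb + daily_change_mb * days
    # Every backup folder looks like a complete copy of the data.
    apparent = full_size_mb * days
    return physical, apparent


if __name__ == "__main__":
    physical, apparent = backup_footprint(1_000_000, 50, 3650)
    print(f"physical: {physical / 1_000_000:.2f} TB")
    print(f"apparent: {apparent / 1_000_000:.0f} TB")
```

Under those assumptions the disk holds a little under 1.2TB of real data, while the backup folders together appear to contain about 3,650TB.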

One more thing I built into mbackup. My users accessed their files from a mapped network share. You know users: they will reach any limit eventually. Thus they regularly reached the Windows maximum path length limit: 255 characters. They tended to make files called something like

P:\325.004\Project\projects\documentation of new version\laurent’s second verified version\sent to customer\yesterday\backup\old\Concerning the creation of filenames on Windows NTFS filesystems using a typewriter and rather creative mind – version 12.pdf

Now on the client computer that might be ok, but if the physical location on the server of the client’s P:-drive is D:\User data\common\, the path becomes 17 characters longer, exceeds the maximum path length limit, and xcopy will choke on a ‘memory error’. Other (mostly free) software has this problem as well.

My workaround is using the subst command, which is roughly Windows’ equivalent to UNIX’s mount command. With subst you can have a volume X: (for example) ‘mounted’ to a path D:\User data\common\ (again, just an example). Paths addressed through X: start counting from the drive letter again, so the long prefix no longer counts towards the limit.
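
The effect on path length is easy to check in Python. This is a sketch with an artificial file name; `remap` is a hypothetical helper that only swaps the prefix of a path string, and the limit of 255 is simply the figure quoted above.

```python
LIMIT = 255  # the limit quoted in the text; treat the exact value as an assumption


def remap(path, old_prefix, new_prefix):
    """Swap the leading part of a path, as a drive mapping or subst does."""
    if not path.startswith(old_prefix):
        raise ValueError(f"{path!r} does not start with {old_prefix!r}")
    return new_prefix + path[len(old_prefix):]


# A path that just fits on the client's mapped P: drive.
client = "P:\\" + "x" * 250
# The same file as seen on the server.
server = remap(client, "P:\\", "D:\\User data\\common\\")
# The same file again, reached through a subst'ed X: drive.
substed = remap(server, "D:\\User data\\common\\", "X:\\")

if __name__ == "__main__":
    for label, path in (("client", client), ("server", server), ("subst", substed)):
        verdict = "ok" if len(path) <= LIMIT else "too long"
        print(f"{label}: {len(path)} characters, {verdict}")
```

The server-side prefix adds 17 characters and pushes the path over the limit; going through the substituted drive brings it back to the length the client saw.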

To put it a bit more formally:

  1. Mount the source, target and previous directories to temporary volume letters (subst).
  2. If no backup exists yet, create a regular backup, then exit (robocopy; earlier versions used xcopy).
  3. Duplicate the previous backup’s folder structure without the files (robocopy).
  4. Create hardlinks FROM all the previous backup’s files TO the new file structure if that file still exists in the source directory (mklink /h or fsutil).
  5. Update the new backup from the source directory (robocopy).
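
For readers who think more easily in Python than in batch, the steps above can be sketched as follows. This is not the mbackup script, just an illustration of the same idea under a few assumptions: backup folder names sort chronologically, a file counts as changed when its size or whole-second modification time differs, and a changed file's hardlink is removed before the new copy is written so the previous backup keeps its old contents. Step 1 (subst) is left out, and all function names are invented for this example.

```python
import os
import shutil
import tempfile


def latest_backup(target_root):
    """Return the newest backup folder, assuming names sort chronologically."""
    if not os.path.isdir(target_root):
        return None
    names = sorted(
        n for n in os.listdir(target_root)
        if os.path.isdir(os.path.join(target_root, n))
    )
    return os.path.join(target_root, names[-1]) if names else None


def changed(src, dst):
    """Treat a file as changed when its size or whole-second mtime differs."""
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def hardlink_backup(source, target_root, name):
    """Create target_root/name so that it looks like a full backup of source."""
    previous = latest_backup(target_root)
    new = os.path.join(target_root, name)
    if previous is None:
        shutil.copytree(source, new)  # step 2: a regular full copy
        return new
    # Steps 3 and 4: rebuild the folder tree and hardlink surviving files.
    for dirpath, _dirs, files in os.walk(previous):
        rel = os.path.relpath(dirpath, previous)
        os.makedirs(os.path.normpath(os.path.join(new, rel)), exist_ok=True)
        for fn in files:
            if os.path.isfile(os.path.join(source, rel, fn)):
                os.link(os.path.join(dirpath, fn), os.path.join(new, rel, fn))
    # Step 5: copy new and changed files from the source.
    for dirpath, _dirs, files in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(new, rel)), exist_ok=True)
        for fn in files:
            src = os.path.join(dirpath, fn)
            dst = os.path.join(new, rel, fn)
            if os.path.exists(dst):
                if not changed(src, dst):
                    continue
                os.remove(dst)  # drop the link so the old backup is untouched
            shutil.copy2(src, dst)
    return new


def demo():
    """Run two backups in a temporary directory and report what is shared."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        root = os.path.join(tmp, "backups")
        os.makedirs(os.path.join(src, "sub"))
        for rel, text in (("a.txt", "old"), (os.path.join("sub", "b.txt"), "same")):
            with open(os.path.join(src, rel), "w") as fh:
                fh.write(text)
        first = hardlink_backup(src, root, "2014-01-01")
        with open(os.path.join(src, "a.txt"), "w") as fh:
            fh.write("new content")
        second = hardlink_backup(src, root, "2014-01-02")
        with open(os.path.join(first, "a.txt")) as fh:
            old_a = fh.read()
        with open(os.path.join(second, "a.txt")) as fh:
            new_a = fh.read()
        return {
            "unchanged_shared": os.path.samefile(
                os.path.join(first, "sub", "b.txt"),
                os.path.join(second, "sub", "b.txt")),
            "changed_shared": os.path.samefile(
                os.path.join(first, "a.txt"), os.path.join(second, "a.txt")),
            "old_a": old_a,
            "new_a": new_a,
        }


if __name__ == "__main__":
    print(demo())
```

In the demo, the unchanged file is one physical file visible in both backup folders, while the changed file exists twice, once per version.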

 

Ok, here comes the script. I refrained from using third party components, but you could easily extend it with them.

 

My original implementation even wrote output to a web interface. It’s easy: have the script log errors (check %errorlevel% after important commands), analyse the log file and create an html file from it, then upload the file to a webserver.
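
As an idea of what the ‘analyse the log file and create an html file’ step could look like, here is a minimal Python sketch. The log format is an assumption: it simply treats every line containing the word ERROR as a problem. `log_to_html` is a made-up name, and the upload step is left out.

```python
import html


def log_to_html(log_text, title="mbackup report"):
    """Turn a plain-text log into a tiny HTML status page."""
    # Assumption: any line that mentions ERROR marks a failed step.
    errors = [line for line in log_text.splitlines() if "ERROR" in line.upper()]
    status = "FAILED" if errors else "OK"
    items = "".join(f"<li>{html.escape(line)}</li>" for line in errors)
    return (
        f"<html><head><title>{html.escape(title)}</title></head><body>"
        f"<h1>{html.escape(title)}: {status}</h1>"
        f"<ul>{items}</ul></body></html>"
    )


if __name__ == "__main__":
    sample = "copied 120 files\nERROR 5 access denied: D:\\data\\<locked>.doc\n"
    print(log_to_html(sample))
```

Escaping the log lines keeps odd characters in file names from breaking the page.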

Note that I will probably change bits and pieces of the script in the next couple of days.

I encourage you to modify the script for yourself, especially the robocopy parameters (type robocopy /?).

:: Copyright Vorkbaard, 2013-2014

:: This program  is free software: you can redistribute it and/or modify it
:: under the terms of the GNU General Public License as published by the
:: Free Software Foundation, either version 3 of the License, or (at your
:: option) any later version.

:: This program is distributed in the hope that it will be useful, but WITHOUT
:: ANY WARRANTY;  without even the implied warranty of MERCHANTABILITY or FIT-
:: NESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more
:: details.

:: You should have received a copy of the GNU General Public License along with
:: this program. If not, see <http://www.gnu.org/licenses/>.

:: Contact: gmail@vorkbaard.nl


:: Version: 2014-09-16

:: ======================================================== ABOUT THE SCRIPT ==

:: This script creates backups for disaster recovery and archival. If no backup
:: exists in the designated  target directory, it will create  a full backup of
:: the designated source directory. The next time the script is run, it dupli-
:: cates the folder structure of the previous backup and fills it with hard-
:: links of the previous backup, then adds files that were created or
:: changed since the last backup.

:: The result is a backup system that  combines the advantages of  full and in-
:: cremental backups:
:: - it is almost as fast as incremental backups;
:: - it takes as little diskspace as incremental backups;
:: - it is as easily restorable as a full backup.

:: Furthermore:
:: - it uses only native Windows tools;
:: - all backups are instantly accessible and searchable;
:: - it works around the Windows maximum path length on mapped shares issue.

:: Note that  if a backup went  wrong you should delete  its folder because the
:: most  recent  backup folder is  the basis  for the  next backup.  This  is a
:: weakness in the system; it is inherent to all incremental backup systems.

:: To work  around this weakness  you could change the target  directory backup
:: name once in a  while, forcing  the script to begin  anew and not base a new
:: backup upon another.

:: Best practice: use three external drives for backups - one connected to your
:: computer, one off-site and one in your bag as you travel  between the sites.
:: This will allow for your office to burn down and you to get robbed and still
:: leave you with a complete backup with history.

:: ======================================================= RULES OF THE GAME ==

:: * Do not end directory names with backslashes.
:: * Do not manually create extra directories in the backup directory.
:: * Because the script uses hardlinks  the target directory MUST BE ON A LOCAL
::   VOLUME. Usb drives are ok, mapped shares are not.
:: * mklink /h  does not  work in Windows XP  and perhaps also  not in 2000 and
::   2003. A workaround is to replace it by:
::               fsutil hardlink create
::   Just do a  search and  replace.  If you use fsutil  then you must  run the
::   script as local admin.
:: * Robocopy, if absent, can be found in the Windows Server 2003 Resource Kit.

@echo off
cls
Title mbackup

:: ============================================================== PARAMETERS ==

:: Temporary volume names that mbackup uses. Use volume
:: names that are not in use on this system.
	set SourceVolume=S:
	set TargetVolume=T:
	set PreviousVolume=P:

:: Where to store the logfile
	set Log=C:\

:: Set the system locale for timestamping
   set SystemLocale=English(US)
   ::set SystemLocale=Dutch

:: ======================================================= END OF PARAMETERS ==
	
:: Do a bit of cleaning in case a previous backup has gone
:: haywire. Suppress output so as not to confuse the user; use
:: scripted error handling instead.
	call :CleanUp > nul

:: Check if the designated volumes are really available
	set err=
	subst %SourceVolume% /d
	subst %TargetVolume% /d
	subst %PreviousVolume% /d
	
	if exist %SourceVolume% set err=1
	if exist %TargetVolume% set err=2
	if exist %PreviousVolume% set err=3
	if not [%err%]==[] goto :ExitErr

:: Redirect to maintenance tools
	if [%1]==[clean] (
	call :CleanUp
	goto :eof
	)
	
	if [%1]==[del] goto :DelBackup
	
:: ======================================================= HANDLE USER INPUT ==

:: source: location of the files to backup
	set source=%1
	:: remove double quotes
	set source=%source:"=%
	if not exist "%source%" (
	set err=8
	goto :ExitErr
	)

:: where to store the backups
	set target=%2
	:: Remove double quotes
	set target=%target:"=%
	if not exist "%target%" mkdir "%target%"

Echo Backing up from: %source%
Echo Backing up to: %target%

:: =============================================================== TIMESTAMP ==

:: The offsets below assume %date% looks like 2013-07-09 (yyyy-mm-dd)
if %SystemLocale%==Dutch (
set day=%date:~8,2%
set month=%date:~5,2%
set year=%date:~0,4%
set hour=%time:~0,2%
set minute=%time:~3,2%
set second=%time:~6,2%
)

:: The offsets below assume %date% looks like Tue 07/09/2013 (ddd mm/dd/yyyy)
if %SystemLocale%==English(US) (
set day=%date:~7,2%
set month=%date:~4,2%
set year=%date:~10,4%
set hour=%time:~0,2%
set minute=%time:~3,2%
set second=%time:~6,2%
)

:: Here's how to create your own locale's date tag:
:: On your command line, type
:: echo %date%
:: Mine says: di 09-07-2013

:: The day is 09, of which '0' is the fourth character in
:: the %date% value so we need to skip the first three
:: characters (~3). The day part takes two places, so the
:: day would be:
:: set day=%date:~3,2%

:: The month is 07, of which '0' is the seventh character
:: in the %date% value so we need to skip the first six
:: characters (~6). The month takes two places, so the month
:: would be:
:: set month=%date:~6,2%

:: The year is 2013, of which '2' is the tenth character in
:: the %date% value so we need to skip the first nine
:: characters (~9). The year takes four places, so the year
:: would be:
:: set year=%date:~9,4%

:: Hours, minutes and seconds are optional, they would be
:: handy if you make more than one backup per day. They work
:: about the same as the date but with the %time% variable.

:: Create the actual tag
	set timestamp=%year%-%month%-%day%-%hour%-%minute%-%second%
	:: Replace empty spaces with zeroes
	set timestamp=%timestamp: =0%

:: Add a timestamp to the logfile
	set Log="%Log%\mbackup-%timestamp%.log"

:: ====================================================== CREATE THE BACKUPS ==

:: Mount the source directory on a separate volume
	subst %SourceVolume% "%source%"

:: Find out if a previous backup exists
	set PrevBackupExists=false
	:: Count the directory entries in the listing; an empty target only has . and ..
	dir "%target%" /a:d|find /c "<DIR>" > NrOfPrevBackups.tmp
	for /f "usebackq delims=*" %%i in (NrOfPrevBackups.tmp) do set NrOfPrevBackups=%%i
	del NrOfPrevBackups.tmp
	if %NrOfPrevBackups% GTR 2 goto :PrevBackupExists

:: ========================================================== INITIAL BACKUP ==
	Title mbackup - Creating initial backup
	call :CreateNewBackupDir
	
	:: Create a base backup.
	robocopy "%source%" "%target%\%timestamp%" *.* /E /B /COPYALL /XD DfsrPrivate /XF *.tmp /XF Thumbs.db /XJ /R:5 /W:1
	call :LogSize "%target%\%timestamp%"
	goto :CleanUp

:: ====================================================== SUBSEQUENT BACKUPS ==
	:PrevBackupExists
	Title mbackup - Creating subsequent backup - creating skeleton directory
	:: Get name of previous backup
		for /f "usebackq delims=*" %%i in (`dir "%target%" /o:d /a:d /b /t:c`) do (
		set PrevBackupDir=%target%\%%i)
	
	:: Mount the previous backup directory on a separate volume
		subst %PreviousVolume% "%PrevBackupDir%"

	:: Create new backup directory
		call :CreateNewBackupDir

	:: Create a copy of the directory structure
		robocopy "%PrevBackupDir%" "%target%\%timestamp%" /XF * /E /B /COPYALL /XD DfsrPrivate /XF *.tmp /XF Thumbs.db /XJ /R:5 /W:1

	:: Create hardlinks FROM the previous backup TO the new backup directory,
	:: but only if the file exists in the source directory
		Title mbackup - Creating subsequent backup - duplicating previous backup's files
		Echo Preparing to copy hardlinks. **This may take a while!**
		for /f "usebackq delims=*" %%i in (`dir %PreviousVolume%\*.* /s /b /a:-d`) do if exist "%SourceVolume%%%~pnxi" mklink /h "%TargetVolume%%%~pnxi" "%PreviousVolume%%%~pnxi"

	:: Update the new backup directory with new and changed files and reset the
	:: backup bit of the files
		Title mbackup - Creating subsequent backup - Copying new and changed files...
		robocopy %SourceVolume% %TargetVolume% /M /E /FP /XD DfsrPrivate /XF *.tmp /XF Thumbs.db /XJ /R:5 /W:1
		call :LogSize "%target%\%timestamp%"
		call :CleanUp
		goto :eof

:: ============================================================= SUBROUTINES ==

:: Clean up after the script has done its job
	:CleanUp
	if exist %TargetVolume% subst %TargetVolume% /d
	if exist %PreviousVolume% subst %PreviousVolume% /d
	if exist %SourceVolume% subst %SourceVolume% /d
	if not [%err%]==[] goto :ExitErr
	echo All seems to be in order.
	Title mbackup - Done
	exit /b 0

:: Create name for new backup directory and mount it on a separate volume
	:CreateNewBackupDir
	set NewBackupDir=%target%\%timestamp%
	mkdir "%NewBackupDir%"
	subst %TargetVolume% "%NewBackupDir%"
	goto :eof

:: Error handling
	:ExitErr
	if [%err%]==[1] echo Source volume already in use.
	if [%err%]==[2] echo Target volume already in use.
	if [%err%]==[3] echo Previous volume already in use.
	if [%err%]==[4] echo Something went wrong while creating the initial backup.
	if [%err%]==[5] echo Something went wrong while duplicating the directory structure.
	if [%err%]==[6] echo Something went wrong while duplicating the backup files.
	if [%err%]==[7] echo Something went wrong while updating the backup.
	if [%err%]==[8] echo Source directory does not exist.
	Title mbackup - Error
	exit /b %err%

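:: Log the size of the new backup. GetFolderSize is not a built-in Windows
:: command; it must be available on the path, and C:\Scripts must exist.
:LogSize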
	GetFolderSize "%target%\%timestamp%" "%tmp%\mbackupsize.tmp" GB
	echo  %timestamp% >> C:\Scripts\Size.log
	type "%tmp%\mbackupsize.tmp" >> C:\Scripts\Size.log
	echo.>> C:\Scripts\Size.log
	echo.>> C:\Scripts\Size.log
	del "%tmp%\mbackupsize.tmp"
	goto :eof
	
:: ======================================================= MAINTENANCE TOOLS ==
:DelBackup
Title mbackup - Deleting backup
:: Usage: mbackup del "path to backup folder"
	:: Delete a backup gone wrong. Might be impossible with regular rmdir
	:: because of too long filenames.
	if not [%1]==[del] goto :EndDelBackup
	if [%2]==[] goto :EndDelBackup
	
	:: Remove double quotes
	set DelFolder=%2
	set DelFolder=%DelFolder:"=%
	
	if not exist "%DelFolder%" (
	echo "%DelFolder%" does not exist.
	goto :eof
	)
	
	echo =*=*=*=*=*= DELETING BACKUP FOLDER "%DelFolder%" =*=*=*=*=*=*=*=
	Pause

	:: Mount the directory in which to delete files and folders
	subst %TargetVolume% "%DelFolder%"

	:: Loop through the folders to delete
	for /f "usebackq delims=*" %%i in (`dir %TargetVolume%\ /a:d /b`) do rmdir "%TargetVolume%\%%i" /s /q

	:: Delete remaining files
	del /q %TargetVolume%\*.*

	:: Unmount directory
	subst %TargetVolume% /d

	:: Delete that day's backup folder
	rmdir "%DelFolder%" /s /q

	echo Deleted backup folder "%DelFolder%"
	Title mbackup - Done.
	goto :eof
	:EndDelBackup

:: ==== End of maintenance tools ==============================
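
The %date% slicing explained in the TIMESTAMP section of the script is easier to experiment with in Python, where `text[start:start + length]` does the same as `%var:~start,length%`. The sketch below is an illustration only; `date_tag` and `time_tag` are invented names, and the sample strings are the date formats mentioned in the script's comments.

```python
def date_tag(date_text, day_at, month_at, year_at):
    """Cut day, month and year out of a locale-formatted date string."""
    day = date_text[day_at:day_at + 2]
    month = date_text[month_at:month_at + 2]
    year = date_text[year_at:year_at + 4]
    return f"{year}-{month}-{day}"


def time_tag(time_text):
    """Cut hours, minutes and seconds out of a time string like ' 9:05:03.12'."""
    hour, minute, second = time_text[0:2], time_text[3:5], time_text[6:8]
    # Before ten o'clock the hour has a leading space; turn it into a zero.
    return f"{hour}-{minute}-{second}".replace(" ", "0")


if __name__ == "__main__":
    print(date_tag("di 09-07-2013", 3, 6, 9))
    print(date_tag("Tue 07/09/2013", 7, 4, 10))
    print(time_tag(" 9:05:03.12"))
```

If the tag for your own locale comes out scrambled, adjust the three offsets until the printed date matches what `echo %date%` shows.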

Notice that in the course of a month I have used about 650 GB of physical disk space while the backups take up 10.6 TB.
[Screenshot: mbackup-production1]
All 25 folders contain a FULL BACKUP.
[Screenshot: mbackup-production2]
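
If you want to check such numbers yourself, one way is to add up file sizes twice: once for every name, and once counting each physical file only a single time. The Python sketch below does that by remembering the device and file identifiers that `os.stat` reports; it assumes those identifiers are available on your platform, and `tree_sizes` and `demo` are names made up for this example.

```python
import os
import tempfile


def tree_sizes(root):
    """Return (apparent_bytes, physical_bytes) for everything below root."""
    apparent = 0
    physical = 0
    seen = set()
    for dirpath, _dirs, files in os.walk(root):
        for fn in files:
            st = os.stat(os.path.join(dirpath, fn))
            apparent += st.st_size
            key = (st.st_dev, st.st_ino)  # identifies the physical file
            if key not in seen:
                seen.add(key)
                physical += st.st_size
    return apparent, physical


def demo():
    """Build 25 backup folders that all hardlink the same 1000-byte file."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "backup-01", "data.bin")
        os.makedirs(os.path.dirname(first))
        with open(first, "wb") as fh:
            fh.write(b"x" * 1000)
        for n in range(2, 26):
            folder = os.path.join(tmp, f"backup-{n:02d}")
            os.makedirs(folder)
            os.link(first, os.path.join(folder, "data.bin"))
        return tree_sizes(tmp)


if __name__ == "__main__":
    apparent, physical = demo()
    print(f"apparent: {apparent} bytes, physical: {physical} bytes")
```

The demo shows the same effect in miniature: 25 folders that each appear to hold the file, with the data stored once.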

Since I switched to robocopy I haven’t had any more issues with pathnames being too long. The drawback is that rmdir doesn’t suffice anymore, so to be able to delete a backup folder I’ve had to create my own tool and integrate it into mbackup.

If you need to delete a backup folder and rmdir can’t do it, use

mbackup del "F:\Path to backups\backupfolder\2014-10-30"

Running

mbackup clean

removes mbackup leftovers in the form of subst’ed volumes.

2 Comments

  1. Hi Martin,

    Many thanks for all your hard thinking!!

    I have been using your script for years, to my complete satisfaction.
    And BLAT turns out to be a real “lifesaver” at the office too.

    I am currently working on a new implementation in which it looks like I will be mbackupping to a Debian or CentOS iSCSI target.

    If you ever need your old script again, just give me a shout.
    I still have a version lying around from the time your kitchen was still open.

    Kind regards,
    André van Vlaanderen

    • Kapitein Vorkbaard

      Hi André,
      Thanks for the compliment :) The latest version works with Robocopy. Robocopy is built into Windows these days (MS took them over) and is essentially xcopy on steroids. Robocopy also doesn’t care about path lengths. The downside is that it is very fussy about permissions.

      These days I actually use the Shadow Copy Service (‘previous versions’) on my own servers, but I believe MS has already thrown that overboard again.

      By the way, an update to this page is coming soon, because I received some feedback with bug fixes.

      If you want to back up to a *nix over iSCSI, have a look at Nas4free.
