## ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
## The Backup Process
## ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

## **************************************************************************
## Some Terminology
## **************************************************************************

1. Backup archive. An archive of backed-up material, usually a single file,
   e.g. a tar file of a directory.
2. Repository. The location where we keep permanent or semi-permanent backup
   archives. A repository may be divided into:
   1. a permanent section
   2. a holding section

   The holding section contains recent backups which will not be permanently
   archived but are required for recovery in case of mishap. For example, the
   holding section could contain weekly backups in between permanent monthly
   backups, or incremental backups in between full backups.

## **************************************************************************
## Overview
## **************************************************************************

The backup work can be divided into 3 parts:

1. Create backup archives locally (tar/compression of data, incremental or
   total, etc.).
2. Migrate archives to the repository and within the repository. This
   includes distribution to other machines and the decision about what is
   stored (e.g. daily copies are made but only monthly 'snapshots' are kept).
3. Mirror the repository.

QU: who should drive the transfer: the archive end or the tar end?

ANS: do it from the archive end. Problem: what if the source machine is not
guaranteed to be on? Guess we have to allow it from both ends.

## ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
## Design
## ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

## **************************************************************************
## Requirements
## **************************************************************************

## **************************************************************************
## Implementation
## **************************************************************************

## ==========================================================================
## Local Archiver
## ==========================================================================

Types supported:

* filesystem
* db
* db_mysql
* db_postgresql
* svn

What is common:

1. archive name
2. items to put in the archive
3. hook command to perform the backup

**However**, each type really has slightly different options and a slightly
different implementation. This suggests a structure where each type has its
own class:

    BackupItem
        doBackup()

So we will have a class for file backup, a class for dbs, etc. Each of these
will produce a single archive.

Finally, if convenient, we might provide a 'director' function/class to help
with tasks such as loading a config file and then setting up the appropriate
backup classes.

## ==========================================================================
## Scheduler and Permanent Archive
## ==========================================================================

Centrally coordinate scheduling. Some repos should be dumb.

1. List repos (maybe more than one).
2. Have a recent directory and a current directory.
3. Have a permanent archive with dates on which we put in a new entry (on
   that day, clear the current folder and permanently archive the archive
   made today). Want a 'clean current directory' flag.

Issues:

* Support for an incremental system.
* Efficiency using rsync. Might want to sync against the last current
  version in order to gain greater efficiency.

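One possible approach to the rsync efficiency issue is rsync's `--link-dest` option, which hard-links files that are unchanged relative to a reference directory instead of sending them again. The sketch below only builds and runs the command line. The function names and any paths passed in are hypothetical, and how much this saves for compressed archives (which can differ completely from one day to the next) is an open question.

```python
import subprocess
from typing import List, Optional


def build_rsync_command(source: str, dest: str,
                        previous: Optional[str] = None) -> List[str]:
    """Build an rsync command that copies the contents of `source` into `dest`.

    If `previous` is given, it is passed as --link-dest so that files
    unchanged since that copy are hard-linked rather than re-sent.
    A relative `previous` is interpreted by rsync relative to `dest`.
    """
    cmd = ["rsync", "--archive", "--compress"]
    if previous is not None:
        cmd.append("--link-dest=" + previous)
    # Trailing slash: copy the contents of `source`, not the directory itself.
    cmd += [source.rstrip("/") + "/", dest]
    return cmd


def run_sync(source: str, dest: str, previous: Optional[str] = None) -> None:
    """Run the transfer; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(source, dest, previous), check=True)
```

Keeping command construction separate from execution means the command can be logged or checked without touching the network.
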
Each task consists of a source machine and a destination machine.

    perform_recent_backups: [make backups between permanent backups on a daily basis]
    max_recent_backups: 10
    permanent_backup_schedule: [list in the form [days in month [months]]; no value given means daily]

Idea for sync operation:

2 local backup directories: tmp, local_daily

    repository
        daily_all
            ....
        holding
            ....
        permanent
            ....

with under each date:

    ${backupset-name1}
    ${backupset-name2}
    ...

## --------------------------------------------------------------------------
## Implementation
## --------------------------------------------------------------------------

I think we can safely assume the main repository will always be on a Linux
system. This means we can use the command line utilities that are available
there. In particular:

1. rsync
2. ssh login
3. cron jobs

Step 1: DAILY: local_daily_backup -> repository holding section

Step 2:
  * get copies of daily backups from remote locations and store them

Step 3: periodically (use cron) migrate archives from the holding section to
the permanent section

Step 4: mirror the repository to remote locations

Should try to reduce the amount of bandwidth used.
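
The schedule settings and Step 3 could be sketched as follows. This is a minimal illustration, not a settled design. It assumes that `permanent_backup_schedule` is read as a pair `[days, months]`, that a missing or empty list of months means every month, and that an empty schedule means daily. The function names are hypothetical, and holding entries are assumed to be named by ISO date so that they sort chronologically.

```python
import datetime
from typing import List, Sequence


def is_permanent_day(day: datetime.date, schedule: Sequence) -> bool:
    """Return True if `day` is a permanent-backup day.

    `schedule` is assumed to be [days_in_month, months]; months may be
    omitted or empty (every month). An empty schedule means daily.
    """
    if not schedule:
        return True
    days = schedule[0]
    months = schedule[1] if len(schedule) > 1 else []
    if months and day.month not in months:
        return False
    return day.day in days


def holding_entries_to_drop(entries: List[str], max_recent: int) -> List[str]:
    """Return the oldest holding entries beyond `max_recent`.

    Entries are assumed to be ISO dates (YYYY-MM-DD), so lexical order
    equals chronological order.
    """
    ordered = sorted(entries)
    excess = len(ordered) - max_recent
    return ordered[:excess] if excess > 0 else []
```

A daily cron job could call `is_permanent_day` with today's date to decide whether the day's archive moves to the permanent section, then use `holding_entries_to_drop` with `max_recent_backups` to decide which holding entries to clear.
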