HPCVL (Carleton) Backup Procedures
Whenever users and administrators create data or files that other project participants will reuse, a safe and secure means of storing and archiving that data is needed. From a system administration point of view we are concerned with three things: data integrity, reliability, and sizing. Each of these components is addressed in the sections below.

General Guidelines

It will be the standard practice of system administrators at HPCVL Carleton University to ensure that all user data and program files are indexed and archived. The goal is to ensure that in the event of data loss the system and program files can be restored in a timely manner.

HPCVL researchers and users must take the proper steps to protect their program files in anticipation of a data loss. Programmers and researchers should engage in secure programming practices to ensure the preservation of their data. By regularly saving files, tracking changes, and checkpointing program code, developers will be able to resume their work without interruption once the data has been restored.
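
One way to checkpoint a long-running program is to write its state to a file atomically, so that a crash or a backup taken mid-write never sees a half-written checkpoint. The sketch below is only an illustration under stated assumptions: the function names `save_checkpoint` and `load_checkpoint` are hypothetical, the state is assumed to be JSON-serializable, and the checkpoint file is assumed to live on a POSIX filesystem.

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    """Write state to path atomically so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it into place.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_checkpoint(path, default=None):
    """Return the saved state, or default when no checkpoint exists yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default
```

A program would call `save_checkpoint` every few iterations and call `load_checkpoint` at startup to resume from the last saved step.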

Backing Up User Accounts

Users are responsible for managing all files that reside in their home directories. System administrators will place disk quotas on user accounts so that their data files do not consume excessive system resources. Large data files generated by individual users will be stored in a common storage archive (/data01/scratch on the T3 storage array). At present only the files that reside in a user's home directory will be backed up so that they can be restored in the event of data loss. Until an adequate backup system is in place, HPCVL Carleton cannot guarantee the restoration of data files that are stored in the common archive. This policy applies to all of Carleton's HPCVL computing resources, including the Beowulf cluster and the SunFire.

Restrictions on User Accounts

User home directories are limited to a maximum of 500 MB of disk space. This is done to ensure the manageability and scalability of system resources. User directories are intended only for user configuration and program files, not for large volumes of program data. For users who require an exceptional amount of data storage, the lab system administrator will create a separate external volume to accommodate program data.
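
Users who want to see how close they are to the limit can total the size of the files in their home directory. The following is a rough sketch, not an HPCVL tool: the names `directory_size_bytes` and `over_quota` are hypothetical, and it assumes that "500 MB" means 500 * 1024 * 1024 bytes, which this policy does not specify. Real quota systems usually count allocated disk blocks, so the figure reported here is only an approximation.

```python
import os

# Assumption: the 500 MB limit is read as 500 * 1024 * 1024 bytes.
HOME_QUOTA_BYTES = 500 * 1024 * 1024


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; leave it out of the total.
                continue
    return total


def over_quota(used_bytes, quota_bytes=HOME_QUOTA_BYTES):
    """Return True when usage exceeds the quota."""
    return used_bytes > quota_bytes
```

For example, `over_quota(directory_size_bytes(os.path.expanduser("~")))` reports whether the current user's home directory is above the assumed limit.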

Users with BIG Data Requirements

Researchers who are using or generating large data files will be able to temporarily store their data on the T3 storage unit. As this device expands there will be more than enough storage to accommodate large volumes of program data.

Both the SunFire and the Beowulf have a common directory where users can store excess program data. By using the T3 storage array, users can have a single repository for all their program data and share it seamlessly between the SunFire and the Beowulf. To access this archive, users will need to create a subdirectory in /data01/scratch named after their HPCVL account (i.e. mkdir /data01/scratch/hpcXXXX). No quotas will be placed on this archive, but disk usage will be monitored to ensure that no user is needlessly consuming disk resources. Users found to be hogging disk space will be notified by email and will have quotas imposed if the problem persists.
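
A program that writes large output files can create its scratch subdirectory itself before it starts. The sketch below is a minimal illustration: the helper names `scratch_dir_for` and `ensure_scratch_dir` are hypothetical, and the account name is passed in by the caller because the hpcXXXX form above is only a placeholder.

```python
import os


def scratch_dir_for(account, base="/data01/scratch"):
    """Build the per-user scratch path under the common archive."""
    # Reject names that would escape the base directory.
    if not account or "/" in account or account in (".", ".."):
        raise ValueError(f"not a valid account name: {account!r}")
    return os.path.join(base, account)


def ensure_scratch_dir(account, base="/data01/scratch"):
    """Create the per-user scratch directory if it does not already exist."""
    path = scratch_dir_for(account, base)
    os.makedirs(path, exist_ok=True)
    return path
```

Calling `ensure_scratch_dir` with your own HPCVL account name has the same effect as the mkdir command above, and it is safe to call again when the directory already exists.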

Users should remember that the /data01 archive is only intended for program data and is not intended as an archive for miscellaneous files and documentation. Users should also be aware that the data residing in this directory will not be backed up so there is no protection against data loss for files residing in the /data01 directory.

HPCVL Carleton Backup Procedures

The backup procedures described in this document are intended only to restore critical user and system data in the event of a disaster. At present HPCVL Carleton does not have facilities in place to archive large volumes of data. Users should be aware that only their critical data (program and configuration files) will be backed up as a precaution and that they are responsible for the day-to-day integrity of their files. Users should also be aware that, since the lab is operational 24/7, all system backups are "warm". This means that any file that is open at the time of a backup will not be archived. System administrators will help mitigate the impact on users by scheduling backups in the early morning.

Monthly Backups

HPCVL administrators will keep archives of user directories for a period of one month. This provides enough redundancy to ensure complete and prompt restoration of user program files. On the last day of every month a complete backup of all files will be performed.

Weekly Backups

Each week HPCVL administrators will do a full backup of all user files for the previous week. A complete weekly backup will be performed each Sunday.

Daily Incremental Backups

Each day an incremental backup will be performed. At the end of each day HPCVL administrators will back up all incremental changes since the previous day. The purpose is to combine daily incremental backups with full monthly or weekly backups so that a complete system of redundancy is provided.
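
The monthly, weekly, and daily rules above can be summarized as a small function that classifies the backup due on a given date. This is only a sketch of the schedule as written, not the administrators' actual tooling: the function name `backup_type` and the labels it returns are hypothetical, and it assumes the monthly backup takes precedence when the last day of a month falls on a Sunday, which this document does not state.

```python
import calendar
import datetime


def backup_type(day):
    """Classify the backup due on the given date."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    if day.day == last_day:
        # Assumption: month-end wins over Sunday when the two coincide.
        return "monthly-full"
    if day.weekday() == 6:  # Monday is 0, Sunday is 6
        return "weekly-full"
    return "daily-incremental"
```

For instance, `backup_type(datetime.date(2009, 11, 29))` falls on a Sunday that is not a month end, so it is classified as a weekly full backup.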

Data Restoration Procedures

Individual User Files

In the event that a user accidentally removes critical files, they may contact the Carleton HPCVL system administrator to see if the files can be restored. As long as the affected files resided in the user's home directory and were not open at the time of a backup, the system administrator should be able to recover a recent version of each file.

Loss of Data Due to System Crashes

In the event of a system crash all user files in their home directories will be restored to the state of the most recent backup before the system crash took place.
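
With full and incremental backups combined, reaching the most recent state before a crash generally means replaying the last full backup and then every incremental taken after it. The sketch below illustrates that selection under stated assumptions: the function name `restore_chain` and the (timestamp, kind) record layout are hypothetical, and each incremental is assumed to hold only the changes since the previous backup.

```python
import datetime


def restore_chain(backups, crash_time):
    """Return the backups to replay, oldest first, to reach the state before crash_time.

    backups is an iterable of (timestamp, kind) pairs, where kind is
    "full" or "incremental".
    """
    earlier = sorted(b for b in backups if b[0] < crash_time)
    # Locate the most recent full backup taken before the crash.
    last_full = None
    for index, (_stamp, kind) in enumerate(earlier):
        if kind == "full":
            last_full = index
    if last_full is None:
        raise ValueError("no full backup precedes the crash")
    # Replay that full backup, then every incremental that followed it.
    return earlier[last_full:]
```

Given a Sunday full backup followed by Monday and Tuesday incrementals, a crash on Tuesday afternoon would select all three, in that order.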

 
  © HPCVL 2012
Last updated on Thursday, 26-Nov-2009 11:50:40 EST