The following types of recovery may be required for RAID-5 volumes:
Parity resynchronization and stale subdisk recovery are typically performed when the RAID-5 volume is started, or shortly after the system boots. They can also be performed by running the
See "Unstartable RAID-5 volumes" on page 22.
If hot-relocation is enabled at the time of a disk failure, system administrator intervention is not required unless no suitable disk space is available for relocation. Hot-relocation is triggered by the failure, and the system administrator is notified of the failure by electronic mail.
Hot-relocation automatically attempts to relocate the subdisks of a failing RAID-5 plex. After any relocation takes place, the hot-relocation daemon (vxrelocd) also initiates a parity resynchronization.
In the case of a failing RAID-5 log plex, relocation occurs only if the log plex is mirrored; the vxrelocd daemon then initiates a mirror resynchronization to recreate the RAID-5 log plex.
If hot-relocation is disabled at the time of a failure, the system administrator may need to initiate a resynchronization or recovery.
Note: Following severe hardware failure of several disks or other related subsystems underlying a RAID-5 plex, it may only be possible to recover the volume by removing it, recreating it on hardware that is functioning correctly, and restoring its contents from a backup.
In most cases, a RAID-5 array does not have stale parity. Stale parity occurs only after all RAID-5 log plexes for the RAID-5 volume have failed, and then only if there is also a system failure. Even if a RAID-5 volume has stale parity, the parity is usually repaired as part of the volume start process.
If a volume without valid RAID-5 logs is started and the process is killed before the volume is resynchronized, the result is an active volume with stale parity.
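To make "stale parity" concrete, the following Python sketch models a single RAID-5 stripe whose parity block is the byte-wise XOR of its data blocks. If a data block is rewritten but the matching parity update is lost (for example, a system failure with no surviving RAID-5 log to replay), the stored parity no longer matches the data. This is a conceptual model only; the function names are hypothetical and do not correspond to VxVM internals.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID-5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_is_stale(data_blocks: list[bytes], parity: bytes) -> bool:
    """Hypothetical check: parity is stale if it no longer equals the XOR of the data."""
    return xor_blocks(data_blocks) != parity


# One stripe: two data blocks plus one parity block.
d0 = b"\x0f\xf0"
d1 = b"\x33\x33"
parity = xor_blocks([d0, d1])
print(parity_is_stale([d0, d1], parity))  # False

# Simulate a crash: d1 is rewritten, but the matching parity write is lost.
d1 = b"\xaa\x55"
print(parity_is_stale([d0, d1], parity))  # True
```

Resynchronizing parity, in this model, simply means recomputing `xor_blocks` over the data and writing the result back as the new parity.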
The following example is output from the -ht command for a stale RAID-5 volume:
V  NAME       RVG/VSET/CO KSTATE   STATE    LENGTH  READPOL   PREFPLEX UTYPE
PL NAME       VOLUME      KSTATE   STATE    LENGTH  LAYOUT    NCOL/WID MODE
SD NAME       PLEX        DISK     DISKOFFS LENGTH  [COL/]OFF DEVICE   MODE
SV NAME       PLEX        VOLNAME  NVOLLAYR LENGTH  [COL/]OFF AM/NM    MODE
v  r5vol      -           ENABLED  NEEDSYNC 204800  RAID      -        raid5
pl r5vol-01   r5vol       ENABLED  ACTIVE   204800  RAID      3/16     RW
sd disk01-01  r5vol-01    disk01   0        102400  0/0       hdisk3   ENA
sd disk02-01  r5vol-01    disk02   0        102400  1/0       hdisk4   dS
sd disk03-01  r5vol-01    disk03   0        102400  2/0       hdisk5   ENA
This output lists the volume state as NEEDSYNC, indicating that the parity needs to be resynchronized. The state could also have been SYNC, indicating that a synchronization was attempted at start time and that a synchronization process should currently be performing it. If no such process exists, or if the volume is in the NEEDSYNC state, a synchronization can be started manually by using the resync keyword for the
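The decision described above can be sketched in Python as a small rule over the volume record. The sketch assumes whitespace-separated fields laid out as in the V header row of the example output (V NAME RVG/VSET/CO KSTATE STATE ...). The function names and the `sync_process_running` flag are hypothetical; how you determine whether a synchronization process is running is outside the scope of this sketch.

```python
def volume_state(record_line: str) -> str:
    """Return the STATE field of a 'v' (volume) record line."""
    fields = record_line.split()
    if not fields or fields[0] != "v":
        raise ValueError("not a volume (v) record")
    # Field order follows the V header row: V NAME RVG/VSET/CO KSTATE STATE ...
    return fields[4]


def needs_manual_resync(state: str, sync_process_running: bool) -> bool:
    """Hypothetical rule: should a parity resync be started by hand?"""
    if state == "NEEDSYNC":
        return True
    if state == "SYNC":
        # A resync was attempted at start time; intervene only if nothing is running it.
        return not sync_process_running
    return False


state = volume_state("v  r5vol  -  ENABLED  NEEDSYNC  204800  RAID  -  raid5")
print(state, needs_manual_resync(state, sync_process_running=False))  # NEEDSYNC True
```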
Parity is regenerated by issuing ioctls to the RAID-5 volume. The resynchronization process starts at the beginning of the RAID-5 volume and resynchronizes a region equal to the number of sectors specified by the -o iosize option. If the -o iosize option is not specified, the default maximum I/O size is used. The resync operation then moves on to the next region until the entire length of the RAID-5 volume has been resynchronized.
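The region-by-region walk can be sketched as a Python generator that yields each region as an (offset, length) pair in sectors. The function name is hypothetical, and the iosize value used below is chosen only for illustration; it is not the VxVM default maximum I/O size.

```python
from typing import Iterator, Tuple


def resync_regions(volume_length: int, iosize: int, start: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs, in sectors, covering [start, volume_length)."""
    if iosize <= 0:
        raise ValueError("iosize must be positive")
    offset = start
    while offset < volume_length:
        length = min(iosize, volume_length - offset)  # the last region may be shorter
        yield offset, length
        offset += length


# The 204800-sector volume from the example output, with a hypothetical iosize of 65536 sectors.
for offset, length in resync_regions(204800, 65536):
    print(offset, length)
```

With these values, the loop visits four regions, the last of which is only 8192 sectors long.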
For larger volumes, parity regeneration can take a long time, and the system may be shut down or crash before the operation completes. The progress of parity regeneration must therefore be preserved across reboots. Otherwise, the process has to start all over again.
To avoid this restart, parity regeneration is checkpointed: the offset up to which the parity has been regenerated is saved in the configuration database. The checkpt=size option controls how often the checkpoint is saved. If the option is not specified, the default checkpoint size is used.
Because saving the checkpoint offset requires a transaction, making the checkpoint size too small can extend the time required to regenerate parity. After a system reboot, a RAID-5 volume that has a checkpoint offset smaller than the volume length starts a parity resynchronization at the checkpoint offset.
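The checkpointing trade-off can be modeled in Python. In this sketch, a variable stands in for the checkpoint offset saved in the configuration database, each save counts as one transaction, and `crash_at` simulates a crash partway through. The function name, its parameters, and the numeric values are hypothetical; `checkpt` and `iosize` only play the roles of the checkpt=size and -o iosize options described above, and no VxVM defaults are implied. For simplicity, the model also saves the checkpoint once when regeneration reaches the end of the volume.

```python
from typing import Optional, Tuple


def regenerate_parity(volume_length: int, iosize: int, checkpt: int,
                      checkpoint: int = 0,
                      crash_at: Optional[int] = None) -> Tuple[int, int]:
    """Return (saved checkpoint offset, number of checkpoint transactions)."""
    offset = checkpoint            # resume from the last saved checkpoint
    transactions = 0
    since_checkpoint = 0
    while offset < volume_length:
        if crash_at is not None and offset >= crash_at:
            # Progress past the saved checkpoint is lost on a crash.
            return checkpoint, transactions
        length = min(iosize, volume_length - offset)
        # ... regenerate parity for sectors [offset, offset + length) ...
        offset += length
        since_checkpoint += length
        if since_checkpoint >= checkpt or offset == volume_length:
            checkpoint = offset    # saving the offset costs one transaction
            transactions += 1
            since_checkpoint = 0
    return checkpoint, transactions


# A smaller checkpoint size means many more transactions for the same volume.
print(regenerate_parity(204800, 8192, 65536))  # (204800, 4)
print(regenerate_parity(204800, 8192, 8192))   # (204800, 25)

# A crash after sector 150000 leaves the checkpoint at 131072; the next run resumes there.
saved, _ = regenerate_parity(204800, 8192, 65536, crash_at=150000)
print(regenerate_parity(204800, 8192, 65536, checkpoint=saved))  # (204800, 2)
```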
To resynchronize parity on a RAID-5 volume
RAID-5 log plexes can become detached due to disk failures. These RAID-5 logs can be reattached by using the att keyword for the
To reattach a failed RAID-5 log plex
Stale subdisk recovery is usually done at volume start time. However, the process doing the recovery can crash, or the volume may be started with an option such as delayrecover that prevents subdisk recovery. In addition, the disk on which the subdisk resides can be replaced without any recovery operations being performed. In such cases, you can perform subdisk recovery by using the
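Conceptually, recovering a stale subdisk means recomputing its contents, stripe by stripe, from the other columns of the RAID-5 plex. Because parity is the XOR of the data, XOR-ing all the surviving columns (data and parity) yields the missing column. This works only when at most one column of a stripe is unavailable. The Python sketch below models one 3-column stripe, matching the NCOL of 3 shown for r5vol-01 in the example output. It ignores the rotation of parity across columns, and the function names are hypothetical rather than VxVM internals.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stale_column(other_columns: list[bytes]) -> bytes:
    """Recompute one stale column of a stripe from every other column (data and parity)."""
    return xor_blocks(other_columns)


# One stripe with three columns: two data blocks and one parity block.
d0, d1 = b"\x0f\xf0", b"\x33\x33"
parity = xor_blocks([d0, d1])

# Suppose the subdisk holding d1 is stale: rebuild it from d0 and the parity block.
print(rebuild_stale_column([d0, parity]) == d1)  # True
```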
To recover a stale subdisk in the RAID-5 volume
A RAID-5 volume that has multiple stale subdisks can be recovered in one operation. To recover multiple stale subdisks, use the recover command on the volume: