Hot-relocation allows a system to react automatically to I/O failures on redundant (mirrored or RAID-5) VxVM objects, and to restore redundancy and access to those objects. VxVM detects I/O failures on objects and relocates the affected subdisks to disks designated as spare disks or to free space within the disk group. VxVM then reconstructs the objects that existed before the failure and makes them redundant and accessible again.
When a partial disk failure occurs (that is, a failure affecting only some subdisks on a disk), redundant data on the failed portion of the disk is relocated. Existing volumes on the unaffected portions of the disk remain accessible.
Hot-relocation is only performed for redundant (mirrored or RAID-5) subdisks on a failed disk. Non-redundant subdisks on a failed disk are not relocated, but the system administrator is notified of their failure.
The hot-relocation daemon, vxrelocd, detects and reacts to VxVM events that signify the following types of failures:
Disk failure: This is normally detected as a result of an I/O failure from a VxVM object. VxVM attempts to correct the error. If the error cannot be corrected, VxVM tries to access configuration information in the private region of the disk. If it cannot access the private region, it considers the disk failed.
Plex failure: This is normally detected as a result of an uncorrectable I/O error in the plex (which affects subdisks within the plex). For mirrored volumes, the plex is detached.
Subdisk failure: This is normally detected as a result of an uncorrectable I/O error. The subdisk is detached.
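The detection sequence described for a disk failure can be sketched as a small decision function. This is only an illustrative model in Python, not VxVM code: the names `FailureOutcome` and `classify_io_error` are hypothetical, and the final branch (an uncorrectable error on a disk whose private region is still readable is treated as a plex or subdisk failure) is an assumption inferred from the descriptions above.

```python
from enum import Enum


class FailureOutcome(Enum):
    """Hypothetical labels for the outcomes described in the text."""
    CORRECTED = "I/O error corrected; nothing to report"
    DISK_FAILED = "private region unreadable; disk considered failed"
    OBJECT_FAILED = "disk still readable; affected plex or subdisk detached"


def classify_io_error(corrected: bool, private_region_readable: bool) -> FailureOutcome:
    """Model the order of checks made after an I/O failure on a VxVM object."""
    # Step 1: VxVM first attempts to correct the error.
    if corrected:
        return FailureOutcome.CORRECTED
    # Step 2: if the error cannot be corrected, VxVM tries to read the
    # configuration information in the private region of the disk.
    if not private_region_readable:
        return FailureOutcome.DISK_FAILED
    # Assumption (not stated explicitly in the text): the disk itself is
    # still reachable, so the failure is limited to a plex or subdisk.
    return FailureOutcome.OBJECT_FAILED
```

The point of the sketch is the ordering: the private region is consulted only after error correction has failed, and a disk is declared failed only when that region cannot be accessed.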
When vxrelocd detects such a failure, it performs the following steps:
vxrelocd informs the system administrator (and other nominated users) by electronic mail of the failure and which VxVM objects are affected.
vxrelocd next determines if any subdisks can be relocated. vxrelocd looks for suitable space on disks that have been reserved as hot-relocation spares (marked spare) in the disk group where the failure occurred. It then relocates the subdisks to use this space.
If no spare disks are available or additional space is needed, vxrelocd uses free space on disks in the same disk group, except those disks that have been excluded for hot-relocation use (marked nohotuse). When vxrelocd has relocated the subdisks, it reattaches each relocated subdisk to its plex.
Finally, vxrelocd initiates the appropriate recovery procedures, such as mirror resynchronization for mirrored volumes or data recovery for RAID-5 volumes. It also notifies the system administrator of the hot-relocation and recovery actions that have been taken.
If relocation is not possible, vxrelocd notifies the system administrator and takes no further action.
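The order in which space is considered (spare disks first, then free space on disks not marked nohotuse) can be illustrated with a short Python model. This is a sketch under stated assumptions, not the vxrelocd implementation: the `Disk` class and `plan_relocation` function are hypothetical, sizes are arbitrary block counts, and each subdisk is assumed to be placed whole on a single disk.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Disk:
    """Hypothetical model of a disk in the disk group where the failure occurred."""
    name: str
    free_blocks: int
    spare: bool = False      # reserved as a hot-relocation spare
    nohotuse: bool = False   # excluded from hot-relocation use


def plan_relocation(subdisks: Dict[str, int], disks: List[Disk]) -> Dict[str, Optional[str]]:
    """Map each failing subdisk to a target disk, or to None if no space fits."""
    free = {d.name: d.free_blocks for d in disks}
    # Spare disks are searched first.
    spares = [d.name for d in disks if d.spare]
    # Free space on other disks is the fallback, skipping disks marked nohotuse.
    fallback = [d.name for d in disks if not d.spare and not d.nohotuse]

    plan: Dict[str, Optional[str]] = {}
    for sd_name, size in subdisks.items():
        target = next((n for n in spares + fallback if free[n] >= size), None)
        if target is not None:
            free[target] -= size  # reserve the space for this subdisk
        plan[sd_name] = target
    return plan


disks = [
    Disk("disk01", free_blocks=100, spare=True),
    Disk("disk02", free_blocks=500),
    Disk("disk03", free_blocks=1000, nohotuse=True),
]
plan = plan_relocation({"sd1": 80, "sd2": 300, "sd3": 800}, disks)
```

In this example `sd1` fits on the spare disk, `sd2` falls back to free space on `disk02`, and `sd3` gets no target because the only disk large enough is marked nohotuse. The model ignores the placement restrictions that apply to mirrors and RAID-5 logs.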
Hot-relocation does not guarantee the same layout of data or the same performance after relocation. An administrator should check whether any configuration changes are required after hot-relocation occurs.
Relocation of failing subdisks is not possible in the following cases:
The failing subdisks are on non-redundant volumes (that is, volumes of types other than mirrored or RAID-5).
There are insufficient spare disks or free disk space in the disk group.
The only available space is on a disk that already contains a mirror of the failing plex.
The only available space is on a disk that already contains the RAID-5 log plex or one of its healthy subdisks. Failing subdisks in the RAID-5 plex cannot be relocated.
If a mirrored volume has a dirty region logging (DRL) log subdisk as part of its data plex, failing subdisks belonging to that plex cannot be relocated.
If a RAID-5 volume log plex or a mirrored volume DRL log plex fails, a new log plex is created elsewhere. There is no need to relocate the failed subdisks of the log plex.
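The conditions above can be summarized as an eligibility check. The following Python sketch is illustrative only: `FailingSubdisk`, its fields, and `relocation_blocker` are hypothetical names, and the caller is assumed to already know which disks have enough space and which disks hold conflicting data (a mirror of the failing plex, the RAID-5 log plex, or a healthy subdisk of the RAID-5 plex).

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class FailingSubdisk:
    """Hypothetical description of a failing subdisk and its surroundings."""
    layout: str                            # "mirror", "raid5", or any other value
    in_log_plex: bool = False              # belongs to a RAID-5 log or DRL log plex
    plex_has_drl_log_subdisk: bool = False # data plex also carries a DRL log subdisk
    disks_with_space: Set[str] = field(default_factory=set)
    conflicting_disks: Set[str] = field(default_factory=set)


def relocation_blocker(sd: FailingSubdisk) -> Optional[str]:
    """Return the reason relocation is not possible, or None if it can proceed."""
    if sd.layout not in ("mirror", "raid5"):
        return "volume is not redundant"
    if sd.in_log_plex:
        return "failed log plex is recreated elsewhere instead of being relocated"
    if sd.layout == "mirror" and sd.plex_has_drl_log_subdisk:
        return "data plex contains a DRL log subdisk"
    if not sd.disks_with_space:
        return "insufficient spare disks or free space in the disk group"
    # Space only counts if it is on a disk that holds no conflicting data.
    if not (sd.disks_with_space - sd.conflicting_disks):
        return "only available space is on a disk holding conflicting data"
    return None
```

For instance, a subdisk of a mirrored volume whose only candidate disk already holds the other mirror is reported as blocked, while the same subdisk with one additional unrelated disk available yields `None`.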
See the vxrelocd(1M) manual page.
The figure "Example of hot-relocation for a subdisk in a RAID-5 volume" shows the hot-relocation process in the case of the failure of a single subdisk of a RAID-5 volume.