vxrelocd - monitor Veritas Volume Manager for failure events and relocate failed subdisks
/etc/vx/bin/vxrelocd [-o vxrecover_argument] [-O old_version] [-s save_max] [mail_address...]
The vxrelocd command monitors Veritas Volume Manager (VxVM) by analyzing the output of the vxnotify command, and waits for a failure. When a failure occurs, vxrelocd sends mail via mailx to root (by default) or to other specified users and relocates failed subdisks. After completing the relocation, vxrelocd sends more mail indicating the status of each subdisk replacement. The vxrecover utility is then run on volumes with relocated subdisks to restore data. Mail is sent after vxrecover executes.
The -o option and its argument are passed directly to vxrecover if vxrecover is called. This allows specifying -o slow[=iodelay] to keep vxrecover from overloading a busy system during recovery. The default value for the delay is 250 milliseconds.
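As a minimal sketch of how such a startup line might be assembled, the helper below prints (but does not run) a vxrelocd command line that passes -o slow=iodelay through to vxrecover. The function name build_vxrelocd_cmd and the 500 millisecond delay are illustrative assumptions, not part of VxVM.

```shell
# Hypothetical helper (not part of VxVM): print a vxrelocd startup line
# that forwards "-o slow=<delay>" to vxrecover. Nothing is executed.
build_vxrelocd_cmd() {
    delay_ms=$1      # I/O delay in milliseconds, forwarded to vxrecover
    shift            # remaining arguments are mail addresses
    printf 'nohup vxrelocd -o slow=%s %s &\n' "$delay_ms" "$*"
}

# Example: a 500 ms delay, with mail sent to root and user1
build_vxrelocd_cmd 500 root user1
```

The printed line can be reviewed before it is placed in the startup script.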
The -O option reverts to an older version. Specifying -O VxVM_version directs vxrelocd to use the relocation scheme of that VxVM version.
Before vxrelocd attempts a relocation, a snapshot of the current configuration is saved in /etc/vx/saveconfig.d. The -s option specifies the maximum number of configurations (save_max) to keep for each disk group. The default is 32.
By default, vxrelocd sends mail to root with information about a detected failure and the status of any relocation and recovery attempts. To send mail to other users, add the user login name to the vxrelocd startup line in the startup script /etc/init.d/vxvm-recover, and reboot the system. For example, if the line appears as:
nohup vxrelocd root &
and you want mail also to be sent to user1 and user2, change the line to read:
nohup vxrelocd root user1 user2 &
Alternatively, you can kill the vxrelocd process and restart it as vxrelocd root mail_address, where mail_address is a user's login name. Do not kill the vxrelocd process while a relocation attempt is in progress.
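The startup-line change described above could be previewed with a small sed filter, sketched here. It assumes the line has exactly the form "nohup vxrelocd ... &" (optionally indented) and that the login names contain no characters special to sed. The function name add_recipients is hypothetical, and the sketch only prints the modified text, leaving the file itself unchanged.

```shell
# Hypothetical helper: print the contents of a startup script with extra
# mail recipients appended to the vxrelocd line. The file is not modified.
add_recipients() {
    file=$1          # path to (a copy of) the startup script
    shift            # remaining arguments are login names to add
    # Capture everything up to the trailing " &" and re-emit it with
    # the new names inserted before the ampersand.
    sed "s/^\([[:space:]]*nohup vxrelocd .*\) &\$/\1 $* \&/" "$file"
}

# Usage (assumed path from the text above):
#   add_recipients /etc/init.d/vxvm-recover user1 user2
```

Writing the result to a temporary file and comparing it with the original before replacing the script keeps the change easy to verify.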
The mail notification that is sent when a failure is detected follows this format:
Failures have been detected by the Veritas Volume Manager:
failed log plexes:
The Volume Manager will attempt to find spare disks,
relocate failed subdisks and then recover the data
in the failed plexes.
The medianame list under failed disks specifies disks that appear to have completely failed; the medianame list under failing disks indicates a partial disk failure or a disk that is in the process of failing. When a disk has failed completely, the same medianame list appears under both failed disks and failing disks. The plexname list under failed plexes shows plexes that were detached because of I/O failures on the subdisks they contain. The plexname list under failed log plexes indicates RAID-5 or DRL (dirty region logging) log plexes that have failed. The subdiskname list specifies subdisks in RAID-5 volumes that were detached due to I/O errors.
A disk can be marked as "spare." This makes the disk available as a site for relocating failed subdisks. Disks that are marked as spares are not used for normal allocations unless you explicitly specify them. This ensures that there is a pool of spare space available for relocating failed subdisks and that this space does not get consumed by normal operations. Spare space is the first space used to relocate failed subdisks. However, if no spare space is available, or the available spare space is not suitable or sufficient, free space is also used, except for free space on disks marked with the nohotuse flag. See the vxedit(1M) and vxdiskadm(1M) manual pages for more information on marking a disk as a spare or nohotuse.
A disk can be marked as "nohotuse." This excludes the disk from being used by vxrelocd, but it is still available as free space. See the vxedit(1M) and vxdiskadm(1M) manual pages for more information on marking a disk as a spare or nohotuse.
After mail is sent, vxrelocd relocates failed subdisks (those listed in the subdisks list). This requires finding appropriate spare or free space in the same disk group as the failed subdisk. A disk is eligible as replacement space if it is a valid Veritas Volume Manager disk (VM disk) and contains enough space to hold the data contained in the failed subdisk. If no space is available on spare disks, the relocation uses free space that is not marked nohotuse.
To determine which of the eligible disks to use, vxrelocd first tries the disk that is closest to the failed disk. The value of "closeness" depends on the controller of the failed disk. A disk on the same controller as the failed disk is closer than a disk on a different controller.
vxrelocd moves all subdisks from a failing drive to the same destination disk if possible.
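The controller-based notion of closeness can be illustrated with a short sketch. It is not vxrelocd's actual selection code: the function name pick_closest, the "name:controller" candidate format, and the disk and controller names are all assumptions made for the example. The sketch prefers a candidate on the failed disk's controller and otherwise falls back to the first candidate listed.

```shell
# Illustrative only: choose a replacement disk, preferring one on the
# same controller as the failed disk. Candidates are "name:controller".
pick_closest() {
    failed_ctlr=$1   # controller of the failed disk, e.g. c1
    shift
    fallback=
    for cand in "$@"; do
        name=${cand%%:*}     # text before the first colon
        ctlr=${cand##*:}     # text after the last colon
        if [ "$ctlr" = "$failed_ctlr" ]; then
            echo "$name"     # same controller: closest, stop here
            return 0
        fi
        if [ -z "$fallback" ]; then
            fallback=$name   # remember the first candidate seen
        fi
    done
    if [ -n "$fallback" ]; then
        echo "$fallback"
    fi
}

# Example with hypothetical disks: the failed disk was on controller c1
pick_closest c1 disk03:c2 disk04:c1
```

In this example disk04 is chosen because it shares controller c1 with the failed disk.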
If no spare or free space is found, mail is sent explaining the disposition of volumes that had storage on the failed disk:
Hot-relocation was not successful for subdisks on disk
dm_name in volume v_name in disk group dg_name
No replacement was made and the disk is still unusable.
The following volumes have storage on medianame
These volumes are still usable, but the redundancy of
those volumes is reduced. Any RAID-5 volumes with storage
on the failed disk may become unusable in the face of
If any non-RAID-5 volumes were made unusable due to the disk failure, the following message is included:
The following volumes:
have data on medianame
but have no other usable mirrors on
other disks. These volumes are now unusable and the data on
them is unavailable. These volumes must have their data restored.
If any RAID-5 volumes were made unavailable due to the disk failure, the following message is included:
The following RAID-5 volumes:
had storage on medianame
and have experienced
other failures. These RAID-5 volumes are now unusable
and data on them is unavailable. These RAID-5 volumes must
have their data restored.
If there is spare space available, a snapshot of the current configuration is saved in /etc/vx/saveconfig.d/dg_name.yymmdd_hhmmss.mpvsh before attempting a subdisk relocation. Relocation requires setting up a subdisk on the spare or free space not marked with nohotuse and using it to replace the failed subdisk. If this is successful, the vxrecover command runs in the background to recover the data in volumes that had storage on the disk.
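To see how many snapshots have accumulated for each disk group, a listing along the following lines may help. It is a sketch that assumes every entry in the directory follows the dg_name.yymmdd_hhmmss.mpvsh naming shown above; the function name count_snapshots is hypothetical.

```shell
# Hypothetical helper: count saved configuration snapshots per disk group.
# Assumes entries are named dg_name.yymmdd_hhmmss.mpvsh.
count_snapshots() {
    dir=$1
    for f in "$dir"/*.mpvsh; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern if no match
        base=${f##*/}             # strip the directory part
        dg=${base%.*}             # drop ".mpvsh"
        dg=${dg%.*}               # drop ".yymmdd_hhmmss"
        echo "$dg"
    done | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage:
#   count_snapshots /etc/vx/saveconfig.d
```

Each output line gives a disk group name followed by its snapshot count, which can be compared against the save_max limit.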
If the relocation fails, the following message is sent:
Hot-relocation was not successful for subdisks
on disk dm_name in volume v_name
. No replacement was made
and the disk is still unusable.
If any volumes (RAID-5 or otherwise) become unusable due to the failure, the following message is included:
The following volumes:
have data on dm_name
but have no other usable mirrors on other
disks. These volumes are now unusable and the data on them is
unavailable. These volumes must have their data restored.
If the relocation procedure was successful and recovery has begun, the following mail message is sent:
Volume v_name Subdisk sd_name
, but not yet recovered.
After recovery completes, a mail message is sent relaying the result of the recovery procedure. If the recovery is successful, the following message is included in the mail:
Recovery complete for volume v_name
If the recovery was not successful, the following message is included in the mail:
Failure recovering v_name in disk group dg_name
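The two recovery result messages lend themselves to simple scanning of a saved mail body. The filter below is a sketch that assumes each message begins at the start of a line exactly as shown above; the function name recovery_status and the "ok"/"failed" labels are illustrative choices.

```shell
# Hypothetical filter: read a mail body on standard input and report
# the recovery result for each volume mentioned.
recovery_status() {
    awk '
        /^Recovery complete for volume / { print "ok " $5 }
        /^Failure recovering /           { print "failed " $3 }
    '
}

# Usage:
#   recovery_status < saved_mail_message
```

Lines that match neither message are ignored, so the filter can be run over a whole message or mailbox file.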
If you do not want automatic subdisk relocation, you can disable the hot-relocation feature by killing the relocation daemon, vxrelocd, and preventing it from restarting. However, do not kill the daemon while a relocation is in progress. To kill the daemon, list the running processes from the command line (see ps(1)) and find the two entries for vxrelocd. Then execute the command:
kill -9 PID1 PID2
(substituting PID1 and PID2 with the process IDs for the two vxrelocd processes). To prevent vxrelocd from being started again, you must comment out the line that starts up vxrelocd in the startup script /etc/init.d/vxvm-recover.
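Finding the process IDs can be sketched as a filter over a full process listing. The example assumes the listing has the process ID in its second column, as ps -ef output normally does; the function name vxrelocd_pids is hypothetical.

```shell
# Hypothetical filter: print the PIDs of vxrelocd processes from a
# "ps -ef"-style listing read on standard input (PID in column 2).
vxrelocd_pids() {
    # Skip any grep command line that merely mentions vxrelocd.
    awk '$0 ~ /vxrelocd/ && $0 !~ /grep/ { print $2 }'
}

# Usage:
#   ps -ef | vxrelocd_pids
```

Confirm that no relocation is in progress before passing the reported process IDs to kill.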
/etc/init.d/vxvm-recover  The startup file for vxrelocd.
/etc/vx/saveconfig.d/dg_name.yymmdd_hhmmss.mpvsh  File where vxrelocd saves a snapshot of the current configuration before performing a relocation.
kill(1), mail(1), ps(1), vxdiskadm(1M), vxedit(1M), vxintro(1M), vxnotify(1M), vxrecover(1M), vxsparecheck(1M), vxunreloc(1M)