vxrelocd (1M)

NAME

vxrelocd - monitor Veritas Volume Manager for failure events, relocate failed subdisks, and reclaim the storage space of deleted volumes on thin-provisioned storage

SYNOPSIS

/etc/vx/bin/vxrelocd [-o vxrecover_argument] [-O old_version] [-s save_max] [mail_address...]

DESCRIPTION

The vxrelocd command monitors Veritas Volume Manager (VxVM) by analyzing the output of the vxnotify command, and waits for a failure. When a failure occurs, vxrelocd sends mail via mailx to root (by default) or to other specified users and relocates failed subdisks. After completing the relocation, vxrelocd sends more mail indicating the status of each subdisk replacement. The vxrecover utility is then run on volumes with relocated subdisks to restore data. Mail is sent after vxrecover executes.

In addition, vxrelocd performs the storage reclaim operation at regular intervals set by the user. On thin-provisioned storage, a deleted volume may still hold allocated storage space that needs to be reclaimed. By default, the reclaim operation starts once a day at 22:00. This schedule avoids overloading the array with reclaim operations during the busy hours of the day.

The tunables to control the reclaim operation are reclaim_on_delete_wait_period and reclaim_on_delete_start_time.

reclaim_on_delete_wait_period is set to 1 by default, which means that the storage space of a deleted volume is reclaimed after 1 full day. If the value is greater than 366, the storage space is not reclaimed automatically and the administrator must reclaim it manually with the vxdisk reclaim command. If the value is -1, the space is reclaimed immediately after the volume is deleted. reclaim_on_delete_start_time is set to 22:00 by default, which means that the reclaim operation starts at 22:00 every day.

The tunables are set in the default file /etc/default/vxsf.

By default, if a volume is deleted, its storage space is reclaimed on the next day at 22:00.
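The value ranges of reclaim_on_delete_wait_period described above can be restated as a small shell sketch. The function name describe_wait_period is hypothetical and is not part of VxVM; it only mirrors the rules given in this page. Values that this page does not describe (0 and anything below -1) are reported as unspecified rather than guessed.

```shell
# Hypothetical helper (not shipped with VxVM): report how a given
# reclaim_on_delete_wait_period value is treated, per the rules in this page.
describe_wait_period() {
    period=$1
    if [ "$period" -eq -1 ]; then
        # -1: reclaim immediately after the volume is deleted
        echo "immediate"
    elif [ "$period" -gt 366 ]; then
        # more than 366: no automatic reclaim; use vxdisk reclaim by hand
        echo "manual"
    elif [ "$period" -ge 1 ]; then
        # 1 to 366: reclaim after that many full days, at the start time
        echo "after $period day(s)"
    else
        # 0 and values below -1 are not described in this page
        echo "unspecified"
    fi
}
```

For example, describe_wait_period 1 reports the default behavior, and describe_wait_period 400 reports that manual reclamation is needed.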

OPTIONS

-o The -o option and its argument are passed directly to vxrecover if vxrecover is called. This allows specifying -o slow[=iodelay] to keep vxrecover from overloading a busy system during recovery. The default value for the delay is 250 milliseconds.
-O Reverts to an older version. Specifying -O old_version directs vxrelocd to use the relocation scheme of that VxVM version.
-s Before vxrelocd attempts a relocation, a snapshot of the current configuration is saved in /etc/vx/saveconfig.d. The -s save_max option specifies the maximum number of configurations to keep for each disk group. The default is 32.

Mail Notification

By default, vxrelocd sends mail to root with information about a detected failure and the status of any relocation and recovery attempts. To send mail to other users, add the user login name to the vxrelocd startup line in the startup script /etc/init.d/vxvm-recover, and reboot the system. For example, if the line appears as:

vxrelocd root &

and you want mail also to be sent to user1 and user2, change the line to read:


vxrelocd root user1 user2 &

Alternatively, you can kill the vxrelocd process and restart it as vxrelocd root mail_address, where mail_address is a user’s login name. Do not kill the vxrelocd process while a relocation attempt is in progress.
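The edit to the startup line can be previewed with a short sed sketch. The function name add_vxrelocd_recipients is hypothetical, and the pattern assumes the startup line has exactly the form shown above (vxrelocd, a recipient list, and a trailing &); the line in an installed startup script may differ. The function prints the modified text to standard output and does not change the file.

```shell
# Hypothetical helper: print FILE with extra mail recipients appended to
# the vxrelocd startup line. The file itself is left untouched.
# Usage: add_vxrelocd_recipients FILE USER...
add_vxrelocd_recipients() {
    file=$1
    shift
    users=$*
    # Insert the new login names just before the trailing "&".
    # Assumes login names contain no "/" or "&" characters.
    sed "s/^\([[:space:]]*vxrelocd .*[^[:space:]]\)[[:space:]]*&[[:space:]]*\$/\1 $users \&/" "$file"
}
```

Running add_vxrelocd_recipients /etc/init.d/vxvm-recover user1 user2 and comparing the output with the original file shows the change before it is applied by hand.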

The mail notification that is sent when a failure is detected follows this format:


Failures have been detected by the Veritas Volume Manager:

failed disks: medianame ...
failed plexes: plexname ...
failed log plexes: plexname ...
failing disks: medianame ...
failed subdisks: subdiskname ...

The Volume Manager will attempt to find spare disks, relocate failed subdisks and then recover the data in the failed plexes.

The medianame list under failed disks specifies disks that appear to have completely failed; the medianame list under failing disks indicates a partial disk failure or a disk that is in the process of failing. When a disk has failed completely, the same medianame list appears under both failed disks and failing disks. The plexname list under failed plexes shows plexes that were detached because of I/O failures on subdisks they contain. The plexname list under failed log plexes indicates RAID-5 or DRL (dirty region logging) log plexes that have failed. The subdiskname list under failed subdisks specifies subdisks in RAID-5 volumes that were detached due to I/O errors.

Spare Space

A disk can be marked as "spare". This makes the disk available as a site for relocating failed subdisks. Disks that are marked as spares are not used for normal allocations unless you explicitly specify them. This ensures that there is a pool of spare space available for relocating failed subdisks and that this space does not get consumed by normal operations. Spare space is the first space used to relocate failed subdisks. However, if no spare space is available, or the available spare space is not suitable or sufficient, free space is also used, except on disks marked with the nohotuse flag. See the vxedit(1M) and vxdiskadm(1M) manual pages for more information on marking a disk as a spare or nohotuse.

Nohotuse Space

A disk can be marked as "nohotuse". This excludes the disk from being used by vxrelocd, but it is still available as free space. See the vxedit(1M) and vxdiskadm(1M) manual pages for more information on marking a disk as a spare or nohotuse.

Replacement Procedure

After mail is sent, vxrelocd relocates failed subdisks (those listed under failed subdisks in the mail). This requires finding appropriate spare or free space in the same disk group as the failed subdisk. A disk is eligible as replacement space if it is a valid Veritas Volume Manager disk (VM disk) and contains enough space to hold the data contained in the failed subdisk. If no space is available on spare disks, the relocation uses free space that is not marked nohotuse.

To determine which of the eligible disks to use, vxrelocd first tries the disk that is closest to the failed disk. The value of "closeness" depends on the controller and disk number of the failed disk. A disk on the same controller as the failed disk is closer than a disk on a different controller.

vxrelocd moves all subdisks from a failing drive to the same destination disk if possible.

If no spare or free space is found, mail is sent explaining the disposition of volumes that had storage on the failed disk:


Hot-relocation was not successful for subdisks on disk dm_name in volume v_name in disk group dg_name. No replacement was made and the disk is still unusable.

The following volumes have storage on medianame:

volumename ...

These volumes are still usable, but the redundancy of those volumes is reduced. Any RAID-5 volumes with storage on the failed disk may become unusable in the face of further failures.

If any non-RAID-5 volumes were made unusable due to the disk failure, the following message is included:


The following volumes:

volumename ...

have data on medianame but have no other usable mirrors on other disks. These volumes are now unusable and the data on them is unavailable. These volumes must have their data restored.

If any RAID-5 volumes were made unavailable due to the disk failure, the following message is included:


The following RAID-5 volumes:

volumename ...

had storage on medianame and have experienced other failures. These RAID-5 volumes are now unusable and data on them is unavailable. These RAID-5 volumes must have their data restored.

If there is spare space available, a snapshot of the current configuration is saved in /etc/vx/saveconfig.d/dg_name.yymmdd_hhmmss.mpvsh before attempting a subdisk relocation. Relocation requires setting up a subdisk on the spare or free space not marked with nohotuse and using it to replace the failed subdisk. If this is successful, the vxrecover command runs in the background to recover the data in volumes that had storage on the disk.
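Because the saved configuration names embed a yymmdd_hhmmss timestamp, the most recent snapshot for a disk group can be found with a plain text sort. The following is a minimal sketch; the function name latest_saved_config is hypothetical and not part of VxVM, and it assumes the file naming shown above.

```shell
# Hypothetical helper: print the newest saved configuration for a disk group.
# Usage: latest_saved_config DG_NAME [DIR]
latest_saved_config() {
    dg=$1
    dir=${2:-/etc/vx/saveconfig.d}
    # yymmdd_hhmmss sorts chronologically as plain text (within one century)
    for f in "$dir/$dg".*.mpvsh; do
        # Skip the unexpanded pattern when no file matches
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort | tail -n 1
}
```

For example, latest_saved_config mydg prints the path of the last snapshot saved for a disk group named mydg (a placeholder name), or nothing if no snapshot exists.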

If the relocation fails, the following message is sent:


Hot-relocation was not successful for subdisks on disk dm_name in volume v_name in disk group dg_name. No replacement was made and the disk is still unusable.

If any volumes (RAID-5 or otherwise) become unusable due to the failure, the following message is included:


The following volumes:

volumename ...

have data on dm_name but have no other usable mirrors on other disks. These volumes are now unusable and the data on them is unavailable. These volumes must have their data restored.

If the relocation procedure was successful and recovery has begun, the following mail message is sent:


Volume v_name Subdisk sd_name relocated to newsd_name, but not yet recovered.

After recovery completes, a mail message is sent relaying the result of the recovery procedure. If the recovery is successful, the following message is included in the mail:


Recovery complete for volume v_name in disk group dg_name.

If the recovery was not successful, the following message is included in the mail:


Failure recovering v_name in disk group dg_name.

Disabling vxrelocd

If you do not want automatic subdisk relocation, you can disable the hot-relocation feature by killing the relocation daemon, vxrelocd, and preventing it from restarting. However, do not kill the daemon while a relocation is in progress. To kill the daemon, run the command:

ps -ef

from the command line and find the two entries for vxrelocd. Execute the command:


kill -9 PID1 PID2

(replacing PID1 and PID2 with the process IDs of the two vxrelocd processes). To prevent vxrelocd from starting again, comment out the line that starts vxrelocd in the startup script /etc/init.d/vxvm-recover.
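The two manual steps, finding the process IDs and commenting out the startup line, can be sketched as shell helpers. Both function names are hypothetical and not part of VxVM. The first assumes the standard ps -ef column layout, where the process ID is the second field; the second assumes the startup line begins with vxrelocd, as in the examples in this page. Neither helper kills a process or modifies a file.

```shell
# Hypothetical helper: read "ps -ef" output on stdin and print the PIDs
# (second column) of lines that mention vxrelocd, skipping the awk and
# grep processes that may appear in the same listing.
vxrelocd_pids() {
    awk '$0 ~ /vxrelocd/ && $0 !~ /awk|grep/ { print $2 }'
}

# Hypothetical helper: print FILE with the vxrelocd startup line
# commented out. The file itself is left untouched.
comment_out_vxrelocd() {
    sed 's/^\([[:space:]]*\)\(vxrelocd[[:space:]].*\)$/\1# \2/' "$1"
}
```

For example, ps -ef | vxrelocd_pids lists the candidate process IDs, and comment_out_vxrelocd /etc/init.d/vxvm-recover previews the edited startup script so that it can be reviewed before the change is made.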

FILES

/etc/init.d/vxvm-recover
  The startup file for vxrelocd.
/etc/vx/saveconfig.d/dg_name.yymmdd_hhmmss.mpvsh
  File where vxrelocd saves a snapshot of the current configuration before performing a relocation.

SEE ALSO

kill(1), mailx(1), ps(1), vxdiskadm(1M), vxedit(1M), vxintro(1M), vxnotify(1M), vxrecover(1M), vxsparecheck(1M), vxunreloc(1M)


VxVM 4.0 vxrelocd (1M)