Recovering from Primary data volume error




If a write to a Primary data volume fails, the data volume is detached. The RVG continues to function as before, providing access to the other volumes in the RVG. Writes to the failed volume return an error and are not logged in the SRL.

RLINKs are not affected by a data volume failure. If the SRL was not empty at the time of the volume error, the updates it holds continue to flow from the SRL to the Secondary RLINKs. Any writes to the failed volume that the application saw as completed but that were not written to the volume remain in the SRL. These writes are marked as pending in the SRL and are replayed to the volume when the volume is recovered. If the volume is instead restored from backup and restarted, these pending writes are discarded.

If the data volume had a permanent failure, such as damaged hardware, you must recover from backup. Recovery from this failure consists of two parts:

1. Restoring the Primary data volume from backup
2. Resynchronizing any Secondary RLINKs

If the RVG contains a database, recovery of the failed data volume must be coordinated with the recovery requirements of the database. The details of the database recovery sequence determine what must be done to synchronize Secondary RLINKs. See Example 1 and Example 2 for detailed examples of recovery procedures.

As an alternative, you can transfer the Primary role to the Secondary. See Example 3 for more information.

If the data volume failed due to a temporary outage such as a disconnected cable, and you are sure that there is no permanent hardware damage, you can start the data volume without dissociating it from the RVG. The pending writes in the SRL are replayed to the data volume. See Example 4 for an example of the recovery procedure.

Example 1

In this example, all the RLINKs are detached before recovery of the failure begins on the Primary. When recovery of the failure is complete, including any database recovery procedures, all the RLINKs must be synchronized using a Primary checkpoint.

On the Primary (seattle):

1. Detach all RLINKs. Repeat this command for each RLINK on the Primary:

    # vxrlink -g hrdg det rlk_london_hr_rvg

2. Fix or repair the data volume.

   If the data volume can be repaired by repairing its underlying subdisks, you need not dissociate the data volume from the RVG. If the problem is fixed by dissociating the failed volume and associating a new one in its place, the dissociation and association must be done while the RVG is stopped.

3. Make sure the data volume is started before restarting the RVG:

    # vxvol -g hrdg start hr_dv01
    # vxrvg -g hrdg start hr_rvg

4. Restore the database.
5. Synchronize all the RLINKs using block-level backup and checkpointing.
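The command-line part of Example 1 can be collected into a small shell sketch. It assumes the object names used in the example (disk group hrdg, RVG hr_rvg, data volume hr_dv01, RLINK rlk_london_hr_rvg) and adds a hypothetical `run` helper that only prints each command when `DRY_RUN=1`, so the sequence can be reviewed before it touches the system. The checkpoint commands for step 5 (`checkstart`, `checkend`, `att` with `-c`) and the checkpoint name `checkpt_presync` are assumptions based on the vxrvg and vxrlink manual pages, not part of this procedure; confirm them against this guide's section on synchronizing with a Primary checkpoint. The repair, database restore, and block-level backup remain manual steps.

```shell
# Sketch of Example 1: RLINKs detached, resynchronized with a Primary checkpoint.
# Object names default to the example's; override them via the environment.
DG=${DG:-hrdg}
RVG=${RVG:-hr_rvg}
DATAVOL=${DATAVOL:-hr_dv01}

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '# %s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: detach every RLINK named on the command line.
detach_rlinks() {
    for rlink in "$@"; do
        run vxrlink -g "$DG" det "$rlink" || return 1
    done
}

# Step 3: start the repaired data volume first, then the RVG.
restart_volume_and_rvg() {
    run vxvol -g "$DG" start "$DATAVOL" || return 1
    run vxrvg -g "$DG" start "$RVG"
}

# Step 5 (ASSUMED syntax, verify against vxrvg(1M) and vxrlink(1M)):
# bracket the block-level backup with a Primary checkpoint; after the
# backup is restored on the Secondary, attach the RLINK from that checkpoint.
checkpoint_start() { run vxrvg -g "$DG" -c "$1" checkstart "$RVG"; }
checkpoint_end() { run vxrvg -g "$DG" checkend "$RVG"; }
attach_from_checkpoint() { run vxrlink -g "$DG" -c "$1" att "$2"; }
```

With `DRY_RUN=1` exported, calling `detach_rlinks rlk_london_hr_rvg` prints `# vxrlink -g hrdg det rlk_london_hr_rvg`; unset `DRY_RUN` to run the commands for real as root on seattle.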

Example 2

This example does the minimum needed to repair data volume errors, leaving all RLINKs attached. The failed volume's data is restored from backup and the database is recovered while the RLINKs are live. Because every change on the Primary is replicated, all the Secondaries are consistent with the Primary once those changes have been replicated. This method may not always be practical because it can require replicating large amounts of data. The repaired data volume must also be carefully tested on every target database to be supported.

On the Primary (seattle):

1. Stop the RVG:

    # vxrvg -g hrdg stop hr_rvg

2. Dissociate the failed data volume from the RVG.
3. Fix or repair the data volume, or use a new volume.

   If the data volume can be repaired by repairing its underlying subdisks, you need not dissociate the data volume from the RVG. If the problem is fixed by dissociating the failed volume and associating a new one in its place, the dissociation and association must be done while the RVG is stopped.

4. Associate the volume with the RVG.
5. Make sure the data volume is started before restarting the RVG. If the data volume is not started, start it:

    # vxvol -g hrdg start hr_dv01

6. Start the RVG:

    # vxrvg -g hrdg start hr_rvg

7. Restore the database.
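Example 2 can be sketched the same way, split into the commands run before the volume repair and those run after it. Beyond the names in the example, it assumes the low-level `vxvol dis` and `vxvol assoc` forms for dissociating and associating an RVG data volume; treat those two lines as assumptions and check them against vxvol(1M). The `run` helper and the function names are hypothetical, and the repair itself and the database restore stay manual.

```shell
# Sketch of Example 2: RLINKs stay attached, so the restore is replicated.
DG=${DG:-hrdg}
RVG=${RVG:-hr_rvg}
DATAVOL=${DATAVOL:-hr_dv01}

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '# %s\n' "$*"
    else
        "$@"
    fi
}

# Steps 1-2: stop the RVG, then dissociate the failed volume.
# ASSUMED dissociation syntax; verify against vxvol(1M).
before_repair() {
    run vxrvg -g "$DG" stop "$RVG" || return 1
    run vxvol -g "$DG" dis "$DATAVOL"
}

# Steps 4-6: associate the repaired or new volume while the RVG is still
# stopped, start the volume, then start the RVG.
# ASSUMED association syntax; verify against vxvol(1M).
after_repair() {
    run vxvol -g "$DG" assoc "$RVG" "$DATAVOL" || return 1
    # Step 5 only requires starting the volume if it is not already
    # started, so the exit status of this command is deliberately ignored.
    run vxvol -g "$DG" start "$DATAVOL"
    run vxrvg -g "$DG" start "$RVG"
}
```

Call `before_repair`, repair or replace the volume, then call `after_repair` and restore the database.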

Example 3

As an alternative to the procedures described in Example 1 and Example 2, the Primary role can be transferred to a Secondary host. For details, see Chapter 7, Transferring the Primary role. After takeover, the original Primary with the failed data volume will not become acting_secondary until the failed data volume is recovered or dissociated.

Example 4

If the I/O error on the data volume is temporary and you are sure that all the existing data is intact, you can start the data volume without dissociating it from the RVG. Typical causes of such an error are a disconnected SCSI cable or a power outage on the storage. In this case, follow these steps:

1. Fix the temporary failure.
2. Start the data volume:

    # vxvol -g hrdg start hr_dv01

Any outstanding writes in the SRL are written to the data volume.
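Example 4 reduces to a single command once the outage is fixed. The sketch below wraps it in a confirmation guard, because starting the volume in place is only safe when you are sure the existing data is intact. The `CONFIRM_DATA_INTACT` variable, the `run` helper, and the function name are hypothetical safety conveniences, not part of VxVM; the object names follow the example.

```shell
# Sketch of Example 4: restart a data volume after a temporary outage.
DG=${DG:-hrdg}
DATAVOL=${DATAVOL:-hr_dv01}

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '# %s\n' "$*"
    else
        "$@"
    fi
}

# Start the data volume in place so the pending SRL writes are replayed to it.
# Refuse unless the operator confirms the outage was temporary and the data is
# intact; otherwise recover from backup as in Examples 1 to 3.
restart_after_temporary_outage() {
    if [ "${CONFIRM_DATA_INTACT:-no}" != "yes" ]; then
        echo "refusing: set CONFIRM_DATA_INTACT=yes only if the data is intact" >&2
        return 1
    fi
    run vxvol -g "$DG" start "$DATAVOL"
}
```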

