Taking over the primary role by the remote cluster



Takeover occurs when the remote cluster on the secondary site starts the application that uses replicated data. This can happen when the secondary site perceives the primary site as dead, or when the primary site becomes inaccessible (perhaps for a known reason). See the Veritas Volume Replicator Administrator's Guide for a detailed description of the concepts of taking over the primary role.

Before enabling the secondary site to take over the primary role, the administrator on the secondary site must "declare" the type of failure at the remote (in this case, primary) site. Designate the failure type with one of the options of the haclus -declare command. These options are discussed in the following sections.

disaster

When the cluster on the primary site is inaccessible and appears dead, the administrator declares the failure type as disaster. For example, a fire may destroy a data center, including the primary site and all data in its volumes. After making this declaration, the administrator can bring the service group online on the secondary site, which then takes on the primary role.

outage

When the administrator of a secondary site knows the primary site is inaccessible for a known reason, such as a temporary power outage, the administrator may declare the failure as an outage. Typically, an administrator expects the primary site to return to its original state.

After an outage is declared, the RVGSharedPri agent enables DCM (Data Change Map) logging while the secondary site holds the primary replication role. When the original primary site comes back up and returns to its original state, DCM logging allows the data to be resynchronized to the original cluster with fast fail back resynchronization.

Before using the fast fail back option to resynchronize data from the current primary site to the original primary site, take a snapshot of the original data at the original primary site as a precaution. The snapshot provides a valid copy of the data (see replica) at the original primary site in case the current primary site fails before the resynchronization completes.

disconnect

When both clusters are functioning properly but the heartbeat link between them fails, a split-brain condition exists. In this case, the administrator can declare the failure as disconnect, which means the secondary site will not attempt to take over the primary role. This declaration is advisory only: it generates a message in the VCS log indicating that the failure resulted from a network outage rather than a server outage.

replica

In the rare case where the current primary site becomes inaccessible while data is resynchronized from that site to the original primary site using the fast fail back method, the administrator at the original primary site may resort to using a data snapshot (if it exists) taken before the start of the fast fail back operation. In this case, the failure type is designated as replica.
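The four failure types above are all passed to the same haclus -declare option. The wrapper below is a minimal sketch of how an administrator might guard against typos before running the declaration. The function name declare_failure and the DRY_RUN variable are hypothetical helpers, not part of VCS. By default the function only prints the command.

```shell
# Hypothetical helper (not part of VCS): validate the failure type, then
# print or run the haclus declaration for the remote cluster.
# DRY_RUN=1 (the default here) prints the command instead of executing it.
declare_failure() {
    failure_type=$1    # disaster | outage | disconnect | replica
    remote_cluster=$2  # name of the remote (primary) cluster
    case "$failure_type" in
        disaster|outage|disconnect|replica) ;;
        *)
            echo "unknown failure type: $failure_type" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "haclus -declare $failure_type -clus $remote_cluster"
    else
        haclus -declare "$failure_type" -clus "$remote_cluster"
    fi
}
```

For example, `declare_failure outage rac_cluster101` prints the same command used in the takeover example. Set `DRY_RUN=0` to actually run it.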

Example of takeover for an outage

To take over after an outage

From any node of the secondary site, issue the haclus command:
# haclus -declare outage -clus rac_cluster101
After declaring the state of the remote cluster, bring the Oracle service group online on the secondary site. For example:
# hagrp -online -force oradb1_grp -any
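The two takeover steps above must run in order: declare first, then bring the group online. The following sketch wraps them in one function that stops if the declaration fails. The names run, takeover_after_outage, and DRY_RUN are hypothetical. Only the haclus and hagrp invocations come from the example above.

```shell
# Hypothetical dry-run helper: print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for the takeover-after-outage steps.
takeover_after_outage() {
    remote_cluster=$1  # e.g. rac_cluster101
    service_group=$2   # e.g. oradb1_grp
    # Step 1: declare the remote (primary) cluster's failure as an outage.
    run haclus -declare outage -clus "$remote_cluster" || return 1
    # Step 2: bring the service group online on any node of this site.
    run hagrp -online -force "$service_group" -any
}
```

Run it from any node of the secondary site, for example `takeover_after_outage rac_cluster101 oradb1_grp`.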

Example of resynchronization after an outage

To resynchronize after an outage

On the original primary site, create a snapshot of the RVG before resynchronizing it in case the current primary site fails during the resynchronization. Assuming the disk group is oradatadg and the RVG is rac1_rvg, type:
# vxrvg -g oradatadg -F snapshot rac1_rvg

See the Veritas Volume Replicator Administrator's Guide for details on RVG snapshots.
Resynchronize the RVG. From the CVM master node of the current primary site, issue the hares command with the -action option and the fbsync action token to resynchronize the RVGSharedPri resource. For example:
# hares -action ora_vvr_shpri fbsync -sys mercury

To determine which node is the CVM master node, type:

# vxdctl -c mode
Run one of the following commands, depending on whether the resynchronization of data from the current primary site to the original primary site succeeded:
1. If the resynchronization of data is successful, use the vxrvg command with the snapback option to reattach the snapshot volumes on the original primary site to the original volumes in the specified RVG:
  # vxrvg -g oradatadg snapback rac1_rvg
2. If the resynchronization fails (for example, if a disaster hits the primary RVG while resynchronization is in progress), the data may be inconsistent. Restore the contents of the RVG data volumes from the snapshot taken on the original primary site before the resynchronization:
  # vxrvg -g oradatadg snaprestore rac1_rvg
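On the original primary site, choosing between snapback and snaprestore is a single branch on the outcome of the resynchronization. The function below is a minimal sketch of that choice. The names run, finish_resync, and DRY_RUN are hypothetical, and the caller is assumed to know whether fbsync succeeded. The two vxrvg commands are the ones shown above.

```shell
# Hypothetical dry-run helper: print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper, run on the original primary site after fbsync.
finish_resync() {
    disk_group=$1  # e.g. oradatadg
    rvg=$2         # e.g. rac1_rvg
    resync_ok=$3   # "yes" if the resynchronization succeeded
    if [ "$resync_ok" = yes ]; then
        # Success: reattach the snapshot volumes to the original volumes.
        run vxrvg -g "$disk_group" snapback "$rvg"
    else
        # Failure: restore the RVG data volumes from the snapshot.
        run vxrvg -g "$disk_group" snaprestore "$rvg"
    fi
}
```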

