You can configure new or existing RHEV-based virtual machines for disaster recovery (DR) by setting them up and configuring VCS for DR.
To set up RHEV-based virtual machines for DR
For more information about configuring a global cluster, see the Veritas InfoScale™ Solutions Disaster Recovery Implementation Guide.
Configure any additional parameters such as NICs and virtual disk for the virtual machine.
Verify that the virtual machine turns on correctly.
Install the appropriate RHEL operating system inside the guest.
Configure the network interface with appropriate parameters such as IP address, netmask, and gateway.
Make sure that the NIC is not under Network Manager control. You can disable this setting by editing the /etc/sysconfig/network-scripts/ifcfg-eth0 file inside the virtual machine and setting NM_CONTROLLED to "no".
Make sure that the virtual machine does not have a CDROM attached to it. This is necessary since VCS sends the DR payload in the form of a CDROM to the virtual machine.
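The NM_CONTROLLED change described above can be scripted inside the guest. The following is a minimal sketch, not a Veritas-supplied tool: the function name set_nm_uncontrolled is hypothetical, and the sketch assumes the ifcfg file uses plain KEY=value lines.

```shell
# Hypothetical helper: force NM_CONTROLLED="no" in an ifcfg file.
# Usage: set_nm_uncontrolled /etc/sysconfig/network-scripts/ifcfg-eth0
set_nm_uncontrolled() {
    cfg="$1"
    [ -f "$cfg" ] || { echo "no such file: $cfg" >&2; return 1; }
    if grep -q '^NM_CONTROLLED=' "$cfg"; then
        # Rewrite the existing setting in place; sed keeps a .bak copy.
        sed -i.bak 's/^NM_CONTROLLED=.*/NM_CONTROLLED="no"/' "$cfg"
    else
        # The key is absent, so append it.
        printf '%s\n' 'NM_CONTROLLED="no"' >> "$cfg"
    fi
}
```

Run it as root against the interface file of the NIC that VCS will reconfigure. The new setting typically takes effect after the network service or the guest is restarted.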
To configure VCS for managing RHEV-based virtual machines for DR
The DROpts attribute enables you to specify site-specific networking parameters for the virtual machine such as IP Address, Netmask, Gateway, DNSServers, DNSSearchPath and Device. The Device is set to the name of the NIC as seen by the guest, for example eth0.
Verify that the ConfigureNetwork key in the DROpts attribute is set to 1.
The DROpts attribute must be set on the KVMGuest resource in both the clusters.
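As a hedged illustration of setting DROpts from the command line, the sketch below prints (and optionally runs) a single hares -modify command that replaces the whole attribute. The resource name rhev_vm1_res, the addresses, and the search path are hypothetical placeholders. The key name IPAddress (written without a space) is an assumption based on the parameter list above; confirm the exact key names in the KVMGuest agent documentation before use.

```shell
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: set site-specific DROpts on a KVMGuest resource.
# Arguments: resource, IP address, netmask, gateway, DNS server, search path, device
set_dropts() {
    run hares -modify "$1" DROpts \
        ConfigureNetwork 1 \
        IPAddress "$2" \
        Netmask "$3" \
        Gateway "$4" \
        DNSServers "$5" \
        DNSSearchPath "$6" \
        Device "$7"
}

# Placeholder values for one site; repeat with that site's values in the other cluster.
set_dropts rhev_vm1_res 192.0.2.10 255.255.255.0 192.0.2.1 192.0.2.53 example.com eth0
```

The VCS configuration must be writable (haconf -makerw) before a real run, and saved afterward (haconf -dump -makero).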
The sample preonline trigger script is located at /opt/VRTSvcs/bin/sample_triggers/VRTSvcs/preonline_rhev. Create a folder in the /opt/VRTSvcs directory on each RHEL-H host to host the trigger script. Copy the trigger script into this folder with the name "preonline". Enable the preonline trigger on the virtual machine service group by setting the PreOnline service group attribute. Also, specify the path (relative to /opt/VRTSvcs) in the TriggerPath attribute.
For example:
    group RHEV_VM_SG1 (
        SystemList = { vcslx317 = 0, vcslx373 = 1 }
        ClusterList = { test_rhevdr_pri = 0, test_rhevdr_sec = 1 }
        AutoStartList = { vcslx317 }
        TriggerPath = "bin/triggers/RHEVDR"
        PreOnline = 1
        )
For more information on setting triggers, see the Cluster Server Administrator's Guide.
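The copy step for the preonline trigger can be sketched as a small shell function to run on each RHEL-H host. The function name install_trigger is hypothetical, and making the copy executable with chmod 755 is an assumption rather than a documented requirement.

```shell
# Hypothetical helper: copy a sample trigger into the trigger folder under a new name.
# Usage: install_trigger SOURCE_SCRIPT DEST_DIR TARGET_NAME
install_trigger() {
    src="$1"; dest_dir="$2"; name="$3"
    [ -f "$src" ] || { echo "sample trigger not found: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/$name" || return 1
    # Assumption: the trigger must be executable by the VCS engine.
    chmod 755 "$dest_dir/$name"
}

# Example invocation, using the paths from the example above:
# install_trigger /opt/VRTSvcs/bin/sample_triggers/VRTSvcs/preonline_rhev \
#     /opt/VRTSvcs/bin/triggers/RHEVDR preonline
```

With this layout, the TriggerPath value "bin/triggers/RHEVDR" resolves to /opt/VRTSvcs/bin/triggers/RHEVDR.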
Add the appropriate replication resource (such as Hitachi TrueCopy or EMC SRDF). For details on the appropriate replication agent, see the Replication Agent Installation and Configuration Guide for that agent.
Add an Online Global Firm dependency from the virtual machine (VM) service group to the replication service group.
Configure the replication service group as global.
The sample postonline trigger script is located at /opt/VRTSvcs/bin/sample_triggers/VRTSvcs/postonline_rhev. Copy the postonline trigger to the same location as the preonline trigger script, with the name "postonline". Enable the postonline trigger on the replication service group by adding the POSTONLINE key to the TriggersEnabled attribute. Also, specify the path (relative to /opt/VRTSvcs) in the TriggerPath attribute.
For example:
    group SRDF_SG1 (
        SystemList = { vcslx317 = 0, vcslx373 = 1 }
        ClusterList = { test_rhevdr_pri = 0, test_rhevdr_sec = 1 }
        AutoStartList = { vcslx317 }
        TriggerPath = "bin/triggers/RHEVDR"
        TriggersEnabled = { POSTONLINE }
        )
For more information on setting triggers, see the Cluster Server Administrator's Guide.
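The trigger attributes and the group dependency described in the steps above can also be set with the VCS command line instead of editing main.cf. The following is a hedged sketch: the run wrapper and the configure_dr_groups function are hypothetical, the group names are taken from the examples above, and the exact hagrp arguments should be verified against the Cluster Server Administrator's Guide. By default the script only prints the commands.

```shell
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper.
# Usage: configure_dr_groups VM_GROUP REPLICATION_GROUP TRIGGER_PATH
configure_dr_groups() {
    vm_sg="$1"; repl_sg="$2"; trigger_path="$3"
    run haconf -makerw                                  # open the configuration for writing
    run hagrp -modify "$vm_sg" TriggerPath "$trigger_path"
    run hagrp -modify "$vm_sg" PreOnline 1              # enable the preonline trigger
    # The VM group (parent) depends on the replication group (child).
    run hagrp -link "$vm_sg" "$repl_sg" online global firm
    run hagrp -modify "$repl_sg" TriggerPath "$trigger_path"
    run hagrp -modify "$repl_sg" TriggersEnabled POSTONLINE
    run haconf -dump -makero                            # save and close the configuration
}

configure_dr_groups RHEV_VM_SG1 SRDF_SG1 bin/triggers/RHEVDR
```

Review the printed commands first, then rerun with DRY_RUN=0 on a node in each cluster.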
If you have multiple replicated Storage Domains, the replication direction for all the domains in a datacenter must be the same.
To align replication for multiple replicated Storage Domains in a datacenter
This is because the Storage Pool Manager (SPM) host requires read-write access to all the Storage Domains in a datacenter.
After you complete the above steps, you can switch the virtual machine service group from one site to the other. When you bring the replication service group online at a site, the replication resource makes sure that the replication direction is from that site to the remote site. This ensures that all the replicated devices are read-write enabled in the current site.
See About disaster recovery for Red Hat Enterprise Virtualization virtual machines.
Disaster recovery workflow
The trigger checks whether the SPM is in the local cluster. If the SPM is in the local cluster, the trigger checks whether the SPM host is in the UP state. If the SPM host is in the NON_RESPONSIVE state, the trigger fences out the host. This enables RHEV-M to select some other host in the current cluster.
If the SPM is in the remote cluster, the trigger deactivates all the hosts in the remote cluster. Additionally, if the remote SPM host is in the NON_RESPONSIVE state, the trigger script fences out the host. This enables RHEV-M to select some other host in the current cluster.
The trigger script then waits up to 10 minutes for the SPM to fail over to the local cluster.
When the SPM successfully fails over to the local cluster, the script then reactivates all the remote hosts that were previously deactivated.
Then the trigger script proceeds to online the virtual machine service group.
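The decision flow above can be summarized in a short illustrative function. This is one reading of the workflow, not the shipped trigger script: the function name plan_preonline and the action strings are hypothetical, and the sketch assumes that the wait applies whenever the SPM has to move to another host.

```shell
# Hypothetical helper: print the actions the workflow implies for a given SPM situation.
# Usage: plan_preonline SPM_LOCATION SPM_STATE
#   SPM_LOCATION: local | remote
#   SPM_STATE:    UP | NON_RESPONSIVE
plan_preonline() {
    location="$1"; state="$2"
    case "$location" in
        local)
            if [ "$state" = "NON_RESPONSIVE" ]; then
                echo "fence SPM host"
                echo "wait up to 600s for SPM on a local host"
            fi
            ;;
        remote)
            echo "deactivate remote hosts"
            if [ "$state" = "NON_RESPONSIVE" ]; then
                echo "fence SPM host"
            fi
            echo "wait up to 600s for SPM on a local host"
            echo "reactivate remote hosts"
            ;;
        *)
            echo "unknown SPM location: $location" >&2
            return 1
            ;;
    esac
    echo "online VM service group"
}

plan_preonline remote NON_RESPONSIVE
```

A healthy local SPM yields only the final step, which matches the case where no SPM movement is needed.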
Troubleshooting a disaster recovery configuration
When the service groups are switched to the secondary site, the hosts in the primary site may go into the NON_OPERATIONAL state. To resolve this issue, deactivate the hosts by putting them in maintenance mode, and then reactivate them. If the issue is not resolved, log on to the RHEL-H host and restart the vdsmd service using the service vdsmd restart command. If the issue persists, contact Red Hat Technical Support.
After a DR failover, the DNS configuration of the virtual machine may not change. To resolve this issue, check whether the network adapter inside the virtual machine is under Network Manager control. If so, remove the adapter from Network Manager control by editing the /etc/sysconfig/network-scripts/ifcfg-eth0 file inside the virtual machine and setting NM_CONTROLLED to "no".
After a failover to the secondary site, the virtual machine service group does not go online. To resolve this issue, check the state of the SPM in the data center. Make sure that the SPM is active on some host in the secondary RHEV cluster. Additionally, check the VCS engine logs for more information.