Configuring Red Hat Enterprise Virtualization (RHEV) virtual machines for disaster recovery using Cluster Server (VCS)

You can configure new or existing RHEV-based virtual machines for disaster recovery (DR) by setting them up for DR and configuring VCS to manage them.

To set up RHEV-based virtual machines for DR

  1. Configure VCS with the GCO option on the RHEL-H hosts at both sites.

    For more information about configuring a global cluster, see the Veritas InfoScale™ Solutions Disaster Recovery Implementation Guide.

  2. Set up replication using a technology such as VVR, VFR, Hitachi TrueCopy, or EMC SRDF.
  3. Map the primary LUNs to all the RHEL-H hosts in the primary site.
  4. Issue OS-level SCSI rescan commands and verify that the LUNs are visible in the output of the multipath -l command, as shown in the example below.
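
    For example (a sketch; the HBA instances under /sys/class/scsi_host vary by system):

        # Rescan each HBA for the newly mapped LUNs
        echo "- - -" > /sys/class/scsi_host/host0/scan
        # Verify that the LUNs show up as multipath devices
        multipath -l
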
  5. Map the secondary LUNs to all the RHEL-H hosts in the secondary site and verify that they are visible in the output of the multipath -l command on all those hosts.
  6. Add the RHEL-H hosts to the RHEV-M console.
    • Create two RHEV clusters in the same datacenter, representing the two sites.

    • Add all the RHEL-H hosts from the primary site to one of the RHEV clusters.

    • Similarly, add all the RHEL-H hosts from the secondary site to the second RHEV cluster.

  7. Log in to the RHEV-M console and create a Fibre Channel-type Storage Domain on one of the primary site hosts using the primary LUNs.
  8. In the RHEV-M console, create a virtual machine and assign it a virtual disk carved out of the Fibre Channel Storage Domain created in step 7.
    • Configure any additional parameters for the virtual machine, such as NICs and additional virtual disks.

    • Verify that the virtual machine turns on correctly.

    • Install the appropriate RHEL operating system inside the guest.

    • Configure the network interface with appropriate parameters such as the IP address, netmask, and gateway.

    • Make sure that the NIC is not under NetworkManager control. You can disable this by editing the /etc/sysconfig/network-scripts/ifcfg-eth0 file inside the virtual machine and setting NM_CONTROLLED to "no", as shown below.
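
      For example, the relevant lines of ifcfg-eth0 might look like the following (the static addressing shown here is illustrative; the other settings depend on your network configuration):

          DEVICE=eth0
          ONBOOT=yes
          BOOTPROTO=none
          NM_CONTROLLED=no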

    • Make sure that the virtual machine does not have a CDROM attached to it. This is necessary because VCS sends the DR payload to the virtual machine in the form of a CDROM.

  9. Copy the VRTSvcsnr package from the VCS installation media to the guest and install it. This package installs a lightweight service that starts when the guest boots. The service reconfigures the IP address and gateway of the guest as specified in the KVMGuest resource.
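
    For example, inside the guest (the exact package file name varies by release and architecture):

        rpm -ivh VRTSvcsnr-*.rpm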

To configure VCS for managing RHEV-based virtual machines for DR

  1. Install VCS on the RHEL-H hosts at both the primary and the secondary sites.
    • Configure all the VCS nodes in the primary site in a single primary VCS cluster.

    • Configure all the VCS nodes in the secondary site in a single secondary VCS cluster.

    • Make sure that the RHEV cluster at each site corresponds to the VCS cluster at that site.

    See Figure: VCS Resource dependency diagram.

  2. Create a service group in the primary VCS cluster and add a KVMGuest resource for managing the virtual machine. Repeat this step in the secondary VCS cluster.
  3. Configure site-specific parameters for the KVMGuest resource in each VCS cluster.
    • The DROpts attribute enables you to specify site-specific networking parameters for the virtual machine, such as IPAddress, Netmask, Gateway, DNSServers, DNSSearchPath, and Device. Set Device to the name of the NIC as seen by the guest, for example, eth0.

    • Verify that the ConfigureNetwork key in the DROpts attribute is set to 1.

    • The DROpts attribute must be set on the KVMGuest resource in both clusters, as in the following example.
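
      For example, the KVMGuest resource in the primary cluster might look like the following sketch (the resource name, guest name, and all network values are placeholders; the secondary cluster sets its own site-specific values in its configuration):

          KVMGuest RHEV_VM_Res1 (
              GuestName = dr_vm1
              DROpts = { ConfigureNetwork = 1,
                         IPAddress = "10.20.30.40",
                         Netmask = "255.255.255.0",
                         Gateway = "10.20.30.1",
                         DNSServers = "10.20.30.2",
                         DNSSearchPath = "example.com",
                         Device = "eth0" }
              )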

  4. Configure the preonline trigger on the virtual machine service group. The preonline trigger script is located at /opt/VRTSvcs/bin/sample_triggers/VRTSvcs/preonline_rhev.
    • Create a folder in the /opt/VRTSvcs directory on each RHEL-H host to store the trigger script. Copy the trigger script into this folder with the name "preonline". Enable the preonline trigger on the virtual machine service group by setting the PreOnline service group attribute, and specify the path (relative to /opt/VRTSvcs) in the TriggerPath attribute.

    For example:

    group RHEV_VM_SG1 (
        SystemList = { vcslx317 = 0, vcslx373 = 1 }
        ClusterList = { test_rhevdr_pri = 0, test_rhevdr_sec = 1 }
        AutoStartList = { vcslx317 }
        TriggerPath = "bin/triggers/RHEVDR"
        PreOnline = 1
        )
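
    The same setup can also be applied with ha commands; the following is a sketch that assumes the group name from the example above:

        # On each RHEL-H host: create the trigger folder and copy the script
        mkdir -p /opt/VRTSvcs/bin/triggers/RHEVDR
        cp /opt/VRTSvcs/bin/sample_triggers/VRTSvcs/preonline_rhev \
           /opt/VRTSvcs/bin/triggers/RHEVDR/preonline

        # From one node: enable the trigger on the service group
        haconf -makerw
        hagrp -modify RHEV_VM_SG1 TriggerPath "bin/triggers/RHEVDR"
        hagrp -modify RHEV_VM_SG1 PreOnline 1
        haconf -dump -makero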

    For more information on setting triggers, see the Cluster Server Administrator's Guide.

  5. Create a separate service group for managing the replication direction. This task must be performed for each cluster.
    • Add the appropriate replication resource (such as Hitachi TrueCopy or EMC SRDF). For details, see the Installation and Configuration Guide for the corresponding replication agent.

    • Add an Online Global Firm dependency from the virtual machine (VM) service group to the replication service group, as shown in the example after this list.

    • Configure the replication service group as global.
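
    For example, a sketch using the group names from the examples in this section (run the command in each cluster):

        # Link the VM group (parent) to the replication group (child)
        hagrp -link RHEV_VM_SG1 SRDF_SG1 online global firm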

  6. Configure the postonline trigger on the replication service group. The postonline trigger script is located at /opt/VRTSvcs/bin/sample_triggers/VRTSvcs/postonline_rhev.
    • Copy the postonline trigger script to the same location as the preonline trigger script, with the name "postonline". Enable the postonline trigger on the replication service group by adding the POSTONLINE key to the TriggersEnabled attribute, and specify the path (relative to /opt/VRTSvcs) in the TriggerPath attribute.

      For example:

      group SRDF_SG1 (
          SystemList = { vcslx317 = 0, vcslx373 = 1 }
          ClusterList = { test_rhevdr_pri = 0, test_rhevdr_sec = 1 }
          AutoStartList = { vcslx317 }
          TriggerPath = "bin/triggers/RHEVDR"
          TriggersEnabled = { POSTONLINE }
          )
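
      The trigger script can be copied in the same way as the preonline trigger, for example:

          cp /opt/VRTSvcs/bin/sample_triggers/VRTSvcs/postonline_rhev \
             /opt/VRTSvcs/bin/triggers/RHEVDR/postonline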

      For more information on setting triggers, see the Cluster Server Administrator's Guide.

If you have multiple replicated Storage Domains, the replication direction for all the domains in a datacenter must be the same.

To align replication for multiple replicated Storage Domains in a datacenter

  1. Add all the replication resources to the same replication service group.
  2. If you require different Storage Domains to be replicated in different directions at the same time, configure them in separate datacenters.

    This is because the Storage Pool Manager (SPM) host requires read-write access to all the Storage Domains in a datacenter.

After completing all the above steps, you can easily switch the virtual machine service group from one site to the other. When you online the replication service group in a site, the replication resource makes sure that the replication direction is from that site to the remote site. This ensures that all the replicated devices are read-write enabled in the current site.
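
For example, to switch the virtual machine service group from the earlier examples over to the secondary cluster:

    hagrp -switch RHEV_VM_SG1 -any -clus test_rhevdr_sec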

See About disaster recovery for Red Hat Enterprise Virtualization virtual machines.

Disaster recovery workflow

  1. Online the replication service group in a site followed by the virtual machine service group.
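
    For example, using the group and cluster names from the configuration examples earlier in this section:

        # Online at the chosen site, replication group first
        hagrp -online SRDF_SG1 -any -clus test_rhevdr_pri
        hagrp -online RHEV_VM_SG1 -any -clus test_rhevdr_pri
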
  2. Check the failover by logging into the RHEV-M console. Select the Hosts tab of the appropriate datacenter to verify that the SPM is marked on one of the hosts in the site in which the replication service group is online.
  3. When you bring the replication service group online, the postonline trigger probes the KVMGuest resources in the parent service group to ensure that the virtual machine service group can go online.
  4. When you bring the virtual machine service group online, the preonline trigger performs the following tasks:
    • The trigger checks whether the SPM is in the local cluster. If the SPM is in the local cluster, the trigger checks whether the SPM host is in the UP state. If the SPM host is in the NON_RESPONSIVE state, the trigger fences out the host. This enables RHEV-M to select some other host in the current cluster.

    • If the SPM is in the remote cluster, the trigger deactivates all the hosts in the remote cluster. Additionally, if the remote SPM host is in the NON_RESPONSIVE state, the trigger script fences out the host. This enables RHEV-M to select some other host in the current cluster.

    • The trigger script then waits for 10 minutes for the SPM to fail over to the local cluster.

    • When the SPM successfully fails over to the local cluster, the script reactivates all the remote hosts that were previously deactivated.

    • The trigger script then proceeds to online the virtual machine service group.

  5. When the KVMGuest resource goes online, the KVMGuest agent sets a virtual machine payload on the virtual machine before starting it. This payload contains the site-specific networking parameters that you set in the DROpts attribute for that resource.
  6. When the virtual machine starts, the vcs-net-reconfig service loads, reads the DR parameters from the CDROM, and applies them to the guest. In this way, the networking personality of the virtual machine is modified when it crosses site boundaries.

Troubleshooting a disaster recovery configuration