RHEV environment: If a node on which the VM is running panics or is forcefully shut down, VCS is unable to start the VM on another node

In a RHEV environment, if a node on which a virtual machine is running panics or is forcefully shut down, the state of that virtual machine is not cleared. RHEV-M sets the VM to the UNKNOWN state, and VCS is unable to start this virtual machine on another node. You must initiate manual fencing in RHEV-M to clear the state.

This is not a VCS limitation; it results from the RHEV-M design. For more information, refer to the Red Hat Enterprise Virtualization 3.4 Technical Guide.

To initiate manual fencing in RHEV-M and clear the VM state

  1. In the RHEVMInfo attribute, set the UseManualRHEVMFencing key to 1:
    UseManualRHEVMFencing = 1
  2. Override the resource attribute:
    hares -override resource_name OnlineRetryLimit
  3. Modify the OnlineRetryLimit attribute value to 2:
    hares -modify resource_name OnlineRetryLimit 2
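
The steps above can be sketched as a small shell script. This is an illustration under stated assumptions, not a supported procedure: the resource name rhev_fo is taken from the sample configuration below, the run helper and the DRY_RUN variable are hypothetical, and the use of the -update form of hares -modify to set the UseManualRHEVMFencing key assumes that the key already exists in the RHEVMInfo attribute. The haconf commands that open and save the configuration are also an addition to the documented steps.

```shell
#!/bin/sh
# Sketch only: applies the three documented steps to one KVMGuest resource.
# RESOURCE and DRY_RUN are hypothetical variables introduced for illustration.
RESOURCE="${RESOURCE:-rhev_fo}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

# Print the command in dry-run mode; otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

enable_manual_fencing() {
    res="$1"
    # Assumption: the cluster configuration must be writable before changes.
    run haconf -makerw
    # Step 1: set the UseManualRHEVMFencing key in RHEVMInfo to 1
    # (assumes the key already exists in the association).
    run hares -modify "$res" RHEVMInfo -update UseManualRHEVMFencing 1
    # Step 2: override OnlineRetryLimit at the resource level.
    run hares -override "$res" OnlineRetryLimit
    # Step 3: set OnlineRetryLimit to 2.
    run hares -modify "$res" OnlineRetryLimit 2
    # Assumption: save the configuration and make it read-only again.
    run haconf -dump -makero
}

enable_manual_fencing "$RESOURCE"
```

By default the script only prints the commands so that they can be reviewed first. Setting DRY_RUN=0 executes them on a cluster node.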

After you clear the state of the VM, VCS starts the VM on another node.

The following is a sample resource configuration for RHEV-based disaster recovery:

group rhev_sg (
    SystemList = { rhelh_a1 = 0, rhelh_a2 = 1 }
    TriggerPath = "bin/triggers/RHEVDR"
    PreOnline = 1
    OnlineRetryLimit = 1
    )

    KVMGuest rhev_fo (
        RHEVMInfo = { Enabled = 1,
            URL = "https://192.168.72.11:443",
            User = "admin@internal",
            Password = flgLglGlgLglG,
            Cluster = RHEV-PRIM-CLUS,
            UseManualRHEVMFencing = 1 }
        GuestName = swvm02
        OnlineRetryLimit = 2
        )

// resource dependency tree
//
// group rhev_sg
// {
// KVMGuest rhev_fo
// }