RHEV environment: If a node on which the VM is running panics or is forcefully shut down, VCS is unable to start the VM on another node

In a RHEV environment, if a node on which a virtual machine is running panics or is forcefully shut down, the state of that virtual machine is not cleared. RHEV-M sets the VM to the UNKNOWN state, and VCS is unable to start the virtual machine on another node. You must initiate manual fencing in RHEV-M to clear the state.

This is not a VCS limitation; it results from the RHEV-M design. For more information, refer to the Red Hat Enterprise Virtualization Technical Guide.

To initiate manual fencing in RHEV-M and clear the VM state

  1. In the RHEVMInfo attribute, set the UseManualRHEVMFencing key to 1.

    UseManualRHEVMFencing = 1

  2. Override the OnlineRetryLimit attribute at the resource level:

    hares -override resource_name OnlineRetryLimit

  3. Modify the OnlineRetryLimit attribute value to 2:

    hares -modify resource_name OnlineRetryLimit 2

After you clear the state of the VM, VCS starts the VM on another node.
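
The three steps above can be combined into one short script. The following is a minimal sketch, not a supported procedure. It assumes the resource is named rhev_fo (taken from the sample configuration; substitute your own resource name), that the cluster configuration must be made writable with haconf before the change and saved afterward, and that the UseManualRHEVMFencing key can be set with the -update option of hares -modify, which is the usual way to change one key of an association attribute. The RES and DRY_RUN variables and the run and apply_manual_fencing helpers are hypothetical names introduced for this illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: RES, DRY_RUN, run, and apply_manual_fencing are illustrative names.
RES="${RES:-rhev_fo}"        # assumed resource name; replace with your own
DRY_RUN="${DRY_RUN:-1}"      # 1 = print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

apply_manual_fencing() {
    # Assumption: the configuration is read-only and must be opened first.
    run haconf -makerw
    # Step 1: set the UseManualRHEVMFencing key inside RHEVMInfo
    # (assumes -update applies to this association attribute).
    run hares -modify "$RES" RHEVMInfo -update UseManualRHEVMFencing 1
    # Step 2: override OnlineRetryLimit for this resource.
    run hares -override "$RES" OnlineRetryLimit
    # Step 3: set the overridden value to 2.
    run hares -modify "$RES" OnlineRetryLimit 2
    # Assumption: save the configuration and make it read-only again.
    run haconf -dump -makero
}

apply_manual_fencing
```

Review the printed commands first, then rerun the script with DRY_RUN=0 on a cluster node to apply them.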

The following is a sample resource configuration of RHEV-based disaster recovery:

group rhev_sg (
        SystemList = { rhelh_a1 = 0, rhelh_a2 = 1 }
        TriggerPath = "bin/triggers/RHEVDR"
        PreOnline = 1
        OnlineRetryLimit = 1
        )

KVMGuest rhev_fo (
        RHEVMInfo = { Enabled = 1, URL = "https://192.168.72.11:443",
        User = "admin@internal",
        Password = flgLglGlgLglG,
        Cluster = RHEV-PRIM-CLUS,
        UseManualRHEVMFencing = 1 }
        GuestName = swvm02
        OnlineRetryLimit = 2
        )


// resource dependency tree
//
//      group rhev_sg
//      {
//      KVMGuest rhev_fo
//      }