Managing logical partition (LPAR) failure scenarios

VCS handles LPAR failures in the following scenarios.

Table: Failure scenarios and their resolutions

Failure scenario

Resolution

Physical server is down

When the physical server is down, the management LPAR and all managed LPARs are down as well. In this case, VCS uses the sysoffline trigger, with the help of the HMC, to fail over the managed LPARs that were running to another system. Ensure that HMC access is set up on all nodes of the cluster, even on nodes that do not manage any LPAR.

If the managed LPAR is configured for live migration, make sure that a profile file for the managed LPAR is created on the other management LPARs and that its path is configured in the ProfileFile attribute. For details on the ProfileFile attribute and on creating the profile file:

See Providing logical partition (LPAR) failover with live migration.

Management LPAR is down but physical server is up

When the management LPAR is down, the physical server may still be up, and the managed LPARs might still be running. In this case, it is not desirable to fail over the managed LPARs automatically. To prevent automatic failover, set SysDownPolicy = { "AutoDisableNoOffline" } on the group that contains the LPAR resource. With this setting, the group remains autodisabled when the system faults. After ensuring that the LPAR is down on the faulted system, you can autoenable the group and bring the LPAR online on any other system.
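
As a minimal sketch, the policy might appear in main.cf as shown here. The group name lpar_grp and the system names mgmt_lpar1 and mgmt_lpar2 are hypothetical, and the other group attributes and resources are omitted; only the SysDownPolicy line comes from the text above.

```
// Hypothetical group and system names, for illustration only.
group lpar_grp (
    SystemList = { mgmt_lpar1 = 0, mgmt_lpar2 = 1 }
    // Keep the group autodisabled on system fault instead of
    // failing over the managed LPAR automatically.
    SysDownPolicy = { "AutoDisableNoOffline" }
    )
```

Under the same assumed names, once the LPAR is confirmed down on the faulted system, the group would typically be re-enabled with hagrp -autoenable lpar_grp -sys mgmt_lpar1 and then brought online elsewhere with hagrp -online lpar_grp -sys mgmt_lpar2.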

VIO servers are down

When all the VIO servers that provide virtual resources to the managed LPARs are down, the managed LPARs are failed over to another host. Ensure that the VIOSName attribute of the LPAR resources is populated with the list of all VIO servers that service that LPAR. If VIOSName is not populated, the managed LPARs are not failed over when VIO servers crash. If any one of the VIO servers specified in the VIOSName attribute is running, the LPAR agent does not fail over the managed LPARs.
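
For illustration, a resource definition that lists two VIO servers might look like the following sketch. It assumes that VIOSName accepts a list of names, as the text implies. The resource name lpar_res and the server names vios1 and vios2 are hypothetical, and the other attributes that the LPAR resource requires are omitted.

```
// Hypothetical resource and VIO server names, for illustration only.
// Other required LPAR attributes are omitted.
LPAR lpar_res (
    // List every VIO server that services this managed LPAR.
    VIOSName = { vios1, vios2 }
    )
```

With this configuration, the managed LPAR would be failed over only when both vios1 and vios2 are down. If either one is still running, the LPAR stays where it is.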

HMC is down

If the environment has redundant HMCs, the LPAR agent can continue to manage the LPARs even if one HMC goes down. For this, ensure that the MCName and MCUser attributes are populated with the details of both HMCs.
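
A redundant HMC configuration might be sketched as follows. It assumes that MCName and MCUser each accept a list and that entries are paired by position; verify both points against the LPAR agent reference before use. The resource name lpar_res and the HMC host names hmc1 and hmc2 are hypothetical, and the other required attributes are omitted.

```
// Hypothetical resource and HMC names, for illustration only.
// Other required LPAR attributes are omitted.
LPAR lpar_res (
    // Both HMCs, so the agent can still reach one if the other is down.
    MCName = { hmc1, hmc2 }
    // Assumed to pair by position: the user for hmc1, then for hmc2.
    MCUser = { hscroot, hscroot }
    )
```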