Recovering from a preexisting network partition (split-brain)

The fencing module vxfen prevents a node from starting up after a network partition and subsequent panic and reboot of a node.

Example Scenario I

A scenario that could cause similar symptoms is a two-node cluster with one node shut down for maintenance. During the outage, the private interconnect cables are disconnected.

Figure: Example scenario I

In Example scenario I, the following occurs:

Example Scenario II

Similar to Example Scenario I, if the private interconnect cables are disconnected in a two-node cluster, Node 1 is fenced out of the cluster, panics, and reboots. If Node 0 then panics and reboots (or just reboots) before the private interconnect cables are fixed and Node 1 has rejoined the cluster, no node can write to the data disks until the private networks are fixed. This is because GAB membership cannot be formed, and therefore the cluster cannot be formed.

Suggested solution: Shut down both nodes, reconnect the cables, and restart the nodes.

Example Scenario III

Similar to Example Scenario II, if the private interconnect cables are disconnected in a two-node cluster, Node 1 is fenced out of the cluster, panics, and reboots. If Node 0 then panics due to hardware failure and cannot come back up before the private interconnect cables are fixed and Node 1 has rejoined the cluster, Node 1 cannot rejoin.

Suggested solution: Shut down Node 1, reconnect the cables, and restart the node. You must then clear the registration of Node 0 from the coordinator disks.

  1. On Node 1, type the following command:

    # /opt/VRTSvcs/vxfen/bin/vxfenclearpre

  2. Restart the node.
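
Because step 1 removes registrations from the coordinator disks, it can help to confirm what will be run before running it. The following is a minimal sketch of such a guard, not part of the product: the function name clear_stale_keys and the DRY_RUN and VXFENCLEARPRE variables are hypothetical names introduced for this example, and only the command path comes from the procedure above. It assumes a POSIX shell on Node 1.

```shell
#!/bin/sh
# Sketch only: a guard around step 1 of the procedure above.
# The default path is the one given in the procedure. DRY_RUN,
# VXFENCLEARPRE, and clear_stale_keys are hypothetical names used
# for this example; they are not part of the product.
VXFENCLEARPRE="${VXFENCLEARPRE:-/opt/VRTSvcs/vxfen/bin/vxfenclearpre}"
DRY_RUN="${DRY_RUN:-1}"

clear_stale_keys() {
    # In dry-run mode (the default), only print the command.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $VXFENCLEARPRE"
        return 0
    fi
    # Refuse to continue if the utility is missing or not executable.
    if [ ! -x "$VXFENCLEARPRE" ]; then
        echo "not found or not executable: $VXFENCLEARPRE" >&2
        return 1
    fi
    # Run the utility exactly as in step 1, with no extra arguments.
    "$VXFENCLEARPRE"
}

clear_stale_keys
```

With the default DRY_RUN=1, the script only prints the command. Setting DRY_RUN=0 runs the utility as in step 1, after which the node is restarted as in step 2.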