The table "I/O fencing scenarios" describes how I/O fencing prevents data corruption in different failure scenarios. For each event, review the corrective operator action.
Table: I/O fencing scenarios
Event | Node A: What happens? | Node B: What happens? | Operator action |
---|---|---|---|
Both private networks fail. | Node A races for a majority of the coordination points. If Node A wins the race, it ejects Node B from the shared disks and continues. | Node B races for a majority of the coordination points. If Node B loses the race, it panics and removes itself from the cluster. | When Node B is ejected from the cluster, repair the private networks before attempting to bring Node B back. |
Both private networks function again after the event above. | Node A continues to work. | Node B has crashed. It cannot start the database because it is unable to write to the data disks. | Restart Node B after the private networks are restored. |
One private network fails. | Node A prints a message about an IOFENCE on the console but continues. | Node B prints a message about an IOFENCE on the console but continues. | Repair the private network. After the network is repaired, both nodes automatically use it. |
Node A hangs. | Node A is extremely busy for some reason or is in the kernel debugger. When Node A is no longer hung or in the kernel debugger, any queued writes to the data disks fail because Node A has been ejected. When Node A receives a message from GAB about being ejected, it panics and removes itself from the cluster. | Node B loses heartbeats with Node A and races for a majority of the coordination points. Node B wins the race and ejects Node A from the shared data disks. | Repair or debug the node that hangs, then reboot the node so that it rejoins the cluster. |
Nodes A and B and the private networks lose power. The coordination points and data disks retain power. Power returns to the nodes and they restart, but the private networks still have no power. | Node A restarts and the I/O fencing driver (vxfen) detects that Node B is registered with the coordination points. The driver does not see Node B listed as a member of the cluster because the private networks are down. As a result, the I/O fencing device driver prevents Node A from joining the cluster. The Node A console displays: Potentially a preexisting split brain. Dropping out of the cluster. Refer to the user documentation for steps required to clear preexisting split brain. | Node B restarts and the I/O fencing driver (vxfen) detects that Node A is registered with the coordination points. The driver does not see Node A listed as a member of the cluster because the private networks are down. As a result, the I/O fencing device driver prevents Node B from joining the cluster. The Node B console displays: Potentially a preexisting split brain. Dropping out of the cluster. Refer to the user documentation for steps required to clear preexisting split brain. | Resolve the preexisting split-brain condition. |
Node A crashes while Node B is down. Node B comes up and Node A is still down. | Node A has crashed. | Node B restarts and detects that Node A is registered with the coordination points. The driver does not see Node A listed as a member of the cluster. The I/O fencing device driver prints a message on the console: Potentially a preexisting split brain. Dropping out of the cluster. Refer to the user documentation for steps required to clear preexisting split brain. | Resolve the preexisting split-brain condition. |
The disk array containing two of the three coordination points is powered off. No node leaves the cluster membership. | Node A continues to operate as long as no nodes leave the cluster. | Node B continues to operate as long as no nodes leave the cluster. | Power on the failed disk array so that a subsequent network partition does not cause a cluster shutdown, or replace the coordination points. See Replacing I/O fencing coordinator disks when the cluster is online. |
The disk array containing two of the three coordination points is powered off. Node B gracefully leaves the cluster and the disk array is still powered off. Leaving gracefully implies a clean shutdown so that vxfen is properly unconfigured. | Node A continues to operate in the cluster. | Node B has left the cluster. | Power on the failed disk array so that a subsequent network partition does not cause a cluster shutdown, or replace the coordination points. See Replacing I/O fencing coordinator disks when the cluster is online. |
The disk array containing two of the three coordination points is powered off. Node B abruptly crashes or a network partition occurs between Node A and Node B, and the disk array is still powered off. | Node A races for a majority of the coordination points. Node A fails because only one of the three coordination points is available. Node A panics and removes itself from the cluster. | Node B has left the cluster due to the crash or network partition. | Power on the failed disk array and restart the I/O fencing driver to enable Node A to register with all coordination points, or replace the coordination points. |
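
The scenarios in the table follow two rules: a node survives a race only if it wins a majority of the coordination points, and a restarting node refuses to join when it finds another node registered with the coordination points that it cannot see as a cluster member. The Python sketch below is a simplified model of those two rules for illustration only. It is not the vxfen implementation, and the function names (`race_outcome`, `may_join_cluster`) and data representation are hypothetical.

```python
def majority(total_points):
    """Smallest number of coordination points that forms a majority."""
    return total_points // 2 + 1


def race_outcome(total_points, points_won):
    """Model the result of a race for the coordination points.

    Assumption: a node that wins a majority continues; any other node
    panics and removes itself from the cluster.
    """
    if not 0 <= points_won <= total_points:
        raise ValueError("points_won must be between 0 and total_points")
    return "continue" if points_won >= majority(total_points) else "panic"


def may_join_cluster(registered_nodes, visible_members):
    """Model the startup check for a preexisting split brain.

    registered_nodes: nodes found registered with the coordination points.
    visible_members:  nodes seen as cluster members (including this node).
    A node registered with the coordination points but not visible as a
    member indicates a potential preexisting split brain, so joining is
    refused.
    """
    unseen = set(registered_nodes) - set(visible_members)
    return not unseen


# Both private networks fail, three coordination points:
# Node A wins two points, Node B wins one.
print(race_outcome(3, 2))  # continue
print(race_outcome(3, 1))  # panic

# Two of three coordination points are powered off, then a partition occurs:
# Node A can reach at most one point.
print(race_outcome(3, 1))  # panic

# Node A restarts while the private networks are down and finds Node B
# registered with the coordination points.
print(may_join_cluster({"A", "B"}, {"A"}))  # False
```

The last scenario in the table shows why an odd number of coordination points matters in this model: with only one of three points reachable, no node can reach the majority of two, so even the sole surviving node panics.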