Ongoing cluster membership
Once the cluster is up and running, a system remains an active member of the cluster as long as peer systems receive a heartbeat signal from that system over the cluster interconnect. A change in cluster membership is determined as follows:
-
When LLT on a system no longer receives heartbeat messages from a system on any of the configured LLT interfaces for a predefined time, LLT informs GAB of the heartbeat loss from that specific system.
This predefined time is 16 seconds by default, but can be configured. It is set with the set-timer peerinact command as described in the llttab manual page.
-
When LLT informs GAB of a heartbeat loss, the systems remaining in the cluster coordinate to agree on which systems are still actively participating in the cluster and which are not. This coordination happens during a time period known as the GAB Stable Timeout (5 seconds).
VCS has specific error handling that takes effect if the systems do not agree.
-
GAB marks the system as DOWN, excludes the system from the cluster membership, and delivers the membership change to the fencing module.
-
The fencing module performs membership arbitration to ensure that no split-brain situation occurs and that only one functional, cohesive cluster continues to run.
The fencing module is turned on by default. See "About cluster membership and data protection without I/O fencing" for the actions that occur if the fencing module has been deactivated.
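The sequence above can be illustrated with a small Python model. This is only a sketch for reasoning about the timing, not VCS code: the HeartbeatMonitor class, the membership_change function, and the system and link names (sysA, link1, and so on) are hypothetical. The only values taken from the text are the 16-second default heartbeat-loss window and the 5-second GAB Stable Timeout. The model treats a peer as lost only when it has been silent on every interface it was seen on, which mirrors the "any of the configured LLT interfaces" condition.

```python
from dataclasses import dataclass, field

PEER_INACT_SECONDS = 16.0          # default heartbeat-loss window from the text
GAB_STABLE_TIMEOUT_SECONDS = 5.0   # agreement window from the text


@dataclass
class HeartbeatMonitor:
    """Toy stand-in for LLT heartbeat tracking (hypothetical, not a VCS API)."""
    peer_inact: float = PEER_INACT_SECONDS
    # last_seen[peer][interface] -> time (seconds) of the latest heartbeat
    last_seen: dict = field(default_factory=dict)

    def record(self, peer: str, interface: str, now: float) -> None:
        """Note that a heartbeat from `peer` arrived on `interface` at `now`."""
        self.last_seen.setdefault(peer, {})[interface] = now

    def lost_peers(self, now: float) -> list:
        """Peers silent on every known interface for at least peer_inact seconds."""
        return sorted(
            peer
            for peer, links in self.last_seen.items()
            if all(now - seen >= self.peer_inact for seen in links.values())
        )


def membership_change(members, lost):
    """Return (remaining members, ordered log of the steps described above)."""
    log = [f"LLT informs GAB of heartbeat loss from {peer}" for peer in lost]
    log.append(
        f"remaining systems agree on membership within {GAB_STABLE_TIMEOUT_SECONDS:g} s"
    )
    remaining = sorted(set(members) - set(lost))
    log += [f"GAB marks {peer} DOWN and excludes it" for peer in lost]
    log.append("GAB delivers the membership change to the fencing module")
    log.append("fencing module performs membership arbitration")
    return remaining, log


if __name__ == "__main__":
    monitor = HeartbeatMonitor()
    for link in ("link1", "link2"):
        monitor.record("sysB", link, now=0.0)    # sysB goes silent after t=0
        monitor.record("sysC", link, now=10.0)   # sysC was heard from at t=10
    lost = monitor.lost_peers(now=16.0)
    remaining, log = membership_change(["sysA", "sysB", "sysC"], lost)
    print(remaining)
    print("\n".join(log))
```

In this model a single healthy link is enough to keep a peer in the membership, so a peer is reported lost only after all of its links have been quiet for the full window. The model does not cover the error handling for systems that fail to agree, or the behavior when fencing is deactivated.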