VCS behavior when a resource goes offline

The RestartLimit attribute specifies the number of times that an agent retries bringing a resource online before the VCS engine declares the resource as FAULTED.

The ToleranceLimit attribute defines the number of times the monitor agent function can return an unexpected OFFLINE before the resource is declared as FAULTED. A large ToleranceLimit value delays detection of a genuinely faulted resource.

For example, assume that a resource is in the ONLINE state with RestartLimit set to 2 and ToleranceLimit set to 3. If the next monitor cycle detects the resource state as OFFLINE, the agent waits for another monitor cycle instead of triggering a restart. The agent waits a maximum of 3 monitor cycles (the ToleranceLimit value) before it triggers a restart.
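
The interaction of the two attributes in this example can be sketched as a small state machine. The following Python model is illustrative only and is not VCS code: the AgentModel class, its field names, and the action strings are hypothetical. It assumes that the tolerance counter resets whenever the monitor returns ONLINE or a restart is triggered, that the agent acts on the first OFFLINE report beyond ToleranceLimit, and that the restart counter is never reset.

```python
from dataclasses import dataclass


@dataclass
class AgentModel:
    """Simplified, hypothetical model of RestartLimit and ToleranceLimit."""
    restart_limit: int
    tolerance_limit: int
    offline_seen: int = 0   # consecutive unexpected OFFLINE reports
    restarts_done: int = 0  # restarts triggered so far (assumed never reset)
    state: str = "ONLINE"

    def on_monitor(self, result: str) -> str:
        """Process one monitor result and return the action taken."""
        if self.state == "FAULTED":
            return "none"
        if result == "ONLINE":
            self.offline_seen = 0  # assumption: a healthy report clears the count
            return "none"
        self.offline_seen += 1
        if self.offline_seen <= self.tolerance_limit:
            return "wait"  # tolerated: wait for another monitor cycle
        self.offline_seen = 0
        if self.restarts_done < self.restart_limit:
            self.restarts_done += 1
            return "restart"
        self.state = "FAULTED"
        return "fault"


# RestartLimit = 2, ToleranceLimit = 3, resource starts ONLINE.
agent = AgentModel(restart_limit=2, tolerance_limit=3)
actions = [agent.on_monitor("OFFLINE") for _ in range(4)]
print(actions)  # ['wait', 'wait', 'wait', 'restart']
```

In this model, setting both limits to 0 makes the first unexpected OFFLINE report fault the resource immediately, which shows why larger ToleranceLimit values delay fault detection.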

The RestartLimit and ToleranceLimit attributes determine the VCS behavior in the following scenarios: