Cluster Volume Manager (CVM) tolerance to storage connectivity failures

Cluster Volume Manager (CVM) uses a shared storage model. A shared disk group gives all nodes in a cluster concurrent read and write access to the volumes that it contains.

Cluster resiliency means that the cluster functions with minimal disruption if one or more nodes lose connectivity to the shared storage. When CVM detects a loss of storage connectivity for an online disk group, it performs the appropriate error handling for the situation. For example, CVM may redirect I/O over the network, detach a plex, or disable a volume for all nodes, depending on the situation. You can customize the behavior of CVM to ensure the appropriate handling for your environment.
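
For example, to see how CVM has handled a particular failure, you can inspect the object states in the affected disk group. The following is a minimal sketch using standard VxVM commands; the disk group name mydg is a placeholder:

  # vxprint -htg mydg
  # vxdisk -o alldgs list

The first command displays the volume, plex, and subdisk states in the disk group (for example, a detached plex), and the second lists the disks and their current states.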

The CVM resiliency features also enable a node to join the cluster even if the new node does not have connectivity to all the disks in the shared disk group. This behavior ensures that a node that is taken offline can rejoin the cluster. Similarly, a shared disk group can be imported on a node even if the node cannot access all of the disks in the disk group.
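
As an illustration, a shared disk group import is performed from the CVM master node. The sketch below assumes a disk group named mydg:

  # vxdg -s import mydg
  # vxdg list mydg

The -s option imports the disk group in shared mode; the second command confirms the import and shows the disk group state. With the resiliency features enabled, the import can succeed even if the importing node cannot access every disk in the group.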

Note:

Cluster resiliency functionality is intended to handle temporary failures. Restore storage connectivity as soon as possible.

CVM provides increased cluster resiliency and tolerance to connectivity failures in the following ways (configuration examples for the tunable behaviors follow the table):

Table: CVM tolerance to storage connectivity failures

Functionality: Consistency of data plexes

Description: CVM manages connectivity errors for data disks so that I/O can continue to the unaffected disks.

  • If a failure is seen on all nodes, CVM detaches the affected plexes as long as at least one plex is still accessible.

  • If a failure does not affect all of the nodes, the disk detach policy determines how CVM handles the failure.

Configurable? Yes. Controlled by the detach policy, which can be local or global.

Functionality: Continuity of application I/O

Description: If a connectivity failure does not affect all the nodes, CVM can redirect I/O over the network to a node that has access to the storage. This behavior enables the application I/O to continue even when storage connectivity failures occur.

By redirecting I/O, CVM can avoid the need to either locally fail the I/O on the volume or detach the plex when at least one node has access to the underlying storage. Therefore, the ioship policy changes the behavior of the disk detach policy.

Configurable? Yes. Controlled by the ioship tunable parameter, which is set for a disk group.

Functionality: Availability of shared disk group configurations

Description: The master node handles configuration changes to the shared disk group, so CVM ensures that the master node has access to the configuration copies.

If the master node loses connectivity to a configuration copy, CVM redirects the I/O requests over the network to a node that can access the configuration copy. This behavior ensures that the disk group stays available.

This behavior is independent of the disk detach policy or the ioship policy.

If the disk group version is less than 170, CVM handles the loss of connectivity according to the disk group failure policy (dgfailpolicy).

Configurable? No. Enabled by default.

Functionality: Availability of snapshots

Description: CVM initiates internal I/Os to update Data Change Objects (DCOs). If a node loses connectivity to these objects, CVM redirects the internal I/Os over the network to a node that has access.

This behavior is independent of the disk detach policy or the ioship policy.

Configurable? No. Enabled by default.

Functionality: Availability of cluster nodes and shared disk groups

Description: CVM enables a cluster node to join the cluster even if the node does not have access to all of the shared storage. Similarly, a node can import a shared disk group even if the node has a local connectivity failure to some of the storage.

This behavior is independent of the disk detach policy or the ioship policy.

Configurable? Yes. Controlled by the storage_connectivity tunable.
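
The following sketch shows how the three configurable behaviors from the table might be set. The commands are standard VxVM/CVM utilities, but mydg is a placeholder disk group name, and you should verify the supported option values against your product version:

  # vxdg -g mydg set diskdetpolicy=local
  # vxdg -g mydg set ioship=on
  # vxtune storage_connectivity resilient

The first command sets the disk detach policy for the disk group (local or global), the second enables I/O shipping for the disk group, and the third sets the cluster-wide storage_connectivity tunable so that nodes can join without connectivity to all of the shared disks.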

More Information

About disk detach policies

Setting the detach policy for shared disk groups

About redirection of application I/Os with CVM I/O shipping

Enabling I/O shipping for shared disk groups

Availability of shared disk group configuration copies

Availability of cluster nodes and shared disk groups

Controlling the CVM tolerance to storage disconnectivity