How CVM handles local storage disconnectivity with the global detach policy

CVM behavior for a local failure depends on the detach policy setting and on the number of plexes affected.

A failure is considered local if it affects one or more nodes, but not all nodes. In other words, an I/O failure is local if at least one node still has access to the plex.
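The definition above can be sketched as a small Python function. This is only an illustrative model, not CVM code: the function name `classify_failure`, the node names, and the labels "none", "local", and "global" are assumptions made for the example.

```python
def classify_failure(access):
    """Classify a plex failure from per-node access information.

    `access` maps a (hypothetical) node name to True if that node can
    still reach the plex, and to False if it cannot.
    """
    reachable = [node for node, ok in access.items() if ok]
    if len(reachable) == len(access):
        return "none"    # every node still has access: no failure
    if reachable:
        return "local"   # at least one node still has access
    return "global"      # no node has access to the plex


# Example: node2 has lost connectivity, node1 has not.
example = classify_failure({"node1": True, "node2": False})
```

In this example the failure counts as local, because node1 can still reach the plex.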

If the detach policy is set to global, and the failure affects one or more plexes in the volume for one or more nodes, CVM detaches each affected plex. The global detach policy indicates that CVM should ensure that the plexes (mirrors) of the volume stay consistent. Detaching the plex ensures that data on the plex is exactly the same for all nodes. When the connectivity returns, CVM reattaches the plex to the volume and resynchronizes the plex.

Figure: How CVM handles local partial failure - global detach policy shows how CVM handles a local partial failure, when the detach policy is global.

Figure: How CVM handles local partial failure - global detach policy

The benefit of this policy is that the volume is still available for I/O on all nodes. If there is a read or write I/O failure on a slave node, the master node performs the usual I/O recovery operations to repair the failure. If required, the plex is detached from the volume for the entire cluster. All nodes remain in the cluster and continue to perform I/O, but the redundancy of the mirrors is reduced.

The disadvantage is that the detached plex reduces redundancy. When one or more nodes in the cluster lose connectivity to a plex, the entire cluster loses access to that plex. As a result, a local fault on one node has a global impact on all nodes in the cluster.

The global detach policy also incurs the overhead of reattaching the plex. When the problem that caused the I/O failure has been corrected, the disks should be reattached. The mirrors that were detached must be recovered before the redundancy of the data can be restored.

If a node loses access to all of the plexes of a mirrored volume, I/O to the volume fails on that node, but no plexes are detached. This prevents a situation in which the plexes are detached one after another and the volume is disabled globally.
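The two rules described in this section can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not the CVM implementation: the function name `apply_global_detach_policy`, the node and plex names, and the shape of the `connectivity` input are all hypothetical. The sketch applies the rules node by node and does not attempt to model combinations that the text does not describe, such as different nodes losing different plexes.

```python
def apply_global_detach_policy(connectivity):
    """Model the outcome of a local failure under the global detach policy.

    `connectivity` maps a node name to a dict that maps each plex name
    to True if the node can reach that plex, and to False if it cannot.
    Returns the plexes detached for the entire cluster and the nodes on
    which I/O to the volume fails.
    """
    detached = set()
    io_failed_nodes = set()
    for node, plexes in connectivity.items():
        lost = {plex for plex, ok in plexes.items() if not ok}
        if not lost:
            continue                      # this node sees every plex
        if len(lost) == len(plexes):
            # The node lost all plexes: I/O fails on this node only,
            # and no plex is detached on its account.
            io_failed_nodes.add(node)
        else:
            # Partial failure: the affected plexes are detached
            # from the volume for the entire cluster.
            detached |= lost
    return {"detached": detached, "io_failed_nodes": io_failed_nodes}


# Hypothetical two-node cluster: slave1 cannot reach plex-02.
partial = apply_global_detach_policy({
    "master": {"plex-01": True, "plex-02": True},
    "slave1": {"plex-01": True, "plex-02": False},
})

# Same cluster, but slave1 has lost access to both plexes.
total = apply_global_detach_policy({
    "master": {"plex-01": True, "plex-02": True},
    "slave1": {"plex-01": False, "plex-02": False},
})
```

In the first case the model detaches plex-02 cluster-wide and no node loses I/O. In the second case no plex is detached, and I/O to the volume fails only on slave1.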