Fabric Monitoring and proactive error detection

DMP takes a proactive role in detecting errors on paths.

The DMP event source daemon vxesd uses the Storage Networking Industry Association (SNIA) HBA API library to receive SAN fabric events from the HBA.

DMP checks devices that are suspect based on the information from the SAN events, even if there is no active I/O. New I/O is directed to healthy paths while DMP verifies the suspect devices.

During startup, vxesd queries the HBA (by way of the SNIA library) to obtain the SAN topology. The vxesd daemon determines the Port World Wide Names (PWWN) that correspond to each of the device paths that are visible to the operating system. After the vxesd daemon obtains the topology, vxesd registers with the HBA for SAN event notification. If LUNs are disconnected from a SAN, the HBA notifies vxesd of the SAN event, specifying the PWWNs that are affected. The vxesd daemon correlates this event information with the previously obtained topology information to determine which set of device paths has been affected.

The vxesd daemon sends the affected set to the vxconfigd daemon (DDL) so that the device paths can be marked as suspect.
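The correlation step described above can be pictured as a lookup from PWWN to device paths. The following Python sketch is only an illustrative model, not the vxesd implementation: the function names, device path names, and PWWN values are all hypothetical.

```python
from collections import defaultdict


def build_topology(path_to_pwwn):
    """Invert a path -> PWWN mapping into PWWN -> set of device paths."""
    topology = defaultdict(set)
    for path, pwwn in path_to_pwwn.items():
        topology[pwwn].add(path)
    return dict(topology)


def affected_paths(topology, event_pwwns):
    """Return every device path reached through a PWWN named in the event."""
    affected = set()
    for pwwn in event_pwwns:
        # A PWWN that is absent from the stored topology affects no known path.
        affected |= topology.get(pwwn, set())
    return affected


# Hypothetical topology gathered at startup (names are made up).
PATH_TO_PWWN = {
    "c1t0d0": "50:06:01:60:aa:aa:aa:01",
    "c1t0d1": "50:06:01:60:aa:aa:aa:01",
    "c2t0d0": "50:06:01:68:bb:bb:bb:02",
}
topology = build_topology(PATH_TO_PWWN)

# Hypothetical SAN event that names one affected PWWN.
suspect = affected_paths(topology, ["50:06:01:60:aa:aa:aa:01"])
print(sorted(suspect))  # ['c1t0d0', 'c1t0d1']
```

In this model, the resulting set plays the role of the affected set that is handed on so that the paths can be marked as suspect.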

When a path is marked as suspect, DMP does not send new I/O to the path unless it is the last path to the device. In the background, the DMP restore task checks the accessibility of the suspect paths on its next periodic cycle using a SCSI inquiry probe. If the SCSI inquiry fails, DMP disables the path to the affected LUNs and records the change in the event log.

If the LUNs are reconnected at a later time, the HBA informs vxesd of the SAN event. When the DMP restore task runs its next test cycle, the disabled paths are checked with the SCSI probe and re-enabled if successful.
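The path handling described above can be summarized as a small state model. The Python sketch below is a simplified illustration under stated assumptions, not DMP's actual logic: the state names, the function names, and the probe callback are hypothetical, and the model assumes that a suspect path whose probe succeeds returns to the enabled state.

```python
from enum import Enum


class PathState(Enum):
    ENABLED = "enabled"
    SUSPECT = "suspect"
    DISABLED = "disabled"


def paths_for_new_io(states):
    """Pick the paths that may receive new I/O for one device.

    Healthy paths are preferred. Suspect paths are used only when no
    healthy path remains (the "last path" case).
    """
    enabled = sorted(p for p, s in states.items() if s is PathState.ENABLED)
    if enabled:
        return enabled
    return sorted(p for p, s in states.items() if s is PathState.SUSPECT)


def restore_cycle(states, probe):
    """Model one restore cycle: probe suspect and disabled paths.

    `probe` stands in for the SCSI inquiry and returns True on success.
    """
    result = {}
    for path, state in states.items():
        if state is PathState.ENABLED:
            result[path] = state  # healthy paths are left alone in this model
        elif probe(path):
            result[path] = PathState.ENABLED
        else:
            result[path] = PathState.DISABLED
    return result


# Hypothetical device with two paths, one of them marked as suspect.
states = {"path_a": PathState.ENABLED, "path_b": PathState.SUSPECT}
print(paths_for_new_io(states))  # ['path_a']

# The probe fails for path_b, so the restore cycle disables it.
states = restore_cycle(states, probe=lambda path: path != "path_b")
print(states["path_b"])  # PathState.DISABLED
```

Running a later cycle with a probe that succeeds for path_b returns the path to the enabled state, which corresponds to the reconnect case.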

Note:

If vxesd receives an HBA LINK UP event, the DMP restore task is restarted and the SCSI probes run immediately, without waiting for the next periodic cycle. When the DMP restore task is restarted, it starts a new periodic cycle. If the disabled paths are not accessible by the time of the first SCSI probe, they are re-tested on the next cycle (300s by default).
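The timing effect of a LINK UP event can be illustrated with simple arithmetic. The following Python sketch is a hypothetical model: the function names and the timestamps are invented for illustration, and only the 300-second interval comes from the text above.

```python
DEFAULT_RESTORE_INTERVAL_S = 300  # default cycle length stated above


def next_probe_without_restart(last_cycle_at, interval=DEFAULT_RESTORE_INTERVAL_S):
    """Time of the next probe when the restore task is not restarted."""
    return last_cycle_at + interval


def probe_times_after_link_up(link_up_at, count, interval=DEFAULT_RESTORE_INTERVAL_S):
    """Probe times when LINK UP restarts the restore task.

    The first probe runs immediately and starts a new periodic cycle.
    """
    return [link_up_at + i * interval for i in range(count)]


# Hypothetical timeline in seconds: last cycle at t=0, LINK UP at t=120.
print(next_probe_without_restart(0))      # 300
print(probe_times_after_link_up(120, 3))  # [120, 420, 720]
```

In this example, the restart brings the first probe forward from t=300 to t=120. If the paths are still inaccessible at t=120, the next attempt is at t=420.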

The Fabric Monitoring functionality is enabled by default. The value of the dmp_monitor_fabric tunable is persistent across restarts.

To display the current value of the dmp_monitor_fabric tunable, use the following command:

# vxdmpadm gettune dmp_monitor_fabric

To disable the Fabric Monitoring functionality, use the following command:

# vxdmpadm settune dmp_monitor_fabric=off

To enable the Fabric Monitoring functionality, use the following command:

# vxdmpadm settune dmp_monitor_fabric=on