Replacing a host bus adapter on an M5000 server managed by DMP

This section contains the procedure to replace an online host bus adapter (HBA) when DMP is managing multi-pathing in a Cluster File System (CFS) cluster. The HBA World Wide Port Name (WWPN) changes when the HBA is replaced. Following are the prerequisites to replace an online host bus adapter:

To replace an online host bus adapter on an M5000 server

  1. Identify the HBAs on the M5000 server. For example, to identify Emulex HBAs, enter the following command:
    # /usr/platform/sun4u/sbin/prtdiag -v | grep emlx
    00 PCIe 0 2, fc20, 10df 119, 0, 0 okay 4,
    4 SUNW,emlxs-pci10df,fc20 LPe 11002-S
    /pci@0,600000/pci@0/pci@9/SUNW,emlxs@0
    00 PCIe 0 2, fc20, 10df 119, 0, 1 okay 4,
    4 SUNW,emlxs-pci10df,fc20 LPe 11002-S
    /pci@0,600000/pci@0/pci@9/SUNW,emlxs@0,1
    00 PCIe 3 2, fc20, 10df 2, 0, 0 okay 4,
    4 SUNW,emlxs-pci10df,fc20 LPe 11002-S
    /pci@3,700000/SUNW,emlxs@0
    00 PCIe 3 2, fc20, 10df 2, 0, 1 okay 4,
    4 SUNW,emlxs-pci10df,fc20 LPe 11002-S
    /pci@3,700000/SUNW,emlxs@0,1
  2. Using the cfgadm command, identify the HBA that you want to replace and its WWPN(s).

    To identify the HBA, enter the following:

    # cfgadm -al | grep -i fibre 
    iou#0-pci#1 fibre/hp connected configured ok
    iou#0-pci#4 fibre/hp connected configured ok

    To list all HBA ports and their connection state, enter the following:

    # luxadm -e port
    /devices/pci@0,600000/pci@0/pci@9/SUNW,emlxs@0/fp@0,0:devctl
    NOT CONNECTED
    /devices/pci@0,600000/pci@0/pci@9/SUNW,emlxs@0,1/fp@0,0:devctl
    CONNECTED
    /devices/pci@3,700000/SUNW,emlxs@0/fp@0,0:devctl
    NOT CONNECTED
    /devices/pci@3,700000/SUNW,emlxs@0,1/fp@0,0:devctl
    CONNECTED

    To dump the port map of the selected HBA port and get the WWPN, enter the following:

    # luxadm -e dump_map /devices/pci@0,600000/pci@0/pci@9/SUNW,emlxs@0,1/fp@0,0:devctl
    0     304700   0          203600a0b847900c 200600a0b847900c 0x0  (Disk device)
    1     30a800   0          20220002ac00065f 2ff70002ac00065f 0x0  (Disk device)
    2     30a900   0          21220002ac00065f 2ff70002ac00065f 0x0  (Disk device)
    3     560500   0          10000000c97c3c2f 20000000c97c3c2f 0x1f (Unknown Type)
    4     560700   0          10000000c97c9557 20000000c97c9557 0x1f (Unknown Type)
    5     560b00   0          10000000c97c34b5 20000000c97c34b5 0x1f (Unknown Type)
    6     560900   0          10000000c973149f 20000000c973149f 0x1f (Unknown Type,Host Bus Adapter)

    Alternatively, you can run the Solaris fcinfo hba-port command to get the WWPN(s) of the HBA ports.

  3. Ensure you have a compatible spare HBA for hot-swap.
  4. Stop the I/O operations on the HBA port(s), and then disable the DMP subpath(s) for the HBA that you want to replace. In the following command, ctrl# stands for the DMP controller name of the HBA port.
    # vxdmpadm disable ctrl=ctrl#
  5. Dynamically unconfigure the HBA in the PCIe slot using the cfgadm command.
    # cfgadm -c unconfigure iou#0-pci#1

    Check the console messages to determine whether the cfgadm command succeeded. If the command fails, troubleshoot using the server hardware documentation, check the Solaris 10 patch level recommended for dynamic reconfiguration operations, and contact Sun support for further assistance.

    console messages
    Oct 24 16:21:44 m5000sb0 pcihp: NOTICE: pcihp (pxb_plx2):
    card is removed from the slot iou#0-pci#1
  6. Verify that the HBA card that you unconfigured in step 5 is no longer in the configuration. Enter the following command:
    # cfgadm -al | grep -i fibre
    iou#0-pci#4 fibre/hp connected configured ok
  7. Mark the fiber cable(s).
  8. Remove the fiber cable(s) and the HBA that you must replace.

    For more information, see the HBA replacement procedures in SPARC Enterprise M4000/M5000/M8000/M9000 Servers Dynamic Reconfiguration (DR) User's Guide.

  9. Replace the HBA with a new compatible HBA of similar type in the same slot. The reinserted card shows up as follows:
    console messages
    iou#0-pci#1 unknown disconnected unconfigured unknown
  10. Bring the replaced HBA back into the configuration. Enter the following:
    # cfgadm -c configure iou#0-pci#1
    console messages
    Oct 24 16:21:57 m5000sb0 pcihp: NOTICE: pcihp (pxb_plx2):
    card is inserted in the slot iou#0-pci#1 (pci dev 0)
  11. Verify that the reinserted HBA is in the configuration. Enter the following:
    # cfgadm -al | grep -i fibre
    iou#0-pci#1 fibre/hp connected configured ok <====
    iou#0-pci#4 fibre/hp connected configured ok
  12. Modify fabric zoning to include the replaced HBA WWPN(s).
  13. Enable LUN security on storage for the new WWPN(s).
  14. Perform an operating system device scan to re-discover the LUNs. Enter the following:
    # cfgadm -c configure c3
  15. Clean up the device tree for old LUNs. Enter the following:
    # devfsadm -Cv

    Note:

    Sometimes replacing an HBA creates new devices. Perform cleanup operations for the LUN only when new devices are created.

  16. If SFCFSHA does not show a ghost path for the removed HBA path, enable the path using the vxdmpadm command. This command performs a device scan for the subpath(s) of that HBA. Enter the following:
    # vxdmpadm enable ctrl=ctrl#
  17. Verify that I/O operations are scheduled on that path. If I/O operations are running correctly on all paths, the dynamic HBA replacement operation is complete.
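
Because the WWPN changes when the HBA is replaced, it can help to record the port WWPN before the swap (step 2) and again before rezoning (step 12), and to check the slot state after each cfgadm operation. The following is a minimal sketch of two helper functions for that purpose. The function names hba_port_wwpn and slot_state are hypothetical and are not part of Solaris, cfgadm, luxadm, or DMP. The sketch assumes the output layouts shown in the steps above: the Port WWN is the fourth column of the luxadm -e dump_map output, and the receptacle and occupant states are the third and fourth columns of the cfgadm -al output.

```shell
#!/bin/sh
# Illustrative helpers only; the function names are hypothetical.

# hba_port_wwpn: read `luxadm -e dump_map` output on stdin and print the
# Port WWN (assumed to be column 4) of the entry flagged "Host Bus Adapter".
# Handles the type text being on the same line as the row or wrapped onto
# the following line. Returns non-zero if no such entry is found.
hba_port_wwpn() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 5 { wwpn = $4 }
        /Host Bus Adapter/ && wwpn != "" { print wwpn; found = 1; exit }
        END { exit !found }
    '
}

# slot_state: read `cfgadm -al` output on stdin and print the receptacle
# and occupant states (assumed to be columns 3 and 4) for the attachment
# point given as $1. Returns non-zero if the attachment point is not listed.
slot_state() {
    awk -v ap="$1" '
        $1 == ap { print $3, $4; found = 1 }
        END { exit !found }
    '
}

# Example on a live Solaris host (device path and slot taken from the steps above):
#   luxadm -e dump_map /devices/pci@0,600000/pci@0/pci@9/SUNW,emlxs@0,1/fp@0,0:devctl | hba_port_wwpn
#   cfgadm -al | slot_state 'iou#0-pci#1'
```

Both functions only parse text from standard input, so they can be tried against saved command output before being used on a production node. On Solaris 10, the default /usr/bin/awk may not accept the -v option; in that case, substituting nawk or /usr/xpg4/bin/awk is a likely workaround.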