OS: AIX
Etrack Incidents: 1233409, 1294686, 1274390, 1269305

* * * PATCH 5.0MP1_5.0MP1EXT+e1274390 for VERITAS Low Latency Transport 5.0 MP1 and 5.0 MP1 Update 1 AIX * * *

Patch Date: July 2008

This README provides information on:

* BEFORE GETTING STARTED
* CRC AND BYTE COUNT
* FIXES AND ENHANCEMENTS INCLUDED IN THIS PATCH
* PACKAGES AFFECTED BY THIS PATCH
* INSTALLING THIS LLT PATCH FOR VCS
* UNINSTALLING THIS LLT PATCH FOR VCS
* INSTALLING THIS LLT PATCH FOR SFRAC ENVIRONMENT
* UNINSTALLING THIS LLT PATCH FOR SFRAC ENVIRONMENT
* DISABLING LLT TO PREVENT PANIC IN LOOP

BEFORE GETTING STARTED
----------------------

This patch only applies to:

1. VRTSllt 5.0-MP1 running on AIX 5.3 and AIX 5.2.
2. VRTSllt 5.0-MP1 UPDATE 1 running on AIX 5.3 and AIX 6.1.

Ensure that you are running one of the supported configurations before installing this patch.

CRC AND BYTE COUNT
-------------------

Ensure that the file you have downloaded matches the following checksum and byte count. The following command can be used to ascertain this:

    # cksum VRTSllt.rte.bff
    3384016361 3840000 VRTSllt.rte.bff

The output displays the CRC value (3384016361) and the byte count (3840000).

FIXES AND ENHANCEMENTS INCLUDED IN THIS VCS PATCH
-------------------------------------------------

e1233409

In a cluster setup, the Veritas Low Latency Transport (LLT) driver is used for communication. LLT communicates with the AIX OS DLPI driver to send and receive network packets on the physical network. The upcalls from the DLPI driver to LLT used to always be in process context. With the latest changes in the AIX DLPI driver, calls to LLT now come in interrupt context. This causes a panic or hang in the LLT driver or in clients of LLT such as GAB. The patch changes LLT to be interrupt safe and to make its calls to clients of LLT in process context.
The known AIX APARs that change the behavior of the DLPI driver and cause a panic in LLT or GAB are:

    5200-10 - AIX APAR IZ19838
    5300-06 - AIX APAR IZ05430
    5300-07 - AIX APAR IZ11726
    5300-08 - AIX APAR IZ09036
    6100-00 - AIX APAR IZ13304

To find out whether your system has an APAR that changes the DLPI behavior, run the instfix command with the APAR number, or grep for the string 'BRING DLPI DRIVER "TO SPEC"'. For example:

    # instfix -iv | grep 'BRING DLPI DRIVER "TO SPEC"'
    IZ11726 Abstract: BRING DLPI DRIVER "TO SPEC"

e1294686

The LLT-DLPI changes can cause a hang on a single-CPU machine, because the thread holding a lock is swapped off the CPU while another thread spins on the CPU waiting for the same lock. The locking mechanism in LLT has been changed to address this.

e1274390

Multiple LLT clients registering ports with LLT can deadlock due to a race condition. The fix resolves simultaneous port registration by multiple clients.

e1269305

Under heavy load, LLT can cause a panic while calling xmalloc when the system is very busy. LLT uses xmalloc only when a call to allocb fails, and a call to allocb fails only when the network memory pool runs out of memory. LLT now checks the current execution context before calling xmalloc, because calling xmalloc from interrupt context causes a panic.

PACKAGES AFFECTED BY THIS VCS PATCH
--------------------------------

This patch brings the VRTSllt.rte.bff fileset to the 5.0.1.201 level.

INSTALLING THIS LLT PATCH FOR VCS
---------------------------------

The following steps should be run on each node in the cluster, one at a time:

1. Failover Groups: If any failover groups have been configured and are currently running on this node, migrate them to any of the other active cluster nodes.

    # hagrp -state | grep
    # hagrp -switch -to

2. Offline all parallel groups using CFS and CVM resources on the current system.
As an example, the oracle_group would be offlined as follows:

    # hagrp -state | grep
    # hagrp -offline -sys

Note: This may take time, especially if your main.cf is big and has a lot of dependencies.

2a. After all applications using CFS and CVM have been taken down, run "slibclean" to unload the libraries from memory.

3. Stop VCS on the current node.

    # /opt/VRTSvcs/bin/hastop -local

4. If CVM/CFS/VXFEN is not configured, go to step 7. If CVM/CFS is configured, verify that ports "h", "v" and "w" have been closed:

    # /sbin/gabconfig -a

The display should not have ports "h", "v" and "w" listed.

5. Deinitialize CFS:

    # /opt/VRTSvxfs/sbin/fsclustadm cfsdeinit

Verify that port "f" has been closed:

    # /sbin/gabconfig -a

The display should not have port "f" listed.

6. Unconfigure vxfen:

    # /sbin/vxfenconfig -U

Verify that port "b" has been closed:

    # /sbin/gabconfig -a

The display should not have port "b" listed.

7. At this point all GAB ports except port "a" should have been closed. Verify this as follows:

    # /sbin/gabconfig -a

8. Unconfigure GAB:

    # /sbin/gabconfig -U

9. Unconfigure LLT:

    # /sbin/lltconfig -Uo

10. Unload the LLT driver:

    # /usr/sbin/strload -ud /usr/lib/drivers/pse/llt

11. Verify that the LLT driver has been unloaded:

    # /usr/sbin/strload -qd /usr/lib/drivers/pse/llt
    /usr/lib/drivers/pse/llt: no

If llt is still loaded, "yes" will show up in the output above.

Note: If you are unable to successfully unload the LLT driver, the server may require a reboot after patch installation, so that the newer LLT driver gets loaded in the AIX kernel.

12. cd to the patch location and install the LLT patch:

    # installp -a -d ./VRTSllt.rte.bff VRTSllt.rte

13. Verify that the new fileset has been installed:

    # lslpp -l VRTSllt.rte
    VRTSllt.rte 5.0.1.201 APPLIED VERITAS Low Latency Transport

14. Verify that the new LLT driver is loaded:

    # strload -qd /usr/lib/drivers/pse/llt
    /usr/lib/drivers/pse/llt: yes

15.
If not already loaded, load the newly installed LLT driver:

    # strload -d /usr/lib/drivers/pse/llt

16. Configure LLT:

    # /sbin/lltconfig -c

17. Verify that LLT has been configured properly:

    # /sbin/lltconfig
    LLT is running

18. Verify that the GAB driver is loaded:

    # /usr/bin/genkex | grep gab
    23a3000 44a28 /usr/lib/drivers/gab

19. Configure GAB:

    # sh /etc/gabtab

20. Verify that the GAB membership shows up correctly:

    # /sbin/gabconfig -a
    GAB Port Memberships
    ===============================================================
    Port a gen 6eefdf01 membership 0

21. Start VCS:

    # /opt/VRTSvcs/bin/hastart

22. To commit the patch (note that the patch cannot be backed out once it has been committed):

    # installp -c VRTSllt.rte

23. Verify that the fileset is committed:

    # lslpp -l VRTSllt.rte
    VRTSllt.rte 5.0.1.201 COMMITTED VERITAS Low Latency Transport

UNINSTALLING THIS LLT PATCH FOR VCS
------------------------------------

The VRTSllt.rte.bff patch can be backed out if it has not been committed.

NOTE: Before uninstalling the patch, make sure that the APAR changing the DLPI behavior is not installed on the system by running the following command:

    # instfix -iv | grep 'BRING DLPI DRIVER "TO SPEC"'

If the above command returns an APAR, backing out this point patch will move llt to the older version, which will cause a panic or hang.

Steps to back out the patch:

a. Failover Groups: If any failover groups have been configured and are currently running on this node, migrate them to any of the other active cluster nodes.

    # hagrp -state | grep
    # hagrp -switch -to

b. Offline all parallel groups using CFS and CVM resources on the current system. As an example, the oracle_group would be offlined as follows:

    # hagrp -state | grep
    # hagrp -offline -sys

Note: This may take time, especially if your main.cf is big and has a lot of dependencies.

c. After all applications using CFS and CVM have been taken down, run "slibclean" to unload the libraries from memory.

d. Stop VCS on the current node.
    # /opt/VRTSvcs/bin/hastop -local

e. If CVM/CFS is not configured, go to step h. If CVM/CFS is configured, verify that ports "h", "v" and "w" have been closed:

    # /sbin/gabconfig -a

The display should not have ports "h", "v" and "w" listed.

f. Deinitialize CFS:

    # /opt/VRTSvxfs/sbin/fsclustadm cfsdeinit

Verify that port "f" has been closed:

    # /sbin/gabconfig -a

The display should not have port "f" listed.

g. Unconfigure vxfen:

    # /sbin/vxfenconfig -U

Verify that port "b" has been closed:

    # /sbin/gabconfig -a

The display should not have port "b" listed.

h. At this point all GAB ports except port "a" should have been closed. Verify this as follows:

    # /sbin/gabconfig -a

i. Unconfigure GAB:

    # /sbin/gabconfig -U

j. Unconfigure LLT:

    # /sbin/lltconfig -Uo

k. Unload the LLT driver:

    # /usr/sbin/strload -ud /usr/lib/drivers/pse/llt

l. Verify that the LLT driver has been unloaded:

    # /usr/sbin/strload -qd /usr/lib/drivers/pse/llt
    /usr/lib/drivers/pse/llt: no

If llt is still loaded, "yes" will show up in the output above.

m. Back out the patch:

    # installp -r VRTSllt.rte 5.0.1.201

n. Verify that the patch has been backed out:

    # lslpp -l | grep VRTSllt.rte
    VRTSllt.rte 5.0.1.0 COMMITTED VERITAS Low Latency Transport

o. Next, as before, go through the process of loading and configuring LLT and GAB and bringing up VCS (steps 14 through 21 of section "INSTALLING THIS LLT PATCH FOR VCS" above).

Note: Steps 14 and 15 would now refer to the old llt driver.

INSTALLING THIS LLT PATCH FOR SFRAC ENVIRONMENT
------------------------------------------------

To install the patch, run the following steps on each node in the cluster, one at a time:

1. If Oracle and associated processes are active outside of VCS control, stop them.

2(a) If srvctl is running (as is the case with 10gR1 and 10gR2):

    $ srvctl stop nodeapps -n

2(b) If "gsd" is running, stop it.
For Oracle 9i R2, run the following command as the Oracle user:

    $ gsdctl stop

To check the status of gsd, you can use the following command:

    $ gsdctl stat

The gsdctl command is typically found in $ORACLE_HOME/bin.

3. Failover Groups: If any failover groups have been configured and are currently running on this node, migrate them to any of the other active cluster nodes.

    # hagrp -state | grep
    # hagrp -switch -to

4. Offline all parallel groups using CFS and CVM resources on the current system. As an example, the oracle_group would be offlined as follows:

    # hagrp -state | grep
    # hagrp -offline -sys

Note: This may take time, especially if your main.cf is big and has a lot of dependencies.

4(a) Stop CRS manually if CRS is not under VCS control.

    # /etc/init.crs stop

4(b) After all the Oracle instances and other applications using CFS and CVM have been taken down, run "slibclean" to unload the libraries from memory.

5. Stop VCS on the current node.

    # /opt/VRTSvcs/bin/hastop -local

6. Verify that ports "h", "v" and "w" have been closed:

    # /sbin/gabconfig -a

The display should not have ports "h", "v" and "w" listed.

7. Deinitialize CFS:

    # /opt/VRTSvxfs/sbin/fsclustadm cfsdeinit

Verify that port "f" has been closed:

    # /sbin/gabconfig -a

The display should not have port "f" listed.

8. Unconfigure vcsmm:

    # /sbin/vcsmmconfig -U

Verify that port "o" has been closed:

    # /sbin/gabconfig -a

The display should not have port "o" listed. If it does, ensure that the Oracle instances are offline.

9. Unconfigure lmx:

    # /sbin/lmxconfig -U

10. Unload the lmx driver:

    # /usr/lib/methods/lmxext -stop

To verify that the driver has been unloaded:

    # /usr/lib/methods/lmxext -status
    lmx: unloaded

11. Unconfigure vxfen:

    # /sbin/vxfenconfig -U

Verify that port "b" has been closed:

    # /sbin/gabconfig -a

The display should not have port "b" listed.

12. Unmount odm:

    # umount /dev/odm

Verify that port "d" has been closed:

    # /sbin/gabconfig -a

The display should not have port "d" listed.

13.
At this point all GAB ports except port "a" should have been closed. Verify this as follows:

    # /sbin/gabconfig -a

14. Unconfigure GAB:

    # /sbin/gabconfig -U

15. Unconfigure LLT:

    # /sbin/lltconfig -Uo

16. Unload the LLT driver:

    # /usr/sbin/strload -ud /usr/lib/drivers/pse/llt

17. Verify that the LLT driver has been unloaded:

    # /usr/sbin/strload -qd /usr/lib/drivers/pse/llt
    /usr/lib/drivers/pse/llt: no

Note: If you are unable to successfully unload the LLT driver, the server may require a reboot after patch (de)installation, so that the newer LLT driver gets loaded in the AIX kernel.

18. cd to the patch location and install the LLT patch:

    # installp -a -d ./VRTSllt.rte.bff VRTSllt.rte

19. Verify that the new fileset has been installed:

    # lslpp -l VRTSllt.rte
    VRTSllt.rte 5.0.1.201 APPLIED VERITAS Low Latency Transport

20. Verify that the new LLT driver is loaded:

    # strload -qd /usr/lib/drivers/pse/llt
    /usr/lib/drivers/pse/llt: yes

21. If not already loaded, load the newly installed LLT driver:

    # strload -d /usr/lib/drivers/pse/llt

22. Configure LLT:

    # /sbin/lltconfig -c

23. Verify that LLT has been configured properly:

    # /sbin/lltconfig
    LLT is running

24. Configure GAB:

    # sh /etc/gabtab

25. Verify that the GAB membership shows up correctly:

    # /sbin/gabconfig -a
    GAB Port Memberships
    ===============================================================
    Port a gen 6eefdf01 membership 0

26. Configure vxfen:

    # /sbin/vxfenconfig -c

Verify that vxfen has been configured:

    # /sbin/gabconfig -a

The output should list port "b".

27. Load the lmx driver:

    # /etc/methods/lmxext -start

Verify that lmx is loaded in the kernel:

    # /etc/methods/lmxext -status
    lmx: loaded

28. Configure the lmx driver:

    # /sbin/lmxconfig -c

29. Configure vcsmm:

    # /sbin/vcsmmconfig -c

Verify that vcsmm has been configured:

    # /sbin/gabconfig -a

The output should list port "o".

30. Mount ODM:

    # mount /dev/odm

31. Start VCS:

    # /opt/VRTSvcs/bin/hastart

32.
Check that all ports are now open:

    # /sbin/gabconfig -a

The output should list ports "a", "b", "d", "f", "h", "o", "v", and "w".

33. To commit the patch (note that the patch cannot be backed out once it has been committed):

    # installp -c VRTSllt.rte

34. To verify that the fileset is committed:

    # lslpp -l VRTSllt.rte
    VRTSllt.rte 5.0.1.201 COMMITTED VERITAS Low Latency Transport

UNINSTALLING THIS LLT PATCH FOR SFRAC ENVIRONMENT
--------------------------------------------------

The VRTSllt.rte.bff patch can be backed out if it has not been committed.

NOTE: Before uninstalling the patch, make sure that the APAR changing the DLPI behavior is not installed on the system by running the following command:

    # instfix -iv | grep 'BRING DLPI DRIVER "TO SPEC"'

If the above command returns an APAR, backing out this point patch will move llt to the older version, which will cause a panic or hang.

Steps to back out the patch:

Follow steps 1 through 17 of section "INSTALLING THIS LLT PATCH FOR SFRAC ENVIRONMENT" to stop and unload the stack, then:

a. Back out the patch:

    # installp -r VRTSllt.rte 5.0.1.201

b. Verify that the patch has been backed out:

    # lslpp -l | grep VRTSllt.rte
    VRTSllt.rte 5.0.1.0 COMMITTED VERITAS Low Latency Transport

c. Follow steps 20 through 32 of section "INSTALLING THIS LLT PATCH FOR SFRAC ENVIRONMENT" to restart in the old environment.

Note: Steps 20 and 21 would now refer to the old llt driver.

DISABLING LLT TO PREVENT PANIC IN LOOP
---------------------------------------

An older llt driver can cause a panic on a system that has an AIX APAR which changes the behavior of the DLPI driver. This can result in a panic on each reboot, because the llt driver gets configured by the init script on reboot. The following steps prevent the llt driver from starting up on reboot and so prevent the panic:

1. If the system is in a panic state, bring the OS to single-user mode.

2. Disable LLT by renaming the start-up script:

    # cd /etc/rc.d/rc2.d
    # mv S70llt nostart.S70llt

3. Boot the OS to multi-user mode.
Note that llt will not be configured on boot.

4. Apply this LLT patch. Follow the steps in "INSTALLING THIS LLT PATCH FOR VCS" or "INSTALLING THIS LLT PATCH FOR SFRAC ENVIRONMENT".
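
Two of the manual checks in this README lend themselves to scripting: the cksum comparison under "CRC AND BYTE COUNT" and the repeated inspection of "gabconfig -a" output for open ports. The following is a minimal sketch, assuming a POSIX shell with cksum and awk available. The function names verify_cksum and gab_open_ports are hypothetical helpers introduced here for illustration; they are not shipped with VRTSllt. The port parser assumes the "Port a gen ... membership ..." line format shown in the sample gabconfig output above.

```shell
#!/bin/sh
# Sketch only. verify_cksum and gab_open_ports are hypothetical helper
# names; they are not part of the VRTSllt fileset.

# verify_cksum FILE EXPECTED_CRC EXPECTED_BYTES
# Succeeds only if cksum reports the expected CRC and byte count.
verify_cksum() {
    vc_file=$1
    vc_crc=$2
    vc_bytes=$3
    if [ ! -r "$vc_file" ]; then
        echo "verify_cksum: cannot read $vc_file" >&2
        return 2
    fi
    # With input on stdin, cksum prints "<crc> <bytes>" and no file name.
    set -- $(cksum < "$vc_file")
    if [ "$1" = "$vc_crc" ] && [ "$2" = "$vc_bytes" ]; then
        return 0
    fi
    echo "verify_cksum: got $1 $2, expected $vc_crc $vc_bytes" >&2
    return 1
}

# gab_open_ports
# Reads "gabconfig -a" output on stdin and prints one open port letter
# per line, taken from lines that begin with the word "Port".
gab_open_ports() {
    awk '$1 == "Port" { print $2 }'
}
```

With the values given in this README, the helpers might be used as follows:

    # verify_cksum VRTSllt.rte.bff 3384016361 3840000 && echo "checksum OK"
    # /sbin/gabconfig -a | gab_open_ports

Before GAB is unconfigured, the second command should print only "a".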