OS: AIX
OS Version: 5.2, 5.3 & 6.1
Etrack Incidents: 1513967

Fixes Applied for Products:
    VRTSllt - Veritas Low Latency Transport by Symantec
    VRTSgab - Veritas Group Membership and Atomic Broadcast by Symantec

Additional Instructions:
    Please read the instructions below before installing the patch.

PATCH 5.0MP3RP1HF1 for VERITAS Low Latency Transport &
VERITAS Group Membership and Atomic Broadcast
===============================================================

Patch Date: May 2009

This README provides information on:

* BEFORE GETTING STARTED
* CRC AND BYTE COUNT
* FIXES AND ENHANCEMENTS INCLUDED IN THIS PATCH
* PACKAGES AFFECTED BY THIS PATCH
* INSTALLING THE PATCHES FOR VCS
* UNINSTALLING THE PATCHES FOR VCS
* INSTALLING THE PATCHES FOR SFRAC ENVIRONMENT
* UNINSTALLING THE PATCHES FOR SFRAC ENVIRONMENT

BEFORE GETTING STARTED
----------------------

This patch applies only to:

    VRTSllt 5.0 MP3 RP1 running on AIX 5.2, 5.3 and 6.1
    and
    VRTSgab 5.0 MP3 RP1 running on AIX 5.2, 5.3 and 6.1

Ensure that you are running a supported configuration before installing
this patch.

CRC AND BYTE COUNT
-------------------

Ensure that the files you have downloaded match the following checksums
and byte counts. Use the cksum command to verify this:

    # cksum VRTSllt.rte.bff
    1871310486 3840000 VRTSllt.rte.bff

    # cksum VRTSgab.rte.bff
    961410986 8550400 VRTSgab.rte.bff

FIXES AND ENHANCEMENTS INCLUDED IN THIS PATCH:
---------------------------------------------

Etrack Incident : 1513967

Symptom::
Under a heavy I/O load on SFHA/SFCFS-HA/SFRAC, one or more nodes may halt
due to an abend exception with the following stack trace. The problem is
more likely to occur on AIX 6.1 on P6 with storage keys enabled, but it
cannot be ruled out in configurations with storage keys disabled.

Stack Trace:
    [0001AD40]abend_trap+000000 ()
    [0007E5B8]tstart+000558 (??)
    [00014F50].kernel_add_gate_cstack+000030 ()
    [F1000000A02A6044].llt_aix_timeout+0000E4 ()
    [F1000000A02AFD64].llt_timer_handler+000470 ()
    [F1000000A02A62F0].llt_timer_procfunc+0000A0 ()
    [00014D70].hkey_legacy_gate+00004C ()
    [001A2DD0]procentry+000010 (??, ??, ??, ??)

Defect description::
The LLT and GAB internal timer implementations use the tstart() kernel
service to submit timer requests, with the timer request block as input.
The tstart() implementation retains information about the timer handler
from the timer request block. If, in a race condition, LLT or GAB submits
another timer request through tstart() before the previous timer handler
has completed, tstart() may access the previous handler's stale values,
leading to an abend exception and a system panic.

Resolution::
The race condition is avoided by modifying the LLT and GAB drivers to call
tstop() before issuing a tstart() in their timer implementations.

PACKAGES AFFECTED BY THIS PATCH:
-------------------------------

This patch updates the following VCS packages:

    VRTSllt.rte.bff fileset to 5.0.3.101 level
    and
    VRTSgab.rte.bff fileset to 5.0.3.101 level

INSTALLING THE PATCHES FOR VCS:
------------------------------

The following steps should be run on each node in the cluster, one at a
time:

1.  Failover Groups: If any failover groups have been configured and are
    currently running on this node, migrate them to any of the other
    active cluster nodes.

    # hagrp -state | grep
    # hagrp -switch -to

2a. Offline all parallel groups using CFS and CVM resources on the current
    system. As an example, the oracle_group would be offlined as follows:

    # hagrp -state | grep
    # hagrp -offline -sys

    Note: This may take time, especially if your main.cf is large and has
    many dependencies.

2b. After all applications using CFS and CVM have been taken down, run
    'slibclean' to unload the libraries from memory.

3.  Stop VCS on the current node.

    # /opt/VRTSvcs/bin/hastop -local

4.
    If CVM/CFS/VXFEN is not configured, go to step 7.
    If CVM/CFS is configured, verify that ports 'h', 'v' and 'w' have
    been closed:

    # /sbin/gabconfig -a

    The display should not have ports 'h', 'v' and 'w' listed.

5.  Deinitialize CFS:

    # /opt/VRTSvxfs/sbin/fsclustadm cfsdeinit

    Verify that port 'f' has been closed:

    # /sbin/gabconfig -a

    The display should not have port 'f' listed.

6.  Unconfigure vxfen:

    # /sbin/vxfenconfig -U

    Verify that port 'b' has been closed:

    # /sbin/gabconfig -a

    The display should not have port 'b' listed.

7.  At this point all GAB ports except port 'a' should have been closed.
    Verify this as follows:

    # /sbin/gabconfig -a

8.  Unconfigure GAB:

    # /sbin/gabconfig -U

9.  Unconfigure LLT:

    # /sbin/lltconfig -Uo

10. Unload the LLT driver:

    # /usr/sbin/strload -ud /usr/lib/drivers/pse/llt

    Unload the GAB driver:

    # /etc/methods/gabkext -stop

11. Verify that the LLT driver has been unloaded:

    # /usr/sbin/strload -qd /usr/lib/drivers/pse/llt
    /usr/lib/drivers/pse/llt: no

    If LLT is still loaded, "yes" will show up in the output above.

    Note: If you are unable to successfully unload the LLT driver, the
    server may require a reboot after patch installation. This is so that
    the newer LLT driver gets loaded in the AIX kernel.

    Verify that the GAB driver has been unloaded:

    # /etc/methods/gabkext -status
    gab: unloaded

    NOTE: If you are unable to successfully unload the GAB driver, the
    server must be rebooted AFTER the patch installation. This is so that
    the new GAB driver gets loaded in the AIX kernel.

12. Change directory to the patch location and gunzip the VRTS*.bff.gz
    files. Install the LLT & GAB patches from the bff files in the same
    location:

    # installp -a -d ./VRTSllt.rte.bff VRTSllt.rte
    # installp -a -d ./VRTSgab.rte.bff VRTSgab.rte

13.
    Verify that the new filesets have been installed:

    # lslpp -l VRTSllt.rte
    VRTSllt.rte 5.0.3.101 APPLIED Veritas Low Latency Transport by Symantec

    # lslpp -l VRTSgab.rte
    VRTSgab.rte 5.0.3.101 APPLIED Veritas Group Membership and Atomic Broadcast by Symantec

14. Verify that the new LLT driver has been loaded:

    # strload -qd /usr/lib/drivers/pse/llt
    /usr/lib/drivers/pse/llt: yes

    Verify that the new GAB driver has been loaded:

    # /etc/methods/gabkext -status
    gab: loaded

15. If not already loaded, load the newly installed LLT driver:

    # strload -d /usr/lib/drivers/pse/llt

    If not already loaded, load the newly installed GAB driver:

    # /etc/methods/gabkext -start

16. Configure LLT:

    # /sbin/lltconfig -c

17. Verify that LLT has been configured properly:

    # /sbin/lltconfig
    LLT is running

18. Verify that the GAB driver is loaded:

    # /usr/bin/genkex | grep gab

19. Configure GAB:

    # sh /etc/gabtab

20. Verify that the GAB membership shows up correctly:

    # /sbin/gabconfig -a

    The display should have port 'a' listed.

21. Start VCS:

    # /opt/VRTSvcs/bin/hastart

22. Commit the patch.
    (Note: The patch cannot be backed out once it is committed.)

    # installp -c VRTSllt.rte
    # installp -c VRTSgab.rte

23. Verify that the filesets are committed:

    # lslpp -l VRTSllt.rte
    VRTSllt.rte 5.0.3.101 COMMITTED Veritas Low Latency Transport by Symantec

    # lslpp -l VRTSgab.rte
    VRTSgab.rte 5.0.3.101 COMMITTED Veritas Group Membership and Atomic Broadcast by Symantec

UNINSTALLING THE PATCHES FOR VCS:
--------------------------------

The VRTSllt.rte.bff & VRTSgab.rte.bff patches can ONLY be backed out if
they have not been committed.

NOTE: Before uninstalling the patch, make sure that the APAR changing DLPI
behavior is not installed on the system by running the following command:

    # instfix -iv | grep "BRING DLPI DRIVER \"TO SPEC\""

If this command returns an APAR, backing out this point patch will move
LLT to an older version, which will cause a panic or hang.

Steps to Backout the Patch:

1.
    Failover Groups: If any failover groups have been configured and are
    currently running on this node, migrate them to any of the other
    active cluster nodes.

    # hagrp -state | grep
    # hagrp -switch -to

2.  Offline all parallel groups using CFS and CVM resources on the current
    system. As an example, the oracle_group would be offlined as follows:

    # hagrp -state | grep
    # hagrp -offline -sys

    Note: This may take time, especially if your main.cf is large and has
    many dependencies.

3.  After all applications using CFS and CVM have been taken down, run
    'slibclean' to unload the libraries from memory.

4.  Stop VCS on the current node.

    # /opt/VRTSvcs/bin/hastop -local

5.  If CVM/CFS is not configured, go to step 7.
    If CVM/CFS is configured, verify that ports 'h', 'v' and 'w' have
    been closed:

    # /sbin/gabconfig -a

    The display should not have ports 'h', 'v' and 'w' listed.

6.  Deinitialize CFS:

    # /opt/VRTSvxfs/sbin/fsclustadm cfsdeinit

    Verify that port 'f' has been closed:

    # /sbin/gabconfig -a

    The display should not have port 'f' listed.

7.  Unconfigure vxfen:

    # /sbin/vxfenconfig -U

    Verify that port 'b' has been closed:

    # /sbin/gabconfig -a

    The display should not have port 'b' listed.

8.  At this point all GAB ports except port 'a' should have been closed.
    Verify this as follows:

    # /sbin/gabconfig -a

9.  Unconfigure GAB:

    # /sbin/gabconfig -U

10. Unconfigure LLT:

    # /sbin/lltconfig -Uo

11. Unload the LLT driver:

    # /usr/sbin/strload -ud /usr/lib/drivers/pse/llt

    Unload the GAB driver:

    # /etc/methods/gabkext -stop

12. Verify that the LLT driver has been unloaded:

    # /usr/sbin/strload -qd /usr/lib/drivers/pse/llt
    /usr/lib/drivers/pse/llt: no

    If LLT is still loaded, "yes" will show up in the output above.

    Note: If you are unable to successfully unload the LLT driver, the
    server may require a reboot after patch installation. This is so that
    the newer LLT driver gets loaded in the AIX kernel.
    Verify that the GAB driver has been unloaded:

    # /etc/methods/gabkext -status
    gab: unloaded

    NOTE: If you are unable to successfully unload the GAB driver, the
    server must be rebooted AFTER the patch installation. This is so that
    the new GAB driver gets loaded in the AIX kernel.

13. Back out the patches:

    # installp -r VRTSllt.rte 5.0.3.101
    # installp -r VRTSgab.rte 5.0.3.101

14. Verify that the patches have been backed out:

    # lslpp -l VRTSllt.rte
    VRTSllt.rte 5.0.3.100 COMMITTED Veritas Low Latency Transport by Symantec

    # lslpp -l VRTSgab.rte
    VRTSgab.rte 5.0.3.100 COMMITTED Veritas Group Membership and Atomic Broadcast by Symantec

15. Next, as before, go through the process of loading and configuring
    LLT and GAB and bringing up VCS (steps 14 through 21 of section
    "INSTALLING THE PATCHES FOR VCS" above).

    Note: Steps 14 and 15 would now refer to the old LLT driver.

INSTALLING THE PATCHES FOR SFRAC ENVIRONMENT:
--------------------------------------------

To install the patch:

The following steps should be run on each node in the cluster, one at a
time:

1.  If Oracle and associated processes are active outside of VCS control,
    stop them.

2(a) If srvctl is running (as is the case with 10gR1 and 10gR2):

    $ srvctl stop nodeapps -n

2(b) If 'gsd' is running, stop it. For Oracle 9i R2, run the following
    command as the Oracle user:

    $ gsdctl stop

    To check the status of gsd, you can use the following command:

    $ gsdctl stat

    The gsdctl command is typically found in $ORACLE_HOME/bin.

3.  Failover Groups: If any failover groups have been configured and are
    currently running on this node, migrate them to any of the other
    active cluster nodes.

    # hagrp -state | grep
    # hagrp -switch -to

4.  Offline all parallel groups using CFS and CVM resources on the current
    system. As an example, the oracle_group would be offlined as follows:

    # hagrp -state | grep
    # hagrp -offline -sys

    Note: This may take time, especially if your main.cf is large and has
    many dependencies.
4(a) Stop CRS manually if CRS is not under VCS control.

    # /etc/init.crs stop

4(b) After all the Oracle instances and other applications using CFS and
    CVM have been taken down, run 'slibclean' to unload the libraries
    from memory.

5.  Stop VCS on the current node.

    # /opt/VRTSvcs/bin/hastop -local

6.  Verify that ports 'h', 'v' and 'w' have been closed:

    # /sbin/gabconfig -a

    The display should not have ports 'h', 'v' and 'w' listed.

7.  Deinitialize CFS:

    # /opt/VRTSvxfs/sbin/fsclustadm cfsdeinit

    Verify that port 'f' has been closed:

    # /sbin/gabconfig -a

    The display should not have port 'f' listed.

8.  Unconfigure vcsmm:

    # /sbin/vcsmmconfig -U

    Verify that port 'o' has been closed:

    # /sbin/gabconfig -a

    The display should not have port 'o' listed. If it does, ensure that
    the Oracle instances are offline.

9.  Unconfigure lmx:

    # /sbin/lmxconfig -U

10. Unload the lmx driver:

    # /usr/lib/methods/lmxext -stop

    Verify that the driver has been unloaded:

    # /usr/lib/methods/lmxext -status
    lmx: unloaded

11. Unconfigure vxfen:

    # /sbin/vxfenconfig -U

    Verify that port 'b' has been closed:

    # /sbin/gabconfig -a

    The display should not have port 'b' listed.

12. Unmount ODM:

    # umount /dev/odm

    Verify that port 'd' has been closed:

    # /sbin/gabconfig -a

    The display should not have port 'd' listed.

13. At this point all GAB ports except port 'a' should have been closed.
    Verify this as follows:

    # /sbin/gabconfig -a

14. To unconfigure GAB & LLT and to install the patches, follow steps 8-20
    of section "INSTALLING THE PATCHES FOR VCS" above.

15. Configure vxfen:

    # /sbin/vxfenconfig -c

    Verify that vxfen has been configured:

    # /sbin/gabconfig -a

    The output should list port 'b'.

16. Load the lmx driver:

    # /etc/methods/lmxext -start

    Verify that lmx is loaded in the kernel:

    # /etc/methods/lmxext -status
    lmx: loaded

17. Configure the lmx driver:

    # /sbin/lmxconfig -c

18. Configure vcsmm:

    # /sbin/vcsmmconfig -c

    Verify that vcsmm has been configured:

    # /sbin/gabconfig -a

    The output should list port 'o'.

19. Mount ODM:

    # mount /dev/odm

20.
    Start VCS:

    # /opt/VRTSvcs/bin/hastart

21. Check that all ports are now open:

    # /sbin/gabconfig -a

    The output should list ports 'a', 'b', 'd', 'f', 'h', 'o', 'v',
    and 'w'.

22. Commit the patch.
    (Note: The patch cannot be backed out once it is committed.)

    # installp -c VRTSllt.rte
    # installp -c VRTSgab.rte

23. Verify that the filesets are committed:

    # lslpp -l VRTSllt.rte
    VRTSllt.rte 5.0.3.101 COMMITTED Veritas Low Latency Transport by Symantec

    # lslpp -l VRTSgab.rte
    VRTSgab.rte 5.0.3.101 COMMITTED Veritas Group Membership and Atomic Broadcast by Symantec

UNINSTALLING THE PATCHES FOR SFRAC ENVIRONMENT:
----------------------------------------------

The VRTSllt.rte.bff & VRTSgab.rte.bff patches can ONLY be backed out if
they have not been committed.

NOTE: Before uninstalling the patch, make sure that the APAR changing DLPI
behavior is not installed on the system by running the following command:

    # instfix -iv | grep "BRING DLPI DRIVER \"TO SPEC\""

If this command returns an APAR, backing out this point patch will move
LLT to an older version, which will cause a panic or hang.

Steps to Backout the Patch:

Follow steps 1 through 14 of section "INSTALLING THE PATCHES FOR SFRAC
ENVIRONMENT", but only to stop and unload the drivers.

1.  Back out the patches:

    # installp -r VRTSllt.rte 5.0.3.101
    # installp -r VRTSgab.rte 5.0.3.101

2.  Verify that the patches have been backed out:

    # lslpp -l VRTSllt.rte
    VRTSllt.rte 5.0.3.100 COMMITTED Veritas Low Latency Transport by Symantec

    # lslpp -l VRTSgab.rte
    VRTSgab.rte 5.0.3.100 COMMITTED Veritas Group Membership and Atomic Broadcast by Symantec

3.  Next, as before, go through the process of loading and configuring
    LLT and GAB and bringing up SFRAC (steps 14 through 23 of section
    "INSTALLING THE PATCHES FOR SFRAC ENVIRONMENT" above).

    Note: The LLT & GAB drivers loaded will now be the old ones.
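
The procedures above repeatedly ask you to read the '/sbin/gabconfig -a' display and confirm which ports are still listed. The following is a minimal sketch of how that check could be scripted; it is not part of the patch or of any Veritas package. The function names open_gab_ports and only_port_a_open are hypothetical, and the sketch assumes that each membership line in the display has the form "Port a gen a36e0003 membership 01", which may differ on your system.

```shell
#!/bin/sh
# Hypothetical helpers for checking the output of "/sbin/gabconfig -a".
# ASSUMPTION: membership lines start with the word "Port" followed by the
# port letter, e.g. "Port a gen a36e0003 membership 01".

# Read gabconfig output on stdin and print the distinct open port letters,
# space separated, in the order they first appear.
open_gab_ports() {
    awk '$1 == "Port" && !seen[$2]++ { printf "%s%s", sep, $2; sep = " " }
         END { print "" }'
}

# Succeed only if port "a" is the sole port present in the output on stdin.
only_port_a_open() {
    [ "$(open_gab_ports)" = "a" ]
}
```

For example, before unconfiguring GAB you might run:

    # /sbin/gabconfig -a | only_port_a_open && echo "only port a is open"

The functions only read text from standard input, so they can be tried against a saved copy of the display before being used on a live node.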