                          * * * READ ME * * *
             * * * Veritas Volume Manager 5.0 MP2 RP3 * * *
                         * * * P-patch 2 * * *
                         Patch Date: 2012-07-17


This document provides the following information:

   * PATCH NAME
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas Volume Manager 5.0 MP2 RP3 P-patch 2


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxvm
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas Volume Manager 5.0 MP2
   * Veritas Storage Foundation for Oracle RAC 5.0 MP2
   * Veritas Storage Foundation Cluster File System 5.0 MP2
   * Veritas Volume Replicator 5.0 MP2
   * Veritas Storage Foundation 5.0 MP2
   * Veritas Storage Foundation High Availability 5.0 MP2
   * Veritas Storage Foundation for Oracle 5.0 MP2


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
HP-UX 11i v2 (11.23)


INCIDENTS FIXED BY THE PATCH
----------------------------
This patch fixes the following Symantec incidents:

Patch ID: PHCO_43057, PHKL_43058

* 2627009 (Tracking ID: 2413763)

SYMPTOM:
vxconfigd, the VxVM daemon, dumps core with the following stack:

    ddl_fill_dmp_info
    ddl_init_dmp_tree
    ddl_fetch_dmp_tree
    ddl_find_devices_in_system
    find_devices_in_system
    mode_set
    setup_mode
    startup
    main
    __libc_start_main
    _start

DESCRIPTION:
The Dynamic Multi Pathing node buffer declared in the Device Discovery
Layer was not initialized. Since the node buffer is local to the function,
it must be initialized explicitly before another buffer is copied into it.

RESOLUTION:
The node buffer is now initialized using memset(), which addresses the
core dump.

* 2634096 (Tracking ID: 1206369)

SYMPTOM:
The recoveryoption of an enclosure, when set to nothrottle, does not
persist after a reboot.

DESCRIPTION:
When the CLI 'vxdmpadm setattr' was used to set the recovery option to
nothrottle, the persistent information was not updated correctly. After a
reboot the nothrottle recoveryoption was not considered, so the user-set
value was not effective and the value reverted to the default.

RESOLUTION:
The vxdmp code is corrected to update the nothrottle recovery option and
to read it back at boot time.

* 2662215 (Tracking ID: 2067319)

SYMPTOM:
In a multi-node CVM (Cluster Volume Manager) environment, the vxconfigd
process (VxVM configuration daemon) on the master node may hang during a
cluster reconfiguration. vxconfigd can be found in a tight loop with the
following stack:

    msgtail() + 0x204
    msg() + 0x5c
    send_slaves() + 0xd94
    master_send_abort() + 0x90
    send_slaves() + 0xe4
    master_get_results() + 0x58
    commit() + 0x1acc
    req_vol_commit() + 0x968
    request_loop() + 0xec8
    main() + 0x14e0
    __start() + 0x68

A number of the following messages can be seen in the syslog:

    VxVM vxconfigd WARNING V-5-1-10377 send_slave: got slave_join: retry later

DESCRIPTION:
In a CVM environment, the master synchronizes the transaction details
related to any configuration change among all the joined slaves (passive).
If a CVM reconfiguration happens while a transaction is in progress,
because a new node joins, the master aborts the transaction. In this
situation a race between a passive slave and the master causes vxconfigd
to hang on the master node.

RESOLUTION:
The transaction abort code is modified to handle the CVM reconfiguration
properly.

* 2662216 (Tracking ID: 530741)

SYMPTOM:
In a CVM (Cluster Volume Manager) environment where a private DG (Disk
Group) is imported, the vxconfigd process may dump core when
'vxdg -g flush' is executed on the private DG just after the following
commands in sequence:

   i) A volume stop in the DG fails with the error "Error in cluster
      processing".
        # vxvol -g stop
        VxVM vxvol ERROR V-5-1-10128 Error in cluster processing
   ii) A subsequent volume stop succeeds.
        # vxvol -g -f stop
        #

The core shows the following stack:

    dbf_fmt_tbl+0x4c0()
    voldbf_fmt_tbl+0x3c()
    voldbsup_format_record+0xa0()
    format_write+0x2d8()
    ddb_update+0x15c()
    dg_update+0x11c()
    req_dg_flush_common+0x354()
    req_dg_flush_name+0x7c()
    request_loop+0xae4()
    main+0xcb4()
    _start+0x108()

Once the DG is deported, the next import fails with the following error:

    # vxdg import
    VxVM vxdg ERROR V-5-1-10978 Disk group : import failed:
    Disk group has no valid configuration copies

The following log entry is seen in the messages file:

    vxvm:vxconfigd: [ID 702911 daemon.error] Disk , copy 1: Block 1: Duplicate record in configuration

DESCRIPTION:
On any configuration change, VxVM tries to keep the same configuration
data in user land (vxconfigd), in kernel land (vxio) and in the on-disk
database. If the transaction for a configuration change encounters an
error, VxVM cleans up any inconsistencies between the user land, the
kernel land and the on-disk database.
In a CVM environment, if a cluster reconfiguration occurs following a
configuration change transaction on a private DG, the transaction may be
aborted because of the reconfiguration. However, the appropriate clean up
of the databases was not done in the error case where the transaction in
the kernel is aborted by the reconfiguration. Another configuration change
in the same private DG would then result in a duplicate record in the VxVM
configuration database, leading to a core dump or a disk group import
failure.
This is a rare timing issue, but it may be seen during a normal cluster
stop operation.

RESOLUTION:
Code changes are made to perform the appropriate clean up and to set the
appropriate error code.

* 2803256 (Tracking ID: 2647975)

SYMPTOM:
A Serial Split Brain (SSB) condition caused Cluster Volume Manager (CVM)
Master Takeover to fail.
The following vxconfigd debug output was seen when the issue occurred:

    VxVM vxconfigd NOTICE V-5-1-7899 CVM_VOLD_CHANGE command received
    V-5-1-0 Preempting CM NID 1
    VxVM vxconfigd NOTICE V-5-1-9576 Split Brain. da id is 0.5, while dm id is 0.4 for dm cvmdgA-01
    VxVM vxconfigd WARNING V-5-1-8060 master: could not delete shared disk groups
    VxVM vxconfigd ERROR V-5-1-7934 Disk group cvmdgA: Disabled by errors
    VxVM vxconfigd ERROR V-5-1-7934 Disk group cvmdgB: Disabled by errors
    ...
    VxVM vxconfigd ERROR V-5-1-11467 kernel_fail_join() : Reconfiguration interrupted: Reason is transition to role failed (12, 1)
    VxVM vxconfigd NOTICE V-5-1-7901 CVM_VOLD_STOP command received

DESCRIPTION:
When a Serial Split Brain (SSB) condition is detected by the new CVM
master on Veritas Volume Manager (VxVM) versions 5.0 and 5.1, the default
CVM behaviour causes the new CVM master to leave the cluster, which
results in cluster-wide downtime.

RESOLUTION:
When SSB is detected in a disk group, CVM now disables only that
particular disk group and keeps the other disk groups imported during the
CVM Master Takeover. With the fix applied, the new CVM master does not
leave the cluster.

* 2803260 (Tracking ID: 2040150)

SYMPTOM:
I/O error messages for the dmpnode are observed in the event logs (with a
reservation conflict error), and the disk is marked as failing. The issue
occurs when the total number of PGR keys reaches 32 or more.

DESCRIPTION:
Whenever there are 32 or more keys, DMP cannot retrieve all of them,
because only the 7th byte of the response buffer is used to get the total
number of bytes in which the keys are stored. With N keys, DMP can read
only the first N % 32 keys (where % is the mathematical modulo).

RESOLUTION:
As per the SCSI-3 standard, the 4th to 7th bytes of the response buffer of
the PGR_READ_KEYS command give the total number of bytes in which the keys
are stored. The length of the buffer that follows (buflen) is now
calculated from the 4th to 7th bytes of the response buffer.

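The arithmetic behind incident 2803260 can be illustrated with a minimal sketch. It is not the VxVM/DMP code: the function names and the make_header() helper are hypothetical, and it assumes that "4th to 7th byte" means zero-based offsets 4 through 7 of the response, holding a big-endian length, with each key occupying 8 bytes.

```python
import struct

PR_KEY_LEN = 8  # assumption: each reservation key occupies 8 bytes


def key_count_low_byte_only(resp: bytes) -> int:
    """Behaviour described for the defect: only byte 7 is consulted."""
    return resp[7] // PR_KEY_LEN


def key_count_full_length(resp: bytes) -> int:
    """Behaviour described for the fix: bytes 4..7 form a 32-bit length."""
    (additional_length,) = struct.unpack_from(">I", resp, 4)
    return additional_length // PR_KEY_LEN


def make_header(num_keys: int) -> bytes:
    """Hypothetical helper: build an 8-byte response header for num_keys keys.

    Bytes 0..3 are a generation counter (zero here); bytes 4..7 hold the
    number of key bytes that follow.
    """
    return struct.pack(">II", 0, num_keys * PR_KEY_LEN)


if __name__ == "__main__":
    for n in (31, 32, 33, 40):
        hdr = make_header(n)
        print(n, key_count_low_byte_only(hdr), key_count_full_length(hdr))
```

With 32 keys the key list is 256 bytes long, so the low-order byte alone is 0 and no keys are seen; this is where the N % 32 behaviour in the description comes from.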
* 2803331 (Tracking ID: 2792748)

SYMPTOM:
In an HP-UX CVM environment, the slave join fails with the following error
message in syslog:

    VxVM vxconfigd ERROR V-5-1-5784 cluster_establish:kernel interrupted vold on overlapping reconfig.

DESCRIPTION:
During the join, the slave node performs a disk group import. As part of
the import, the file descriptor pertaining to "Port u" was closed because
of a wrong assignment of the return value of open(). Hence, the subsequent
write to the same port returned EBADF.

RESOLUTION:
The code is corrected by adding additional brackets, which avoids closing
the wrong file descriptor. Setting the pfto attribute to 0 is also
allowed now.


INSTALLING THE PATCH
--------------------
$ swinstall -x autoreboot=true

After installing the patches, run swverify to make sure that they are
installed correctly:

$ swverify


REMOVING THE PATCH
------------------
To remove the patch, enter the following command:

# swremove -x autoreboot=true


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------

Incidents fixed in MP2RP3P1:
==========================
unixvm-cvs:
Incident  parent     Abstract
--------  ---------  ---------------------------------------------------------------------
2049321   (2441937)  vxconfigrestore precommit fails with awk errors
2138782   (1822681)  memory leak in vxio/voldrl_cleansio_start
2215565   (2209866)  Inconsistent behavior with handling of siteconsistent flag
2236561   (990338)   FMR Refreshing a snapshot should keep the same name for the snap object
2248993   (1715204)  vxsnap operations lead to an orphan snap object if any failure occurs during the operation; the orphan snap object can't be removed.
2274273   (850816)   Parallel 'vxsnap reattach' operations can cause data corruption and also lead to orphan snap objects.
2277306   (2183984)  System panic in dmp_update_stats() routine
2280633   (2280624)  Need to set site consistent only on mirrored-volumes.
2323929   (2323925)  If rootdisk is encapsulated and if install-db is present, clear warning should be displayed on system boot.
2325112   (1545835)  vxconfigd core dump during system boot after VxVM4.1RP4 applied.
2353415   (2349352)  During LUN provisioning in single path IO mode environment a data corruption is observed
2353422   (2334534)  In CVM environment, vxconfigd level join is hung when Master returns error "VE_NO_JOINERS" to a joining node and cluster nidmap is changed in new reconfiguration
2360720   (2359814)  vxconfigbackup doesn't handle errors well
2368917   (1818780)  Oakmont Linux consume many memories during dmp test.
2419942   (844758)   vxbrk_rootmir fails with second swapvol
2497156   (2165141)  vxvm resets b_clock_ticks to zero if I/O hints are passed by VxFS
2497157   (2026773)  DMP: vxconfigd hang after array side port disable followed by vxdisk scandisks
2497160   (1268784)  Memory Leaks in VxVM plugin of VxMS
2497164   (1381772)  Slow performance of snapshot backups after applying ET1269854.
2497166   (1215169)  Customer concerned about high memory consumption by vxesd on 5.0MP2RP2 on HP 11.23
2497228   (2481938)  vxconfigbackup throwing an error when DG contains a sectioned disk
2530898   (1137504)  vxesd -k does not kill existing instance but starts multiple ones

vmprov:
2574355              Create a vmprovider patch for 5.0/11.23 MP2RP3P1

Incidents fixed in MP2RP3:
==========================
Incident  parent     Abstract
--------  ---------  ---------------------------------------------------------------------
1045826   (929437)   HP-UX 11.23 - vxvm 5.0 MP1 - errors from apm_keyget: invalid APM crc
1514814   (1362609)  vxconfigd ERROR V-5-1-12826 osuuid invalid guid in console.log VxVM 5.0
1677976   (1976961)  vxconfigd is hung in dmp_close_path
1937718   (1933970)  Restrict the max_specialio tunable value to the permissible limit.
2000303   (1589715)  vxconfigd dumps core, after vxdmpadm getportids ctlr= on a disabled ctlr
2027839   (2027831)  vxdg free not reporting free space correctly on CVM master.
                     vxprint not printing DEVICE column for SDs
2049394              /etc/vx/diag.d/vxcvmdiag cvminfo core dumps on HPUX 5.0MP1:UNOF_MP1RP6HF2 (Samsung Cards current VM patch level)
2078241   (2078209)  resize of vxvm volume resulting in vxconfigd hang -- Need to increase the default value for volpagemod_max_memsz
2091195   (1224659)  Customer appears to be hitting e1224659 when running vxconfigbackup on 5.0MP2RP2 on HP 11.23
2111423   (1488399)  DMP: Detection of I/O being sent on SCTL device on HP
2111424   (1485075)  vmtest/tc/scripts/admin/voldg/cds/set.tc hits DMP ted assert dmp_select_path:2a
2111441   (1954062)  vxrecover results in os crash
2111639   (1114178)  vxconfigd dumped a core in req_dg_get_info_common()
2111672   (1528932)  vxconfigd asserts in config_db_disable()
2112470   (339187)   CVM activation tag in vxprint -m output breaks vxprint
2112477   (1138518)  vx commands (such as vxdg, vxdctl, vxdisk) hanging

Incidents fixed in MP2RP2:
==========================
1927987   (1927982)  vxpfto returns error arbitrarily even though the command sets the values correctly.
1944259   (1701865)  join failed due to interrupted reconfig on joiner and same reconfig completed on master
1451436   (1593032)  vxfenconfig ERROR V-11-2-1064 DMP Idle Lun Monitoring DeRegistration FAILED
1957355   (1744224)  FMR3: multiple vxplex attach cmds running in parallel on a volume lead to clearing DCO map and subsequently lead to corruption
1937832   (1755466)  vol_find_ilock: searching of ilock is inefficient
1938088   (1755830)  kmsg: sender: the logic for resend of messages needs to be optimized
1938117   (1755810)  kmsg: sender thread is woken up unnecessarily during flowcontrol
1938114   (1755788)  for a broadcast message, sender thread may end up sending the same message multiple times (not resend)
1938036   (1755519)  kmsg layer: receiver side flowcontrol is not supported
1938076   (1755628)  kmsg layer: with heavy messaging in the cluster the receiver thread slows down processing
1275028   (927444)   Makefile.kernel file needs dcoinc.h header file
1849485   (1677416)  Node is not joining back into the cluster
1969589   (1969526)  panic in voldiodone when a hung priv region I/O comes back
1944180   (1819777)  panic caused because of voldisk getting deleted in kernel when I/Os are active, due to duplicate da rid.
1946107   (1435470)  Cluster nodes panicked in voldco_or_pvmbuf_to_pvmbuf code after installing 5.0MP3
1957358   (1729558)  multiple vxplex attach cmds running in parallel on a volume lead to clearing DCO map and subsequently lead to corruption in FMR2
1874059   (1435681)  vxesd looping, using ~100% of one CPU.
1902781   (1228526)  Running vxdg flush on a slave node in a cvm cluster disables the disk group
1924619   (1532363)  vxdisk 'updateudid' is corrupting diskid. Import of diskgroup fails.
1587888   (1587885)  vxdiskunsetup fails with error "awk: Input line cannot be longer than 3,000 bytes"
1876291   (913890)   EMC ASL (libvxemc.so) with PowerPath co-existence is unable to skip LUNZ disks (CLARiiON) and PP co-existence broken on HP
1946112   (1471581)  vxconfigd may hang when checking for ecopy functionality on array (ASL)
1946117   (1742702)  vxvmconvert fails, probably due to wrong disk capacity calculation
1946110   (147037)   vxconfigd cores at start up
1946109   (1463547)  Persistent vxconfigd core dump on dynamic LUN reconfiguration.
1921587   (1907796)  Corrupted Blocks in Oracle after Dynamic LUN expansion and vxconfigd core dump
1946105   (1421078)  Manpage for vxdg(1M) needs to cover last shared dg disk detach scenario better
1946116   (1059720)  Switching to EBN does not immediately show EFI disks correctly
1885021   (839077)   vxresize fails on filesystems greater than 2TB
1946114   (1676061)  System panic'd after 2 out of 4 paths to disk were removed.
1946118   (1192166)  vxdg -n [newdg] deport [origdg] causes a sort of memory leak
1946104   (1203661)  vxclustadm man page needs mcsg instead of hpsg, redundant ifdefs
1501595   (1650955)  'vxdctl enable' caused node panic after one path unmasked/unpresented on array

Incidents Fixed in MP2RP1
==================
e1164654  (795042)   vxvmconvert tools needs to be modified to eliminate the use of private LVM headers.
e1274122  (828910)   Double free of memory in voldg_clean_cpulist()
e1274138  (1087073)  disk.convert script prints VGs converted list when one or more failed
e1274155  (1015605)  Poison nibble: nibble of 0xc in the dmp minor will cause panic
e1274241  (1001370)  Lun reuse issue not fixed by the DMP backport hotfix
e1274243  (1064826)  "Could not do stat on path /devsdw"
e1274255  (972406)   vxconfigd hang
e1360836  (1321272)  vxcommands hanging after re-connect the FC-site link
e1361304  (1260745)  Node is not joining the cluster after reconnect the FC & heartbeat link.
e1394216  (1393764)  vxconfigd hung on the node which tries to become master on site2 when FC and heartbeat links are disabled at the same time.
e1409142  (1361260)  Slow I/O performance with VxFS filesystems on mirror-concat VxVM volumes with DCO and DRL.
e1455184  (1414336)  Disk devices do not appear in vxdisk list, but in vxprint
e1470963  (1599295)  /vmtest/tc/scripts/support/vxdisksetup/setup.tc is FAILED with vxdisksetup on IA
e1501518  (1395616)  vxdmp for the PGR_PREEMPT command unnecessarily retries on all paths and multiple times in case of RESV_CONFLICT
e1501593  (1426480)  VOLCVM_CLEAR_PR ioctl does not propagate the error returned by DMP to the caller
e1504534  (795129)   DG loses - CVM/VxVM with MC/SG
e1504777  (1131566)  If a config or klog copy hits an error, vxconfigd should validate and possibly detach the disk
e1507982  (1507935)  5.0MP3RP1 Campus Cluster: vxconfigd core dumps when settag set to long sitename
e1509485  (1220091)  Reduce slave disk re-onlines from SLAVE_DISK_OP_NOTIFY request from master.
e1514000  (1458481)  volfmr_copymaps_instant panic on node of cluster during shared and private dg creations
e1522342  (1269468)  Enclosure removed/presented back multiple times, whilst vxconfigd is restarted, core dumps
e1528686  (1227106)  HxRT SFOR 5.0mp3 PowerPath/DMX : PGR key issues - Uncertain PGR key number and vold_pgr_unregister failed errors
e1542863  (1541662)  System panicked in DRL code when running flashsnap
e1543470  (1159227)  Getting core file related to vxesd in SFORAHA stack using combo installer.
e1555898  (1068626)  Full resync occurred in remaining nodes after SFORAC panic rebooted
e1557153  (1729344)  vxdg deport hung
e1592476  (1543908)  While running vxevac command, Oracle process thread stuck into ogetblk() which leads to i/o hang.
e1631998  (1397234)  Vm command hung consistently during DMP testing on PA machine.
e1632029  (1392872)  Nodes has panicked on which failover has happened when master TOCed and disable FC and both sites are writing to same volume.
e1632058  (1393756)  Vxcommands hung on master & slave after FC-site link disconnected
e1632081  (1321296)  vxassist core dump
e1636487  (1289510)  vxconfigd dumps core during vmcert run and later vm hung
e1650957  (1228140)  After setting path attribute to active, path state is not updated.
e1670680  (1787772)  deporting a dg hangs after re-connect the FC site Link
e1719779  (1468885)  The vxbrk_rootmir script does not complete and is hanging after invoking vxprivutil
e1363314  (1260746)  Node not joining back with 2min delay in disconnecting FC & heartbeat link
e1586930  (1878759)  panic on 11.23 IVM Guest machine

Incidents Fixed in MP2
==============================
1003433   (600447)   vxprivutil dumpconfig is showing last_platform as 0x0#bad
1274123   (832350)   vxdctl's initdmp section of man page require correction
1274142   (524055)   vxvm:vxassist : ERROR:Cannot update volume vol-1
1274148   (900090)   HP -- MSA arrays do not have an ASL and are not recognized as a JBOD
1274177   (1090155)  During vxevac, the vxsd command can absorb all nfile resources in kernel
1274185   (1067501)  'vxdisksetup -iB' incorrectly calculates publen
1274194   (1079281)  vxconfigrestore hits awk limitation in cbr_res_main()
1274272   (1189432)  (DS6000)vxdmpadm disable/enable ctlr will hang all VX command
1274276   (1211302)  EVA6k/HP-UX : Slave node panics after disable primary paths.
1299382   (1053529)  Unable to import a shared disk group on the DR site
1299403   (1213239)  CVM: Recovery for subvolumes(of a layered volume) does not happen due to missing -f option
1360759   (1260756)  vxconfigd core dumps after fix for vxcommands hanging is applied
1360849   (1260757)  Master node is getting crashed after reattach the site
1362201   (853822)   master stuck in 'master selection' during random shutdown -r (updown) on 16 node cluster
1362204   (865400)   panic in vol_kmsg_handle_send_err+00035C during shutdown -r and rejoin of 1 node in 16 node cluster
1376234   (1265794)  vxvol set doesn't allow changing campus cluster options while the volume is open
1381724   (1321475)  Join Failure Panic Loop on axe76 cluster
1415504   (990475)   FMR2 : Oring of DRL Recovery Map with FMR Detach Map when the volume is opened in RWBK mode
1416347   (1416080)  System panic in vol_change_disk() routine due to NULL dereference.
1416349   (1386980)  Panic in vol_putdisk(). Looks like another version of same problem as in e1288427
1422656   (1004746)  New TC's for Fmr2.
1427498   (913656)   abort from CBO gets stuck due to deadlock causing safetytimer expiry
1427499   (1133089)  CVM: In consecutive master takeovers, slave state is not reset appropriately
1427500   (1145348)  Reconfiguration deadlock in master node and passive slaves during DRL Rebuild response
1427503   (1156613)  Safety timer expiry due to CVM reconfig taking too long for SG's comfort.
1427504   (1171932)  CVM: master takeover reconfig continues on master but gets interrupted on slaves with a JOIN causing deadlock
1427505   (1168279)  volsio stuck in defer Q causing reconfiguration to hang
1427506   (1210957)  Vxio hung IO's and uncorrectable write errors
1450039   (1443679)  FMR3: I/Os initiating DCO updates for clearing DRL async clear region may not wait for its completion.
1450048   (1246785)  Panic in dmp_get_iocount() due to invalid cpu table address.
1450098   (1207898)  Upon stop all nodes in the RAC cluster, some stale fencing keys are left from Master node with HDS USP V array
1450934   (1450932)  Enhance DMP's delay queue processing logic to avoid infinite retries
1453894   (1453694)  System panic in scsi_strategy_real() when extra paths are added to an existing LUN on the fly.
1459367   (1033534)  Enhancements for online and offline disk operations
1465688   (1461717)  'vxsnap make' command results in vxconfigd and I/O sleeping for too long