README VERSION               : 1.1
README CREATION DATE         : 2013-02-01
PATCH-ID                     : PVKL_03975
PATCH NAME                   : VRTSvxvm 6.0.300.0
BASE PACKAGE NAME            : VRTSvxvm
BASE PACKAGE VERSION         : 6.0.100.000
SUPERSEDED PATCHES           : PVKL_03969
REQUIRED PATCHES             : PVCO_03974
INCOMPATIBLE PATCHES         : NONE
SUPPORTED PADV               : hpux1131
(P-PLATFORM , A-ARCHITECTURE , D-DISTRIBUTION , V-VERSION)
PATCH CATEGORY               : CORE , CORRUPTION , HANG , PANIC , PERFORMANCE
PATCH CRITICALITY            : CRITICAL
HAS KERNEL COMPONENT         : YES
ID                           : NONE
REBOOT REQUIRED              : YES
REQUIRE APPLICATION DOWNTIME : YES

PATCH INSTALLATION INSTRUCTIONS:
--------------------------------
Please refer to Release Notes for install instructions

PATCH UNINSTALLATION INSTRUCTIONS:
----------------------------------
Please refer to Release Notes for uninstall instructions

SPECIAL INSTRUCTIONS:
---------------------
NONE

SUMMARY OF FIXED ISSUES:
-----------------------------------------
PATCH ID:PVKL_03975

2863672 (2834046) NFS migration failed due to device reminoring.
2892684 (1859018) "link detached from volume" warnings are displayed when a linked-breakoff snapshot is created
2892698 (2851085) DMP doesn't detect implicit LUN ownership changes for some of the dmpnodes
2940447 (2940446) Full fsck hangs on I/O in VxVM when cache object size is very large
2941226 (2915063) Rebooting VIS array having mirror volumes, master node panicked and other nodes CVM FAULTED
2941234 (2899173) vxconfigd hang after executing command "vradmin stoprep"
2944708 (1725593) The 'vxdmpadm listctlr' command has to be enhanced to print the count of device paths seen through the controller.
2944710 (2744004) vxconfigd is hung on the VVR secondary node during VVR configuration.
2944714 (2833498) vxconfigd hangs while reclaim operation is in progress on volumes having instant snapshots
2944717 (2851403) System panic is seen while unloading "vxio" module. This happens whenever VxVM uses SmartMove feature and the "vxportal" module gets reloaded (for e.g.
during VxFS package upgrade)
2944722 (2869594) Master node panics due to corruption if space optimized snapshots are refreshed and 'vxclustadm setmaster' is used to select master.
2944725 (2910043) Avoid order 8 allocation by vxconfigd in node reconfig.
2944729 (2933138) panic in voldco_update_itemq_chunk() due to accessing invalid buffer
2974870 (2935771) In VVR environment, RLINK disconnects after master switch.
2978189 (2948172) Executing "vxdisk -o thin,fssize list" command can result in panic.
2979767 (2798673) System panics in voldco_alloc_layout() while creating volume with instant DCO
2983679 (2970368) Enhance handling of SRDF-R2 Write-Disabled devices in DMP.
3005921 (1901838) Incorrect setting of Nolicense flag can lead to dmp database inconsistency.
3020087 (2619600) Live migration of virtual machine having SFHA/SFCFSHA stack with data disks fencing enabled, causes service groups configured on virtual machine to fault.
3025973 (3002770) Accessing NULL pointer in dmp_aa_recv_inquiry() caused system panic.
3027482 (2273190) Incorrect setting of UNDISCOVERED flag can lead to database inconsistency

PATCH ID:PVKL_03969

2860207 (2859470) EMC SRDF (Symmetrix Remote Data Facility) R2 disk with EFI label is not recognized by VxVM (Veritas Volume Manager) and is shown in error state.
2892643 (2801962) Growing a volume takes a significantly long time when the volume has version 20 DCO attached to it.
2924207 (2886402) When re-configuring devices, vxconfigd hang is observed.

SUMMARY OF KNOWN ISSUES:
-----------------------------------------
2954029(2957766) On HP-UX 11.31, vxdiskadm option 22-2 Dynamic Reconfiguration operation 'Remove Luns' might fail with error "ERROR: Please make sure to remove Luns from Array"
3037620(2979786) The SCSI registration keys are not removed if VCS engine is stopped for the second time.

KNOWN ISSUES :
--------------

* INCIDENT NO::2954029  TRACKING ID ::2957766

SYMPTOM::
When the user tries to remove LUNs from the system using vxdiskadm option 22-2 Dynamic Reconfiguration operation 'Remove Luns', the device removal operation might fail and report the following error message:
"ERROR: Please make sure to remove Luns from Array"
This is due to the Dynamic Reconfiguration Tool not being able to find devices that are not part of the legacy HP-UX I/O device tree but are seen only in the agile I/O device tree.

WORKAROUND::
Perform the following steps:
- Remove the device with no hardware (NO_HW in output of 'ioscan -fNC disk') using rmsf(1M)
- Run ioscan(1M)
- Run 'vxdisk scandisks'.

* INCIDENT NO::3037620  TRACKING ID ::2979786

SYMPTOM::
If VCS engine is stopped for the first time, the SCSI registration keys are removed. But if VCS engine is stopped for the second time, the keys are not removed.

WORKAROUND::
None

NONE

FIXED INCIDENTS:
----------------

PATCH ID:PVKL_03975

* INCIDENT NO:2863672  TRACKING ID:2834046

SYMPTOM:
VxVM dynamically reminors all the volumes during DG import if the DG base minor numbers are not in the correct pool. This behaviour causes NFS clients to have to re-mount all NFS file systems in an environment where CVM is used on the NFS server side.

DESCRIPTION:
Starting from 5.1, the minor number space is divided into two pools, one for private disk groups and another for shared disk groups. During DG import, the DG base minor numbers are adjusted automatically if they are not in the correct pool, and so are the minor numbers of the volumes in those disk groups. This behaviour reduces the number of minor number conflicts during DG import. But in an NFS environment, it makes all file handles on the client side stale. Customers had to unmount file systems and restart applications.

RESOLUTION:
A new tunable, "autoreminor", is introduced. The default value is "on". Most of the customers don't care about auto-reminoring. They can just leave it as it is.
For an environment where auto-reminoring is not desirable, customers can just turn it off. Another major change is that during DG import, VxVM won't change minor numbers as long as there are no minor number conflicts. This includes the cases where minor numbers are in the wrong pool.

* INCIDENT NO:2892684  TRACKING ID:1859018

SYMPTOM:
"Link link detached from volume " warnings are displayed when a linked-breakoff snapshot is created.

DESCRIPTION:
The purpose of these messages is to let users and administrators know about the detach of a link due to I/O errors. These messages get displayed unnecessarily whenever a linked-breakoff snapshot is created.

RESOLUTION:
Code changes are made to display messages only when a link is detached due to I/O errors on volumes involved in the link relationship.

* INCIDENT NO:2892698  TRACKING ID:2851085

SYMPTOM:
DMP doesn't detect implicit LUN ownership changes

DESCRIPTION:
DMP does ownership monitoring for ALUA arrays to detect implicit LUN ownership changes. This helps DMP to always use the Active/Optimized path for sending down I/O. This feature is controlled using the dmp_monitor_ownership tunable and is enabled by default. In case of partial discovery triggered through the event source daemon (vxesd), the ALUA information kept in the kernel data structure for ownership monitoring was getting wiped. This causes ownership monitoring to not work for these dmpnodes.

RESOLUTION:
Source has been updated to handle such a case.

* INCIDENT NO:2940447  TRACKING ID:2940446

SYMPTOM:
I/O can hang on a volume with a space optimized snapshot if the underlying cache object is of very large size. It can also lead to data corruption in the cache object.

DESCRIPTION:
The cache volume maintains a B+ tree for mapping an offset to its actual location in the cache object. Copy-on-write I/O generated on snapshot volumes needs to determine the offset of a particular I/O in the cache object.
Due to incorrect type-casting, the value calculated for a large offset is truncated to a smaller value because of overflow, leading to data corruption.

RESOLUTION:
Code changes are done to avoid overflow during offset calculation in the cache object.

* INCIDENT NO:2941226  TRACKING ID:2915063

SYMPTOM:
System panics with the following stack while detaching a plex of a volume in a CVM environment.

vol_klog_findent()
vol_klog_detach()
vol_mvcvm_cdetsio_callback()
vol_klog_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION:
During a plex-detach operation, VxVM searches the kernel for the plex object to be detached. If a transaction is in progress on any diskgroup in the system, an incorrect plex object is sometimes selected, which results in the dereference of an invalid address and panics the system.

RESOLUTION:
Code changes are done to make sure that the correct plex object is selected.

* INCIDENT NO:2941234  TRACKING ID:2899173

SYMPTOM:
In a CVR environment, SRL failure may result in a vxconfigd hang and eventually in a 'vradmin stoprep' command hang.

DESCRIPTION:
The 'vradmin stoprep' command hangs because vxconfigd is waiting indefinitely in a transaction. The transaction was waiting for IO completion on the SRL. An error handler is generated to handle IO failure on the SRL. But if a transaction is in progress, this error was not getting handled properly, resulting in a transaction hang.

RESOLUTION:
A fix is provided such that when an SRL failure is encountered, the transaction itself handles the IO error on the SRL.

* INCIDENT NO:2944708  TRACKING ID:1725593

SYMPTOM:
The 'vxdmpadm listctlr' command does not show the count of device paths seen through it

DESCRIPTION:
The 'vxdmpadm listctlr' command currently does not show the number of device paths seen through it.
The CLI option has been enhanced to provide this information as an additional column at the end of each line in the CLI's output

RESOLUTION:
The number of paths under each controller is counted and the value is displayed as the last column in the 'vxdmpadm listctlr' CLI output

* INCIDENT NO:2944710  TRACKING ID:2744004

SYMPTOM:
When VVR is configured, vxconfigd on the secondary gets hung. Any vx commands issued during this time do not complete.

DESCRIPTION:
Vxconfigd is waiting for IOs to drain before allowing a configuration change command to proceed. The IOs never drain completely, resulting in the hang. This is because there is a deadlock where pending IOs are unable to start and vxconfigd keeps waiting for their completion.

RESOLUTION:
Changed the code so that this deadlock does not arise. The IOs can be started properly and complete, allowing vxconfigd to function properly.

* INCIDENT NO:2944714  TRACKING ID:2833498

SYMPTOM:
vxconfigd daemon hangs in vol_ktrans_commit() while a reclaim operation is in progress on volumes having instant snapshots. Stack trace is given below:

vol_ktrans_commit
volconfig_ioctl

DESCRIPTION:
Storage reclaim leads to the generation of special IOs (termed Reclaim IOs), which can be very large in size (>4G) and, unlike application IOs, are not broken into smaller sized IOs. Reclaim IOs need to be tracked in snapshot maps if the volume has full snapshots configured. The mechanism to track reclaim IOs is not capable of handling such large IOs, causing the hang.

RESOLUTION:
Code changes are made to use the alternative mechanism in Volume Manager to track the reclaim IOs.

* INCIDENT NO:2944717  TRACKING ID:2851403

SYMPTOM:
System panics while unloading the 'vxio' module when the VxVM SmartMove feature is used and the "vxportal" module gets reloaded (for e.g. during VxFS package upgrade).
Stack trace looks like:

vxportalclose()
vxfs_close_portal()
vol_sr_unload()
vol_unload()

DESCRIPTION:
During a smart-move operation like plex attach, VxVM opens the 'vxportal' module to read in-use file system maps information. This file descriptor gets closed only when the 'vxio' module is unloaded. If the 'vxportal' module is unloaded and reloaded before 'vxio', the file descriptor held by 'vxio' becomes invalid and results in a panic.

RESOLUTION:
Code changes are made to close the file descriptor for 'vxportal' after reading free/invalid file system map information. This ensures that stale file descriptors don't get used for 'vxportal'.

* INCIDENT NO:2944722  TRACKING ID:2869594

SYMPTOM:
Master node would panic with the following stack after a space optimized snapshot is refreshed or deleted and the master node is selected using 'vxclustadm setmaster':

volilock_rm_from_ils
vol_cvol_unilock
vol_cvol_bplus_walk
vol_cvol_rw_start
voliod_iohandle
voliod_loop
thread_start

In addition to this, all space optimized snapshots on the corresponding cache object may be corrupted.

DESCRIPTION:
In CVM, the master node owns the responsibility of maintaining the cache object indexing structure for providing space optimized functionality. When a space optimized snapshot is refreshed or deleted, the indexing structure is rebuilt in the background after the operation has returned. When the master node is switched using 'vxclustadm setmaster' before the index rebuild is complete, both the old master and the new master rebuild the index in parallel, which results in index corruption. Since the index is corrupted, the data stored on space optimized snapshots should not be trusted. I/Os issued on the corrupted index would lead to a panic.

RESOLUTION:
When the master role is switched using 'vxclustadm setmaster', the index rebuild on the old master node is safely aborted. Only the new master node is allowed to rebuild the index.
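
ILLUSTRATION (not part of the patch):
The resolution above can be pictured with a small Python sketch. It is only an illustration of the idea that a node which loses the master role stops its background rebuild and discards the partial result, so that a single node ever rebuilds the index. The names Node, set_master, rebuild_index and before_entry are hypothetical and do not correspond to VxVM internals; the actual kernel change is not described in this README beyond the text above.

```python
class Node:
    """Hypothetical cluster node; only the fields this sketch needs."""

    def __init__(self, name, is_master=False):
        self.name = name
        self.is_master = is_master
        self.abort_rebuild = False  # set when the master role moves away


def set_master(old_master, new_master):
    """Switch the master role and tell the old master to stop rebuilding."""
    old_master.abort_rebuild = True
    old_master.is_master = False
    new_master.abort_rebuild = False
    new_master.is_master = True


def rebuild_index(node, entries, before_entry=None):
    """Rebuild an index (offset -> location) one entry at a time.

    Returns the finished index, or None if the rebuild was aborted because
    the node lost the master role. before_entry is a hook (an assumption of
    this sketch) that lets a caller inject a master switch mid-rebuild.
    """
    index = {}
    for position, (offset, location) in enumerate(entries):
        if before_entry is not None:
            before_entry(position)
        if node.abort_rebuild or not node.is_master:
            return None  # abort safely: the partial index is discarded
        index[offset] = location
    return index


if __name__ == "__main__":
    entries = [(0, 100), (8, 108), (16, 116), (24, 124)]
    node_a, node_b = Node("node_a", is_master=True), Node("node_b")

    def switch_midway(position):
        # Simulate 'vxclustadm setmaster' arriving during the rebuild.
        if position == 2:
            set_master(node_a, node_b)

    print(rebuild_index(node_a, entries, switch_midway))  # old master's rebuild
    print(rebuild_index(node_b, entries))                 # new master's rebuild
```

In this sketch the old master checks the abort flag before touching each entry, so it never writes to the index after the role has moved. The new master then starts from an empty index instead of continuing from a partially rebuilt one.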

* INCIDENT NO:2944725  TRACKING ID:2910043

SYMPTOM:
Frequent swapin/swapout seen due to higher order memory requests

DESCRIPTION:
VxVM operations such as plex attach and snapshot resync/reattach issue ATOMIC_COPY IOCTLs. The default I/O size for these operations is 1MB, and VxVM allocates this memory from the operating system. Memory allocations of such a large size can result in swapin/swapout of pages and are not very efficient. In the presence of a lot of such operations, the system may not work very efficiently.

RESOLUTION:
VxVM has its own I/O memory management module, which allocates pages from the operating system and manages them efficiently. The ATOMIC_COPY code is modified to make use of VxVM's internal I/O memory pool instead of directly allocating memory from the operating system.

* INCIDENT NO:2944729  TRACKING ID:2933138

SYMPTOM:
System panics with the stack trace given below:

voldco_update_itemq_chunk()
voldco_chunk_updatesio_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION:
While tracking IOs in snapshot MAPs, information is stored in in-memory pages. For large sized IOs (such as reclaim IOs), this information can span multiple pages. Sometimes the pages are not properly referenced in the MAP update for IOs of larger size, which leads to a panic because of invalid page addresses.

RESOLUTION:
Code is modified to properly reference pages during the MAP update for large sized IOs.

* INCIDENT NO:2974870  TRACKING ID:2935771

SYMPTOM:
Rlinks disconnect after switching the master.

DESCRIPTION:
Sometimes switching a master on the primary can cause the Rlinks to disconnect. vradmin repstatus would show "paused due to network disconnection" as the replication status. VVR uses a connection to check if the secondary is alive. The secondary responds to these requests by replying back, indicating that it is alive. On a master switch, the old master fails to close this connection with the secondary.
Thus, after the master switch, the old master as well as the new master would send the requests to the secondary. This causes a mismatch of connection numbers on the secondary, and the secondary does not reply to the requests of the new master. This causes the Rlinks to disconnect.

RESOLUTION:
The solution is to close the connection of the old master with the secondary, so that it does not keep sending connection requests to the secondary.

* INCIDENT NO:2978189  TRACKING ID:2948172

SYMPTOM:
Execution of the command "vxdisk -o thin,fssize list" can cause a hang or panic.

Hang stack trace might look like:

pse_block_thread
pse_sleep_thread
.hkey_legacy_gate
volsiowait
vol_objioctl
vol_object_ioctl
voliod_ioctl
volsioctl_real
volsioctl

Panic stack trace might look like:

voldco_breakup_write_extents
volfmr_breakup_extents
vol_mv_indirect_write_start
volkcontext_process
volsiowait
vol_objioctl
vol_object_ioctl
voliod_ioctl
volsioctl_real
vols_ioctl
vols_compat_ioctl
compat_sys_ioctl
sysenter_dispatch

DESCRIPTION:
The command "vxdisk -o thin,fssize list" triggers reclaim I/Os to get file system usage from Veritas File System on volumes mounted from Veritas Volume Manager. Reclamation is currently not supported on volumes with space optimized (SO) snapshots. But because of a bug, reclaim IOs continue to execute for volumes with SO snapshots, leading to a system panic/hang.

RESOLUTION:
Code changes are made to not allow reclamation IOs to proceed on volumes with SO snapshots.

* INCIDENT NO:2979767  TRACKING ID:2798673

SYMPTOM:
System panic is observed with the stack trace given below:

voldco_alloc_layout
voldco_toc_updatesio_done
voliod_iohandle
voliod_loop

DESCRIPTION:
The DCO (data change object) contains metadata information required to start the DCO volume and decode further information from the DCO volume. This information is stored in the 1st block of the DCO volume.
If this metadata information is incorrect/corrupted, further processing of the volume start resulted in a panic due to a divide-by-zero error in the kernel.

RESOLUTION:
Code changes are made to verify the correctness of the DCO volume's metadata information during startup. If the information read is incorrect, the volume start operation fails.

* INCIDENT NO:2983679  TRACKING ID:2970368

SYMPTOM:
SRDF-R2 WD (write-disabled) devices are shown in error state and lots of path enable/disable messages are generated in the /etc/vx/dmpevents.log file.

DESCRIPTION:
DMP (dynamic multi-pathing driver) disables the paths of write protected devices. Therefore these devices are shown in error state. The vxattachd daemon tries to online these devices and executes partial device discovery for them. As part of partial device discovery, enabling and disabling the paths of such write protected devices generates lots of path enable/disable messages in the /etc/vx/dmpevents.log file.

RESOLUTION:
This issue is addressed by not disabling paths of write protected devices in DMP.

* INCIDENT NO:3005921  TRACKING ID:1901838

SYMPTOM:
After addition of a license key that enables multi-pathing, the state of the controller is still shown as DISABLED in the vxdmpadm CLI output.

DESCRIPTION:
When the multi-pathing license key is added, the state of the active paths of a LUN is changed to ENABLED but the state of the controller is not updated.

RESOLUTION:
As a fix, whenever a multi-pathing license key is installed, the operation updates the state of the controller in addition to that of the LUN paths.

* INCIDENT NO:3020087  TRACKING ID:2619600

SYMPTOM:
Live migration of a virtual machine having the SFHA/SFCFSHA stack with data disks fencing enabled causes service groups configured on the virtual machine to fault.

DESCRIPTION:
After live migration of a virtual machine having the SFHA/SFCFSHA stack with data disks fencing enabled, I/O fails on shared SAN devices with a reservation conflict and causes service groups to fault.
Live migration causes a SCSI initiator change. Hence I/O coming from the migrated server to shared SAN storage fails with a reservation conflict.

RESOLUTION:
Code changes are added to check whether the host is fenced off from the cluster. If the host is fenced off, the registration key is re-registered for the dmpnode through the migrated server and the IO is restarted.

* INCIDENT NO:3025973  TRACKING ID:3002770

SYMPTOM:
The system panics with the following stack trace:

vxdmp:dmp_aa_recv_inquiry
vxdmp:dmp_process_scsireq
vxdmp:dmp_daemons_loop
unix:thread_start

DESCRIPTION:
The panic happens while handling the SCSI response for the SCSI Inquiry command. In order to determine if the path on which the SCSI Inquiry command was issued is read-only, the code needs to check the error buffer. However, the error buffer is not always prepared, so the code should check that the error buffer is valid before examining it further. Without this check, the system may panic with a NULL pointer dereference.

RESOLUTION:
The source code is modified to verify that the error buffer is valid.

* INCIDENT NO:3027482  TRACKING ID:2273190

SYMPTOM:
The device discovery commands 'vxdisk scandisks' or 'vxdctl enable' issued just after license key installation may fail and abort.

DESCRIPTION:
After addition of a license key that enables multi-pathing, the state of paths maintained at user level is incorrect.

RESOLUTION:
As a fix, whenever a multi-pathing license key is installed, the operation updates the state of paths both at user level and kernel level.

PATCH ID:PVKL_03969

* INCIDENT NO:2860207  TRACKING ID:2859470

SYMPTOM:
The EMC SRDF-R2 disk may go into error state when you create an EFI label on the R1 disk.
For example:

R1 site
# vxdisk -eo alldgs list | grep -i srdf
emc0_008c auto:cdsdisk emc0_008c SRDFdg online c1t5006048C5368E580d266 srdf-r1

R2 site
# vxdisk -eo alldgs list | grep -i srdf
emc1_0072 auto - - error c1t5006048C536979A0d65 srdf-r2

DESCRIPTION:
Since R2 disks are in write protected mode, the default open() call (made for read-write mode) fails for the R2 disks, and the disk is marked as invalid.

RESOLUTION:
As a fix, DMP was changed to be able to read the EFI label even on a write protected SRDF-R2 disk.

* INCIDENT NO:2892643  TRACKING ID:2801962

SYMPTOM:
Operations that grow a volume, including 'vxresize' and 'vxassist growby/growto', take significantly longer if the volume has a version 20 DCO (Data Change Object) attached to it than for a volume which doesn't have a DCO attached.

DESCRIPTION:
When a volume with a DCO is grown, the existing map in the DCO must be copied and updated to track the grown regions. The algorithm was such that for each region in the map it would search for the page that contains that region so as to update the map. The number of regions and the number of pages containing them are proportional to the volume size. So, the search cost is amplified and is observed primarily when the volume size is of the order of terabytes. In the reported instance, it took more than 12 minutes to grow a 2.7TB volume by 50G.

RESOLUTION:
Code has been enhanced to find the regions that are contained within a page and then avoid looking up the page for all those regions.

* INCIDENT NO:2924207  TRACKING ID:2886402

SYMPTOM:
When re-configuring dmp devices, typically using the command 'vxdisk scandisks', a vxconfigd hang is observed. Since it is in a hang state, no VxVM (Veritas Volume Manager) commands are able to respond. The following process stack of vxconfigd was observed.

dmp_unregister_disk
dmp_decode_destroy_dmpnode
dmp_decipher_instructions
dmp_process_instruction_buffer
dmp_reconfigure_db
gendmpioctl
dmpioctl
dmp_ioctl
dmp_compat_ioctl
compat_blkdev_ioctl
compat_sys_ioctl
cstar_dispatch

DESCRIPTION:
When a DMP (dynamic multipathing) node is about to be destroyed, a flag is set to hold any IO (read/write) on it. The IOs which arrive between the setting of the flag and the actual destruction of the DMP node are placed in the dmp queue and are never served. So the hang is observed.

RESOLUTION:
An appropriate flag is set for the node which is to be destroyed, so that any IO arriving after the flag is set is rejected and the hang condition is avoided.

INCIDENTS FROM OLD PATCHES:
---------------------------
NONE