* * * READ ME * * *
* * * Veritas Volume Manager 5.1 SP1 RP3 * * *
* * * P-patch 2 * * *
Patch Date: 2013-03-14

This document provides the following information:

* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH

PATCH NAME
----------
Veritas Volume Manager 5.1 SP1 RP3 P-patch 2

OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
Solaris 10 X86

PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxvm

BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* Veritas Storage Foundation for Oracle RAC 5.1 SP1
* Veritas Storage Foundation Cluster File System 5.1 SP1
* Veritas Storage Foundation 5.1 SP1
* Veritas Storage Foundation High Availability 5.1 SP1
* Veritas Dynamic Multi-Pathing 5.1 SP1
* Symantec VirtualStore 5.1 SP1

SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 142630-18

* 2567623 (2567618) VRTSexplorer coredumps in checkhbaapi/print_target_map_entry.
* 2736131 (1190117) Preserve VxVM disk-related offset content using 'vxdisk -fo retain init'.
* 2875965 (2875962) During the upgrade of the VRTSaslapm package, a conflict is encountered with the VRTSvxvm package because an APM binary is included in the already-installed VRTSvxvm package.
* 2906832 (2398954) Machine panics while doing I/O on a VxFS-mounted instant snapshot with ODM smartsync enabled.
* 2980266 (2441615) Cannot create a zpool of the correct size with LUNs larger than 2TB.
* 2990730 (2970368) Enhance handling of SRDF-R2 Write-Disabled devices in DMP.
* 3008448 (3008423) vxdisksetup fails with an error of the form "Device path does not exist".
* 3069507 (3002770) While issuing a SCSI inquiry command, a NULL pointer dereference in DMP causes a system panic.
* 3072890 (2352517) Machine panics while excluding a controller from the VxVM view.
* 3083188 (2622536) VVR: Modify the algorithm to restart the local throttled I/Os (in ru_state->throttleq).
* 3083189 (3025713) 'vxdg adddisk' and 'vxdg rmdisk' take a long time, and I/Os hang during command execution.
* 3083991 (2277359) 'vxdisksetup -fi' succeeds on ZFS LUNs when it should fail.

Patch ID: 142630-17

* 2485252 (2910043) Avoid order-8 allocation by vxconfigd during node reconfiguration.
* 2711758 (2710579) Do not write backup labels for CDS disks, irrespective of disk size.
* 2847333 (2834046) NFS migration failed due to device reminoring.
* 2860208 (2859470) An EMC SRDF (Symmetrix Remote Data Facility) R2 disk with an EFI label is not recognized by VxVM (Veritas Volume Manager) and is shown in the error state.
* 2881862 (2878876) vxconfigd dumps core in vol_cbr_dolog() due to a race between two threads processing requests from the same client.
* 2883606 (2189812) Executing 'vxdisk updateudid' on a disk in the 'online invalid' state causes vxconfigd to dump core.
* 2915758 (2915751) Solaris machine panics during dynamic LUN expansion of a CDS disk.
* 2919718 (2919714) On a THIN LUN, vxevac returns 0 without migrating unmounted VxFS volumes.
* 2929003 (2928987) DMP (Dynamic Multi-Pathing) retried I/O infinitely, causing vxconfigd to hang in case of an OS (operating system) layer failure.
* 2940448 (2940446) Full fsck hangs on I/O in VxVM when the cache object size is very large.
* 2946948 (2406096) vxconfigd dumps core in vol_cbr_oplistfree().
* 2957608 (2671241) vxnotify needs to report volume state changes from ldisabled to enabled and from enabled to ldisabled, as required by the agent framework.
* 2979692 (2575051) In a CVM environment, master switch or master takeover operations can result in a panic when the disk group configuration has cache objects.
DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following Symantec incidents:

Patch ID: 142630-18

* 2567623 (Tracking ID: 2567618)

SYMPTOM:
VRTSexplorer coredumps in checkhbaapi/print_target_map_entry with a stack that looks like:

print_target_map_entry()
check_hbaapi()
main()
_start()

DESCRIPTION:
The checkhbaapi utility uses the HBA_GetFcpTargetMapping() API, which returns the current set of mappings between operating system and Fibre Channel Protocol (FCP) devices for a given HBA port. The maximum limit for mappings was set to 512, and only that much memory was allocated. When the number of mappings returned was greater than 512, the function that prints this information tried to access entries beyond that limit, which resulted in core dumps.

RESOLUTION:
The code has been changed to allocate enough memory for all the mappings returned by HBA_GetFcpTargetMapping().

* 2736131 (Tracking ID: 1190117)

SYMPTOM:
The 'vxdisk -f init' command can corrupt or overwrite some of the public region data.

DESCRIPTION:
When VxVM (Veritas Volume Manager) is upgraded from 4.1 to 5.0 and a disk is initialized with the 5.0 version, the contents of the public region might get corrupted. This is because the default size of the private region increased from 1MB in 4.1 to 32MB in 5.0.

RESOLUTION:
A new "-o retain" option is introduced for disk initialization; it keeps the existing private/public region offsets intact.
The command can be used in the following way:

vxdisk -fo retain init

* 2875965 (Tracking ID: 2875962)

SYMPTOM:
When an upgrade install is performed from VxVM 5.0MPx to VxVM 5.1 (or higher), the installation script may give the following message:

The following files are already installed on the system and are being used by another package:
/usr/lib/vxvm/root/kernel/drv/vxapm/dmpsvc.SunOS_5.10
Do you want to install these conflicting files [y, n,?, q]

DESCRIPTION:
A VxVM 5.0MPx patch incorrectly packaged the IBM SanVC APM with a VxVM patch, which was subsequently corrected in a later patch. Any upgrade performed from that 5.0MPx patch to 5.1 or higher will produce this packaging message.

RESOLUTION:
Code was added to the packaging script of the VxVM package to remove the APM files, so that the conflict between the VRTSaslapm and VRTSvxvm packages is resolved.

* 2906832 (Tracking ID: 2398954)

SYMPTOM:
The machine panics while doing I/O on a VxFS-mounted instant snapshot with ODM smartsync enabled. The panic has the following stack:

panic: post_hndlr(): Unresolved kernel interruption
cold_vm_hndlr
bubbledown
as_ubcopy
privlbcopy
volkio_to_kio_copy
vol_multistepsio_overlay_data
vol_multistepsio_start
voliod_iohandle
voliod_loop
kthread_daemon_startup

DESCRIPTION:
VxVM uses the av_back and av_forw fields of the io buf structure to store its private information. VxFS also uses these fields to chain io buffers before passing I/O to VxVM. When an I/O is received at the VxVM layer, these fields are always reset. But if ODM smartsync is enabled, VxFS uses a special strategy routine to pass hints to VxVM. Due to a bug in the special strategy routine, the av_back and av_forw fields are not reset and could be pointing to a valid buffer in the VxFS io buffer chain. VxVM interprets these fields wrongly and modifies their contents, which in turn corrupts the next buffer in the chain, leading to the panic.
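The failure mode described above amounts to a strategy routine handing a buffer downward without clearing linkage fields that the lower layer treats as its own. The following is a minimal illustrative sketch of that fix; the structure and function names are hypothetical stand-ins, not the actual VxFS/VxVM code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified io buffer: av_forw/av_back are linkage
 * fields the lower layer reuses for its own private bookkeeping. */
struct buf {
    struct buf *av_forw;
    struct buf *av_back;
    void       *b_private;  /* hint data passed down with the I/O */
};

static void lower_layer_strategy(struct buf *bp)
{
    /* Stand-in for the lower layer, which assumes it may freely
     * interpret and modify av_forw/av_back. */
    (void)bp;
}

void special_strategy(struct buf *bp, void *hint)
{
    bp->b_private = hint;
    /* The fix: reset the linkage fields before passing the buffer
     * down, so stale pointers into the upper layer's buffer chain
     * cannot be misinterpreted and overwritten below. */
    bp->av_forw = NULL;
    bp->av_back = NULL;
    lower_layer_strategy(bp);
}
```

After the routine runs, the buffer carries no dangling chain pointers, regardless of what the caller left in those fields.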
RESOLUTION:
The special strategy routine now resets the av_back and av_forw fields of the io buf structure.

* 2980266 (Tracking ID: 2441615)

SYMPTOM:
Cannot create a zpool of the correct size with LUNs larger than 2TB.

DESCRIPTION:
A zpool of the correct size is not created because DMP (Dynamic Multi-Pathing) fails to return the correct capacity of the LUN. The DMP routines used to calculate the size return the number of blocks in integer format, which results in an incorrect calculation of the LUN size.

RESOLUTION:
Code changes have been made to get the correct size of the LUN.

* 2990730 (Tracking ID: 2970368)

SYMPTOM:
SRDF-R2 WD (write-disabled) devices are shown in the error state, and many path enable/disable messages are generated in the /etc/vx/dmpevents.log file.

DESCRIPTION:
DMP (the dynamic multi-pathing driver) disables the paths of write-protected devices, so these devices are shown in the error state. The vxattachd daemon tries to online these devices and executes partial device discovery for them. As part of partial device discovery, enabling and disabling the paths of such write-protected devices generates many path enable/disable messages in the /etc/vx/dmpevents.log file.

RESOLUTION:
This issue is addressed by not disabling the paths of write-protected devices in DMP.

* 3008448 (Tracking ID: 3008423)

SYMPTOM:
vxdisksetup fails with an error of the following form:

VxVM ERROR V-5-1-15404 vxmediadisc: Device path /dev/rdsk/emcpower6cs2 does not exist.
VxVM INFO V-5-1-15407 Usage: vxmediadisc [-p]
prtvtoc: /dev/rdsk/emcpower6cs2: No such file or directory

DESCRIPTION:
Labeling of EMC disks controlled by PowerPath fails because an incorrect path is found for the given disk. Since the path is not valid, 'vxmediadisc' returns an error, which is not handled at the time of labeling the disk. A further problem with the incorrect path is that the correct disk geometry, which is used to label the disk, is not obtained.
RESOLUTION:
Code changes have been made to handle the error returned by 'vxmediadisc' and to obtain the correct disk geometry.

* 3069507 (Tracking ID: 3002770)

SYMPTOM:
While issuing a SCSI inquiry command, the system panics with the following stack trace:

vxdmp:dmp_aa_recv_inquiry
vxdmp:dmp_process_scsireq
vxdmp:dmp_daemons_loop
unix:thread_start

DESCRIPTION:
The panic happens while handling the SCSI response for the SCSI Inquiry command. To determine whether the path on which the SCSI Inquiry command was issued is read-only, DMP needs to check the error buffer. However, the error buffer is not always prepared, so DMP should verify that the error buffer is valid before checking it further. Without such validation, the system may panic on a NULL pointer.

RESOLUTION:
The source code is modified to verify that the error buffer is valid before it is examined.

* 3072890 (Tracking ID: 2352517)

SYMPTOM:
Excluding a controller from VxVM using the command "vxdmpadm exclude ctlr=" leads to a panic with the following stack:

gen_common_adaptiveminq_select_path+0x1b4()
dmp_select_path+0x70()
gendmpstrategy+0x17c()
voldiskiostart+0x4f0()
vol_subdisksio_start+0x2c8()
volkcontext_process+0x268()
volkiostart+0xaac()
vxiostrategy+0x74()
vx_bread_bp+0x258()
vx_getblk_cmn+0x1d0()
vx_getblk+0x34()
vx_getmap+0xd0()
vx_getemap+0xf0()
vx_do_extfree+0x108()
vx_extfree+0x278()
vx_te_trunc_data+0x8b0()
vx_te_trunc+0x294()
vx_trunc_typed+0x180()
vx_trunc_tran2+0x644()
vx_trunc_tran+0x17c()
vx_trunc+0x2d0()
vx_inactive_remove+0xe8()
vx_inactive_tran+0x5e4()
vx_local_inactive_list+8()
vx_inactive_list+0x410()
vx_workitem_process+0x10()
vx_worklist_process+0x344()
vx_worklist_thread+0x94()
thread_start+4()

DESCRIPTION:
While excluding a controller from the VxVM view, all of its paths must also be excluded. The panic occurs because we exclude the controller before we exclude the paths belonging to that controller.
While excluding a path, the controller of that path is accessed; since it has already been excluded, the reference is NULL, leading to the panic.

RESOLUTION:
Code changes have been made to exclude all the paths belonging to a controller before excluding the controller itself.

* 3083188 (Tracking ID: 2622536)

SYMPTOM:
Under a heavy I/O load, write I/Os on the VVR Primary logowner take a very long time to complete.

DESCRIPTION:
VVR cannot allow more than 2048 I/Os outstanding on the SRL volume; any I/Os beyond this threshold are throttled. The throttled I/Os are restarted after every SRL header flush operation. The restarted throttled I/Os contend with new I/Os and can starve if new I/Os get preference.

RESOLUTION:
The SRL log allocation algorithm is modified to give priority to the throttled I/Os. New I/Os go behind the throttled I/Os.

* 3083189 (Tracking ID: 3025713)

SYMPTOM:
The VxVM commands "vxdg adddisk" and "vxdg rmdisk" take a long time (approximately 90 seconds), and I/Os hang during command execution.

DESCRIPTION:
VxVM commands on a VVR disk group do not complete until all the outstanding I/Os in that disk group are drained completely. If replication is active, the outstanding I/Os include network I/Os (the I/Os to be sent from the Primary to the Secondary). VxVM commands hang waiting for these network I/Os to drain or for the rlink to be disconnected.

RESOLUTION:
If there are outstanding network I/Os lying in the readback pool, the rlink is disconnected to allow the VxVM commands to complete.

* 3083991 (Tracking ID: 2277359)

SYMPTOM:
'vxdisksetup -fi' succeeds on ZFS LUNs when it should fail with the following error:

VxVM vxdisksetup ERROR V-5-2-5716 Disk is in use by ZFS.
Slice(s) 0 are in use as ZFS zpool (or former) devices.
If you still want to initialize this device for VxVM use, please destroy the zpool by running the 'zpool' command if it is still active, and then remove the ZFS signature from each of these slice(s) as follows:
dd if=/dev/zero of=/dev/vx/rdmp/[n] oseek=31 bs=512 count=1
[n] is the slice number.

DESCRIPTION:
While executing vxdisksetup on a device, VxVM checks whether the device is a ZFS disk. Because of a bug, the ZFS disk label was not recognized properly.

RESOLUTION:
Code changes have been made to properly identify ZFS disks during initialization.

Patch ID: 142630-17

* 2485252 (Tracking ID: 2910043)

SYMPTOM:
Frequent swapin/swapout is seen due to higher-order memory requests.

DESCRIPTION:
VxVM operations such as plex attach and snapshot resync/reattach issue ATOMIC_COPY ioctls. The default I/O size for these operations is 1MB, and VxVM allocates this memory from the operating system. Memory allocations of such a large size can result in swapin/swapout of pages and are not very efficient. In the presence of many such operations, the system may not work very efficiently.

RESOLUTION:
VxVM has its own I/O memory management module, which allocates pages from the operating system and manages them efficiently. The ATOMIC_COPY code has been modified to use VxVM's internal I/O memory pool instead of allocating memory directly from the operating system.

* 2711758 (Tracking ID: 2710579)

SYMPTOM:
Data corruption can be observed on a CDS (Cross-platform Data Sharing) disk as part of operations like LUN resize, disk flush, or disk online. The following pattern would be found in the data region of the disk:

cyl alt 2 hd sec

DESCRIPTION:
The CDS disk maintains a SUN VTOC in the zeroth block and a backup label at the end of the disk. The VTOC maintains the disk geometry information, such as the number of cylinders, tracks, and sectors per track. The backup label is a duplicate of the VTOC, and the backup label location is determined from the VTOC contents.
If the contents of the SUN VTOC located in the zeroth sector are incorrect, the backup label location may be calculated wrongly. If the wrongly calculated backup label location falls in the public data region rather than at the end of the disk as designed, data corruption occurs.

RESOLUTION:
Writing of the backup label is suppressed to prevent the data corruption.

* 2847333 (Tracking ID: 2834046)

SYMPTOM:
VxVM dynamically reminors all the volumes during disk group import if the disk group base minor numbers are not in the correct pool. This behaviour causes NFS clients to have to re-mount all NFS file systems in an environment where CVM is used on the NFS server side.

DESCRIPTION:
Starting from 5.1, the minor number space is divided into two pools, one for private disk groups and another for shared disk groups. During disk group import, the disk group base minor numbers are adjusted automatically if they are not in the correct pool, and so are the volumes in the disk groups. This behaviour reduces minor-number conflicts during disk group import. But in an NFS environment, it makes all file handles on the client side stale: customers had to unmount file systems and restart applications.

RESOLUTION:
A new tunable, "autoreminor", is introduced. The default value is "on". Most customers are unaffected by auto-reminoring and can leave the tunable as it is; in an environment where auto-reminoring is not desirable, it can simply be turned off. Another major change is that during disk group import, VxVM does not change minor numbers as long as there are no minor-number conflicts; this includes the case where the minor numbers are in the wrong pool.

* 2860208 (Tracking ID: 2859470)

SYMPTOM:
An EMC SRDF-R2 disk may go into the error state when you create an EFI label on the R1 disk.
For example:

R1 site
# vxdisk -eo alldgs list | grep -i srdf
emc0_008c auto:cdsdisk emc0_008c SRDFdg online c1t5006048C5368E580d266 srdf-r1

R2 site
# vxdisk -eo alldgs list | grep -i srdf
emc1_0072 auto - - error c1t5006048C536979A0d65 srdf-r2

DESCRIPTION:
Since R2 disks are in write-protected mode, the default open() call (made for read-write mode) fails for the R2 disks, and the disk is marked as invalid.

RESOLUTION:
As a fix, DMP was changed to be able to read the EFI label even on a write-protected SRDF-R2 disk.

* 2881862 (Tracking ID: 2878876)

SYMPTOM:
vxconfigd, the VxVM configuration daemon, dumps core with the following stack:

vol_cbr_dolog ()
vol_cbr_translog ()
vold_preprocess_request ()
request_loop ()
main ()

DESCRIPTION:
This core is the result of a race between two threads processing requests from the same client. While one thread has completed processing a request and is releasing the memory it used, the other thread is processing a "DISCONNECT" request from the same client. Due to the race condition, the second thread attempted to access the memory being released and dumped core.

RESOLUTION:
The issue is resolved by protecting the common data of the client with a mutex.

* 2883606 (Tracking ID: 2189812)

SYMPTOM:
Executing 'vxdisk updateudid' on a disk in the 'online invalid' state causes vxconfigd to dump core with the following stack:

priv_join()
req_disk_updateudid()
request_loop()
main()

DESCRIPTION:
While updating the UDID, a NULL check was not done for an internal data structure, which led vxconfigd to dump core.

RESOLUTION:
Code changes have been made to add NULL checks for the internal data structure.

* 2915758 (Tracking ID: 2915751)

SYMPTOM:
The Solaris machine panics while resizing a CDS-EFI LUN, or in the CDS VTOC-to-EFI conversion case, when the new size is greater than 1TB.
DESCRIPTION:
While resizing a disk with the CDS-EFI format, or while resizing a CDS disk from less than 1TB to 1TB or more, the machine panics because of the incorrect use of device numbers. VxVM uses the whole-slice number s0 instead of s7, which represents the whole device for the EFI format. Hence the device open fails and an incorrect disk maxiosize is populated. While doing an I/O, the machine panics with a divide-by-zero error.

RESOLUTION:
While resizing a disk with the CDS-EFI format, or while resizing a CDS disk from less than 1TB to 1TB or more, VxVM now correctly uses the device number corresponding to partition 7 of the device.

* 2919718 (Tracking ID: 2919714)

SYMPTOM:
On a THIN LUN, vxevac returns 0 without migrating unmounted VxFS volumes. The following error messages are displayed when an unmounted VxFS volume is processed:

VxVM vxsd ERROR V-5-1-14671 Volume v2 is configured on THIN luns and not mounted. Use 'force' option, to bypass smartmove. To take advantage of smartmove for supporting thin luns, retry this operation after mounting the volume.
VxVM vxsd ERROR V-5-1-407 Attempting to cleanup after failure ...

DESCRIPTION:
On a THIN LUN, VxVM will not move or copy data on unmounted VxFS volumes unless smartmove is bypassed. The vxevac command needed to be enhanced to detect unmounted VxFS volumes on THIN LUNs and to support a force option that allows the user to bypass smartmove.

RESOLUTION:
The vxevac script has been modified to check for unmounted VxFS volumes on THIN LUNs prior to performing the migration. If an unmounted VxFS volume is detected, the command fails with a non-zero return code and displays a message notifying the user to mount the volumes or bypass smartmove by specifying the force option:

VxVM vxevac ERROR V-5-2-0 The following VxFS volume(s) are configured on THIN luns and not mounted:
v2
To take advantage of smartmove support on thin luns, retry this operation after mounting the volume(s). Otherwise, bypass smartmove by specifying the '-f' force option.
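The behaviour described in the resolution above reduces to a simple precondition test gated by a force flag. The sketch below illustrates that pattern only; the helper names are hypothetical and this is not the actual vxevac script:

```c
#include <assert.h>

/* Stand-in for the real check for unmounted VxFS volumes on THIN
 * LUNs; here it always reports that one was found. */
static int unmounted_thin_volumes_present(void)
{
    return 1;
}

/* Returns 0 when the migration may proceed, non-zero when the
 * caller should mount the volumes first or pass the force flag. */
int evac_precheck(int force)
{
    if (unmounted_thin_volumes_present() && !force)
        return 1;   /* fail with a non-zero status, as the fix does */
    return 0;       /* force given (or nothing unmounted): proceed */
}
```

The non-zero return corresponds to vxevac's new failure exit; passing the force flag corresponds to running 'vxevac -f'.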
* 2929003 (Tracking ID: 2928987)

SYMPTOM:
A vxconfigd hang is observed when I/O fails at the OS layer.

DESCRIPTION:
DMP is supposed to perform the number of I/O retries defined by the user. When it receives an I/O failure from the OS layer, due to a bug it restarts the I/O without checking the I/O retry count, so the I/O gets stuck in an infinite loop.

RESOLUTION:
Code changes have been made in DMP to use the I/O retry count defined by the user.

* 2940448 (Tracking ID: 2940446)

SYMPTOM:
I/O can hang on a volume with a space-optimized snapshot if the underlying cache object is very large. This can also lead to data corruption in the cache object.

DESCRIPTION:
The cache volume maintains a B+ tree for mapping an offset to its actual location in the cache object. Copy-on-write I/O generated on snapshot volumes needs to determine the offset of a particular I/O in the cache object. Due to incorrect type-casting, the value calculated for a large offset overflows and truncates to a smaller value, leading to data corruption.

RESOLUTION:
Code changes have been made to avoid overflow during the offset calculation in the cache object.

* 2946948 (Tracking ID: 2406096)

SYMPTOM:
vxconfigd, the VxVM configuration daemon, dumps core with the following stack:

vol_cbr_oplistfree()
vol_clntaddop()
vol_cbr_translog()
vold_preprocess_request()
request_loop()
main()

DESCRIPTION:
The vxsnap utility forks a child process, and the parent process exits. The child process continues the remaining work as a background process. It does not create a new connection with vxconfigd and continues to use the parent's connection. Since the parent is dead, vxconfigd cleans up its client structure. On further requests from the child process, vxconfigd tries to access the client structure that was already freed and hence dumps core.

RESOLUTION:
The issue is solved by initiating a separate connection with vxconfigd from the forked child.

* 2957608 (Tracking ID: 2671241)

SYMPTOM:
When a DRL log plex is configured in a volume, vxnotify does not report the volume-enabled message.
DESCRIPTION:
When a DRL log plex is configured in a volume, the volume is started in two phases: the first phase starts the plexes and puts the volume in the DETACHED state; the second phase makes the volume ENABLED after the log recovery. However, when notifying the interested client of the configuration change, only the status change from DISABLED to ENABLED was checked.

RESOLUTION:
With the fix, a notification is generated on a volume state change from any state to ENABLED (and from any state to DISABLED).

* 2979692 (Tracking ID: 2575051)

SYMPTOM:
In a CVM environment, master switch or master takeover operations result in a panic with the stack below:

volobject_iogen
vol_cvol_volobject_iogen
vol_cvol_recover3_start
voliod_iohandle
voliod_loop
kernel_thread

DESCRIPTION:
The panic happens while accessing fields of a stale cache object. The cache recovery process is initiated by a master takeover or master switch operation. In the recovery process, VxVM does not take an I/O count on cache objects. Meanwhile, the same cache object can go through a transaction while recovery is still in progress. The cache object therefore changes as part of the transaction, and in the recovery code path VxVM tries to access the stale cache object, resulting in a panic.

RESOLUTION:
This issue is addressed by code changes in the cache recovery code path.

INSTALLING THE PATCH
--------------------
o Before the upgrade:
  (a) Stop I/Os to all the VxVM volumes.
  (b) Unmount any file systems with VxVM volumes.
  (c) Stop applications using any VxVM volumes.

For Solaris 10 releases, refer to the man pages for instructions on using the 'patchadd' and 'patchrm' scripts provided with Solaris. Any other special or non-generic installation instructions are described below as special instructions.
The following example installs a patch to a standalone machine:

example# patchadd 142630-18

REMOVING THE PATCH
------------------
The following example removes a patch from a standalone system:

example# patchrm 142630-18

KNOWN ISSUES
------------
* Tracking ID: 2223250

SYMPTOM:
Node join fails if the recovery for the leaving node is not completed.

WORKAROUND:
Retry the node join after the recovery is completed.

SPECIAL INSTRUCTIONS
--------------------
You need to use the shutdown command to reboot the system after patch installation or de-installation:

shutdown -g0 -y -i6

A Solaris 10 issue may prevent this patch from installing completely. Before installing this VM patch, install the Solaris patch 119254-70 (or a later revision). This Solaris patch fixes the packaging, installation, and patch utilities. [Sun Bug ID 6337009]
Download Solaris 10 patch 119254-70 (or later) from Oracle at https://support.oracle.com

If 5.1SP1RP3P1 is installed on the system, you must remove 5.1SP1RP3P1 before installing 5.1SP1RP3P2.

OTHERS
------
NONE