* * * READ ME * * * * * * Veritas Volume Manager 5.1 SP1 RP3 * * * * * * P-patch 1 * * * Patch Date: 2014-02-26 This document provides the following information: * PATCH NAME * OPERATING SYSTEMS SUPPORTED BY THE PATCH * PACKAGES AFFECTED BY THE PATCH * BASE PRODUCT VERSIONS FOR THE PATCH * SUMMARY OF INCIDENTS FIXED BY THE PATCH * DETAILS OF INCIDENTS FIXED BY THE PATCH * INSTALLATION PRE-REQUISITES * INSTALLING THE PATCH * REMOVING THE PATCH PATCH NAME ---------- Veritas Volume Manager 5.1 SP1 RP3 P-patch 1 OPERATING SYSTEMS SUPPORTED BY THE PATCH ---------------------------------------- HP-UX 11i v3 (11.31) PACKAGES AFFECTED BY THE PATCH ------------------------------ VRTSvxvm VRTSvxvm BASE PRODUCT VERSIONS FOR THE PATCH ----------------------------------- * Veritas Storage Foundation for Oracle RAC 5.1 SP1 * Veritas Storage Foundation Cluster File System 5.1 SP1 * Veritas Storage Foundation 5.1 SP1 * Veritas Storage Foundation High Availability 5.1 SP1 * Veritas Dynamic Multi-Pathing 5.1 SP1 SUMMARY OF INCIDENTS FIXED BY THE PATCH --------------------------------------- Patch ID: PHCO_43824, PHKL_43779 * 2982085 (2976130) Multithreading of the vxconfigd (1M) daemon for HP-UX 11i v3 causes the DMP database to be deleted as part of the device-discovery commands. * 3326516 (2665207) Improve messaging on "vxdisk updateudid" failure on an imported disk group. * 3363231 (3325371) Panic occurs in the vol_multistepsio_read_source() function when snapshots are used. Patch ID: PHCO_43526, PHKL_43527 * 2233987 (2233225) Growing a volume to more than a limit, the default being 1G, does not synchronize plexes for the newly allocated regions of the volume. * 2245645 (2255018) The vxplex(1M) command core dumps during the relayout operation from concat to RAID 5. * 2366130 (2270593) Shared disk group enters the disabled state when the vxconfigd(1M) daemon is restarted on the master node followed by the node join operation. * 2437840 (2283588) Initialization of the mirror on the root disk gives an error message on the IA machine. * 2485252 (2910043) Avoid order 8 allocation by the vxconfigd(1M) daemon while the node is reconfigured. * 2567623 (2567618) The VRTSexplorer dumps core in vxcheckhbaapi/print_target_map_entry. * 2570739 (2497074) The "Configuration daemon error 441" error occurs while trying to stop a volume that uses the vxvol(1M) command on the Cross Platform Data Sharing - Extensible Firmware Interface (CDS-EFI) disks. * 2703384 (2692012) When moving the subdisks by using the vxassist(1M) command or the vxevac(1M) command, if the disk tags are not the same for the source and the destination, the command fails with a generic error message. * 2832887 (2488323) Write on volumes with links could hang if the volume also has snapshots. * 2836974 (2743926) The DMP restore daemon fails to restart during the system boot. * 2847333 (2834046) NFS migration failed due to device reminoring. * 2851354 (2837717) "vxdisk(1M) resize" command fails if 'da name' is specified. * 2860208 (2859470) The Symmetrix Remote Data Facility R2 (SRDF-R2) with the Extensible Firmware Interface (EFI) label is not recognized by Veritas Volume Manager (VxVM) and goes in an error state. * 2881862 (2878876) The vxconfigd daemon dumps core in vol_cbr_dolog() due to race between two threads processing requests from the same client. * 2883606 (2189812) When the 'vxdisk updateudid' command is executed on a disk which is in the 'online invalid' state, the vxconfigd(1M) daemon dumps core. 
* 2906832 (2398954) The system panics while performing I/O on a VxFS mounted instant snapshot with the Oracle Disk Manager (ODM) SmartSync enabled. * 2916915 (2916911) The vxconfigd(1M) daemon sends a VOL_DIO_READ request before the device is open. This may result in a scenario where the open operation fails but the disk read or write operations proceeds. * 2919718 (2919714) On a thin Logical Unit Number (LUN), the vxevac(1M) command returns 0 without migrating the unmounted-VxFS volumes. * 2929003 (2928987) DMP(Dynamic multipathing) retried IO infinitely causing vxconfigd to hang in case of OS(Operating system) layer failure. * 2934484 (2899173) The vxconfigd(1M) daemon hangs after executing the "vradmin stoprep" command. * 2940448 (2940446) A full file system check (fsck) hangs on I/O in Veritas Volume Manager (VxVM) when the cache object size is very large. * 2946948 (2406096) The vxconfigd daemon dumps core in vol_cbr_oplistfree() function. * 2950826 (2915063) During the detachment of a plex of a volume in the Cluster Volume Manager (CVM) environment, the system panics. * 2950829 (2921816) System panics while starting replication after disabling the DCM volumes. * 2957608 (2671241) When the Dirty Region Logging (DRL) log plex is configured in a volume, the vxnotify(1M) command does not report the volume enabled message. * 2960650 (2932214) After performing the "vxdisk resize" operation from less than 1TB to greater than or equal to 1TB on a disk with SIMPLE or SLICED format, that has the Sun Microsystems Incorporation (SMI) label, the disk enters the "online invalid" state. * 2973632 (2973522) At cable connect on port1 of dual-port Fibre Channel Host Bus Adapters (FC HBA), paths via port2 are marked as SUSPECT. * 2979692 (2575051) In a Cluster Volume Manager (CVM) environment, the master switch or the master nodes takeover operations results in panic. * 2983901 (2907746) File Descriptor leaks are observed with the device-discovery command of VxVM. * 2986939 (2530536) When any path of the ASM disk is disabled, this results in multiple DMP reconfigurations. * 2990730 (2970368) Enhance handling of SRDF-R2 Write-Disabled devices in DMP. * 3000033 (2631369) In a Cluster Volume Manager (CVM) environment, when the vxconfigd(1M) daemon is started in the single-threaded mode, a cluster reconfiguration such as a node join and Veritas Volume Manager(VxVM) operations on shared disk group take more time to complete. * 3012938 (3012929) The vxconfigbackup(1M) command gives errors when disk names are changed. * 3041018 (3041014) Beautify error messages seen during relayout operation. * 3043203 (3038684) The restore daemon enables the paths of Business Continuance Volumes-Not Ready (BCV-NR) devices. * 3047474 (3047470) The device /dev/vx/esd is not recreated on reboot with the latest major number, if it is already present on the system. * 3047803 (2969844) The device discovery failure should not cause the DMP database to be destroyed completely. * 3059145 (2979824) The vxdiskadm(1M) utility bug results in the exclusion of the unintended paths. * 3069507 (3002770) While issuing a SCSI inquiry command, NULL pointer dereference in DMP causes system panic. * 3072890 (2352517) The system panics while excluding a controller from Veritas Volume Manager (VxVM) view. * 3077756 (3077582) Interfaces to get and reset the failio flag on disk after path failure recovered. 
* 3083188 (2622536) Under a heavy I/O load, write I/Os on the Veritas Volume Replicator (VVR) Primary logowner takes a very long time to complete. * 3083189 (3025713) In a Veritas Volume Replicator (VVR) environment, adddisk / rmdisk taking time and I/O hang during the command execution. * 3087113 (3087250) In a CVM environment, during the node join operation when the host node joins the cluster node, this takes a long time to execute. * 3087777 (3076093) The patch upgrade script "installrp" can panic the system while doing a patch upgrade. * 3100378 (2921147) The udid_mismatch flag is absent on a clone disk when source disk is unavailable. * 3139302 (3139300) Memory leaks are observed in the device discovery code path of VxVM. * 3140407 (2959325) The vxconfigd(1M) daemon dumps core while performing the disk group move operation. * 3140735 (2861011) The "vxdisk -g resize " command fails with an error for the Cross-platform Data Sharing(CDS) formatted disk. * 3142325 (3130353) Continuous disable or enable path messages are seen on the console for EMC Not Ready (NR) devices. * 3144794 (3136272) The disk group import operation with the "-o noreonline" option takes additional import time. * 3147666 (3139983) I/Os are returned as failure to the application from the Dynamic Multi-Pathing (DMP) driver without retrying for the "iotimeout" value set by the "timebound" recovery option. The default "iotimeout" value is 300 seconds. * 3158099 (3090667) The system panics or hangs while executing the "vxdisk -o thin, fssize list" command as part of Veritas Operations Manager (VOM) Storage Foundation (SF) discovery. * 3158780 (2518067) The disabling of a switch port of the last-but-one active path to a Logical Unit Number (LUN) disables the Dynamic Multi-Pathing (DMP) node, and results in I/O failures on the DMP node even when an active path is available for the I/O. * 3158781 (2495338) The disks having the hpdisk format can't be initialized with private region offset other than 128. * 3158790 (2779580) Secondary node gives configuration error (no Primary RVG) after reboot of master node on Primary site. * 3158793 (2911040) The restore operation from a cascaded snapshot leaves the volume in unusable state if any cascaded snapshot is in the detached state. * 3158794 (1982965) The vxdg(1M) command import operation fails if the disk access) name is based on the naming scheme which is different from the prevailing naming scheme on the host. * 3158798 (2825102) CVM reconfiguration and VxVM transaction code paths can simultaneously access volume device list resulting in data corruption. * 3158799 (2735364) The "clone_disk" disk flag attribute is not cleared when a cloned disk group is removed by the "vxdg destroy " command. * 3158800 (2886333) The vxdg(1M) join operation should not allow mixing of clone and non-clone disks in a disk group. * 3158802 (2091520) The ability to move the configdb placement from one disk to another using "vxdisk set keepmeta=[always|skip|default]" command. * 3158804 (2236443) Disk group import failure should be made fencing aware, in place of VxVM vxdmp V-5-0-0 i/o error message. * 3158809 (2969335) The node that leaves the cluster node while the instant operation is in progress, hangs in the kernel and cannot join back to the cluster node unless it is rebooted. * 3158813 (2845383) The site gets detached if the plex detach operation is performed with the site- consistency set to off. 
* 3158818 (2909668) In case of multiple sets of cloned disks of the same source disk group, the import operation on the second set of clone disks fails if the first set of clone disks was imported with "updateid". * 3158819 (2685230) In a Cluster Volume Replicator (CVR) environment, if the SRL is resized and the logowner is switched to and from the master node to the slave node, then there could be an SRL corruption that leads to the Rlink detach. * 3158821 (2910367) When the SRL on the secondary site is disabled, the secondary panics. * 3164583 (3065072) Data loss occurs during the import of a clone disk group, when some of the disks are missing and the import "useclonedev" and "updateid" options are specified. * 3164596 (2882312) If an SRL fault occurs in the middle of an I/O load, and you immediately issue a read operation on data written during the SRL fault, the system returns old data. * 3164601 (2957555) The vxconfigd(1M) daemon on the CVM master node hangs in the userland during the vxsnap(1M) restore operation. * 3164610 (2966990) In a Veritas Volume Replicator (VVR) environment, the I/O hangs at the primary side after multiple cluster reconfigurations are triggered in parallel. * 3164611 (2962010) The replication hangs when the Storage Replicator Log (SRL) is resized. * 3164612 (2746907) The vxconfigd(1M) daemon can hang under a heavy I/O load on the master node during the reconfiguration. * 3164613 (2814891) The vxconfigrestore(1M) utility does not work properly if the SCSI page 83 inquiry returns more than one FPCH name identifier for a single LUN. * 3164615 (3102114) A system crash during the 'vxsnap restore' operation can cause the vxconfigd(1M) daemon to dump core after the system reboots. * 3164616 (2992667) When new disks are added to the SAN framework of the Virtual Intelligent System (VIS) appliance and the Fibre Channel (FC) switcher is changed to the direct connection, the "vxdisk list" command does not show the newly added disks even after the "vxdisk scandisks" command is executed. * 3164617 (2938710) The vxassist(1M) command dumps core during the relayout operation. * 3164618 (3058746) When the DMP disks of one RAID volume group are disabled, the I/O of the other volume group hangs. * 3164619 (2866059) The error messages displayed during the resize operation by using the vxdisk(1M) command need to be enhanced. * 3164620 (2993667) Veritas Volume Manager (VxVM) allows setting the Cross-platform Data Sharing (CDS) attribute for a disk group even when a disk is missing, because it experienced I/O errors. * 3164624 (3101419) In a CVR environment, I/Os to the data volumes in an RVG may hang temporarily during SRL overflow under a heavy I/O load. * 3164626 (3067784) The grow and shrink operations by the vxresize(1M) utility may dump core in the vfprintf() function. * 3164627 (2787908) The vxconfigd(1M) daemon dumps core when the slave node joins the CVM cluster node. * 3164628 (2952553) Refresh of a snapshot should not be allowed from a different source volume without the force option. * 3164629 (2855707) I/O hangs with the SUN6540 array during the path fault injection test. * 3164631 (2933688) When the 'Data corruption protection' check is activated by Dynamic Multi-Pathing (DMP), the device-discovery operation aborts, but the I/O to the affected devices continues, which results in data corruption. * 3164633 (3022689) The vxbrk_rootmir(1M) utility succeeds with the following error message: "ioscan: /dev/rdsk/eva4k6k0_48s2: No such file or directory". 
* 3164637 (2054606) During the DMP driver unload operation, the system panics. * 3164639 (2898324) UMR errors are reported by the Purify tool for the "vradmind migrate" command. * 3164643 (2815441) The file system mount operation fails when the volume is resized and the volume has a link to another volume. * 3164645 (3091916) The Small Computer System Interface (SCSI) I/O errors overflow the syslog. * 3164646 (2893530) With no VVR configuration, the system panics when it is rebooted. * 3164647 (3006245) While executing a snapshot operation on a volume which has 'snappoints' configured, the system panics infrequently. * 3164650 (2812161) In a Veritas Volume Replicator (VVR) environment, after the Rlink is detached, the vxconfigd(1M) daemon on the secondary host may hang. * 3164759 (2635640) The "vxdisksetup(1M) -ifB" command fails on Enclosure Based Naming (EBN) with the legacy tree removed. * 3164790 (2959333) The Cross-platform Data Sharing (CDS) flag is not listed for disabled CDS disk groups. * 3164792 (1783763) In a Veritas Volume Replicator (VVR) environment, the vxconfigd(1M) daemon may hang during a configuration change operation. * 3164793 (3015181) I/O hangs on both the nodes of the cluster when the disk array is disabled. * 3164874 (2986596) Disk groups imported with a mix of standard and clone Logical Unit Numbers (LUNs) may lead to data corruption. * 3164880 (3031796) The snapshot reattach operation fails if any other snapshot of the primary volume is not accessible. * 3164881 (2919720) The vxconfigd(1M) daemon dumps core in the rec_lock1_5() function. * 3164883 (2933476) The vxdisk(1M) resize command fails with a generic error message. Failure messages need to be more informative. * 3164884 (2935771) In a Veritas Volume Replicator (VVR) environment, the 'rlinks' disconnect after switching the master node. * 3164911 (1901838) After installation of a license key that enables multi-pathing, the state of the controller is shown as DISABLED in the command-line interface (CLI) output of the vxdmpadm(1M) command. * 3164916 (1973983) The vxunreloc(1M) command fails when the Data Change Object (DCO) plex is in the DISABLED state. * 3178903 (2270686) In a CVM environment, the node join operation takes a long time to execute when a host node joins the cluster. * 3181315 (2898547) The 'vradmind' process dumps core on the Veritas Volume Replicator (VVR) secondary site in a Clustered Volume Replicator (CVR) environment, when the Logowner Service Group on the VVR Primary site is moved across its CVM (Clustered Volume Manager) nodes. * 3181318 (3146715) Rlinks do not connect with Network Address Translation (NAT) configurations on Little Endian Architecture. * 3183145 (2477418) In a VVR environment, the logowner node on the secondary panics in low-memory situations. * 3189869 (2959733) Handle device path reconfiguration when the device paths are moved across LUNs or enclosures, to prevent a vxconfigd(1M) daemon core dump. * 3224030 (2433785) In a CVM environment, the node join operation to the cluster node fails intermittently. * 3227719 (2588771) The system panics when the multi-controller enclosure is disabled. * 3235365 (2438536) Reattaching a site after it was either manually detached or detached due to storage inaccessibility causes data corruption. * 3238094 (3243355) The vxres_lvmroot(1M) utility, which restores the Logical Volume Manager (LVM) root disk from the VxVM root disk, fails. 
* 3240788 (3158323) In a VVR environment with multiple secondaries, if the SRL overflows for rlinks at different times, the vxconfigd(1M) daemon may hang on the primary node. * 3242839 (3194358) Continuous messages are displayed in the syslog file for EMC not-ready (NR) LUNs. * 3245608 (3261485) The vxcdsconvert(1M) utility fails with the error "Unable to initialize the disk as a CDS disk". * 3247983 (3248281) When the "vxdisk scandisks" or "vxdctl enable" commands are run consecutively, the "VxVM vxdisk ERROR V-5-1-0 Device discovery failed." error is encountered. * 3253306 (2876256) The "vxdisk set mediatype" command fails with the new naming scheme. * 3256806 (3259926) The vxdmpadm(1M) command fails to enable the paths when the '-f' option is provided. DETAILS OF INCIDENTS FIXED BY THE PATCH --------------------------------------- This patch fixes the following Symantec incidents: Patch ID: PHCO_43824, PHKL_43779 * 2982085 (Tracking ID: 2976130) SYMPTOM: The device-discovery commands such as "vxdisk scandisks" and "vxdctl enable" may cause the entire DMP database to be deleted. This causes VxVM I/O errors and file systems to be disabled. For instances where VxVM manages the root disk(s), a system hang occurs. In a Serviceguard/SGeRAC environment integrated with CVM and/or CFS, VxVM I/O failures would typically lead to a Serviceguard INIT and/or a CRS TOC (if the voting disks sit on VxVM volumes). Syslog shows the removal of arrays from the DMP database as follows: vmunix: NOTICE: VxVM vxdmp V-5-0-0 removed disk array 000292601518, datype = EMC These entries appear in addition to messages that indicate VxVM I/O errors and disabled file systems. DESCRIPTION: VxVM's vxconfigd(1M) daemon uses HP's libIO(3X) APIs, such as the io_search() and io_search_array() functions, to claim devices that are attached to the host. Although vxconfigd(1M) is multithreaded, it uses a non-thread-safe version of the libIO(3X) APIs. A race condition may occur when multiple vxconfigd threads perform device discovery. This results in a NULL value being returned by the libIO(3X) API call. VxVM interprets the NULL return value as an indication that none of the devices are attached and proceeds to delete all the devices previously claimed from the DMP database. RESOLUTION: The vxconfigd(1M) daemon, as well as the event source daemon vxesd(1M), is now linked with HP's thread-safe libIO(3X) library. This prevents the race condition among multiple vxconfigd threads that perform device discovery. Please refer to HP's customer bulletin c03585923 for a list of other software components required for a complete solution. * 3326516 (Tracking ID: 2665207) SYMPTOM: When a user tries to update the "udid" on a disk that is part of an imported disk group, no message is displayed to convey that the operation is not allowed. DESCRIPTION: The code exits without displaying an error message, and the user is unaware of the restriction imposed on an imported disk group. RESOLUTION: The code has been modified to display a message that conveys to the user that the operation is not allowed. The following error message is displayed: "VxVM vxdisk ERROR V-5-1-17080 The UDID for device emc0_0f36 cannot be updated because the disk is part of an imported diskgroup VxVM vxdisk INFO V-5-1-17079 If you require the UDID to be updated, please deport the diskgroup. 
The DDL and on-disk UDID content can be inspected by issuing the 'vxdisk -o udid list' command" * 3363231 (Tracking ID: 3325371) SYMPTOM: Panic occurs in the vol_multistepsio_read_source() function when VxVM's FastResync feature is used. The stack trace observed is as following: vol_multistepsio_read_source() vol_multistepsio_start() volkcontext_process() vol_rv_write2_start() voliod_iohandle() voliod_loop() kernel_thread() DESCRIPTION: When a volume is resized, Data Change Object (DCO) also needs to be resized. However, the old accumulator contents are not copied into the new accumulator. Thereby, the respective regions are marked as invalid. Subsequent I/O on these regions triggers the panic. RESOLUTION: The code is modified to appropriately copy the accumulator contents during the resize operation. Patch ID: PHCO_43526, PHKL_43527 * 2233987 (Tracking ID: 2233225) SYMPTOM: Growing a volume to more than a limit, the default being 1 GB does not synchronize plexes for the newly allocated regions of the volume. DESCRIPTION: There was a coding issue that skipped setting the hint to re-synchronize the plexes for certain scenarios. Any further reads on the newly allocated regions returns different results, this depends upon the plex that is considered for the read operation. RESOLUTION: The code is modified to set the re-synchronize hint correctly for any missed scenarios. * 2245645 (Tracking ID: 2255018) SYMPTOM: The vxplex(1M) command dumps core during the relayout operation from "concat" to RAID 5. The following stack trace is observed: _int_malloc () from /lib/libc.so.6 malloc () from /lib/libc.so.6 vxvmutil_xmalloc () xmalloc () raid5_clear_logs () do_att () main () DESCRIPTION: During the relayout operation when the vxplex(1M) utility builds the configuration, it retries the transaction. The transaction gets restarted. However, the cleanup is not done correctly before the restart. Thus, the vxplex (1M) command dumps core. RESOLUTION: The code is modified so that the proper cleanup is done before the transaction is restarted. * 2366130 (Tracking ID: 2270593) SYMPTOM: In a CVM environment, during the node join operation if the vxconfigd(1M) daemon on the master node is restarted, the shared disk groups may get disabled. The error is displayed in syslog is as following: "Error in cluster processing" DESCRIPTION: When the vxconfigd(1M) daemon is restarted on the CVM master, all the shared disk groups are re-imported. During this process, if a cluster reconfiguration happens and not all the slave nodes have re-established the connection to the master's vxconfigd, the re-import of the shared disk groups may fail with error, leaving the disk groups in the disabled state. RESOLUTION: The code is modified so that during the vxconfigd(1M) demon restart operation, if the re-import of the disk groups encounters the "Error in cluster processing" error, then the re-import operation of the disk group is deferred until all the slave nodes re-establish the connection to the master node. * 2437840 (Tracking ID: 2283588) SYMPTOM: The vxdisksetup (1M) command used for the initialization of the mirror on the root disk fails with the following error message on the IA machine: VxVM vxislvm ERROR V-5-1-2604 cannot open /dev/rdisk/disk*_p2 DESCRIPTION: The reported error occurs due to the unavailability of the OS disk-device node corresponding to the disk slice. The situation can occur when the Logical Volume Manager (LVM) rooted disk is copied as the VxVM rooted disk, using the vxcp_lvmroot(1M) command. 
Later, a root mirror is created using the vxrootmir(1M) command when the system is booted from the created VxVM rootdisk. The vxrootmir(1M) command is executed on the VxVM rooted disk. After creating slices for the target disk, the device-node files are created on the VxVM rooted disk. However, the OS on the LVM rooted disk is unaware of the newly created slices of the disk initialized during 'vxrootmir' on the VxVM rooted disk. Thus, the OS on the LVM rooted disk does not contain or create the device nodes for the new slices of the 'vxrootmir'ed disk. When booted from the LVM rooted disk, the vxdisksetup(1M) command uses the idisk(1M) command to detect if the disk is a sliced disk, and tries to access the corresponding slices through the OS device nodes, assuming that the device nodes are present. Thereby, an error is encountered. RESOLUTION: The code is modified such that the vxdisksetup(1M) command uses the DMP devices to access the disk slices. Since DMP creates the device nodes by reading the disk format after the boot, the DMP disk nodes are always created for the disk slices. * 2485252 (Tracking ID: 2910043) SYMPTOM: When VxVM operations like the plex attach, snapshot resync, or reattach are performed, frequent swap-in and swap-out activities are observed due to the excessive memory allocation by the vxiod daemons. DESCRIPTION: When the VxVM operations such as the plex attach, snapshot resync, or reattach operations are performed, the default I/O size of the operation is 1 MB and VxVM allocates this memory from the OS. Such huge memory allocations can result in swap-in and swap-out of pages and are not very efficient. When many operations are performed, the system may not work very efficiently. RESOLUTION: The code is modified to make use of VxVM's internal I/O memory pool, instead of directly allocating the memory from the OS. * 2567623 (Tracking ID: 2567618) SYMPTOM: The VRTSexplorer dumps core with a segmentation fault in checkhbaapi/print_target_map_entry. The stack trace is observed as follows: print_target_map_entry() check_hbaapi() main() _start() DESCRIPTION: The checkhbaapi utility uses the HBA_GetFcpTargetMapping() API, which returns the current set of mappings between the OS and the Fibre Channel Protocol (FCP) devices for a given Host Bus Adapter (HBA) port. The maximum limit for mappings is set to 512 and only that much memory is allocated. When the number of mappings returned is greater than 512, the function that prints this information tries to access the entries beyond that limit, which results in core dumps. RESOLUTION: The code is modified to allocate enough memory for all the mappings returned by the HBA_GetFcpTargetMapping() API. * 2570739 (Tracking ID: 2497074) SYMPTOM: If the resize operation is performed on Cross Platform Data Sharing - Extensible Firmware Interface (CDS-EFI) disks of size greater than 1 TB and the "vxvol stop all" command is then executed, the following error message is displayed: "VxVM vxvol ERROR V-5-1-10128 Configuration daemon error 441" DESCRIPTION: The issue is observed only on the CDS-EFI disks. The correct EFI flags are not set when the resize operation is in the commit phase. As a result, the disk offset, which is 24 K for the CDS-EFI disks, is not taken into account for the private region I/Os. All the write I/Os get shifted by 24 K. This results in private region corruption. RESOLUTION: The code is modified to set the required flags for the CDS-EFI disks. 
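The failure scenario above applies only to CDS-EFI disks that are grown past 1 TB. A minimal sketch of the resize and a follow-up check, assuming a hypothetical disk group "datadg" and disk "emc0_01", might look like this:
# vxdisk -g datadg resize emc0_01
# vxdisk list emc0_01 | grep -E "^pub|^priv"
With the fix applied, the private region I/Os on CDS-EFI disks account for the 24 K disk offset, so stopping the volumes afterwards no longer reports the V-5-1-10128 error.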
* 2703384 (Tracking ID: 2692012) SYMPTOM: When moving the subdisks by using the vxassist(1M) command or the vxevac(1M) command, if the disk tags are not the same for the source and the destination, the command fails with a generic error message. The error message does not provide the reason for the failure of the operation. The following error message is displayed: VxVM vxassist ERROR V-5-1-438 Cannot allocate space to replace subdisks DESCRIPTION: When moving the subdisks using the "vxassist move" command, if no target disk is specified, it uses the available disks from the disk group to move. If the disks have the site tag set and the value of the site-tag attribute is not the same, the subsequent move operation using the vxassist(1M) command is expected to fail. However, it fails with a generic message that does not provide the reason for the failure of the operation. RESOLUTION: The code is modified to introduce a new error message that specifies that the failure is due to the mismatch in the site-tag attribute. The enhanced error message is as follows: VxVM vxassist ERROR V-5-1-0 Source and/or target disk belongs to site, can not move over sites * 2832887 (Tracking ID: 2488323) SYMPTOM: The application gets stuck when it waits for I/O on a volume which has links and snapshots configured. DESCRIPTION: When the configured volume has links and snapshots, the write I/Os that are larger than the region size need to be split. With linked volumes, it is possible to lose track of the logging done for individual split I/Os. So, the I/O can wait indefinitely for logging to complete for the individual split I/Os, causing the application to hang. RESOLUTION: The code is modified to split the I/Os in such a way that each individual split I/O tracks its own logging status. * 2836974 (Tracking ID: 2743926) SYMPTOM: During the system boot, the DMP restore daemon fails to restart. The following error message is displayed: "VxVM vxdmpadm ERROR V-5-1-2111 Invalid argument". DESCRIPTION: During the system boot, 'vxvm-sysboot' performs the 'vxdmpadm stop restore' operation and then the 'vxdmpadm start restore' operation. Without the "/etc/vx/dmppolicy.info" file, this restart fails. As the file system is read-only during the system boot, creation of the "/etc/vx/dmppolicy.info" file fails with invalid return values. These invalid return values are used as arguments for starting the restore daemon. This results in the invalid argument error and the restore daemon fails to start. RESOLUTION: The code is modified to return the appropriate value so that the restore daemon is started with the default values. * 2847333 (Tracking ID: 2834046) SYMPTOM: Beginning from the 5.1 release, VxVM automatically reminors the disk group and its objects in certain cases, and displays the following error message: vxvm:vxconfigd: V-5-1-14563 Disk group mydg: base minor was in private pool, will be change to shared pool. NFS clients that attempt to reconnect to a file system on the disk group's volume fail because the file handle becomes stale. The NFS client needs to re-mount the file system, and probably needs a reboot, to clear this. The issue is observed under the following situations: - When a private disk group is imported as shared, or a shared disk group is imported as private. - After upgrading from a version of VxVM prior to 5.1. DESCRIPTION: Since the HxRT 5.1 SP1 release, the minor-number space is divided into two pools, one for the private disk groups and the other for the shared disk groups. 
During the disk group import operation, the disk group base- minor numbers are adjusted automatically, if not in the correct pool. In a similar manner, the volumes in the disk groups are also adjusted. This behavior reduces many minor conflicting cases during the disk group import operation. However, in the NFS environment, it makes all the file handles on the client side stale. Subsequently, customers had to unmount files systems and restart the applications. RESOLUTION: The code is modified to add a new tunable "autoreminor". The default value of the autoreminor tunable is "on". Use "vxdefault set autoreminor off" to turn it off for NFS server environments. If the NFS server is in a CVM cluster, make the same change on all the nodes. * 2851354 (Tracking ID: 2837717) SYMPTOM: The "vxdisk resize" command fails if 'da name' is specified. An example is as following: # vxdisk list eva4k6k1_46 | grep ^pub pubpaths: block=/dev/vx/dmp/eva4k6k1_46 char=/dev/vx/rdmp/eva4k6k1_46 public: slice=0 offset=32896 len=680736 disk_offset=0 # vxdisk resize eva4k6k1_46 length=813632 # vxdisk list eva4k6k1_46 | grep ^pub pubpaths: block=/dev/vx/dmp/eva4k6k1_46 char=/dev/vx/rdmp/eva4k6k1_46 public: slice=0 offset=32896 len=680736 disk_offset=0 After resize operation len=680736 is not changed. DESCRIPTION: The scenario for 'da name' is not handled in the resize code path. As a result, the "vxdisk(1M) resize" command fails if the 'da name' is specified. RESOLUTION: The code is modified such that if 'dm name' is not specified to resize, then only the 'da name' specific operation is performed. * 2860208 (Tracking ID: 2859470) SYMPTOM: The EMC SRDF-R2 disk may go in error state when the Extensible Firmware Interface (EFI) label is created on the R1 disk. For example: R1 site # vxdisk -eo alldgs list | grep -i srdf emc0_008c auto:cdsdisk emc0_008c SRDFdg online c1t5006048C5368E580d266 srdf-r1 R2 site # vxdisk -eo alldgs list | grep -i srdf emc1_0072 auto - - error c1t5006048C536979A0d65 srdf-r2 DESCRIPTION: Since R2 disks are in write protected mode, the default open() call made for the read-write mode fails for the R2 disks, and the disk is marked as invalid. RESOLUTION: The code is modified to change Dynamic Multi-Pathing (DMP) to be able to read the EFI label even on a write-protected SRDF-R2 disk. * 2881862 (Tracking ID: 2878876) SYMPTOM: The vxconfigd(1M) daemon dumps core with the following stack trace: vol_cbr_dolog () vol_cbr_translog () vold_preprocess_request () request_loop () main () DESCRIPTION: This core is a result of a race between the two threads which are processing the requests from the same client. While one thread completes processing a request and is in the phase of releasing the memory used, the other thread processes a "DISCONNECT" request from the same client. Due to the race condition, the second thread attempts to access the memory released and dumps core. RESOLUTION: The code is modified by protecting the common data of the client by a mutex. * 2883606 (Tracking ID: 2189812) SYMPTOM: When the 'vxdisk updateudid' command is executed on a disk which is in the 'online invalid' state, the vxconfigd(1M) daemon dumps core with the following stack trace: priv_join() req_disk_updateudid() request_loop() main() DESCRIPTION: When 'udid' is updated, the null check is not done for an internal data structure. As a result, the vxconfigd(1M) daemon dumps core. RESOLUTION: The code is modified to add null checks for the internal data structure. 
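A brief illustration of the scenario above; the device name emc0_0f36 is only a placeholder reused from an earlier example:
# vxdisk list | grep emc0_0f36
emc0_0f36    auto:cdsdisk    -    -    online invalid
# vxdisk updateudid emc0_0f36
Before this fix, running updateudid on a disk in the 'online invalid' state could cause the vxconfigd(1M) daemon to dump core; with the null checks in place, the request is handled safely.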
* 2906832 (Tracking ID: 2398954) SYMPTOM: The system panics while doing I/O on a Veritas File System (VxFS) mounted instant snapshot with the Oracle Disk Manager (ODM) SmartSync enabled. The following stack trace is observed: panic: post_hndlr(): Unresolved kernel interruption cold_vm_hndlr bubbledown as_ubcopy privlbcopy volkio_to_kio_copy vol_multistepsio_overlay_data vol_multistepsio_start voliod_iohandle voliod_loop kthread_daemon_startup DESCRIPTION: Veritas Volume Manager (VxVM) uses the fields av_back and av_forw of io buf structure to store its private information. VxFS also uses these fields to chain I/O buffers before passing I/O to VxVM. When an I/O is received at VxVM layer it always resets these fields. But if the ODM SmartSync is enabled, VxFS uses a special strategy routine to pass on hints to VxVM. Due to a bug in the special strategy routine, the av_back and av_forw fields are not reset and points to a valid buffer in VxFS I/O buffer chain. VxVM interprets these fields (av_back, av_forw) wrongly and modifies its contents which in turn corrupts the next buffer in the chain leading to the panic. RESOLUTION: The av_back and av_forw fields of io buf structure are reset in the special strategy routine. * 2916915 (Tracking ID: 2916911) SYMPTOM: The vxconfigd(1M) deamon triggered a Data TLB Fault panic with the following stack trace: _vol_dev_strategy volsp_strategy vol_dev_strategy voldiosio_start volkcontext_process volsiowait voldio vol_voldio_read volconfig_ioctl volsioctl_real volsioctl vols_ioctl spec_ioctl vno_ioctl ioctl syscall DESCRIPTION: The kernel_force_open_disk() function checks if the disk device is open. The device is opened only if not opened earlier. When the device is opened, it calls the kernel_disk_load() function which in-turn calls the VOL_NEW_DISK ioctl () function. If the VOL_NEW_DISK ioctl fails, the error is not handled correctly as the return values are not checked. This may result in a scenario where the open operation fails but the disk read or write operation proceeds. RESOLUTION: The code is modified to handle the VOL_NEW_DISK ioctl. If the ioctl fails during the open operation of the device that does not exist, then the read or write operations are not allowed on the disk. * 2919718 (Tracking ID: 2919714) SYMPTOM: On a thin Logical Unit Number (LUN), the vxevac(1M) command returns 0 without migrating the unmounted-VxFS volumes. The following error messages are displayed when the unmounted-VxFS volumes are processed: VxVM vxsd ERROR V-5-1-14671 Volume v2 is configured on THIN luns and not mounted. Use 'force' option, to bypass smartmove. To take advantage of smartmove for supporting thin luns, retry this operation after mounting the volume. VxVM vxsd ERROR V-5-1-407 Attempting to cleanup after failure ... DESCRIPTION: On a thin LUN, VxVM does not move or copy data on the unmounted-VxFS volumes unless the 'smartmove' is bypassed. The vxevac(1M) command error messages need to be enhanced to detect the unmounted-VxFS volumes on thin LUNs, and to support a 'force' option that allows the user to bypass the 'smartmove'. RESOLUTION: The vxevac script is modified to check for the unmounted-VxFS volumes on thin LUNs, before the migration is performed. If the unmounted-VxFS volume is detected, the vxevac(1M) command fails with a non-zero return code. Subsequently, a message is displayed that notifies the user to mount the volumes or bypass the 'smartmove' by specifying the 'force' option. 
The rectified error message is displayed as follows: VxVM vxevac ERROR V-5-2-0 The following VxFS volume(s) are configured on THIN luns and not mounted: v2 To take advantage of smartmove support on thin luns, retry this operation after mounting the volume(s). Otherwise, bypass smartmove by specifying the '-f' force option. * 2929003 (Tracking ID: 2928987) SYMPTOM: A vxconfigd hang is observed when an I/O fails at the OS layer. DESCRIPTION: DMP is supposed to perform the number of I/O retries defined by the user. When it receives an I/O failure from the OS layer, due to a bug it restarts the I/O without checking the I/O retry count, so the I/O gets stuck in an infinite loop. RESOLUTION: Code changes are done in DMP to honor the I/O retry count defined by the user. * 2934484 (Tracking ID: 2899173) SYMPTOM: In a Clustered Volume Replicator (CVR) environment, a Storage Replicator Log (SRL) failure may cause the vxconfigd(1M) daemon to hang. This eventually causes the 'vradmin stoprep' command to hang. DESCRIPTION: The 'vradmin stoprep' command hangs because the vxconfigd(1M) daemon waits indefinitely during a transaction. The transaction waits for I/O completion on the SRL. An error handler is generated to handle the I/O failure on the SRL. But if there is an ongoing transaction, the error is not handled properly. This causes the transaction to hang. RESOLUTION: The code is modified so that when an SRL failure is encountered, the transaction itself handles the I/O error on the SRL. * 2940448 (Tracking ID: 2940446) SYMPTOM: The I/O may hang on a volume with a space-optimized snapshot if the underlying cache object is of a very large size (30 TB). It can also lead to data corruption in the cache object. The following stack is observed: pvthread() et_wait() uphyswait() uphysio() vxvm_physio() volrdwr() volwrite() vxio_write() rdevwrite() cdev_rdwr() spec_erdwr() spec_rdwr() vnop_rdwr() vno_rw() rwuio() rdwr() kpwrite() ovlya_addr_sc_flih_main() DESCRIPTION: The cache volume maintains a B+ tree for mapping the offset and its actual location in the cache object. Copy-on-write I/O generated on snapshot volumes needs to determine the offset of the particular I/O in the cache object. Due to incorrect type-casting, the value calculated for the large offset truncates to a smaller value due to the overflow, leading to data corruption. RESOLUTION: The code is modified to avoid overflow during the offset calculation in the cache object. It is advised to create multiple cache objects of around 10 TB, rather than creating a single cache object of a very large size (an illustrative command sequence is shown after the next incident). * 2946948 (Tracking ID: 2406096) SYMPTOM: The vxconfigd(1M) daemon dumps core with the following stack trace: vol_cbr_oplistfree() vol_clntaddop() vol_cbr_translog() vold_preprocess_request() request_loop() main() DESCRIPTION: The vxsnap utility forks a child process and the parent process exits. The child process continues the remaining work as a background process. It does not create a new connection with the vxconfigd(1M) daemon and continues to use the parent's connection. Since the parent is dead, the vxconfigd(1M) daemon cleans up the client structure. For further requests from the child process, the vxconfigd(1M) daemon tries to access the client structure that is already freed. As a result, the vxconfigd daemon dumps core. RESOLUTION: The code is modified by initiating a separate connection with the vxconfigd(1M) daemon from the forked child. 
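As a usage note for incident 2940448 (2940446) above, the sizing advice can be followed with the standard cache-object creation sequence for space-optimized snapshots. This is only an illustrative sketch, assuming a hypothetical disk group "datadg"; adjust names and sizes to the actual configuration:
# vxassist -g datadg make cachevol1 10t init=active
# vxmake -g datadg cache cobj1 cachevolname=cachevol1
# vxcache -g datadg start cobj1
Repeat with additional cache volumes and cache objects, rather than configuring a single cache object of a very large size.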
* 2950826 (Tracking ID: 2915063) SYMPTOM: During the detachment of a plex of a volume in the Cluster Volume Manager (CVM) environment, the master node panics with the following stack trace: vol_klog_findent() vol_klog_detach() vol_mvcvm_cdetsio_callback() vol_klog_start() voliod_iohandle() voliod_loop() DESCRIPTION: During the plex-detach operation, VxVM searches the plex object to be detached in the kernel. If any transaction is in progress on any disk group in the system, an incorrect plex object may be selected. This results in dereferencing of the invalid addresses and causes the system to panic. RESOLUTION: The code is modified to make sure that the correct plex object is selected. * 2950829 (Tracking ID: 2921816) SYMPTOM: In a VVR environment, if there is Storage Replicator Log (SRL) overflow then the Data Change Map (DCM) logging mode is enabled. For such instances, if there is an I/O failure on the DCM volume then the system panics with the following stack trace: vol_dcm_set_region() vol_rvdcm_log_update() vol_rv_mdship_srv_done() volsync_wait() voliod_loop() ... DESCRIPTION: There is a race condition where the DCM information is accessed at the same time when the DCM I/O failure is handled. This results in panic. RESOLUTION: The code is modified to handle the race condition. * 2957608 (Tracking ID: 2671241) SYMPTOM: When the Dirty Region Logging (DRL) log plex is configured in a volume, the vxnotify(1M) command does not report the volume enabled message. DESCRIPTION: When the DRL log plex is configured in a volume, a two phase start of the volume is made. First plexes are started and the volume state is marked as DETACHED. In the second phase, after the log recovery, the volume state is marked as ENABLED. However, during the notification of the configuration change to the interested client, only the status change from DISABLED to ENABLED is checked. RESOLUTION: The code is modified to generate a notification on state change of a volume from any state to ENABLED and any state to DISABLED. * 2960650 (Tracking ID: 2932214) SYMPTOM: After the "vxdisk resize" operation is performed from less than 1 TB to greater than or equal to 1 TB on a disk with SIMPLE or SLICED format, that has the Sun Microsystems Incorporation (SMI) label, the disk enters the "online invalid" state. DESCRIPTION: VxVM code is not geared enough to resize the SIMPLE or SLICED disks beyond 1 TB. RESOLUTION: The code is modified to prevent the resize of the SIMPLE or SLICED disks with the SMI label from less than 1 TB to greater than or equal to 1 TB. * 2973632 (Tracking ID: 2973522) SYMPTOM: At cable connect on one of the ports of a dual-port Fibre Channel Host Bus Adapters (FC HBA), paths that go through the other port are marked as SUSPECT. DMP does not issue I/O on such paths until the next restore daemon cycle confirms that the paths are functioning. DESCRIPTION: When a cable is connected at one of the ports of a dual-port FC HBA, HBA- Registered State Change Notification (RSCN) event occurs on the other port. When the RSCN event occurs, DMP marks the paths as SUSPECT that goes through that port. RESOLUTION: The code is modified so that the RSCN events that goes through the other port are not marked as SUSPECT. 
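After reconnecting a cable, the state of the paths through each port can be verified with the vxdmpadm(1M) command; for example, with a hypothetical controller name c5:
# vxdmpadm getsubpaths ctlr=c5
With this fix, the paths through the other port of the dual-port FC HBA stay in the ENABLED state instead of being marked SUSPECT until the next restore daemon cycle.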
* 2979692 (Tracking ID: 2575051) SYMPTOM: In a CVM environment, master switch or master node takeover operations result in a panic with the following stack trace: volobject_iogen vol_cvol_volobject_iogen vol_cvol_recover3_start voliod_iohandle voliod_loop kernel_thread DESCRIPTION: The panic occurs while accessing the fields of a stale cache-object which is a part of a shared disk group. The cache-object recovery process gets initiated during a master node takeover or master switch operation. If some operation is done on the cache object, like creating a space optimized snapshot over it, while the recovery is in progress, the cache object gets changed. A panic can occur while accessing the changed fields in the cache-object. RESOLUTION: The code is modified to block the operations on a cache object while a cache- object recovery is in progress. * 2983901 (Tracking ID: 2907746) SYMPTOM: At device discovery, the vxconfigd(1M) daemon allocates file descriptors for open instances of "/dev/config", but does not always close them after use, this results in a file descriptor leak over time. DESCRIPTION: Before any API of the "libIO" library is called, the io_init() function needs to be called. This function opens the "/dev/config" device file. Each io_init() function call should be paired with the io_end() function call. This function closes the "/dev/config" device file. However, the io_end() function call is amiss at some places in the device discovery code path. As a result, the file descriptor leaks are observed with the device-discovery command of VxVM. RESOLUTION: The code is modified to pair each io_init() function call with the io_end() function call in every possible code path. * 2986939 (Tracking ID: 2530536) SYMPTOM: Disabling any path of the ASM disk causes multiple DMP reconfigurations. Multiple occurrences of the following messages are seen in the dmpevent log: Reconfiguration is in progress Reconfiguration has finished DESCRIPTION: After enabling/disabling any path/controller of the DMP node, the VxVM daemon processes events generated for that DMP node. If a DMP node is already online, the VxVM daemon avoids re-online. Otherwise, it tries to online the DMP node, four times with some delay. If all attempts to online the DMP node fails, the VxVM daemon fires `vxdisk scandisks` on all the paths of that DMP node. In case of the ASM disk, all attempts to online the DMP node fails, and so the VxVM daemon fires the 'vxdisk scandisks' on all the paths of the DMP node. This in turn results in the multiple DMP reconfigurations. RESOLUTION: The code is modified to skip the event processing of the ASM disks. * 2990730 (Tracking ID: 2970368) SYMPTOM: The SRDF-R2 WD (write-disabled) devices are shown in an error state and many enable and disable path messages that are generated in the "/etc/vx/dmpevents.log" file. DESCRIPTION: DMP driver disables the paths of the write-protected devices. Therefore, these devices are shown in an error state. The vxattachd(1M) daemon tries to online these devices and executes the partial-device discovery for these devices. As part of the partial-device discovery, enabling and disabling the paths of such write-protected devices, generates many path enable and disable messages in the "/etc/vx/dmpevents.log" file. RESOLUTION: The code is modified so that the paths of the write-protected devices in DMP are not disabled. 
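Whether the repeated path transitions described above are still occurring can be checked directly in the DMP event log quoted in the symptom; for example:
# grep -iE "enabled path|disabled path" /etc/vx/dmpevents.log | tail -20
# vxdisk -eo alldgs list | grep -i srdf
After the fix, the SRDF-R2 write-disabled devices should no longer generate a continuous stream of enable and disable messages in this log.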
* 3000033 (Tracking ID: 2631369) SYMPTOM: In a CVM environment, when the vxconfigd(1M) daemon is started in the single- threaded mode by specifying the "-x nothreads" option, a cluster reconfiguration such as a node join and VxVM operations on the shared disk group takes approximately five more minutes to complete, as compared to the multithreaded environment. DESCRIPTION: Each node in the cluster exchanges VxVM objects during cluster reconfiguration, and for certain VxVM operations. In the signal handler function of single threaded vxconfigd(1M) daemon, each nodes awaits for the maximum poll timeout value set within the code. RESOLUTION: The code is modified such that the signal-handler routines do not wait for the maximum timeout value, instead it return's immediately after the objects are received. * 3012938 (Tracking ID: 3012929) SYMPTOM: When a disk name is changed when a backup operation is in progress, the vxconfigbackup(1M) command gives the following error: VxVM vxdisk ERROR V-5-1-558 Disk : Disk not in the configuration VxVM vxconfigbackup WARNING V-5-2-3718 Unable to backup Binary diskgroup configuration for diskgroup . DESCRIPTION: If disk names change during the backup, the vxconfigbackup(1M) command does not detect and refresh the changed names and it tries to find the configuration database information from the old diskname. Consequently, the vxconfigbackup (1M) command displays an error message indicating that the old disk is not found in the configuration and it fails to take backup of the disk group configuration from the disk. RESOLUTION: The code is modified to ensure that the new disk names are updated and are used to find and backup the configuration copy from the disk. * 3041018 (Tracking ID: 3041014) SYMPTOM: Sometimes a "relayout" command may fail with following error messages with not much information: 1. VxVM vxassist ERROR V-5-1-15309 Cannot allocate 4294838912 blocks of disk space required by the relayout operation for column expansion: Not enough HDD devices that meet specification. VxVM vxassist ERROR V-5-1-4037 Relayout operation aborted. (7) 2. VxVM vxassist ERROR V-5-1-15312 Cannot allocate 644225664 blocks of disk space required for the relayout operation for temp space: Not enough HDD devices that meet specification. VxVM vxassist ERROR V-5-1-4037 Relayout operation aborted. (7) DESCRIPTION: In some executions of the vxrelayout(1M) command, the error messages do not provide sufficient information. For example, when enough space is not available, the vxrelayout(1M) command displays an error where it mentions less disk space available than required. Hence the "relayout" operation can still fail when the disk space is increased. RESOLUTION: The code is modified to display the correct space required for the "relayout" operation to complete successfully. * 3043203 (Tracking ID: 3038684) SYMPTOM: The restore daemon attempts to re-enable disabled paths of the Business Continuance Volume - Not Ready (BCV-NR) devices, logging many DMP messages as follows: VxVM vxdmp V-5-0-148 enabled path 255/0x140 belonging to the dmpnode 3/0x80 VxVM vxdmp V-5-0-112 disabled path 255/0x140 belonging to the dmpnode 3/0x80 DESCRIPTION: The restore daemon tries to re-enable a disabled path of a BCV-NR device as the probe passes. But the open() operation fails on such devices as no I/O operations are permitted and the path is disabled. There is a check to prevent enabling the path of the device if the open() operation fails. 
Because of the bug in the open check, it incorrectly tries to re-enable the path of the BCV-NR device. RESOLUTION: The code is modified to do an open check on the BCV-NR block device. * 3047474 (Tracking ID: 3047470) SYMPTOM: The disk group cannot be deported because the device "/dev/vx/esd" is not recreated on reboot with the latest major number. Consider the problem scenario as follows: The old device "/dev/vx/esd" has a major number which is now re-assigned to the "vol" driver. As the device "/dev/vx/esd" has the same major number as that of the "vol" driver, it may disallow a disk group deport with the following message, until the vxesd(1M) daemon is stopped: "VxVM vxdg ERROR V-5-1-584 Disk group XXX: Some volumes in the disk group are in use" DESCRIPTION: If the device "/dev/vx/esd" is present in the system with an old major number, then the mknod(1M) command in the startup script fails to recreate the device with the new major number. This leads to a change in functionality. RESOLUTION: The code is modified to delete the device "/dev/vx/esd" before the mknod(1M) command in the startup script, so that it gets recreated with the latest major number. * 3047803 (Tracking ID: 2969844) SYMPTOM: The DMP database gets destroyed if the discovery fails for some reason. The ddl.log file shows numerous entries as follows: DESTROY_DMPNODE: 0x3000010 dmpnode is to be destroyed/freed DESTROY_DMPNODE: 0x3000d30 dmpnode is to be destroyed/freed Numerous vxio errors are seen in the syslog as all VxVM I/Os fail afterwards. DESCRIPTION: VxVM deletes the old device database before it makes the new device database. If the discovery process fails for some reason, this results in a null DMP database. RESOLUTION: The code is modified to take a backup of the old device database before doing the new discovery. Therefore, if the discovery fails, the old database is restored and the appropriate message is displayed on the console. * 3059145 (Tracking ID: 2979824) SYMPTOM: While excluding the controller using the vxdiskadm(1M) utility, unintended paths get excluded. DESCRIPTION: The issue occurs due to a logical error in the grep command used when the hardware path of the controller to be excluded is retrieved. In some cases, the vxdiskadm(1M) utility takes the wrong hardware path for the controller that is excluded, and hence excludes unintended paths. For example, suppose there are two controllers, c189 and c18, and c189 is listed above c18 in the command output. If the controller c18 is excluded, the hardware path of the controller c189 is passed to the function instead, and the wrong controller ends up being excluded. RESOLUTION: The script is modified so that the vxdiskadm(1M) utility now takes the hardware path of the intended controller only, and the unintended paths do not get excluded. * 3069507 (Tracking ID: 3002770) SYMPTOM: When a SCSI-inquiry command is executed, a NULL-pointer dereference in Dynamic Multi-Pathing (DMP) causes the system to panic with the following stack trace: dmp_aa_recv_inquiry() dmp_process_scsireq() dmp_daemons_loop() DESCRIPTION: The panic occurs when the SCSI response for the SCSI-inquiry command is handled. In order to determine if the path on which the SCSI-inquiry command is issued is read-only, DMP needs to check the error buffer. However, the error buffer is not always prepared. So DMP should examine whether the error buffer is valid before making any further checks. Without this error-buffer examination, the system panics with a NULL pointer. 
RESOLUTION: The code is modified to verify that the error buffer is valid. * 3072890 (Tracking ID: 2352517) SYMPTOM: Excluding a controller from Veritas Volume Manager (VxVM ) using the vxdmpadm exclude ctlr=" command causes the system to panic with the following stack trace: gen_common_adaptiveminq_select_path dmp_select_path gendmpstrategy voldiskiostart vol_subdisksio_start volkcontext_process volkiostart vxiostrategy vx_bread_bp vx_getblk_cmn vx_getblk vx_getmap vx_getemap vx_do_extfree vx_extfree vx_te_trunc_data vx_te_trunc vx_trunc_typed vx_trunc_tran2 vx_trunc_tran vx_trunc vx_inactive_remove vx_inactive_tran vx_local_inactive_list vx_inactive_list vx_workitem_process vx_worklist_process vx_worklist_thread thread_start DESCRIPTION: While excluding a controller from the VxVM view, all the paths must also be excluded. The panic occurs because the controller is excluded before the paths belonging to that controller are excluded. While excluding the path, the controller of that path which is NULL is accessed. RESOLUTION: The code is modified to exclude all the paths belonging to a controller before excluding a controller. * 3077756 (Tracking ID: 3077582) SYMPTOM: A Veritas Volume Manager (VxVM) volume may become inaccessible causing any read/write to fail with the following error: # dd if=/dev/vx/dsk// of=/dev/null count=10 dd read error: No such device 0+0 records in 0+0 records out DESCRIPTION: If I/Os to the disks timeout due to some hardware failures like weak Storage Area Network (SAN) cable link or Host Bus Adapter (HBA) failure, VxVM assumes that disk is bad or slow and it sets failio flag on the disk. Because of this flag, all the subsequent I/Os fail with the 'No such device' error. RESOLUTION: The code is modified so that vxdisk now provides a way to clear the 'failio' flag. Use the vxkprint(1M) utility (under /etc/vx/diag.d) to check whether the 'failio' flag is set on the disks. To reset this flag, execute the 'vxdisk set failio=off' command, or deport and import the disk group that holds these disks. * 3083188 (Tracking ID: 2622536) SYMPTOM: Under a heavy I/O load, write I/Os on the Veritas Volume Replicator (VVR) Primary logowner takes a very long time to complete. DESCRIPTION: VVR cannot allow more than 2048 I/Os outstanding on the Storage Replicator Log (SRL) volume. Any I/Os beyond this threshold is throttled. The throttled I/Os are restarted after every SRL header flush operation. The restarted throttled I/Os contend with the new I/Os and can starve if new I/Os get preference. RESOLUTION: The SRL allocation algorithm is modified to give priority to the throttled I/Os. The new I/Os goes behind the throttled I/Os. * 3083189 (Tracking ID: 3025713) SYMPTOM: In a VVR environment, VxVM "vxdg adddisk" and "vxdg rmdisk" commands take long time (approximately 90 seconds) to execute. DESCRIPTION: The VxVM commands on a VVR disk group does not complete until all the outstanding I/Os in that disk group are drained completely. If replication is active, the outstanding I/Os include network I/Os (the I/Os to be sent from the primary node to the secondary node). VxVM commands take a long time to complete as they wait for these network I/Os to drain or "rlink" to be disconnected. RESOLUTION: The code is modified such that, if there are outstanding network I/Os lying in the read-back pool, the 'rlink' is disconnected to allow the VxVM commands to complete. 
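As a usage note for incident 3077756 (3077582) above, the 'failio' flag can be inspected and cleared with the utilities named in that resolution. This is only a sketch, with a hypothetical disk access name emc0_01; the exact vxkprint output format is an assumption:
# /etc/vx/diag.d/vxkprint | grep -i failio
# vxdisk set emc0_01 failio=off
Alternatively, deport and re-import the disk group that holds the affected disks, as noted in the resolution.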
* 3087113 (Tracking ID: 3087250) SYMPTOM: In a CVM environment, the node join operation takes a long time to complete when a host node joins the cluster. DESCRIPTION: During the node join operation, when non-A/A storage with shared disk groups is connected, the master node provides the information about which array controller is to be used. The joining node registers the Persistent Group Reservation (PGR) keys on the required array controller, which may take longer if the array is slow in processing the PGR commands. RESOLUTION: The code is modified so that DMP offloads the processing of registering the PGR keys for the non-A/A storage with shared disk groups. As a result, the node join operation is faster. * 3087777 (Tracking ID: 3076093) SYMPTOM: The patch upgrade script "installrp" can panic the system while performing a patch upgrade. The observed panic stack trace is as follows: devcclose spec_close vnop_close vno_close closef closefd fs_exit kexitx kexit DESCRIPTION: When an upgrade is performed, the VxVM device drivers are not loaded, but the patch-upgrade process tries to start or stop the eventsource (vxesd) daemon. This can result in a system panic. RESOLUTION: The code is modified so that the eventsource (vxesd) daemon does not start unless the VxVM device drivers are loaded. * 3100378 (Tracking ID: 2921147) SYMPTOM: The udid_mismatch flag is absent on a clone disk when the source disk is unavailable. The 'vxdisk list' command does not show the udid_mismatch flag on the disk. This happens even when the 'vxdisk -o udid list' or 'vxdisk -v list diskname | grep udid' commands show that the Device Discovery Layer (DDL) generated unique identifier (UDID) and the private region UDID of the disk differ. DESCRIPTION: When the DDL-generated UDID and the private region UDID of a disk do not match, Veritas Volume Manager (VxVM) sets the udid_mismatch flag on the disk. This flag is used to detect a disk as a clone, which is then marked with the clone-disk flag. The vxdisk(1M) utility suppressed the display of the udid_mismatch flag if the source Logical Unit Number (LUN) was unavailable on the same host. RESOLUTION: The vxdisk(1M) utility is modified to display the udid_mismatch flag if it is set on the disk. Display of this flag is no longer suppressed, even when the source LUN is unavailable on the same host (see the command example below). * 3139302 (Tracking ID: 3139300) SYMPTOM: At device discovery, the vxconfigd(1M) daemon allocates memory but does not release it after use, causing a user memory leak. The Resident Memory Size (RSS) of the vxconfigd(1M) daemon thus keeps growing and, in the extreme case, may reach maxdsiz(5), which causes the vxconfigd(1M) daemon to abort. DESCRIPTION: At some places in the device-discovery code path, the buffer is not freed. This results in memory leaks. RESOLUTION: The code is modified to free the buffers.
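A minimal command sketch for the checks described in incident 3100378 above; the disk name is a placeholder and the exact output layout may differ by release:
Compare the DDL-generated UDID with the UDID stored in the private region:
  # vxdisk -o udid list
  # vxdisk -v list <diskname> | grep udid
With the fix, the udid_mismatch flag appears in the status field of the disk listing:
  # vxdisk list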
* 3140407 (Tracking ID: 2959325) SYMPTOM: The vxconfigd(1M) daemon dumps core while performing the disk group move operation with the following stack trace: dg_trans_start () dg_configure_size () config_enable_copy () da_enable_copy () ncopy_set_disk () ncopy_set_group () ncopy_policy_some () ncopy_set_copies () dg_balance_copies_helper () dg_transfer_copies () in vold_dm_dis_da () in dg_move_complete () in req_dg_move () in request_loop () in main () DESCRIPTION: The core dump occurs when the disk group move operation tries to reduce the size of the configuration records in the disk group, when the size is large and the disk group move operation needs more space for the new configuration-record entries. Since the reduction of the size of the configuration records (compaction) and the configuration change by the disk group move operation cannot co-exist, this results in the core dump. RESOLUTION: The code is modified to perform the compaction before the configuration change made by the disk group move operation. * 3140735 (Tracking ID: 2861011) SYMPTOM: The "vxdisk -g <dgname> resize <diskname>" command fails with an error for a Cross-platform Data Sharing (CDS) formatted disk. The error message is displayed as follows: "VxVM vxdisk ERROR V-5-1-8643 Device <device>: resize failed: One or more subdisks do not fit in pub reg" DESCRIPTION: During the resize operation, VxVM updates the VM disk's private region with the new public region size, which is evaluated based on the raw disk geometry. But for CDS disks, the geometry information stored in the disk label is fabricated such that the cylinder size is aligned with 8 KB. The resize failure occurs when there is a mismatch between the public region size obtained from the disk label and that stored in the private region. RESOLUTION: The code is modified such that the new public region size is now evaluated based on the fabricated geometry, considering the 8 KB alignment for CDS disks, so that it is consistent with the size obtained from the disk label. * 3142325 (Tracking ID: 3130353) SYMPTOM: Disabled and enabled path messages are displayed continuously on the console for the EMC NR (Not Ready) devices: I/O error occurred on Path hdisk139 belonging to Dmpnode emc1_1f2d Disabled Path hdisk139 belonging to Dmpnode emc1_1f2d due to path failure Enabled Path hdisk139 belonging to Dmpnode emc1_1f2d I/O error occurred on Path hdisk139 belonging to Dmpnode emc1_1f2d Disabled Path hdisk139 belonging to Dmpnode emc1_1f2d due to path failure DESCRIPTION: As part of the device discovery, DMP marks the paths belonging to the EMC NR devices as disabled, so that they are not used for I/O. However, the DMP-restore logic, which issues an inquiry on the disabled path, brings the NR device paths back to the enabled state. This cycle is repetitive, and as a result the disabled and enabled path messages are seen continuously on the console. RESOLUTION: The DMP code is modified to specially handle the EMC NR devices, so that they are not disabled and enabled repeatedly. The messages are not merely suppressed; the devices themselves are handled differently. * 3144794 (Tracking ID: 3136272) SYMPTOM: In a CVM environment, the disk group import operation with the "-o noreonline" option takes additional import time. DESCRIPTION: On a slave node, when the clone disk group import is triggered by the master node, the "da re-online" takes place irrespective of the "-o noreonline" flag passed. This results in the additional import time.
RESOLUTION: The code is modified to pass the hint to the slave node when the "-o noreonline" option is specified. Depending on the hint, the "da re-online" is either done or skipped. This avoids any additional import time. * 3147666 (Tracking ID: 3139983) SYMPTOM: I/Os are returned as failure to the application from the Dynamic Multi-Pathing (DMP) driver without retrying for the "iotimeout" value set by the "timebound" recovery option. The default "iotimeout" value is 300 seconds. The following messages are displayed in the console log: [..] Mon Apr xx 04:18:01.885: I/O analysis done as DMP_PATH_OKAY on Path c172t0d4 belonging to Dmpnode abc1_1dff Mon Apr xx 04:18:01.885: I/O error occurred (errno=0x0) on Dmpnode abc1_1dff [..] DESCRIPTION: DMP uses "timebound" recovery option to retry the failed I/Os from paths for the "iotimeout" value. Once the value is expired, DMP returns the I/O as failure to the application. Due to a bug in the calculation of the "iotimeout" value, DMP returns the I/O as failure within a few seconds rather than retrying it for actual 300 seconds. RESOLUTION: The code is modified to calculate the "iotimeout" value in DMP, so that DMP retries the I/Os for the specified value. * 3158099 (Tracking ID: 3090667) SYMPTOM: The "vxdisk -o thin, fssize list" command can cause system to hang or panic due to a kernel memory corruption. This command is also issued by Veritas Operations Manager (VOM) internally during Storage Foundation (SF) discovery. The following stack trace is observed: panic string: kernel heap corruption detected vol_objioctl vol_object_ioctl voliod_ioctl - frame recycled volsioctl_real DESCRIPTION: Veritas Volume Manager (VxVM) allocates data structures and invokes thin Logical Unit Numbers (LUNs) specific function handlers, to determine the disk space that is actively used by the file system. One of the function handlers wrongly accesses the system memory beyond the allocated data structure, which results in the kernel memory corruption. RESOLUTION: The code is modified so that the problematic function handler accesses only the allocated memory. * 3158780 (Tracking ID: 2518067) SYMPTOM: The disabling of a switch port of the last-but-one active path to a Logical Unit Number (LUN) disables the DMP node, and results in I/O failures on the DMP node even when an active path is available for the I/O. DESCRIPTION: The execution of the "vxdmpadm disable" command and the simultaneous disabling of a port on the same path causes the DMP node to get disabled. The DMP I/O error handling code path fails to interlock these two simultaneous operations on the same path. This leads to a wrong assumption of a DMP node failure. RESOLUTION: The code is modified to detect the simultaneous execution of operations and prevent the DMP node from being disabled. * 3158781 (Tracking ID: 2495338) SYMPTOM: Veritas Volume Manager (VxVM) disk initialization with hpdisk format fails with the following error: $vxdisksetup -ivf eva4k6k0_8 format=hpdisk privoffset=256 VxVM vxdisksetup ERROR V-5-2-2186 privoffset is incompatible with the hpdisk format. DESCRIPTION: VxVM imposes the limitation of private region offset as 128K or 256 sectors on non-boot(data) disks with the hpdisk format. RESOLUTION: The code is modified to relax the limitation to initialize a data disk with the hpdisk format with a fixed private region offset. The disks can now be initialized with a private region offset of 128K or greater. 
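With the fix for incident 3158781 above, a data disk can be initialized with the hpdisk format and a fixed private region offset of 128K (256 sectors) or greater. A sketch using the device name from the example in the symptom:
  # vxdisksetup -ivf eva4k6k0_8 format=hpdisk privoffset=256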
* 3158790 (Tracking ID: 2779580) SYMPTOM: In a Veritas Volume Replicator (VVR) environment, the secondary node gives the configuration error 'no Primary RVG' when the primary master node (default logowner ) is rebooted and the slave node becomes the new master node. DESCRIPTION: After the primary master node is rebooted, the new master node sends a 'handshake' request for the vradmind communication to the secondary node. As a part of the 'handshake' request, the secondary node deletes the old configuration including the 'Primary RVG'. During this phase, the secondary node receives the configuration update message from the primary node for the old configuration. The secondary node does not find the old 'Primary RVG' configuration for processing this message. Hence, it cannot proceed with the pending 'handshake' request. This results in a 'no Primary RVG' configuration error. RESOLUTION: The code is modified such that during the 'handshake' request phase, the configuration messages of the old 'Primary RVG' gets discarded. * 3158793 (Tracking ID: 2911040) SYMPTOM: The restore operation from a cascaded snapshot succeeds even when one of the source is inaccessible. Subsequently, if the primary volume is made accessible for the restore operation, the I/O operation may fail on the volume, as the source of the volume is inaccessible. Any deletion of the snapshot also fails due to the dependency of the primary volume on the snapshots. When the user tries to remove the snapshot using the "vxedit rm" command, the following error message is displayed: "VxVM vxedit ERROR V-5-1-XXXX Volume YYYYYY has dependent volumes" DESCRIPTION: When a snapshot is restored from any snapshot, the snapshot becomes the source of the data for the regions on the primary volume that differ between the two volumes. If the snapshot itself depends on some other volume and that volume is not accessible, effectively the primary volume becomes inaccessible after the restore operation. For such instances, the snapshot cannot be deleted as the primary volume depends on it. RESOLUTION: The code is modified so that if a snapshot or any later cascaded snapshot is inaccessible, the restore operation from that snapshot is prevented. * 3158794 (Tracking ID: 1982965) SYMPTOM: The file system mount operation fails when the volume is resized and the volume has a link to the volume. The following error messages are displayed: # mount -V vxfs /dev/vx/dsk// UX:vxfs fsck: ERROR: V-3-26248: could not read from block offset devid/blknum . Device containing meta data may be missing in vset or device too big to be read on a 32 bit system. UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate file system check failure, aborting ... UX:vxfs mount: ERROR: V-3-26883: fsck log replay exits with 1 UX:vxfs mount: ERROR: V-3-26881: Cannot be mounted until it has been cleaned by fsck. Please run "fsck -V vxfs -y /dev/vx/dsk//"before mounting DESCRIPTION: The vxconfigd(1M) daemon stores disk access records based on the Dynamic Multi Pathing (DMP) names. If the vxdg(1M) command passes a name other than DMP name for the device, vxconfigd(1M) daemon cannot map it to a disk access record. As the vxconfigd(1M) daemon cannot locate a disk access record corresponding to passed input name from the vxdg(1M) command, it fails the import operation. RESOLUTION: The code is modified so that the vxdg(1M) command converts the input name to DMP name before passing it to the vxconfigd(1M) daemon for further processing. 
* 3158798 (Tracking ID: 2825102) SYMPTOM: In a CVM environment, some or all VxVM volumes become inaccessible on the master node. VxVM commands on the master node as well as the slave node(s) hang. On the master node, vxiod and vxconfigd sleep and the following stack traces is observed: "vxconfigd" on master : sleep_one vol_ktrans_iod_wakeup vol_ktrans_commit volconfig_ioctl volsioctl_real volsioctl vols_ioctl spec_ioctl vno_ioctl ioctl syscall "vxiod" on master : sleep vxvm_delay cvm_await_mlocks volmvcvm_cluster_reconfig_exit volcvm_master volcvm_vxreconfd_thread DESCRIPTION: VxVM maintains a list of all the volume devices in volume device list. This list can be corrupted by simultaneous access from CVM reconfiguration code path and VxVM transaction code path. This results in inaccessibility of some or all the volumes. RESOLUTION: The code is modified to avoid simultaneous access to the volume device list from the CVM reconfiguration code path and the VxVM transaction code path. * 3158799 (Tracking ID: 2735364) SYMPTOM: The "clone_disk" disk flag attribute is not cleared when a cloned disk group is removed by the "vxdg destroy " command. DESCRIPTION: When a cloned disk group is removed by the "vxdg destroy " command, the Veritas Volume Manager (VxVM) "clone_disk" disk flag attribute is not cleared. The "clone_disk" disk flag attribute should be automatically turned off when the VxVM disk group is destroyed. RESOLUTION: The code is modified to turn off the "clone_disk" disk flag attribute when a cloned disk group is removed by the "vxdg destroy " command. * 3158800 (Tracking ID: 2886333) SYMPTOM: The vxdg(1M) join operation allows mixing of clone and non-clone disks in a disk group. The subsequent import of a new joined disk group fails. The following error message is displayed: "VxVM vxdg ERROR V-5-1-17090 Source disk group tdg and destination disk group tdg2 are not homogeneous, trying to Mix of cloned diskgroup to standard disk group or vice versa is not allowed. Please follow the vxdg (1M) man page." DESCRIPTION: Mixing of the clone and non-clone disk group is not allowed. The part of the code where the join operation is performed, executes the operation without validating the mix of the clone and the non-clone disk groups. This results in the newly joined disk group having a mix of the clone and non-clone disks. Subsequent import of the newly joined disk group fails. RESOLUTION: The code is modified so that during the disk group join operation, both the disk groups are checked. If a mix of clone and non-clone disk group is found, the join operation is aborted. * 3158802 (Tracking ID: 2091520) SYMPTOM: Customers cannot selectively disable VxVM configuration copies on the disks associated with a disk group. DESCRIPTION: An enhancement is required to enable customers to selectively disable VxVM configuration copies on disks associated with a disk group. RESOLUTION: The code is modified to provide a "keepmeta=skip" option to the vxdiskset(1M) command to allow a customer to selectively disable VxVM configuration copies on disks that are a part of the disk group. * 3158804 (Tracking ID: 2236443) SYMPTOM: In a VCS environment, the "vxdg import" command does not display an informative error message, when a disk group cannot be imported because the fencing keys are registered to another host. 
The following error messages are displayed: # vxdg import sharedg VxVM vxdg ERROR V-5-1-10978 Disk group sharedg: import failed: No valid disk found containing disk group The system log contained the following NOTICE messages: Dec 18 09:32:37 htdb1 vxdmp: NOTICE: VxVM vxdmp V-5-0-0 i/o error occured (errno=0x5) on dmpnode 316/0x19b Dec 18 09:32:37 htdb1 vxdmp: [ID 443116 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 i/o error occured (errno=0x5) on dmpnode 316/0x19b DESCRIPTION: The error messages that are displayed when a disk group cannot be imported because the fencing keys are registered to another host, needs to be more informative. RESOLUTION: The code has been added to the VxVM disk group import command to detect when a disk is reserved by another host, and issue a SCSI3 PR reservation conflict error message. * 3158809 (Tracking ID: 2969335) SYMPTOM: The node that leaves the cluster node while the instant operation is in progress, hangs in the kernel and cannot join to the cluster node unless it is rebooted. The following stack trace is displayed in the kernel, on the node that leaves the cluster: voldrl_clear_30() vol_mv_unlink() vol_objlist_free_objects() voldg_delete_finish() volcvmdg_abort_complete() volcvm_abort_sio_start() voliod_iohandle() voliod_loop() DESCRIPTION: In a clustered environment, during any instant snapshot operation such as the snapshot refresh/restore/reattach operation that requires metadata modification, the I/O activity on volumes involved in the operation is temporarily blocked, and once the metadata modification is complete the I/Os are resumed. During this phase if a node leaves the cluster, it does not find itself in the I/O hold-off state and cannot properly complete the leave operation and hangs. An after effect of this is that the node will not be able to join to the cluster node. RESOLUTION: The code is modified to properly unblock I/Os on the node that leaves. This avoids the hang. * 3158813 (Tracking ID: 2845383) SYMPTOM: The site gets detached if the plex detach operation is performed with the site consistency set to off. DESCRIPTION: If the plex detach operation is performed on the last complete plex of a site, the site is detached to maintain the site consistency. The site should be detached only if the site consistency is set. Initially, the decision to detach the site is made based on the value of the 'allsites' flag. So, the site gets detached when the last complete plex is detached, even if the site consistency is off. RESOLUTION: The code is modified to ensure that the site is detached when the last complete plex is detached, only if the site consistency is set. If the site consistency is off and the 'allsites' flag is on, detaching the last complete plex leads to the plex being detached. * 3158818 (Tracking ID: 2909668) SYMPTOM: In case of multiple sets of the cloned disks of the same source disk group, the import operation on the second set of the clone disk fails, if the first set of the clone disks were imported with "updateid". Import fails with following error message: VxVM vxdg ERROR V-5-1-10978 Disk group firstdg: import failed: No tagname disks for import DESCRIPTION: When multiple sets of the clone disk exists for the same source disk group, each set needs to be identified with the separate tags. If one set of the cloned disk with the same tag is imported using the "updateid" option, it replaces the disk group ID on the imported disk with the new disk group ID. 
The other set of the cloned disk with the different tag contains the disk group ID. This leads to the import failure for the tagged import for the other sets, except for the first set. Because the disk group name maps to the latest imported disk group ID. RESOLUTION: The code is modified in case of the tagged disk group import for disk groups that have multiple sets of the clone disks. The tag name is given higher priority than the latest update time of the disk group, during the disk group name to disk group ID mapping. * 3158819 (Tracking ID: 2685230) SYMPTOM: In a Cluster Volume Replicator (CVR) environment, if Storage Replicator Log (SRL) is resized with the logowner set on the CVM slave node and followed by a CVM master node switch operation, then there could be a SRL corruption that leads to the Rlink detach. DESCRIPTION: In a CVR environment, if SRL is resized when the logowner is set on the CVM slave node and if this is followed by the master node switch operation, then the new master node does not have the correct mapping of the SRL volume. This results in I/Os issued on the new master node to be corrupted on the SRL-volume contents and detaches the Rlink. RESOLUTION: The code is modified to correctly update the SRL mapping so that the SRL corruption does not occur. * 3158821 (Tracking ID: 2910367) SYMPTOM: In a VVR environment, when Storage Replicator Log (SRL) is inaccessible or after the paths to the SRL volume of the secondary node are disabled, the secondary node panics with the following stack trace: __bad_area_nosemaphore vsnprintf page_fault vol_rv_service_message_start thread_return sigprocmask voliod_iohandle voliod_loop kernel_thread DESCRIPTION: The SRL failure in handled differently on the primary node and the secondary node. On the secondary node, if there is no SRL, replication is not allowed and Rlink is detached. The code region is common for both, and at one place flags are not properly set during the transaction phase. This creates an assumption that the SRL is still connected and tries to access the structure. This leads to the panic. RESOLUTION: The code is modified to mark the necessary flag properly in the transaction phase. * 3164583 (Tracking ID: 3065072) SYMPTOM: Data loss occurs during the import of a clone disk group, when some of the disks are missing and the import "useclonedev" and "updateid" options are specified. The following error message is displayed: VxVM vxdg ERROR V-5-1-10978 Disk group pdg: import failed: Disk for disk group not found DESCRIPTION: During the clone disk group import if the "updateid" and "useclonedev" options are specified and some disks are unavailable, this causes the permanent data loss. Disk group Id is updated on the available disks during the import operation. The missing disks contain the old disk group Id, hence are not included in the later attempts to import the disk group with the new disk group Id. RESOLUTION: The code is modified such that any partial import of the clone disk group with the "updateid" option will no longer be allowed without the "f" (force) option. If the user forces the partial import of the clone disk group using the "f" option, the missing disks are not included in the later attempts to import the clone disk group with the new disk group Id. * 3164596 (Tracking ID: 2882312) SYMPTOM: The Storage Replicator Log (SRL) faults in the middle of the I/O load. An immediate read on data that is written during the SRL fault may return old data. 
DESCRIPTION: In case of an SRL fault, the Replicated Volume Group (RVG) goes into the passthrough mode. The read/write operations are directly issued on the data volume. If the SRL is faulted while writing, and a read command is issued immediately on the same region, the read may return the old data. If a write command fails on the SRL, then VVR acknowledges the write completion and places the RVG in the passthrough mode. The data-volume write is done asynchronously after acknowledging the write completion. If the read comes before the data-volume write is finished, then it can return old data, causing data corruption. It is a race condition between the write and the read during the SRL failure. RESOLUTION: The code is modified to restart the write in case of the SRL failure, without acknowledging the write completion. When the write is restarted, the RVG is in the passthrough mode and the write is directly issued on the data volume. Since the acknowledgement is done only after the write completion, any subsequent read gets the latest data. * 3164601 (Tracking ID: 2957555) SYMPTOM: The vxconfigd(1M) daemon on the CVM master node hangs in the userland during the vxsnap(1M) restore operation. The following stack trace is displayed: rec_find_rid() position_in_restore_chain() kernel_set_object_vol() kernel_set_object() kernel_dg_commit_kernel_objects_20() kernel_dg_commit() commit() dg_trans_commit() slave_trans_commit() slave_response() fillnextreq() DESCRIPTION: During the snapshot restore operation, when the volume V1 gets restored from the source volume V2, and at the same time the volume V2 gets restored from V1 or a child of V1, the vxconfigd(1M) daemon tries to find the position of the volume being restored in the snapshot chain. In such instances, finding the position in the restore chain causes the vxconfigd(1M) daemon to enter an infinite loop and hang. RESOLUTION: The code is modified to remove the infinite loop condition when the restore position is found. * 3164610 (Tracking ID: 2966990) SYMPTOM: In a VVR environment, the I/O hangs at the primary side after multiple cluster reconfigurations are triggered in parallel. The stack trace is as follows: delay vol_rv_transaction_prepare vol_commit_iolock_objects vol_ktrans_commit volconfig_ioctl volsioctl_real volsioctl fop_ioctl ioctl DESCRIPTION: With I/O on the master node and the slave node, rebooting the slave node triggers the cluster reconfiguration, which in turn triggers the RVG recovery. Before the reconfiguration is complete, the slave node joins back again, which interrupts the leave reconfiguration in the middle of the operation. The node join reconfiguration does not trigger any RVG recovery, so the recovery is skipped. The regular I/Os wait for the recovery to be completed. This situation leads to a hang. RESOLUTION: The code is modified such that the join reconfiguration performs the RVG recovery if there are any pending RVG recoveries. * 3164611 (Tracking ID: 2962010) SYMPTOM: Replication hangs when the Storage Replicator Log (SRL) is resized. For example: # vradmin -g vvrdg -l repstatus rvg ... Replication status: replicating (connected) Current mode: asynchronous Logging to: SRL ( 813061 Kbytes behind, 19 % full Timestamp Information: behind by 0h 0m 8s DESCRIPTION: When an SRL is resized, its internal mapping gets changed and a new stream of data gets started. Generally, the old mapping is reverted to immediately when the conditions requisite for the resize are satisfied.
However, if the SRL gets wrapped around, the conditions are not satisfied immediately. The old mapping is referred to when all the requisite conditions are satisfied, and the data is sent with the old mapping. This is done without starting the new stream. This causes a replication hang, as the secondary node continues to expect the data according to the new stream. Once the hang occurs, the replication status remains unchanged even though the Rlink is connected. RESOLUTION: The code is modified to start the new stream of data whenever the old mapping is reverted to. * 3164612 (Tracking ID: 2746907) SYMPTOM: Under heavy I/O load, the vxconfigd(1M) daemon hangs on the master node during the reconfiguration. The stack is observed as follows: vxconfigd stack: schedule volsync_wait vol_rwsleep_rdlock vol_get_disks volconfig_ioctl volsioctl_real vols_ioctl vols_compat_ioctl compat_sys_ioctl cstar_dispatch DESCRIPTION: When there is a reconfiguration, the vxconfigd(1M) daemon tries to acquire the volop_rwsleep write lock. This attempt fails as the I/O takes the read lock, so the vxconfigd(1M) daemon keeps retrying to get the write lock. Thus, the I/O load starves out the vxconfigd(1M) daemon's attempt to get the write lock. This results in the hang. RESOLUTION: The code is modified so that a new API is used to block out the read locks when an attempt is made to get the write lock. When this API is used during the reconfiguration, starvation on the write lock is avoided. Thereby, the hang issue is resolved. * 3164613 (Tracking ID: 2814891) SYMPTOM: The vxconfigrestore(1M) utility does not work properly if the SCSI page 83 inquiry returns more than one FPCH name identifier for a single LUN. The restoration fails with the following error message: /etc/vx/bin/vxconfigrestore[1606]: shift: 4: bad number expr: syntax error expr: syntax error .. .. Installing volume manager disk header for 17 ... VxVM vxdisk ERROR V-5-1-558 Disk 17: Disk not in the configuration 17 disk format has been changed from cdsdisk to . VxVM vxdisk ERROR V-5-1-5433 Device 17: init failed: Device path not valid DESCRIPTION: The vxconfigrestore(1M) utility is used to restore a disk group's configuration information if it is lost or corrupted. This operation should pick up the same LUNs that were used by the vxconfigbackup(1M) utility to save the configuration. If the SCSI-page-83 inquiry returns more than one FPCH-name identifier for a single LUN, the vxconfigrestore(1M) utility picks up the wrong LUN. So, the recovery may fail, or succeed with the incorrect LUNs. RESOLUTION: The code is modified to pick up the correct LUN even when the SCSI-page-83 inquiry returns more than one identifier (see the usage example below). * 3164615 (Tracking ID: 3102114) SYMPTOM: A system crash during the 'vxsnap restore' operation can cause the vxconfigd(1M) daemon to dump core with the following stack on system start-up: rinfolist_iter() process_log_entry() scan_disk_logs() ... startup() main() DESCRIPTION: To recover from an incomplete restore operation, an entry is made in the internal logs. If the volume corresponding to that entry is not accessible, accessing a non-existent record causes the vxconfigd(1M) daemon to dump core with the SIGSEGV signal. RESOLUTION: The code is modified to ignore such an entry in the internal logs if the corresponding volume does not exist.
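For context on incident 3164613 above, a typical configuration backup and restore sequence is sketched below. The disk group name is a placeholder, and the precommit (-p) and commit (-c) options are assumed from standard vxconfigrestore(1M) usage rather than taken from this document:
  # vxconfigbackup -g <dgname>
  # vxconfigrestore -p <dgname>
  # vxconfigrestore -c <dgname>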
* 3164616 (Tracking ID: 2992667) SYMPTOM: When the SAN framework of VIS is changed from an FC switch to a direct connection, the new DMP disk cannot be retrieved by running the "vxdisk scandisks" command. DESCRIPTION: Initially, the DMP node had multiple paths. Later, when the SAN framework of VIS is changed from the FC switch to the direct connection, the number of paths of each affected DMP node is reduced to 1. At the same time, some new disks are added to the SAN. The newly added disks reuse the device numbers of the removed devices (paths). As a result, the "vxdisk list" command does not show the newly added disks even after the "vxdisk scandisks" command is executed. RESOLUTION: The code is modified so that DMP handles the device-number reuse scenario in a proper manner. * 3164617 (Tracking ID: 2938710) SYMPTOM: The vxassist(1M) command dumps core with the following stack during the relayout operation: relayout_build_unused_volume () relayout_trans () vxvmutil_trans () trans () transaction () do_relayout () main () DESCRIPTION: During the relayout operation, the vxassist(1M) command sends a request to the vxconfigd(1M) daemon to get the object record of the volume. If the request fails, the vxassist(1M) command tries to print the error message using the name of the object from the retrieved record. This causes a NULL-pointer dereference and subsequently dumps core. RESOLUTION: The code is modified to print the error message using the name of the object from a known reference. * 3164618 (Tracking ID: 3058746) SYMPTOM: When there are two disk groups that are located in two different RAID groups on the disk array and the disks of one of the disk groups are disabled, an I/O hang occurs on the other disk group. DESCRIPTION: When there are two disk groups that are located in two different RAID groups on the disk array and the disks of one disk group are disabled, a spinlock is acquired but is not released. This leads to a deadlock situation that causes the I/O hang on any other disk group. RESOLUTION: The code is modified to prevent the hang by releasing the spinlock to allow the I/Os to proceed. * 3164619 (Tracking ID: 2866059) SYMPTOM: When a disk-resize operation fails, the error messages displayed do not give the exact details of the failure. The following error messages are displayed: 1. "VxVM vxdisk ERROR V-5-1-8643 Device <device>: resize failed: One or more subdisks do not fit in pub reg" 2. "VxVM vxdisk ERROR V-5-1-8643 Device <device>: resize failed: Cannot remove last disk in disk group" DESCRIPTION: When a disk-resize operation fails, both the error messages need to be enhanced to display the exact details of the failure. RESOLUTION: The code is modified to improve the error messages. Error message (1) is modified to: VxVM vxdisk ERROR V-5-1-8643 Device emc_clariion0_338: resize failed: One or more subdisks do not fit in pub region. vxconfigd log : 01/16 02:23:23: VxVM vxconfigd DEBUG V-5-1-0 dasup_resize_check: SD emc_clariion0_338-01 not contained in disk sd: start=0 end=819200, public: offset=0 len=786560 Error message (2) is modified to: VxVM vxdisk ERROR V-5-1-0 Device emc_clariion0_338: resize failed: Cannot remove last disk in disk group. Resizing this device can result in data loss. Use -f option to force resize. * 3164620 (Tracking ID: 2993667) SYMPTOM: VxVM allows setting the Cross-platform Data Sharing (CDS) attribute for a disk group even when a disk is missing because it experienced I/O errors.
The following command succeeds even with an inaccessible disk: vxdg -g <dgname> set cds=on DESCRIPTION: When the CDS attribute is set for a disk group, VxVM does not fail the operation if some disk is not accessible. If the disk genuinely had an I/O error and failed, VxVM should not allow setting the disk group as CDS, because the state of the failed disk cannot be determined. If a disk with a non-CDS format fails while all the other disks in the disk group have the CDS format, the current behavior allows the disk group to be set as CDS. If the disk returns and the disk group is re-imported, there could be a CDS disk group that has a non-CDS disk. This violates the basic definition of a CDS disk group and results in data corruption. RESOLUTION: The code is modified such that VxVM fails to set the CDS attribute for a disk group if it detects a disk that is inaccessible because of an I/O error. Hence, the operation below fails with an error as follows: # vxdg -g <dgname> set cds=on Cannot enable CDS because device corresponding to <diskname> is in-accessible. * 3164624 (Tracking ID: 3101419) SYMPTOM: In a CVR environment, I/Os to the data volumes in an RVG may temporarily experience a hang during the SRL overflow with a heavy I/O load. DESCRIPTION: The SRL flush occurs at a slower rate than the incoming I/Os from the master node and the slave nodes. I/Os initiated on the master node get starved for a long time, which appears like an I/O hang. The I/O hang disappears once the SRL flush is complete. RESOLUTION: The code is modified to provide a fair schedule for the I/Os to be initiated on the master node and the slave nodes. * 3164626 (Tracking ID: 3067784) SYMPTOM: The grow and shrink operations by the vxresize(1M) utility may dump core in the vfprintf() function. The following stack trace is observed: vfprintf () volumivpfmt () volvpfmt () volpfmt () main () DESCRIPTION: The vfprintf() function dumps core as the format specifier used to print the file system type is incorrect. The integer/hexadecimal value is printed as a string, using %s. RESOLUTION: The code is modified to print the file system type as a hexadecimal value, using %x. * 3164627 (Tracking ID: 2787908) SYMPTOM: The vxconfigd(1M) daemon dumps core when the slave node joins the CVM cluster. The following stack trace is displayed: client_delete master_delete_client master_leavers role_assume vold_set_new_role kernel_get_cvminfo cluster_check vold_check_signal request_loop main _start DESCRIPTION: When the slave node joins the CVM cluster, the vxconfigd(1M) daemon frees a memory structure that is already freed and dumps core. RESOLUTION: The code is modified to ensure that the client memory structure is not freed twice. * 3164628 (Tracking ID: 2952553) SYMPTOM: The vxsnap(1M) command allows refreshing a snapshot from a volume other than its source volume. An example is as follows: # vxsnap refresh <snapvol> source=<volume> DESCRIPTION: The vxsnap(1M) command allows refreshing a snapshot from a different volume other than the source volume. This can result in an unintended loss of the snapshot. RESOLUTION: The code is modified to print a message requesting the user to use the "-f" option. This prevents any accidental loss of the snapshot. * 3164629 (Tracking ID: 2855707) SYMPTOM: I/O hangs with the SUN6540 array during the path fault injection test.
During the hang, the DMP daemon kernel thread stack trace is displayed as follows: e_block_thread e_sleep_thread dmpEngenio_issue_failover gen_issue_failover gen_dmpnode_update_cur_pri dmp_failover_to_other_path dmp_failover dmp_error_action dmp_error_analysis_callback dmp_process_scsireq dmp_daemons_loop vxdmp_start_thread_enter procentry DESCRIPTION: The I/O hang is caused by a cyclic dependency, where all the DMP daemon kernel threads wait for the 'dmpEngIO' Array Policy Module (APM). In turn, the 'dmpEngIO' APM thread requires a DMP daemon kernel thread to issue the failover. This results in a deadlock. RESOLUTION: The code is modified to eliminate the cyclic dependency on the DMP daemon kernel threads. * 3164631 (Tracking ID: 2933688) SYMPTOM: When the 'Data corruption protection' check is activated by DMP, the device-discovery operation aborts, but the I/O to the affected devices continues, which results in data corruption. The following message is displayed: Data Corruption Protection Activated - User Corrective Action Needed: To recover, first ensure that the OS device tree is up to date (requires OS specific commands). Then, execute 'vxdisk rm' on the following devices before reinitiating device discovery using 'vxdisk scandisks' DESCRIPTION: When the 'Data corruption protection' check is activated by DMP, the device-discovery operation aborts after displaying a message. However, the device-discovery operation does not stop I/Os from being issued on the DMP device for the affected devices, that is, all those devices whose discovery information changed unexpectedly and is no longer valid. RESOLUTION: The code is modified so that DMP forcibly fails the I/Os on devices whose discovery information has changed unexpectedly. This prevents any further damage to the data (see the command example below). * 3164633 (Tracking ID: 3022689) SYMPTOM: The vxbrk_rootmir(1M) utility succeeds with an error printed on the command line, but disks that are a part of the break-off DG, which were under NMP, now come under DMP because the "/etc/vx/darecs" file is not correctly updated. DESCRIPTION: The vxbrk_rootmir(1M) utility uses a hard-coded path "/dev/rdsk" and expects the DMP node to be available. This is done to get the new name corresponding to the input name. With the EBN name as the input, the vxbrk_rootmir(1M) utility does not find the new OS node and displays the error message. RESOLUTION: The code is modified such that the vxbrk_rootmir(1M) utility does not use the hard-coded path to get the new node from the DMP input name. * 3164637 (Tracking ID: 2054606) SYMPTOM: During the DMP driver unload operation, the system panics with the following stack trace: kmem_free dmp_remove_mp_node dmp_destroy_global_db dmp_unload vxdmp`_fini moduninstall modunrload modctl syscall_trap DESCRIPTION: The system panics during the DMP driver unload operation when its internal data structures are destroyed, because DMP attempts to free the memory associated with a DMP device that is marked for deletion from DMP. RESOLUTION: The code is modified to check the DMP device state before any attempt is made to free the memory associated with it. * 3164639 (Tracking ID: 2898324) SYMPTOM: A set of memory-leak issues in the user-land daemon "vradmind" is reported by the Purify tool. DESCRIPTION: The issues are reported due to improper or missing initialization of the allocated memory. RESOLUTION: The code is modified to ensure that proper initialization is done for the allocated memory.
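The corrective action quoted in the message for incident 3164631 above amounts to the following sequence, with the device name as a placeholder. Ensure that the OS device tree is up to date (using OS-specific commands) before running it:
  # vxdisk rm <da_name>
  # vxdisk scandisks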
* 3164643 (Tracking ID: 2815441) SYMPTOM: The file system mount operation fails when the volume is resized and the volume has a linked volume. The following error messages are displayed: # mount -V vxfs /dev/vx/dsk/<dgname>/<volname> <mount_point> UX:vxfs fsck: ERROR: V-3-26248: could not read from block offset devid/blknum . Device containing meta data may be missing in vset or device too big to be read on a 32 bit system. UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate file system check failure, aborting ... UX:vxfs mount: ERROR: V-3-26883: fsck log replay exits with 1 UX:vxfs mount: ERROR: V-3-26881: Cannot be mounted until it has been cleaned by fsck. Please run "fsck -V vxfs -y /dev/vx/dsk/<dgname>/<volname>" before mounting DESCRIPTION: The resize operation on linked volumes can be interrupted and restarted by the vxrecover(1M) command. The recover operation is triggered when the disk recovers from a failure or during cluster reconfigurations. If the resize and the recover operations run in parallel, and there are mirrors in the "linked to" volume, the volume-recovery offset does not get updated properly for the linked volume. RESOLUTION: The code is modified to grow the "linked from" volume if the volume sizes of the "linked to" and "linked from" volumes are not the same when the vxrecover(1M) command is run, and to update the volume recovery offset for the "linked to" volumes properly (see the recovery example below). * 3164645 (Tracking ID: 3091916) SYMPTOM: In a VCS cluster environment, the syslog overflows with the following Small Computer System Interface (SCSI) I/O error messages: reservation conflict Unhandled error code Result: hostbyte=DID_OK driverbyte=DRIVER_OK CDB: Write(10): 2a 00 00 00 00 90 00 00 08 00 reservation conflict VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x5) on dmpnode 201/0x60 Buffer I/O error on device VxDMP7, logical block 18 lost page write due to I/O error on VxDMP7 DESCRIPTION: In a VCS cluster environment, when the private disk group is flushed and deported on one node, some I/Os on the disk are cached as the disk writes are done asynchronously. Importing the disk group immediately after, with PGR keys, causes I/O errors on the previous node as the PGR keys are not reserved on that node. RESOLUTION: The code is modified to write the I/Os synchronously on the disk. * 3164646 (Tracking ID: 2893530) SYMPTOM: When a system is rebooted and there are no VVR configurations, the system panics with the following stack trace: nmcom_server_start() vxvm_start_thread_enter() ... DESCRIPTION: The panic occurs because a memory segment is accessed after it is released. The access happens in the VVR module and can happen even if no VVR is configured on the system. RESOLUTION: The code is modified so that the memory segment is not accessed after it is released. * 3164647 (Tracking ID: 3006245) SYMPTOM: While executing a snapshot operation on a volume which has 'snappoints' configured, the system panics infrequently with the following stack trace: ... voldco_copyout_pervolmap () voldco_map_get () volfmr_request_getmap () ... DESCRIPTION: When the 'snappoints' are configured for a volume by using the vxsmptadm(1M) command, the relationship is maintained in the kernel using a field. This field is also used for maintaining the snapshot relationships. Sometimes, the 'snappoints' field may wrongly be identified as the snapshot field. This causes the system to panic. RESOLUTION: The code is modified to properly identify the fields that are used for the snapshot and the 'snappoints', and to handle them accordingly.
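The error text quoted for incident 3164643 above points to the manual recovery step. With placeholders for the disk group, volume, and mount point, it amounts to:
  # fsck -V vxfs -y /dev/vx/dsk/<dgname>/<volname>
  # mount -V vxfs /dev/vx/dsk/<dgname>/<volname> <mount_point>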
* 3164650 (Tracking ID: 2812161) SYMPTOM: In a VVR environment, after the Rlink is detached, the vxconfigd(1M) daemon on the secondary host may hang. The following stack trace is observed: cv_wait delay_common delay vol_rv_service_message_start voliod_iohandle voliod_loop ... DESCRIPTION: There is a race condition if there is a node crash on the primary site of VVR and if any subsequent Rlink is detached. The vxconfigd(1M) daemon on the secondary site may hang, because it is unable to clear the I/Os received from the primary site. RESOLUTION: The code is modified to resolve the race condition. * 3164759 (Tracking ID: 2635640) SYMPTOM: The "vxdisksetup(1M) -ifB" command fails on the Enclosure Based Naming (EBN) devices when the legacy tree is removed. The following error message is displayed: "idisk: Stat of disk failed(2)" DESCRIPTION: The implementation of the function to get the unsliced path, returns the legacy tree based path if the input parameter does not specify the full device path. This leads to further failures when any direct or indirect use of the function is performed. RESOLUTION: The code is modified such that the function returns the DMP based path when the input parameter does not specify the full device path. * 3164790 (Tracking ID: 2959333) SYMPTOM: For Cross-platform Data Sharing (CDS) disk group, the "vxdg list" command does not list the CDS flag when that disk group is disabled. DESCRIPTION: When the CDS disk group is disabled, the state of the record list may not be stable. Hence, it is not considered if disabled disk group was CDS. As a result, Veritas Volume Manager (VxVM) does not mark any such flag. RESOLUTION: The code is modified to display the CDS flag for disabled CDS disk groups. * 3164792 (Tracking ID: 1783763) SYMPTOM: In a VVR environment, the vxconfigd(1M) daemon may hang during a configuration change operation. The following stack trace is observed: delay vol_rv_transaction_prepare vol_commit_iolock_objects vol_ktrans_commit volconfig_ioctl volsioctl_real volsioctl vols_ioctl ... DESCRIPTION: Incorrect serialization primitives are used. This results in the vxconfigd(1M) daemon to hang. RESOLUTION: The code is modified to use the correct serialization primitives. * 3164793 (Tracking ID: 3015181) SYMPTOM: I/O can hang on all the nodes of a cluster when the complete non-Active/Active (A/A) class of the storage is disconnected. The problem is only CVM specific. DESCRIPTION: The issue occurs because the CVM-DMP protocol does not progress any further when the 'ioctls' on the corresponding DMP 'metanodes' fail. As a result, all hosts hold the I/Os forever. RESOLUTION: The code is modified to complete the CVM-DMP protocol when any of the 'ioctls' on the DMP 'metanodes' fail. * 3164874 (Tracking ID: 2986596) SYMPTOM: The disk groups imported with mix of standard and clone Logical Unit Numbers (LUNs) may lead to data corruption. DESCRIPTION: The vxdg(1M) command import operation should not allow mixing of clone and non- clone LUNs since it may result in data corruption if the clone copy is not up- to-date. vxdg(1M) import code was going ahead with clone LUNs when corresponding standard LUNs were unavailable on the same host. RESOLUTION: The code is modified for the vxdg(1M) command import operation, so that it does not pick up the clone disks in above case and prevent mix disk group import. The import fails if partial import is not allowed based on other options specified during the import. 
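A hedged sketch of the clone-import options referenced in incidents 3164583 and 3164874; the disk group name is a placeholder and the exact option syntax is assumed from standard vxdg(1M) usage rather than taken from this document. Note that after the fix for incident 3164583, a partial import with these options additionally requires the -f (force) option:
  # vxdg -o useclonedev=on -o updateid import <dgname>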
* 3164880 (Tracking ID: 3031796) SYMPTOM: When a snapshot is reattached using the "vxsnap reattach" command-line interface (CLI), the operation fails with the following error message: "VxVM vxplex ERROR V-5-1-6616 Internal error in get object. Rec " DESCRIPTION: When a snapshot is reattached to the volume, the volume manager checks the consistency by locking all the related snapshots. If any related snapshots are not available, the operation fails. RESOLUTION: The code is modified to ignore any inaccessible snapshot. This prevents any inconsistency during the operation. * 3164881 (Tracking ID: 2919720) SYMPTOM: The vxconfigd(1M) daemon dumps core in the rec_lock1_5() function. The following stack trace is observed: rec_lock1_5() rec_lock1() rec_lock() client_trans_start() req_vol_trans() request_loop() main() DESCRIPTION: During any configuration change in VxVM, the vxconfigd(1M) daemon locks all the objects involved in the operations to avoid any unexpected modification. Some objects which do not belong to the current transactions are not handled properly. This results in a core dump. This case is particularly observed during snapshot operations on cross-disk-group linked-volume snapshots. RESOLUTION: The code is modified to avoid locking the records which are not yet a part of the committed VxVM configuration. * 3164883 (Tracking ID: 2933476) SYMPTOM: The vxdisk(1M) command resize operation fails with the following generic error message that does not state the exact reason for the failure: VxVM vxdisk ERROR V-5-1-8643 Device 3pardata0_3649: resize failed: Operation is not supported. DESCRIPTION: The disk-resize operation fails in the following cases: 1. When a shared disk has the simple or nopriv format. 2. When a GPT (GUID Partition Table) labeled disk has the simple or sliced format. 3. When the Cross-platform Data Sharing (CDS) disk is part of a disk group whose version is less than 160 and the disk is resized to greater than 1 TB. RESOLUTION: The code is modified to enhance the disk-resize failure messages. * 3164884 (Tracking ID: 2935771) SYMPTOM: In a Veritas Volume Replicator (VVR) environment, the 'rlinks' disconnect after switching the master node. DESCRIPTION: Sometimes switching the master node on the primary site can cause the 'rlinks' to disconnect. The "vradmin repstatus" command displays "paused due to network disconnection" as the replication status. VVR uses a connection to check if the secondary node is alive. The secondary node responds to these requests by replying back, indicating that it is alive. On a master node switch, the old master node fails to close this connection with the secondary node. Thus, after the master node switch, the old master node as well as the new master node send requests to the secondary node. This causes a mismatch of connection numbers on the secondary node, and the secondary node does not reply to the requests of the new master node. This causes the 'rlinks' to disconnect. RESOLUTION: The code is modified to close the connection of the old master node with the secondary node, so that it does not send connection requests to the secondary node. * 3164911 (Tracking ID: 1901838) SYMPTOM: After installation of a license key that enables multi-pathing, the state of the controller is shown as DISABLED in the command-line interface (CLI) output of the vxdmpadm(1M) command.
DESCRIPTION: When the multi-pathing license key is installed, the state of the active paths of the Logical Unit Number (LUN) is changed to the ENABLED state. However, the state of the controller is not updated. As a result, the state of the controller is shown as DISABLED in the CLI output for the vxdmpadm(1M) command. RESOLUTION: The code is modified so that the states of the controller and the active LUN paths are updated when the multi-pathing license key is installed. * 3164916 (Tracking ID: 1973983) SYMPTOM: Relocation fails with the following error when the Data Change Object (DCO) plex is in a disabled state: VxVM vxrelocd ERROR V-5-2-600 Failure recovering in disk group DESCRIPTION: When a mirror-plex is added to a volume using the "vxassist snapstart" command, the attached DCO plex can be in DISABLED or DCOSNP state. If the enclosure is disabled while recovering such DCO plexes, the plex can get in to the DETACHED/DCOSNP state. This can result in relocation failures. RESOLUTION: The code is modified to handle the DCO plexes in disabled state during relocation. * 3178903 (Tracking ID: 2270686) SYMPTOM: The vxconfigd(1M) daemon on the master node hangs if there is a reconfiguration (node join followed by a leave operation) when the snapshot is taken. Stack trace of vxsnap syncstart is as follows: e_block_thread pse_block_thread pse_sleep_thread volsiowait volpvsiowait vol_mv_do_resync vol_object_ioctl voliod_ioctl volsioctl_real volsioctl vols_ioctl rdevioctl spec_ioctl vnop_ioctl vno_ioctl common_ioctl ovlya_addr_sc_flih_main __ioctl Ioctl vol_syncstart_internal do_syncstart_vols vset_voliter_all do_syncstart main __start Join sio (during reconfiguration) is hung with following stack trace: e_block_thread() pse_block_thread() pse_sleep_thread() vol_rwsleep_wrlock() volopenter_exclusive() volcvm_lockdg() volcvm_joinsio_start() voliod_iohandle() voliod_loop() vol_kernel_thread_init() threadentry() DESCRIPTION: During the reconfiguration volop_rwsleep write lock enters the block mode. The snapshot operation takes the same lock in the read mode. The snapshot waits for a response from a node that has left the cluster. The leave processing is held up as the reconfiguration waits for the write lock in the block mode. Hence, the deadlock occurs. RESOLUTION: The code is modified so that reconfiguration takes the write lock in the non- block mode. If it is not possible to get the write lock, then the reconfiguration is restarted. This results in the leave being detected and the appropriate action is taken, so that there is no expectation of any response from the nodes that have left the cluster. As a result, the deadlock is avoided. * 3181315 (Tracking ID: 2898547) SYMPTOM: The 'vradmind' process dumps core on the Veritas Volume Replicator (VVR) secondary site in a Clustered Volume Replicator (CVR) environment. The stack trace would look like: __kernel_vsyscall raise abort fmemopen malloc_consolidate delete delete[] IpmHandle::~IpmHandle IpmHandle::events main DESCRIPTION: When log owner service group is moved across the nodes on the primary site, it induces the deletion of IpmHandle of the old log owner node, as the IpmHandle of the new log owner node gets created. During the destruction of IpmHandle object, a pointer '_cur_rbufp' is not set to NULL, which can lead to freeing up of memory which is already freed. This causes 'vradmind' to dump core. RESOLUTION: The code is modified for the destructor of IpmHandle to set the pointer to NULL after it is deleted. 
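The controller state discussed in incident 3164911 above can be inspected with the DMP administration command shown below; this is a usage sketch only, and the output columns are not reproduced here:
  # vxdmpadm listctlr all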
* 3181318 (Tracking ID: 3146715) SYMPTOM: The 'rlinks' do not connect with the Network Address Translation (NAT) configurations on Little Endian Architecture (LEA). DESCRIPTION: On LEAs, the Internet Protocol (IP) address configured with the NAT mechanism is not converted from the host-byte order to the network-byte order. As a result, the address used for the rlink connection mechanism gets distorted and the 'rlinks' fail to connect. RESOLUTION: The code is modified to convert the IP address to the network-byte order before it is used. * 3183145 (Tracking ID: 2477418) SYMPTOM: In a VVR environment, if the logowner node on the secondary site operates under a low-memory condition, the system panics with the following stack trace: vx_buffer_pack vol_ru_data_pack vol_ru_setup_verification vol_rv_create_logsio vol_rv_service_update vol_rv_service_message_start voliod_iohandle voliod_loop vol_kernel_thread_init ... DESCRIPTION: While receiving the updates on the secondary node, a retry of the receive operation is needed if VVR runs out of memory. In this situation, the retry overlaps with updates for which the receive operation has already succeeded. This leads to a panic. RESOLUTION: The code is modified to correctly handle the retry operations. * 3189869 (Tracking ID: 2959733) SYMPTOM: When the device paths are moved across LUNs or enclosures, the vxconfigd(1M) daemon can dump core, or data corruption can occur due to internal data structure inconsistencies. The following stack trace is observed: ddl_reconfig_partial () ddl_reconfigure_all () ddl_find_devices_in_system () find_devices_in_system () req_discover_disks () request_loop () main () DESCRIPTION: When the device path configuration is changed after a planned or unplanned disconnection by moving only a subset of the device paths across LUNs or other storage arrays (enclosures), DMP's internal data structure becomes inconsistent. This causes the vxconfigd(1M) daemon to dump core. Also, in some instances data corruption occurs due to incorrect LUN-to-path mappings. RESOLUTION: The vxconfigd(1M) code is modified to detect such situations gracefully and modify the internal data structures accordingly, to avoid a vxconfigd(1M) daemon core dump and data corruption. * 3224030 (Tracking ID: 2433785) SYMPTOM: In a CVM environment, the node join operation to the cluster fails intermittently. The syslog entries observed are as follows: Jun 17 19:51:19 sfqapa31 vxvm:vxconfigd: V-5-1-8224 slave:disk 1370256959.514.sfqapa40 not shared Jun 17 19:51:19 sfqapa31 vxvm:vxconfigd: V-5-1-7830 cannot find disk 1370256959.514.sfqapa40 Jun 17 19:51:19 sfqapa31 vxvm:vxconfigd: V-5-1-11092 cleanup_client: (Cannot find disk on slave node) 222 DESCRIPTION: In a clustered environment, during the node join operation, the joining node checks whether the disk is shared and tries to re-read the configuration data from the disk. An improper check of the flag results in the node join failure. RESOLUTION: The code is modified to remove the incorrect check of the flag. * 3227719 (Tracking ID: 2588771) SYMPTOM: The system panics when the multi-controller enclosure is disabled.
* 3227719 (Tracking ID: 2588771)

SYMPTOM:
The system panics when the multi-controller enclosure is disabled. The following stack trace is observed:
dmpCLARiiON_get_unit_path_report page_fault dmp_ioctl_by_bdev dmp_handle_delay_open gen_dmpnode_update_cur_pri dmp_start_failover gen_update_cur_pri dmp_update_cur_pri dmp_reconfig_update_cur_pri dmp_decipher_instructions dmp_process_instruction_buffer dmp_reconfigure_db dmp_compat_ioctl

DESCRIPTION:
When all the controllers associated with an enclosure are disabled one by one, internal tasks are generated to update the current active path of the DMP devices. The race condition between the controller disable operations and these update tasks leads to the system panic.

RESOLUTION:
The code is modified to start the update task only after all the required controllers are disabled.

* 3235365 (Tracking ID: 2438536)

SYMPTOM:
When a site is reattached after it was either manually detached or detached due to storage inaccessibility, data corruption can occur on the volume. The issue is observed only on volumes that are mirrored across different sites.

DESCRIPTION:
When a site is reattached, possibly after a split-brain, it is possible that a site-consistent volume was updated on each site independently. In such cases, the tracking map needs to be recovered from each site to take care of the updates done from both the sites. These maps are stored in the Data Change Object (DCO). Recovery of the DCO involves updating the map that tracks the detached mirror (the detach map) from the active I/O tracking map. While this is performed, the last block of regions in the detach map is updated from the previous block of the active map instead of the corresponding block. This corrupts the detach map.

RESOLUTION:
The code is modified to ensure that the pointer to the active map buffer is updated correctly even for the last block.

* 3238094 (Tracking ID: 3243355)

SYMPTOM:
The vxres_lvmroot(1M) utility, which restores the Logical Volume Manager (LVM) root disk from the VxVM root disk, fails with the following error message:
VxVM vxres_lvmroot ERROR V-5-2-2493 Extending X MB LV for vg00

DESCRIPTION:
When the LVM root disk is restored from the VxVM root disk, the vxres_lvmroot(1M) utility creates Logical Volumes (LVs) whose sizes, in Physical Extents (PEs), correspond to the sizes of the VxVM volumes. If a volume is smaller than the PE size, the utility creates an LV of size 0, which causes the entire restoration to fail.

RESOLUTION:
The code is modified so that the vxres_lvmroot(1M) utility creates LVs of at least the PE size when the corresponding volume is smaller than the PE size. (A simplified sizing sketch follows this group of incidents.)

* 3240788 (Tracking ID: 3158323)

SYMPTOM:
In a VVR environment with multiple secondaries, if the SRL overflows for the rlinks at different times, the vxconfigd(1M) daemon may hang on the primary node. The stack trace observed is as follows:
vol_commit_iowait_objects() vol_commit_iolock_objects() vol_ktrans_commit() volconfig_ioctl() volsioctl_real() volsioctl() vols_ioctl()

DESCRIPTION:
When the first rlink tries to enter the DCM logging mode, the transaction times out while waiting for the Network I/O (NIO) drain. The transaction is retried, but it times out again. The NIO drain does not happen because VVR drops the acknowledgements while in the transaction mode. This results in the vxconfigd(1M) daemon hang.

RESOLUTION:
The code is modified to accept the acknowledgement even during the transaction phase.
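The sizing problem in incident 3238094 comes down to round-up arithmetic: an LV must cover at least one physical extent even when the source volume is smaller than the PE size. The sketch below is illustrative only; the function and variable names are hypothetical and do not come from the vxres_lvmroot(1M) source.

    #include <cstdint>
    #include <iostream>

    // Hypothetical helper: convert a volume size (in MB) into a whole
    // number of physical extents, rounding up so that any non-empty
    // volume gets at least one extent instead of an LV of size 0.
    static uint64_t lv_size_in_extents(uint64_t volume_size_mb,
                                       uint64_t pe_size_mb)
    {
        if (volume_size_mb == 0)
            return 0;                    // nothing to restore
        // Ceiling division: guarantees a result of at least 1 for any
        // non-zero volume, even when volume_size_mb < pe_size_mb.
        return (volume_size_mb + pe_size_mb - 1) / pe_size_mb;
    }

    int main()
    {
        // Example: a 4 MB volume with a 32 MB PE size still gets one PE.
        std::cout << lv_size_in_extents(4, 32) << " extent(s)\n";  // prints 1
        return 0;
    }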
* 3242839 (Tracking ID: 3194358)

SYMPTOM:
Continuous I/O error messages for the OS device and the DMP node associated with EMC Symmetrix not-ready (NR) logical units are seen in the syslog.

DESCRIPTION:
VxVM tries to online the EMC not-ready (NR) logical units. As part of the disk online process, it tries to read the disk label from the logical unit. Because the logical unit is NR, the I/O fails. The failure messages are displayed in the syslog file.

RESOLUTION:
The code is modified to skip the disk online operation for the EMC NR LUNs.

* 3245608 (Tracking ID: 3261485)

SYMPTOM:
The vxcdsconvert(1M) utility fails with the following error messages:
VxVM vxcdsconvert ERROR V-5-2-2777 : Unable to initialize the disk as a CDS disk
VxVM vxcdsconvert ERROR V-5-2-2780 : Unable to move volume off of the disk
VxVM vxcdsconvert ERROR V-5-2-3120 Conversion process aborted

DESCRIPTION:
As part of the conversion process, the vxcdsconvert(1M) utility moves all the volumes to some other disk before the disk is initialized with the CDS format. On disks with a VxVM format other than CDS, the VxVM volume starts immediately in the PUBLIC partition. If an LVM or file system signature was stamped on the disk, it is not erased even after the data is migrated to another disk within the disk group. When the disk is destroyed as part of the vxcdsconvert operation, only the SLICED tags are erased, but the partition table still exists. The disk is then recognized as having a file system or LVM on the partition where the PUBLIC region existed earlier. The vxcdsconvert(1M) utility fails because the vxdisksetup(1M) command, which is invoked internally to initialize the disk with the CDS format, prevents the disk initialization when any foreign file system or LVM is detected.

RESOLUTION:
The code is modified so that the vxcdsconvert(1M) utility forcefully invokes the vxdisksetup(1M) command to erase any foreign format.

* 3247983 (Tracking ID: 3248281)

SYMPTOM:
When the "vxdisk scandisks" or "vxdctl enable" commands are run consecutively, the following error is displayed:
VxVM vxdisk ERROR V-5-1-0 Device discovery failed.

DESCRIPTION:
The device discovery failure occurs because, in some cases, the variable that is passed to the OS-specific function is not set properly.

RESOLUTION:
The code is modified to set the correct variable before it is passed to the OS-specific function.

* 3253306 (Tracking ID: 2876256)

SYMPTOM:
The "vxdisk set mediatype" command fails with the new naming scheme. The following error message is displayed:
"VxVM vxdisk ERROR V-5-1-12952 Device not in configuration or associated with DG"
For example, the "vxdisk set mediatype" command fails when the naming scheme is set to "new" and the disk is specified by its new name instead of its 'da name'.

DESCRIPTION:
When the vxdisk(1M) command to set the media type is run on a disk specified by its new name, it fails. The case of retrieving the corresponding 'da name' for the 'new name' is not handled.

RESOLUTION:
The code is modified so that if the 'new name' is not found, the corresponding 'da name' is retrieved in the 'set media type' code path.
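The fix for incident 3253306 is essentially a name lookup with a fallback. The fragment below is a hypothetical C++ sketch of that general pattern; the record type, lookup tables, and function names are invented for illustration and are not the actual VxVM 'set mediatype' code path.

    #include <cstddef>
    #include <map>
    #include <string>

    // Invented record type and lookup tables, for illustration only.
    struct DiskRecord { std::string mediatype; };

    // Resolve a disk record from a user-supplied name.  If the name is not
    // known under the new naming scheme, map it to its corresponding
    // 'da name' and retry, mirroring the fallback described above.
    static DiskRecord *resolve_disk(
            const std::map<std::string, DiskRecord *> &by_new_name,
            const std::map<std::string, std::string>  &new_to_da_name,
            const std::map<std::string, DiskRecord *> &by_da_name,
            const std::string &name)
    {
        auto hit = by_new_name.find(name);
        if (hit != by_new_name.end())
            return hit->second;                 // resolved by the new name

        auto da = new_to_da_name.find(name);    // fallback: new name -> 'da name'
        if (da == new_to_da_name.end())
            return NULL;                        // unknown device: caller reports error

        auto rec = by_da_name.find(da->second);
        return rec != by_da_name.end() ? rec->second : NULL;
    }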
* 3256806 (Tracking ID: 3259926)

SYMPTOM:
The vxdmpadm(1M) command fails to enable the paths when the '-f' option is provided. The following error message is displayed:
"VxVM vxdmpadm ERROR V-5-1-2395 Invalid arguments"

DESCRIPTION:
When the '-f' option is provided, the "vxdmpadm enable" command displays an error message and fails to enable the paths that were disabled previously. This occurs because an improper argument value is passed to the respective function.

RESOLUTION:
The code is modified so that the "vxdmpadm enable" command successfully enables the paths when the '-f' option is provided.

INSTALLING THE PATCH
--------------------
Please refer to the Release Notes for install instructions.

REMOVING THE PATCH
------------------
Please refer to the Release Notes for uninstall instructions.

SPECIAL INSTRUCTIONS
--------------------
Install the latest VRTSaslapm package on 5.1 SP1 along with this patch.

OTHERS
------
NONE