README VERSION : 1.1
README CREATION DATE : 2014-04-10
PATCH-ID : 6.0.500.000
PATCH NAME : VRTSvxvm 6.0.500.000
BASE PACKAGE NAME : VRTSvxvm
BASE PACKAGE VERSION : 6.0.100.0
SUPERSEDED PATCHES : 6.0.300.200
REQUIRED PATCHES : NONE
INCOMPATIBLE PATCHES : NONE
SUPPORTED PADV : sol11_x86 (P-PLATFORM, A-ARCHITECTURE, D-DISTRIBUTION, V-VERSION)
PATCH CATEGORY : CORE, CORRUPTION, HANG, MEMORYLEAK, PANIC, PERFORMANCE
PATCH CRITICALITY : CRITICAL
HAS KERNEL COMPONENT : YES
ID : NONE
REBOOT REQUIRED : YES
REQUIRE APPLICATION DOWNTIME : YES

PATCH INSTALLATION INSTRUCTIONS:
--------------------------------
Please refer to the Install Guide for installation instructions.

PATCH UNINSTALLATION INSTRUCTIONS:
----------------------------------
Please refer to the Install Guide for uninstallation instructions.

SPECIAL INSTRUCTIONS:
---------------------
1) Delete '.vxvm-configured':
   # rm /etc/vx/reconfig.d/state.d/.vxvm-configured
2) Refresh vxvm-configure:
   # svcadm refresh vxvm-configure
3) Delete 'install-db':
   # rm /etc/vx/reconfig.d/state.d/install-db
4) Reboot the system using the shutdown command. You must use the shutdown
   command to reboot the system after patch installation or de-installation:
   # shutdown -g0 -y -i6

SUMMARY OF FIXED ISSUES:
-----------------------------------------
PATCH ID: 6.0.500.000
2885171 (2882566) A disk removed from a disk group using the "vxdg rmdisk -k" command can be added to another disk group without any error messages.
2941224 (2921816) The system panics while starting replication after disabling the DCM volumes.
2957567 (2957556) The vxdisksetup(1M) command fails when the tpdmode attribute is set to native and the enclosure-based naming scheme is on.
2960654 (2932214) The "vxdisk resize" operation may cause the disk to go into the "online invalid" state.
2974602 (2986596) Disk groups imported with a mix of standard and clone logical unit numbers (LUNs) may lead to data corruption.
2999881 (2999871) The vxinstall(1M) command gets into a hung state when it is invoked through Secure Shell (SSH) remote execution.
3022349 (3052770) The vradmin syncrvg operation with a volume set fails to synchronize the secondary RVG with the primary RVG.
3032358 (2952403) A shared disk group fails to be destroyed if the master has lost storage.
3033904 (2308875) The vxddladm(1M) list command options (hbas, ports, targets) do not display the correct values for the state attribute.
3036949 (3045033) "vxdg init" should not create a disk group on a clone disk that was previously part of a disk group.
3042507 (2101093) A system panic is observed in the dmp_signal_event() function.
3043206 (3038684) The restore daemon enables the paths of Business Continuance Volumes-Not Ready (BCV-NR) devices.
3049356 (3060327) The vradmin repstatus(1M) command shows "dcm contains 0 kbytes" during Smart Autosync.
3094185 (3091916) The Small Computer System Interface (SCSI) I/O errors overflow the syslog.
3144764 (2398954) The system panics while performing I/O on a VxFS mounted instant snapshot with the Oracle Disk Manager (ODM) SmartSync enabled.
3157342 (3116990) When you run Veritas Volume Manager (VxVM) commands such as 'vxdisk scandisks', 'vxdctl enable', and 'vxconfigd restart' on write-protected hardware mirror LUNs, error messages are displayed in the console logs.
3189041 (3130353) Continuous disable or enable path messages are seen on the console for EMC Not Ready (NR) devices.
3195695 (2954455) During Dynamic Reconfiguration operations in vxdiskadm, when a pattern is specified to match a range of LUNs for removal, the pattern is matched erroneously.
3197460 (3098559) Cluster File System (CFS) data is corrupted due to a cloned copy of logical unit numbers (LUNs) imported with volume asymmetry.
3254133 (3240858) The /etc/vx/vxesd/.udev_lock file may have different permissions at different instances.
3254199 (3015181) I/O hangs on both nodes of the cluster when the disk array is disabled.
3254201 (3121380) I/O of the replicated volume group (RVG) hangs after one data volume is disabled.
3254204 (2882312) If an SRL fault occurs in the middle of an I/O load, and you immediately issue a read operation on data written during the SRL fault, the system returns old data.
3254205 (3162418) The vxconfigd(1M) command dumps core due to a wrong check in the ddl_find_cdevno() function.
3254231 (3010191) Previously excluded paths are not excluded after an upgrade to VxVM 5.1SP1RP3.
3254233 (3012929) The vxconfigbackup(1M) command gives errors when disk names are changed.
3254301 (3199056) The Veritas Volume Replicator (VVR) primary system panics in the vol_cmn_err function due to the corrupted VVR queue.
3261607 (3261601) The system panics when dmp_destroy_dmpnode() attempts to free an already freed virtual address.
3264166 (3254311) The system panics when reattaching a site to a site-consistent disk group that has a volume larger than 1.05 TB.
3271596 (3271595) Veritas Volume Manager (VxVM) should prevent the disk reclaim flag from getting turned off when there are pending reclaims on the disk.
3306164 (2972513) In CVM, PGR keys from shared data disks are not removed after stopping VCS.
3309931 (2959733) Handle the device path reconfiguration in case the device paths are moved across LUNs or enclosures, to prevent the vxconfigd(1M) daemon core dump.
3344127 (2969844) The device discovery failure should not cause the DMP database to be destroyed completely.
3344128 (2643506) vxconfigd dumps core when LUNs from the same enclosure are presented as different types, say A/P and A/P-F.
3344129 (2910367) When the SRL on the secondary site is disabled, the secondary panics.
3344130 (2825102) CVM reconfiguration and VxVM transaction code paths can simultaneously access the volume device list, resulting in data corruption.
3344132 (2860230) In a Cluster Volume Manager (CVM) environment, the shared disk remains opaque after execution of the vxdiskunsetup(1M) command on a master node.
3344134 (3011405) Execution of the "vxtune -o export" command fails and displays an error message.
3344138 (3041014) Error messages seen during the relayout operation need to be improved.
3344140 (2966990) In a Veritas Volume Replicator (VVR) environment, the I/O hangs at the primary side after multiple cluster reconfigurations are triggered in parallel.
3344142 (3178029) When you synchronize a replicated volume group (RVG), the diff string is over 100%.
3344143 (3101419) In a CVR environment, I/Os to the data volumes in an RVG may experience a temporary hang during SRL overflow with a heavy I/O load.
3344145 (3076093) The patch upgrade script "installrp" can panic the system while doing a patch upgrade.
3344148 (3111062) When diffsync is executed, vxrsync gets the following error in lossy networks: VxVM VVR vxrsync ERROR V-5-52-2074 Error opening socket between [HOST1] and [HOST2] -- [Connection timed out]
3344150 (2992667) When new disks are added to the SAN framework of the Virtual Intelligent System (VIS) appliance and the Fibre Channel (FC) switch is changed to the direct connection, the "vxdisk list" command does not show the newly added disks even after the "vxdisk scandisks" command is executed.
3344161 (2882412) The 'vxdisk destroy' command uninitializes a VxVM disk which belongs to a deported disk group.
3344167 (2979824) A vxdiskadm(1M) utility bug results in the exclusion of unintended paths.
3344175 (3114134) The Smart (sync) Autosync feature fails to work and instead replicates the entire volume size for larger sized volumes.
3344272 (2106530) The vxresize(1M) command fails on the data volume in rootdg if the file system is mounted using a block device reference as 'bootdg'.
3344273 (2165920) The vxrelocd(1M) daemon creates a defunct (zombie) process.
3344274 (3185471) If there are iSCSI LUNs visible to the host, then VxVM imports all the disk groups which are visible to the host, regardless of the noautoimport flag.
3344275 (3038382) The vxlufinish(1M) command runs 'fuser -k' on non-root file systems, which is unexpected.
3344276 (3002498) When a disk is initialized with the "vxdisk -f init " command, vxconfigd(1M) dumps core.
3344277 (3138849) The "ERROR: Configuration daemon is not accessible" message is displayed during the boot process.
3344278 (3131071) VxVM patch installation in a Solaris Alternate Boot Environment (ABE) results in data corruption.
3344280 (2422535) Changes to the Veritas Volume Manager (VxVM) recovery operations are not retained after the patch or package upgrade.
3344286 (2933688) When the 'Data corruption protection' check is activated by Dynamic Multi-Pathing (DMP), the device-discovery operation aborts, but the I/O to the affected devices continues, which results in data corruption.
3347380 (3031796) The snapshot reattach operation fails if any other snapshot of the primary volume is not accessible.
3349877 (2685230) In a Cluster Volume Replicator (CVR) environment, if the SRL is resized and the logowner is switched to and from the master node to the slave node, an SRL corruption can occur that leads to the Rlink detach.
3349917 (2952553) Refresh of a snapshot should not be allowed from a different source volume without the force option.
3349939 (3225660) The Dynamic Reconfiguration (DR) tool does not list thin provisioned LUNs during a LUN removal operation.
3349985 (3065072) Data loss occurs during the import of a clone disk group, when some of the disks are missing and the import "useclonedev" and "updateid" options are specified.
3349990 (2054606) During the DMP driver unload operation, the system panics.
3350000 (3323548) In the Cluster Volume Replicator (CVR) environment, a cluster-wide vxconfigd hang occurs on the primary when you start the cache object.
3350019 (2020017) A cluster node panics when mirrored volumes are configured in the cluster.
3350027 (3239521) When you do the PowerPath pre-check, the Dynamic Reconfiguration (DR) tool displays the following error message: 'Unable to run command [/sbin/powermt display]' and exits.
3350232 (2993667) Veritas Volume Manager (VxVM) allows setting the Cross-platform Data Sharing (CDS) attribute for a disk group even when a disk is missing because it experienced I/O errors.
3350235 (3084449) The shared flag remains set during the import of a private disk group, because a shared disk group import that failed due to a minor number conflict error did not clear the flag during the import abort operation.
3350241 (3067784) The grow and shrink operations by the vxresize(1M) utility may dump core in the vfprintf() function.
3350265 (2898324) UMR errors are reported by the Purify tool in the "vradmind migrate" command.
3350288 (3120458) In cluster volume replication (CVR) in data change map (DCM) mode, a cluster-wide vxconfigd hang is seen when one of the nodes is stopped.
3350293 (2962010) The replication hangs when the Storage Replicator Log (SRL) is resized.
3350787 (2969335) A node that leaves the cluster while an instant snapshot operation is in progress hangs in the kernel and cannot rejoin the cluster unless it is rebooted.
3350789 (2938710) The vxassist(1M) command dumps core during the relayout operation.
3350979 (3261485) The vxcdsconvert(1M) utility fails with the error "Unable to initialize the disk as a CDS disk".
3350989 (3152274) The dd command to an SRDF-R2 (write-disabled) device hangs, which causes VxVM commands to hang for a long time. The Operating System (OS) devices show no issues.
3351005 (2933476) The vxdisk(1M) resize command fails with a generic error message. Failure messages need to be more informative.
3351035 (3144781) In the Veritas Volume Replicator (VVR) environment, execution of the vxrlink pause command causes a hang on the secondary node if the rlink disconnect is already in progress.
3351075 (3271985) In Cluster Volume Replication (CVR), with synchronous replication, aborting a slave node from the Cluster Volume Manager (CVM) cluster makes the slave node panic.
3351092 (2950624) vradmind fails to operate on the new master when a node leaves the cluster.
3351125 (2812161) In a Veritas Volume Replicator (VVR) environment, after the Rlink is detached, the vxconfigd(1M) daemon on the secondary host may hang.
3351922 (2866299) The NEEDSYNC flag set on volumes in a Replicated Volume Group (RVG) is not cleared after the vxrecover command is run.
3352027 (3188154) The vxconfigd(1M) daemon does not come up after enabling native support and rebooting the host.
3352208 (3049633) In a Veritas Volume Replicator (VVR) environment, the VxVM configuration daemon vxconfigd(1M) hangs on the secondary node when all disk paths are disabled on the secondary node.
3352218 (3268905) After reboot, the non-root zpools created using a DMP device go into the FAULTED state and the DMP device state is shown as UNAVAIL.
3352226 (2893530) A system with no VVR configuration panics when it is rebooted.
3352247 (2929206) When turning on the dmp_native_support tunable with Solaris 10 U10 and onwards, the Zettabyte File System (ZFS) pools are seen on the OS device paths but not on the dynamic multipathing (DMP) devices.
3352282 (3102114) A system crash during the 'vxsnap restore' operation can cause the vxconfigd(1M) daemon to dump core after the system reboots.
3352963 (2746907) The vxconfigd(1M) daemon can hang under a heavy I/O load on the master node during reconfiguration.
3353059 (2959333) The Cross-platform Data Sharing (CDS) flag is not listed for disabled CDS disk groups.
3353064 (3006245) While executing a snapshot operation on a volume which has 'snappoints' configured, the system panics infrequently.
3353244 (2925746) In the cluster volume manager (CVM) environment, cluster-wide vxconfigd may hang during CVM reconfiguration.
3353953 (2996443) In a cluster volume replication (CVR) environment, a logowner name mismatch configuration error is seen on slave nodes after the master node is brought down.
3353985 (3088907) A node in a Cluster Volume Manager can panic while destroying a shared disk group.
3353990 (3178182) During a master takeover task, the shared disk group re-import operation fails due to false serial split brain (SSB) detection.
3353995 (3146955) Remote disks (lfailed or lmissing disks) go into the "ONLINE INVALID LFAILED" or "ONLINE INVALID LMISSING" state after the disk loses global disk connectivity.
3353997 (2845383) The site gets detached if the plex detach operation is performed with the site-consistency set to off.
3354023 (2869514) In a clustered environment with a large Logical Unit Number (LUN) configuration, the node join process takes a long time.
3354024 (2980955) The disk group (dg) goes into the disabled state if vxconfigd(1M) is restarted on the new master after a master switch.
3354028 (3136272) The disk group import operation with the "-o noreonline" option takes additional import time.
3355830 (3122828) The Dynamic Reconfiguration (DR) tool lists the disks which are tagged with Logical Volume Manager (LVM) for removal or replacement.
3355856 (2909668) In case of multiple sets of cloned disks of the same source disk group, the import operation on the second set of clone disks fails if the first set of clone disks was imported with "updateid".
3355878 (2735364) The "clone_disk" disk flag attribute is not cleared when a cloned disk group is removed by the "vxdg destroy " command.
3355883 (3085519) Missing disks are permanently detached from the disk group because the -o updateid and tagname options are used to import partial disks.
3355971 (3289202) Handle the KMSG_EPURGE error in CVM disk connectivity protocols.
3355973 (3003991) The vxdg adddisk command hangs when paths for all the disks in the disk group are disabled.
3356836 (3125631) Snapshot creation on volume sets may fail with the error: "vxsnap ERROR V-5-1-6433 Component volume has changed".
3361957 (2912263) On Solaris LDOMs, the "vxdmpadm exclude" command fails to exclude a controller.
3361977 (2236443) Disk group import failure should be made fencing aware, in place of the VxVM vxdmp V-5-0-0 I/O error message.
3361998 (2957555) The vxconfigd(1M) daemon on the CVM master node hangs in userland during the vxsnap(1M) restore operation.
3362065 (2861011) The "vxdisk -g resize " command fails with an error for a Cross-platform Data Sharing (CDS) formatted disk.
3362087 (2916911) The vxconfigd(1M) daemon sends a VOL_DIO_READ request before the device is open. This may result in a scenario where the open operation fails but the disk read or write operations proceed.
3362114 (2856579) When a disk is resized from less than 1 TB to greater than 1 TB, "EFI PART" is missing in the primary label.
3362144 (1942051) I/O hangs on a master node after disabling the secondary paths from the slave node and rebooting the slave node.
3362948 (2599887) The DMP device paths that are marked as "Disabled" cannot be excluded from VxVM control.
3365287 (2973786) The Dynamic Reconfiguration (DR) tool pre-check fails for the "Is OS & VM Device Tree in Sync" item if dmp_native_support is on.
3365295 (3053073) The Dynamic Reconfiguration (DR) tool does not pick thin LUNs in the "online invalid" state for the disk remove operation.
3365313 (3067452) If new LUNs are added in the cluster and the naming scheme has the avid option set to 'no', then the DR (Dynamic Reconfiguration) tool changes the mapping between the dmpnode and the disk record.
3365321 (3238397) The Dynamic Reconfiguration (DR) tool's Remove LUNs option does not restart the vxattachd daemon.
3365390 (3222707) The Dynamic Reconfiguration (DR) tool does not permit the removal of disks associated with a deported disk group (dg).
3368953 (3368361) When site consistency is configured within a private disk group and CVM is up, the reattach operation of a detached site fails.
3378817 (3143622) Initializing PowerPath-managed disks through VxVM generates an error message.
3384633 (3142315) A disk is misidentified as a clone disk with the udid_mismatch flag.
3384636 (3244217) The clone_disk flag cannot be reset during vxdg import.
3384662 (3127543) Non-labeled disks go into the udid_mismatch state after a vxconfigd restart.
3384697 (3052879) Auto-import of the cloned disk group fails after reboot even when the source disk group is not present.
3384986 (2996142) Data is corrupted or lost if the mapping from disk access (DA) to Data Module (DM) of a disk is incorrect.
3386843 (3279932) The vxdisksetup and vxdiskunsetup utilities fail on a disk which is part of a deported disk group (DG), even if the "-f" option is specified.
3395499 (3373142) Updates to the vxassist and vxedit man pages for behavioral changes after 6.0.
3401836 (2790864) For the OTHER_DISKS enclosure, the vxdmpadm config reset CLI fails while trying to reset the IO Policy value.
3404455 (3247094) The DR (Dynamic Reconfiguration) tool is unable to apply an SMI label for newly added devices which had an EFI label.
3404625 (3133908) The DR (Dynamic Reconfiguration) tool throws a cfgadm(1M) usage message while adding a LUN.
3405032 (3451625) The Operating System fails to boot after encapsulation of the root disk in an LDom (Logical Domain) guest.
3405318 (3259732) In a CVR environment, rebooting the primary slave followed by connect-disconnect in a loop causes the rlink to detach.
3408321 (3408320) Thin reclamation fails for EMC 5875 arrays.
3409473 (3409612) The value of reclaim_on_delete_start_time cannot be set to values outside the range 22:00-03:59.
3413044 (3400504) Upon disabling the host-side Host Bus Adapter (HBA) port, extended attributes of some devices are not seen anymore.
3414151 (3280830) Multiple vxresize operations on a layered volume fail with the error message "There are other recovery activities. Cannot grow volume".
3414265 (2804326) In the Veritas Volume Replicator (VVR) environment, secondary logging is seen in effect even if a Storage Replicator Log (SRL) size mismatch is seen across the primary and secondary.
3416320 (3074579) The "vxdmpadm config show" CLI does not display the configuration file name which is present under the root (/) directory.
3416406 (3099796) The vxevac command fails on volumes having a Data Change Object (DCO) log. The error message "volume is not using the specified disk name" is displayed.
3417081 (3417044) The system becomes unresponsive while creating a VVR TCP connection.
3417672 (3287880) In a clustered environment, if a node does not have storage connectivity to clone disks, then vxconfigd on the node may dump core during the clone disk group import.
3419831 (3435475) The vxcdsconvert(1M) conversion process gets aborted for a thin LUN formatted as a simple disk with the Extensible Firmware Interface (EFI) format.
3423613 (3399131) For a PowerPath (PP) enclosure, both the DA_TPD and DA_COEXIST_TPD flags are set.
3423644 (3416622) The hot-relocation feature fails for a corrupted disk in the CVM environment.
3424795 (3424798) Veritas Volume Manager (VxVM) mirror attach operations (e.g., plex attach, vxassist mirror, and third-mirror break-off snapshot resynchronization) may take a longer time under a heavy application I/O load.
3427124 (3435225) In a given CVR setup, rebooting the master node causes one of the slaves to panic.
3427480 (3163549) vxconfigd(1M) hangs on the master node if a slave node joins the master having disks which are missing on the master.
3430318 (3435041) The "vxdmpadm settune dmp_native_support=on" CLI fails with script errors.
3434079 (3385753) Replication to the Disaster Recovery (DR) site hangs even though the Replication links (Rlinks) are in the connected state.
3434080 (3415188) I/O hangs during replication in Veritas Volume Replicator (VVR).
3434189 (3247040) vxdisk scandisks enables the PowerPath (PP) enclosure which was disabled previously.
3435000 (3162987) The disk has a UDID_MISMATCH flag in the vxdisk list output.
3435008 (2958983) A memory leak is observed during the reminor operations.
3439102 (3441356) The pre-check of the upgrade_start.sh script fails on Solaris.
3442602 (3046560) The vradmin syncrvg command fails with the error: "VxVM VVR vxrsync ERROR V-5-52-2009 Could not open device [devicename] due to: DKIOCGVTOC ioctl to raw character volume failed".
3446452 (3090488) Memory leaks occur in the device discovery code path of Veritas Volume Manager (VxVM).
3461200 (3163970) The "vxsnap -g syncstart " command is unresponsive on the Veritas Volume Replicator (VVR) DR site.

PATCH ID: 6.0.300.200
3358311 (3152769) DMP path failover takes time in the Solaris LDOM environment when one I/O domain is down.
3358313 (3194358) Continuous messages are displayed in the syslog file with EMC not-ready (NR) LUNs.
3358342 (2724067) Enhance the vxdisksetup CLI to specify the disk label content to label all corresponding paths for a DMP device.
3358345 (2091520) The ability to move the configdb placement from one disk to another using the "vxdisk set keepmeta=[always|skip|default]" command.
3358346 (3353211) A. After an EMC Symmetrix BCV (Business Continuance Volume) device switches to read-write mode, continuous vxdmp (Veritas Dynamic Multi-Pathing) error messages flood the syslog. B. The DMP metanode/path under a DMP metanode gets disabled unexpectedly.
3358347 (3057554) The VxVM (Veritas Volume Manager) command "vxdiskunsetup -o shred=1" fails for EFI (Extensible Firmware Interface) disks on a Solaris x86 system.
3358348 (2665425) The vxdisk -px "attribute" list(1M) Command Line Interface (CLI) does not support some basic VxVM attributes.
3358350 (3189830) When you use the 'Mirror volumes on a disk' option of the vxdiskadm(1M) command for a root disk, you get an error.
3358351 (3158320) The VxVM (Veritas Volume Manager) command "vxdisk -px REPLICATED list (disk)" displays wrong output.
3358352 (3326964) VxVM hangs in Clustered Volume Manager (CVM) environments in the presence of FMR operations.
3358353 (3271315) The vxdiskunsetup command with the shred option fails to shred sliced or simple disks on the Solaris x86 platform.
3358354 (3332796) The message "VxVM vxisasm INFO V-5-1-0 seeking block #..." is displayed while initializing a disk that is not an ASM disk.
3358367 (3230148) Clustered Volume Manager (CVM) hangs during split-brain testing.
3358368 (3249264) The Veritas Volume Manager (VxVM) thin disk reclamation functionality causes disk label loss, private region corruption, and data corruption.
3358369 (3250369) Execution of the vxdisk scandisks command causes endless I/O error messages in the syslog.
3358370 (2921147) The udid_mismatch flag is absent on a clone disk when the source disk is unavailable.
3358371 (3125711) When the secondary node is restarted while the reclaim operation is going on at the primary node, the system panics.
3358372 (3156295) When DMP native support is enabled for Oracle Automatic Storage Management (ASM) devices, the permission and ownership of the /dev/raw/raw# devices become incorrect after reboot.
3358373 (3218013) The Dynamic Reconfiguration (DR) tool does not delete the stale OS (Operating System) device handles.
3358374 (3237503) The system hangs after creating a space-optimized snapshot with a large cache volume.
3358377 (3199398) The output of the "vxdmpadm pgrrereg" command depends on the order of the DMP node list; the terminal output depends on the last LUN (DMP node).
3358379 (1783763) In a Veritas Volume Replicator (VVR) environment, the vxconfigd(1M) daemon may hang during a configuration change operation.
3358380 (2152830) A disk group (DG) import fails with a non-descriptive error message when multiple copies (clones) of the same device exist and the original devices are either offline or not available.
3358381 (2859470) The Symmetrix Remote Data Facility R2 (SRDF-R2) with the Extensible Firmware Interface (EFI) label is not recognized by Veritas Volume Manager (VxVM) and goes into an error state.
3358382 (3086627) The "vxdisk -o thin,fssize list" command fails with the error message V-5-1-16282.
3358404 (3021970) A secondary node panics due to a NULL pointer dereference when the system frees an interlock.
3358405 (3026977) The Dynamic Reconfiguration (DR) option with vxdiskadm(1M) removes Logical Unit Numbers (LUNs) that are not even in the Failing or Unusable state.
3358407 (3107699) VxDMP causes a system panic after a shutdown or reboot.
3358408 (3115206) When the ZFS root and the 'dmp_native_support' tunable are enabled, the system panics along with a stack trace.
3358414 (3139983) Failed I/Os from SCSI are retried only on very few paths to a LUN instead of utilizing all the available paths, and may result in DMP sending I/O failures to the application, bounded by the recovery option tunable.
3358416 (3312162) Data corruption may occur on the secondary Veritas Volume Replicator (VVR) Disaster Recovery (DR) site.
3358417 (3325122) In a Clustered Volume Replicator (CVR) environment, when you create stripe-mirror volumes with logtype=dcm, the creation may fail.
3358418 (3283525) The vxconfigd(1M) daemon hangs due to Data Change Object (DCO) corruption after a volume resize.
3358420 (3236773) Multiple error messages of the same format are displayed while setting or getting the failover mode for an EMC Asymmetric Logical Unit Access (ALUA) disk array.
3358423 (3194305) In the Veritas Volume Replicator (VVR) environment, the replication status goes into a paused state.
3358429 (3300418) VxVM volume operations on shared volumes cause unnecessary read I/Os.
3358430 (3258276) The system panics when the Dynamic Multi-Pathing (DMP) cache open parameter is enabled.
3358433 (3301470) All cluster volume replication (CVR) nodes panic repeatedly due to a null pointer dereference in vxio.
3362234 (2994976) The system panics during mirror break-off snapshot creation or a plex detach operation in the vol_mv_pldet_callback() function.
3365296 (2824977) The Command Line Interface (CLI) "vxdmpadm setattr enclosure failovermode", which is meant for Asymmetric Logical Unit Access (ALUA) type arrays, fails with an error on certain arrays without providing an appropriate reason for the failure.
3366688 (2957645) When the vxconfigd daemon/command is restarted, the terminal gets flooded with error messages.
3366703 (3056311) For releases earlier than 5.1 SP1, allow disk initialization with the CDS format using raw geometry.
3368236 (3327842) In the Cluster Volume Replication (CVR) environment, with I/O load on the primary and replication in progress, if the user runs the vradmin resizevol(1M) command on the primary, these operations often terminate with the error message "vradmin ERROR Lost connection to host".
3371422 (3087893) EMC TPD emcpower names change on every reboot with VxVM.
3371753 (3081410) The Dynamic Reconfiguration (DR) tool fails to pick up any disk for the LUN removal operation.
3373213 (3373208) DMP wrongly sends the SCSI PR OUT command with the APTPL bit value as 0 to arrays.
3374166 (3325371) A panic occurs in the vol_multistepsio_read_source() function when snapshots are used.
3374735 (3423316) The vxconfigd(1M) daemon observes a core dump while executing the vxdisk(1M) scandisks command.
3375424 (3250450) In the presence of a linked volume, running the vxdisk(1M) command with the "-o thin,fssize list" option causes the system to panic.
3376953 (3372724) When the user installs VxVM, the system panics with a warning.
3377209 (3377383) vxconfigd crashes when a disk under Dynamic Multi-Pathing (DMP) reports a device failure.
3381922 (3235350) I/O on a grown region of a volume leads to a system panic if the volume has an instant snapshot.
3383673 (3147241) The pkgchk(1M) command on VRTSvxvm fails on Solaris x86.
3387405 (3019684) An I/O hang is observed when the SRL is about to overflow after the logowner switches from slave to master.
3387417 (3107741) The vxrvg snapdestroy command fails with the "Transaction aborted waiting for io drain" error message.

PATCH ID: 6.0.300.100
2892702 (2567618) The VRTSexplorer dumps core in vxcheckhbaapi/print_target_map_entry.
3090670 (3090667) The system panics or hangs while executing the "vxdisk -o thin,fssize list" command as part of Veritas Operations Manager (VOM) Storage Foundation (SF) discovery.
3099508 (3087893) EMC TPD emcpower names change on every reboot with VxVM.
3133012 (3160973) vxlist hangs while detecting a foreign disk format on an EFI disk.
3140411 (2959325) The vxconfigd(1M) daemon dumps core while performing the disk group move operation.
3150893 (3119102) Support LDOM Live Migration with fencing enabled.
3156719 (2857044) The system crashes in voldco_getalloffset when trying to resize a file system.
3159096 (3146715) Rlinks do not connect with NAT configurations on Little Endian Architecture.
3254227 (3182350) VxVM volume creation or size increase hangs.
3254229 (3063378) VxVM commands are slow when read-only disks are presented.
3254427 (3182175) The "vxdisk -o thin,fssize list" command can report incorrect file system usage data.
3280555 (2959733) Handle device path reconfiguration in case the device paths are moved across LUNs or enclosures, to prevent a vxconfigd core dump.
3294641 (3107741) vxrvg snapdestroy fails with the "Transaction aborted waiting for io drain" error and vxconfigd hangs for around 45 minutes.
3294642 (3019684) I/O hangs on the master while the SRL is about to overflow.

PATCH ID: 6.0.300.000
2853712 (2815517) vxdg adddisk allows mixing of clone and non-clone disks in a disk group.
2860207 (2859470) An EMC SRDF (Symmetrix Remote Data Facility) R2 disk with an EFI label is not recognized by VxVM (Veritas Volume Manager) and is shown in the error state.
2863672 (2834046) NFS migration failed due to device reminoring.
2863708 (2836528) Unable to grow a LUN dynamically on Solaris x86 using the "vxdisk resize" command.
2876865 (2510928) The extended attributes reported by "vxdisk -e list" for the EMC SRDF LUNs are reported as "tdev mirror" instead of "tdev srdf-r1".
2892499 (2149922) Record the disk group import and deport events in the syslog.
2892571 (1856733) Support for FusionIO on Solaris x64.
2892590 (2779580) The secondary node gives a configuration error (no Primary RVG) after reboot of the master node on the primary site.
2892621 (1903700) Removing a mirror using vxassist does not work.
2892630 (2742706) Panic due to a mutex not being released in vxlo_open.
2892643 (2801962) Growing a volume takes a significantly long time when the volume has a version 20 DCO attached to it.
2892650 (2826125) The VxVM script daemon is terminated abnormally on its invocation.
2892660 (2000585) vxrecover does not start the remaining volumes if one of the volumes is removed during the vxrecover command run.
2892665 (2807158) On the Solaris platform, the system can sometimes hang during a VM upgrade or patch installation.
2892682 (2837717) The "vxdisk(1M) resize" command fails if the 'da name' is specified.
2892684 (1859018) "link detached from volume" warnings are displayed when a linked-breakoff snapshot is created.
2892689 (2836798) In VxVM, resizing a simple EFI disk fails and causes a system panic/hang.
2892698 (2851085) DMP doesn't detect implicit LUN ownership changes for some of the dmpnodes. 2892702 (2567618) VRTSexplorer dumps core in checkhbaapi/print_target_map_entry. 2892716 (2753954) When a cable is disconnected from one port of a dual-port FC HBA, the paths via the other port are marked as SUSPECT PATH. 2922770 (2866997) VxVM disk initialization fails as an un-initialized variable gets an unexpected value after an OS patch installation. 2922798 (2878876) vxconfigd dumps core in vol_cbr_dolog() due to a race between two threads processing requests from the same client. 2924117 (2911040) A restore from a cascaded snapshot leaves the volume in an unusable state if any cascaded snapshot is in a detached state. 2924188 (2858853) After a master switch, vxconfigd dumps core on the old master. 2924207 (2886402) When re-configuring devices, a vxconfigd hang is observed. 2930399 (2930396) The vxdmpasm command (in the 5.1SP1 release) and the vxdmpraw command (in the 6.0 release) do not work on the Solaris platform. 2933467 (2907823) If the user removes the LUN at the storage layer and at the VxVM layer beforehand, the DMP DR tool is unable to clean up the cfgadm (leadville) stack. 2933468 (2916094) Enhancements have been made to the Dynamic Reconfiguration Tool (DR Tool) to create a separate log file every time the DR Tool is started, display a message if a command takes a longer time, and not to list the devices controlled by TPD (Third Party Driver) in the 'Remove Luns' option of the DR Tool. 2933469 (2919627) The Dynamic Reconfiguration tool should be enhanced to remove LUNs feasibly in bulk. 2934259 (2930569) The LUNs in the 'error' state in the output of 'vxdisk list' cannot be removed through the DR (Dynamic Reconfiguration) Tool. 2940447 (2940446) Full fsck hangs on I/O in VxVM when the cache object size is very large. 2941167 (2915751) A Solaris machine panics during dynamic LUN expansion of a CDS disk.
2941193 (1982965) vxdg import fails if the da-name is based on a naming scheme different from the prevailing naming scheme on the host. 2941226 (2915063) When a VIS array having mirror volumes is rebooted, the master node panics and CVM faults on the other nodes. 2941234 (2899173) vxconfigd hangs after executing the "vradmin stoprep" command. 2941237 (2919318) The I/O fencing key values of the data disks are different and abnormal in a VCS cluster with I/O fencing. 2941252 (1973983) vxunreloc fails when the DCO plex is in the DISABLED state. 2942166 (2942609) The message displayed when the user quits from Dynamic Reconfiguration Operations is shown as an error message. 2944708 (1725593) The 'vxdmpadm listctlr' command has to be enhanced to print the count of device paths seen through the controller. 2944710 (2744004) vxconfigd is hung on the VVR secondary node during VVR configuration. 2944714 (2833498) vxconfigd hangs while a reclaim operation is in progress on volumes having instant snapshots. 2944717 (2851403) A system panic is seen while unloading the "vxio" module. This happens whenever VxVM uses the SmartMove feature and the "vxportal" module gets reloaded (e.g. during a VxFS package upgrade). 2944722 (2869594) The master node panics due to corruption if space-optimized snapshots are refreshed and 'vxclustadm setmaster' is used to select the master. 2944724 (2892983) vxvol dumps core if new links are added while the operation is in progress. 2944725 (2910043) Avoid order 8 allocation by vxconfigd in node reconfig. 2944727 (2919720) vxconfigd dumps core in rec_lock1_5(). 2944729 (2933138) Panic in voldco_update_itemq_chunk() due to accessing an invalid buffer. 2944741 (2866059) Improve the error messages hit during the vxdisk resize operation. 2962257 (2898547) vradmind on the VVR Secondary Site dumps core when the Logowner Service Group on the VVR (Veritas Volume Replicator) Primary Site is shuffled across its CVM (Clustered Volume Manager) nodes. 2964567 (2964547) About the DMP message - cannot load module 'misc/ted'.
2974870 (2935771) In a VVR environment, the RLINK disconnects after a master switch. 2976946 (2919714) On a THIN LUN, vxevac returns 0 without migrating unmounted VxFS volumes. 2976956 (1289985) vxconfigd dumps core upon running the "vxdctl enable" command. 2976974 (2875962) During the upgrade of the VRTSaslapm package, a conflict is encountered with the VRTSvxvm package because an APM binary is included in the VRTSvxvm package, which is already installed. 2978189 (2948172) Executing the "vxdisk -o thin,fssize list" command can result in a panic. 2979767 (2798673) The system panics in voldco_alloc_layout() while creating a volume with an instant DCO. 2983679 (2970368) Enhance handling of SRDF-R2 Write-Disabled devices in DMP. 3004823 (2692012) The vxevac move error message needs to be enhanced to be less generic and give a clear message for the failure. 3004852 (2886333) The vxdg join command should not allow mixing clone & non-clone disks in a DiskGroup. 3005921 (1901838) Incorrect setting of the Nolicense flag can lead to DMP database inconsistency. 3006262 (2715129) vxconfigd hangs during Master takeover in a CVM (Clustered Volume Manager) environment. 3011391 (2965910) Volume creation with vxassist using "-o ordered alloc=<disk-class>" dumps core. 3011444 (2398416) vxassist dumps core while creating a volume after adding the attribute "wantmirror=ctlr" in the default vxassist rule file. 3020087 (2619600) Live migration of a virtual machine having the SFHA/SFCFSHA stack with data disk fencing enabled causes service groups configured on the virtual machine to fault. 3025973 (3002770) Accessing a NULL pointer in dmp_aa_recv_inquiry() caused a system panic.
3026288 (2962262) Uninstall of DMP fails in the presence of other multipathing solutions. 3027482 (2273190) Incorrect setting of the UNDISCOVERED flag can lead to database inconsistency. SUMMARY OF KNOWN ISSUES: ----------------------------------------- 2892689(2836798) Dynamic LUN expansion is not supported for EFI disks in simple or sliced formats. 2949012(2951032) The DR Tool is not supported on a Solaris x86 machine. 3037620(2979786) The SCSI registration keys are not removed if the VCS engine is stopped for the second time. 3041167(3107699) The system panics because dmp_signal_event() called psignal() with an incorrect vxesd proc pointer. 3114107(2939321) vxlufinish fails during an upgrade to Solaris 10 update 11 due to a luumount or luactivate failure. 3399012(3422185) Reclamation of storage fails with an error. 3404678(3405223) The Dynamic Reconfiguration tool is not supported inside the guest domain for Oracle VM Server for SPARC. 3445201(2705055) Duplicate disk access (da) entries on vxdisk list. 3483447(3418222) The VVR vradmin addsec command fails on Solaris 11. KNOWN ISSUES : -------------- * INCIDENT NO::2892689 TRACKING ID ::2836798 SYMPTOM:: Dynamic LUN expansion is not supported for EFI (Extensible Firmware Interface) disks in simple or sliced formats. The recommended format is the Cross-platform Data Sharing (CDS) disk format. WORKAROUND:: Convert the disk format to CDS using the vxcdsconvert utility. * INCIDENT NO::2949012 TRACKING ID ::2951032 SYMPTOM:: As the Dynamic Reconfiguration (DR) script does not include i386 in the list of supported architectures, the pre-check for the DR Tool fails. WORKAROUND:: NONE * INCIDENT NO::3037620 TRACKING ID ::2979786 SYMPTOM:: If the VCS engine is stopped for the first time, the SCSI registration keys are removed. But if the VCS engine is stopped for the second time, the keys are not removed. WORKAROUND:: None * INCIDENT NO::3041167 TRACKING ID ::3107699 SYMPTOM:: On Solaris 11 SPARC SRU1, the system panics after executing the vxrecover operation on the master node.
WORKAROUND:: None for this issue in 6.0.5. * INCIDENT NO::3114107 TRACKING ID ::2939321 SYMPTOM:: vxlufinish fails with the following error: ------ Generating boot-sign for ABE /bin/rmdir: directory "/tmp/.liveupgrade.num1.num2/.alt.luactivate": \ Directory is a mount point or in use Generating partition and slice \ information for ABE /tmp/.liveupgrade.num1.num2/.alt.luactivate ERROR: The target boot environment root device \ is already mounted. ERROR: The root slice <> of the target boot environment \ is not available. rm: Unable to remove directory /tmp/.liveupgrade.num1.num2/.\ alt.luactivate: Device busy rm: Unable to remove directory /tmp/.liveupgrade.num1.num2: File exists ERROR: vxlufinish Failed: luactivate dest.num3 WORKAROUND:: This is an issue with Oracle Solaris 10 update 11 (x86). The issue is resolved after installing OS patch 121430-88 or later. * INCIDENT NO::3399012 TRACKING ID ::3422185 SYMPTOM:: Reclamation of storage fails with the following error: Reclaiming storage on: Disk : Failed. Failed to reclaim /, / and . Where is the file system mount point. is the name of the disk. The error occurs in the following scenario: A disk group is created of thin disks, and some volumes of different layouts are created on this disk group. A volume set is created of these volumes and a file system is mounted over it. When some volumes of this volume set are removed, reclamation fails. WORKAROUND:: None. * INCIDENT NO::3404678 TRACKING ID ::3405223 SYMPTOM:: SFHA provides a Dynamic Reconfiguration tool to simplify online dynamic reconfiguration of a LUN. The Dynamic Reconfiguration tool is not supported if SFHA is running inside the guest domain for Oracle VM Server for SPARC. WORKAROUND:: None.
* INCIDENT NO::3445201 TRACKING ID ::2705055 SYMPTOM:: If the naming scheme is changed while some disks under a disk group are not accessible, duplicate disk access (da) entries are introduced on the same node. WORKAROUND:: The following workaround resolves the duplicate disk access (da) record once the disk is accessible on the same node: 1. vxdisk rm 2. vxdisk scandisks * INCIDENT NO::3483447 TRACKING ID ::3418222 SYMPTOM:: On Oracle Solaris 11, the /etc/hosts file is not updated to map the system's nodename to one of the system's non-loopback IP addresses. Instead, the host name is mapped to the system's IPv4 and IPv6 loopback addresses. For example: ::1 foobar localhost 127.0.0.1 foobar loghost localhost This is a Solaris 11 issue. For details, see http://docs.oracle.com/cd/E23824_01/html/E24456/gllcs.html. WORKAROUND:: Manually modify the /etc/hosts file as follows: ::1 localhost 127.0.0.1 loghost localhost 129.148.174.232 foobar FIXED INCIDENTS: ---------------- PATCH ID:6.0.500.000 * INCIDENT NO:2885171 TRACKING ID:2882566 SYMPTOM: A disk that was earlier removed from a disk group using the vxdg rmdisk -k command can be successfully added to another disk group. DESCRIPTION: If the -k option is specified, the disk media records are kept, even if the disk media records are in the removed state and the subdisk records still point to them. The subdisks and any plex that refer to them remain unusable until the disk is re-added using the adddisk command with the -k option. VxVM disables volumes when all plexes become unusable. The -k option used with rmdisk is designed to enable you to re-add the disk back to the same disk group. RESOLUTION: The adddisk command is enhanced to detect such a situation and not allow adding a disk that was previously removed from a disk group using vxdg rmdisk -k to another disk group.
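The enhanced adddisk check from incident 2885171 can be sketched as follows. This is an illustrative Python model only, not VxVM source; the DiskMedia class, the removed_from field, and the can_adddisk function are hypothetical names for the rule the fix enforces:

```python
# Illustrative model only -- not VxVM source. It sketches the rule enforced by
# the enhanced 'vxdg adddisk': a disk removed with 'vxdg rmdisk -k' may only
# be re-added to the disk group it was removed from.

class DiskMedia:
    def __init__(self, name, removed_from=None):
        self.name = name
        # Disk group the disk was removed from with '-k', if any.
        self.removed_from = removed_from

def can_adddisk(disk, target_dg):
    """Return True if adding 'disk' to 'target_dg' should be allowed."""
    if disk.removed_from is None:
        return True                       # never removed with -k: allowed
    # -k keeps the disk media records, so the disk may rejoin only its
    # original disk group; adding it to any other disk group is rejected.
    return disk.removed_from == target_dg
```

In the fixed behavior, only the cross-disk-group add is newly rejected; re-adding the disk to its original disk group with -k continues to work.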
* INCIDENT NO:2941224 TRACKING ID:2921816 SYMPTOM: In a VVR environment, if there is a Storage Replicator Log (SRL) overflow, then the Data Change Map (DCM) logging mode is enabled. For such instances, if there is an I/O failure on the DCM volume, then the system panics with the following stack trace: vol_dcm_set_region() vol_rvdcm_log_update() vol_rv_mdship_srv_done() volsync_wait() voliod_loop() ... DESCRIPTION: There is a race condition where the DCM information is accessed at the same time as the DCM I/O failure is handled. This results in the panic. RESOLUTION: The code is modified to handle the race condition. * INCIDENT NO:2957567 TRACKING ID:2957556 SYMPTOM: The vxdisksetup(1M) command fails when the tpdmode attribute is set to native and the enclosure-based naming scheme is on. DESCRIPTION: When the tpdmode attribute is set to native and the enclosure-based naming scheme is on, the slicing information added to the device name is not correct, which leads to the "Overlapping partitions detected" error. RESOLUTION: The code is modified to ensure that the slicing information can be added properly. * INCIDENT NO:2960654 TRACKING ID:2932214 SYMPTOM: After the "vxdisk resize" operation is performed from less than 1 TB to greater than or equal to 1 TB on a disk with the SIMPLE or SLICED format that has the Sun Microsystems Incorporation (SMI) label, the disk enters the "online invalid" state. DESCRIPTION: When the SIMPLE or SLICED disk, which has the Sun Microsystems Incorporation (SMI) label, is resized from less than 1 TB to greater than or equal to 1 TB by the "vxdisk resize" operation, the disk shows the "online invalid" state. RESOLUTION: The code is modified to prevent the resize of the SIMPLE or SLICED disks with the SMI label from less than 1 TB to greater than or equal to 1 TB. * INCIDENT NO:2974602 TRACKING ID:2986596 SYMPTOM: Disk groups imported with a mix of standard and clone Logical Unit Numbers (LUNs) may lead to data corruption.
DESCRIPTION: The vxdg(1M) command import operation should not allow mixing of clone and non-clone LUNs, since it may result in data corruption if the clone copy is not up-to-date. The vxdg(1M) import code was going ahead with clone LUNs when the corresponding standard LUNs were unavailable on the same host. RESOLUTION: The code is modified for the vxdg(1M) command import operation, so that it does not pick up the clone disks in the above case and prevents a mixed disk group import. The import fails if a partial import is not allowed based on other options specified during the import. * INCIDENT NO:2999881 TRACKING ID:2999871 SYMPTOM: The vxinstall(1M) command gets into a hung state when it is invoked through Secure Shell (SSH) remote execution. DESCRIPTION: The vxconfigd process, which starts from the vxinstall script, fails to close the inherited file descriptors, causing vxinstall to enter the hung state. RESOLUTION: The code is modified to handle the inherited file descriptors for the vxconfigd process. * INCIDENT NO:3022349 TRACKING ID:3052770 SYMPTOM: On little-endian systems, the vradmin syncrvg operation failed if the RVG includes a volume set. DESCRIPTION: The operation failed due to the different memory read convention on little-endian machines compared to big-endian machines. RESOLUTION: The code is modified to handle the operation on little-endian machines as well. * INCIDENT NO:3032358 TRACKING ID:2952403 SYMPTOM: In a four-node cluster, if all storage connected to a disk group is removed from the master, then the disk group destroy command fails. DESCRIPTION: During the initial phase of the destroy operation, the disk association with the disk group is removed. When an attempt is made to clean up the disk headers, the I/O shipping does not happen as the disks do not belong to any disk group. RESOLUTION: The code is modified to save the shared disk group association until the disk group destroy operation is completed.
The information is used to determine whether I/O shipping can be used to complete the disk header updates during the disk group destroy operation. * INCIDENT NO:3033904 TRACKING ID:2308875 SYMPTOM: The vxddladm(1M) list command options (hbas, ports, targets) don't display the correct values for the state attribute. DESCRIPTION: In some cases, VxVM doesn't use device names with the correct slice information, which leads to the vxddladm(1M) list command options (hbas, ports, targets) not displaying the correct values for the state attribute. RESOLUTION: The code is modified to use the device name with the appropriate slice information. * INCIDENT NO:3036949 TRACKING ID:3045033 SYMPTOM: "vxdg init" should not create a disk group on a clone disk that was previously part of a disk group. DESCRIPTION: If the disk contains a copy of data from some other VxVM disk, an attempt to add or initialize that disk in a new disk group (dg) should fail. Before it is added to the new dg, the clone flag and udid flag should be explicitly cleared from the disk by using the following command: vxdisk -c updateudid RESOLUTION: The code is modified to fail the "vxdg init" command or add operation on a clone disk that was previously part of a disk group. * INCIDENT NO:3042507 TRACKING ID:2101093 SYMPTOM: On the Solaris operating system, the system panics when the dmp_signal_event() function is called. Additionally, the system logs the following stack trace: dmp_signal_event_daemon dmp_signal_vold dmp_throttle_paths dmp_process_stats dmp_daemons_loop thread_start DESCRIPTION: In the Solaris 2.6 operating environment, the drv_getparm() function is replaced with the ddi_get_lbolt(9F), ddi_get_time(9F), and ddi_get_pid(9F) functions. However, when the Veritas Volume Manager (VxVM) code attempts to call the nonexistent drv_getparm(9F) function, the system panics. RESOLUTION: The VxVM code is modified to enable VxVM to refer to the ddi_get_lbolt(9F), ddi_get_time(9F), and ddi_get_pid(9F) functions.
* INCIDENT NO:3043206 TRACKING ID:3038684 SYMPTOM: The restore daemon attempts to re-enable disabled paths of the Business Continuance Volume - Not Ready (BCV-NR) devices, logging many DMP messages as follows: VxVM vxdmp V-5-0-148 enabled path 255/0x140 belonging to the dmpnode 3/0x80 VxVM vxdmp V-5-0-112 disabled path 255/0x140 belonging to the dmpnode 3/0x80 DESCRIPTION: The restore daemon tries to re-enable a disabled path of a BCV-NR device as the probe passes. But the open() operation fails on such devices as no I/O operations are permitted and the path is disabled. There is a check to prevent enabling the path of the device if the open() operation fails. Because of the bug in the open check, it incorrectly tries to re-enable the path of the BCV-NR device. RESOLUTION: The code is modified to do an open check on the BCV-NR block device. * INCIDENT NO:3049356 TRACKING ID:3060327 SYMPTOM: As a part of initial synchronization using Smart Autosync, the vradmin repstatus(1M) command shows incorrect status of Data Change Map (DCM): root@hostname#vradmin -g dg1 repstatus rvg Replicated Data Set: rvg Primary: Host name: primary ip RVG name: rvg DG name: dg1 RVG state: enabled for I/O Data volumes: 1 VSets: 0 SRL name: srl SRL size: 1.00 G Total secondaries: 1 Secondary: Host name: primary ip RVG name: rvg DG name: dg1 Data status: inconsistent Replication status: resync in progress (smartsync autosync) Current mode: asynchronous Logging to: DCM (contains 0 Kbytes) (autosync) Timestamp Information: N/A The issue is specific to configurations in which primary data volumes have Veritas File System (VxFS) mounted. DESCRIPTION: The DCM status is not correctly retrieved and displayed when the Smartmove utility is being used for Autosync. RESOLUTION: Handle the Smartmove case for the vradmin repstatus(1M) command. 
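The corrected restore-daemon behavior from incident 3043206 above can be sketched as follows. This is an illustrative Python model only, not DMP source; restore_paths, probe_open, and the path dictionary fields are hypothetical names:

```python
# Illustrative sketch only -- not DMP source. It models the fixed restore
# daemon check: a disabled path is re-enabled only if a probe open() on the
# underlying device succeeds, so BCV-NR devices (whose open fails because no
# I/O is permitted) keep their paths disabled.

def restore_paths(paths, probe_open):
    """Re-enable disabled paths whose open probe succeeds.

    paths: list of dicts with 'name', 'device', and 'state' keys.
    probe_open: callable returning True if open() on the device succeeds.
    Returns the names of the paths that end up enabled.
    """
    for path in paths:
        if path["state"] == "disabled" and probe_open(path["device"]):
            path["state"] = "enabled"   # device accepts I/O again
        # Paths whose open probe fails (e.g. BCV-NR) stay disabled.
    return [p["name"] for p in paths if p["state"] == "enabled"]
```

With the earlier bug, the open check was applied incorrectly and the BCV-NR path was re-enabled anyway, producing the repeated enable/disable messages shown in the symptom.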
* INCIDENT NO:3094185 TRACKING ID:3091916 SYMPTOM: In a VCS cluster environment, the syslog overflows with the following Small Computer System Interface (SCSI) I/O error messages: reservation conflict Unhandled error code Result: hostbyte=DID_OK driverbyte=DRIVER_OK CDB: Write(10): 2a 00 00 00 00 90 00 00 08 00 reservation conflict VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x5) on dmpnode 201/0x60 Buffer I/O error on device VxDMP7, logical block 18 lost page write due to I/O error on VxDMP7 DESCRIPTION: In a VCS cluster environment, when the private disk group is flushed and deported on one node, some I/Os on the disk are cached as the disk writes are done asynchronously. Importing the disk group immediately after with PGR keys causes I/O errors on the previous node as the PGR keys are not reserved on that node. RESOLUTION: The code is modified to write the I/Os synchronously on the disk. * INCIDENT NO:3144764 TRACKING ID:2398954 SYMPTOM: The system panics while doing I/O on a Veritas File System (VxFS) mounted instant snapshot with the Oracle Disk Manager (ODM) SmartSync enabled. The following stack trace is observed: panic: post_hndlr(): Unresolved kernel interruption cold_vm_hndlr bubbledown as_ubcopy privlbcopy volkio_to_kio_copy vol_multistepsio_overlay_data vol_multistepsio_start voliod_iohandle voliod_loop kthread_daemon_startup DESCRIPTION: Veritas Volume Manager (VxVM) uses the av_back and av_forw fields of the io buf structure to store its private information. VxFS also uses these fields to chain I/O buffers before passing I/O to VxVM. When an I/O is received at the VxVM layer, it always resets these fields. But if ODM SmartSync is enabled, VxFS uses a special strategy routine to pass on hints to VxVM. Due to a bug in the special strategy routine, the av_back and av_forw fields are not reset and point to a valid buffer in the VxFS I/O buffer chain.
VxVM interprets these fields (av_back, av_forw) wrongly and modifies their contents, which in turn corrupts the next buffer in the chain, leading to the panic. RESOLUTION: The av_back and av_forw fields of the io buf structure are reset in the special strategy routine. * INCIDENT NO:3157342 TRACKING ID:3116990 SYMPTOM: When you run VxVM commands such as 'vxdisk scandisks', 'vxdctl enable', and 'vxconfigd restart' on write protected hardware mirror LUNs, the following error messages are displayed in the console logs: [..] scsi: [ID 107833 kern.warning] WARNING:/pci@400/pci@2/pci@0/pci@1/pci@0/pci@2/SUNW,qlc@0/fp@0,0/ssd@w50060e8016 02ba15,21 (ssd315): Error for Command: write Error Level: Fatal scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0 scsi: [ID 107833 kern.notice] Sense Key: Write_Protected scsi: [ID 107833 kern.notice] ASC: 0x22 (illegal function), ASCQ: 0x0, FRU: 0x0 scsi: [ID 107833 kern.warning] [..] DESCRIPTION: As a part of the VxVM commands, the disk online operation is run on write protected disks. The online operation updates the disk label if the geometry information in the disk label is stale or if the disk label is absent. Such writes on the disk cause the SCSI driver to report errors for write protected LUNs. RESOLUTION: The code is modified within VxVM to inform the SCSI driver to prevent these error messages. * INCIDENT NO:3189041 TRACKING ID:3130353 SYMPTOM: Disabled and enabled path messages are displayed continuously on the console for the EMC NR (Not Ready) devices: I/O error occurred on Path hdisk139 belonging to Dmpnode emc1_1f2d Disabled Path hdisk139 belonging to Dmpnode emc1_1f2d due to path failure Enabled Path hdisk139 belonging to Dmpnode emc1_1f2d I/O error occurred on Path hdisk139 belonging to Dmpnode emc1_1f2d Disabled Path hdisk139 belonging to Dmpnode emc1_1f2d due to path failure DESCRIPTION: As part of the device discovery, DMP marks the paths belonging to the EMC NR devices as disabled, so that they are not used for I/O.
However, the DMP restore logic, which issues an inquiry on the disabled path, brings the NR device paths back to the enabled state. This cycle is repetitive, and as a result the disabled and enabled path messages are seen continuously on the console. RESOLUTION: The DMP code is modified to specially handle the EMC NR devices, so that they are not disabled/enabled repeatedly. This means that the messages are not merely suppressed; the devices are also handled in a different manner. * INCIDENT NO:3195695 TRACKING ID:2954455 SYMPTOM: When a pattern is specified to vxdiskadm to match a range of LUNs for removal, the pattern is matched erroneously. DESCRIPTION: While using the Dynamic Reconfiguration operation to remove logical unit numbers (LUNs) in vxdiskadm, if a pattern is specified to match a range of disks, the pattern matching is erroneous and the operation fails subsequently. For example, if the range specified is "emc0_0738--emc0_0740", the matched pattern ignores the leading zero and matches emc0_738 instead of emc0_0738: VxVM vxdmpadm ERROR V-5-1-14053 Failed to get subpaths from emc0_739 .. VxVM vxdmpadm ERROR V-5-1-14053 Failed to get subpaths from emc0_740 VxVM vxdisk ERROR V-5-1-558 Disk emc0_740: Disk not in the configuration VxVM vxdmpadm ERROR V-5-1-2268 emc0_740 is not a valid dmp node name RESOLUTION: The pattern matching logic is enhanced to account for the leading zeros. * INCIDENT NO:3197460 TRACKING ID:3098559 SYMPTOM: In a clustered environment, if a standard disk group (dg) and a clone dg both exist, the slave imports the clone dg when a cluster-wide import of the standard dg is triggered while the LUNs of the standard dg are disconnected from the slave. This causes corruption. DESCRIPTION: If there are deported cloned and standard dgs in Clustered Volume Manager (CVM), and the disks of the standard dg are not accessible from the slave, then the original dg import that is triggered imports the cloned dg on that slave and the original dg on the other nodes.
CFS gets mounted with data corruption. RESOLUTION: The code is modified to detect the proper disks for the dg import. * INCIDENT NO:3254133 TRACKING ID:3240858 SYMPTOM: The file '/etc/vx/vxesd/.udev_lock' might have different permissions at different instances. DESCRIPTION: ESD_UDEV_LOCKFILE is opened/created by both vxesd and the vxesd_post_event support utility called by the vxvm-udev rule. Since the open permission mode is not set, the mode gets inherited from the calling process. As a result, ESD_UDEV_LOCKFILE might have different permissions in different situations, which is undesirable. RESOLUTION: Code changes have been made to use a uniform and defined set of permissions at all instances. * INCIDENT NO:3254199 TRACKING ID:3015181 SYMPTOM: I/O can hang on all the nodes of a cluster when the complete non-Active/Active (A/A) class of the storage is disconnected. The problem is specific to CVM only. DESCRIPTION: The issue occurs because the CVM-DMP protocol does not progress any further when the 'ioctls' on the corresponding DMP 'metanodes' fail. As a result, all hosts hold the I/Os forever. RESOLUTION: The code is modified to complete the CVM-DMP protocol when any of the 'ioctls' on the DMP 'metanodes' fail. * INCIDENT NO:3254201 TRACKING ID:3121380 SYMPTOM: An I/O hang is observed on the primary after disabling paths for one data volume in the RVG. The stack trace looks like: biowait default_physio volrdwr fop_write write syscall_trap32 DESCRIPTION: If the path of the data volume is disabled when the SRL is about to overflow, it results in an internal data structure corruption. This results in an I/O hang. RESOLUTION: The code is modified to handle the I/O error caused by the disabled path at the time of the SRL overflow. * INCIDENT NO:3254204 TRACKING ID:2882312 SYMPTOM: The Storage Replicator Log (SRL) faults in the middle of the I/O load. An immediate read on data that is written during the SRL fault may return old data.
DESCRIPTION: In case of an SRL fault, the Replicated Volume Group (RVG) goes into the passthrough mode. The read/write operations are directly issued on the data volume. If the SRL is faulted while writing, and a read command is issued immediately on the same region, the read may return the old data. If a write command fails on the SRL, then VVR acknowledges the write completion and places the RVG in the passthrough mode. The data-volume write is done asynchronously after acknowledging the write completion. If the read comes before the data volume write is finished, then it can return old data, causing data corruption. It is a race condition between write and read during the SRL failure. RESOLUTION: The code is modified to restart the write in case of the SRL failure, without acknowledging the write completion. When the write is restarted, the RVG is in the passthrough mode, and the write is directly issued on the data volume. Since the acknowledgement is done only after the write completion, any subsequent read gets the latest data. * INCIDENT NO:3254205 TRACKING ID:3162418 SYMPTOM: The vxconfigd(1M) command dumps core when VxVM (Veritas Volume Manager) tries to find certain devices by their device numbers. The stack might look like the following: ddi_hash_devno() ddl_find_cdevno() ddl_find_path_cdevno() req_daname_get() vold_process_request() start_thread() DESCRIPTION: When 'vxdisk scandisks' fails to discover devices, the device tree is emptied. Incorrect validation in the device-searching procedure causes a NULL value dereference. RESOLUTION: Code changes are made to correctly detect the NULL value. * INCIDENT NO:3254231 TRACKING ID:3010191 SYMPTOM: Previously excluded paths are not excluded after an upgrade to VxVM 5.1SP1RP3. DESCRIPTION: Older versions of VxVM maintain the logical path for excluded devices, while the current version maintains the hardware path for excluded devices.
So, when the device paths were excluded from the volume manager before the update, the excluded-path entries are not recognized by the volume manager, and this leads to inconsistency. RESOLUTION: To resolve this issue, the exclude logic code is modified to resolve the logical/hardware path from the exclude file. * INCIDENT NO:3254233 TRACKING ID:3012929 SYMPTOM: When a disk name is changed while a backup operation is in progress, the vxconfigbackup(1M) command gives the following error: VxVM vxdisk ERROR V-5-1-558 Disk : Disk not in the configuration VxVM vxconfigbackup WARNING V-5-2-3718 Unable to backup Binary diskgroup configuration for diskgroup . DESCRIPTION: If disk names change during the backup, the vxconfigbackup(1M) command does not detect and refresh the changed names, and it tries to find the configuration database information from the old disk name. Consequently, the vxconfigbackup(1M) command displays an error message indicating that the old disk is not found in the configuration, and it fails to take a backup of the disk group configuration from the disk. RESOLUTION: The code is modified to ensure that the new disk names are updated and are used to find and back up the configuration copy from the disk. * INCIDENT NO:3254301 TRACKING ID:3199056 SYMPTOM: The VVR (Veritas Volume Replicator) Primary system panics with the following stack: panic_trap kernel_add_gate vol_cmn_err .kernel_add_gate skey_kmode nmcom_deliver_ack nmcom_ack_tcp nmcom_server_proc_tcp nmcom_server_proc_enter vxvm_start_thread_enter DESCRIPTION: If the primary receives the data acknowledgement prior to the network acknowledgement, VVR fabricates the network acknowledgement for the message and keeps the acknowledgement in a queue. When the real network acknowledgement arrives at the primary, VVR removes the acknowledgement from the queue. Only one thread is supposed to access this queue.
However, because of improper locking, there is a race where two threads can simultaneously update the queue, causing queue corruption. A system panic happens when accessing the corrupted queue. RESOLUTION: The code is modified to take the proper lock before entering the critical region. * INCIDENT NO:3261607 TRACKING ID:3261601 SYMPTOM: The system panics when dmp_destroy_dmpnode() attempts to free an already free virtual address and displays the following stack trace: mt_pause_trigger+0x10 () cold_wait_for_lock+0xc0 () spinlock_usav+0xb0 () kmem_arena_free+0xd0 () vxdmp_hp_kmem_free+0x30 () dmp_destroy_dmpnode+0xaa0 () dmp_decode_destroy_dmpnode+0x430 () dmp_decipher_instructions+0x650 () dmp_process_instruction_buffer+0x350 () dmp_reconfigure_db+0xb0 () gendmpioctl+0x910 () dmpioctl+0xe0 () DESCRIPTION: Due to a race condition in the code, when dmp_destroy_dmpnode() attempts to free an already free virtual address, the system panics because it tries to access a stale memory address. RESOLUTION: The code is fixed to avoid the occurrence of the race condition. * INCIDENT NO:3264166 TRACKING ID:3254311 SYMPTOM: In a Campus Cluster environment, either a manual detach or a detach because of loss of storage connectivity, followed by a site reattach, leads to a system panic. The stack trace might look like: ... voldco_or_acmbuf_to_pvmbuf+0x134() voldco_recover_detach_map+0x6e5() volmv_recover_dcovol+0x1ce() vol_mv_precommit+0x19f() vol_commit_iolock_objects+0x100() vol_ktrans_commit+0x23c() volconfig_ioctl+0x436() volsioctl_real+0x316() volsioctl+0x14() ... DESCRIPTION: When a site is reattached, possibly after a split-brain, it is possible that a site-consistent volume is updated on each site independently. In such a case, the tracking maps need to be recovered from each site to take care of updates done from both the sites. These maps are stored in a Data Change Object (DCO). During the recovery of a DCO, it uses a contiguous chunk of memory to read and update the DCO map.
This chunk of memory can handle the DCO recovery as long as the volume size is less than 1.05 TB. When the volume size is larger than 1.05 TB, the map size grows larger than the statically allocated memory buffer. In such a case, the map overruns the memory buffer, leading to the system panic.
RESOLUTION: The code is modified to ensure that the buffer is accessed within its limits and, if required, another iteration of the DCO recovery is done.

* INCIDENT NO:3271596 TRACKING ID:3271595
SYMPTOM: When a volume on a thin reclaimable disk is deleted, and the thin reclaim flag is removed from the disk which hosted this volume, an attempt to remove the disk from the disk group displays the following error: # vxdg -g rmdisk VxVM vxdg ERROR V-5-1-0 Disk is used by one or more subdisks which are pending to be reclaimed. Use "vxdisk reclaim " to reclaim space used by these subdisks, and retry "vxdg rmdisk" command. Note: reclamation is irreversible. An attempt to reclaim the disk using the following command then fails with this error: # vxdisk reclaim Disk : Failed.
DESCRIPTION: When the volume on a thin reclaimable disk is deleted, the underlying disk is marked for reclamation. Due to the manual removal of the thin reclaim flag, the reclamation cannot proceed and the pending subdisks associated with this disk cannot be removed.
RESOLUTION: The code is modified such that any attempt to manually turn off the thin reclaim flag fails on disks which have subdisks pending reclamation.

* INCIDENT NO:3306164 TRACKING ID:2972513
SYMPTOM: In CVM, PGR keys from shared data disks are not removed after stopping VCS.
DESCRIPTION: In a clustered environment with fencing enabled, improper PGR key registration was happening for the slave node. Due to this, when hastop was issued on the master and then on the slave, the slave was not able to clear the PGR keys on the disk.
For example, the following PGR key mismatch is seen when reading the disk keys: key[0]: [Numeric Format]: 66,80,71,82,48,48,48,0 [Character Format]: BPGR000 Use only the numeric format to perform operations. The key has null characters which are represented as spaces in the character format. [Node Format]: Cluster ID: unknown Node ID: 1 Node Name: sles92219 key[1]: [Numeric Format]: 65,80,71,82,48,48,48,49 [Character Format]: APGR0001 [Node Format]: Cluster ID: unknown Node ID: 0 Node Name: sles92218 sles92219:~ #
RESOLUTION: Code changes are done to correctly register PGR keys on the slave.

* INCIDENT NO:3309931 TRACKING ID:2959733
SYMPTOM: When device paths are moved across LUNs or enclosures, the vxconfigd(1M) daemon can dump core, or data corruption can occur due to internal data structure inconsistencies. The following stack trace is observed: ddl_reconfig_partial () ddl_reconfigure_all () ddl_find_devices_in_system () find_devices_in_system () req_discover_disks () request_loop () main ()
DESCRIPTION: When the device path configuration is changed after a planned or unplanned disconnection by moving only a subset of the device paths across LUNs or other storage arrays (enclosures), DMP's internal data structure becomes inconsistent. This causes the vxconfigd(1M) daemon to dump core. In some instances, data corruption occurs due to incorrect LUN-to-path mappings.
RESOLUTION: The vxconfigd(1M) code is modified to detect such situations gracefully and modify the internal data structures accordingly, to avoid a vxconfigd(1M) daemon core dump and data corruption.

* INCIDENT NO:3344127 TRACKING ID:2969844
SYMPTOM: The DMP database gets destroyed if the discovery fails for some reason. The ddl.log shows numerous entries as follows: DESTROY_DMPNODE: 0x3000010 dmpnode is to be destroyed/freed DESTROY_DMPNODE: 0x3000d30 dmpnode is to be destroyed/freed Numerous vxio errors are seen in the syslog as all VxVM I/Os fail afterwards.
DESCRIPTION: VxVM deletes the old device database before it builds the new device database. If the discovery process fails for some reason, this results in a null DMP database.
RESOLUTION: The code is modified to take a backup of the old device database before doing the new discovery. Therefore, if the discovery fails, the old database is restored and an appropriate message is displayed on the console.

* INCIDENT NO:3344128 TRACKING ID:2643506
SYMPTOM: vxconfigd dumps core when LUNs from the same enclosure are presented as different types, say A/P and A/P-F.
DESCRIPTION: The VxVM configuration daemon, vxconfigd(1M), dumps core because Dynamic Multi-Pathing (DMP) does not support a setup where LUNs from the same enclosure are configured as different types.
RESOLUTION: The code is modified to ensure that the user receives a warning message when this situation arises. Example: Enclosure with cabinet serial number CK200070800815 has LUNs of type CLR-ALUA and CLR-A/PF. Enclosures having more than one array type are not supported.

* INCIDENT NO:3344129 TRACKING ID:2910367
SYMPTOM: In a VVR environment, when the Storage Replicator Log (SRL) is inaccessible or after the paths to the SRL volume of the secondary node are disabled, the secondary node panics with the following stack trace: __bad_area_nosemaphore vsnprintf page_fault vol_rv_service_message_start thread_return sigprocmask voliod_iohandle voliod_loop kernel_thread
DESCRIPTION: The SRL failure is handled differently on the primary node and the secondary node. On the secondary node, if there is no SRL, replication is not allowed and the Rlink is detached. The code region is common for both, and at one place the flags are not properly set during the transaction phase. This creates an assumption that the SRL is still connected, and the code tries to access the structure. This leads to the panic.
RESOLUTION: The code is modified to mark the necessary flag properly in the transaction phase.
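The back-up-then-restore approach described for incident 3344127 can be sketched as follows. This is a minimal illustration in C; devdb_t, rebuild_db, and the single-flag stand-in for the discovery outcome are hypothetical names, not VxVM code:

```c
#include <string.h>

#define DB_SIZE 8

/* Hypothetical device database: a fixed table of device IDs (0 = empty). */
typedef struct { int devs[DB_SIZE]; } devdb_t;

/* Rebuild the database. The old contents are snapshotted first, so a
 * failed discovery restores them instead of leaving a null database.
 * 'discovery_ok' stands in for the outcome of the real discovery pass.
 * Returns 0 on success, -1 if discovery failed and the backup was restored. */
int rebuild_db(devdb_t *db, int discovery_ok)
{
    devdb_t backup = *db;        /* back up before destroying */
    memset(db, 0, sizeof(*db));  /* the old database is deleted */
    if (!discovery_ok) {
        *db = backup;            /* discovery failed: restore old database */
        return -1;
    }
    db->devs[0] = 42;            /* pretend discovery found one device */
    return 0;
}
```

Without the backup, a failed discovery would leave the zeroed table in place, which is the null-database condition the fix avoids.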
* INCIDENT NO:3344130 TRACKING ID:2825102
SYMPTOM: In a CVM environment, some or all VxVM volumes become inaccessible on the master node. VxVM commands on the master node as well as the slave node(s) hang. On the master node, vxiod and vxconfigd sleep and the following stack traces are observed: "vxconfigd" on master: sleep_one vol_ktrans_iod_wakeup vol_ktrans_commit volconfig_ioctl volsioctl_real volsioctl vols_ioctl spec_ioctl vno_ioctl ioctl syscall "vxiod" on master: sleep vxvm_delay cvm_await_mlocks volmvcvm_cluster_reconfig_exit volcvm_master volcvm_vxreconfd_thread
DESCRIPTION: VxVM maintains a list of all the volume devices in the volume device list. This list can be corrupted by simultaneous access from the CVM reconfiguration code path and the VxVM transaction code path. This results in inaccessibility of some or all the volumes.
RESOLUTION: The code is modified to avoid simultaneous access to the volume device list from the CVM reconfiguration code path and the VxVM transaction code path.

* INCIDENT NO:3344132 TRACKING ID:2860230
SYMPTOM: In a Cluster Volume Manager (CVM) environment, the shared disk remains opaque after execution of the vxdiskunsetup(1M) command on a master node.
DESCRIPTION: In a Cluster Volume Manager (CVM) environment, if a disk group has opaque disks and the disk group is destroyed on the master node, followed by execution of the vxdiskunsetup(1M) command, then the slave still views the disk as opaque.
RESOLUTION: The code is modified to ensure that the slave receives a signal to remove the opaque disk.

* INCIDENT NO:3344134 TRACKING ID:3011405
SYMPTOM: Execution of the "vxtune -o export" command fails with the following error message: "VxVM vxtune ERROR Unable to rename temp file to .Cross-device link VxVM vxtune ERROR Unable to export tunables".
DESCRIPTION: During the export of component- or feature-specific tunables, initially, all the VxVM tunables are dumped into the file provided by the user.
The component or feature tunables are then extracted into a temporary file. The temporary file is then renamed to the file provided by the user. If the file provided by the user resides on a different file system than the hard-coded temporary file location, the renaming fails because links cannot span file systems.
RESOLUTION: The code is modified to create the temporary file dynamically in the same directory as the user-provided file, instead of using a hard-coded temporary file location. In this way, a cross-device link does not occur and the 'rename' operation does not fail.

* INCIDENT NO:3344138 TRACKING ID:3041014
SYMPTOM: Sometimes a "relayout" command may fail with the following error messages, which carry little information: 1. VxVM vxassist ERROR V-5-1-15309 Cannot allocate 4294838912 blocks of disk space required by the relayout operation for column expansion: Not enough HDD devices that meet specification. VxVM vxassist ERROR V-5-1-4037 Relayout operation aborted. (7) 2. VxVM vxassist ERROR V-5-1-15312 Cannot allocate 644225664 blocks of disk space required for the relayout operation for temp space: Not enough HDD devices that meet specification. VxVM vxassist ERROR V-5-1-4037 Relayout operation aborted. (7)
DESCRIPTION: In some executions of the vxrelayout(1M) command, the error messages do not provide sufficient information. For example, when enough space is not available, the vxrelayout(1M) command displays an error which reports less disk space than is actually required. Hence the "relayout" operation can still fail after the disk space is increased.
RESOLUTION: The code is modified to display the correct space required for the "relayout" operation to complete successfully.

* INCIDENT NO:3344140 TRACKING ID:2966990
SYMPTOM: In a VVR environment, the I/O hangs at the primary side after multiple cluster reconfigurations are triggered in parallel.
The stack trace is as follows: delay vol_rv_transaction_prepare vol_commit_iolock_objects vol_ktrans_commit volconfig_ioctl volsioctl_real volsioctl fop_ioctl ioctl
DESCRIPTION: With I/O on the master node and the slave node, rebooting the slave node triggers the cluster reconfiguration, which in turn triggers the RVG recovery. Before the reconfiguration is complete, the slave node joins back again, which interrupts the leave reconfiguration in the middle of the operation. The node join reconfiguration does not trigger any RVG recovery, so the recovery is skipped. The regular I/Os wait for the recovery to be completed. This situation leads to a hang.
RESOLUTION: The code is modified such that the join reconfiguration performs the RVG recovery if there are any pending RVG recoveries.

* INCIDENT NO:3344142 TRACKING ID:3178029
SYMPTOM: When you synchronize a replicated volume group (RVG), the diff figure exceeds 100%, with output like the following: [2013-03-12 15:33:48] [17784] 03:07:35 180.18.18.161 swmdb_data_ swmdb_data_ 1379M/3072M 860% 4%
DESCRIPTION: The value of 'different blocks' is defined as the 'unsigned long long' type, but the statistic function defines it as the 'int' type; therefore the value is truncated, which causes the incorrect output.
RESOLUTION: The code is modified so that the statistic function no longer treats the 'unsigned long long' value as an integer.

* INCIDENT NO:3344143 TRACKING ID:3101419
SYMPTOM: In a CVR environment, I/Os to the data volumes in an RVG may temporarily experience a hang during an SRL overflow under heavy I/O load.
DESCRIPTION: The SRL flush occurs at a slower rate than the incoming I/Os from the master node and the slave nodes. I/Os initiated on the master node get starved for a long time, which appears as an I/O hang. The I/O hang disappears once the SRL flush is complete.
RESOLUTION: The code is modified to provide a fair schedule for the I/Os initiated on the master node and the slave nodes.
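The truncation in incident 3344142 above can be illustrated with a small C sketch. The function names are hypothetical, and the sketch uses an unsigned 32-bit field to keep the wrap-around well defined; the product code used a signed int, which can additionally go negative and produce figures such as 860%:

```c
/* Percentage of blocks synchronized. The counter is a 64-bit value. */
unsigned long long percent_done(unsigned long long done,
                                unsigned long long total)
{
    return done * 100 / total;            /* correct: 64-bit arithmetic */
}

/* Buggy variant: the 64-bit counter is squeezed into a 32-bit field,
 * as in the statistic function, so large counts wrap modulo 2^32 and
 * the reported percentage is wrong. */
unsigned long long percent_done_wrong(unsigned long long done,
                                      unsigned long long total)
{
    unsigned int truncated = (unsigned int)done;  /* BUG: silent truncation */
    return (unsigned long long)truncated * 100 / total;
}
```

For a 6 GiB count out of 8 GiB, the correct figure is 75%, while the truncated field wraps to 2 GiB and reports 25%.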
* INCIDENT NO:3344145 TRACKING ID:3076093
SYMPTOM: The patch upgrade script "installrp" can panic the system while doing a patch upgrade. The following panic stack trace is observed: devcclose spec_close vnop_close vno_close closef closefd fs_exit kexitx kexit
DESCRIPTION: When an upgrade is performed, the VxVM device drivers are not loaded, but the patch-upgrade process tries to start or stop the eventsource (vxesd) daemon. This can result in a system panic.
RESOLUTION: The code is modified so that the eventsource (vxesd) daemon does not start unless the VxVM device drivers are loaded.

* INCIDENT NO:3344148 TRACKING ID:3111062
SYMPTOM: When diffsync is executed, vxrsync gets the following error on lossy networks: VxVM VVR vxrsync ERROR V-5-52-2074 Error opening socket between [HOST1] and [HOST2] -- [Connection timed out]
DESCRIPTION: The current socket connection mechanism gives up after a single try. When the single attempt to connect fails, the command fails as well.
RESOLUTION: The code is modified to retry the connection up to 10 times.

* INCIDENT NO:3344150 TRACKING ID:2992667
SYMPTOM: When the SAN framework of VIS is changed from an FC switch to a direct connection, the new DMP disk cannot be retrieved by running the "vxdisk scandisks" command.
DESCRIPTION: Initially, the DMP node had multiple paths. Later, when the SAN framework of VIS is changed from the FC switch to the direct connection, the number of paths of each affected DMP node is reduced to 1. At the same time, some new disks are added to the SAN. The newly added disks reuse the device numbers of the removed devices (paths). As a result, the "vxdisk list" command does not show the newly added disks even after the "vxdisk scandisks" command is executed.
RESOLUTION: The code is modified so that DMP can handle the device number reuse scenario in a proper manner.
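The fix for incident 3111062 replaces a single connection attempt with a bounded retry loop. A minimal sketch in C follows; connect_with_retry and the attempt callback are illustrative stand-ins for the real socket connect logic, not the vxrsync source:

```c
/* Try the connect callback up to max_tries times; return 0 as soon as
 * one attempt succeeds, -1 if every attempt fails. */
int connect_with_retry(int (*attempt)(void *ctx), void *ctx, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (attempt(ctx) == 0)
            return 0;            /* connected */
        /* a real implementation would sleep or back off here */
    }
    return -1;                   /* all attempts failed */
}

/* Stand-in for a lossy network: fails while *remaining_failures > 0. */
int flaky_attempt(void *ctx)
{
    int *remaining_failures = ctx;
    return (*remaining_failures)-- > 0 ? -1 : 0;
}
```

With max_tries set to 10 as in the fix, transient failures (up to 9 in a row) no longer fail the command.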
* INCIDENT NO:3344161 TRACKING ID:2882412
SYMPTOM: The 'vxdisk destroy' command uninitializes a disk which belongs to a deported disk group.
DESCRIPTION: The 'vxdisk destroy' command does not check whether the Veritas Volume Manager (VxVM) disk belongs to a deported disk group. This can lead to accidental uninitialization of such a disk.
RESOLUTION: The code is modified to prevent destruction of a disk which belongs to a deported disk group. The 'vxdisk destroy' command now warns you that the disk is part of another disk group and fails the operation. However, you can still force-destroy the disk by using the -f option.

* INCIDENT NO:3344167 TRACKING ID:2979824
SYMPTOM: While excluding a controller using the vxdiskadm(1M) utility, unintended paths get excluded.
DESCRIPTION: The issue occurs due to a logical error in the grep command used to retrieve the hardware path of the controller to be excluded. In some cases, the vxdiskadm(1M) utility takes the wrong hardware path for the controller that is excluded, and hence excludes unintended paths. Suppose there are two controllers, c189 and c18, the controller c189 is listed above c18 in the command output, and the controller c18 is excluded; then the hardware path of the controller c189 is passed to the function, and the wrong controller ends up being excluded.
RESOLUTION: The script is modified so that the vxdiskadm(1M) utility now takes the hardware path of the intended controller only, and unintended paths do not get excluded.

* INCIDENT NO:3344175 TRACKING ID:3114134
SYMPTOM: The Smart Autosync feature fails to work on large volumes (size > 1 TB) and instead replicates the entire volume.
DESCRIPTION: Veritas File System (VxFS) reports data in use for 1-MB chunks only, whereas VVR operates on a smaller block size. Thus, even if only an 8-KB block in a 1-MB chunk is dirty, VxFS reports the entire 1 MB as data in use, and VVR replicates the entire 1 MB.
RESOLUTION: The code is modified such that the VVR-VxFS integration now handles chunks smaller than 1 MB.

* INCIDENT NO:3344272 TRACKING ID:2106530
SYMPTOM: The vxresize(1M) command fails on a data volume in an encapsulated root disk group, say rootdg, if the file system on that data volume is mounted using the block device reference "bootdg".
DESCRIPTION: If the file system in an encapsulated root disk group, say rootdg, is mounted using the block device reference "bootdg", or vice versa, then the vxresize(1M) command fails on the data volume in that encapsulated root disk group. VxVM ignores this reference and fails to execute the operation.
RESOLUTION: The code has been changed so that the referencing information is obtained and the necessary steps for the vxresize(1M) operation are performed.

* INCIDENT NO:3344273 TRACKING ID:2165920
SYMPTOM: The vxrelocd(1M) daemon creates a defunct (zombie) process.
DESCRIPTION: The issue is seen when the vxrelocd(1M) daemon is waiting for its child, the vxnotify(1M) daemon, to complete a process, and vxnotify(1M), in turn, does not get the exit status of its own child process. As that child process is forked in the background, this leaves a defunct child process of the vxnotify(1M) daemon. The defunct child process is visible in the process tree of the vxrelocd(1M) daemon.
RESOLUTION: The code is modified such that the background process is forked only when it is necessary.

* INCIDENT NO:3344274 TRACKING ID:3185471
SYMPTOM: Private disk groups with iSCSI disks are imported by the wrong node. Disks with the noautoimport flag are imported if the host discovers iSCSI devices during system startup.
DESCRIPTION: The iSCSI disk discovery function does not validate the host ID, and an incorrect command is used to get the noautoimport flag.
RESOLUTION: The code has been changed to check the host ID and get the noautoimport flag when the host discovers iSCSI devices during system startup.
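The defunct-process issue in incident 2165920 comes down to a forked child that is never waited for. A minimal POSIX C sketch (run_and_reap is a hypothetical name, not vxrelocd or vxnotify code) showing how reaping with waitpid() prevents the zombie:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately, then reap it with waitpid().
 * Skipping the waitpid() call is what leaves a defunct (zombie) entry
 * in the process table. Returns the child's exit code, or -1 on error. */
int run_and_reap(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;               /* fork failed */
    if (pid == 0)
        _exit(7);                /* child: exit with a known status */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;               /* parent: reap the child -> no zombie */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A child stays defunct only until its parent collects the exit status, which is why the fix avoids forking background children whose status nobody collects.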
* INCIDENT NO:3344275 TRACKING ID:3038382
SYMPTOM: The vxlufinish(1M) command runs 'fuser -k' on non-root file systems, which is unexpected.
DESCRIPTION: The lucreate(1M) command called via vxlustart adds entries for mount points (which are not from the boot disk) to vfstab. These entries were used by vxlustart to run 'fuser -k'.
RESOLUTION: Changes have been made to the vxlufinish(1M) command to prevent the execution of 'fuser -k' on non-root file systems in the ABE.

* INCIDENT NO:3344276 TRACKING ID:3002498
SYMPTOM: When a disk is initialized with the "vxdisk -f init " command, vxconfigd(1M) dumps core with the following stack trace: generate_and_write_vtoc() set_slice_parms() devintf_vxvm_tags() auto_sys_destroy() auto_init_format() auto_init_internal() auto_reinit() req_disk_init() request_loop() main() start()
DESCRIPTION: This core dump can happen if a disk initially had the Cross-platform Data Sharing (CDS) format, but the AIX and HP co-existence labels were wiped off through some back-door operations without VxVM knowing it, or the disk started experiencing I/O errors. While re-initializing the disk, the VxVM in-core information indicates that the disk has the CDS format, but the INIT operation recognizes the original format as SIMPLE, since some of the signatures indicating the CDS format have been wiped off. While destroying the original format, VxVM writes the LABEL using the custom VxVM function for the CDS format. This LABEL is constructed based on the geometry obtained from the disk. But for SIMPLE formatted disks, the geometry of the disk is not available, as the OS IOCTL is used directly. Hence, while constructing the LABEL, vxconfigd dumps core.
RESOLUTION: VxVM skips using the geometry structure if it is not available and uses the standard IOCTL for writing out the LABEL for CDS formatted disks.

* INCIDENT NO:3344277 TRACKING ID:3138849
SYMPTOM: The "ERROR: Configuration daemon is not accessible" message is displayed during the boot process.
DESCRIPTION: If the vxesd daemon does not shut down properly, the es_rcm.pl script is not removed and gets triggered before the vxconfigd daemon comes up. As the script tries to execute some vxdmpadm commands while vxconfigd is not up, it displays the error message.
RESOLUTION: The script is modified to execute the vxdmpadm commands only if vxconfigd is running.

* INCIDENT NO:3344278 TRACKING ID:3131071
SYMPTOM: VxVM patch installation in a Solaris Alternate Boot Environment (ABE) results in data corruption.
DESCRIPTION: While upgrading VxVM in a Solaris ABE, the vxddladm assign names command is invoked, which in turn calls the binary in the Primary Boot Environment (PBE). If the binary in the PBE is an old version, data corruption may occur when you add or remove LUNs.
RESOLUTION: Code in the install script is changed to execute the vxddladm assign names command only if the installation is in the current boot environment.

* INCIDENT NO:3344280 TRACKING ID:2422535
SYMPTOM: On Solaris, the vxrelocd progress parameter is hard coded in the vxvm-recover binary, and after installing VxVM patches or the latest packages, the specific option is lost.
DESCRIPTION: On Solaris, you can modify the vxrelocd parameters in the vxvm-recover binary. But after installing a Veritas Volume Manager (VxVM) patch or the latest packages, the new vxvm-recover binary overwrites the modified vxvm-recover. As a result, the specific parameter is lost.
RESOLUTION: On Solaris, a configuration file (/etc/vx/vxvm-recover.conf) is added to hold the modified parameter.

* INCIDENT NO:3344286 TRACKING ID:2933688
SYMPTOM: When the 'Data corruption protection' check is activated by DMP, the device-discovery operation aborts, but I/O to the affected devices continues; this results in data corruption. The following message is displayed: Data Corruption Protection Activated - User Corrective Action Needed: To recover, first ensure that the OS device tree is up to date (requires OS specific commands).
Then, execute 'vxdisk rm' on the following devices before reinitiating device discovery using 'vxdisk scandisks'
DESCRIPTION: When the 'Data corruption protection' check is activated by DMP, the device-discovery operation aborts after displaying a message. However, the device-discovery operation does not stop I/Os from being issued on the DMP device for all those affected devices whose discovery information changed unexpectedly and is no longer valid.
RESOLUTION: The code is modified so that DMP forcibly fails the I/Os on devices whose discovery information changed unexpectedly. This prevents any further damage to the data.

* INCIDENT NO:3347380 TRACKING ID:3031796
SYMPTOM: When a snapshot is reattached using the "vxsnap reattach" command-line interface (CLI), the operation fails with the following error message: "VxVM vxplex ERROR V-5-1-6616 Internal error in get object. Rec "
DESCRIPTION: When a snapshot is reattached to the volume, the volume manager checks the consistency by locking all the related snapshots. If any related snapshots are not available, the operation fails.
RESOLUTION: The code is modified to ignore any inaccessible snapshot. This prevents any inconsistency during the operation.

* INCIDENT NO:3349877 TRACKING ID:2685230
SYMPTOM: In a Cluster Volume Replicator (CVR) environment, if the Storage Replicator Log (SRL) is resized with the logowner set on the CVM slave node, followed by a CVM master node switch operation, then there can be an SRL corruption that leads to the Rlink detach.
DESCRIPTION: In a CVR environment, if the SRL is resized when the logowner is set on the CVM slave node, and if this is followed by the master node switch operation, then the new master node does not have the correct mapping of the SRL volume. As a result, I/Os issued on the new master node corrupt the SRL-volume contents and detach the Rlink.
RESOLUTION: The code is modified to correctly update the SRL mapping so that the SRL corruption does not occur.

* INCIDENT NO:3349917 TRACKING ID:2952553
SYMPTOM: The vxsnap(1M) command allows refreshing a snapshot from a volume other than the source volume. An example is as follows: # vxsnap refresh source=
DESCRIPTION: The vxsnap(1M) command allows refreshing a snapshot from a volume other than the source volume. This can result in an unintended loss of the snapshot.
RESOLUTION: The code is modified to print a message requesting the user to use the "-f" option. This prevents any accidental loss of the snapshot.

* INCIDENT NO:3349939 TRACKING ID:3225660
SYMPTOM: The Dynamic Reconfiguration (DR) tool does not list thin provisioned logical unit numbers (LUNs) during a LUN removal operation.
DESCRIPTION: Because of a change in the output pattern of a Dynamic MultiPathing (DMP) command, its output gets parsed incorrectly. Due to this, thin provisioned LUNs get filtered out.
RESOLUTION: The code is modified to parse the output correctly.

* INCIDENT NO:3349985 TRACKING ID:3065072
SYMPTOM: Data loss occurs during the import of a clone disk group when some of the disks are missing and the import "useclonedev" and "updateid" options are specified. The following error message is displayed: VxVM vxdg ERROR V-5-1-10978 Disk group pdg: import failed: Disk for disk group not found
DESCRIPTION: During the clone disk group import, if the "updateid" and "useclonedev" options are specified and some disks are unavailable, this causes permanent data loss. The disk group ID is updated on the available disks during the import operation. The missing disks contain the old disk group ID, and hence are not included in later attempts to import the disk group with the new disk group ID.
RESOLUTION: The code is modified such that any partial import of a clone disk group with the "updateid" option is no longer allowed without the "f" (force) option.
If the user forces the partial import of the clone disk group using the "f" option, the missing disks are not included in the later attempts to import the clone disk group with the new disk group ID.

* INCIDENT NO:3349990 TRACKING ID:2054606
SYMPTOM: During the DMP driver unload operation, the system panics with the following stack trace: kmem_free dmp_remove_mp_node dmp_destroy_global_db dmp_unload vxdmp`_fini moduninstall modunrload modctl syscall_trap
DESCRIPTION: The system panics during the DMP driver unload operation, when its internal data structures are destroyed, because DMP attempts to free the memory associated with a DMP device that is marked for deletion from DMP.
RESOLUTION: The code is modified to check the DMP device state before any attempt is made to free the memory associated with it.

* INCIDENT NO:3350000 TRACKING ID:3323548
SYMPTOM: In the Cluster Volume Replicator (CVR) environment, a cluster-wide vxconfigd hang occurs on the primary when you start the cache object. Primary master vxconfigd stack: Schedule() volsync_wait() volsiowait() vol_cache_linkdone() vol_commit_link_objects() vol_ktrans_commit() volconfig_ioctl() volsioctl_real() vols_ioctl() vols_compat_ioctl() compat_sys_ioctl() sysenter_do_call() Primary slave vxconfigd stack: Schedule() volsync_wait() vol_kmsg_send_wait() volktcvm_master_request() volktcvm_iolock_wait() vol_ktrans_commit() volconfig_ioctl() volsioctl_real() vols_ioctl() vols_compat_ioctl() compat_sys_ioctl() sysenter_do_call()
DESCRIPTION: This is an I/O hang issue on the primary master when you start the cache object. The I/O code path is stuck due to incorrect initialization of the related flags.
RESOLUTION: The code is modified to correctly initialize the flags during the cache object initialization.

* INCIDENT NO:3350019 TRACKING ID:2020017
SYMPTOM: A cluster node panics with the following stack when mirrored volumes are configured in the cluster.
panic+0xb4 () bad_kern_reference+0xd4 () pfault+0x140 () trap+0x8a4 () thandler+0x96c () volmv_msg_dc+0xa8 () <--- Trap in Kernel mode vol_mv_kmsg_request+0x930 () vol_kmsg_obj_request+0x3cc () vol_kmsg_request_receive+0x4c0 () vol_kmsg_ring_broadcast_receive+0x6c8 () vol_kmsg_receiver+0xa40 () kthread_daemon_startup+0x24 () kthread_daemon_startup+0x0 ()
DESCRIPTION: When a mirrored volume is opened or closed on any of the nodes in the cluster, a message is sent to all the nodes in the cluster. While receiving the message, a 32-bit integer field is de-referenced as a long, and hence the cluster node panics.
RESOLUTION: The code is modified to access the field appropriately, as a 32-bit integer.

* INCIDENT NO:3350027 TRACKING ID:3239521
SYMPTOM: When you do the PowerPath pre-check, the Dynamic Reconfiguration (DR) tool displays the following error message: 'Unable to run command [/sbin/powermt display]' and exits. The message details can be as follows: WARN: Please Do not Run any Device Discovery Operations outside the Tool during Reconfiguration operations INFO: The logs of current operation can be found at location /var/adm/vx/dmpdr_20130626_1446.log INFO: Collecting OS Version Info - done INFO: Collecting Arch type Info - done INFO: Collecting SF Product version Info - done. INFO: Checking if Multipathing is PowerPath Unable to run command [/sbin/powermt display 2>&1]
DESCRIPTION: This error is seen when PowerPath is unable to display devices because PowerPath is not started on the system.
RESOLUTION: The code is modified so that the powermt command is used only to warn you about the devices that are under PowerPath control. If no device gets displayed, you can ignore the message.

* INCIDENT NO:3350232 TRACKING ID:2993667
SYMPTOM: VxVM allows setting the Cross-platform Data Sharing (CDS) attribute for a disk group even when a disk is missing because it experienced I/O errors.
The following command succeeds even with an inaccessible disk: vxdg -g set cds=on
DESCRIPTION: When the CDS attribute is set for a disk group, VxVM does not fail the operation if some disk is not accessible. If a disk genuinely had an I/O error and failed, VxVM should not allow setting the disk group as CDS, because the state of the failed disk cannot be determined. If a disk with a non-CDS format fails while all the other disks in the disk group have the CDS format, the disk group can still be set as CDS. If the failed disk returns and the disk group is re-imported, the result is a CDS disk group that contains a non-CDS disk. This violates the basic definition of a CDS disk group and results in data corruption.
RESOLUTION: The code is modified such that VxVM fails to set the CDS attribute for a disk group if it detects a disk that is inaccessible because of an I/O error. Hence, the operation fails with an error as follows: # vxdg -g set cds=on Cannot enable CDS because device corresponding to is in-accessible.

* INCIDENT NO:3350235 TRACKING ID:3084449
SYMPTOM: The shared flag remains set during the import of a private disk group, because for a shared disk group the flag failed to clear due to a minor number conflict error during the import abort operation.
DESCRIPTION: The shared flag that is set during the import operation fails to clear due to a minor number conflict error during the import abort operation.
RESOLUTION: The code is modified such that the shared flag is cleared during the import abort operation.

* INCIDENT NO:3350241 TRACKING ID:3067784
SYMPTOM: The grow and shrink operations of the vxresize(1M) utility may dump core in the vfprintf() function. The following stack trace is observed: vfprintf () volumivpfmt () volvpfmt () volpfmt () main ()
DESCRIPTION: The vfprintf() function dumps core because the format specified to print the file system type is incorrect. The integer/hexadecimal value is printed as a string, using %s.
RESOLUTION: The code is modified to print the file system type as a hexadecimal value, using %x.

* INCIDENT NO:3350265 TRACKING ID:2898324
SYMPTOM: A set of memory-leak issues in the user-land daemon "vradmind", reported by the Purify tool.
DESCRIPTION: The issues are reported due to improper or missing initialization of the allocated memory.
RESOLUTION: The code is modified to ensure that proper initialization is done for the allocated memory.

* INCIDENT NO:3350288 TRACKING ID:3120458
SYMPTOM: When the log overflow protection is set to "dcm", the vxconfigd daemon hangs with the following stack as one of the slaves leaves the cluster: vol_rwsleep_wrlock() vol_ktrans_commit() volsioctl_real() fop_ioctl() ioctl()
DESCRIPTION: The issue is due to a race between the reconfiguration that is triggered by a slave leaving the cluster and the Storage Replicator Log (SRL) overflow. The SRL overflow protection is set to "dcm", which means that if the SRL is about to overflow and the Rlink is in the connect state, the I/Os should be throttled until about 20 MB becomes available in the SRL or the SRL drains by 5%. The mechanism initiates throttling at slave nodes that are shipping metadata, and the throttling never gets reset due to the above-mentioned race.
RESOLUTION: The code is modified to throttle metadata shipping requests whenever a CVM reconfiguration is in progress, the SRL is about to overflow, and the latency protection is "dcm".

* INCIDENT NO:3350293 TRACKING ID:2962010
SYMPTOM: Replication hangs when the Storage Replicator Log (SRL) is resized. An example is as follows: # vradmin -g vvrdg -l repstatus rvg ... Replication status: replicating (connected) Current mode: asynchronous Logging to: SRL ( 813061 Kbytes behind, 19 % full Timestamp Information: behind by 0h 0m 8s
DESCRIPTION: When an SRL is resized, its internal mapping gets changed and a new stream of data gets started.
Generally, the old mapping is reverted to as soon as the conditions required for the resize are satisfied. However, if the SRL gets wrapped around, the conditions are not satisfied immediately. The old mapping is referred to once all the required conditions are satisfied, and the data is sent with the old mapping without starting a new stream. This causes a replication hang, as the secondary node continues to expect the data according to the new stream. Once the hang occurs, the replication status remains unchanged, even though the Rlink is connected.
RESOLUTION:
The code is modified to start a new stream of data whenever the old mapping is reverted.

* INCIDENT NO:3350787 TRACKING ID:2969335
SYMPTOM:
A node that leaves the cluster while an instant snapshot operation is in progress hangs in the kernel and cannot rejoin the cluster unless it is rebooted. The following stack trace is displayed in the kernel on the node that leaves the cluster:
voldrl_clear_30()
vol_mv_unlink()
vol_objlist_free_objects()
voldg_delete_finish()
volcvmdg_abort_complete()
volcvm_abort_sio_start()
voliod_iohandle()
voliod_loop()
DESCRIPTION:
In a clustered environment, during any instant snapshot operation that requires metadata modification, such as a snapshot refresh, restore, or reattach, the I/O activity on the volumes involved in the operation is temporarily blocked, and once the metadata modification is complete the I/Os are resumed. If a node leaves the cluster during this phase, it does not find itself in the I/O hold-off state, cannot properly complete the leave operation, and hangs. As a side effect, the node is not able to rejoin the cluster.
RESOLUTION:
The code is modified to properly unblock I/Os on the node that leaves. This avoids the hang.
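The hold-off behavior described for incident 2969335 can be sketched as a small model: I/O on the involved volumes is queued while snapshot metadata is being modified, and a node that leaves mid-operation must reopen the gate and drain its queued I/Os instead of assuming it is not in the hold-off state. All names below are illustrative, not actual VxVM interfaces.

```python
# Hypothetical sketch of the per-volume I/O hold-off used around snapshot
# metadata updates, and the fixed node-leave path (incident 2969335).

class VolumeIoGate:
    """Models the I/O hold-off gate for a volume during a metadata update."""

    def __init__(self):
        self.held_off = False   # True while metadata is being modified
        self.pending = []       # I/Os queued while the gate is closed

    def begin_metadata_update(self):
        self.held_off = True

    def end_metadata_update(self):
        # Normal path: resume all queued I/Os after the metadata change.
        self.held_off = False
        resumed, self.pending = self.pending, []
        return resumed

    def submit_io(self, io):
        if self.held_off:
            self.pending.append(io)   # queued until the gate reopens
            return "queued"
        return "issued"


def node_leave(gate):
    # The fix: a leaving node unconditionally reopens the gate and drains
    # its queued I/Os, so the leave can complete without hanging.
    if gate.held_off:
        return gate.end_metadata_update()
    return []
```

In this model, a leave that arrives between begin_metadata_update() and end_metadata_update() still releases every queued I/O, which is the property the fix restores.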
* INCIDENT NO:3350789 TRACKING ID:2938710
SYMPTOM:
The vxassist(1M) command dumps core with the following stack during the relayout operation:
relayout_build_unused_volume()
relayout_trans()
vxvmutil_trans()
trans()
transaction()
do_relayout()
main()
DESCRIPTION:
During the relayout operation, the vxassist(1M) command sends a request to the vxconfigd(1M) daemon to get the object record of the volume. If the request fails, the vxassist(1M) command tries to print the error message using the name of the object from the retrieved record. This causes a NULL pointer dereference and subsequently dumps core.
RESOLUTION:
The code is modified to print the error message using the name of the object from a known reference.

* INCIDENT NO:3350979 TRACKING ID:3261485
SYMPTOM:
The vxcdsconvert(1M) utility fails with the following error messages:
VxVM vxcdsconvert ERROR V-5-2-2777 : Unable to initialize the disk as a CDS disk
VxVM vxcdsconvert ERROR V-5-2-2780 : Unable to move volume off of the disk
VxVM vxcdsconvert ERROR V-5-2-3120 Conversion process aborted
DESCRIPTION:
As part of the conversion process, the vxcdsconvert(1M) utility moves all the volumes to some other disk before the disk is initialized with the CDS format. On VxVM formatted disks other than the CDS format, the VxVM volume starts immediately in the PUBLIC partition. If an LVM or file system signature was stamped on the disk, the signature is not erased even after the data migration to some other disk within the disk group. As part of the vxcdsconvert operation, when the disk is destroyed, only the SLICED tags are erased but the partition table still exists. The disk is then recognized to have a file system or LVM on the partition where the PUBLIC region existed earlier. The vxcdsconvert(1M) utility fails because vxdisksetup, which is invoked internally to initialize the disk with the CDS format, prevents the disk initialization for any foreign file system or LVM.
RESOLUTION:
The code is modified so that the vxcdsconvert(1M) utility forcefully invokes the vxdisksetup(1M) command to erase any foreign format.

* INCIDENT NO:3350989 TRACKING ID:3152274
SYMPTOM:
I/O operations hang with Not-Ready (NR) or Write-Disabled (WD) LUNs. The system log floods with I/O errors such as:
VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x6) on dmpnode 201/0xb0
..
..
DESCRIPTION:
For performance reasons, Dynamic Multi-Pathing (DMP) immediately routes a failed I/O through an alternate available path, while performing asynchronous error analysis on the path on which the I/O failed. Not-Ready (NR) devices reject all kinds of I/O requests, and Write-Disabled (WD) devices reject write I/O requests, but both respond well to Small Computer System Interface (SCSI) probes such as inquiry. Due to a code problem, the I/Os that are retried on different paths through the DMP asynchronous error analysis for such devices are never terminated.
RESOLUTION:
The code is modified to better handle Not-Ready (NR) and Write-Disabled (WD) devices. The DMP asynchronous error analysis code is modified to handle such cases.

* INCIDENT NO:3351005 TRACKING ID:2933476
SYMPTOM:
The vxdisk(1M) command resize operation fails with the following generic error message that does not state the exact reason for the failure:
VxVM vxdisk ERROR V-5-1-8643 Device 3pardata0_3649: resize failed: Operation is not supported.
DESCRIPTION:
The disk-resize operation fails in the following cases:
1. When a shared disk has the simple or nopriv format.
2. When a GPT (GUID Partition Table) labeled disk has the simple or sliced format.
3. When the Cross-platform Data Sharing (CDS) disk is part of a disk group whose version is less than 160 and the disk is resized to greater than 1 TB.
RESOLUTION:
The code is modified to enhance the disk resize failure messages.
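The enhanced error reporting for incident 2933476 can be sketched as a mapping from each known unsupported-resize case to a specific reason, instead of one generic "Operation is not supported" message. The predicate names, field names, and message texts below are illustrative assumptions mirroring the three cases in the DESCRIPTION, not the actual VxVM implementation.

```python
# Hypothetical sketch: return a case-specific resize failure reason
# (incident 2933476), or None when none of the known restrictions applies.

ONE_TB = 1 << 40  # bytes

def resize_error(disk, new_size):
    """disk: dict with 'shared', 'format', 'label', 'dg_version' (assumed fields)."""
    if disk["shared"] and disk["format"] in ("simple", "nopriv"):
        return "resize failed: not supported for shared simple/nopriv disks"
    if disk["label"] == "gpt" and disk["format"] in ("simple", "sliced"):
        return "resize failed: not supported for GPT-labeled simple/sliced disks"
    if (disk["format"] == "cdsdisk" and disk["dg_version"] < 160
            and new_size > ONE_TB):
        return ("resize failed: disk group version must be >= 160 "
                "to grow a CDS disk beyond 1 TB")
    return None
```

Each branch corresponds to one numbered case in the DESCRIPTION, so the caller can print the exact restriction that was hit.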
* INCIDENT NO:3351035 TRACKING ID:3144781
SYMPTOM:
In a Veritas Volume Replicator (VVR) environment, execution of the vxrlink pause command causes a hang on the secondary node and displays the following stack trace:
schedule()
schedule_timeout()
rp_send_request()
vol_rp_secondary_cmd()
vol_rp_ioctl()
vol_objioctl()
vol_object_ioctl()
voliod_ioctl()
volsioctl_real()
vols_ioctl()
vols_compat_ioctl()
DESCRIPTION:
The execution of the vxrlink pause command causes a hang on the secondary node if an rlink disconnect is already in progress. This issue is observed due to a race condition between the two activities: rlink disconnect and pause.
RESOLUTION:
The code is modified to prevent the race condition between the rlink disconnect and pause operations.

* INCIDENT NO:3351075 TRACKING ID:3271985
SYMPTOM:
In Cluster Volume Replication (CVR) with synchronous replication, aborting a slave node from the Cluster Volume Manager (CVM) cluster makes the slave node panic with the following stack trace:
vol_spinlock()
vol_rv_wrship_done()
voliod_iohandle()
voliod_loop()
...
DESCRIPTION:
When the slave node is aborted, it is processing a message from the log owner or master node. The cluster abort operation and the message processing contend for some common data. This results in the panic.
RESOLUTION:
The code is modified to make sure that the cluster abort operation does not contend with message processing from the log owner.

* INCIDENT NO:3351092 TRACKING ID:2950624
SYMPTOM:
The following error message is displayed in the repstatus output on the Primary when a node leaves the cluster:
VxVM VVR vradmin ERROR V-5-52-488 RDS has configuration error related to the master and logowner.
DESCRIPTION:
When a slave node leaves the cluster, VxVM treats it as a critical configuration error.
RESOLUTION:
The code is modified to separate the status flags for the Master or Logowner nodes and the Slave nodes.
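The flag separation in the fix for incident 2950624 can be sketched with a small bitflag model: master/logowner problems remain critical configuration errors, while a slave leaving the cluster is tracked with its own non-critical flag. The flag names and the is_critical() helper below are purely illustrative assumptions, not vradmind internals.

```python
# Hypothetical sketch of separated RDS status flags (incident 2950624):
# only master/logowner flags are treated as configuration errors.

from enum import Flag, auto

class RdsStatus(Flag):
    NONE = 0
    MASTER_ERROR = auto()     # configuration error on the master node
    LOGOWNER_ERROR = auto()   # configuration error on the logowner node
    SLAVE_LEFT = auto()       # a slave node left the cluster (not critical)

def is_critical(status):
    # A slave leaving no longer raises the master/logowner error.
    return bool(status & (RdsStatus.MASTER_ERROR | RdsStatus.LOGOWNER_ERROR))
```

With combined flags, a slave departure alongside a real master error still reports critical, while a slave departure alone does not.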
* INCIDENT NO:3351125 TRACKING ID:2812161
SYMPTOM:
In a VVR environment, after the Rlink is detached, the vxconfigd(1M) daemon on the secondary host may hang. The following stack trace is observed:
cv_wait
delay_common
delay
vol_rv_service_message_start
voliod_iohandle
voliod_loop
...
DESCRIPTION:
There is a race condition if there is a node crash on the primary site of VVR and any subsequent Rlink is detached. The vxconfigd(1M) daemon on the secondary site may hang because it is unable to clear the I/Os received from the primary site.
RESOLUTION:
The code is modified to resolve the race condition.

* INCIDENT NO:3351922 TRACKING ID:2866299
SYMPTOM:
When layered volumes under the RVG are stopped forcefully, the vxprint output shows the NEEDSYNC flag on the layered volumes even after running vxrecover.
DESCRIPTION:
vxrecover moves the top-level volumes of layered volumes under the RVG to the "ACTIVE" state, whereas the subvolumes remain in the "NEEDSYNC" state. When vxrecover moves the top-level volumes to "ACTIVE", it skips resynchronization for the subvolumes that are under the RVG.
RESOLUTION:
The code is modified to recover subvolumes under the RVG correctly.

* INCIDENT NO:3352027 TRACKING ID:3188154
SYMPTOM:
The vxconfigd(1M) daemon does not come up after enabling the native support and rebooting the host.
DESCRIPTION:
The issue occurs because vxconfigd treats the migration of logical unit numbers (LUNs) from JBODs to array support libraries (ASLs) as a Data Corruption Protection Activated (DCPA) condition.
RESOLUTION:
The code is fixed so that the LUN migration from JBODs to ASLs is not treated as a DCPA condition.
* INCIDENT NO:3352208 TRACKING ID:3049633
SYMPTOM:
In a Veritas Volume Replicator (VVR) environment, the VxVM configuration daemon vxconfigd(1M) hangs on the secondary when all disk paths are disabled on the secondary, and displays the following stack trace:
vol_rv_transaction_prepare()
vol_commit_iolock_objects()
vol_ktrans_commit()
volconfig_ioctl()
volsioctl_real()
volsioctl()
vols_ioctl()
DESCRIPTION:
In response to the disabled disk paths, a transaction is triggered to perform a plex detach. However, failing I/Os, if restarted, may wait for the past I/Os to complete.
RESOLUTION:
The code is modified so that if the SRL or a data volume fails, the failing I/Os are freed and the transaction proceeds, instead of the writes being restarted.

* INCIDENT NO:3352218 TRACKING ID:3268905
SYMPTOM:
After reboot, the non-root zpools created using DMP devices go into the FAULTED state and the DMP devices go into the UNAVAIL state. For example:
# zpool list
NAME SIZE ALLOC FREE CAP HEALTH ALTROOT
dmp_pool - - - - FAULTED -
# zpool status
pool: dmp_pool
state: UNAVAIL
status: One or more devices could not be opened. There are insufficient replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-3C
scan: none requested
config:
NAME STATE READ WRITE CKSUM
dmp_pool UNAVAIL 0 0 0 insufficient replicas
emc_clariion0_333s0 UNAVAIL 0 0 0 cannot open
emc_clariion0_334s0 UNAVAIL 0 0 0 cannot open
DESCRIPTION:
During boot, ZFS tries to import the zpools before the VxVM Service Management Facility (SMF) service starts, which happens before the DMP devices are configured. Therefore, ZFS fails to find the DMP devices corresponding to the zpool, and its state shows as FAULTED.
RESOLUTION:
The code is modified to make the VxVM SMF service check for FAULTED zpools and try to re-import them with DMP devices.
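The re-import check described for incident 3268905 can be sketched as a scan-and-retry loop: after the DMP devices are configured, look for FAULTED pools and import them again from the DMP device directory. This is a minimal sketch, not the actual SMF method script; the run() callback is injected so the decision logic can be exercised without real hardware, and the exact zpool invocations are assumptions modeled on the zpool(1M) CLI.

```python
# Hypothetical sketch of the SMF-service check (incident 3268905): re-import
# zpools that came up FAULTED because DMP devices were not yet configured.

def reimport_faulted_pools(run):
    """run(cmd) -> command output as a string; returns pools re-imported."""
    reimported = []
    for line in run("zpool list -H -o name,health").splitlines():
        name, health = line.split()
        if health == "FAULTED":
            # Drop the stale pool state, then import it again now that the
            # DMP devices backing it exist under /dev/vx/dmp.
            run("zpool export -f %s" % name)
            run("zpool import -d /dev/vx/dmp %s" % name)
            reimported.append(name)
    return reimported
```

Injecting run() also makes the order of operations (export before import) easy to verify in isolation.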
* INCIDENT NO:3352226 TRACKING ID:2893530
SYMPTOM:
When a system is rebooted and there are no VVR configurations, the system panics with the following stack trace:
nmcom_server_start()
vxvm_start_thread_enter()
...
DESCRIPTION:
The panic occurs because a memory segment is accessed after it is released. The access happens in the VVR module and can happen even if VVR is not configured on the system.
RESOLUTION:
The code is modified so that the memory segment is not accessed after it is released.

* INCIDENT NO:3352247 TRACKING ID:2929206
SYMPTOM:
When the dmp_native_support tunable is turned on with Solaris 10 U10 and onwards, the Zettabyte File System (ZFS) pools are seen on the OS device paths but not on the Dynamic Multi-Pathing (DMP) devices.
DESCRIPTION:
When the dmp_native_support tunable is turned on, the ZFS pools are migrated onto DMP devices, which involves running some of the ZFS commands. Because the output of the Solaris zdb command has changed from Solaris 10 U10 onwards, the ZFS pools are not migrated onto the DMP devices.
RESOLUTION:
The code is fixed to handle both the new and old output formats of the Solaris zdb command.

* INCIDENT NO:3352282 TRACKING ID:3102114
SYMPTOM:
A system crash during the 'vxsnap restore' operation can cause the vxconfigd(1M) daemon to dump core with the following stack on system start-up:
rinfolist_iter()
process_log_entry()
scan_disk_logs()
...
startup()
main()
DESCRIPTION:
To recover from an incomplete restore operation, an entry is made in the internal logs. If the volume corresponding to that entry is not accessible, accessing the non-existent record causes the vxconfigd(1M) daemon to dump core with the SIGSEGV signal.
RESOLUTION:
The code is modified to ignore such an entry in the internal logs if the corresponding volume does not exist.

* INCIDENT NO:3352963 TRACKING ID:2746907
SYMPTOM:
Under heavy I/O load, the vxconfigd(1M) daemon hangs on the master node during the reconfiguration.
The stack is observed as follows:
vxconfigd stack:
schedule
volsync_wait
vol_rwsleep_rdlock
vol_get_disks
volconfig_ioctl
volsioctl_real
vols_ioctl
vols_compat_ioctl
compat_sys_ioctl
cstar_dispatch
DESCRIPTION:
When there is a reconfiguration, the vxconfigd(1M) daemon tries to acquire the volop_rwsleep write lock. This attempt fails because the I/Os hold the read lock, and the vxconfigd(1M) daemon keeps retrying the write lock. Thus, the I/O load starves out the vxconfigd(1M) daemon's attempt to get the write lock. This results in the hang.
RESOLUTION:
The code is modified so that a new API is used to block out the read locks when an attempt is made to get the write lock. Using this API during the reconfiguration avoids the write-lock starvation. Thereby, the hang issue is resolved.

* INCIDENT NO:3353059 TRACKING ID:2959333
SYMPTOM:
For a Cross-platform Data Sharing (CDS) disk group, the "vxdg list" command does not list the CDS flag when that disk group is disabled.
DESCRIPTION:
When the CDS disk group is disabled, the state of the record list may not be stable. Hence, whether the disabled disk group was CDS is not considered, and Veritas Volume Manager (VxVM) does not mark any such flag.
RESOLUTION:
The code is modified to display the CDS flag for disabled CDS disk groups.

* INCIDENT NO:3353064 TRACKING ID:3006245
SYMPTOM:
While executing a snapshot operation on a volume which has 'snappoints' configured, the system panics infrequently with the following stack trace:
...
voldco_copyout_pervolmap()
voldco_map_get()
volfmr_request_getmap()
...
DESCRIPTION:
When the 'snappoints' are configured for a volume by using the vxsmptadm(1M) command, the relationship is maintained in the kernel using a field. This field is also used for maintaining the snapshot relationships. Sometimes, the 'snappoints' field may wrongly be identified as the snapshot field. This causes the system to panic.
RESOLUTION:
The code is modified to properly identify the fields that are used for the snapshot and the 'snappoints', and to handle the fields accordingly.

* INCIDENT NO:3353244 TRACKING ID:2925746
SYMPTOM:
In a CVM environment, during CVM reconfiguration, vxconfigd hangs cluster-wide with the following stack trace:
pse_sleep_thread()
volktenter()
vol_get_disks()
volconfig_ioctl()
volsioctl_real()
volsioctl()
vols_ioctl()
DESCRIPTION:
When vxconfigd is heavily loaded as compared to the kernel thread during a join reconfiguration, the cluster-wide reconfiguration hangs and leads to a cluster-wide vxconfigd hang. In this case, the cluster reconfiguration sequence is a node leave followed by another node join.
RESOLUTION:
The code is modified to take care of the missing condition during successive reconfiguration processing.

* INCIDENT NO:3353953 TRACKING ID:2996443
SYMPTOM:
In a CVR environment, on shutting down the Primary-Master, the "vradmin repstatus" and "vradmin printrvg" commands show the following configuration error:
"vradmind server on host not responding or hostname cannot be resolved".
DESCRIPTION:
On the non-logowner slave nodes, the logowner change does not get correctly reflected. This makes the slave try to reach the old logowner, which is shut down.
RESOLUTION:
The code is modified to reflect the changes of the master and logowner on the slave nodes.

* INCIDENT NO:3353985 TRACKING ID:3088907
SYMPTOM:
A node in a Cluster Volume Manager (CVM) cluster can panic while destroying a shared disk group. The following stack trace is displayed:
volupd_disk_iocnt_locked()
volrdiskiostart()
vol_disk_tgt_write_start()
voliod_iohandle()
voliod_loop()
kernel_thread()
DESCRIPTION:
During a shared disk group destroy, the disks in the disk group are moved to a common pool that holds all disks that are not part of any disk group. When the disk group destroy is done, all the existing I/Os on the disks belonging to the disk group are completed.
However, with I/O shipping enabled, I/Os can arrive from remote nodes even after all the local I/Os are cleaned up. This results in I/Os accessing the freed-up resources, which causes the system panic.
RESOLUTION:
The code is modified so that during the disk group destroy operation, appropriate locks are taken to synchronize the movement of disks from the shared disk group to the common pool.

* INCIDENT NO:3353990 TRACKING ID:3178182
SYMPTOM:
During a master takeover task, the shared disk group re-import operation fails due to a false serial split brain (SSB) detection.
DESCRIPTION:
The disk private region contents are not updated under certain conditions during a node join. As a result, an SSB ID mismatch (due to a stale value in-core) is detected during the re-import operation that is a part of the master takeover task, which causes the re-import operation to fail.
RESOLUTION:
The code is modified to update the disk header contents in memory with the disk header contents on the joining node, to avoid a false SSB detection during the master takeover operation.

* INCIDENT NO:3353995 TRACKING ID:3146955
SYMPTOM:
A remote disk (lfailed or lmissing disk) goes into the "ONLINE INVALID LFAILED" or "ONLINE INVALID LMISSING" state after the disk loses global disk connectivity. It becomes difficult to recover the remote disk from this state even if the connectivity is restored.
DESCRIPTION:
The INVALID state implies that the disk private region contents were read but were found to be invalid. In this case, however, the private region contents cannot be read at all, because none of the nodes in the cluster has connectivity to the disk (global connectivity failure).
RESOLUTION:
The code is modified so that the remote disk is marked with the "Error" state in the case of a global connectivity failure. Setting the "Error" state on the remote disk helps the transition to the ONLINE state when the connectivity to the disk is restored on at least one node in the cluster.
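The distinction the fix for incident 3146955 draws can be sketched as a simple state decision: "online invalid" means the private region was read but found invalid, while "error" means it could not be read at all (global connectivity failure), and only the latter transitions cleanly back to "online" when connectivity returns. The function and state names below are an illustrative model, not VxVM's actual state machine.

```python
# Minimal sketch of the remote-disk state decision (incident 3146955):
# distinguish "cannot read the private region" from "read it, but invalid".

def remote_disk_state(readable_on_some_node, contents_valid):
    """readable_on_some_node: any node in the cluster can read the private
    region; contents_valid: the contents that were read are valid."""
    if not readable_on_some_node:
        # Global connectivity failure: nothing was actually read, so the
        # disk is marked "error" rather than "online invalid".
        return "error"
    return "online" if contents_valid else "online invalid"
```

Marking the disk "error" on a global failure means that as soon as one node regains connectivity and reads a valid private region, the state computes back to "online" without manual recovery.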
* INCIDENT NO:3353997 TRACKING ID:2845383
SYMPTOM:
The site gets detached if the plex detach operation is performed with the site consistency set to off.
DESCRIPTION:
If the plex detach operation is performed on the last complete plex of a site, the site is detached to maintain the site consistency. The site should be detached only if the site consistency is set. Initially, the decision to detach the site is made based on the value of the 'allsites' flag, so the site gets detached when the last complete plex is detached, even if the site consistency is off.
RESOLUTION:
The code is modified to ensure that the site is detached when the last complete plex is detached only if the site consistency is set. If the site consistency is off and the 'allsites' flag is on, detaching the last complete plex leads to only the plex being detached.

* INCIDENT NO:3354023 TRACKING ID:2869514
SYMPTOM:
In a clustered environment with a large Logical Unit Number (LUN) configuration, the node join process takes a long time. It may cause the cvm_clus resource to time out, finally bringing the dependent groups into the partial state.
DESCRIPTION:
In a clustered environment with a large LUN configuration, if the node join process is triggered, it takes a long time to complete. The reason is that disk online is called to detect whether each disk is connected cluster-wide. The following vxconfigd(1M) stack is seen on the node which takes a long time to join:
ioctl()
ddl_indirect_ioctl()
do_read_capacity_35()
do_read_capacity()
do_spt_getcap()
do_readcap()
auto_info_get()
auto_sys_online()
auto_online()
da_online()
dasup_validate()
dapriv_validate()
auto_validate()
setup_remote_disks()
slave_response()
fillnextreq()
vold_getrequest()
request_loop()
main()
RESOLUTION:
The code is modified to use the connectivity framework to detect whether the storage is accessible cluster-wide.

* INCIDENT NO:3354024 TRACKING ID:2980955
SYMPTOM:
The disk group goes into the disabled state if the vxconfigd(1M) daemon is restarted on the new master after a master switch.
DESCRIPTION:
If the vxconfigd(1M) daemon restarts and the master imports a disk group referring to stale tempdb copies (/etc/vx/tempdb), the disk group goes into the disabled state. The tempdb is created on the master node (not on the slave nodes) at the time of disk group creation. On a master switch the tempdb is not cleared, so when the node becomes the master again (after the master switch), on vxconfigd restart it tries to import the disk group using the stale tempdb copies instead of creating a new tempdb for the disk group.
RESOLUTION:
The code is changed to prevent the disk group from going into the disabled state after a vxconfigd(1M) restart on the new master.

* INCIDENT NO:3354028 TRACKING ID:3136272
SYMPTOM:
In a CVM environment, the disk group import operation with the "-o noreonline" option takes additional import time.
DESCRIPTION:
On a slave node, when the clone disk group import is triggered by the master node, the "da re-online" takes place irrespective of the "-o noreonline" flag passed. This results in the additional import time.
RESOLUTION:
The code is modified to pass a hint to the slave node when the "-o noreonline" option is specified. Depending on the hint, the "da re-online" is either done or skipped. This avoids any additional import time.

* INCIDENT NO:3355830 TRACKING ID:3122828
SYMPTOM:
The Dynamic Reconfiguration (DR) tool lists the disks which are tagged with Logical Volume Manager (LVM) for removal or replacement.
DESCRIPTION:
When the DMP Native Support tunable is turned 'OFF', the Dynamic Reconfiguration (DR) tool should not list the Logical Volume Manager (LVM) disks for removal or replacement. When the tunable is turned 'ON', the DR tool should list the LVM disks for removal or replacement, provided there are no open counts in the Dynamic Multi-Pathing (DMP) layer.
RESOLUTION:
The code is modified to exclude the Logical Volume Manager (LVM) disks from the removal or replacement option in the Dynamic Reconfiguration (DR) tool.
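The listing rule stated in the DESCRIPTION for incident 3122828 can be sketched as a single predicate: an LVM-tagged disk is offered for removal or replacement only when DMP native support is ON and the DMP layer holds no opens on it. The field names below are illustrative assumptions, not the DR tool's actual data model.

```python
# Hypothetical sketch of the DR-tool listing rule (incident 3122828).

def eligible_for_removal(disk, dmp_native_support):
    """disk: dict with assumed keys 'lvm' (bool) and 'dmp_open_count' (int)."""
    if disk.get("lvm"):
        # LVM disks are hidden when native support is OFF, and shown only
        # when nothing holds the DMP node open.
        return dmp_native_support and disk.get("dmp_open_count", 0) == 0
    return True  # non-LVM disks are listed as before
```

This keeps the two tunable states from the DESCRIPTION in one place, so the removal and replacement menus cannot diverge.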
* INCIDENT NO:3355856 TRACKING ID:2909668
SYMPTOM:
In the case of multiple sets of cloned disks of the same source disk group, the import operation on the second set of clone disks fails if the first set of clone disks was imported with "updateid". The import fails with the following error message:
VxVM vxdg ERROR V-5-1-10978 Disk group firstdg: import failed: No tagname disks for import
DESCRIPTION:
When multiple sets of clone disks exist for the same source disk group, each set needs to be identified with a separate tag. If one set of cloned disks with the same tag is imported using the "updateid" option, it replaces the disk group ID on the imported disks with a new disk group ID. The other sets of cloned disks with different tags still contain the old disk group ID. Because the disk group name maps to the latest imported disk group ID, this leads to the import failure for the tagged import of the other sets, except for the first set.
RESOLUTION:
The code is modified for the tagged disk group import of disk groups that have multiple sets of clone disks. During the disk group name to disk group ID mapping, the tag name is given higher priority than the latest update time of the disk group.

* INCIDENT NO:3355878 TRACKING ID:2735364
SYMPTOM:
The "clone_disk" disk flag attribute is not cleared when a cloned disk group is removed by the "vxdg destroy " command.
DESCRIPTION:
When a cloned disk group is removed by the "vxdg destroy " command, the Veritas Volume Manager (VxVM) "clone_disk" disk flag attribute is not cleared. The "clone_disk" disk flag attribute should be automatically turned off when the VxVM disk group is destroyed.
RESOLUTION:
The code is modified to turn off the "clone_disk" disk flag attribute when a cloned disk group is removed by the "vxdg destroy " command.
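The mapping change in the fix for incident 2909668 can be sketched as a lookup in which a record carrying the requested tag wins over the record with the latest update time; only when no tag is given (or no record matches the tag) does the most recently updated record decide. The record fields below are illustrative assumptions, not VxVM's on-disk records.

```python
# Hypothetical sketch of disk group name -> disk group ID resolution with
# tag priority (incident 2909668).

def resolve_dgid(records, name, tag=None):
    """records: dicts with assumed keys 'name', 'dgid', 'tags', 'update_time'."""
    candidates = [r for r in records if r["name"] == name]
    if tag is not None:
        tagged = [r for r in candidates if tag in r["tags"]]
        if tagged:
            candidates = tagged   # a tag match takes priority over recency
    if not candidates:
        return None
    # Fall back to the most recently updated record, as before the fix.
    return max(candidates, key=lambda r: r["update_time"])["dgid"]
```

With two clone sets of the same source disk group, one imported with "updateid" (newer dgid) and one not, a tagged import of the second set now resolves to its own dgid instead of the first set's.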
* INCIDENT NO:3355883 TRACKING ID:3085519
SYMPTOM:
Missing disks are permanently detached from the disk group because the -o updateid and tagname options are used to import partial disks.
DESCRIPTION:
If the user imports the partial disks in a disk group using -o updateid and tagname, the imported disk group has a different dgid. The missing disks from the disk group configuration are then left out forever, which leads to the missing disks being permanently detached from the disk group.
RESOLUTION:
The code is modified so that the -o updateid and tagname options are not allowed in a partial disk group import. The import now fails with an error message similar to the following:
# vxdg -o tag=TAG2 -o useclonedev=on -o updateid import testdg2
VxVM vxdg ERROR V-5-1-10978 Disk group testdg2: import failed: Disk for disk group not found

* INCIDENT NO:3355971 TRACKING ID:3289202
SYMPTOM:
If a Cluster Volume Manager (CVM) node is stopped (stopnode/abortnode) when there are outstanding disk-connectivity related messages initiated by the same node, then vxconfigd may hang with the following stack trace:
volsync_wait()
vol_syncwait()
volcvm_get_connectivity()
vol_test_connectivity()
volconfig_ioctl()
volsioctl_real()
vols_ioctl()
vols_compat_ioctl()
compat_sys_ioctl()
sysenter_dispatch()
DESCRIPTION:
When a CVM node is aborted, all the outstanding messages on that node are cleared or purged. The relevant data structures for the given messages are supposed to be set to proper values in this purge operation, but no flag is set for the disk-connectivity protocol message. As a result, the disk-connectivity protocol initiated by the vxconfigd thread hangs after the messaging layer clears the message.
RESOLUTION:
The code is changed so that the appropriate flag is set when the disk-connectivity messages are purged in the node leave/abort case.
The initiator thread (vxconfigd) can now detect the correct flag value, fail the internal disk-connectivity protocol gracefully, and proceed further. This way the vxconfigd hang is avoided.

* INCIDENT NO:3355973 TRACKING ID:3003991
SYMPTOM:
The system fails to add a disk to a shared disk group if all the paths of the existing disks are disabled.
DESCRIPTION:
The operation fails because the internally generated I/O is continuously retried when the disk is added to the disk group. Due to an error in the internal I/O code path, the I/O completion count is not kept correctly, which causes this failure.
RESOLUTION:
The code is changed to correctly keep the internal I/O completion count for the given disk in the shared disk group.

* INCIDENT NO:3356836 TRACKING ID:3125631
SYMPTOM:
The snapshot creation operation using the vxsnap make command on volume sets occasionally fails with the error:
"vxsnap ERROR V-5-1-6433 Component volume has changed"
It occurs primarily when the snapshot operation runs on the volume set after a fresh mount of the file system.
DESCRIPTION:
The snapshot creation proceeds in multiple atomic stages which are called transactions. If some state changes outside the operation, the operation fails. In the release in question, the state of the volume changes to DIRTY after the first transaction. This is due to asynchronous I/Os after mounting the file system, and it leads to the mentioned failure in the later stage.
RESOLUTION:
The code is modified to expect such changes after the first transaction and not deem them a failure.

* INCIDENT NO:3361957 TRACKING ID:2912263
SYMPTOM:
On Solaris LDOMs, the "vxdmpadm exclude" command fails to exclude a controller, which is specified in the /etc/vx/vxvm.exclude path.
DESCRIPTION:
The vxdmpadm exclude command fails to exclude a controller on Solaris LDOMs. The failure occurs due to an error in the parsing logic which discovers the basename of the DMP device.
RESOLUTION:
The parsing logic is modified to detect the correct basename of the DMP device.

* INCIDENT NO:3361977 TRACKING ID:2236443
SYMPTOM:
In a VCS environment, the "vxdg import" command does not display an informative error message when a disk group cannot be imported because the fencing keys are registered to another host. The following error messages are displayed:
# vxdg import sharedg
VxVM vxdg ERROR V-5-1-10978 Disk group sharedg: import failed: No valid disk found containing disk group
The system log contains the following NOTICE messages:
Dec 18 09:32:37 htdb1 vxdmp: NOTICE: VxVM vxdmp V-5-0-0 i/o error occured (errno=0x5) on dmpnode 316/0x19b
Dec 18 09:32:37 htdb1 vxdmp: [ID 443116 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 i/o error occured (errno=0x5) on dmpnode 316/0x19b
DESCRIPTION:
The error messages that are displayed when a disk group cannot be imported because the fencing keys are registered to another host need to be more informative.
RESOLUTION:
Code is added to the VxVM disk group import command to detect when a disk is reserved by another host and to issue a SCSI3 PR reservation conflict error message.

* INCIDENT NO:3361998 TRACKING ID:2957555
SYMPTOM:
The vxconfigd(1M) daemon on the CVM master node hangs in the userland during the vxsnap(1M) restore operation. The following stack trace is displayed:
rec_find_rid()
position_in_restore_chain()
kernel_set_object_vol()
kernel_set_object()
kernel_dg_commit_kernel_objects_20()
kernel_dg_commit()
commit()
dg_trans_commit()
slave_trans_commit()
slave_response()
fillnextreq()
DESCRIPTION:
During the snapshot restore operation, when a volume V1 gets restored from the source volume V2, and at the same time the volume V2 gets restored from V1 or a child of V1, the vxconfigd(1M) daemon tries to find the position of the volume that gets restored in the snapshot chain.
For such instances, finding the position in the restore chain causes the vxconfigd(1M) daemon to enter an infinite loop and hang.
RESOLUTION:
The code is modified to remove the infinite loop condition when the restore position is found.

* INCIDENT NO:3362065 TRACKING ID:2861011
SYMPTOM:
The "vxdisk -g resize " command fails with an error for a Cross-platform Data Sharing (CDS) formatted disk. The following error message is displayed:
"VxVM vxdisk ERROR V-5-1-8643 Device : resize failed: One or more subdisks do not fit in pub reg"
DESCRIPTION:
During the resize operation, VxVM updates the VM disk's private region with the new public region size, which is evaluated based on the raw disk geometry. But for CDS disks, the geometry information stored in the disk label is fabricated such that the cylinder size is aligned with 8 KB. The resize failure occurs when there is a mismatch between the public region size obtained from the disk label and that stored in the private region.
RESOLUTION:
The code is modified so that the new public region size is now evaluated based on the fabricated geometry, considering the 8 KB alignment for the CDS disks, so that it is consistent with the size obtained from the disk label.

* INCIDENT NO:3362087 TRACKING ID:2916911
SYMPTOM:
The vxconfigd(1M) daemon triggered a Data TLB Fault panic with the following stack trace:
_vol_dev_strategy
volsp_strategy
vol_dev_strategy
voldiosio_start
volkcontext_process
volsiowait
voldio
vol_voldio_read
volconfig_ioctl
volsioctl_real
volsioctl
vols_ioctl
spec_ioctl
vno_ioctl
ioctl
syscall
DESCRIPTION:
The kernel_force_open_disk() function checks whether the disk device is open. The device is opened only if it was not opened earlier. When the device is opened, it calls the kernel_disk_load() function, which in turn calls the VOL_NEW_DISK ioctl. If the VOL_NEW_DISK ioctl fails, the error is not handled correctly because the return values are not checked.
This may result in a scenario where the open operation fails but the disk read or write operation proceeds.

RESOLUTION: The code is modified to handle the VOL_NEW_DISK ioctl failure. If the ioctl fails during the open operation on a device that does not exist, then read or write operations are not allowed on the disk.

* INCIDENT NO:3362114 TRACKING ID:2856579

SYMPTOM: When a cross-platform data sharing (CDS) disk is resized from less than 1 TB to 1 TB or larger, subsequent I/O to the DMPNODE returns incorrect data.

DESCRIPTION: When a VxVM disk is resized from less than 1 TB to 1 TB or larger, the disk label format is changed from Sun Microsystems Inc. (SMI) to Extensible Firmware Interface (EFI), and the OS changes the device numbers and device special files (DSFs) for all the paths. Corresponding changes are also made to DA records, etc. However, the DMP database and DMP DSFs are not updated accordingly. Hence, I/O on the whole DMP device gets redirected to an incorrect underlying partition. The DMP database gets updated during the subsequent "vxdisk scandisks" operation, and further I/Os get correctly redirected. It is unlikely for an application to issue direct I/O on a DMP DSF for which migration from VTOC to EFI happens.

RESOLUTION: The code has been modified to update the DMP database and DSFs when the disk partition layout changes from SMI to EFI during a disk resize operation.

* INCIDENT NO:3362144 TRACKING ID:1942051

SYMPTOM: I/O hangs on the master node after the secondary paths are disabled from the slave node and the slave node is rebooted.

DESCRIPTION: The I/O hang happens when the GAB receiver flow control gets enabled due to heavy I/O load. When GAB receiver flow control is enabled, GAB asks the other nodes not to send any more messages, so every node is blocked until the receiver flow control is cleared. The receiver flow control is supposed to be cleared when the receiver queue reaches the "VOL_KMSG_RECV_Q_LOW" limit.
RESOLUTION: The code is modified to clear the receiver flow control once the receive queue reaches the "VOL_KMSG_RECV_Q_LOW" limit.

* INCIDENT NO:3362948 TRACKING ID:2599887

SYMPTOM: The DMP device paths that are marked as "Disabled" cannot be excluded from VxVM control.

DESCRIPTION: The DMP device paths which encounter connectivity issues are marked as "DISABLED". These disabled DMP device paths cannot be excluded from VxVM control due to a check for the active state on the path.

RESOLUTION: The offending check has been removed so that a path can be excluded from VxVM control irrespective of its state.

* INCIDENT NO:3365287 TRACKING ID:2973786

SYMPTOM: The DR Tool pre-check fails for the "Is OS & VM Device Tree in Sync" item as follows:
------------------------------------------------------------------------------
OS                            : SunOS            |PASS
OS Version                    : 5.10             |PASS
Leadville Driver Version      : v20100509-1.143  |PASS
SF Product Version            : 6.1.000.000      |PASS
Architecture                  : sparc            |PASS
Is MPXIO Enabled ?            : No               |PASS
Is Multipathing Powerpath     : No               |FAIL
Is OS & VM Device Tree in Sync: No               |FAIL <<--
Is cfgadm working?            : Yes              |PASS
Any Device Failing/Unusable   : No               |PASS
-------------------------------------------------------------------------------

DESCRIPTION: When the dmp_native_support tunable is on, the DR Tool's pre-check for OS and VxVM device tree synchronization fails due to a mismatch between the output of the Solaris "format" command and the VxVM command. The mismatch occurs because the format command has some /dev/vx/[r]dmp device entries which do not match the VxVM command's device path entries.

RESOLUTION: The code is modified to validate any entry that is not found in the "format" command output by checking the corresponding dmpnode.

* INCIDENT NO:3365295 TRACKING ID:3053073

SYMPTOM: The DR (Dynamic Reconfiguration) Tool does not pick thin LUNs in the "online invalid" state for the disk remove operation.
DESCRIPTION: The DR tool script parsed the list of disks incorrectly, resulting in thin LUNs in the "online invalid" state being skipped.

RESOLUTION: The DR scripts have been modified to parse the list of disks correctly so that such disks are selected properly.

* INCIDENT NO:3365313 TRACKING ID:3067452

SYMPTOM: If new LUNs are added to the cluster, and the naming scheme has the avid option set to 'no', then the DR Tool changes the mapping between the dmpnode and the disk record. For example, after adding 2 new disks, they get indexed at the beginning of the DMP (Dynamic MultiPathing) device list and the mapping of 'DEVICE' and 'DISK' changes:
# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
xiv0_0       auto:cdsdisk    -            -            online thinrclm  -->
xiv0_1       auto:cdsdisk    -            -            online thinrclm  -->
xiv0_2       auto:cdsdisk    xiv0_0       dg1          online thinrclm shared
xiv0_3       auto:cdsdisk    xiv0_1       dg1          online thinrclm shared
xiv0_4       auto:cdsdisk    xiv0_2       dg1          online thinrclm shared
xiv0_5       auto:cdsdisk    xiv0_3       dg1          online thinrclm shared
xiv0_6       auto:cdsdisk    xiv0_4       dg1          online thinrclm shared

DESCRIPTION: Because reconfiguration events are improperly handled when device names are assigned, the DR tool changes the disk record mappings.

RESOLUTION: The code has been modified so that the DR script prompts the user to decide whether to run 'vxddladm assign names' when a reconfiguration event is generated. Here is an example of the prompt:
Do you want to run command [vxddladm assign names] and regenerate DMP device names :

* INCIDENT NO:3365321 TRACKING ID:3238397

SYMPTOM: The Dynamic Reconfiguration (DR) Tool's Remove LUNs option does not restart the vxattachd daemon.

DESCRIPTION: Due to the wrong sequence of starting and stopping the vxattachd daemon in the DR tool, the Remove LUNs option does not start the vxattachd daemon. In such cases, disks that go offline are not detected automatically when they come back online, and the whole site can go offline.

RESOLUTION: The code is modified to correct the sequence of start/stop operations in the DR scripts.
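The DEVICE-to-DISK mapping drift shown in the xiv0 listing above can be spotted mechanically by comparing the two columns of 'vxdisk list' output. The following is a minimal sketch, not part of the DR tool; the awk filter and the captured sample data are illustrative, and in practice the output of 'vxdisk list' would be piped in directly:

```shell
# Sample 'vxdisk list' output reproducing the xiv0 listing above.
vxdisk_output='DEVICE TYPE DISK GROUP STATUS
xiv0_0 auto:cdsdisk - - online thinrclm
xiv0_1 auto:cdsdisk - - online thinrclm
xiv0_2 auto:cdsdisk xiv0_0 dg1 online thinrclm shared
xiv0_3 auto:cdsdisk xiv0_1 dg1 online thinrclm shared'

# Flag rows where the DEVICE name (column 1) no longer matches the
# DISK record name (column 3); rows with no disk record ("-") are skipped.
echo "$vxdisk_output" | awk 'NR > 1 && $3 != "-" && $1 != $3 {
    print $1 " is mapped to disk record " $3
}'
# xiv0_2 is mapped to disk record xiv0_0
# xiv0_3 is mapped to disk record xiv0_1
```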
* INCIDENT NO:3365390 TRACKING ID:3222707

SYMPTOM: The Dynamic Reconfiguration (DR) tool does not permit the removal of disks associated with a deported disk group (DG).

DESCRIPTION: The Dynamic Reconfiguration tool interface does not consider the disks associated with a deported disk group as valid candidates for disk removal.

RESOLUTION: The code is modified to include the disks associated with a deported disk group for removal.

* INCIDENT NO:3368953 TRACKING ID:3368361

SYMPTOM: When site consistency is configured within a private disk group and Cluster Volume Manager (CVM) is up, the reattach operation of a detached site fails.

DESCRIPTION: When you try to reattach a detached site configured in a private disk group with CVM up on that node, the reattach operation fails with the following error: "Disk (disk_name) do not have connectivity from one or more cluster nodes". The reattach operation fails because the shared attribute of the disk group is not checked when the disk connectivity check is applied to a private disk group.

RESOLUTION: The code is modified to make the disk connectivity check explicit for a shared disk group by checking the shared attribute of the disk group.

* INCIDENT NO:3378817 TRACKING ID:3143622

SYMPTOM: The following command fails and displays the following error messages for PowerPath-managed devices:
# /etc/vx/bin/vxdisksetup -i emcpower2
VxVM vxdisk WARNING V-5-1-16737 cannot open /dev/vx/rdmp/emcpower2c to check for ASM disk format
VxVM vxdisk WARNING V-5-1-16736 cannot open /dev/vx/rdmp/emcpower2c to check disk format

DESCRIPTION: While checking for the Automatic Storage Management (ASM) disk format, Veritas Volume Manager (VxVM) issues an open on the device. Due to a bug in the code, the open is issued on the operating system (OS) device name with the DMP device directory path. As a result, the open system call fails and the error messages are displayed.
RESOLUTION: The code is modified to use the DMP device name with the DMP device directory path while issuing the open system call.

* INCIDENT NO:3384633 TRACKING ID:3142315

SYMPTOM: Sometimes the udid_mismatch flag gets set on a disk due to an Array Support Library (ASL) upgrade, and consequently the disk is misidentified as a clone disk after import.

DESCRIPTION: Sometimes the udid_mismatch flag gets set on a disk due to an Array Support Library (ASL) upgrade, and consequently the disk is misidentified as a clone disk after import. Resetting the flag is frequently required after an ASL upgrade when the ASL has new logic in UDID formation. The following two separate commands are required to reset the clone flag on a clone disk:
vxdisk updateudid
vxdisk set clone=off

RESOLUTION: A new vxdisk option '-c' is introduced to reset the clone_disk flag and update the UDID.

* INCIDENT NO:3384636 TRACKING ID:3244217

SYMPTOM: There is no way to reset the clone flag during the import of a disk group.

DESCRIPTION: If the user wants to reset the clone flag on disks during the import of a disk group, they have to deport the disk group, reset the clone flag on each disk, and then import the disk group.

RESOLUTION: The code has been modified to provide a -c option during disk group import to reset the clone flag.

* INCIDENT NO:3384662 TRACKING ID:3127543

SYMPTOM: Non-labeled disks go into the udid_mismatch state after a vxconfigd restart.

DESCRIPTION: Non-labeled disks go into the udid_mismatch state because the UDID is not stamped on the disk. The UDID provided by the Array Support Libraries (ASL) is compared with an invalid value, so the disk is marked as udid_mismatch.

RESOLUTION: The code is modified so that the system does not compare the UDID for non-labeled disks.

* INCIDENT NO:3384697 TRACKING ID:3052879

SYMPTOM: Auto import of a cloned disk group fails after reboot, even when the source disk group is not present.
DESCRIPTION: If the source disk group is not present on the host, then a clone disk group that was imported on this host is not automatically imported after reboot. This was not allowed earlier.

RESOLUTION: The code is modified. The auto import of a clone disk group is now allowed as long as the disk group was imported prior to reboot, regardless of whether the source disk group is available on the host.

* INCIDENT NO:3384986 TRACKING ID:2996142

SYMPTOM: Data may get corrupted or lost due to an incorrect mapping from the DA record to the DM record of a disk.

DESCRIPTION: Due to various hardware or operating system issues, some disks lose the VM configuration, label, or partition, so the disk becomes 'online invalid' or 'error'. In an attempt to import the disk group, those disks cannot be imported because the DM record is lost and the disk becomes a 'failed disk'. For example:
# vxdisk -o alldgs list | grep sdg
hdisk10 auto:cdsdisk hdisk10 sdg online shared
- - hdisk9 sdg failed was:hdisk9

RESOLUTION: The unique disk identifier (UDID) provided by the device discovery layer (DDL) is added to the DM record when a DM record is associated with a disk in the disk group. This helps identify the failed disks correctly.

* INCIDENT NO:3386843 TRACKING ID:3279932

SYMPTOM: The vxdisksetup and vxdiskunsetup utilities fail on a disk which is part of a deported disk group (DG), even if the "-f" option is specified. The vxdisksetup command fails with the following error:
VxVM vxedpart ERROR V-5-1-10089 partition modification failed : Device or resource busy
The vxdiskunsetup command fails with the following error:
VxVM vxdisk ERROR V-5-1-0 Device appears to be owned by disk group . Use -f option to force destroy.
VxVM vxdiskunsetup ERROR V-5-2-5052 : Disk destroy failed.

DESCRIPTION: The vxdisksetup and vxdiskunsetup utilities internally call the "vxdisk" utility. Due to a defect in vxdisksetup and vxdiskunsetup, the vxdisk operation fails on a disk which is part of a deported DG, even if the "force" operation is requested by the user.
RESOLUTION: Code changes are made to the vxdisksetup and vxdiskunsetup utilities so that the operation succeeds when the "-f" option is specified.

* INCIDENT NO:3395499 TRACKING ID:3373142

SYMPTOM: The manual pages for vxedit and vxassist do not contain details about the updated behavior of these commands.

DESCRIPTION:
1. vxedit manual page: This page explains that if the reserve flag is set for a disk, then vxassist does not allocate a data subdisk on that disk unless the disk is specified on the vxassist command line. However, Data Change Object (DCO) volume creation by the vxassist or vxsnap command does not honor the reserve flag.
2. vxassist manual page: The DCO allocation policy has been updated starting from release 6.0. The allocation policy may not succeed if there is insufficient disk space. The vxassist command then uses available space on the remaining disks of the disk group. This may prevent a disk group from splitting or moving if the DCO plexes cannot accompany their parent data volume.

RESOLUTION: The manual pages for both commands have been updated to reflect the new behavioral changes.

* INCIDENT NO:3401836 TRACKING ID:2790864

SYMPTOM: For the OTHER_DISKS enclosure, the 'vxdmpadm config reset' CLI fails while trying to reset the I/O policy value.

DESCRIPTION: The 'vxdmpadm config reset' CLI sets all DMP entity properties to their default values. For the OTHER_DISKS enclosure, the default I/O policy is shown as 'Vendor Defined', but this does not correspond to a valid DMP I/O policy. As a result, the config reset CLI fails while resetting the I/O policy value.

RESOLUTION: The code is modified so that for the OTHER_DISKS enclosure, the default I/O policy is set to SINGLE-ACTIVE when the 'vxdmpadm config reset' operation is performed.

* INCIDENT NO:3404455 TRACKING ID:3247094

SYMPTOM: The DR (Dynamic Reconfiguration) tool is unable to apply an SMI label to newly added devices which had an EFI label.

DESCRIPTION: The DR Tool does not handle the case where a newly added device had an EFI label and is now being labeled as SMI.
RESOLUTION: Code changes have been made to handle this case.

* INCIDENT NO:3404625 TRACKING ID:3133908

SYMPTOM: The DR (Dynamic Reconfiguration) Tool displays the cfgadm(1M) usage message as follows while adding a LUN:
Usage: cfgadm [-f] [-y|-n] [-v] [-o hardware_opts ] -c function ap_id [ap_id...]
cfgadm [-f] [-y|-n] [-v] [-o hardware_opts ] -x function ap_id [ap_id...]
cfgadm [-v] [-s listing_options ] [-o hardware_opts ] [-a] [-l [ap_id|ap_type...]]
cfgadm [-v] [-o hardware_opts ] -t ap_id [ap_id...]
cfgadm [-v] [-o hardware_opts ] -h [ap_id|ap_type...]
INFO: Running command cfgadm ... this might take some time

DESCRIPTION: This error message is seen when the cfgadm(1M) command is given invalid input. The DR Tool code passed the wrong input to the cfgadm(1M) command.

RESOLUTION: Code changes have been made so that only valid parameters are passed to the cfgadm command.

* INCIDENT NO:3405032 TRACKING ID:3451625

SYMPTOM: If a ZFS volume is exported as a virtual disk and used as the root disk in an LDom guest, then the system panics on VxVM root encapsulation with the following message and stack trace:
NOTICE: VxVM vxio V-5-0-74 Cannot open disk ROOTDISK: kernel error 6
Cannot open mirrored root device, error 6
Cannot remount root on /pseudo/vxio@0:0 fstype ufs
panic[cpu0]/thread=180e000: vfs_mountroot: cannot remount root
genunix:vfs_mountroot()
genunix:main()

DESCRIPTION: If a ZFS volume is exported as a virtual disk and used as the root disk in an LDom guest, the machine panics on root encapsulation when the system is rebooted. During early boot, the VxVM I/O driver depends on the VxVM DMP driver to identify the DMP device corresponding to the root disk. As the root disk is a ZFS volume, DMP fails to identify such a device and does not create a DMP device for it. Thus the VxVM I/O driver throws an error and fails to initialize the rootdg corresponding to the root volume. This causes the system to panic.
RESOLUTION: The code has been modified so that the VxVM DMP driver recognizes such devices during early boot and creates a DMP device for the root disk. This allows root disk encapsulation to work with ZFS volumes exported as virtual disks in an LDom guest.

* INCIDENT NO:3405318 TRACKING ID:3259732

SYMPTOM: In a Clustered Volume Replicator (CVR) environment, if the SRL size grows and this is followed by a slave node leaving and then re-joining the cluster, the rlink is detached.

DESCRIPTION: After the slave re-joins the cluster, it does not correctly receive and process the SRL resize information received from the master. This means that application writes initiated on this slave may corrupt the SRL, causing the rlink to detach.

RESOLUTION: The code is modified so that when a slave joins the cluster, the SRL resize related information is correctly received and processed by the slave.

* INCIDENT NO:3408321 TRACKING ID:3408320

SYMPTOM: Thin reclamation fails for EMC 5875 arrays with the following message:
# vxdisk reclaim
Reclaiming thin storage on:
Disk : Reclaim Partially Done. Device Busy.

DESCRIPTION: As a result of recent changes in EMC Microcode 5875, thin reclamation for EMC 5875 arrays fails because the reclaim request length exceeds the maximum "write_same" length supported by the array.

RESOLUTION: The code has been modified to correctly set the maximum "write_same" length of the array.

* INCIDENT NO:3409473 TRACKING ID:3409612

SYMPTOM: Running "vxtune reclaim_on_delete_start_time " fails if the specified value is outside the range 22:00-03:59 (for example, setting it to 04:00 or 19:30 fails).

DESCRIPTION: The reclaim_on_delete_start_time tunable can be set to any time value within 00:00 to 23:59. However, because of a wrong regular expression used to parse the time, it cannot be set to all values in 00:00-23:59.

RESOLUTION: The regular expression has been updated to parse the time format correctly. Now all values in 00:00-23:59 can be set.
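A correct parser for this tunable must accept every HH:MM value from 00:00 through 23:59 and nothing else. The following is a minimal sketch of such a validation; the shell function and the patterns are illustrative, as the actual expression used by the vxtune script is not shown in this document:

```shell
# Validate an HH:MM time string against the full 00:00-23:59 range.
# Two glob patterns cover the hour field: 00-19 and 20-23.
is_valid_time() {
    case "$1" in
        [01][0-9]:[0-5][0-9]) return 0 ;;  # hours 00-19, minutes 00-59
        2[0-3]:[0-5][0-9])    return 0 ;;  # hours 20-23, minutes 00-59
        *)                    return 1 ;;
    esac
}

for t in 04:00 19:30 23:59 24:00 12:60; do
    if is_valid_time "$t"; then
        echo "$t accepted"
    else
        echo "$t rejected"
    fi
done
# 04:00, 19:30 and 23:59 are accepted; 24:00 and 12:60 are rejected.
```

Note that 04:00 and 19:30, the values rejected by the buggy expression described above, pass this check.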
* INCIDENT NO:3413044 TRACKING ID:3400504

SYMPTOM: While the host-side HBA port is disabled, the extended attributes of some devices are no longer present, even when there is a redundant controller on the host in the enabled state. An example output is shown below, where the 'srdf' attribute of an EMC device (which has multiple paths through multiple controllers) gets affected.
Before the port is disabled:
# vxdisk -e list emc1_4028
emc1_4028 auto:cdsdisk emc1_4028 dg21 online c6t5000097208191154d112s2 srdf-r1
After the port is disabled:
# vxdisk -e list emc1_4028
emc1_4028 auto:cdsdisk emc1_4028 dg21 online c6t5000097208191154d112s2 -

DESCRIPTION: The code which prints the extended attributes used to print the attributes of the first path in the list of all paths. If the first path belongs to a controller which is disabled, its attributes are empty.

RESOLUTION: The code is modified to look for a path in the enabled state among all the paths and print the attributes of that path. If all the paths are in the disabled state, no attributes are shown.

* INCIDENT NO:3414151 TRACKING ID:3280830

SYMPTOM: Multiple vxresize operations on a layered volume fail with the following error message:
"ERROR V-5-1-16092 Volume : There are other recovery activities. Cannot grow volume"

DESCRIPTION: Veritas Volume Manager internally maintains a recovery offset for each volume, which indicates the length of the volume recovered so far. The shrinkto operation called on the volume sets an incorrect recovery offset. A subsequent growto operation called on the same volume treats the volume as being in the recovery phase due to the incorrect recovery offset set by the earlier shrinkto operation.

RESOLUTION: The code is modified to correctly set the volume's recovery offset.

* INCIDENT NO:3414265 TRACKING ID:2804326

SYMPTOM: Secondary logging is seen in effect even if a Storage Replicator Log (SRL) size mismatch is seen across the primary and secondary.
DESCRIPTION: Secondary logging should be turned off if there is a mismatch in the SRL size across the primary and secondary. Given that the SRL sizes are the same before the start of replication, if the SRL size is increased on the primary after replication is turned on, secondary logging gets turned off. However, when the SRL size is increased on the secondary, secondary logging is not turned off.

RESOLUTION: The code has been modified to turn off secondary logging when the SRL size is changed on the secondary.

* INCIDENT NO:3416320 TRACKING ID:3074579

SYMPTOM: The "vxdmpadm config show" CLI does not display the name of a configuration file which is present under the root (/) directory.

DESCRIPTION: The "vxdmpadm config show" CLI displays the name of the configuration file which is loaded using the "vxdmpadm config load file" CLI. If a file is located under the root (/) directory, the "vxdmpadm config show" CLI does not display the name of such a file.

RESOLUTION: The code has been modified to display any configuration files that are loaded.

* INCIDENT NO:3416406 TRACKING ID:3099796

SYMPTOM: When the vxevac command is invoked on a volume with a DCO log associated, it fails with the error message "volume volname_dcl is not using the specified disk name". The error is seen only for log volumes; no error is seen for simple volumes.

DESCRIPTION: The vxevac command evacuates all volumes from a disk. It moves the sub-disks of all volumes off the specified VxVM disk to the destination disks or to any non-volatile, non-reserved disks within the disk group. During the evacuation of data volumes, it also implicitly evacuates the log volumes associated with them. If log volumes are explicitly placed in the list of volumes to be evacuated, the above error is seen because those log volumes have already been evacuated off the disk along with their corresponding data volumes.

RESOLUTION: The code has been updated to avoid explicit evacuation of log volumes.
* INCIDENT NO:3417081 TRACKING ID:3417044

SYMPTOM: The system becomes unresponsive while creating a Veritas Volume Replicator (VVR) TCP connection. The vxiod kernel thread reports the following stack trace:
mt_pause_trigger()
wait_for_lock()
spinlock_usav()
kfree()
t_kfree()
kmsg_sys_free()
nmcom_connect()
vol_rp_connect()
vol_rp_connect_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION: When multiple TCP connections are configured and some of these connections are still in the active state, the connection request process function attempts to free a memory block. If this block has already been freed by a previous connection, then the kernel thread may become unresponsive on the HP-UX platform.

RESOLUTION: The code is modified to resolve the issue of freeing a memory block which was already freed by another connection.

* INCIDENT NO:3417672 TRACKING ID:3287880

SYMPTOM: In a clustered environment, if a node does not have storage connectivity to clone disks, then the vxconfigd daemon on the node may dump core during the clone disk group import. The stack trace is as follows:
chosen_rlist_delete()
dg_import_complete_clone_tagname_update()
req_dg_import()
vold_process_request()

DESCRIPTION: In a clustered environment, if a node does not have storage connectivity to clone disks, then due to improper cleanup handling in the clone database, the vxconfigd daemon on the node may dump core during the clone disk group import.

RESOLUTION: The code has been modified to properly clean up the clone database.

* INCIDENT NO:3419831 TRACKING ID:3435475

SYMPTOM: The vxcdsconvert(1M) conversion process gets aborted for a thin LUN formatted as a simple disk with the Extensible Firmware Interface (EFI) format, with the following error:
VxVM vxcdsconvert ERROR V-5-2-2767 : Unable to add the disk back to the disk group

DESCRIPTION: The vxcdsconvert(1M) command evacuates sub-disks to other disks within the DG before initializing non-CDS disks with the CDS format.
Sub-disks residing on thin reclaimable disks are marked as pending reclamation after having been evacuated to other disks. When such a disk is removed from the DG, since its sub-disk is pending reclamation, the sub-disk records still point to the same disk. For EFI formatted disks, the public region length of a non-CDS disk is greater than that of a CDS disk. When this disk is converted from non-CDS format to CDS format and is added back to the DG, vxconfigd considers that the sub-disk residing on the disk lies beyond the public region space of the converted CDS disk, and it fails the conversion.

RESOLUTION: The code is modified to not check sub-disk boundaries for sub-disks pending reclamation, as these records are stale and will be deleted eventually.

* INCIDENT NO:3423613 TRACKING ID:3399131

SYMPTOM: The following command fails with an error for a path managed by a Third Party Driver (TPD) which co-exists with DMP:
# vxdmpadm -f disable path=
VxVM vxdmpadm ERROR V-5-1-11771 Operation not support

DESCRIPTION: Third party drivers manage devices with or without the co-existence of the Dynamic Multi-Pathing driver. Disabling the paths managed by a third party driver which does not co-exist with DMP is not supported. However, due to a bug in the code, disabling the paths managed by a third party driver which co-exists with DMP also fails, because the same flags are set for all third party driver devices.

RESOLUTION: The code has been modified to block this command only for third party drivers which cannot co-exist with DMP.

* INCIDENT NO:3423644 TRACKING ID:3416622

SYMPTOM: The hot-relocation feature fails with the following message for a corrupted disk in the Cluster Volume Manager (CVM) environment due to the disk connectivity check. The vxprint output after the hot relocation failure for the corrupted disk ams_wms0_15:
Disk group: testdg
TY NAME           ASSOC          KSTATE   LENGTH  PLOFFS STATE    TUTIL0 PUTIL0
dg testdg         testdg         -        -       -      -        -      -
dm ams_wms0_14    ams_wms0_14    -        4043008 -      -        -      -
dm ams_wms0_15    -              -        -       -      NODEVICE -      -
dm ams_wms0_16    ams_wms0_16    -        4043008 -      SPARE    -      -
v  vol1           fsgen          ENABLED  4042752 -      ACTIVE   -      -
pl vol1-01        vol1           ENABLED  4042752 -      ACTIVE   -      -
sd ams_wms0_14-01 vol1-01        ENABLED  4042752 0      -        -      -
pl vol1-02        vol1           DISABLED 4042752 -      NODEVICE -      -
sd ams_wms0_15-01 vol1-02        DISABLED 4042752 0      NODEVICE -      -
No hot relocation is done even though a spare disk with sufficient space is available on a locally connected disk in the CVM environment.

DESCRIPTION: The hot-relocation feature fails due to a connectivity check that wrongly assumes the disk being relocated is remotely connected. Additionally, the disk is not selected for relocation due to a mismatch of the Unique Disk ID (UDID) during the check.

RESOLUTION: The code is modified to introduce an avoidance fix in the case of a corrupted source disk, irrespective of whether it is local or remote. However, the hot-relocation feature functions only on a locally connected target disk at the master node.

* INCIDENT NO:3424795 TRACKING ID:3424798

SYMPTOM: Veritas Volume Manager (VxVM) mirror attach operations (e.g., plex attach, vxassist mirror, and third-mirror break-off snapshot resynchronization) may take a long time under heavy application I/O load. The vxtask list command shows tasks in the 'auto-throttled (waiting)' state for a long time.

DESCRIPTION: With the AdminIO de-prioritization feature, VxVM administrative I/Os (e.g., plex attach, vxassist mirror, and third-mirror break-off snapshot resynchronization) are de-prioritized under heavy application I/O load, but this can lead to very slow progress of these operations.

RESOLUTION: The code is modified to disable the AdminIO de-prioritization feature.
* INCIDENT NO:3427124 TRACKING ID:3435225

SYMPTOM: In a given CVR setup, rebooting the master node causes one of the slaves to panic with the following stack:
pse_sleep_thread
vol_rwsleep_rdlock
vol_kmsg_send_common
vol_kmsg_send_prealloc
cvm_obj_sendmsg_prealloc
vol_rv_async_done
volkcontext_process
voldiskiodone

DESCRIPTION: The issue is triggered by one of the code paths sleeping in interrupt context.

RESOLUTION: The code is modified so that sleep is not invoked in interrupt context.

* INCIDENT NO:3427480 TRACKING ID:3163549

SYMPTOM: If a slave node tries to join the cluster when a set of disks is missing on the master, the vxconfigd(1M) daemon may hang on the master with the following stack:
kernel_vsyscall()
ioctl()
kernel_ioctl()
kernel_write_disk()
dasup_write()
priv_update_header()
priv_update_toc()
priv_check()
dasup_validate()
dapriv_validate()
auto_validate()
dg_kernel_dm_changes()
dg_kernel_changes()
client_trans_start()
dg_trans_start()
dg_check_kernel()
vold_check_signal()
request_loop()
main()

DESCRIPTION: Since a local connection does not exist on the master, the private region I/O fails. The I/O is retried through I/O shipping, as other nodes have connectivity to this disk, and a signal to vold is generated. The remote I/O succeeds and the vold-level transaction goes through fine. Then vold picks up the signal and initiates an internal transaction that is successfully completed at the vold level. The operation results in initiating a transaction at the kernel level. The disk does not change its disk group, but vol_nulldg ends up being picked for this disk even though it is part of a shared dg. Consequently, the code ends up switching to the disk I/O policy even though it is not enabled. Thus the system keeps switching between the remote and local policy continuously.

RESOLUTION: The code is changed to pick the appropriate dg.
* INCIDENT NO:3430318 TRACKING ID:3435041

SYMPTOM: When a system has existing non-root ZFS zpools on OS devices, the "vxdmpadm settune dmp_native_support=on" CLI fails with the following script errors:
awk: syntax error near line 1
awk: bailing out near line 1
awk: syntax error near line 1
awk: illegal statement near line 1
pool: testzpool
id: 6877344227343180008
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
testzpool ONLINE
c5t21210002AC0005A4d25s2 ONLINE
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more zpools
VxVM vxdmpadm ERROR V-5-1-15686 The following zpool(s) could not be migrated as the operation failed for the underlying dmpnodes - testzpool

DESCRIPTION: Due to an issue in the script related to the dmp_native_support tunable, an error is reported and the operation is aborted. Thus, the zpools are not migrated to the DMP devices.

RESOLUTION: The code is modified to fix the errors in the script related to the dmp_native_support tunable.

* INCIDENT NO:3434079 TRACKING ID:3385753

SYMPTOM: Replication to the Disaster Recovery (DR) site hangs even though the Replication links (Rlinks) are in the connected state.

DESCRIPTION: Under the User Datagram Protocol (UDP), the Symantec Replication Option (Veritas Volume Replicator, VVR) has its own flow control mechanism to control the amount of data sent over the network based on network conditions. Under error-prone network conditions which cause timeouts, VVR's flow control values become invalid, resulting in a replication hang.

RESOLUTION: The code is modified to ensure valid values for the flow control even under error-prone network conditions.
* INCIDENT NO:3434080 TRACKING ID:3415188

SYMPTOM: The file system or I/O hangs during data replication with the Symantec Replication Option (VVR), with the following stack trace:
schedule()
volsync_wait()
volopobjenter()
vol_object_ioctl()
voliod_ioctl()
volsioctl_real()
vols_ioctl()
vols_compat_ioctl()
compat_sys_ioctl()
sysenter_dispatch()

DESCRIPTION: One of the Symantec Replication Option structures associated with the Storage Replicator Log (SRL) can become invalid because of an improper locking mechanism in the code, which leads to an I/O or file system hang.

RESOLUTION: The code is changed to take the appropriate locks to protect the code.

* INCIDENT NO:3434189 TRACKING ID:3247040

SYMPTOM: Executing the "vxdisk scandisks" command enables a PP enclosure which was previously disabled using the "vxdmpadm disable enclosure=" command.

DESCRIPTION: During device discovery, due to a wrong check for the PP enclosure, Dynamic Multi-pathing (DMP) destroys the old PP enclosure in the device discovery layer (DDL) database and adds it as a new enclosure. This process removes all the old flags that were set on the PP enclosure, and DMP then treats the enclosure as enabled due to the absence of the required flags.

RESOLUTION: The code is modified to keep the PP enclosure in the DDL database during device discovery, so the existing flags on the paths of the PP enclosure are not reset.

* INCIDENT NO:3435000 TRACKING ID:3162987

SYMPTOM: When a disk is disconnected from the node due to a cable pull or a zone remove type of operation, the disk shows the UDID_MISMATCH flag in the vxdisk list output.

DESCRIPTION: To verify whether the disk's DDL disk entry UDID matches the private region UDID, both UDIDs are retrieved and compared. If the disk does not have connectivity, the UDID value is set to the INVALID_UDID string prior to the UDID mismatch check, so the disk is wrongly flagged as a mismatch.
RESOLUTION: The code is modified such that the UDID mismatch check is not performed for disks that don't have connectivity.
* INCIDENT NO:3435008 TRACKING ID:2958983
SYMPTOM: A memory leak is observed during the reminor operation in the vxconfigd binary.
DESCRIPTION: The reminor code path has a memory leak issue. This code path is traversed when the auto reminor value is set to 'OFF'.
RESOLUTION: The code is modified to free the memory when the auto reminor value is set to 'OFF'.
* INCIDENT NO:3439102 TRACKING ID:3441356
SYMPTOM: The pre-check of the upgrade_start.sh script fails with the following error:
ERROR "VxVM vxprint ERROR V-5-1-15324 Specify a disk group with -g "
DESCRIPTION: The current logic in upgrade_start.sh expects "RESERVED_DG_BOOT" to be set to "nodg". This is valid only for releases older than 5.x. In the case of newer releases, "RESERVED_DG_BOOT" is set to either "bootdg" or the name of the boot disk group.
RESOLUTION: The code is modified to change the logic in the upgrade_start.sh script.
* INCIDENT NO:3442602 TRACKING ID:3046560
SYMPTOM: The vradmin syncrvg command fails with the error: "VxVM VVR vxrsync ERROR V-5-52-2009 Could not open device [devicename] due to: DKIOCGVTOC ioctl to raw character volume failed".
DESCRIPTION: The latest versions of the Solaris operating system do not support the DKIOCGVTOC ioctl, because it has been deprecated. Hence the vradmin syncrvg command, which uses this ioctl, fails.
RESOLUTION: The code is modified to use an alternate method to retrieve the required information.
* INCIDENT NO:3446452 TRACKING ID:3090488
SYMPTOM: Memory leaks occur in the device discovery code path of Veritas Volume Manager (VxVM).
DESCRIPTION: In the device discovery code path, at some instances, VxVM fails to free the buffer space it used, causing the memory leaks.
RESOLUTION: The code is modified to free the buffer space and avoid the memory leaks in the device discovery code path.
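The UDID comparison guard described for incident 3435000 above can be sketched as follows. This is an illustrative Python model, not VxVM source; the INVALID_UDID placeholder is taken from the incident text, but the function and sample UDID strings are invented.

```python
# Illustrative sketch (not VxVM source) of the udid_mismatch check in
# incident 3435000: when a disk loses connectivity its DDL UDID is set
# to the INVALID_UDID placeholder, so a naive comparison against the
# private region UDID reports a spurious mismatch. The fix skips the
# check entirely for disconnected disks.

INVALID_UDID = "INVALID_UDID"

def udid_mismatch(ddl_udid, privreg_udid):
    # The fix: a disk without connectivity cannot be judged at all.
    if ddl_udid == INVALID_UDID:
        return False
    return ddl_udid != privreg_udid

print(udid_mismatch("VENDOR%5FLUN1", "VENDOR%5FLUN1"))  # False: genuine match
print(udid_mismatch("VENDOR%5FLUN2", "VENDOR%5FLUN1"))  # True: real clone case
print(udid_mismatch(INVALID_UDID, "VENDOR%5FLUN1"))     # False: no connectivity
```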
* INCIDENT NO:3461200 TRACKING ID:3163970
SYMPTOM: The "vxsnap -g syncstart " command hangs on the Veritas Volume Replicator (VVR) DR site with the following stack trace:
cv_wait()
delay_common()
delay()
vol_object_ioctl()
voliod_ioctl()
volsioctl_real()
spec_ioctl()
fop_ioctl()
ioctl()
syscall_trap32()
DESCRIPTION: The vxsnap(1M) command internally calls the vxassist(1M) command, which waits for the open operation to succeed on a volume. When the open operation fails because of a mismatch in the counter values, the vxsnap(1M) command becomes unresponsive.
RESOLUTION: The code is modified such that the open operation does not fail on the volume.
PATCH ID:6.0.300.200
* INCIDENT NO:3358311 TRACKING ID:3152769
SYMPTOM: In a Solaris LDOM environment, when one I/O domain is down, DMP takes a long time to fail over paths.
DESCRIPTION: DMP provides a SCSI bypass framework, wherein a SCSI buffer is created and sent directly to the HBA, bypassing the OS SCSI driver layer. In Solaris LDOM environments, when one I/O domain goes down, path failover takes a long time because validation causes multiple timeouts in the LDOM virtual disk client driver layer.
RESOLUTION: DMP no longer uses the SCSI bypass framework in the LDOM environment.
* INCIDENT NO:3358313 TRACKING ID:3194358
SYMPTOM: Continuous messages appear in the syslog with EMC not-ready (NR) logical units. Messages from the syslog (/var/adm/messages):
May 10 18:40:43 scsi: [ID 107833 kern.warning] WARNING: /pci@1e,600000/SUNW,jfca@2/fp@0,0/ssd@w5006048c5368e580,16c (ssd144):
May 10 18:40:43 drive offline
May 10 18:40:43 scsi: [ID 107833 kern.warning] WARNING: /pci@1e,600000/SUNW,jfca@2/fp@0,0/ssd@w5006048c536979a0,127 (ssd392):
May 10 18:40:43 drive offline
May 10 18:40:43 scsi: [ID 107833 kern.warning] WARNING: /pci@1e,600000/SUNW,jfca@2/fp@0,0/ssd@w5006048c5368e5a0,16b (ssd270):
May 10 18:40:43 drive offline
May 10 18:40:43 i/o error occurred (errno=0x5) on dmpnode 201/0x1c0
DESCRIPTION: VxVM tries to online the EMC not-ready (NR) logical units.
As part of the disk online process, it tries to read the disk label from the logical unit. Because the logical unit is NR, the I/O fails. The failure messages are displayed in the syslog file.
RESOLUTION: The code is modified to skip the disk online for the EMC NR LUNs.
* INCIDENT NO:3358342 TRACKING ID:2724067
SYMPTOM: Assume format(1) is run on a disk to change the label type from EFI to SMI prior to invoking vxdisksetup. This used to result in a failure of vxdisksetup.
DESCRIPTION: Prior to this enhancement, vxdisksetup used to initialize the disk with the specified VxVM layout (e.g. simple, sliced, or cdsdisk) using the pre-existing label type (e.g. SMI or EFI). Users were required to change the label to the desired label type using format(1) prior to invoking vxdisksetup. When format(1) is run on a single physical path to change the label type from EFI to SMI, format(1) used to update the device special file on that single physical path only; this resulted in I/O failure on the remaining physical paths, thereby causing vxdisksetup to fail.
RESOLUTION: vxdisksetup is enhanced such that the label type can be specified along with the VxVM disk layout. vxdisksetup invokes format(1) on a single physical path to change the label type. Then, during the very early phase, it invokes dd(1) on the remaining physical paths. In turn, dd(1) invokes open(2) on each of the remaining physical paths. open(2) updates the device special files on each of those physical paths to reflect the latest label type.
* INCIDENT NO:3358345 TRACKING ID:2091520
SYMPTOM: Customers cannot selectively disable VxVM configuration copies on the disks associated with a disk group.
DESCRIPTION: An enhancement is required to enable customers to selectively disable VxVM configuration copies on disks associated with a disk group.
RESOLUTION: The code is modified to provide a "keepmeta=skip" option to the vxdiskset(1M) command to allow a customer to selectively disable VxVM configuration copies on disks that are a part of the disk group.
* INCIDENT NO:3358346 TRACKING ID:3353211
SYMPTOM: A. After an EMC Symmetrix BCV (Business Continuance Volume) device switches to read-write mode, continuous vxdmp (Veritas Dynamic Multi-Pathing) error messages flood the syslog, as shown below:
NOTE VxVM vxdmp V-5-3-1061 dmp_restore_node: The path 18/0x2 has not yet aged - 299
NOTE VxVM vxdmp 0 dmp_tur_temp_pgr: open failed: error = 6 dev=0x24/0xD0
NOTE VxVM vxdmp V-5-3-1062 dmp_restore_node: Unstable path 18/0x230 will not be available for I/O until 300 seconds
NOTE VxVM vxdmp V-5-3-1061 dmp_restore_node: The path 18/0x2 has not yet aged - 299
NOTE VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x6) on dmpnode 36/0xD0
B. A DMP metanode, or a path under a DMP metanode, gets disabled unexpectedly.
DESCRIPTION: A. DMP caches the last-discovery NDELAY open for the BCV dmpnode paths. Switching a BCV device to read-write mode is an array-side operation. Typically in such cases, the system administrators are required to run the following command:
1. vxdisk rm
OR, in the case of parallel backup jobs:
1. vxdisk offline
2. vxdisk online
This causes DMP to close the cached open, and during the next discovery, the device is opened in read-write mode. If the above steps are skipped, the DMP device goes into a state where one of the paths is in read-write mode and the others remain in NDELAY mode. If the upper layers request a NORMAL open, DMP has code to close the cached NDELAY open and reopen in NORMAL mode. When the dmpnode is online, this happens only for one of the paths of the dmpnode.
B. DMP performs error analysis for paths on which I/O has failed. In some cases, the SCSI probes that are sent fail with return values/sense codes that are not handled by DMP. This causes the paths to get disabled.
RESOLUTION: A.
The code is modified for the DMP EMC ASL (Array Support Library) to handle case A for EMC Symmetrix arrays.
B. The DMP code is modified to handle the SCSI conditions correctly for case B.
* INCIDENT NO:3358347 TRACKING ID:3057554
SYMPTOM: The VxVM (Veritas Volume Manager) command "vxdiskunsetup -o shred=1" fails for EFI (Extensible Firmware Interface) disks on Solaris x86 systems.
DESCRIPTION: The disk shred fails on Solaris x86 systems for EFI disks because the I/O to the last sector fails. The Solaris x86 OS has an issue where the last sector of a LUN cannot be written.
RESOLUTION: The code is modified to skip the last sector while shredding, which makes the shred pass with EFI disks.
* INCIDENT NO:3358348 TRACKING ID:2665425
SYMPTOM: The vxdisk -px "attribute" list(1M) Command Line Interface (CLI) does not support some basic VxVM attributes, nor does it allow the user to specify multiple attributes in a specific sequence. The display layout is not presented in a readable or parsable manner.
DESCRIPTION: The vxdisk -px "attribute" list(1M) CLI, which is useful for customizing the command output, does not support some basic VxVM disk attributes. The display output is also not aligned by column or suitable for parsing by a utility. In addition, the CLI does not allow multiple attributes to be specified in a usable manner.
RESOLUTION: Support for the following VxVM disk attributes has been added to the CLI:
SETTINGS ALERTS INFO HOSTID DISK_TYPE FORMAT DA_INFO PRIV_OFF PRIV_LEN PUB_OFF PUB_LEN PRIV_UDID DG_NAME DGID DG_STATE DISKID DISK_TIMESTAMP STATE
The CLI has been enhanced to support multiple attributes separated by a comma, and to align the display output by column, separable by a comma for parsing.
For example:
# vxdisk -px ENCLOSURE_NAME,DG_NAME,LUN_SIZE,SETTINGS,state list
DEVICE   ENCLOSURE_NAME  DG_NAME  LUN_SIZE   SETTINGS              STATE
sda      disk            -        143374650  -                     online
sdb      disk            -        143374650  -                     online
sdc      storwizev70000  fencedg  10485760   thinrclm,coordinator  online
* INCIDENT NO:3358350 TRACKING ID:3189830
SYMPTOM: When you use the 'Mirror volumes on a disk' option of the vxdiskadm(1M) command for a root disk, you get the following error:
VxVM ERROR V-5-2-673 Mirroring of disk disk0 failed. Error:
VxVM vxmirror ERROR V-5-2-6147 The mirror disk geometry does not match with the root disk. Replace disk with matching geometry of original boot disk. Or re-run this command with -f option.
DESCRIPTION: When you use the 'Mirror volumes on a disk' option of the vxdiskadm(1M) command for a root disk and the disk geometry does not match, the command fails. Currently, the vxdiskadm(1M) command does not provide any way to specify the -f option for this operation, so you have to manually execute the vxmirror command with the -f option.
RESOLUTION: The code is modified to allow you to specify the force option (-f) through the vxdiskadm interface.
* INCIDENT NO:3358351 TRACKING ID:3158320
SYMPTOM: The VxVM (Veritas Volume Manager) command "vxdisk -px REPLICATED list (disk)" displays incorrect output.
DESCRIPTION: When executed, the "vxdisk -px REPLICATED list disk" command shows the same output as "vxdisk -px REPLICATED_TYPE list disk" and does not work as designed to show the values "yes", "no", or "-". The specified command-line parameter is parsed incorrectly, and hence the REPLICATED attribute is wrongly treated as REPLICATED_TYPE.
RESOLUTION: The code is modified to display the "REPLICATED" attribute correctly.
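The parsing defect in incident 3358351 can be sketched as follows. This is a minimal, illustrative Python model (the lookup logic is invented; only the two attribute names come from the incident text): resolving a requested attribute by prefix makes "REPLICATED" collide with "REPLICATED_TYPE", whereas matching exactly keeps them distinct.

```python
# A minimal sketch (not the vxdisk source) of the ambiguity in incident
# 3358351: a prefix-based attribute lookup lets "REPLICATED" resolve to
# "REPLICATED_TYPE". An exact match removes the ambiguity.

ATTRIBUTES = ["REPLICATED", "REPLICATED_TYPE"]

def resolve_buggy(name):
    # The first attribute that *starts with* the request wins, so the
    # longer name shadows the shorter one.
    for attr in sorted(ATTRIBUTES, key=len, reverse=True):
        if attr.startswith(name):
            return attr
    return None

def resolve_fixed(name):
    # Exact match only: the two attributes stay distinct.
    return name if name in ATTRIBUTES else None

print(resolve_buggy("REPLICATED"))   # REPLICATED_TYPE -- the wrong attribute
print(resolve_fixed("REPLICATED"))   # REPLICATED
```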
* INCIDENT NO:3358352 TRACKING ID:3326964
SYMPTOM: VxVM hangs in a CVM environment in the presence of Fast Mirror Resync (FMR)/FlashSnap operations, with the following stack trace:
voldco_cvm_serialize()
voldco_serialize()
voldco_handle_dco_error()
voldco_mapor_sio_done()
voliod_iohandle()
voliod_loop()
child_rip()
DESCRIPTION: During split-brain testing in the presence of FMR activities, when errors occur on the Data Change Object (DCO), the DCO error-handling code sets up a flag due to which the same error gets set again in its handler. Consequently, the VxVM Staged I/O (SIO) loops around the same code and causes the hang.
RESOLUTION: The code is changed to appropriately handle this scenario.
* INCIDENT NO:3358353 TRACKING ID:3271315
SYMPTOM: The vxdiskunsetup command with the shred option fails to shred sliced or simple disks on the Solaris x86 platform. Errors of the following format can be seen:
VxVM vxdisk ERROR V-5-1-16576 disk_shred: Shred failed one or more writes
VxVM vxdisk ERROR V-5-1-16658 disk_shred: Shred wrote 1 pages, of which 1 encountered errors
DESCRIPTION: The error occurs because the check for 'disk size' in the vxdiskunsetup command returns an incorrect value for sliced or simple disks on Solaris x86.
RESOLUTION: The code is modified to correctly check the disk size of simple and sliced disks.
* INCIDENT NO:3358354 TRACKING ID:3332796
SYMPTOM: The following message is seen while initializing any EFI disk, even though the disk was not previously used as an ASM disk:
"VxVM vxisasm INFO v-5-1-0 seeking block #... "
DESCRIPTION: As a part of disk initialization, for every EFI disk, VxVM checks whether the EFI disk has an ASM label. The message "VxVM vxisasm INFO v-5-1-0 seeking block #..." is printed unconditionally, which is unreasonable.
RESOLUTION: Code changes have been made to not display the message.
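The shred fixes in incidents 3358347 and 3358353 above both hinge on computing the writable extent of the disk correctly. The following Python sketch is purely illustrative (the extent generator, page size, and disk size are invented, not the vxdiskunsetup implementation): on Solaris x86 the last sector of a LUN cannot be written, so a shred that covers the full reported size fails on its final write, and the fix is to stop one sector short.

```python
# Sketch of the shred-extent arithmetic behind incidents 3358347/3358353
# (hypothetical helper, not actual vxdiskunsetup code): shred the disk in
# page-sized writes, but exclude the unwritable last sector on Solaris x86.

SECTOR = 512

def shred_extents(disk_size, page=8192, skip_last_sector=True):
    """Yield (offset, length) write extents covering the shreddable region."""
    end = disk_size - SECTOR if skip_last_sector else disk_size
    off = 0
    while off < end:
        yield off, min(page, end - off)
        off += page

# A 1 MiB disk: the final extent stops one sector short of the device end.
extents = list(shred_extents(1024 * 1024))
print(extents[-1])   # (1040384, 7680): last 512 bytes left untouched
```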
* INCIDENT NO:3358367 TRACKING ID:3230148
SYMPTOM: CVM hangs during split-brain testing with the following stack trace:
volmv_cvm_serialize()
vol_mv_wrback_done()
voliod_iohandle()
voliod_loop()
kernel_thread_helper()
DESCRIPTION: During split-brain testing in the presence of Fast Mirror Resync (FMR) activities, a read-writeback operation or Staged I/O (SIO) can be issued as part of a Data Change Object (DCO) chunk update. The SIO tries to read from plex1, and when the read operation fails, it reads from the other available plex(es) and performs a write on all other plexes. As the other plex has already failed, the write operation also fails and gets retried with IOSHIPPING, which also fails because the plex is unavailable from the other nodes as well (because of the split-brain testing). As the remote plex is unavailable, the write fails again and serialization is called again on the SIO, during which the system hangs due to a mismatch in the active and serial counts.
RESOLUTION: The code is changed to take care of the active and serial counts when the SIOs are restarted with IOSHIPPING.
* INCIDENT NO:3358368 TRACKING ID:3249264
SYMPTOM: The Veritas Volume Manager (VxVM) thin disk reclamation functionality causes disk label loss, private region corruption, and data corruption.
DESCRIPTION: The partition offset is not taken into consideration when VxVM calls the array-specific reclamation interface. Incorrect data blocks, which may contain the disk label and VxVM private/public region content, are reclaimed.
RESOLUTION: Code changes have been made to take the partition offset into consideration when calling the array-specific reclamation interface.
* INCIDENT NO:3358369 TRACKING ID:3250369
SYMPTOM: Execution of the vxdisk scandisks command causes endless I/O error messages in the syslog.
DESCRIPTION: Execution of the command triggers a re-online of all the disks, which involves reading the private region from all the disks.
Failures of these read I/Os generate error events, which are notified to all the clients waiting on "vxnotify". One such client is the "vxattachd" daemon. The daemon initiates a "vxdisk scandisks" when the number of events is more than 256. Therefore, "vxattachd" initiates another cycle of the above activity, resulting in endless events.
RESOLUTION: The code is modified to change the count which triggers the vxattachd daemon from 256 to 1024. Also, the DMP events are further sub-categorized as per the requirement of the vxattachd daemon.
* INCIDENT NO:3358370 TRACKING ID:2921147
SYMPTOM: The udid_mismatch flag is absent on a clone disk when the source disk is unavailable. The 'vxdisk list' command does not show the udid_mismatch flag on a disk. This happens even when the 'vxdisk -o udid list' or 'vxdisk -v list diskname | grep udid' commands show different Device Discovery Layer (DDL) generated and private region unique disk identifiers (UDIDs).
DESCRIPTION: When the DDL-generated UDID and the private region UDID of a disk do not match, Veritas Volume Manager (VxVM) sets the udid_mismatch flag on the disk. This flag is used to detect a disk as a clone, which is marked with the clone-disk flag. The vxdisk(1M) utility used to suppress the display of the udid_mismatch flag if the source Logical Unit Number (LUN) is unavailable on the same host.
RESOLUTION: The vxdisk(1M) utility is modified to display the udid_mismatch flag if it is set on the disk. Display of this flag is no longer suppressed, even when the source LUN is unavailable on the same host.
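The feedback loop in incident 3358369 above can be modeled in a few lines. This is a toy Python simulation under invented assumptions (every scan regenerates one error event per failed disk, and the daemon rescans whenever pending events exceed its threshold) — it is not vxattachd code, only an illustration of why a single scan producing more events than the threshold never converges.

```python
# Toy model (not VxVM code) of the event storm in incident 3358369: every
# "vxdisk scandisks" re-onlines disks whose private-region reads fail,
# generating one error event per failed disk, and vxattachd triggers a new
# scan whenever pending events exceed its threshold. If one scan produces
# more events than the threshold, scanning never stops.

def scans_until_quiet(failed_disks, threshold, max_rounds=10):
    rounds = 0
    pending = failed_disks          # events produced by the initial scan
    while pending > threshold and rounds < max_rounds:
        rounds += 1
        pending = failed_disks      # the re-scan regenerates the events
    return rounds

print(scans_until_quiet(400, threshold=256))    # 10: capped, i.e. endless
print(scans_until_quiet(400, threshold=1024))   # 0: the loop never starts
```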
* INCIDENT NO:3358371 TRACKING ID:3125711
SYMPTOM: When the secondary node is restarted while a reclaim operation is going on on the primary node, the system panics with the following stack:
do_page_fault()
page_fault()
dmp_reclaim_device()
dmp_reclaim_storage()
gendmpioctl()
dmpioctl()
vol_dmp_ktok_ioctl()
voldisk_reclaim_region()
vol_reclaim_disk()
vol_subdisksio_start()
voliod_iohandle()
voliod_loop()
DESCRIPTION: In the Veritas Volume Replicator (VVR) environment, there is a corner case with the reclaim operation on the secondary node. The reclaim length is calculated incorrectly, which leads to a memory allocation failure and, as a result, the system panic.
RESOLUTION: The code is modified to calculate the reclaim length correctly.
* INCIDENT NO:3358372 TRACKING ID:3156295
SYMPTOM: When Dynamic Multi-Pathing (DMP) native support is enabled for Oracle Automatic Storage Management (ASM) devices, the permission and ownership of the /dev/raw/raw# devices are wrong after reboot.
DESCRIPTION: When VxVM binds the Dynamic Multi-Pathing (DMP) devices to raw devices during a restart, it invokes the 'raw' command to create the raw devices, and then tries to set the permission and ownership of the raw devices immediately after invoking the 'raw' command asynchronously. However, in some cases, the raw devices have not yet been created at the time VxVM tries to set the permission and ownership. In that case, VxVM eventually creates the raw device without the correct permission and ownership.
RESOLUTION: The code is modified to set the permission and ownership of the raw devices when DMP gets the OS event which implies that the raw device has been created. This ensures that VxVM sets up the permission and ownership of the raw devices correctly.
* INCIDENT NO:3358373 TRACKING ID:3218013
SYMPTOM: The Dynamic Reconfiguration (DR) Tool does not delete stale OS (Operating System) device handles.
DESCRIPTION: The DR Tool does not check for stale OS device handles during the Logical Unit Number (LUN) removal operation.
As a result, there are stale OS device handles even after the LUNs are successfully removed.
RESOLUTION: The code has been changed to check for and delete stale OS device handles.
* INCIDENT NO:3358374 TRACKING ID:3237503
SYMPTOM: The system hangs after creating a space-optimized snapshot with a large cache volume.
DESCRIPTION: For all the changes written to the cache volume after the snapshot volume is created, a translation map with a B+tree data structure is used to accelerate search/insert/delete operations. During an attempt to insert a node into the tree, type casting of the page offset to 'unsigned int' causes value truncation for offsets beyond the maximum 32-bit integer. The value truncation corrupts the B+tree data structure, resulting in an SIO (VxVM Staged I/O) hang.
RESOLUTION: The code is modified to remove all type casting to 'unsigned int' in the cache volume code.
* INCIDENT NO:3358377 TRACKING ID:3199398
SYMPTOM: The output of the command "vxdmpadm pgrrereg" depends on the order of the DMP (Dynamic Multi-Pathing) node list; the terminal output depends on the last LUN (DMP node).
1. Terminal message when PGR (Persistent Group Reservation) re-registration succeeds on the last LUN:
# vxdmpadm pgrrereg
VxVM vxdmpadm INFO V-5-1-0 DMP PGR re-registration done for ALL PGR enabled dmpnodes.
2. Terminal message when PGR re-registration fails on the last LUN:
# vxdmpadm pgrrereg
vxdmpadm: Permission denied
DESCRIPTION: The "vxdmpadm pgrrereg" command has been introduced to support the facility to move a guest OS on one physical node to another node. In a Solaris LDOM environment, this feature is called "Live Migration". When a customer is using the I/O fencing feature and a guest OS is moved to another physical node, I/O does not succeed in the guest OS after the physical node migration, because the DMP nodes of the guest OS don't have a valid SCSI-3 PGR key once the physical HBA has changed.
This command helps re-register valid PGR keys for the new physical node; however, its command output depends on the last LUN (DMP node).
RESOLUTION: Code changes are done to log the re-registration failures in the system log file. The terminal output now instructs the user to look into the system log when an error is seen on a LUN.
* INCIDENT NO:3358379 TRACKING ID:1783763
SYMPTOM: In a VVR environment, the vxconfigd(1M) daemon may hang during a configuration change operation. The following stack trace is observed:
delay
vol_rv_transaction_prepare
vol_commit_iolock_objects
vol_ktrans_commit
volconfig_ioctl
volsioctl_real
volsioctl
vols_ioctl
...
DESCRIPTION: Incorrect serialization primitives are used. This results in the vxconfigd(1M) daemon hang.
RESOLUTION: The code is modified to use the correct serialization primitives.
* INCIDENT NO:3358380 TRACKING ID:2152830
SYMPTOM: A disk group (DG) import fails with a non-descriptive error message when multiple copies (clones) of the same device exist and the original devices are either offline or not available. For example:
# vxdg import mydg
VxVM vxdg ERROR V-5-1-10978 Disk group mydg: import failed: No valid disk found containing disk group
DESCRIPTION: If the original devices are offline or unavailable, the vxdg(1M) command picks up cloned disks for import. The DG import fails unless the clones are tagged and the tag is specified during the DG import. The import failure is expected, but the error message is non-descriptive and does not specify the corrective action to be taken by the user.
RESOLUTION: The code is modified to give the correct error message when duplicate clones exist during import. Also, details of the duplicate clones are reported in the system log.
* INCIDENT NO:3358381 TRACKING ID:2859470
SYMPTOM: The EMC SRDF-R2 disk may go into the error state when the Extensible Firmware Interface (EFI) label is created on the R1 disk.
For example:
R1 site
# vxdisk -eo alldgs list | grep -i srdf
emc0_008c auto:cdsdisk emc0_008c SRDFdg online c1t5006048C5368E580d266 srdf-r1
R2 site
# vxdisk -eo alldgs list | grep -i srdf
emc1_0072 auto - - error c1t5006048C536979A0d65 srdf-r2
DESCRIPTION: Since the R2 disks are in write-protected mode, the default open() call made for read-write mode fails for the R2 disks, and the disk is marked as invalid.
RESOLUTION: The code is modified to change Dynamic Multi-Pathing (DMP) to be able to read the EFI label even on a write-protected SRDF-R2 disk.
* INCIDENT NO:3358382 TRACKING ID:3086627
SYMPTOM: The "vxdisk -o thin,fssize list" command fails with the following error:
VxVM vxdisk ERROR V-5-1-16282 Cannot retrieve stats: Bad address
DESCRIPTION: This issue happens when the system has more than 200 Logical Unit Numbers (LUNs). VxVM reads the file system statistical information for each LUN to generate the file system size data. After reading the information for the first 200 LUNs, the buffer is not reset correctly, so subsequent access to the buffer address generates this error.
RESOLUTION: The code has been changed to properly reset the buffer address.
* INCIDENT NO:3358404 TRACKING ID:3021970
SYMPTOM: A secondary node panics due to a NULL pointer dereference when the system frees an interlock. The stack trace looks like the following:
page_fault
volsio_ilock_free
vol_rv_inactivate_wsio
vol_rv_restart_wsio
vol_rv_serialise_sec_logging
vol_rv_serialize
vol_rv_errorhandler_start
voliod_iohandle
voliod_loop
...
DESCRIPTION: The panic occurs if there is a node crash or a node reconfiguration on the primary node. The secondary node does not correctly handle the updates for the period of the crash, which results in a panic.
RESOLUTION: The code is modified to properly handle the freeing of an interlock for node crashes or reconfigurations on the primary side.
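The buffer-reset defect described for incident 3358382 above can be sketched as follows. This is an illustrative Python model (the function, batch handling, and placeholder records are invented, not the VxVM implementation): stats are gathered into a fixed buffer in batches of 200 LUNs, and forgetting to rewind the buffer offset between batches makes the second batch index past the buffer's end.

```python
# Illustrative sketch of the defect in incident 3358382 (names invented):
# per-LUN stats are collected into a fixed buffer in batches of 200. If
# the buffer offset is not reset between batches, the second batch indexes
# past the end of the buffer -- the "Bad address" failure on >200 LUNs.

BATCH = 200

def collect_stats(num_luns, reset_between_batches=True):
    buf = [None] * BATCH
    offset = 0
    for lun in range(num_luns):
        if lun % BATCH == 0 and reset_between_batches:
            offset = 0                      # the fix: rewind per batch
        if offset >= len(buf):
            raise IndexError("Cannot retrieve stats: Bad address")
        buf[offset] = ("lun%d" % lun, 42)   # placeholder stat record
        offset += 1
    return num_luns

print(collect_stats(150))    # 150: under one batch, always fine
print(collect_stats(450))    # 450: the fixed path handles >200 LUNs
```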
* INCIDENT NO:3358405 TRACKING ID:3026977
SYMPTOM: The Dynamic Reconfiguration (DR) option of vxdiskadm(1M) removes Logical Unit Numbers (LUNs) that are not even in the Failing or Unusable state.
DESCRIPTION: The grep with the -i option in the SunOS.pm script retrieves the information of failing or unusable disks. As this option yields multiple entries, including unwanted disks, it leads to the removal of LUNs even when they are not in the failing or unusable state.
RESOLUTION: The code is modified to retrieve the correct information about the failing or unusable disks.
* INCIDENT NO:3358407 TRACKING ID:3107699
SYMPTOM: VxDMP causes a system panic after a shutdown or reboot and displays the following stack trace:
mutex_enter()
volinfo_ioct()
volsioctl_real()
cdev_ioctl()
dmp_signal_vold()
dmp_throttle_paths()
dmp_process_stats()
dmp_daemons_loop()
thread_start()
OR
panicsys()
vpanic_common()
panic+0x1c()
mutex_enter()
cdev_ioctl()
dmp_signal_vold()
dmp_check_path_state()
dmp_restore_callback()
dmp_process_scsireq()
dmp_daemons()
thread_start()
DESCRIPTION: In a particular scenario of system shutdown/reboot, the DMP (Dynamic Multi-Pathing) I/O statistics daemon tries to call the ioctl functions in the VXIO module, which is being unloaded, and this causes the system panic.
RESOLUTION: The code is modified to stop the DMP I/O statistics daemon before system shutdown/reboot. A code change is also added to avoid other probes to vxio devices during shutdown.
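The matching defect in incident 3358405 above can be sketched as follows. This is an illustrative Python model (the sample disk records are invented): selecting removal candidates with a case-insensitive substring match, the `grep -i` pattern, catches any line that merely mentions the word, not just disks whose state column is failing or unusable.

```python
# Sketch of the over-matching in incident 3358405 (sample records are
# invented, not real vxdisk output): a "grep -i"-style substring match on
# whole lines selects any disk whose *name* happens to contain the word,
# while matching the state field exactly selects only genuine candidates.

DISKS = [
    # (device, state)
    ("emc0_001", "online"),
    ("emc0_002", "failing"),
    ("unusable_pool_disk", "online"),   # healthy disk with an unlucky name
]

def candidates_buggy(disks):
    return [d for d, s in disks
            if "failing" in (d + " " + s).lower()
            or "unusable" in (d + " " + s).lower()]

def candidates_fixed(disks):
    return [d for d, s in disks if s in ("failing", "unusable")]

print(candidates_buggy(DISKS))   # ['emc0_002', 'unusable_pool_disk']
print(candidates_fixed(DISKS))   # ['emc0_002']
```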
* INCIDENT NO:3358408 TRACKING ID:3115206
SYMPTOM: When ZFS root and the 'dmp_native_support' tunable are enabled, the system panics with the following stack trace:
bp_mapout()
vdev_disk_io_intr()
gendmpiodone()
sd_return_command()
sdintr()
xsvhba_complete_cmd_and_callback()
xsvhba_process_rsp()
xsvhba_process_dqp_msg()
xsvhba_process_data_buffers()
xsvhba_data_recv_handler()
xstl_chan_recv_handler()
taskq_thread()
thread_start()
DESCRIPTION: When the I/O completes with the error flag set, the operating system may access the "b_shadow" field in the returned buffer, which DMP does not return. As a result, the system panics due to the NULL pointer.
RESOLUTION: The code is modified to return the correct value of the "b_shadow" field to the operating system.
* INCIDENT NO:3358414 TRACKING ID:3139983
SYMPTOM: Failed I/Os from SCSI are retried on only a few paths to a LUN instead of utilizing all the available paths. Sometimes this can cause multiple I/O retries without success, which can cause DMP to send I/O failures to the application, bounded by the recoveryoption tunable. The following messages are displayed in the console log:
[..]
Mon Apr xx 04:18:01.885: I/O analysis done as DMP_PATH_OKAY on Path belonging to Dmpnode
Mon Apr xx 04:18:01.885: I/O error occurred (errno=0x0) on Dmpnode
[..]
DESCRIPTION: When an I/O failure is returned to DMP with a retry error from SCSI, DMP retries that I/O on another path. However, it fails to choose the path that has the higher probability of successfully handling the I/O.
RESOLUTION: The code is modified to implement the intelligence of choosing appropriate paths that can successfully process the I/Os during retries.
* INCIDENT NO:3358416 TRACKING ID:3312162
SYMPTOM: Data corruption may occur on the Secondary Symantec Volume Replicator (VVR) Disaster Recovery (DR) site, with the following signs:
1) The vradmin verifydata command output reports data differences even though replication is up-to-date.
2) The Secondary site may require a full fsck operation after the Migrate or Takeover operations.
3) Error messages may be displayed. For example:
msgcnt 21 mesg 017: V-2-17: vx_dirlook - /dev/vx/dsk// file system inode marked bad incore
4) Silent corruption may occur without any visible errors.
DESCRIPTION: With Secondary Logging enabled, VVR writes the replicated data on the DR site to its Storage Replicator Log (SRL) first, and later applies it to the corresponding data volumes. When VVR flushes the write operations from the SRL to the data volumes, data corruption may occur, provided all the following conditions occur together:
- Multiple write operations for the same data block occur in a short time, for example, when VVR flushes the given set of SRL writes on to its data volumes.
- Based on relative timing, VVR grants the locks to perform the write operations on the same data block out of order. As a result, VVR applies the write operations out of order.
RESOLUTION: The code is modified to protect the write-order fidelity by ensuring that VVR grants locks in strict order.
* INCIDENT NO:3358417 TRACKING ID:3325122
SYMPTOM: In a Clustered Volume Replicator (CVR) environment, when you create stripe-mirror volumes with logtype=dcm, the creation may fail with the following error message:
VxVM vxplex ERROR V-5-1-10128 Unexpected kernel error in configuration update
DESCRIPTION: In layered volumes, the Data Change Map (DCM) plex is attached to the storage volumes rather than to the top-level volume. The CVR configuration did not handle that case correctly.
RESOLUTION: The code is modified to handle the DCM plex placement correctly in the case of layered volumes.
* INCIDENT NO:3358418 TRACKING ID:3283525
SYMPTOM: A stop and start of a data volume (with an associated DCO volume) results in a VxVM configuration daemon vxconfigd(1M) hang with the following stack trace. The data volume had undergone vxresize earlier.
volsync_wait()
volsiowait()
volpvsiowait()
voldco_get_accumulator()
voldco_acm_pagein()
voldco_write_pervol_maps_instant()
voldco_write_pervol_maps()
volfmr_copymaps_instant()
vol_mv_precommit()
vol_commit_iolock_objects()
vol_ktrans_commit()
volconfig_ioctl()
volsioctl_real()
vols_ioctl()
vols_compat_ioctl()
compat_sys_ioctl()
sysenter_dispatch()
DESCRIPTION: In the VxVM code, the Data Change Object (DCO) Table of Contents (TOC) entry is not marked with the appropriate flag, which prevents the in-core new map size from being flushed to disk. This leads to corruption. A subsequent stop and start of the volume reads the incorrect TOC from disk, detects the corruption, and results in the VxVM configuration daemon vxconfigd(1M) hang.
RESOLUTION: The code is modified to mark the DCO TOC entry with the appropriate flag, which ensures that the in-core data is flushed to disk, preventing the corruption and the subsequent vxconfigd(1M) hang. Also, a fix is made to ensure that the precommit fails if the paging module growth fails.
* INCIDENT NO:3358420 TRACKING ID:3236773
SYMPTOM: "vxdmpadm getattr enclosure failovermode" generates multiple "vxdmp V-5-3-0 dmp_indirect_ioctl: Ioctl Failed" error messages in the system log if the enclosure is configured as EMC ALUA.
DESCRIPTION: An EMC disk array in ALUA mode only supports the "implicit" type of failover mode. Moreover, such disk arrays do not support the set or get failover-mode operations, so any set or get attempt for the failover-mode attribute generates "Ioctl Failed" error messages.
RESOLUTION: The code is modified to not log such error messages while setting or getting the failover mode for EMC ALUA hardware configurations.
* INCIDENT NO:3358423 TRACKING ID:3194305
SYMPTOM: In the Veritas Volume Replicator (VVR) environment, the replication status goes into a paused state because the vxstart_vvr command does not start the vxnetd daemon automatically on the secondary side.
vradmin -g vvrdg repstatus vvrvg
Replicated Data Set: vvrvg
Primary:
  Host name: Host IP
  RVG name: vvrvg
  DG name: vvrdg
  RVG state: enabled for I/O
  Data volumes: 1
  VSets: 0
  SRL name: srlvol
  SRL size: 5.00 G
  Total secondaries: 1
Secondary:
  Host name: Host IP
  RVG name: vvrvg
  DG name: vvrdg
  Data status: consistent, up-to-date
  Replication status: paused due to network disconnection
  Current mode: asynchronous
  Logging to: SRL
  Timestamp Information: behind by 0h 0m 0s
DESCRIPTION: The vxnetd daemon stops on the secondary side; as a result, the replication status pauses on the primary side. The vxnetd daemon needs to start gracefully on the secondary for the replication to be in the proper state.
RESOLUTION: The code is modified to implement an internal retry-able mechanism for starting vxnetd.
* INCIDENT NO:3358429 TRACKING ID:3300418
SYMPTOM: VxVM volume operations on shared volumes cause unnecessary read I/Os on disks that have both the configuration copy and the log copy disabled on slaves.
DESCRIPTION: The unnecessary disk read I/Os are generated on slaves when VxVM refreshes the private region information into memory during a VxVM transaction. In fact, there is no need to refresh the private region information if the configuration copy and log copy are already disabled on the disk.
RESOLUTION: The code has been changed to skip the refresh if both the configuration copy and the log copy are already disabled on the master and slaves.
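The "internal retry-able mechanism" described for incident 3358423 above can be sketched as a bounded retry loop. This is a minimal Python illustration with invented names (not the vxstart_vvr implementation): instead of giving up when the daemon fails to start once, the start is retried a fixed number of times before reporting failure.

```python
# A minimal sketch, with invented names, of a retry-able daemon start as
# described for incident 3358423: keep retrying a flaky start a bounded
# number of times rather than leaving replication paused after one failure.

def start_with_retries(start_fn, attempts=5):
    """Call start_fn() until it returns True or attempts are exhausted."""
    for tried in range(1, attempts + 1):
        if start_fn():
            return tried        # number of attempts actually needed
    raise RuntimeError("daemon failed to start after %d attempts" % attempts)

# A stand-in daemon that only comes up on the third try.
state = {"calls": 0}
def flaky_start():
    state["calls"] += 1
    return state["calls"] >= 3

print(start_with_retries(flaky_start))   # 3
```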
* INCIDENT NO:3358430 TRACKING ID:3258276
SYMPTOM: The system panics with the following stack when the Dynamic Multi-Pathing (DMP) cache open parameter is enabled:
panicsys() vpanic_common() cmn_err() mod_rele_dev_by_major() ddi_rele_driver() dmp_dev_close() gendmpclose() dev_close() dmp_dev_close() dmp_indirect_ioctl() gendmpioctl() dmpioctl() spec_ioctl() fop_ioctl() ioctl() syscall_trap32()
DESCRIPTION: An overflow of a particular count in DMP causes the solid state disk (SSD) driver's total open count to overflow, which leads to the system panic.
RESOLUTION: The code is modified to avoid the overflow of that count in DMP.
* INCIDENT NO:3358433 TRACKING ID:3301470
SYMPTOM: In a CVR environment, a recovery on the primary side causes all the nodes to panic with the following stack:
trap ktl0 search_vxvm_mem voliomem_range_iter vol_ru_alloc_buffer_start voliod_iohandle voliod_loop
DESCRIPTION: Recovery tries to perform a zero-sized readback from the Storage Replicator Log (SRL), which results in a panic.
RESOLUTION: The code is modified to handle the corner case that causes a zero-sized readback.
* INCIDENT NO:3362234 TRACKING ID:2994976
SYMPTOM: The system panics during mirror break-off snapshot creation or a plex detach operation with the following stack trace:
vol_mv_pldet_callback() vol_klog_start() voliod_iohandle() voliod_loop()
DESCRIPTION: VxVM performs some metadata update operations in-core as well as on the Data Change Object (DCO) of a volume during creation of a mirror break-off snapshot or while detaching a plex due to an I/O error. The panic occurs due to incorrect access of in-core metadata fields. This issue is observed when the volume has a DCO configured and is mounted with the Veritas File System (VxFS).
RESOLUTION: The code is modified so that the in-core metadata fields are accessed properly during plex detach and snapshot creation.
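The counter overflow behind incident 3358430 can be illustrated with a fixed-width counter that silently wraps. This is a generic sketch of the failure class (the actual DMP counter width and fix are not stated in the README); the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # value range of an unsigned 32-bit counter

def inc_open_count_32(count):
    # A fixed-width counter silently wraps to 0 on overflow -- the kind
    # of wrap that corrupted the driver's total open count.
    return (count + 1) & MASK32

def inc_open_count_wide(count):
    # Widening the counter (or checking before incrementing) avoids
    # the wrap; Python ints do not overflow, standing in for a 64-bit type.
    return count + 1

wrapped = inc_open_count_32(MASK32)   # wraps around to 0
safe = inc_open_count_wide(MASK32)    # 2**32, no wrap
```

Once the count wraps to 0, a later decrement on close would underflow, which is how a mismatched open/close balance can escalate into a panic.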
* INCIDENT NO:3365296 TRACKING ID:2824977
SYMPTOM: The CLI "vxdmpadm setattr enclosure failovermode", which is meant for ALUA arrays, fails with an error on certain arrays without providing an appropriate reason for the failure.
DESCRIPTION: The failover-mode attribute of an ALUA array can be set only if the array provides such a facility, which is indicated by the response to the SCSI mode sense command. On arrays that do not provide it, the CLI fails without giving an appropriate reason.
RESOLUTION: The code is modified to check the SCSI mode sense command response to determine whether the array supports changing the failover-mode attribute, and to report an appropriate message if such a facility is not available for the array.
* INCIDENT NO:3366688 TRACKING ID:2957645
SYMPTOM: When the vxconfigd daemon is restarted, the terminal gets flooded with error messages such as: VxVM INFO V-5-2-16543 connresp: new client ID allocation failed for cvm nodeid * with error *.
DESCRIPTION: When the vxconfigd daemon is restarted, it fails to get a client ID. There is no need to print this error message at the default level, yet the terminal gets flooded with such messages.
RESOLUTION: The code is modified to print these error messages only at the debug level.
* INCIDENT NO:3366703 TRACKING ID:3056311
SYMPTOM: The following problems can be seen on disks initialized with the 5.1SP1 listener and used with older releases such as 4.1, 5.0, and 5.0.1: 1. Creation of a volume fails on a disk, indicating insufficient space available. 2. Data corruption is seen; the CDS backup label signature appears within the PUBLIC region data. 3. Disks greater than 1TB in size appear "online invalid" on older releases.
DESCRIPTION: The VxVM listener can be used to initialize boot disks and data disks for use with older VxVM releases.
E.g., the 5.1SP1 listener can be used to initialize disks for use with all previous VxVM releases such as 5.0.1, 5.0, and 4.1. From 5.1SP1 onwards, VxVM always uses Fabricated geometry while initializing a disk with the CDS format. Older releases such as 4.1, 5.0, and 5.0.1 use Raw geometry and do not honor LABEL geometry. Hence, if a disk is initialized through the 5.1SP1 listener, the disk is stamped with Fabricated geometry. When such a disk is used with older VxVM releases such as 5.0.1, 5.0, or 4.1, there can be a mismatch between the stamped geometry (Fabricated) and the in-memory geometry (Raw). If the on-disk cylinder size is smaller than the in-memory cylinder size, data corruption can occur. To prevent data corruption, disks initialized through the listener for older releases must use the older CDS format with Raw geometry. Also, if the disk size is >= 1TB, 5.1SP1 VxVM initializes the disk with the CDS EFI format, which older releases such as 4.1, 5.0, and 5.0.1 do not understand.
RESOLUTION: From release 5.1SP1 onwards, a disk to be used with older releases such as 4.1, 5.0, or 5.0.1 is initialized with Raw geometry through the HP-UX listener. Also, initialization through the HP-UX listener of a disk whose size is greater than 1TB fails.
* INCIDENT NO:3368236 TRACKING ID:3327842
SYMPTOM: In the Cluster Volume Replication (CVR) environment, with I/O load on the Primary and replication in progress, running the vradmin resizevol(1M) command on the Primary often terminates with the error message "vradmin ERROR Lost connection to host".
DESCRIPTION: There is a race condition on the Secondary between the transaction and the messages delivered from the Primary to the Secondary. This results in repeated transaction timeouts on the Secondary, which in turn cause session timeouts between the Primary and Secondary vradmind.
RESOLUTION: The code is modified to resolve the race condition.
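The compatibility rules described for incident 3366703 can be condensed into two checks. This is a minimal sketch of the decision logic stated in the description above, not VxVM's actual code; the function names are hypothetical.

```python
ONE_TB = 1 << 40  # bytes

def geometry_safe(on_disk_cylinder, in_memory_cylinder):
    """Per the description above, corruption is possible when the
    on-disk (stamped) cylinder size is smaller than the in-memory
    cylinder size used by the older release."""
    return on_disk_cylinder >= in_memory_cylinder

def listener_can_init(disk_size_bytes):
    """Initialization through the listener fails for disks >= 1 TB,
    since 5.1SP1 would use the CDS EFI format, which releases
    4.1/5.0/5.0.1 do not understand."""
    return disk_size_bytes < ONE_TB

# A smaller on-disk cylinder size than the in-memory one is the unsafe case.
unsafe = not geometry_safe(on_disk_cylinder=32, in_memory_cylinder=64)
too_big = not listener_can_init(ONE_TB)
```

The actual resolution achieves the first condition by stamping Raw geometry (so both sides agree) rather than by checking at run time; the sketch only captures when a mismatch is dangerous.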
* INCIDENT NO:3371422 TRACKING ID:3087893
SYMPTOM: EMC PowerPath pseudo device mappings change with each reboot with VxVM (Veritas Volume Manager).
DESCRIPTION: VxVM invokes the PowerPath command 'powermt display unmanaged' to discover PowerPath unmanaged devices. This command destroys the PowerPath device mappings during the early boot stage, when PowerPath is not fully up.
RESOLUTION: EMC fixed the issue by introducing an environment variable, MPAPI_EARLY_BOOT, for the powermt command. The VxVM startup script sets the variable to TRUE before calling the powermt command, so that powermt recognizes the early boot phase and behaves accordingly. The variable is unset by VxVM after device discovery.
* INCIDENT NO:3371753 TRACKING ID:3081410
SYMPTOM: When you remove the LUNs, the DR tool reports "ERROR: No luns available for removal".
DESCRIPTION: The DR tool does not check the correct condition for the devices under the control of a third-party driver (TPD). Therefore, no device gets listed for removal in certain cases.
RESOLUTION: The code is modified to correctly identify the TPD-controlled devices in the DR tool.
* INCIDENT NO:3373213 TRACKING ID:3373208
SYMPTOM: Veritas Dynamic Multipathing (DMP) wrongly sends the SCSI PR OUT command with the Activate Persist Through Power Loss (APTPL) bit set to 0 to an array that supports the APTPL capabilities.
DESCRIPTION: DMP correctly recognizes the APTPL bit settings and stores them in the database. DMP verifies this information before sending the SCSI PR OUT command so that the APTPL bit can be set appropriately in the command. However, due to an issue in the code, DMP was not handling the node's device number properly, because of which the APTPL bit was getting set incorrectly in the SCSI PR OUT command.
RESOLUTION: The code is modified to handle the node's device number properly in the DMP SCSI command code path.
* INCIDENT NO:3374166 TRACKING ID:3325371
SYMPTOM: A panic occurs in the vol_multistepsio_read_source() function when VxVM's FastResync feature is used.
The stack trace observed is as follows:
vol_multistepsio_read_source() vol_multistepsio_start() volkcontext_process() vol_rv_write2_start() voliod_iohandle() voliod_loop() kernel_thread()
DESCRIPTION: When a volume is resized, the Data Change Object (DCO) also needs to be resized. However, the old accumulator contents are not copied into the new accumulator, so the respective regions are marked as invalid. Subsequent I/O on these regions triggers the panic.
RESOLUTION: The code is modified to appropriately copy the accumulator contents during the resize operation.
* INCIDENT NO:3374735 TRACKING ID:3423316
SYMPTOM: The vxconfigd(1M) daemon dumps core while executing the vxdisk(1M) scandisks command, with the following stack:
ncopy_tree_build() dg_balance_copies_helper() dg_balance_copies() dg_update() commit() dg_trans_commit() devintf_dm_reassoc_da() devintf_add_autoconfig_main() devintf_add_autoconfig() req_set_naming_scheme() request_loop() main()
DESCRIPTION: As part of the vxdisk scandisks operation, the device discovery process runs. During this process, unique device entries are populated in the device list. The core dump occurs due to improper freeing of a device entry in the device list.
RESOLUTION: The code is modified to appropriately handle the device list.
* INCIDENT NO:3375424 TRACKING ID:3250450
SYMPTOM: Running the vxdisk(1M) command with the '-o thin,fssize list' option in the presence of a linked volume causes a system panic with the following stack:
vol_mv_lvsio_ilock() vol_mv_linkedvol_sio_start() volkcontext_process() volsiowait() vol_objioctl() vol_object_ioctl() voliod_ioctl() volsioctl_real() volsioctl()
DESCRIPTION: The vxdisk(1M) command with the '-o thin,fssize list' option creates reclaim I/Os. All the I/Os performed on the linked volumes are stabilized. However, the reclaim I/Os should not be stabilized, since that leads to a null pointer dereference.
RESOLUTION: The code is modified to prevent stabilization of reclaim I/Os, which prevents the null pointer dereference from occurring.
* INCIDENT NO:3376953 TRACKING ID:3372724
SYMPTOM: When the user installs VxVM, the system panics with the following warnings:
vxdmp: WARNING: VxVM vxdmp V-5-0-216 mod_install returned 6
vxspec V-5-0-0 vxspec: vxio not loaded. Aborting vxspec load
DESCRIPTION: During VxVM installation, if the DMP module fails to load, the cleanup procedure fails to reset the statistics timer (which is set while loading). As a result, the timer dereferences a function pointer that is already unloaded, and the system panics.
RESOLUTION: The code is modified to perform a complete cleanup when DMP fails to load.
* INCIDENT NO:3377209 TRACKING ID:3377383
SYMPTOM: The vxconfigd daemon crashes when a disk under DMP reports a device failure. After this, the following error is seen when a VxVM command is executed: "VxVM vxdisk ERROR V-5-1-684 IPC failure: Configuration daemon is not accessible"
DESCRIPTION: If a disk fails and reports a certain failure to DMP, vxconfigd crashes because that error is not handled properly.
RESOLUTION: The code is modified to properly handle the device failures reported by a failed disk under DMP.
* INCIDENT NO:3381922 TRACKING ID:3235350
SYMPTOM: If an operational volume has a version 20 data change object (DCO) attached, operations that grow the volume, such as 'vxresize' and 'vxassist growby/growto', can lead to a system panic. The panic stack looks like the following:
volpage_gethash+000074() volpage_getlist_internal+00075C() volpage_getlist+000060() voldco_get_regionstate+0001C8() volfmr_get_regionstate+00003C() voldco_checktargetsio_start+0000A8() voliod_iohandle+000050()
DESCRIPTION: When any update is done on the grown region of a volume, the state of the same region on the snapshot is verified to avoid inconsistency.
If the snapshot volume is not grown, it tries to verify a non-existent region on the snapshot volume. The memory access goes beyond the allocation, which leads to the system panic.
RESOLUTION: The code is modified to identify conditions where the verification is done on a non-existent region and handle them correctly.
* INCIDENT NO:3383673 TRACKING ID:3147241
SYMPTOM: The pkgchk(1M) command on VRTSvxvm fails with the following error message: ERROR: /usr/lib/vxvm/bin/vxloadm pathname does not exist
DESCRIPTION: The vxloadm(1M) binary is not part of the VRTSvxvm package on Solaris x86. The install script of VRTSvxvm causes the system to register the vxloadm(1M) binary as part of the VRTSvxvm package database, hence the issue.
RESOLUTION: Changes have been made so that on Solaris x86 the vxloadm(1M) utility is not added to the package database.
* INCIDENT NO:3387405 TRACKING ID:3019684
SYMPTOM: An I/O hang is observed when the SRL is about to overflow after the logowner switches from slave to master. The stack trace looks like the following:
biowait default_physio volrdwr fop_write write syscall_trap32
DESCRIPTION: The I/O hang occurs because, after the logowner switch, the master has a stale flag set with an incorrect value related to the last SRL overflow.
RESOLUTION: The code is modified to reset the stale flag irrespective of whether the logowner is the master or a slave.
* INCIDENT NO:3387417 TRACKING ID:3107741
SYMPTOM: The vxrvg snapdestroy command fails with the "Transaction aborted waiting for io drain" error message, and vxconfigd(1M) hangs with the following stack trace:
vol_commit_iowait_objects vol_commit_iolock_objects vol_ktrans_commit volconfig_ioctl volsioctl_real vols_ioctl vols_compat_ioctl compat_sys_ioctl ...
DESCRIPTION: The SmartMove query of Veritas File System (VxFS) depends on some reads and writes.
If some transaction in Veritas Volume Manager (VxVM) blocks the new reads and writes, the Application Programming Interface (API) hangs waiting for the response. This results in a deadlock-like situation where the SmartMove API waits for a transaction to complete while the transaction waits for the SmartMove API, hence the hang.
RESOLUTION: The code is modified to disallow transactions while the SmartMove API is in use.
PATCH ID:6.0.300.100
* INCIDENT NO:2892702 TRACKING ID:2567618
SYMPTOM: The VRTSexplorer dumps core with a segmentation fault in checkhbaapi/print_target_map_entry. The stack trace is observed as follows:
print_target_map_entry() check_hbaapi() main() _start()
DESCRIPTION: The checkhbaapi utility uses the HBA_GetFcpTargetMapping() API, which returns the current set of mappings between the OS and the Fibre Channel Protocol (FCP) devices for a given Host Bus Adapter (HBA) port. The maximum limit for mappings is set to 512, and only that much memory is allocated. When the number of mappings returned is greater than 512, the function that prints this information tries to access entries beyond that limit, which results in core dumps.
RESOLUTION: The code is modified to allocate enough memory for all the mappings returned by the HBA_GetFcpTargetMapping() API.
* INCIDENT NO:3090670 TRACKING ID:3090667
SYMPTOM: The "vxdisk -o thin,fssize list" command can cause the system to hang or panic due to a kernel memory corruption. This command is also issued internally by Veritas Operations Manager (VOM) during Storage Foundation (SF) discovery. The following stack trace is observed:
panic string: kernel heap corruption detected vol_objioctl vol_object_ioctl voliod_ioctl - frame recycled volsioctl_real
DESCRIPTION: Veritas Volume Manager (VxVM) allocates data structures and invokes thin Logical Unit Number (LUN) specific function handlers to determine the disk space that is actively used by the file system.
One of the function handlers wrongly accesses system memory beyond the allocated data structure, which results in the kernel memory corruption.
RESOLUTION: The code is modified so that the problematic function handler accesses only the allocated memory.
* INCIDENT NO:3099508 TRACKING ID:3087893
SYMPTOM: EMC PowerPath pseudo device mappings change with each reboot with VxVM (Veritas Volume Manager).
DESCRIPTION: VxVM invokes the PowerPath command 'powermt display unmanaged' to discover PowerPath unmanaged devices. This command destroys the PowerPath device mappings during the early boot stage, when PowerPath is not fully up.
RESOLUTION: EMC fixed the issue by introducing an environment variable, MPAPI_EARLY_BOOT, for the powermt command. The VxVM startup script sets the variable to TRUE before calling the powermt command, so that powermt recognizes the early boot phase and behaves accordingly. The variable is unset by VxVM after device discovery.
* INCIDENT NO:3133012 TRACKING ID:3160973
SYMPTOM: vxlist(1M) hangs when an Extensible Firmware Interface (EFI) formatted disk is attached to the host. The following is the stack trace:
[1] _read(0xa, 0xfe3fdf40, 0x400), [2] read(0xa, 0xfe3fdf40, 0x400), [3] is_asmdisk_efi(0xa, 0xfe3fe868, 0xfe3fe814, 0x200), [4] is_asmdisk(0xfe3fe868, 0xfe3fe814, 0xfe3fec68), [5] is_foreign_disk(0xfe3fe868, 0x0, 0xfe3fec68), [6] vol_is_foreign_disk(0x8126304, 0x0), =>[7] isForeignDisk(da = 0x8125f68), [8] getDaState(da = 0x8125f68), [9] buildMapsForDeportedDgs(cfgvect = 0x81106d8, dacmap = 0x80e8ab8, newdb = 1), [10] buildMapsForDgVec(vcfg = 0x81106c0, vectmap = 0xfe60eff0, dacmap = 0x80e8ab8), [11] initDB(), [12] init_db(), [13] doVmNotify(a = (nil)), [14] _thr_setup(0xfe971200), [15] _lwp_start(),
DESCRIPTION: VxVM reads the partition table and checks for various foreign format signatures within those partitions. The Solaris/x86 SCSI driver cannot access the last sector due to a Solaris off-by-one bug; see Sun-Solaris Bug Id 6342431.
If the last sector of the disk is accessed during this partition recognition, the read system call hangs, causing vxlist to hang.
RESOLUTION: If the partition contains the very last sector of the disk, reading that particular partition is skipped.
* INCIDENT NO:3140411 TRACKING ID:2959325
SYMPTOM: The vxconfigd(1M) daemon dumps core while performing the disk group move operation, with the following stack trace:
dg_trans_start () dg_configure_size () config_enable_copy () da_enable_copy () ncopy_set_disk () ncopy_set_group () ncopy_policy_some () ncopy_set_copies () dg_balance_copies_helper () dg_transfer_copies () in vold_dm_dis_da () in dg_move_complete () in req_dg_move () in request_loop () in main ()
DESCRIPTION: The core dump occurs when the disk group move operation tries to reduce the size of the configuration records in the disk group while the size is large and the move operation needs more space for the new config-record entries. Since the reduction of the size of the configuration records (compaction) and the configuration change by the disk group move operation cannot co-exist, this results in the core dump.
RESOLUTION: The code is modified to perform the compaction before the configuration change by the disk group move operation.
* INCIDENT NO:3150893 TRACKING ID:3119102
SYMPTOM: Live migration of a virtual machine running the Storage Foundation stack with data disk fencing enabled causes the service groups configured on the virtual machine to fault.
DESCRIPTION: After live migration of a virtual machine running the Storage Foundation stack with data disk fencing enabled, I/O fails on shared SAN devices with a reservation conflict and causes service groups to fault. Live migration causes a SCSI initiator change, so I/O coming from the migrated server to the shared SAN storage fails with a reservation conflict.
RESOLUTION: Code changes are added to check whether the host is fenced off from the cluster.
If the host is not fenced off, the registration key is re-registered for the dmpnode through the migrated server and I/O is restarted. After live migration, the admin needs to manually invoke 'vxdmpadm pgrrereg' from the guest that was live migrated.
* INCIDENT NO:3156719 TRACKING ID:2857044
SYMPTOM: The system crashes with the following stack when resizing a volume with DCO version 30:
PID: 43437 TASK: ffff88402a70aae0 CPU: 17 COMMAND: "vxconfigd" #0 [ffff884055a47600] machine_kexec at ffffffff8103284b #1 [ffff884055a47660] crash_kexec at ffffffff810ba972 #2 [ffff884055a47730] oops_end at ffffffff81501860 #3 [ffff884055a47760] no_context at ffffffff81043bfb #4 [ffff884055a477b0] __bad_area_nosemaphore at ffffffff81043e85 #5 [ffff884055a47800] bad_area at ffffffff81043fae #6 [ffff884055a47830] __do_page_fault at ffffffff81044760 #7 [ffff884055a47950] do_page_fault at ffffffff8150383e #8 [ffff884055a47980] page_fault at ffffffff81500bf5 [exception RIP: voldco_getalloffset+38] RIP: ffffffffa0bcc436 RSP: ffff884055a47a38 RFLAGS: 00010046 RAX: 0000000000000001 RBX: ffff883032f9eac0 RCX: 000000000000000f RDX: ffff88205613d940 RSI: ffff8830392230c0 RDI: ffff883fd1f55800 RBP: ffff884055a47a38 R8: 0000000000000000 R9: 0000000000000000 R10: 000000000000000e R11: 000000000000000d R12: ffff882020e80cc0 R13: 0000000000000001 R14: ffff883fd1f55800 R15: ffff883fd1f559e8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff884055a47a40] voldco_get_map_extents at ffffffffa0bd09ab [vxio] #10 [ffff884055a47a90] voldco_update_extents_info at ffffffffa0bd8494 [vxio] #11 [ffff884055a47ab0] voldco_instant_resize_30 at ffffffffa0bd8758 [vxio] #12 [ffff884055a47ba0] volfmr_instant_resize at ffffffffa0c03855 [vxio] #13 [ffff884055a47bb0] voldco_process_instant_op at ffffffffa0bcae2f [vxio] #14 [ffff884055a47c30] volfmr_process_instant_op at ffffffffa0c03a74 [vxio] #15 [ffff884055a47c40] vol_mv_precommit at ffffffffa0c1ad02 [vxio] #16 [ffff884055a47c90] vol_commit_iolock_objects at ffffffffa0c1244f [vxio] #17
[ffff884055a47cf0] vol_ktrans_commit at ffffffffa0c131ce [vxio] #18 [ffff884055a47d70] volconfig_ioctl at ffffffffa0c8451f [vxio] #19 [ffff884055a47db0] volsioctl_real at ffffffffa0c8c9b8 [vxio] #20 [ffff884055a47e90] vols_ioctl at ffffffffa0040126 [vxspec] #21 [ffff884055a47eb0] vols_compat_ioctl at ffffffffa004034d [vxspec] #22 [ffff884055a47ee0] compat_sys_ioctl at ffffffff811ce0ed #23 [ffff884055a47f80] sysenter_dispatch at ffffffff8104a880
DESCRIPTION: While updating DCO TOC (Table of Contents) entries in the in-core TOC, a TOC entry is wrongly freed and zeroed out. As a result, traversing the TOC entries leads to a NULL pointer dereference, causing the panic.
RESOLUTION: Code changes have been made to appropriately update the TOC entries.
* INCIDENT NO:3159096 TRACKING ID:3146715
SYMPTOM: The 'rlinks' do not connect with the Network Address Translation (NAT) configurations on Little Endian Architecture (LEA).
DESCRIPTION: On LEAs, the Internet Protocol (IP) address configured with the NAT mechanism is not converted from host-byte order to network-byte order. As a result, the address used for the rlink connection mechanism gets distorted and the rlinks fail to connect.
RESOLUTION: The code is modified to convert the IP address to network-byte order before it is used.
* INCIDENT NO:3254227 TRACKING ID:3182350
SYMPTOM: If there are more than 8192 paths in the system, the vxassist command hangs while creating a new VxVM volume or increasing the size of an existing volume.
DESCRIPTION: The vxassist command creates a hash table with a maximum of 8192 entries, so additional paths beyond 8192 hash to overlapping buckets. In such a case, multiple paths that hash to the same bucket are linked in a chain. To find a particular path in a specified bucket, the vxassist command needs to traverse the entire linked chain. However, vxassist searches only the first element and hangs.
RESOLUTION: The code is modified to traverse the entire linked chain.
* INCIDENT NO:3254229 TRACKING ID:3063378
SYMPTOM: Some VxVM (Volume Manager) commands run slowly when "read only" devices (e.g. EMC SRDF-WD, BCV-NR) are presented and managed by EMC PowerPath.
DESCRIPTION: When an I/O write is performed on a "read only" device, the I/O fails and is retried if the I/O is on a TPD (Third Party Driver) device and the path status is okay. Owing to the retry, the I/O does not return until the timeout is reached, which gives the perception that VxVM commands run slowly.
RESOLUTION: Code changes have been made to return the I/O immediately with a disk media failure if the I/O fails on any TPD device and the path status is okay.
* INCIDENT NO:3254427 TRACKING ID:3182175
SYMPTOM: The "vxdisk -o thin,fssize list" command can report incorrect file system usage data.
DESCRIPTION: An integer overflow in an internal calculation can cause this command to report incorrect per-disk file system usage.
RESOLUTION: Code changes are made so that the command reports the correct file system usage data.
* INCIDENT NO:3280555 TRACKING ID:2959733
SYMPTOM: When device paths are moved across LUNs or enclosures, the vxconfigd daemon can dump core, or data corruption can occur due to internal data structure inconsistencies.
DESCRIPTION: When the device path configuration is changed after a planned or unplanned disconnection by moving only a subset of the device paths across LUNs or other storage arrays (enclosures), DMP's internal data structures become inconsistent, leading to a vxconfigd core dump and, in some situations, data corruption due to incorrect LUN-to-path mappings.
RESOLUTION: To resolve this issue, the vxconfigd code is modified to detect such situations gracefully and adjust the internal data structures accordingly, avoiding a vxconfigd core dump and data corruption.
* INCIDENT NO:3294641 TRACKING ID:3107741
SYMPTOM: The "vxrvg snapdestroy" command fails with the error message "Transaction aborted waiting for io drain", and a vxconfigd hang is observed. The vxconfigd stack trace is:
vol_commit_iowait_objects vol_commit_iolock_objects vol_ktrans_commit volconfig_ioctl volsioctl_real vols_ioctl vols_compat_ioctl compat_sys_ioctl ...
DESCRIPTION: The SmartMove query of VxFS depends on some reads and writes. If some transaction in VxVM blocks the new reads and writes, the API hangs waiting for the response. This creates a deadlock-like situation where the SmartMove API waits for the transaction to complete while the transaction waits for the SmartMove API, hence the hang.
RESOLUTION: The code is modified to disallow transactions during the SmartMove API.
* INCIDENT NO:3294642 TRACKING ID:3019684
SYMPTOM: An I/O hang is observed when the SRL is about to overflow after the logowner switches from slave to master. The stack trace looks like:
biowait default_physio volrdwr fop_write write syscall_trap32
DESCRIPTION: With the slave as the logowner, the SRL is overflowed and a DCM resync follows. Then, after switching the logowner back to the master, trying to overflow the SRL again manifests the I/O hang on the master when the SRL is about to overflow. This happens because the master has a stale flag set with an incorrect value related to the last SRL overflow.
RESOLUTION: The code is modified to reset the stale flag irrespective of whether the logowner is the master or a slave.
PATCH ID:6.0.300.000
* INCIDENT NO:2853712 TRACKING ID:2815517
SYMPTOM: vxdg adddisk succeeds in adding a clone disk to a non-clone disk group and a non-clone disk to a clone disk group, resulting in a mixed disk group.
DESCRIPTION: vxdg import fails for a disk group which has a mix of clone and non-clone disks.
So vxdg adddisk should not allow creation of a mixed disk group.
RESOLUTION: The vxdg adddisk code is modified to return an error on an attempt to add a clone disk to a non-clone disk group or a non-clone disk to a clone disk group, thus preventing the addition of a disk that would lead to a mixed disk group.
* INCIDENT NO:2860207 TRACKING ID:2859470
SYMPTOM: The EMC SRDF-R2 disk may go into an error state when you create an EFI label on the R1 disk. For example:
R1 site
# vxdisk -eo alldgs list | grep -i srdf
emc0_008c auto:cdsdisk emc0_008c SRDFdg online c1t5006048C5368E580d266 srdf-r1
R2 site
# vxdisk -eo alldgs list | grep -i srdf
emc1_0072 auto - - error c1t5006048C536979A0d65 srdf-r2
DESCRIPTION: Since R2 disks are in write protected mode, the default open() call (made for read-write mode) fails for the R2 disks, and the disk is marked as invalid.
RESOLUTION: As a fix, DMP is changed to be able to read the EFI label even on a write protected SRDF-R2 disk.
* INCIDENT NO:2863672 TRACKING ID:2834046
SYMPTOM: VxVM dynamically reminors all the volumes during DG import if the DG base minor numbers are not in the correct pool. This behaviour causes NFS clients to have to re-mount all NFS file systems in an environment where CVM is used on the NFS server side.
DESCRIPTION: Starting from 5.1, the minor number space is divided into two pools, one for private disk groups and another for shared disk groups. During DG import, the DG base minor numbers are adjusted automatically if they are not in the correct pool, and so are the volumes in the disk groups. This behaviour reduces minor-number conflicts during DG import, but in an NFS environment it makes all file handles on the client side stale. Customers had to unmount file systems and restart applications.
RESOLUTION: A new tunable, "autoreminor", is introduced. The default value is "on". Most customers do not care about auto-reminoring and can leave it as it is.
For an environment where autoreminoring is not desirable, customers can simply turn it off. Another major change is that during DG import, VxVM no longer changes minor numbers as long as there are no minor conflicts, including the case where minor numbers are in the wrong pool.
* INCIDENT NO:2863708 TRACKING ID:2836528
SYMPTOM: vxdisk resize fails with the error "New geometry makes partition unaligned":
bash# vxdisk -g testdg resize disk01 length=8g
VxVM vxdisk ERROR V-5-1-8643 Device disk01: resize failed: New geometry makes partition unaligned
DESCRIPTION: On a Solaris x86 system, partition 8 need not be aligned with the cylinder size. However, VxVM requires this partition to be cylinder aligned, hence the issue.
RESOLUTION: The issue is fixed by making the necessary changes to skip the alignment check for partition 8 on the Solaris x86 platform.
* INCIDENT NO:2876865 TRACKING ID:2510928
SYMPTOM: The extended attributes reported by "vxdisk -e list" for the EMC SRDF LUNs are reported as "tdev mirror" instead of "tdev srdf-r1". Example:
# vxdisk -e list
DEVICE TYPE DISK GROUP STATUS OS_NATIVE_NAME ATTR
emc0_028b auto:cdsdisk - - online thin c3t5006048AD5F0E40Ed190s2 tdev mirror
DESCRIPTION: The extraction of the attributes of EMC SRDF LUNs was not done properly. Hence, EMC SRDF LUNs are erroneously reported as "tdev mirror" instead of "tdev srdf-r1".
RESOLUTION: Code changes have been made to extract the correct values.
* INCIDENT NO:2892499 TRACKING ID:2149922
SYMPTOM: Record the diskgroup import and deport events in the /var/adm/messages file. The following type of message can be logged in syslog: vxvm:vxconfigd: V-5-1-16254 Disk group import of succeeded.
DESCRIPTION: On diskgroup import or deport, an appropriate success message, or a failure message with the cause of failure, should be logged.
RESOLUTION: Code changes are made to log diskgroup import and deport events in syslog.
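The cylinder-alignment check behind incident 2863708 can be sketched as a simple modulus test plus a platform-specific exemption for partition 8. The helper names and the sector-based units are illustrative assumptions, not VxVM's actual internals.

```python
def needs_alignment_check(partition_id, platform="solaris_x86"):
    """Whether a partition must be cylinder-aligned.

    Per the fix above, on Solaris x86 partition 8 need not align with
    the cylinder size, so the check is skipped for it.
    """
    return not (platform == "solaris_x86" and partition_id == 8)

def is_cylinder_aligned(start_sector, sectors_per_cylinder):
    # A partition is cylinder aligned when its start falls on a
    # cylinder boundary.
    return start_sector % sectors_per_cylinder == 0

def resize_allowed(partition_id, start_sector, sectors_per_cylinder):
    # The resize only fails on misalignment for partitions that are
    # actually required to be aligned.
    if not needs_alignment_check(partition_id):
        return True
    return is_cylinder_aligned(start_sector, sectors_per_cylinder)

# Partition 8 passes even when misaligned; partition 0 does not.
p8_ok = resize_allowed(8, start_sector=4097, sectors_per_cylinder=4096)
p0_ok = resize_allowed(0, start_sector=4097, sectors_per_cylinder=4096)
```

This mirrors the shape of the fix: the alignment requirement itself is unchanged, only the exemption for the one partition that the platform does not align is added.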
* INCIDENT NO:2892571 TRACKING ID:1856733
SYMPTOM: Add support for FusionIO on Solaris x64.
DESCRIPTION: FusionIO was not previously supported on the Solaris x64 platform.
RESOLUTION: Support for FusionIO is added for Solaris x64.
* INCIDENT NO:2892590 TRACKING ID:2779580
SYMPTOM: The Secondary node gives the configuration error 'no Primary RVG' when the primary master node (the default logowner) is rebooted and a slave becomes the new master.
DESCRIPTION: After the reboot of the primary master, the new master sends a handshake request for vradmind communication to the secondary. As part of the handshake request, the secondary deletes the old configuration, including the primary RVG. During this phase, the secondary receives a configuration update message from the primary for the old configuration. The secondary does not find the old primary RVG configuration for processing this message. Hence, it cannot proceed with the pending handshake request and gives the 'no Primary RVG' configuration error.
RESOLUTION: Code changes are done so that during the handshake request phase, configuration messages for the old primary RVG are discarded.
* INCIDENT NO:2892621 TRACKING ID:1903700
SYMPTOM: vxassist remove mirror does not work if nmirror and alloc are specified, giving the error "Cannot remove enough mirrors".
DESCRIPTION: During the remove mirror operation, VxVM does not perform a correct analysis of the plexes, hence the issue.
RESOLUTION: The necessary code changes have been made so that vxassist works properly.
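The fix for incident 2892590 (discarding configuration messages that refer to the pre-handshake configuration) follows a common pattern: tag each message with the generation of the configuration it was built against and drop messages from an older generation. The message format and field names below are hypothetical, chosen only to illustrate the pattern.

```python
def filter_config_messages(messages, current_generation):
    """Keep only messages that refer to the current configuration.

    Messages carrying an older generation number were produced against
    a configuration that the handshake has already deleted, so they
    are discarded instead of triggering a 'no Primary RVG' style error.
    """
    return [m for m in messages if m["generation"] == current_generation]

msgs = [
    {"generation": 1, "op": "config-update"},  # stale: old primary RVG
    {"generation": 2, "op": "config-update"},  # current configuration
]
kept = filter_config_messages(msgs, current_generation=2)
```

Without such filtering, a late-arriving update for the deleted configuration stalls the pending handshake, which is exactly the failure the incident describes.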
* INCIDENT NO:2892630 TRACKING ID:2742706
SYMPTOM: A system panic can occur with the following stack when the Oracle 10G Grid Agent software invokes the command # nmhs get_solaris_disks :
unix:lock_try+0x0() genunix:turnstile_interlock+0x1c() genunix:turnstile_block+0x1b8() unix:mutex_vector_enter+0x428() unix:mutex_enter() - frame recycled vxlo:vxlo_open+0x2c() genunix:dev_open() - frame recycled specfs:spec_open+0x4f4() genunix:fop_open+0x78() genunix:vn_openat+0x500() genunix:copen+0x260() unix:syscall_trap32+0xcc()
DESCRIPTION: The open system call code path of vxlo (the Veritas Loopback Driver) does not release the acquired global lock after the work is completed. The panic may occur when the next open system call tries to acquire the lock.
RESOLUTION: Code changes have been made to release the global lock appropriately.
* INCIDENT NO:2892643 TRACKING ID:2801962
SYMPTOM: Operations that grow a volume, including 'vxresize' and 'vxassist growby/growto', take significantly longer if the volume has a version 20 DCO (Data Change Object) attached to it, in comparison with a volume that has no DCO attached.
DESCRIPTION: When a volume with a DCO is grown, the existing map in the DCO needs to be copied and updated to track the grown regions. The algorithm was such that for each region in the map it would search for the page that contains that region in order to update the map. The number of regions and the number of pages containing them are proportional to the volume size, so the search complexity is amplified, observed primarily when the volume size is of the order of terabytes. In the reported instance, it took more than 12 minutes to grow a 2.7TB volume by 50G.
RESOLUTION: The code has been enhanced to find the regions that are contained within a page and then avoid looking up the page for each of those regions.
* INCIDENT NO:2892650 TRACKING ID:2826125
SYMPTOM: The VxVM script daemons are not up after they are invoked with the vxvm-recover script.
DESCRIPTION:
When the VxVM script daemon starts, it terminates any stale instance that exists. When the script daemon is invoked with exactly the same process ID as the previous invocation, the daemon terminates abnormally by killing itself through a false-positive detection.

RESOLUTION:
Code changes are made to handle the same-process-ID situation correctly.

* INCIDENT NO:2892660 TRACKING ID:2000585

SYMPTOM:
If 'vxrecover -sn' is run and at the same time one volume is removed, vxrecover exits with the error 'Cannot refetch volume'. The exit status code is zero, but no volumes are started.

DESCRIPTION:
vxrecover assumes that the volume is missing because the disk group must have been deported while vxrecover was in progress. Hence, it exits without starting the remaining volumes. vxrecover should be able to start the other volumes if the disk group is not deported.

RESOLUTION:
The source is modified to skip the missing volume and proceed with the remaining volumes.

* INCIDENT NO:2892665 TRACKING ID:2807158

SYMPTOM:
During VM upgrade or patch installation on the Solaris platform, the system can sometimes hang due to a deadlock with the following stack:
genunix:cv_wait
genunix:ndi_devi_enter
genunix:devi_config_one
genunix:ndi_devi_config_one
genunix:resolve_pathname
genunix:e_ddi_hold_devi_by_path
vxspec:_init
genunix:modinstall
genunix:mod_hold_installed_mod
genunix:modrload
genunix:modload
genunix:mod_hold_dev_by_major
genunix:ndi_hold_driver
genunix:probe_node
genunix:i_ndi_config_node
genunix:i_ddi_attachchild

DESCRIPTION:
During the upgrade or patch installation, the vxspec module is unloaded and reloaded. In the vxspec module initialization, it tries to lock the root node during the pathname walk while already holding the subnode, i.e., /pseudo. Meanwhile, if another process holding the lock of the root node is acquiring the lock of the subnode /pseudo, a deadlock occurs, since each process tries to get the lock already held by its peer.
RESOLUTION:
The APIs which introduce the deadlock are replaced.

* INCIDENT NO:2892682 TRACKING ID:2837717

SYMPTOM:
The "vxdisk(1M) resize" command fails if a 'da name' is specified.

DESCRIPTION:
The 'da name' scenario is not handled in the resize code path.

RESOLUTION:
The code is modified such that if a 'dm name' is not specified to resize, the 'da name' specific operation is performed.

* INCIDENT NO:2892684 TRACKING ID:1859018

SYMPTOM:
"Link link detached from volume" warnings are displayed when a linked-breakoff snapshot is created.

DESCRIPTION:
The purpose of these messages is to let users and administrators know about the detach of a link due to I/O errors. These messages get displayed unnecessarily whenever a linked-breakoff snapshot is created.

RESOLUTION:
Code changes are made to display the messages only when a link is detached due to I/O errors on volumes involved in a link relationship.

* INCIDENT NO:2892689 TRACKING ID:2836798

SYMPTOM:
'vxdisk resize' fails with the following error on a simple format EFI (Extensible Firmware Interface) disk expanded from the array side, and the system may panic or hang after a few minutes.
# vxdisk resize disk_10
VxVM vxdisk ERROR V-5-1-8643 Device disk_10: resize failed:
Configuration daemon error -1

DESCRIPTION:
As VxVM does not support Dynamic LUN Expansion on simple/sliced EFI disks, the last usable LBA (Logical Block Address) in the EFI header is not updated while expanding the LUN. Since the header is not updated, the partition end entry was regarded as illegal and cleared as part of the partition range check. This inconsistent partition information between the kernel and the disk causes a system panic or hang.

RESOLUTION:
Checks are added in the VxVM code to prevent DLE on simple/sliced EFI disks.

* INCIDENT NO:2892698 TRACKING ID:2851085

SYMPTOM:
DMP does not detect implicit LUN ownership changes.

DESCRIPTION:
DMP does ownership monitoring for ALUA arrays to detect implicit LUN ownership changes. This helps DMP to always use the Active/Optimized path for sending down I/O.
This feature is controlled using the dmp_monitor_ownership tunable and is enabled by default. In case of partial discovery triggered through the event source daemon (vxesd), the ALUA information kept in the kernel data structure for ownership monitoring was getting wiped. This caused ownership monitoring to stop working for these dmpnodes.

RESOLUTION:
The source has been updated to handle this case.

* INCIDENT NO:2892702 TRACKING ID:2567618

SYMPTOM:
VRTSexplorer dumps core in checkhbaapi/print_target_map_entry with a stack that looks like:
print_target_map_entry()
check_hbaapi()
main()
_start()

DESCRIPTION:
The checkhbaapi utility uses the HBA_GetFcpTargetMapping() API, which returns the current set of mappings between operating system and Fibre Channel Protocol (FCP) devices for a given HBA port. The maximum limit for mappings was set to 512, and only that much memory was allocated. When the number of mappings returned was greater than 512, the function that prints this information tried to access entries beyond that limit, which resulted in core dumps.

RESOLUTION:
The code has been changed to allocate enough memory for all the mappings returned by HBA_GetFcpTargetMapping().

* INCIDENT NO:2892716 TRACKING ID:2753954

SYMPTOM:
When a cable is disconnected from one port of a dual-port FC HBA, only the paths going through that port should be marked as SUSPECT. However, paths going through the other port are also getting marked as SUSPECT.

DESCRIPTION:
Disconnection of a cable from an HBA port generates an FC event. When the event is generated, the paths of all ports of the corresponding HBA are marked as SUSPECT.

RESOLUTION:
Code changes are done to mark only the paths going through the port on which the FC event is generated.
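The fixed-limit overrun in incident 2892702 above is a classic "query the real count, then allocate" problem. A hedged C sketch of the corrected pattern — `fetch_mappings` below is a hypothetical stand-in for an HBA-style API that reports the total mapping count even when the supplied buffer is too small, not the real HBA_GetFcpTargetMapping() signature:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative only: instead of a fixed 512-entry buffer, probe for the
 * total number of mappings and allocate exactly that many, so printing
 * code can never walk past the end of the allocation. */
typedef struct { int os_dev; int fcp_target; } map_entry_t;

/* Stand-in for the HBA API: fills up to 'cap' entries, returns the
 * total number of mappings that exist on the port. */
int fetch_mappings(map_entry_t *buf, int cap, int total_on_port) {
    for (int i = 0; i < total_on_port && i < cap; i++) {
        buf[i].os_dev = i;
        buf[i].fcp_target = i + 100;
    }
    return total_on_port;   /* caller learns the real count */
}

/* Allocate enough memory for all mappings; caller frees the buffer. */
map_entry_t *get_all_mappings(int total_on_port, int *count_out) {
    int need = fetch_mappings(NULL, 0, total_on_port); /* probe count */
    map_entry_t *buf = malloc((size_t)need * sizeof *buf);
    if (buf == NULL)
        return NULL;
    *count_out = fetch_mappings(buf, need, total_on_port);
    return buf;
}
```

With this shape, a port reporting more than 512 mappings simply gets a larger buffer rather than an out-of-bounds read.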
* INCIDENT NO:2922770 TRACKING ID:2866997

SYMPTOM:
After applying Solaris patch 147440-20, disk initialization using the vxdisksetup command fails with the following error:
VxVM vxdisksetup ERROR V-5-2-43 : Invalid disk device for vxdisksetup

DESCRIPTION:
An uninitialized variable gets a different value after the OS patch installation, thereby making the vxparms command output give an incorrect result.

RESOLUTION:
The variable is initialized with the correct value.

* INCIDENT NO:2922798 TRACKING ID:2878876

SYMPTOM:
vxconfigd, the VxVM configuration daemon, dumps core with the following stack:
vol_cbr_dolog ()
vol_cbr_translog ()
vold_preprocess_request ()
request_loop ()
main ()

DESCRIPTION:
This core is the result of a race between two threads which are processing requests from the same client. While one thread has completed processing a request and is in the phase of releasing the memory used, the other thread is processing a "DISCONNECT" request from the same client. Due to the race condition, the second thread attempted to access the memory which was being released and dumped core.

RESOLUTION:
The issue is resolved by protecting the common data of the client with a mutex.

* INCIDENT NO:2924117 TRACKING ID:2911040

SYMPTOM:
A restore operation from a cascaded snapshot succeeds even when one of its sources is inaccessible. Subsequently, if the primary volume is made accessible for operation, I/O operations may fail on the volume, as the source of the volume is inaccessible. Deletion of snapshots would also fail due to the dependency of the primary volume on the snapshots. In such a case, the following error is thrown when trying to remove any snapshot using the 'vxedit rm' command:
"VxVM vxedit ERROR V-5-1-XXXX Volume YYYYYY has dependent volumes"

DESCRIPTION:
When a volume is restored from any snapshot, the snapshot becomes the source of data for regions on the primary volume that differ between the two volumes.
If the snapshot itself depends on some other volume and that volume is not accessible, the primary volume effectively becomes inaccessible after the restore operation. In such a case, the snapshots cannot be deleted, as the primary volume depends on them.

RESOLUTION:
If a snapshot or any later cascaded snapshot is inaccessible, a restore from that snapshot is prevented.

* INCIDENT NO:2924188 TRACKING ID:2858853

SYMPTOM:
In a CVM (Cluster Volume Manager) environment, after a master switch, vxconfigd dumps core on the slave node (the old master) when a disk is removed from the disk group.
dbf_fmt_tbl()
voldbf_fmt_tbl()
voldbsup_format_record()
voldb_format_record()
format_write()
ddb_update()
dg_set_copy_state()
dg_offline_copy()
dasup_dg_unjoin()
dapriv_apply()
auto_apply()
da_client_commit()
client_apply()
commit()
dg_trans_commit()
slave_trans_commit()
slave_response()
fillnextreq()
vold_getrequest()
request_loop()
main()

DESCRIPTION:
During a master switch, disk group configuration copy related flags are not cleared on the old master; hence, when a disk is removed from a disk group, vxconfigd dumps core.

RESOLUTION:
Necessary code changes have been made to clear the configuration copy related flags during a master switch.

* INCIDENT NO:2924207 TRACKING ID:2886402

SYMPTOM:
When re-configuring DMP devices, typically using the command 'vxdisk scandisks', a vxconfigd hang is observed. Since it is in a hung state, no VxVM (Veritas Volume Manager) commands are able to respond. The following process stack of vxconfigd was observed:
dmp_unregister_disk
dmp_decode_destroy_dmpnode
dmp_decipher_instructions
dmp_process_instruction_buffer
dmp_reconfigure_db
gendmpioctl
dmpioctl
dmp_ioctl
dmp_compat_ioctl
compat_blkdev_ioctl
compat_sys_ioctl
cstar_dispatch

DESCRIPTION:
When a DMP (Dynamic Multi-Pathing) node is about to be destroyed, a flag is set to hold any I/O (read/write) on it. The I/Os which may come in between the process of setting the flag and the actual destruction of the DMP node are placed in the DMP queue and are never served.
So the hang is observed.

RESOLUTION:
An appropriate flag is set for the node which is to be destroyed, so that any I/O arriving after the flag is marked is rejected, avoiding the hang condition.

* INCIDENT NO:2930399 TRACKING ID:2930396

SYMPTOM:
The vxdmpasm/vxdmpraw command does not work on Solaris. For example:
#vxdmpasm enable user1 group1 600 emc0_02c8
expr: syntax error
/etc/vx/bin/vxdmpasm: test: argument expected
#vxdmpraw enable user1 group1 600 emc0_02c8
expr: syntax error
/etc/vx/bin/vxdmpraw: test: argument expected

DESCRIPTION:
The "length" function of the expr command does not work on Solaris. This function was used in the script and caused the error.

RESOLUTION:
The expr command has been replaced by the awk command.

* INCIDENT NO:2933467 TRACKING ID:2907823

SYMPTOM:
Unconfiguring devices in the 'failing' or 'unusable' state (as shown by the cfgadm utility) cannot be done using the VxVM Dynamic Reconfiguration (DR) tool.

DESCRIPTION:
If devices are not removed properly, they can be in the 'failing' or 'unusable' state, as shown below:
c1::5006048c5368e580,255 disk connected configured failing
c1::5006048c5368e580,326 disk connected configured unusable
Such devices are ignored by the DR tool, and they need to be manually unconfigured using the cfgadm utility.

RESOLUTION:
To fix this, code changes are done so that the DR tool asks the user whether to unconfigure 'failing' or 'unusable' devices and takes action accordingly.

* INCIDENT NO:2933468 TRACKING ID:2916094

SYMPTOM:
These are the issues for which enhancements are done:
1. All the DR operation logs are accumulated in one log file, 'dmpdr.log', and this file grows very large.
2. If a command takes a long time, the user may think DR operations are stuck.
3. Devices controlled by TPD are seen in the list of LUNs that can be removed in the 'Remove Luns' operation.

DESCRIPTION:
1. All the logs of DR operations accumulate and form one big log file, which makes it difficult for the user to get to the current DR operation logs.
2.
If a command takes time, the user has no way to know whether the command is stuck.
3. Devices controlled by TPD are visible to the user, which makes the user think that those devices can be removed without removing them from TPD control.

RESOLUTION:
1. Now, every time the user opens the DR tool, a new log file of the form dmpdr_yyyymmdd_HHMM.log is generated.
2. A message is displayed to inform the user if a command takes longer than expected.
3. Changes are made so that devices controlled by TPD are not visible during DR operations.

* INCIDENT NO:2933469 TRACKING ID:2919627

SYMPTOM:
While doing the 'Remove Luns' operation of the Dynamic Reconfiguration tool, there is no feasible way to remove a large number of LUNs, since the only way to do so is to enter all LUN names separated by commas.

DESCRIPTION:
When removing LUNs in bulk during the 'Remove Luns' option of the Dynamic Reconfiguration tool, it is not feasible to enter all the LUNs separated by commas.

RESOLUTION:
Code changes are done in the Dynamic Reconfiguration scripts to accept a file containing the LUNs to be removed as input.

* INCIDENT NO:2934259 TRACKING ID:2930569

SYMPTOM:
The LUNs in the 'error' state in the output of 'vxdisk list' cannot be removed through the DR (Dynamic Reconfiguration) tool.

DESCRIPTION:
The LUNs seen in the 'error' state in the VM (Volume Manager) tree are not listed by the DR (Dynamic Reconfiguration) tool while doing the 'Remove LUNs' operation.

RESOLUTION:
Necessary changes have been made to display LUNs in the error state while doing the 'Remove LUNs' operation in the DR (Dynamic Reconfiguration) tool.

* INCIDENT NO:2940447 TRACKING ID:2940446

SYMPTOM:
I/O can hang on a volume with a space optimized snapshot if the underlying cache object is of a very large size. It can also lead to data corruption in the cache object.

DESCRIPTION:
The cache volume maintains a B+ tree for mapping an offset to its actual location in the cache object. Copy-on-write I/O generated on snapshot volumes needs to determine the offset of the particular I/O in the cache object.
Due to incorrect type-casting, the value calculated for a large offset truncates to a smaller value because of overflow, leading to data corruption.

RESOLUTION:
Code changes are done to avoid overflow during offset calculation in the cache object.

* INCIDENT NO:2941167 TRACKING ID:2915751

SYMPTOM:
A Solaris machine panics while resizing a CDS-EFI LUN, or in the CDS VTOC to EFI conversion case, where the new size of the resize is greater than 1TB.

DESCRIPTION:
While resizing a disk having the CDS-EFI format, or while resizing a CDS disk from less than 1TB to >= 1TB, the machine panics because of the incorrect use of device numbers. VxVM uses the whole slice number s0 instead of s7, which represents the whole device for the EFI format. Hence, the device open fails and an incorrect disk maxiosize is populated. While doing an I/O, the machine panics with a divide-by-zero error.

RESOLUTION:
While resizing a disk having the CDS-EFI format, or while resizing a CDS disk from less than 1TB to >= 1TB, VxVM now correctly uses the device number corresponding to partition 7 of the device.

* INCIDENT NO:2941193 TRACKING ID:1982965

SYMPTOM:
"vxdg import DGNAME" fails when the "da-name" used as an input to the vxdg command is based on a naming scheme which is different from the prevailing naming scheme on the host. The error messages seen are:
VxVM vxdg ERROR V-5-1-530 Device c6t50060E801002BC73d240 not found in configuration
VxVM vxdg ERROR V-5-1-10978 Disk group x86dg: import failed: Not a disk access record

DESCRIPTION:
vxconfigd stores Disk Access (DA) records based on DMP names. If "vxdg" passes a name other than the DMP name for the device, vxconfigd cannot map it to a DA record. As vxconfigd cannot locate a DA record corresponding to the input name passed from vxdg, it fails the import operation.

RESOLUTION:
The vxdg command now converts the input name to the DMP name before passing it to vxconfigd for further processing.

* INCIDENT NO:2941226 TRACKING ID:2915063

SYMPTOM:
System panic with the following stack while detaching a plex of a volume in a CVM environment:
vol_klog_findent()
vol_klog_detach()
vol_mvcvm_cdetsio_callback()
vol_klog_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION:
During a plex-detach operation, VxVM searches in the kernel for the plex object to be detached. If some transaction is in progress on any disk group in the system, an incorrect plex object sometimes gets selected, which results in the dereference of an invalid address and panics the system.

RESOLUTION:
Code changes are done to make sure that the correct plex object is selected.

* INCIDENT NO:2941234 TRACKING ID:2899173

SYMPTOM:
In a CVR environment, an SRL failure may result in a vxconfigd hang, eventually resulting in a 'vradmin stoprep' command hang.

DESCRIPTION:
The 'vradmin stoprep' command hangs because vxconfigd is waiting indefinitely in a transaction. The transaction was waiting for I/O completion on the SRL. An error handler is generated to handle I/O failure on the SRL, but within a transaction this error was not handled properly, resulting in the transaction hang.

RESOLUTION:
A fix is provided such that when an SRL failure is encountered, the transaction itself handles the I/O error on the SRL.

* INCIDENT NO:2941237 TRACKING ID:2919318

SYMPTOM:
In a CVM environment with fencing enabled, wrong fencing keys are registered for opaque disks during node join or disk group import operations.

DESCRIPTION:
During the CVM node join and shared disk group import code paths, when opaque disk registration happens, the fencing keys in the internal disk group records are not in sync with the actual keys generated. This caused wrong fencing keys to be registered for opaque disks. For the remaining disks, fencing key registration happens correctly.

RESOLUTION:
The fix is to copy the correctly generated key to the internal disk group record for the current disk group import/node join scenario and use it for disk registration.

* INCIDENT NO:2941252 TRACKING ID:1973983

SYMPTOM:
Relocation fails with the following error when a DCO (Data Change Object) plex is in the disabled state:
VxVM vxrelocd ERROR V-5-2-600 Failure recovering in disk group

DESCRIPTION:
When a mirror plex is added to a volume using "vxassist snapstart", the attached DCO plex can be in the DISABLED/DCOSNP state. While recovering such DCO plexes, if the enclosure is disabled, the plex can get into the DETACHED/DCOSNP state and relocation fails.

RESOLUTION:
Code changes are made to handle DCO plexes in the disabled state during relocation.

* INCIDENT NO:2942166 TRACKING ID:2942609

SYMPTOM:
The following message is seen as an error message when quitting from the Dynamic Reconfiguration tool:
"FATAL: Exiting the removal operation."

DESCRIPTION:
When the user quits from an operation, the Dynamic Reconfiguration tool displays the fact that it is quitting as an error message.

RESOLUTION:
Changes are made to display the message as Info.

* INCIDENT NO:2944708 TRACKING ID:1725593

SYMPTOM:
The 'vxdmpadm listctlr' command does not show the count of device paths seen through each controller.

DESCRIPTION:
The 'vxdmpadm listctlr' command currently does not show the number of device paths seen through each controller. The CLI option has been enhanced to provide this information as an additional column at the end of each line of the CLI's output.

RESOLUTION:
The number of paths under each controller is counted and the value is displayed as the last column in the 'vxdmpadm listctlr' CLI output.

* INCIDENT NO:2944710 TRACKING ID:2744004

SYMPTOM:
When VVR is configured, vxconfigd on the secondary hangs. Any vx commands issued during this time do not complete.

DESCRIPTION:
vxconfigd is waiting for I/Os to drain before allowing a configuration change command to proceed. The I/Os never drain completely, resulting in the hang. This is because of a deadlock where pending I/Os are unable to start and vxconfigd keeps waiting for their completion.

RESOLUTION:
The code is changed so that this deadlock does not arise. The I/Os can start properly and complete, allowing vxconfigd to function properly.
* INCIDENT NO:2944714 TRACKING ID:2833498

SYMPTOM:
The vxconfigd daemon hangs in vol_ktrans_commit() while a reclaim operation is in progress on volumes having instant snapshots. The stack trace is given below:
vol_ktrans_commit
volconfig_ioctl

DESCRIPTION:
Storage reclaim leads to the generation of special I/Os (termed reclaim I/Os), which can be very large in size (>4G) and, unlike application I/Os, are not broken into smaller sized I/Os. Reclaim I/Os need to be tracked in snapshot maps if the volume has full snapshots configured. The mechanism to track reclaim I/O is not capable of handling such large I/Os, causing the hang.

RESOLUTION:
Code changes are made to use an alternative mechanism in Volume Manager to track the reclaim I/Os.

* INCIDENT NO:2944717 TRACKING ID:2851403

SYMPTOM:
The system panics while unloading the 'vxio' module when the VxVM SmartMove feature is used and the "vxportal" module gets reloaded (for example, during a VxFS package upgrade). The stack trace looks like:
vxportalclose()
vxfs_close_portal()
vol_sr_unload()
vol_unload()

DESCRIPTION:
During a SmartMove operation like plex attach, VxVM opens the 'vxportal' module to read in-use file system map information. This file descriptor is closed only when the 'vxio' module is unloaded. If the 'vxportal' module is unloaded and reloaded before 'vxio', the file descriptor held by 'vxio' becomes invalid and results in a panic.

RESOLUTION:
Code changes are made to close the file descriptor for 'vxportal' after reading the free/invalid file system map information. This ensures that stale file descriptors don't get used for 'vxportal'.
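The stale-descriptor fix for incident 2944717 above is an instance of the close-after-use pattern: open the resource, read what you need, and close it in the same scope, so nothing holds a descriptor across a provider unload/reload. A minimal user-space sketch of the pattern (illustrative only, not the kernel code):

```c
#include <fcntl.h>
#include <unistd.h>

/* Close-after-use: the descriptor is never cached for the lifetime of
 * the caller, so a later unload/reload of whatever backs 'path' cannot
 * leave a stale descriptor behind.  Returns bytes read, or -1 on a
 * failed open. */
long read_maps_and_close(const char *path, char *buf, long len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    long n = (long)read(fd, buf, (size_t)len);
    close(fd);              /* released in the same scope it was opened */
    return n;
}
```

The trade-off is an open/close per use, which is acceptable for an infrequent operation like reading file system maps during plex attach.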
* INCIDENT NO:2944722 TRACKING ID:2869594

SYMPTOM:
The master node panics with the following stack after a space optimized snapshot is refreshed or deleted and the master node is selected using 'vxclustadm setmaster':
volilock_rm_from_ils
vol_cvol_unilock
vol_cvol_bplus_walk
vol_cvol_rw_start
voliod_iohandle
voliod_loop
thread_start
In addition to this, all space optimized snapshots on the corresponding cache object may be corrupted.

DESCRIPTION:
In CVM, the master node owns the responsibility of maintaining the cache object indexing structure that provides the space optimized functionality. When a space optimized snapshot is refreshed or deleted, the indexing structure is rebuilt in the background after the operation returns. When the master node is switched using 'vxclustadm setmaster' before the index rebuild is complete, both the old master and the new master nodes rebuild the index in parallel, which results in index corruption. Since the index is corrupted, the data stored on space optimized snapshots should not be trusted. I/Os issued on the corrupted index lead to a panic.

RESOLUTION:
When the master role is switched using 'vxclustadm setmaster', the index rebuild on the old master node is safely aborted. Only the new master node is allowed to rebuild the index.

* INCIDENT NO:2944724 TRACKING ID:2892983

SYMPTOM:
The vxvol command dumps core with the following stack trace if executed in parallel with the vxsnap addmir command:
strcmp()
do_link_recovery
trans_resync_phase1()
vxvmutil_trans()
trans()
common_start_resync()
do_noderecover()
main()

DESCRIPTION:
If vxrecover is triggered during the creation of a link between two volumes, the vxvol command may not have information about the newly created links. This leads to a NULL pointer dereference and the core dump.

RESOLUTION:
The code has been modified to check whether the link information is properly present with the vxvol command, and to fail the operation with an appropriate error message if it is not.
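The strcmp() crash in incident 2944724 above is the standard validate-before-dereference fix: when a record can legitimately be absent (here, a link created concurrently by 'vxsnap addmir'), check for it and return an error instead of passing NULL into string functions. A hedged C sketch with hypothetical structures:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical link record; the real vxvol structures are not shown in
 * the README. */
struct link_rec { const char *volname; };

#define ELINK_MISSING (-2)   /* illustrative error code */

/* Returns 0 on a matching record, -1 on a mismatch, and ELINK_MISSING
 * when the record has not been populated yet (the racy case) instead
 * of dereferencing NULL. */
int recover_link(const struct link_rec *rec, const char *expected) {
    if (rec == NULL || rec->volname == NULL)
        return ELINK_MISSING;    /* report the race, don't crash */
    return strcmp(rec->volname, expected) == 0 ? 0 : -1;
}
```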
* INCIDENT NO:2944725 TRACKING ID:2910043

SYMPTOM:
Frequent swapin/swapout is seen due to higher order memory requests.

DESCRIPTION:
VxVM operations such as plex attach and snapshot resync/reattach issue ATOMIC_COPY ioctls. The default I/O size for these operations is 1MB, and VxVM allocates this memory from the operating system. Memory allocations of such a large size can result in swapin/swapout of pages and are not very efficient. In the presence of many such operations, the system may not work very efficiently.

RESOLUTION:
VxVM has its own I/O memory management module, which allocates pages from the operating system and manages them efficiently. The ATOMIC_COPY code is modified to make use of VxVM's internal I/O memory pool instead of directly allocating memory from the operating system.

* INCIDENT NO:2944727 TRACKING ID:2919720

SYMPTOM:
vxconfigd dumps core in the rec_lock1_5() function:
rec_lock1_5()
rec_lock1()
rec_lock()
client_trans_start()
req_vol_trans()
request_loop()
main()

DESCRIPTION:
During any configuration change in VxVM, vxconfigd locks all the objects involved in the operations to avoid any unexpected modification. Some objects which do not belong to the context of the current transactions are not handled properly, which results in a core dump. This case is particularly seen during snapshot operations of cross-dg linked volume snapshots.

RESOLUTION:
Code changes are done to avoid locking records which are not yet part of the committed VxVM configuration.

* INCIDENT NO:2944729 TRACKING ID:2933138

SYMPTOM:
The system panics with the stack trace given below:
voldco_update_itemq_chunk()
voldco_chunk_updatesio_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION:
While tracking I/Os in snapshot maps, information is stored in in-memory pages. For large sized I/Os (such as reclaim I/Os), this information can span multiple pages. Sometimes the pages are not properly referenced in the map update for I/Os of larger size, which leads to a panic because of invalid page addresses.
RESOLUTION:
The code is modified to properly reference pages during the map update for large sized I/Os.

* INCIDENT NO:2944741 TRACKING ID:2866059

SYMPTOM:
When a disk resize fails, the following messages can appear on screen:
1. "VxVM vxdisk ERROR V-5-1-8643 Device : resize failed: One or more subdisks do not fit in pub reg"
or
2. "VxVM vxdisk ERROR V-5-1-8643 Device : resize failed: Cannot remove last disk in disk group"

DESCRIPTION:
In the first message, extra information should be provided, such as which subdisk is under consideration and what the subdisk and public region lengths are. After vxdisk resize fails with the second message, the resize operation succeeds if the -f (force) option is used. This message can be improved by suggesting that the user use the -f (force) option for resizing.

RESOLUTION:
Code changes are made to improve the error messages.

* INCIDENT NO:2962257 TRACKING ID:2898547

SYMPTOM:
vradmind dumps core on the VVR (Veritas Volume Replicator) Secondary site in a CVR (Clustered Volume Replicator) environment. The stack trace looks like:
__kernel_vsyscall
raise
abort
fmemopen
malloc_consolidate
delete
delete[]
IpmHandle::~IpmHandle
IpmHandle::events
main

DESCRIPTION:
When the Logowner Service Group is moved across nodes on the Primary site, it induces deletion of the IpmHandle of the old Logowner node, as the IpmHandle of the new Logowner node gets created. During destruction of the IpmHandle object, the pointer '_cur_rbufp' is not set to NULL, which can lead to freeing memory which is already freed, thus causing 'vradmind' to dump core.

RESOLUTION:
The destructor of IpmHandle is modified to set the pointer to NULL after it is deleted.
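The IpmHandle fix for incident 2962257 above is the free-and-NULL idiom: after releasing a buffer, reset the owning pointer so a second teardown path frees nothing instead of freeing already-freed memory. A minimal C rendering of the idea (the real code is C++; the names below are illustrative):

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical stand-in for the handle holding the receive buffer. */
struct ipm_handle { char *cur_rbuf; };

/* After free(), the pointer is reset to NULL, so calling this twice is
 * harmless: free(NULL) is defined to do nothing. */
void ipm_release_rbuf(struct ipm_handle *h) {
    free(h->cur_rbuf);
    h->cur_rbuf = NULL;   /* makes a repeated release a no-op */
}
```

Without the reset, a second destruction path sees a dangling pointer and the double free corrupts the allocator heap, which is exactly the abort-inside-malloc stack shown in the symptom.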
* INCIDENT NO:2964567 TRACKING ID:2964547

SYMPTOM:
Whenever the system reboots, the messages below are logged on the system console:
Oct 10 19:10:01 sol11_server unix: [ID 779321 kern.notice] vxdmp: unable to resolve dependency,
Oct 10 19:10:01 sol11_server unix: [ID 969242 kern.notice] cannot load module 'misc/ted'

DESCRIPTION:
The module 'misc/ted' is part of the debug package. It was wrongly getting linked with the vxdmp driver for non-debug builds. These are harmless messages.

RESOLUTION:
The source makefile is modified to remove this dependency for non-debug packages.

* INCIDENT NO:2974870 TRACKING ID:2935771

SYMPTOM:
Rlinks disconnect after switching the master.

DESCRIPTION:
Sometimes switching a master on the primary can cause the Rlinks to disconnect; 'vradmin repstatus' shows "paused due to network disconnection" as the replication status. VVR uses a connection to check if the secondary is alive. The secondary responds to these requests by replying back, indicating that it is alive. On a master switch, the old master fails to close this connection with the secondary. Thus, after the master switch, the old master as well as the new master sends requests to the secondary. This causes a mismatch of connection numbers on the secondary, and the secondary does not reply to the requests of the new master, causing the Rlinks to disconnect.

RESOLUTION:
The solution is to close the old master's connection with the secondary, so that it does not keep sending connection requests to the secondary.

* INCIDENT NO:2976946 TRACKING ID:2919714

SYMPTOM:
On a THIN LUN, vxevac returns 0 without migrating unmounted VxFS volumes. The following error messages are displayed when an unmounted VxFS volume is processed:
VxVM vxsd ERROR V-5-1-14671 Volume v2 is configured on THIN luns and not mounted. Use 'force' option, to bypass smartmove. To take advantage of smartmove for supporting thin luns, retry this operation after mounting the volume.
VxVM vxsd ERROR V-5-1-407 Attempting to cleanup after failure ...

DESCRIPTION:
On a THIN LUN, VM will not move or copy data on unmounted VxFS volumes unless smartmove is bypassed. The vxevac command needs to be enhanced to detect unmounted VxFS volumes on THIN LUNs and to support a force option that allows the user to bypass smartmove.

RESOLUTION:
The vxevac script has been modified to check for unmounted VxFS volumes on THIN LUNs prior to performing the migration. If an unmounted VxFS volume is detected, the command fails with a non-zero return code and displays a message notifying the user to mount the volumes or bypass smartmove by specifying the force option:
VxVM vxevac ERROR V-5-2-0 The following VxFS volume(s) are configured on THIN luns and not mounted:
v2
To take advantage of smartmove support on thin luns, retry this operation after mounting the volume(s). Otherwise, bypass smartmove by specifying the '-f' force option.

* INCIDENT NO:2976956 TRACKING ID:1289985

SYMPTOM:
vxconfigd dumps core upon running the "vxdctl enable" command, as vxconfigd does not check the status value returned by the device when it sends a SCSI mode sense command to the device.

DESCRIPTION:
vxconfigd sends a SCSI mode sense command to the device to obtain device information, but it only checks the return value of ioctl(). The return value of ioctl() only indicates whether there was an error while sending the command to the target device. vxconfigd should also check the value of the SCSI status byte returned by the device to get the real status of the SCSI command execution.

RESOLUTION:
The code has been changed to check the value of the SCSI status byte returned by the device and take appropriate action if the status value is nonzero.
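The two-level check described for incident 1289985 above can be sketched as follows. This is a hedged illustration, not the vxconfigd code: the struct below is a hypothetical wrapper around a SCSI pass-through result, where the ioctl() return value reports transport-level delivery and a separate status byte reports what the device itself said (0 means GOOD in the SCSI status byte encoding):

```c
/* Hypothetical result of a SCSI pass-through mode sense call. */
struct scsi_result {
    int ioctl_ret;              /* return value of the ioctl() call    */
    unsigned char scsi_status;  /* status byte reported by the device  */
};

/* The command truly succeeded only if BOTH levels report success:
 * a device can reject the command (e.g. CHECK CONDITION) while the
 * ioctl() itself still returns 0. */
int mode_sense_ok(const struct scsi_result *r) {
    if (r->ioctl_ret != 0)
        return 0;               /* transport-level failure             */
    if (r->scsi_status != 0)
        return 0;               /* device-level failure despite ret==0 */
    return 1;
}
```

The original bug was exactly the missing second branch: a zero ioctl() return was treated as proof of success, so garbage mode sense data was parsed and vxconfigd dumped core.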
* INCIDENT NO:2976974 TRACKING ID:2875962

SYMPTOM:
When an upgrade install is performed from VxVM 5.0MPx to VxVM 5.1 (and higher), the installation script may give the following message:
The following files are already installed on the system and are being used by another package:
/usr/lib/vxvm/root/kernel/drv/vxapm/dmpsvc.SunOS_5.10
Do you want to install these conflicting files [y,n,?,q]

DESCRIPTION:
A VxVM 5.0MPx patch incorrectly packaged the IBM SanVC APM with a VxVM patch, which was subsequently corrected in a later patch. Any upgrade performed from that 5.0MPx patch to 5.1 or higher will result in this packaging message.

RESOLUTION:
Code is added to the packaging script of the VxVM package to remove the APM files, so that the conflict between the VRTSaslapm and VRTSvxvm packages is resolved.

* INCIDENT NO:2978189 TRACKING ID:2948172

SYMPTOM:
Execution of the command "vxdisk -o thin,fssize list" can cause a hang or panic.
The hang stack trace might look like:
pse_block_thread
pse_sleep_thread
.hkey_legacy_gate
volsiowait
vol_objioctl
vol_object_ioctl
voliod_ioctl
volsioctl_real
volsioctl
The panic stack trace might look like:
voldco_breakup_write_extents
volfmr_breakup_extents
vol_mv_indirect_write_start
volkcontext_process
volsiowait
vol_objioctl
vol_object_ioctl
voliod_ioctl
volsioctl_real
vols_ioctl
vols_compat_ioctl
compat_sys_ioctl
sysenter_dispatch

DESCRIPTION:
The command "vxdisk -o thin,fssize list" triggers reclaim I/Os to get the file system usage from Veritas File System on Veritas Volume Manager mounted volumes. Reclamation is currently not supported on volumes with space optimized (SO) snapshots, but because of a bug, reclaim I/Os continued to execute for volumes with SO snapshots, leading to a system panic or hang.

RESOLUTION:
Code changes are made so that reclamation I/Os are not allowed to proceed on volumes with SO snapshots.
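The fix for incident 2948172 above is an up-front capability gate: refuse the unsupported operation before any I/O is issued, rather than letting it run into an unhandled code path. A tiny C sketch of the shape of such a guard (flag name and error code are hypothetical):

```c
/* Hypothetical volume flag: set when the volume has a space optimized
 * snapshot configured. */
enum { VOL_HAS_SO_SNAP = 0x1 };

#define ERECLAIM_NOTSUP (-1)    /* illustrative error code */

/* Gate reclaim at the entry point: if an SO snapshot is present the
 * request is rejected immediately instead of being issued and later
 * hanging or panicking in the snapshot-map code. */
int start_reclaim(unsigned vol_flags) {
    if (vol_flags & VOL_HAS_SO_SNAP)
        return ERECLAIM_NOTSUP; /* refuse: SO snapshot present */
    return 0;                   /* reclaim may proceed          */
}
```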
* INCIDENT NO:2979767 TRACKING ID:2798673

SYMPTOM:
A system panic is observed with the stack trace given below:
voldco_alloc_layout
voldco_toc_updatesio_done
voliod_iohandle
voliod_loop

DESCRIPTION:
A DCO (Data Change Object) contains the metadata information required to start the DCO volume and decode further information from the DCO volume. This information is stored in the first block of the DCO volume. If this metadata information is incorrect or corrupted, the further processing of the volume start results in a panic due to a divide-by-zero error in the kernel.

RESOLUTION:
Code changes are made to verify the correctness of the DCO volume's metadata information during startup. If the information read is incorrect, the volume start operation fails.

* INCIDENT NO:2983679 TRACKING ID:2970368

SYMPTOM:
SRDF-R2 WD (write-disabled) devices are shown in the error state, and many path enable/disable messages are generated in the /etc/vx/dmpevents.log file.

DESCRIPTION:
DMP (the dynamic multi-pathing driver) disables the paths of write protected devices; therefore, these devices are shown in the error state. The vxattachd daemon tries to online these devices and executes partial device discovery for them. As part of partial device discovery, enabling and disabling the paths of such write protected devices generates many path enable/disable messages in the /etc/vx/dmpevents.log file.

RESOLUTION:
This issue is addressed by not disabling the paths of write protected devices in DMP.

* INCIDENT NO:3004823 TRACKING ID:2692012

SYMPTOM:
When moving subdisks using vxassist move (or the vxevac command, which in turn calls vxassist move), if the disk tags are not the same for the source and destination, the command fails with a generic message which does not convey exactly why the operation failed. The following generic message is seen:
VxVM vxassist ERROR V-5-1-438 Cannot allocate space to replace subdisks

DESCRIPTION:
When moving subdisks using vxassist move, if no target disk is specified, available disks from the disk group are used for the move.
If these disks have a site tag set and the values of the site tag attribute
are not the same, vxassist move is expected to fail. However, it fails with a
generic message that does not specify why the operation failed. A message
that precisely conveys the reason for the failure is expected.

RESOLUTION:
A new message is introduced that precisely conveys that the failure is due to
a site tag attribute mismatch. The following message is displayed along with
the generic message:

VxVM vxassist ERROR V-5-1-0 Source and/or target disk belongs to site,can not move over sites

* INCIDENT NO:3004852 TRACKING ID:2886333

SYMPTOM:
The "vxdg(1M) join" command allowed mixing clone and non-clone disk groups.
A subsequent import of the newly joined disk group fails.

DESCRIPTION:
Mixing clone and non-clone disk groups is not allowed. The code that performs
the join operation did not validate against such a mix and proceeded with the
operation. This resulted in the newly joined disk group containing a mix of
clone and non-clone disks, and a subsequent import of that disk group fails.

RESOLUTION:
During a disk group join operation, both disk groups are now checked; if a
mix of clone and non-clone disk groups is found, the join operation fails.

* INCIDENT NO:3005921 TRACKING ID:1901838

SYMPTOM:
After the addition of a license key that enables multi-pathing, the state of
the controller is still shown as DISABLED in the vxdmpadm CLI output.

DESCRIPTION:
When the multi-pathing license key is added, the state of the active paths of
a LUN is changed to ENABLED, but the state of the controller is not updated.

RESOLUTION:
As a fix, whenever a multi-pathing license key is installed, the operation
updates the state of the controller in addition to that of the LUN paths.

* INCIDENT NO:3006262 TRACKING ID:2715129

SYMPTOM:
Vxconfigd hangs during Master takeover in a CVM (Clustered Volume Manager)
environment. This results in VxVM command hangs.
DESCRIPTION:
During Master takeover, the VxVM (Veritas Volume Manager) kernel signals
vxconfigd with the information of the new Master. Vxconfigd then proceeds
with a vxconfigd-level handshake with the nodes across the cluster. In this
case, the vxconfigd handshake mechanism started before the kernel could
signal vxconfigd, resulting in the hang.

RESOLUTION:
Code changes are made to ensure that the vxconfigd handshake starts only upon
receipt of the signal from the kernel.

* INCIDENT NO:3011391 TRACKING ID:2965910

SYMPTOM:
vxassist dumps core with the following stack:
setup_disk_order()
volume_alloc_basic_setup()
fill_volume()
setup_new_volume()
make_trans()
vxvmutil_trans()
trans()
transaction()
do_make()
main()

DESCRIPTION:
When -o ordered is used, vxassist handles non-disk parameters in a different
way. This scenario may result in an invalid comparison, leading to a core
dump.

RESOLUTION:
Code changes are made to handle the parameter comparison logic properly.

* INCIDENT NO:3011444 TRACKING ID:2398416

SYMPTOM:
vxassist dumps core with the following stack:
merge_attributes()
get_attributes()
do_make()
main()
_start()

DESCRIPTION:
vxassist dumps core while creating a volume when the attribute
'wantmirror=ctlr' is added to the '/etc/default/vxassist' file. vxassist
reads this defaults file initially and uses the attributes specified in it to
allocate the storage during volume creation. However, while merging the
attributes specified in the defaults file, it accesses a NULL attribute
structure, causing the core dump.

RESOLUTION:
Necessary code changes have been made to check the attribute structure
pointer before accessing it.

* INCIDENT NO:3020087 TRACKING ID:2619600

SYMPTOM:
Live migration of a virtual machine running the SFHA/SFCFSHA stack with data
disk fencing enabled causes the service groups configured on the virtual
machine to fault.
DESCRIPTION:
After live migration of a virtual machine running the SFHA/SFCFSHA stack with
data disk fencing enabled, I/O fails on the shared SAN devices with a
reservation conflict and causes the service groups to fault. Live migration
changes the SCSI initiator, so I/O coming from the migrated server to the
shared SAN storage fails with a reservation conflict.

RESOLUTION:
Code changes are added to check whether the host is fenced off from the
cluster. If the host is fenced off, the registration key is re-registered
for the DMP node through the migrated server, and I/O is restarted.

* INCIDENT NO:3025973 TRACKING ID:3002770

SYMPTOM:
The system panics with the following stack trace:
vxdmp:dmp_aa_recv_inquiry
vxdmp:dmp_process_scsireq
vxdmp:dmp_daemons_loop
unix:thread_start

DESCRIPTION:
The panic happens while handling the SCSI response for the SCSI Inquiry
command. In order to determine whether the path on which the SCSI Inquiry
command was issued is read-only, the code needs to check the error buffer.
However, the error buffer is not always prepared, so the code should verify
that the error buffer is valid before checking it further. Without such
verification, the system may panic on a NULL pointer.

RESOLUTION:
The source code is modified to verify that the error buffer is valid before
it is examined.

* INCIDENT NO:3026288 TRACKING ID:2962262

SYMPTOM:
When DMP Native Stack support is enabled and some devices are being managed
by a multipathing solution other than DMP, uninstalling DMP fails with an
error because DMP Native Stack support cannot be turned off:

Performing DMP prestop tasks ...................................... Done
The following errors were discovered on the systems:
CPI ERROR V-9-40-3436 Failed to turn off dmp_native_support tunable on
pilotaix216. Refer to Dynamic Multi-Pathing Administrator's guide to
determine the reason for the failure and take corrective action.
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more volume groups

The CLI 'vxdmpadm settune dmp_native_support=off' also fails with the
following error:

# vxdmpadm settune dmp_native_support=off
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more volume groups

DESCRIPTION:
With DMP Native Stack support, it is expected that devices which are being
used by LVM are multipathed by DMP. Co-existence with other multipathing
solutions in such cases is not supported, and having another multipathing
solution results in this error.

RESOLUTION:
Code changes have been made so that turning off DMP Native Stack support does
not error out if a device is not being managed by DMP.

* INCIDENT NO:3027482 TRACKING ID:2273190

SYMPTOM:
The device discovery commands 'vxdisk scandisks' or 'vxdctl enable' issued
just after license key installation may fail and abort.

DESCRIPTION:
After the addition of a license key that enables multi-pathing, the state of
the paths maintained at the user level is incorrect.

RESOLUTION:
As a fix, whenever a multi-pathing license key is installed, the operation
updates the state of the paths both at the user level and the kernel level.

INCIDENTS FROM OLD PATCHES:
---------------------------