* * * READ ME * * *
* * * Veritas Volume Manager 6.0.3 * * *
* * * Hot Fix 200 * * *

Patch Date: 2013-12-13

This document provides the following information:

* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH


PATCH NAME
----------
Veritas Volume Manager 6.0.3 Hot Fix 200


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
Solaris 11 X86


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* Veritas Storage Foundation for Oracle RAC 6.0.1
* Veritas Storage Foundation Cluster File System 6.0.1
* Veritas Storage Foundation 6.0.1
* Veritas Storage Foundation High Availability 6.0.1
* Veritas Dynamic Multi-Pathing 6.0.1
* Symantec VirtualStore 6.0.1


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 6.0.300.200

* 3358311 (3152769) DMP path failover takes time in a Solaris LDOM environment when one I/O domain is down.
* 3358313 (3194358) Continuous messages are displayed in the syslog file for EMC not-ready (NR) LUNs.
* 3358342 (2724067) Enhance the vxdisksetup CLI to accept the disk label type so that all corresponding paths of a DMP device are labeled.
* 3358345 (2091520) The ability to move the configdb placement from one disk to another using the "vxdisk set keepmeta=[always|skip|default]" command.
* 3358346 (3353211) Once the BCV device comes back to the RW state, the OPEN mode of one of the paths changes from NDELAY to NORMAL, while the other path still retains the NDELAY mode.
* 3358347 (3057554) vxdisk shred fails with EFI disks on the Solaris X86 platform.
* 3358348 (2665425) Enhance the "vxdisk -px list" CLI interface to report VxVM disk attribute information.
* 3358350 (3189830) Provide a -f (force) option for the mirror operation of the vxdiskadm command.
* 3358351 (3158320) [VxVM] The command "vxdisk -px REPLICATED list" shows the same output as "vxdisk -px REPLICATED_TYPE list".
* 3358352 (3326964) VxVM hangs in CVM environments in the presence of FMR operations.
* 3358353 (3271315) vxdisk shred fails on the Solaris X86 platform for SMI sliced format disks.
* 3358354 (3332796) The message "VxVM vxisasm INFO V-5-1-0 seeking block #..." is displayed while initializing a disk that is not an ASM disk.
* 3358357 (3277258) BAD TRAP panic in vxio:vol_mv_pldet_callback.
* 3358367 (3230148) Clustered Volume Manager (CVM) hangs during split brain testing.
* 3358368 (3249264) Disks get into the 'ERROR' state after being destroyed with the command 'vxdg destroy '.
* 3358369 (3250369) vxdisk scandisks causes endless messages in syslog.
* 3358370 (2921147) The udid_mismatch flag is absent on a clone disk when the source disk is unavailable.
* 3358371 (3125711) The secondary node panics if it is rebooted while reclaim is in progress on the primary.
* 3358372 (3156295) The permission and owner of the /dev/raw/raw# device are wrong after reboot.
* 3358374 (3237503) System hang may happen after creating a space-optimized snapshot with a large cache volume.
* 3358376 (3116990) The syslog file is filled with many extra write-protected messages.
* 3358377 (3199398) The output of the command "vxdmpadm pgrrereg" depends on the order of the DMP node list; the terminal output depends on the last LUN (DMP node).
* 3358379 (1783763) In a Veritas Volume Replicator (VVR) environment, the vxconfigd(1M) daemon may hang during a configuration change operation.
* 3358380 (2152830) A diskgroup (DG) import fails with a non-descriptive error message when multiple copies (clones) of the same device exist and the original devices are either offline or not available.
* 3358381 (2859470) The Symmetrix Remote Data Facility R2 (SRDF-R2) LUN with the Extensible Firmware Interface (EFI) label is not recognized by Veritas Volume Manager (VxVM) and goes into an error state.
* 3358382 (3086627) The "vxdisk -o thin,fssize list" command fails with the error: VxVM vxdisk ERROR V-5-1-16282 Cannot retrieve stats: Bad address.
* 3358404 (3021970) The secondary master panics while an I/O load is running.
* 3358405 (3026977) The DR (Dynamic Reconfiguration) option of 'vxdiskadm' removes LUNs even when they are not in the Failing/Unusable state.
* 3358407 (3107699) [VxVM] System panic caused by vxdmp during system shutdown.
* 3358408 (3115206) Solaris 11 LDOM, VxVM 6.0.3 with ZFS root; panic in the DMP module.
* 3358414 (3139983) Failed I/Os from SCSI are retried on only a few paths to a LUN instead of utilizing all the available paths.
* 3358416 (3312162) Verification of data on the DR site reports differences even though replication is up-to-date.
* 3358417 (3325122) In a CVR environment, creation of a stripe-mirror volume with logtype=dcm fails.
* 3358418 (3283525) DCO corruption after a volume resize leads to a vxconfigd hang.
* 3358420 (3236773) Multiple error messages of the format "vxdmp V-5-3-0 dmp_indirect_ioctl: Ioctl Failed" can be seen during set/get failover-mode for an EMC ALUA disk array.
* 3358423 (3194305) Replication status goes into the paused state because "vxstart_vvr start" does not start the vxnetd daemon automatically on the secondary side.
* 3358429 (3300418) Volume operations (vxassist, vxsnap, ...) create I/O requests to all drives in the DG.
* 3358430 (3258276) DMP paths keep a huge layered open count, which causes the ssd driver's total open count to overflow (0x80000000).
* 3358433 (3301470) All CVR nodes panic repeatedly due to a null pointer dereference in vxio.
* 3366688 (2957645) A vold restart on Linux floods the terminal with CVM-related error messages.
* 3366703 (3056311) For releases earlier than 5.1 SP1, allow disk initialization with the CDS format using raw geometry.
* 3367778 (3152274) A "dd" command to an SRDF-R2 (write-disabled) device hangs and causes VxVM commands to hang for a long time, while there is no issue with OS devices.
* 3368234 (3236772) The resizesrl and resizevol operations fail intermittently with the error "vradmin ERROR Lost connection to host".
* 3368236 (3327842) "vradmin verifydata" fails with "Lost connection to ; terminating command execution."
* 3371422 (3087893) EMC TPD emcpower names change on every reboot with VxVM.
* 3374166 (3325371) Panic occurs in the vol_multistepsio_read_source() function when snapshots are used.
* 3376953 (3372724) A failed install of VxVM (Veritas Volume Manager) panics the server.
* 3387405 (3019684) I/O hangs on the master while the SRL is about to overflow.
* 3387417 (3107741) vxrvg snapdestroy fails with a "Transaction aborted waiting for io drain" error and vxconfigd hangs for around 45 minutes.

Patch ID: 6.0.300.100

* 2892702 (2567618) The VRTSexplorer dumps core in vxcheckhbaapi/print_target_map_entry.
* 3090670 (3090667) The system panics or hangs while executing the "vxdisk -o thin,fssize list" command as part of Veritas Operations Manager (VOM) Storage Foundation (SF) discovery.
* 3099508 (3087893) EMC TPD emcpower names change on every reboot with VxVM.
* 3133012 (3160973) vxlist hangs while detecting a foreign disk format on an EFI disk.
* 3140411 (2959325) The vxconfigd(1M) daemon dumps core while performing the disk group move operation.
* 3150893 (3119102) Support LDOM Live Migration with fencing enabled.
* 3156719 (2857044) System crash in voldco_getalloffset when trying to resize the file system.
* 3159096 (3146715) 'Rlinks' do not connect with NAT configurations on Little Endian Architecture.
* 3254227 (3182350) VxVM volume creation or size increase hangs.
* 3254229 (3063378) VM commands are slow when Read Only disks are presented.
* 3254427 (3182175) The "vxdisk -o thin,fssize list" command can report incorrect file system usage data.
* 3280555 (2959733) Handle device path reconfiguration in case the device paths are moved across LUNs or enclosures, to prevent a vxconfigd core dump.
* 3294641 (3107741) vxrvg snapdestroy fails with a "Transaction aborted waiting for io drain" error and vxconfigd hangs for around 45 minutes.
* 3294642 (3019684) I/O hangs on the master while the SRL is about to overflow.

Patch ID: 6.0.300.000

* 2853712 (2815517) vxdg adddisk allows mixing of clone and non-clone disks in a disk group.
* 2860207 (2859470) An EMC SRDF (Symmetrix Remote Data Facility) R2 disk with an EFI label is not recognized by VxVM (Veritas Volume Manager) and is shown in the error state.
* 2863672 (2834046) NFS migration failed due to device reminoring.
* 2863708 (2836528) Unable to grow a LUN dynamically on Solaris x86 using the "vxdisk resize" command.
* 2876865 (2510928) The extended attributes reported by "vxdisk -e list" for EMC SRDF LUNs are reported as "tdev mirror" instead of "tdev srdf-r1".
* 2892499 (2149922) Record the diskgroup import and deport events in syslog.
* 2892571 (1856733) Support for FusionIO on Solaris x64.
* 2892590 (2779580) The secondary node gives a configuration error (no Primary RVG) after reboot of the master node on the Primary site.
* 2892621 (1903700) Removing a mirror using vxassist does not work.
* 2892630 (2742706) Panic due to a mutex not being released in vxlo_open.
* 2892643 (2801962) Growing a volume takes a significantly long time when the volume has a version 20 DCO attached to it.
* 2892650 (2826125) The VxVM script daemon is terminated abnormally on its invocation.
* 2892660 (2000585) vxrecover does not start the remaining volumes if one of the volumes is removed during the vxrecover command run.
* 2892665 (2807158) On the Solaris platform, the system can sometimes hang during a VM upgrade or patch installation.
* 2892682 (2837717) The "vxdisk(1M) resize" command fails if the 'da name' is specified.
* 2892684 (1859018) "link detached from volume" warnings are displayed when a linked-breakoff snapshot is created.
* 2892689 (2836798) In VxVM, resizing a simple EFI disk fails and causes a system panic/hang.
* 2892698 (2851085) DMP does not detect implicit LUN ownership changes for some of the dmpnodes.
* 2892702 (2567618) VRTSexplorer core dumps in checkhbaapi/print_target_map_entry.
* 2892716 (2753954) When a cable is disconnected from one port of a dual-port FC HBA, the paths via the other port are marked as SUSPECT PATH.
* 2922770 (2866997) VxVM disk initialization fails because an uninitialized variable gets an unexpected value after OS patch installation.
* 2922798 (2878876) vxconfigd dumps core in vol_cbr_dolog() due to a race between two threads processing requests from the same client.
* 2924117 (2911040) Restore from a cascaded snapshot leaves the volume in an unusable state if any cascaded snapshot is in the detached state.
* 2924188 (2858853) After a master switch, vxconfigd dumps core on the old master.
* 2924207 (2886402) When re-configuring devices, a vxconfigd hang is observed.
* 2930399 (2930396) The vxdmpasm command (in the 5.1SP1 release) and the vxdmpraw command (in the 6.0 release) do not work on the Solaris platform.
* 2933467 (2907823) If the user removes the LUN at the storage layer and at the VxVM layer beforehand, the DMP DR tool is unable to clean up the cfgadm (leadville) stack.
* 2933468 (2916094) Enhancements have been made to the Dynamic Reconfiguration Tool (DR Tool) to create a separate log file every time the DR Tool is started, to display a message if a command takes a long time, and to not list the devices controlled by a TPD (Third Party Driver) in the 'Remove Luns' option of the DR Tool.
* 2933469 (2919627) The Dynamic Reconfiguration tool should be enhanced to remove LUNs feasibly in bulk.
* 2934259 (2930569) The LUNs in the 'error' state in the output of 'vxdisk list' cannot be removed through the DR (Dynamic Reconfiguration) Tool.
* 2940447 (2940446) Full fsck hangs on I/O in VxVM when the cache object size is very large.
* 2941167 (2915751) A Solaris machine panics during dynamic LUN expansion of a CDS disk.
* 2941193 (1982965) vxdg import fails if the da-name is based on a naming scheme which is different from the prevailing naming scheme on the host.
* 2941226 (2915063) After rebooting a VIS array having mirror volumes, the master node panicked and CVM faulted on the other nodes.
* 2941234 (2899173) vxconfigd hangs after executing the command "vradmin stoprep".
* 2941237 (2919318) The I/O fencing key values of the data disks are different and abnormal in a VCS cluster with I/O fencing.
* 2941252 (1973983) vxunreloc fails when the dco plex is in the DISABLED state.
* 2942166 (2942609) The message displayed when the user quits from Dynamic Reconfiguration Operations is shown as an error message.
* 2944708 (1725593) The 'vxdmpadm listctlr' command has to be enhanced to print the count of device paths seen through the controller.
* 2944710 (2744004) vxconfigd is hung on the VVR secondary node during VVR configuration.
* 2944714 (2833498) vxconfigd hangs while a reclaim operation is in progress on volumes having instant snapshots.
* 2944717 (2851403) A system panic is seen while unloading the "vxio" module. This happens whenever VxVM uses the SmartMove feature and the "vxportal" module gets reloaded (for example, during a VxFS package upgrade).
* 2944722 (2869594) The master node panics due to corruption if space-optimized snapshots are refreshed and 'vxclustadm setmaster' is used to select the master.
* 2944724 (2892983) vxvol dumps core if new links are added while the operation is in progress.
* 2944725 (2910043) Avoid order 8 allocation by vxconfigd in node reconfig.
* 2944727 (2919720) vxconfigd core in rec_lock1_5().
* 2944729 (2933138) Panic in voldco_update_itemq_chunk() due to accessing an invalid buffer.
* 2944741 (2866059) Improve the error messages hit during the vxdisk resize operation.
* 2962257 (2898547) vradmind on the VVR Secondary Site dumps core when the Logowner Service Group on the VVR (Veritas Volume Replicator) Primary Site is shuffled across its CVM (Clustered Volume Manager) nodes.
* 2964567 (2964547) About the DMP message - cannot load module 'misc/ted'.
* 2974870 (2935771) In a VVR environment, the RLINK disconnects after a master switch.
* 2976946 (2919714) On a THIN LUN, vxevac returns 0 without migrating unmounted VxFS volumes.
* 2976956 (1289985) vxconfigd core dumps upon running the "vxdctl enable" command.
* 2976974 (2875962) During the upgrade of the VRTSaslapm package, a conflict is encountered with the VRTSvxvm package because an APM binary is included in the VRTSvxvm package which is already installed.
* 2978189 (2948172) Executing the "vxdisk -o thin,fssize list" command can result in a panic.
* 2979767 (2798673) The system panics in voldco_alloc_layout() while creating a volume with an instant DCO.
* 2983679 (2970368) Enhance handling of SRDF-R2 Write-Disabled devices in DMP.
* 3004823 (2692012) The vxevac move error message needs to be enhanced to be less generic and give a clear message for the failure.
* 3004852 (2886333) The vxdg join command should not allow mixing clone and non-clone disks in a disk group.
* 3005921 (1901838) Incorrect setting of the Nolicense flag can lead to dmp database inconsistency.
* 3006262 (2715129) vxconfigd hangs during Master takeover in a CVM (Clustered Volume Manager) environment.
* 3011391 (2965910) Volume creation with vxassist using "-o ordered alloc=<disk-class>" dumps core.
* 3011444 (2398416) vxassist dumps core while creating a volume after adding the attribute "wantmirror=ctlr" in the default vxassist rule file.
* 3020087 (2619600) Live migration of a virtual machine having the SFHA/SFCFSHA stack with data disk fencing enabled causes service groups configured on the virtual machine to fault.
* 3025973 (3002770) Accessing a NULL pointer in dmp_aa_recv_inquiry() caused a system panic.
* 3026288 (2962262) Uninstall of dmp fails in the presence of other multipathing solutions.
* 3027482 (2273190) Incorrect setting of the UNDISCOVERED flag can lead to database inconsistency.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following Symantec incidents:

Patch ID: 6.0.300.200

* 3358311 (Tracking ID: 3152769)

SYMPTOM:
DMP (Dynamic MultiPathing) path failover takes time in a Solaris LDOM environment when one I/O domain is down.

DESCRIPTION:
DMP provides a SCSI bypass framework, wherein a SCSI buffer is created and directly sent to the HBA, bypassing the OS SCSI driver layer. This has merits for performance and for diagnostic purposes in situations of I/O errors. However, in Solaris LDOM environments, when one I/O domain goes down, failover takes time because validation causes multiple timeouts in the LDOM virtual disk client driver layer.

RESOLUTION:
Code changes are made to not use the SCSI bypass framework in LDOM environments.

* 3358313 (Tracking ID: 3194358)

SYMPTOM:
Continuous messages appear in syslog for EMC not-ready (NR) logical units. Messages from syslog (/var/adm/messages):

May 10 18:40:43 scsi: [ID 107833 kern.warning] WARNING: /pci@1e,600000/SUNW,jfca@2/fp@0,0/ssd@w5006048c5368e580,16c (ssd144):
May 10 18:40:43    drive offline
May 10 18:40:43 scsi: [ID 107833 kern.warning] WARNING: /pci@1e,600000/SUNW,jfca@2/fp@0,0/ssd@w5006048c536979a0,127 (ssd392):
May 10 18:40:43    drive offline
May 10 18:40:43 scsi: [ID 107833 kern.warning] WARNING: /pci@1e,600000/SUNW,jfca@2/fp@0,0/ssd@w5006048c5368e5a0,16b (ssd270):
May 10 18:40:43    drive offline
May 10 18:40:43 i/o error occurred (errno=0x5) on dmpnode 201/0x1c0

DESCRIPTION:
VxVM tries to online the EMC not-ready (NR) logical units. As part of the disk online process, it tries to read the disk label from the logical unit. Because the logical unit is NR, the I/O fails. The failure messages are displayed in the syslog file.
RESOLUTION:
The code is modified to skip the disk online for the EMC NR LUNs.

* 3358342 (Tracking ID: 2724067)

SYMPTOM:
Assume format(1) is run on a disk to change the label type from EFI to SMI prior to invoking vxdisksetup. This used to result in failure of vxdisksetup.

DESCRIPTION:
Prior to this enhancement, vxdisksetup used to initialize the disk with the specified VxVM layout (e.g. simple, sliced or cdsdisk) using the pre-existing label type (e.g. SMI or EFI). Users were required to change the label to the desired label type using format(1) prior to invoking vxdisksetup. When format(1) is run on a single physical path to change the label type from EFI to SMI, format(1) used to update the device special file on that single physical path only; this resulted in I/O failure on the remaining physical paths and thereby caused vxdisksetup to fail.

RESOLUTION:
vxdisksetup is enhanced such that the label type can be specified along with the VxVM disk layout. vxdisksetup invokes format(1) on a single physical path to change the label type. Then, it invokes dd(1) on the remaining physical paths during the very early phase. In turn, dd(1) invokes open(2) on each of the remaining physical paths. open(2) updates the device special files on each of those physical paths to reflect the latest label type.

* 3358345 (Tracking ID: 2091520)

SYMPTOM:
Customers cannot selectively disable VxVM configuration copies on the disks associated with a disk group.

DESCRIPTION:
An enhancement is required to enable customers to selectively disable VxVM configuration copies on disks associated with a disk group.

RESOLUTION:
The code is modified to provide a "keepmeta=skip" option to the "vxdisk set" command to allow a customer to selectively disable VxVM configuration copies on disks that are a part of the disk group.
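As an illustration of the new option, a minimal usage sketch (the disk group and disk names below are hypothetical; verify the exact syntax against the vxdisk(1M) manual page):

# vxdisk -g mydg set mydg01 keepmeta=skip      <- do not place configuration copies on this disk
# vxdisk -g mydg set mydg01 keepmeta=always    <- always keep configuration copies on this disk
# vxdisk -g mydg set mydg01 keepmeta=default   <- revert to the default placement policy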
* 3358346 (Tracking ID: 3353211)

SYMPTOM:
A. After an EMC Symmetrix BCV (Business Continuance Volume) device switches to read-write mode, continuous vxdmp (Veritas Dynamic Multi-Pathing) error messages flood syslog as shown below:

NOTE VxVM vxdmp V-5-3-1061 dmp_restore_node: The path 18/0x2 has not yet aged - 299
NOTE VxVM vxdmp 0 dmp_tur_temp_pgr: open failed: error = 6 dev=0x24/0xD0
NOTE VxVM vxdmp V-5-3-1062 dmp_restore_node: Unstable path 18/0x230 will not be available for I/O until 300 seconds
NOTE VxVM vxdmp V-5-3-1061 dmp_restore_node: The path 18/0x2 has not yet aged - 299
NOTE VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x6) on dmpnode 36/0xD0
..
..

B. A DMP metanode, or a path under a DMP metanode, gets disabled unexpectedly.

DESCRIPTION:
A. DMP caches the last discovery NDELAY open for the BCV dmpnode paths. BCV device switching to read-write mode is an array-side operation. Typically in such cases system admins are required to run:
1. vxdisk rm
OR, in case of parallel backup jobs,
1. vxdisk offline
2. vxdisk online
This causes DMP to close the cached open, and during the next discovery the device is opened in read-write mode. If the above steps are skipped, the DMP device can go into a state where one of the paths is in read-write mode while the others remain in NDELAY mode. If the upper layers request a NORMAL open, DMP has code to close the NDELAY cached open and reopen in NORMAL mode; however, during the online of the dmpnode this happens for only one of the paths of the dmpnode.
B. DMP performs error analysis for paths on which I/O failed. In some cases the SCSI probes sent failed with return values/sense codes that were not handled by DMP, causing the paths to get disabled.

RESOLUTION:
A. The DMP EMC ASL (Array Support Library) is modified to handle case A for EMC Symmetrix arrays.
B. The DMP code is modified to handle the SCSI conditions correctly for case B.

* 3358347 (Tracking ID: 3057554)

SYMPTOM:
The VxVM (Veritas Volume Manager) command "vxdiskunsetup -o shred=1" failed for EFI (Extensible Firmware Interface) disks on a Solaris X86 system.

DESCRIPTION:
Disk shred fails on a Solaris X86 system for EFI disks because I/O to the last sector fails. The Solaris X86 OS has an issue where the last sector of a LUN cannot be written.

RESOLUTION:
The code is changed to skip the last sector while shredding; this makes shred pass with EFI disks.

* 3358348 (Tracking ID: 2665425)

SYMPTOM:
The "vxdisk -px list" CLI does not support some basic VxVM attributes, nor does it allow a user to specify multiple attributes in a specific sequence. The display layout was not presented in a readable or parsable manner.

DESCRIPTION:
Some basic VxVM disk attributes that are useful for customizing the command output are not supported by the "vxdisk -px list" CLI. The display output is also not aligned by column or suitable for parsing by a utility. In addition, the CLI does not allow multiple attributes to be specified in a usable manner.

RESOLUTION:
Support for the following VxVM disk attributes was added to the CLI:
SETTINGS ALERTS INFO HOSTID DISK_TYPE FORMAT DA_INFO PRIV_OFF PRIV_LEN PUB_OFF PUB_LEN PRIV_UDID DG_NAME DGID DG_STATE DISKID DISK_TIMESTAMP STATE
The CLI has been enhanced to support multiple attributes separated by a comma, and to align the display output by column, separable by a comma for parsability. For example:

# vxdisk -px ENCLOSURE_NAME,DG_NAME,LUN_SIZE,SETTINGS,state list
DEVICE       ENCLOSURE_NAME  DG_NAME   LUN_SIZE   SETTINGS               STATE
sda          disk            -         143374650  -                      online
sdb          disk            -         143374650  -                      online
sdc          storwizev70000  fencedg   10485760   thinrclm,coordinator   online

* 3358350 (Tracking ID: 3189830)

SYMPTOM:
The user cannot specify the force (-f) option while using the "Mirror volumes on a disk" operation of the vxdiskadm command, and has to manually execute another command (vxmirror) to use the force (-f) option.

DESCRIPTION:
While using the "Mirror volumes on a disk" option of the vxdiskadm command for the root disk, the command fails if the disk geometry does not match. Currently, the vxdiskadm command does not provide any way to specify the -f option for this operation, so the user must manually execute the vxmirror command with the force (-f) option if they want to use it.

RESOLUTION:
Code changes are made to allow the user to specify the force (-f) option through the vxdiskadm interface.

* 3358351 (Tracking ID: 3158320)

SYMPTOM:
The VxVM (Veritas Volume Manager) command "vxdisk -px REPLICATED list" shows wrong output.

DESCRIPTION:
"vxdisk -px REPLICATED list" shows the same output as "vxdisk -px REPLICATED_TYPE list" and does not work as designed to show the values as "yes", "no" or "-". This is because the command line parameter is parsed incorrectly, so the REPLICATED attribute is wrongly treated as REPLICATED_TYPE.

RESOLUTION:
Code changes have been made to handle the "REPLICATED" attribute correctly.
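For reference, a hedged sketch of the corrected behaviour (the device names and attribute values shown are illustrative only):

# vxdisk -px REPLICATED list
DEVICE       REPLICATED
emc0_008c    yes
disk_0       no
disk_1       -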
* 3358352 (Tracking ID: 3326964)

SYMPTOM:
VxVM (Veritas Volume Manager) hangs in CVM (Clustered Volume Manager) environments in the presence of FMR/FlashSnap (Fast Mirror Resync) operations.

DESCRIPTION:
During split brain testing in the presence of FMR activities, when there are errors on the DCO (Data Change Object), the DCO error handling code keeps the CPU occupied because the same error gets set again in its handler. This causes the VxVM SIO (Staged I/O) to loop around the same code, thus causing the hang.

RESOLUTION:
Code changes are made to appropriately handle the error-prone scenario without causing an infinite loop.

* 3358353 (Tracking ID: 3271315)

SYMPTOM:
The command 'vxdiskunsetup' with the 'shred' option fails to shred sliced/simple disks on the Solaris X86 platform. Errors of the following format can be seen:

VxVM vxdisk ERROR V-5-1-16576 disk_shred: Shred failed one or more writes
VxVM vxdisk ERROR V-5-1-16658 disk_shred: Shred wrote 1 pages, of which 1 encountered errors

DESCRIPTION:
The error occurs because the check for 'disk size' returns an incorrect value for sliced/simple disks on Solaris x86, as the VTOC values for sliced/simple disks are stored on the second sector instead of the first.

RESOLUTION:
Code changes have been made to check the second sector as well if no valid VTOC is found on the first sector.

* 3358354 (Tracking ID: 3332796)

SYMPTOM:
The following message is seen while initializing any EFI disk, even though the disk was not previously used as an ASM disk:
"VxVM vxisasm INFO V-5-1-0 seeking block #... "

DESCRIPTION:
As part of disk initialization for every EFI disk, VxVM checks whether the EFI disk has an ASM label. The message "VxVM vxisasm INFO V-5-1-0 seeking block #..." is printed unconditionally, which is unreasonable.

RESOLUTION:
Code changes have been made to not display the message.

* 3358357 (Tracking ID: 3277258)

SYMPTOM:
When the DRL (Dirty Region Log) was off and a detach of a plex was attempted on a mirrored volume, the following panic happened:

unix:panicsys+0x48()
unix:vpanic_common+0x78()
unix:panic+0x1c()
unix:die+0x78()
unix:trap+0x9e0()
unix:ktl0+0x48()
-- trap data  type: 0x31 (data access MMU miss)  rp: 0x2a1033396d0  addr: 0xf0
pc:  0x7c589f8c vxio:vol_mv_pldet_callback+0x94:  lduw  [%o1 + 0xf0], %i2
npc: 0x7c589f90 vxio:vol_mv_pldet_callback+0x98:  andcc %i2, %i5  ( btst %i2 %i5 )
-----------------------------------------------------------------------------
vxio:vol_mv_pldet_callback+0x94()
vxio:vol_klog_start+0x98()
vxio:voliod_iohandle+0x30()
vxio:voliod_loop+0x3e0()
unix:thread_start+0x4()
-- end of kernel thread's stack --

DESCRIPTION:
In the plex detach code, there was no conditional check for the presence of a DRL object before checking the DRL version. Passing a NULL value to the DRL version check results in a system panic.

RESOLUTION:
The DRL version check is removed since it is not necessary here; the code that follows handles the version check itself.

* 3358367 (Tracking ID: 3230148)

SYMPTOM:
Clustered Volume Manager (CVM) hangs during split brain testing.

DESCRIPTION:
During split brain testing in the presence of FMR activities, a read-writeback operation/SIO (Staged I/O) can be issued as part of a DCO (Data Change Object) chunk update. This SIO tries to read from plex1, and when this read fails, it reads from the other available plex(es) and performs a write on all other plexes. As the other plex has already failed, the write operation also fails and gets retried with IOSHIPPING, which also fails due to unavailability of the plex from the other nodes as well (because of the split brain testing). As the remote plex is unavailable, the write fails again and serialization is called again on this SIO, during which the system hangs due to a mismatch in the active and serial counts.

RESOLUTION:
Code changes have been done to take care of the active/serial counts when the SIOs are restarted with IOSHIPPING.
* 3358368 (Tracking ID: 3249264)

SYMPTOM:
Thin disks, and disk groups containing thin disks, go into the ERROR state or lose the configuration copy after a reclaim operation is performed on the disk. The following are the commonly observed

DESCRIPTION:
Disks of disk groups with volumes created with the option init=zero on thin reclaim disks, formatted as sliced, get into the ERROR state after being destroyed with the vxdg destroy command. As the partition offset is not taken into consideration for these types of disks, the private region data is lost, resulting in the disks going into the ERROR state.

RESOLUTION:
The code is modified to consider the disk offset during operations on disks formatted as sliced.

* 3358369 (Tracking ID: 3250369)

SYMPTOM:
The following command triggers a large number of events:
# vxdisk scandisks

DESCRIPTION:
Execution of the command triggers a re-online of all the disks, which involves reading the private region from all the disks. Failure of these read I/Os generates error events, which are notified to all the clients waiting on "vxnotify". One such client is the "vxattachd" daemon. The daemon initiates a "vxdisk scandisks" when the number of events is more than 256. Thus "vxattachd" initiates another cycle of the above activity, resulting in endless events.

RESOLUTION:
The count value which triggers the vxattachd daemon is changed from 256 to 1024. The DMP events are further sub-categorized, as per the requirement of the vxattachd daemon.

* 3358370 (Tracking ID: 2921147)

SYMPTOM:
The udid_mismatch flag is absent on a clone disk when the source disk is unavailable. The 'vxdisk list' command does not show the udid_mismatch flag on a disk. This happens even when the 'vxdisk -o udid list' or 'vxdisk -v list diskname | grep udid' commands show different Device Discovery Layer (DDL) generated and private region unique identifiers (UDIDs) for the disk.

DESCRIPTION:
When the DDL-generated UDID and the private region UDID of a disk do not match, Veritas Volume Manager (VxVM) sets the udid_mismatch flag on the disk. This flag is used to detect a disk as a clone, which is then marked with the clone-disk flag. The vxdisk(1M) utility used to suppress the display of the udid_mismatch flag if the source Logical Unit Number (LUN) is unavailable on the same host.

RESOLUTION:
The vxdisk(1M) utility is modified to display the udid_mismatch flag if it is set on the disk. Display of this flag is no longer suppressed, even when the source LUN is unavailable on the same host.
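As a quick way to see the mismatch this fix makes visible, a hedged sketch using the commands quoted above (the disk name is hypothetical):

# vxdisk -o udid list                        <- shows the DDL-generated UDID for each disk
# vxdisk -v list emc0_0072 | grep -i udid    <- compare the DDL UDID with the private region UDID
# vxdisk list | grep udid_mismatch           <- clone disks now report the udid_mismatch flag in their status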
* 3358371 (Tracking ID: 3125711)

SYMPTOM:
While reclaim is going on on the Primary node, if the secondary node is rebooted it panics with the following stack:

do_page_fault
page_fault
dmp_reclaim_device
dmp_reclaim_storage
gendmpioctl
dmpioctl
vol_dmp_ktok_ioctl
voldisk_reclaim_region
vol_reclaim_disk
vol_subdisksio_start
voliod_iohandle
voliod_loop
...

DESCRIPTION:
In a VVR environment, there was a corner case with the reclaim operation on the secondary where the reclaim length was calculated incorrectly, leading to a memory allocation failure. This resulted in a panic.

RESOLUTION:
The condition is modified to calculate the reclaim length correctly.

* 3358372 (Tracking ID: 3156295)

SYMPTOM:
When DMP (Dynamic Multi-Pathing) native support is enabled for Oracle ASM (Automatic Storage Management) devices, the permission and ownership of the '/dev/raw/raw#' devices go wrong after reboot.

DESCRIPTION:
When VxVM binds the DMP (Dynamic Multi-Pathing) devices to raw devices during a reboot, it invokes the 'raw' command to create the raw devices and then tries to set their permission and ownership immediately, even though the 'raw' command was invoked asynchronously. In some cases the raw device is not yet created at the time VxVM tries to set the permission and ownership. In that case the raw device eventually gets created, but the correct permission and ownership are not set.

RESOLUTION:
Code changes are done to set the permission and ownership of the raw devices when DMP gets the OS event indicating that raw device creation has finished. This ensures that the correct permission and ownership of the raw devices are set.

* 3358374 (Tracking ID: 3237503)

SYMPTOM:
A system hang may happen after creating a space-optimized snapshot with a large cache volume.

DESCRIPTION:
For all changes written to the cache volume after the snapshot volume is created, a translation map with a B+tree structure is used to speed up search/insert/delete operations. When trying to insert a node into the tree, type casting of the page offset to 'unsigned int' causes value truncation for offsets beyond the maximum 32-bit integer. The value truncation corrupts the B+tree, thereby resulting in an SIO (VxVM Staged I/O) hang.

RESOLUTION:
Code changes were made to remove all type casting to 'unsigned int' in the cache volume code.

* 3358376 (Tracking ID: 3116990)

SYMPTOM:
In the case of write-protected hardware mirror LUNs, VxVM commands such as 'vxdisk scandisks' and 'vxdctl enable', as well as a vxconfigd restart, emit the following error messages in the console logs:

[..]
scsi: [ID 107833 kern.warning] WARNING: /pci@400/pci@2/pci@0/pci@1/pci@0/pci@2/SUNW,qlc@0/fp@0,0/ssd@w50060e801602ba15,21 (ssd315):
Error for Command: write    Error Level: Fatal
scsi: [ID 107833 kern.notice] Requested Block: 0    Error Block: 0
scsi: [ID 107833 kern.notice] Sense Key: Write_Protected
scsi: [ID 107833 kern.notice] ASC: 0x22 (illegal function), ASCQ: 0x0, FRU: 0x0
scsi: [ID 107833 kern.warning]
[..]

DESCRIPTION:
As part of these VxVM commands, a disk online operation is run on the write-protected disks. The online operation updates the disk label if the geometry information in the disk label is stale or no disk label is present. Such disk writes cause the SCSI driver to report errors for write-protected LUNs.

RESOLUTION:
Code changes were made within VxVM to tell the SCSI layer to suppress these error messages.

* 3358377 (Tracking ID: 3199398)

SYMPTOM:
The output of the command "vxdmpadm pgrrereg" depends on the order of the DMP (Dynamic MultiPathing) node list; the terminal output depends on the last LUN (DMP node).

1. Terminal message when PGR (Persistent Group Reservation) re-registration succeeded on the last LUN:
# vxdmpadm pgrrereg
VxVM vxdmpadm INFO V-5-1-0 DMP PGR re-registration done for ALL PGR enabled dmpnodes.

2. Terminal message when PGR re-registration failed on the last LUN:
# vxdmpadm pgrrereg
vxdmpadm: Permission denied

DESCRIPTION:
The "vxdmpadm pgrrereg" command has been introduced to support the facility to move a guest OS on one physical node to another node. In a Solaris LDOM environment, this feature is called "Live Migration". When a customer is using the I/O fencing feature and a guest OS is moved to another physical node, I/O does not succeed in the guest OS after the physical node migration because the DMP nodes of the guest OS do not have a valid SCSI-3 PGR key, as the physical HBA has changed.
This command helps re-register the valid PGR keys for the new physical node; however, its output depends on the last LUN (DMP node).

RESOLUTION:
Code changes are done to log the re-registration failures in the system log file. The terminal output now instructs the user to look into the system log when an error is seen on a LUN.

* 3358379 (Tracking ID: 1783763)

SYMPTOM:
In a VVR environment, the vxconfigd(1M) daemon may hang during a configuration change operation. The following stack trace is observed:

delay
vol_rv_transaction_prepare
vol_commit_iolock_objects
vol_ktrans_commit
volconfig_ioctl
volsioctl_real
volsioctl
vols_ioctl
...

DESCRIPTION:
Incorrect serialization primitives are used. This results in the vxconfigd(1M) daemon hang.

RESOLUTION:
The code is modified to use the correct serialization primitives.

* 3358380 (Tracking ID: 2152830)

SYMPTOM:
A diskgroup (DG) import fails with a non-descriptive error message when multiple copies (clones) of the same device exist and the original devices are either offline or not available. For example:

# vxdg import mydg
VxVM vxdg ERROR V-5-1-10978 Disk group mydg: import failed:
No valid disk found containing disk group

DESCRIPTION:
If the original devices are offline or unavailable, the vxdg(1M) command picks up cloned disks for import. The DG import fails unless the clones are tagged and the tag is specified during the DG import. The import failure is expected, but the error message is non-descriptive and does not specify the corrective action to be taken by the user.

RESOLUTION:
The code is modified to give the correct error message when duplicate clones exist during import. Also, details of the duplicate clones are reported in the system log.
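As background for the corrective action mentioned above, a hedged sketch of importing a disk group from its cloned devices (the disk group name follows the example above; confirm the exact options against the vxdg(1M) manual page):

# vxdg -o useclonedev=on -o updateid import mydg    <- explicitly import using the clone disks,
                                                       assigning the disk group a new unique ID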
* 3358381 (Tracking ID: 2859470)

SYMPTOM:
The EMC SRDF-R2 disk may go into the error state when the Extensible Firmware Interface (EFI) label is created on the R1 disk. For example:

R1 site
# vxdisk -eo alldgs list | grep -i srdf
emc0_008c auto:cdsdisk emc0_008c SRDFdg online c1t5006048C5368E580d266 srdf-r1

R2 site
# vxdisk -eo alldgs list | grep -i srdf
emc1_0072 auto - - error c1t5006048C536979A0d65 srdf-r2

DESCRIPTION:
Since R2 disks are in write-protected mode, the default open() call made for read-write mode fails for the R2 disks, and the disk is marked as invalid.

RESOLUTION:
The code is modified so that Dynamic Multi-Pathing (DMP) is able to read the EFI label even on a write-protected SRDF-R2 disk.

* 3358382 (Tracking ID: 3086627)

SYMPTOM:
The "vxdisk -o thin,fssize list" command fails with the error:
VxVM vxdisk ERROR V-5-1-16282 Cannot retrieve stats: Bad address

DESCRIPTION:
This issue happens when the system has more than 200 LUNs. VxVM reads file system statistical information for each LUN to generate the file system size data. However, after reading the information for the first 200 LUNs, the buffer was not reset correctly, so subsequent accesses to the buffer address generate this error.

RESOLUTION:
Code changes are done to properly reset the buffer address.

* 3358404 (Tracking ID: 3021970)

SYMPTOM:
The secondary node panics due to a NULL pointer dereference while freeing an interlock:

page_fault
volsio_ilock_free
vol_rv_inactivate_wsio
vol_rv_restart_wsio
vol_rv_serialise_sec_logging
vol_rv_serialize
vol_rv_errorhandler_start
voliod_iohandle
voliod_loop
...

DESCRIPTION:
The panic is seen if there is a node crash/node reconfiguration on the Primary. The secondary did not correctly handle the updates for the period of the crash, which resulted in a panic.

RESOLUTION:
Necessary code changes have been done to properly handle the freeing of the interlock for node crash/reconfigurations on the Primary side.

* 3358405 (Tracking ID: 3026977)

SYMPTOM:
The DR (Dynamic Reconfiguration) option of 'vxdiskadm' removes LUNs even when they are not in the Failing/Unusable state.

DESCRIPTION:
"grep" is used with the "-i" option in the script 'SunOS.pm' for retrieving the information about failing/unusable disks. As this option yields multiple entries, including disks that were not requested, it leads to removal of LUNs even when they are not in the failing/unusable state.

RESOLUTION:
Code changes have been made to retrieve the correct information about failing/unusable disks.

* 3358407 (Tracking ID: 3107699)

SYMPTOM:
System panic with the following stack during shutdown/reboot:

unix:panic+0x1c()
unix:mutex_enter()
vxio:volinfo_ioctl+0x140()
vxio:volsioctl_real+0x300()
genunix:cdev_ioctl()
vxdmp:dmp_signal_vold+0x1a8()
vxdmp:dmp_throttle_paths+0x424()
vxdmp:dmp_process_stats+0x278()
vxdmp:dmp_daemons_loop+0x160()
unix:thread_start+0x4()

DESCRIPTION:
In the special scenario of system shutdown/reboot, the DMP (Dynamic Multi-Pathing) I/O statistics daemon tries to call the ioctl functions in the VXIO module, which is being unloaded. This causes the system panic.

RESOLUTION:
Code changes are made to stop the DMP I/O statistics daemon before system shutdown/reboot.

* 3358408 (Tracking ID: 3115206)

SYMPTOM:
With a ZFS root and dmp_native_support enabled, the system panicked with the following stack:

bp_mapout+0x58()
vdev_disk_io_intr+0x70()
gendmpiodone+0x1c8()
sd_return_command+0x2f0()
sdintr+0x3a8()
xsvhba_complete_cmd_and_callback+0xb4()
xsvhba_process_rsp+0x1204()
xsvhba_process_dqp_msg+0x110()
xsvhba_process_data_buffers+0xc0()
xsvhba_data_recv_handler+4()
xstl_chan_recv_handler+0x160()
taskq_thread+0x3a8()
thread_start+4()

DESCRIPTION:
When I/O completes, under some conditions the operating system accesses the "b_shadow" field in the returned buffer. Since DMP does not return the correct field value to the operating system, "b_shadow" is NULL. As a result, the system panics due to the NULL pointer.

RESOLUTION:
Code changes have been made to return the correct value of the buffer's "b_shadow" field to the operating system.

* 3358414 (Tracking ID: 3139983)

SYMPTOM:
Failed I/Os from SCSI are retried on only a few paths to a LUN instead of utilizing all the available paths. At times this results in multiple I/O retries without success, and thus DMP returns an I/O failure to the application, bounded by the recoveryoption tunable. The following messages are displayed in the console log:

[..]
Mon Apr xx 04:18:01.885: I/O analysis done as DMP_PATH_OKAY on Path belonging to Dmpnode
Mon Apr xx 04:18:01.885: I/O error occurred (errno=0x0) on Dmpnode
[..]

DESCRIPTION:
When an I/O failure is returned to DMP with a retry error from SCSI, DMP retries that I/O on another path. However, it failed to choose the path that has the higher probability of successfully handling the I/O.

RESOLUTION:
The code is modified to implement this intelligence of choosing appropriate paths that can successfully process the I/Os during retries.

* 3358416 (Tracking ID: 3312162)

SYMPTOM:
Data corruption may occur on the VVR DR (Secondary) Site. The following signs may indicate corruption:
1) 'vradmin verifydata' reports data differences even though replication is up-to-date.
2) The Secondary site may require a full fsck after 'Migrate'/'Takeover' operations.
3) Error messages of the following form may appear:
Example:
msgcnt 21 mesg 017: V-2-17: vx_dirlook - /dev/vx/dsk// file system inode marked bad incore
4) Silent corruption may occur without any visible errors.

DESCRIPTION:
With Secondary Logging enabled, replicated data on the DR site gets written to its SRL first, and is later applied to the corresponding data volumes. While the writes from the SRL are being flushed onto the data volumes, data corruption might occur, provided all the following conditions occur together:
1) Multiple writes for the same data block occur in a short span of time, i.e. while the given set of SRL writes is being flushed onto its data volumes. (and)
2) Based on relative timing, the locks to perform these writes (which occur on the same data block) get granted out of order, thus leading to the writes themselves being applied out of order.

RESOLUTION:
Code changes have been done to protect write-order fidelity by ensuring that the locks are granted in strict order.

* 3358417 (Tracking ID: 3325122)

SYMPTOM:
In a CVR environment, creation of a stripe-mirror volume with logtype=dcm failed with the following error:
VxVM vxplex ERROR V-5-1-10128 Unexpected kernel error in configuration update

DESCRIPTION:
In layered volumes, DCM plexes are attached to the storage volumes and not to the top-level volume. An error condition in this case was not handled correctly in the CVR configuration.

RESOLUTION:
The code is modified to handle the DCM plex placement in the layered volume case.
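For reference, a hedged sketch of the kind of volume creation that previously failed (the disk group, volume name and size are hypothetical; see the vxassist(1M) manual page for the exact syntax):

# vxassist -g vvrdg make datavol 10g layout=stripe-mirror logtype=dcm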
* 3358418 (Tracking ID: 3283525)

SYMPTOM:
Stopping and starting the data volume (with an associated DCO volume) results in a vxconfigd hang with the below stack. The data volume has undergone vxresize earlier.

#0 [ffff882fdf625708] schedule at ffffffff8143f640
#1 [ffff882fdf625850] volsync_wait at ffffffffa10117a5 [vxio]
#2 [ffff882fdf6258c0] volsiowait at ffffffffa10af89b [vxio]
#3 [ffff882fdf625940] volpvsiowait at ffffffffa10af968 [vxio]
#4 [ffff882fdf625a30] voldco_get_accumulator at ffffffffa1037741 [vxio]
#5 [ffff882fdf625a50] voldco_acm_pagein at ffffffffa1037864 [vxio]
#6 [ffff882fdf625b30] voldco_write_pervol_maps_instant at ffffffffa103acb0 [vxio]
#7 [ffff882fdf625bb0] voldco_write_pervol_maps at ffffffffa101d34d [vxio]
#8 [ffff882fdf625c70] volfmr_copymaps_instant at ffffffffa1072c49 [vxio]
#9 [ffff882fdf625d00] vol_mv_precommit at ffffffffa10885db [vxio]
#10 [ffff882fdf625d40] vol_commit_iolock_objects at ffffffffa107fd9a [vxio]
#11 [ffff882fdf625d90] vol_ktrans_commit at ffffffffa1080b80 [vxio]
#12 [ffff882fdf625de0] volconfig_ioctl at ffffffffa10f41b9 [vxio]
#13 [ffff882fdf625e10] volsioctl_real at ffffffffa10fc513 [vxio]
#14 [ffff882fdf625ee0] vols_ioctl at ffffffffa05fe113 [vxspec]
#15 [ffff882fdf625f00] vols_compat_ioctl at ffffffffa05fe18c [vxspec]
#16 [ffff882fdf625f10] compat_sys_ioctl at ffffffff8119b413
#17 [ffff882fdf625f80] sysenter_dispatch at ffffffff8144aaf0

DESCRIPTION:
In the VxVM code, the Data Change Object (DCO) Table of Contents (TOC) entry was not marked with the appropriate flag, which prevents the new in-core map size from being flushed to disk. This leads to corruption. A subsequent stop and start of the volume reads the incorrect TOC from disk, detects the corruption, and results in a vxconfigd hang.

RESOLUTION:
Mark the DCO TOC entry with the appropriate flag, which ensures that the in-core data is flushed to disk to prevent the corruption and the subsequent vxconfigd hang. Additionally, during a grow of the volume, if the grow of the paging module fails, the DCO TOC may not be updated as per the current size, which could lead to an inconsistent DCO. The fix is to make sure the precommit fails if the paging module grow failed.

* 3358420 (Tracking ID: 3236773)

SYMPTOM:
"vxdmpadm getattr enclosure failovermode" generates multiple "vxdmp V-5-3-0 dmp_indirect_ioctl: Ioctl Failed" error messages in syslog if the enclosure is configured as EMC ALUA.

DESCRIPTION:
An EMC disk array in ALUA mode only supports the "implicit" type of failover-mode. Moreover, such a disk array does not support setting or getting the failover-mode, so any set/get attempts for the failover-mode attribute generate "Ioctl Failed" error messages.

RESOLUTION:
The code is modified to handle the set/get failover-mode operations for the EMC ALUA hardware configuration without generating these error messages.

* 3358423 (Tracking ID: 3194305)

SYMPTOM:
In a VVR environment, the replication status goes into the paused state because "vxstart_vvr start" does not start the vxnetd daemon automatically on the secondary side.

# vradmin -g vvrdg repstatus vvrvg
Replicated Data Set: vvrvg
Primary:
Host name: Host IP
RVG name: vvrvg
DG name: vvrdg
RVG state: enabled for I/O
Data volumes: 1
VSets: 0
SRL name: srlvol
SRL size: 5.00 G
Total secondaries: 1
Secondary:
Host name: Host IP
RVG name: vvrvg
DG name: vvrdg
Data status: consistent, up-to-date
Replication status: paused due to network disconnection
Current mode: asynchronous
Logging to: SRL
Timestamp Information: behind by 0h 0m 0s

DESCRIPTION:
The vxnetd daemon is stopped on the secondary, as a result of which the replication status is seen as paused on the primary. vxnetd needs to start gracefully on the secondary for replication to be in the proper state.

RESOLUTION:
Necessary code changes have been done to implement an internal retry mechanism for starting vxnetd.

* 3358429 (Tracking ID: 3300418)

SYMPTOM:
VxVM volume operations on shared volumes cause unnecessary read I/Os against disks that have both the config copy and the log copy disabled on slaves.

DESCRIPTION:
The unnecessary disk read I/Os are generated on slaves while refreshing the private region information into memory during a VxVM transaction; there is no need to refresh the private region information when the disk already has the config copy and log copy disabled.

RESOLUTION:
Code changes are made to skip the refresh if both the config copy and the log copy are disabled, on the master and the slaves.

* 3358430 (Tracking ID: 3258276)

SYMPTOM:
DMP (Dynamic Multi-Pathing) paths keep a huge layered open count, which causes the ssd driver's total open count to overflow (0x80000000) when dmp_cache_open is enabled. The system panics with the following stack:

unix:panicsys+0x48()
unix:vpanic_common+0x78()
genunix:cmn_err+0x98()
genunix:mod_rele_dev_by_major+0x80()
genunix:ddi_rele_driver()
vxdmp:dmp_dev_close+0x18()
vxdmp:gendmpclose+0x29c()
genunix:dev_close()
vxdmp:dmp_dev_close+0xec()
vxdmp:dmp_indirect_ioctl+0x1b4()
vxdmp:gendmpioctl()
vxdmp:dmpioctl+0x20()
specfs:spec_ioctl()
genunix:fop_ioctl+0x20()
genunix:ioctl+0x184()
unix:syscall_trap32+0xcc()

DESCRIPTION:
There is an open leak in DMP which causes the ssd driver's total open count to overflow, leading to a system panic.

RESOLUTION:
Code changes have been made to avoid any open leaks.

* 3358433 (Tracking ID: 3301470)

SYMPTOM:
In a CVR environment, a recovery on the primary side causes all the nodes to panic with the following stack:

trap
ktl0
search_vxvm_mem
voliomem_range_iter
vol_ru_alloc_buffer_start
voliod_iohandle
voliod_loop

DESCRIPTION:
The recovery tries to do a zero-size readback from the SRL.
This results in a panic.

RESOLUTION:
The code is modified to handle the corner case which was resulting in a zero-sized readback.

* 3366688 (Tracking ID: 2957645)

SYMPTOM:
The terminal is flooded with error messages like the following:
VxVM INFO V-5-2-16543 connresp: new client ID allocation failed for cvm nodeid * with error *.

DESCRIPTION:
When vxconfigd is restarted and fails to get a client ID, there is no need to print the error message at the default level; these messages flood the terminal.

RESOLUTION:
Code changes have been made to print those messages only at the debug level.

* 3366703 (Tracking ID: 3056311)

SYMPTOM:
The following problems can be seen on disks initialized with a 5.1SP1 listener and which are being used with older releases like 4.1, 5.0, 5.0.1:
1. Creation of a volume fails on a disk, indicating insufficient space available.
2. Data corruption is seen. The CDS backup label signature is seen within the PUBLIC region data.
3. Disks greater than 1TB in size appear "online invalid" on older releases.

DESCRIPTION:
The VxVM listener can be used to initialize boot disks and data disks which can be used with older VxVM releases. E.g. the 5.1SP1 listener can be used to initialize disks which can be used with all previous VxVM releases like 5.0.1, 5.0, 4.1 etc. From 5.1SP1 onwards, VxVM always uses fabricated geometry while initializing a disk with the CDS format. Older releases like 4.1, 5.0, 5.0.1 use raw geometry; these releases do not honor LABEL geometry. Hence, if a disk was initialized through the 5.1SP1 listener, the disk would be stamped with fabricated geometry. When such a disk is used with older VxVM releases like 5.0.1, 5.0, 4.1, there can be a mismatch between the stamped geometry (fabricated) and the in-memory geometry (raw). If the on-disk cylinder size is smaller than the in-memory cylinder size, data corruption issues might be encountered. To prevent any data corruption issues, disks need to be initialized through the listener with the older CDS format using raw geometry. Also, if the disk size is >= 1TB, 5.1SP1 VxVM initializes the disk with the CDS EFI format; older releases like 4.1, 5.0, 5.0.1 etc. do not understand the EFI format.

RESOLUTION:
From release 5.1SP1 onwards, through the HP-UX listener, a disk to be used for older releases like 4.1, 5.0, 5.0.1 will be initialized with raw geometry. Also, initialization through the HP-UX listener of a disk whose size is greater than 1TB will fail.

* 3367778 (Tracking ID: 3152274)

SYMPTOM:
An I/O hang is seen with Not-Ready (NR) or write-disabled (WD) LUNs. The system syslog floods with I/O error messages like:

Apr 22 03:09:51 d2950rs3 kernel: [162164.751628]
Apr 22 03:09:51 d2950rs3 kernel: [162164.751632] VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x6) on dmpnode 201/0xb0
Apr 22 03:09:51 d2950rs3 kernel: [162164.751634]
Apr 22 03:09:51 d2950rs3 kernel: [162164.751637] VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x6) on dmpnode 201/0xb0
Apr 22 03:09:51 d2950rs3 kernel: [162164.751639]
Apr 22 03:09:51 d2950rs3 kernel: [162164.751643] VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x6) on dmpnode 201/0xb0
Apr 22 03:09:51 d2950rs3 kernel: [162164.751644]
Apr 22 03:09:51 d2950rs3 kernel: [162164.751648] VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x6) on dmpnode 201/0xb0
..
..

DESCRIPTION:
For performance reasons, DMP immediately routes a failed I/O through an alternate available path while performing asynchronous error analysis on the path where the I/O failed. A Not-Ready (NR) device rejects all kinds of I/O requests, and a write-disabled (WD) device rejects write I/O requests.
However, these devices respond fine to SCSI probes such as inquiry, so for those devices the I/O kept being retried through DMP asynchronous error analysis on different paths and, due to a code bug, was never actually terminated.

RESOLUTION:
The DMP code was modified to better handle Not-Ready (NR) and write-disabled (WD) devices. The DMP asynchronous error analysis code was modified to handle such cases.

* 3368234 (Tracking ID: 3236772)

SYMPTOM:
With an I/O load on the primary and replication going on, if we run "vradmin resizevol" on the primary, these operations often terminate with the error message "vradmin ERROR Lost connection to host".

DESCRIPTION:
There was a race condition on the secondary between the transaction and the messages delivered from the Primary to the Secondary. This resulted in repeated timeouts of transactions on the Secondary. The repeated transaction timeouts resulted in session timeouts between the Primary and Secondary vradmind.

RESOLUTION:
The code is modified to resolve the race condition.

* 3368236 (Tracking ID: 3327842)

SYMPTOM:
In a CVR environment, with an I/O load on the primary and replication going on, if we run "vradmin resizevol" on the primary, these operations often terminate with the error message "vradmin ERROR Lost connection to host".

DESCRIPTION:
There was a race condition on the secondary between the transaction and the messages delivered from the Primary to the Secondary. This resulted in repeated timeouts of transactions on the Secondary. The repeated transaction timeouts resulted in session timeouts between the Primary and Secondary vradmind.

RESOLUTION:
The code is modified to resolve the race condition.
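For context, a hedged sketch of the resize operation referenced in the two incidents above (the disk group, RVG, volume and size values are hypothetical; see the vradmin(1M) manual page for the exact syntax):

# vradmin -g vvrdg resizevol vvrvg datavol 20g    <- resize the replicated data volume on the
                                                     Primary and Secondary in one operation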
* 3371422 (Tracking ID: 3087893)

SYMPTOM:
EMC PowerPath pseudo device mappings change with each reboot with VxVM (Veritas Volume Manager).

DESCRIPTION:
VxVM invokes the PowerPath command 'powermt display unmanaged' to discover PowerPath unmanaged devices. This command destroys the PowerPath device mappings during the early boot stage, when PowerPath is not fully up.

RESOLUTION:
EMC fixed the issue by introducing an environment variable, MPAPI_EARLY_BOOT, for the powermt command. The VxVM startup script sets the variable to TRUE before calling the powermt command; powermt then recognizes the early boot phase and behaves differently. The variable is unset by VxVM after device discovery.

* 3374166 (Tracking ID: 3325371)

SYMPTOM:
A panic occurs in the vol_multistepsio_read_source() function when VxVM's FastResync feature is used. The stack trace observed is as follows:

vol_multistepsio_read_source()
vol_multistepsio_start()
volkcontext_process()
vol_rv_write2_start()
voliod_iohandle()
voliod_loop()
kernel_thread()

DESCRIPTION:
When a volume is resized, the Data Change Object (DCO) also needs to be resized. However, the old accumulator contents are not copied into the new accumulator; thereby, the respective regions are marked as invalid. Subsequent I/O on these regions triggers the panic.

RESOLUTION:
The code is modified to appropriately copy the accumulator contents during the resize operation.

* 3376953 (Tracking ID: 3372724)

SYMPTOM:
The system panics while installing VxVM (Veritas Volume Manager), with the following warnings:
vxdmp: WARNING: VxVM vxdmp V-5-0-216 mod_install returned 6
vxspec V-5-0-0 vxspec: vxio not loaded. Aborting vxspec load

DESCRIPTION:
During installation of VxVM, if the DMP (Dynamic Multipathing) module fails to load, the cleanup procedure fails to reset the statistics timer (which is set while loading). As a result, the timer dereferences a function pointer which has already been unloaded. Hence, the panic.

RESOLUTION:
Code changes have been made to perform a complete cleanup when DMP fails to load.

* 3387405 (Tracking ID: 3019684)

SYMPTOM:
An I/O hang is observed when the SRL is about to overflow after a logowner switch from slave to master. The stack trace looks like:

biowait
default_physio
volrdwr
fop_write
write
syscall_trap32

DESCRIPTION:
Delineating the steps: with the slave as logowner, overflow the SRL and follow it up with a DCM resync. Then, switching the logowner back to the master and trying to overflow the SRL again manifests the I/O hang on the master when the SRL is about to overflow. This happens because the master has a stale flag set with an incorrect value related to the last SRL overflow.

RESOLUTION:
Reset the stale flag and ensure that the flag is reset whether the logowner is the master or a slave.

* 3387417 (Tracking ID: 3107741)

SYMPTOM:
The "vxrvg snapdestroy" command fails with the error message "Transaction aborted waiting for io drain", and a vxconfigd hang is observed. The vxconfigd stack trace is:

vol_commit_iowait_objects
vol_commit_iolock_objects
vol_ktrans_commit
volconfig_ioctl
volsioctl_real
vols_ioctl
vols_compat_ioctl
compat_sys_ioctl
...

DESCRIPTION:
The SmartMove query of VxFS depends on some reads and writes. If a transaction in VxVM blocks the new reads and writes, the API hangs waiting for the response. This creates a deadlock-like situation: the SmartMove API waits for the transaction to complete, while the transaction waits for the SmartMove API. Hence the hang.

RESOLUTION:
Disallow transactions during the SmartMove API.

Patch ID: 6.0.300.100

* 2892702 (Tracking ID: 2567618)

SYMPTOM:
The VRTSexplorer dumps core with a segmentation fault in checkhbaapi/print_target_map_entry. The stack trace is observed as follows:

print_target_map_entry()
check_hbaapi()
main()
_start()

DESCRIPTION:
The checkhbaapi utility uses the HBA_GetFcpTargetMapping() API, which returns the current set of mappings between the OS and the Fibre Channel Protocol (FCP) devices for a given Host Bus Adapter (HBA) port. The maximum limit for mappings is set to 512 and only that much memory is allocated. When the number of mappings returned is greater than 512, the function that prints this information tries to access entries beyond that limit, which results in core dumps.

RESOLUTION:
The code is modified to allocate enough memory for all the mappings returned by the HBA_GetFcpTargetMapping() API.

* 3090670 (Tracking ID: 3090667)

SYMPTOM:
The "vxdisk -o thin,fssize list" command can cause the system to hang or panic due to a kernel memory corruption. This command is also issued internally by Veritas Operations Manager (VOM) during Storage Foundation (SF) discovery. The following stack trace is observed:

panic string: kernel heap corruption detected
vol_objioctl
vol_object_ioctl
voliod_ioctl - frame recycled
volsioctl_real

DESCRIPTION:
Veritas Volume Manager (VxVM) allocates data structures and invokes thin Logical Unit Number (LUN) specific function handlers to determine the disk space that is actively used by the file system. One of the function handlers wrongly accesses system memory beyond the allocated data structure, which results in the kernel memory corruption.

RESOLUTION:
The code is modified so that the problematic function handler accesses only the allocated memory.

* 3099508 (Tracking ID: 3087893)

SYMPTOM:
EMC PowerPath pseudo device mappings change with each reboot with VxVM (Veritas Volume Manager).
DESCRIPTION: VxVM invokes the PowerPath command 'powermt display unmanaged' to discover PowerPath unmanaged devices. This command destroys PowerPath device mappings during the early boot stage, when PowerPath is not fully up.
RESOLUTION: EMC fixed the issue by introducing an environment variable, MPAPI_EARLY_BOOT, for the powermt command. The VxVM startup script sets the variable to TRUE before calling the powermt command, so that powermt recognizes the early boot phase and behaves accordingly. The variable is unset by VxVM after device discovery.

* 3133012 (Tracking ID: 3160973)
SYMPTOM: vxlist(1M) hangs when an Extensible Firmware Interface (EFI) formatted disk is attached to the host. Following is the stack trace: [1] _read(0xa, 0xfe3fdf40, 0x400), [2] read(0xa, 0xfe3fdf40, 0x400), [3] is_asmdisk_efi(0xa, 0xfe3fe868, 0xfe3fe814, 0x200), [4] is_asmdisk(0xfe3fe868, 0xfe3fe814, 0xfe3fec68), [5] is_foreign_disk(0xfe3fe868, 0x0, 0xfe3fec68), [6] vol_is_foreign_disk(0x8126304, 0x0), =>[7] isForeignDisk(da = 0x8125f68), [8] getDaState(da = 0x8125f68), [9] buildMapsForDeportedDgs(cfgvect = 0x81106d8, dacmap = 0x80e8ab8, newdb = 1), [10] buildMapsForDgVec(vcfg = 0x81106c0, vectmap = 0xfe60eff0, dacmap = 0x80e8ab8), [11] initDB(), [12] init_db(), [13] doVmNotify(a = (nil)), [14] _thr_setup(0xfe971200), [15] _lwp_start(),
DESCRIPTION: VxVM reads the partition table and checks for various foreign format signatures within those partitions. The Solaris/x86 SCSI driver cannot access the last sector due to a Solaris off-by-one bug; see Sun-Solaris Bug Id 6342431. If the last sector of the disk is accessed during this partition recognition, the read system call hangs, causing vxlist to hang.
RESOLUTION: If the partition contains the very last sector of the disk, we skip reading that particular partition.

* 3140411 (Tracking ID: 2959325)
SYMPTOM: The vxconfigd(1M) daemon dumps core while performing the disk group move operation, with the following stack trace: dg_trans_start () dg_configure_size () config_enable_copy () da_enable_copy () ncopy_set_disk () ncopy_set_group () ncopy_policy_some () ncopy_set_copies () dg_balance_copies_helper () dg_transfer_copies () in vold_dm_dis_da () in dg_move_complete () in req_dg_move () in request_loop () in main ()
DESCRIPTION: The core dump occurs when the disk group move operation tries to reduce the size of the configuration records in the disk group, when the size is large and the disk group move operation needs more space for the new config-record entries. Since the reduction of the size of configuration records (compaction) and the configuration change by the disk group move operation cannot co-exist, this results in the core dump.
RESOLUTION: The code is modified to perform the compaction first, before the configuration change by the disk group move operation.

* 3150893 (Tracking ID: 3119102)
SYMPTOM: Live migration of a virtual machine having the Storage Foundation stack with data disk fencing enabled causes service groups configured on the virtual machine to fault.
DESCRIPTION: After live migration of a virtual machine having the Storage Foundation stack with data disk fencing enabled is done, I/O fails on shared SAN devices with a reservation conflict and causes service groups to fault. Live migration causes a SCSI initiator change. Hence, I/O coming from the migrated server to the shared SAN storage fails with a reservation conflict.
RESOLUTION: Code changes are added to check whether the host is fenced off from the cluster.
If the host is not fenced off, then the registration key is re-registered for the dmpnode through the migrated server and I/O is restarted. The administrator needs to manually invoke 'vxdmpadm pgrrereg' from the guest which was live migrated, after the live migration.

* 3156719 (Tracking ID: 2857044)
SYMPTOM: The system crashes with the following stack when resizing a volume with DCO version 30. PID: 43437 TASK: ffff88402a70aae0 CPU: 17 COMMAND: "vxconfigd" #0 [ffff884055a47600] machine_kexec at ffffffff8103284b #1 [ffff884055a47660] crash_kexec at ffffffff810ba972 #2 [ffff884055a47730] oops_end at ffffffff81501860 #3 [ffff884055a47760] no_context at ffffffff81043bfb #4 [ffff884055a477b0] __bad_area_nosemaphore at ffffffff81043e85 #5 [ffff884055a47800] bad_area at ffffffff81043fae #6 [ffff884055a47830] __do_page_fault at ffffffff81044760 #7 [ffff884055a47950] do_page_fault at ffffffff8150383e #8 [ffff884055a47980] page_fault at ffffffff81500bf5 [exception RIP: voldco_getalloffset+38] RIP: ffffffffa0bcc436 RSP: ffff884055a47a38 RFLAGS: 00010046 RAX: 0000000000000001 RBX: ffff883032f9eac0 RCX: 000000000000000f RDX: ffff88205613d940 RSI: ffff8830392230c0 RDI: ffff883fd1f55800 RBP: ffff884055a47a38 R8: 0000000000000000 R9: 0000000000000000 R10: 000000000000000e R11: 000000000000000d R12: ffff882020e80cc0 R13: 0000000000000001 R14: ffff883fd1f55800 R15: ffff883fd1f559e8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff884055a47a40] voldco_get_map_extents at ffffffffa0bd09ab [vxio] #10 [ffff884055a47a90] voldco_update_extents_info at ffffffffa0bd8494 [vxio] #11 [ffff884055a47ab0] voldco_instant_resize_30 at ffffffffa0bd8758 [vxio] #12 [ffff884055a47ba0] volfmr_instant_resize at ffffffffa0c03855 [vxio] #13 [ffff884055a47bb0] voldco_process_instant_op at ffffffffa0bcae2f [vxio] #14 [ffff884055a47c30] volfmr_process_instant_op at ffffffffa0c03a74 [vxio] #15 [ffff884055a47c40] vol_mv_precommit at ffffffffa0c1ad02 [vxio] #16 [ffff884055a47c90] vol_commit_iolock_objects at ffffffffa0c1244f [vxio] #17 [ffff884055a47cf0] vol_ktrans_commit at ffffffffa0c131ce [vxio] #18 [ffff884055a47d70] volconfig_ioctl at ffffffffa0c8451f [vxio] #19 [ffff884055a47db0] volsioctl_real at ffffffffa0c8c9b8 [vxio] #20 [ffff884055a47e90] vols_ioctl at ffffffffa0040126 [vxspec] #21 [ffff884055a47eb0] vols_compat_ioctl at ffffffffa004034d [vxspec] #22 [ffff884055a47ee0] compat_sys_ioctl at ffffffff811ce0ed #23 [ffff884055a47f80] sysenter_dispatch at ffffffff8104a880
DESCRIPTION: While updating DCO TOC (Table Of Contents) entries into the in-core TOC, a TOC entry is wrongly freed and zeroed out. As a result, traversing the TOC entries leads to a NULL pointer dereference and thus causes the panic.
RESOLUTION: Code changes have been made to appropriately update the TOC entries.

* 3159096 (Tracking ID: 3146715)
SYMPTOM: The 'rlinks' do not connect with the Network Address Translation (NAT) configurations on Little Endian Architecture (LEA).
DESCRIPTION: On LEAs, the Internet Protocol (IP) address configured with the NAT mechanism is not converted from the host-byte order to the network-byte order. As a result, the address used for the rlink connection mechanism gets distorted and the 'rlinks' fail to connect.
RESOLUTION: The code is modified to convert the IP address to the network-byte order before it is used.

* 3254227 (Tracking ID: 3182350)
SYMPTOM: If there are more than 8192 paths in the system, the vxassist command hangs while creating a new VxVM volume or increasing the existing volume's size.
DESCRIPTION: The vxassist command creates a hash table with a maximum of 8192 entries.
Hence, paths beyond 8192 get hashed to overlapping buckets in this hash table. In such a case, multiple paths which hash to the same bucket are linked in a chain. In order to find a particular path in a specified bucket, the vxassist command needs to traverse the entire linked chain. However, vxassist only searches the first element, and hangs.
RESOLUTION: The code is modified to traverse the entire linked chain.

* 3254229 (Tracking ID: 3063378)
SYMPTOM: Some VxVM (Volume Manager) commands run slowly when "read only" devices (e.g. EMC SRDF-WD, BCV-NR) are presented and managed by EMC PowerPath.
DESCRIPTION: When an I/O write is performed on a "read only" device, the I/O fails and is retried if the I/O is on a TPD (Third Party Driver) device and the path status is okay. Owing to the retry, the I/O does not return until the timeout is reached, which gives the perception that VxVM commands run slowly.
RESOLUTION: Code changes have been done to return the I/O immediately with a disk media failure if the I/O fails on any TPD device and the path status is okay.

* 3254427 (Tracking ID: 3182175)
SYMPTOM: The "vxdisk -o thin, fssize list" command can report incorrect File System usage data.
DESCRIPTION: An integer overflow in the internal calculation can cause this command to report incorrect per-disk FS usage.
RESOLUTION: Code changes are made so that the command reports the correct File System usage data.

* 3280555 (Tracking ID: 2959733)
SYMPTOM: When device paths are moved across LUNs or enclosures, the vxconfigd daemon can dump core, or data corruption can occur due to internal data structure inconsistencies.
DESCRIPTION: When the device path configuration is changed after a planned or unplanned disconnection by moving only a subset of the device paths across LUNs or other storage arrays (enclosures), DMP's internal data structures become inconsistent, leading to the vxconfigd daemon dumping core and, in some situations, data corruption due to incorrect LUN-to-path mappings.
RESOLUTION: To resolve this issue, the vxconfigd code was modified to detect such situations gracefully and modify the internal data structures accordingly, to avoid a vxconfigd core dump and data corruption.

* 3294641 (Tracking ID: 3107741)
SYMPTOM: The "vxrvg snapdestroy" command fails with the error message "Transaction aborted waiting for io drain", and a vxconfigd hang is observed. The vxconfigd stack trace is: vol_commit_iowait_objects vol_commit_iolock_objects vol_ktrans_commit volconfig_ioctl volsioctl_real vols_ioctl vols_compat_ioctl compat_sys_ioctl ...
DESCRIPTION: The smartmove query of VxFS depends on some reads and writes. If some transaction in VxVM blocks the new read and write, the API hangs waiting for the response. This creates a deadlock-like situation: the Smartmove API waits for the transaction to complete, while the transaction waits for the Smartmove API, and hence the hang.
RESOLUTION: Disallow transactions during the Smartmove API.

* 3294642 (Tracking ID: 3019684)
SYMPTOM: An I/O hang is observed when the SRL is about to overflow after the logowner is switched from slave to master. The stack trace looks like: biowait default_physio volrdwr fop_write write syscall_trap32
DESCRIPTION: The sequence is as follows: with the slave as logowner, the SRL is overflowed and a DCM resync follows. Then, after switching the logowner back to the master, overflowing the SRL again manifests the I/O hang on the master when the SRL is about to overflow. This happens because the master has a stale flag set with an incorrect value related to the last SRL overflow.
RESOLUTION: Reset the stale flag and ensure that the flag is reset whether the logowner is the master or a slave.

Patch ID: 6.0.300.000

* 2853712 (Tracking ID: 2815517)
SYMPTOM: vxdg adddisk succeeds in adding a clone disk to a non-clone disk group and a non-clone disk to a clone disk group, resulting in a mixed disk group.
DESCRIPTION: vxdg import fails for a disk group which has a mix of clone and non-clone disks. So vxdg adddisk should not allow creation of a mixed disk group.
RESOLUTION: The vxdg adddisk code is modified to return an error for an attempt to add a clone disk to a non-clone disk group or non-clone disks to a clone disk group. Thus it prevents addition of disks that would lead to a mixed disk group.

* 2860207 (Tracking ID: 2859470)
SYMPTOM: The EMC SRDF-R2 disk may go into an error state when you create an EFI label on the R1 disk. For example:
R1 site
# vxdisk -eo alldgs list | grep -i srdf
emc0_008c auto:cdsdisk emc0_008c SRDFdg online c1t5006048C5368E580d266 srdf-r1
R2 site
# vxdisk -eo alldgs list | grep -i srdf
emc1_0072 auto - - error c1t5006048C536979A0d65 srdf-r2
DESCRIPTION: Since R2 disks are in write-protected mode, the default open() call (made for read-write mode) fails for the R2 disks, and the disk is marked as invalid.
RESOLUTION: As a fix, DMP was changed to be able to read the EFI label even on a write-protected SRDF-R2 disk.

* 2863672 (Tracking ID: 2834046)
SYMPTOM: VxVM dynamically reminors all the volumes during DG import if the DG base minor numbers are not in the correct pool. This behaviour causes the NFS client to have to re-mount all NFS file systems in an environment where CVM is used on the NFS server side.
DESCRIPTION: Starting from 5.1, the minor number space is divided into two pools, one for private disk groups and another for shared disk groups. During DG import, the DG base minor numbers are adjusted automatically if not in the correct pool, and so are the volumes in the disk groups. This behaviour reduces many minor-number conflict cases during DG import. But in an NFS environment, it makes all file handles on the client side stale. Customers had to unmount file systems and restart applications.
RESOLUTION: A new tunable, "autoreminor", is introduced. The default value is "on". Most customers do not care about auto-reminoring and can leave it as it is. For an environment where autoreminoring is not desirable, customers can simply turn it off. Another major change is that during DG import, VxVM does not change minor numbers as long as there are no minor conflicts. This includes the cases where minor numbers are in the wrong pool.

* 2863708 (Tracking ID: 2836528)
SYMPTOM: vxdisk resize fails with the error "New geometry makes partition unaligned".
bash# vxdisk -g testdg resize disk01 length=8g
VxVM vxdisk ERROR V-5-1-8643 Device disk01: resize failed: New geometry makes partition unaligned
DESCRIPTION: On Solaris X86 systems, partition 8 does not need to be aligned with the cylinder size. However, VxVM requires this partition to be cylinder aligned. Hence the issue.
RESOLUTION: The issue is fixed by making the necessary changes to skip the alignment check for partition 8 on the Solaris X86 platform.

* 2876865 (Tracking ID: 2510928)
SYMPTOM: The extended attributes reported by "vxdisk -e list" for the EMC SRDF LUNs are reported as "tdev mirror", instead of "tdev srdf-r1". Example:
# vxdisk -e list
DEVICE TYPE DISK GROUP STATUS OS_NATIVE_NAME ATTR
emc0_028b auto:cdsdisk - - online thin c3t5006048AD5F0E40Ed190s2 tdev mirror
DESCRIPTION: The extraction of the attributes of EMC SRDF LUNs was not done properly.
Hence, EMC SRDF LUNs are erroneously reported as "tdev mirror", instead of "tdev srdf-r1".
RESOLUTION: Code changes have been made to extract the correct values.

* 2892499 (Tracking ID: 2149922)
SYMPTOM: Record the diskgroup import and deport events in the /var/adm/messages file. The following type of message can be logged in syslog: vxvm:vxconfigd: V-5-1-16254 Disk group import of succeeded.
DESCRIPTION: With a diskgroup import or deport, an appropriate success message, or a failure message with the cause of the failure, should be logged.
RESOLUTION: Code changes are made to log diskgroup import and deport events in syslog.

* 2892571 (Tracking ID: 1856733)
SYMPTOM: Add support for FusionIO on Solaris x64.
DESCRIPTION: FusionIO was not previously supported on the Solaris x64 platform.
RESOLUTION: Support for FusionIO is added for Solaris x64.

* 2892590 (Tracking ID: 2779580)
SYMPTOM: The secondary node gives the configuration error 'no Primary RVG' when the primary master node (default logowner) is rebooted and a slave becomes the new master.
DESCRIPTION: After a reboot of the primary master, the new master sends a handshake request for vradmind communication to the secondary. As a part of the handshake request, the secondary deletes the old configuration, including the primary RVG. During this phase, the secondary receives a configuration update message from the primary for the old configuration. The secondary does not find the old primary RVG configuration for processing this message. Hence, it cannot proceed with the pending handshake request and gives the 'no Primary RVG' configuration error.
RESOLUTION: Code changes are done such that during the handshake request phase, configuration messages of the old primary RVG are discarded.

* 2892621 (Tracking ID: 1903700)
SYMPTOM: vxassist remove mirror does not work if nmirror and alloc are specified, giving the error "Cannot remove enough mirrors".
DESCRIPTION: During the remove mirror operation, VxVM does not perform correct analysis of plexes. Hence the issue.
RESOLUTION: Necessary code changes have been done so that vxassist works properly.

* 2892630 (Tracking ID: 2742706)
SYMPTOM: A system panic can happen with the following stack when the Oracle 10G Grid Agent Software invokes the command:
# nmhs get_solaris_disks
unix:lock_try+0x0() genunix:turnstile_interlock+0x1c() genunix:turnstile_block+0x1b8() unix:mutex_vector_enter+0x428() unix:mutex_enter() - frame recycled vxlo:vxlo_open+0x2c() genunix:dev_open() - frame recycled specfs:spec_open+0x4f4() genunix:fop_open+0x78() genunix:vn_openat+0x500() genunix:copen+0x260() unix:syscall_trap32+0xcc()
DESCRIPTION: The open system call code path of the vxlo (Veritas Loopback Driver) does not release the acquired global lock after the work is completed. The panic may occur when the next open system call tries to acquire the lock.
RESOLUTION: Code changes have been made to release the global lock appropriately.

* 2892643 (Tracking ID: 2801962)
SYMPTOM: Operations that lead to growing a volume, including 'vxresize' and 'vxassist growby/growto', take significantly longer if the volume has a version 20 DCO (Data Change Object) attached to it, in comparison to a volume which does not have a DCO attached.
DESCRIPTION: When a volume with a DCO is grown, it needs to copy the existing map in the DCO and update the map to track the grown regions. The algorithm was such that for each region in the map it would search for the page that contains that region so as to update the map. The number of regions and the number of pages containing them are proportional to the volume size.
So, the search complexity is amplified and observed primarily when the volume size is of the order of terabytes. In the reported instance, it took more than 12 minutes to grow a 2.7TB volume by 50G.
RESOLUTION: The code has been enhanced to find the regions that are contained within a page and then avoid looking up the page for all those regions.

* 2892650 (Tracking ID: 2826125)
SYMPTOM: VxVM script daemons are not up after they are invoked with the vxvm-recover script.
DESCRIPTION: When the VxVM script daemon starts, it terminates any stale instance that exists. When the script daemon is invoked with exactly the same process ID as the previous invocation, the daemon abnormally terminates itself through a false-positive detection.
RESOLUTION: Code changes are made to handle the same process ID situation correctly.

* 2892660 (Tracking ID: 2000585)
SYMPTOM: If 'vxrecover -sn' is run and at the same time one volume is removed, vxrecover exits with the error 'Cannot refetch volume'; the exit status code is zero, but no volumes are started.
DESCRIPTION: vxrecover assumes that the volume is missing because the diskgroup must have been deported while vxrecover was in progress. Hence, it exits without starting the remaining volumes. vxrecover should be able to start the other volumes, if the DG is not deported.
RESOLUTION: Modified the source to skip the missing volume and proceed with the remaining volumes.

* 2892665 (Tracking ID: 2807158)
SYMPTOM: During VM upgrade or patch installation on the Solaris platform, the system can sometimes hang due to a deadlock with the following stack: genunix:cv_wait genunix:ndi_devi_enter genunix:devi_config_one genunix:ndi_devi_config_one genunix:resolve_pathname genunix:e_ddi_hold_devi_by_path vxspec:_init genunix:modinstall genunix:mod_hold_installed_mod genunix:modrload genunix:modload genunix:mod_hold_dev_by_major genunix:ndi_hold_driver genunix:probe_node genunix:i_ndi_config_node genunix:i_ddi_attachchild
DESCRIPTION: During the upgrade or patch installation, the vxspec module is unloaded and reloaded. In the vxspec module initialization, it tries to lock the root node during the pathname walk while already holding the subnode, i.e., /pseudo. Meanwhile, if another process holding the lock of the root node is acquiring the lock of the subnode /pseudo, a deadlock occurs, since each process tries to get the lock already held by its peer.
RESOLUTION: The APIs which introduce the deadlock are replaced.

* 2892682 (Tracking ID: 2837717)
SYMPTOM: The "vxdisk(1M) resize" command fails if 'da name' is specified.
DESCRIPTION: The scenario for 'da name' is not handled in the resize code path.
RESOLUTION: The code is modified such that if 'dm name' is not specified to resize, then the 'da name' specific operation is performed.

* 2892684 (Tracking ID: 1859018)
SYMPTOM: "Link link detached from volume " warnings are displayed when a linked-breakoff snapshot is created.
DESCRIPTION: The purpose of these messages is to let users and administrators know about the detach of a link due to I/O errors. These messages get displayed unnecessarily whenever a linked-breakoff snapshot is created.
RESOLUTION: Code changes are made to display the messages only when a link is detached due to I/O errors on volumes involved in the link relationship.

* 2892689 (Tracking ID: 2836798)
SYMPTOM: 'vxdisk resize' fails with the following error on a simple format EFI (Extensible Firmware Interface) disk expanded from the array side, and the system may panic/hang after a few minutes.
# vxdisk resize disk_10
VxVM vxdisk ERROR V-5-1-8643 Device disk_10: resize failed: Configuration daemon error -1
DESCRIPTION: As VxVM does not support Dynamic LUN Expansion on simple/sliced EFI disks, the last usable LBA (Logical Block Address) in the EFI header is not updated while expanding the LUN. Since the header is not updated, the partition end entry was regarded as illegal and cleared as part of the partition range check. This inconsistent partition information between the kernel and the disk causes a system panic/hang.
RESOLUTION: Added checks in the VxVM code to prevent DLE on simple/sliced EFI disks.

* 2892698 (Tracking ID: 2851085)
SYMPTOM: DMP does not detect implicit LUN ownership changes.
DESCRIPTION: DMP does ownership monitoring for ALUA arrays to detect implicit LUN ownership changes. This helps DMP to always use the Active/Optimized path for sending down I/O. This feature is controlled using the dmp_monitor_ownership tunable and is enabled by default. In case of a partial discovery triggered through the event source daemon (vxesd), the ALUA information kept in the kernel data structure for ownership monitoring was getting wiped. This causes ownership monitoring to not work for these dmpnodes.
RESOLUTION: The source has been updated to handle such cases.

* 2892702 (Tracking ID: 2567618)
SYMPTOM: VRTSexplorer core dumps in checkhbaapi/print_target_map_entry, which looks like: print_target_map_entry() check_hbaapi() main() _start()
DESCRIPTION: The checkhbaapi utility uses the HBA_GetFcpTargetMapping() API which returns the current set of mappings between operating system and fibre channel protocol (FCP) devices for a given HBA port. The maximum limit for mappings was set to 512 and only that much memory was allocated. When the number of mappings returned was greater than 512, the function that prints this information used to try to access the entries beyond that limit, which resulted in core dumps.
RESOLUTION: The code has been changed to allocate enough memory for all the mappings returned by HBA_GetFcpTargetMapping().

* 2892716 (Tracking ID: 2753954)
SYMPTOM: When a cable is disconnected from one port of a dual-port FC HBA, only paths going through that port should be marked as SUSPECT. But paths going through the other port are also getting marked as SUSPECT.
DESCRIPTION: Disconnection of a cable from an HBA port generates an FC event. When the event is generated, paths of all ports of the corresponding HBA are marked as SUSPECT.
RESOLUTION: The code changes are done to mark only the paths going through the port on which the FC event is generated.

* 2922770 (Tracking ID: 2866997)
SYMPTOM: After applying Solaris patch 147440-20, disk initialization using the vxdisksetup command fails with the following error: VxVM vxdisksetup ERROR V-5-2-43 : Invalid disk device for vxdisksetup
DESCRIPTION: An uninitialized variable gets a different value after the OS patch installation, thereby making the vxparms command output give an incorrect result.
RESOLUTION: Initialize the variable with the correct value.

* 2922798 (Tracking ID: 2878876)
SYMPTOM: vxconfigd, the VxVM configuration daemon, dumps core with the following stack: vol_cbr_dolog () vol_cbr_translog () vold_preprocess_request () request_loop () main ()
DESCRIPTION: This core is a result of a race between two threads which are processing requests from the same client. While one thread has completed processing a request and is in the phase of releasing the memory used, the other thread is processing a "DISCONNECT" request from the same client.
Due to the race condition, the second thread attempted to access the memory which was being released, and dumped core.
RESOLUTION: The issue is resolved by protecting the common data of the client with a mutex.

* 2924117 (Tracking ID: 2911040)
SYMPTOM: A restore operation from a cascaded snapshot succeeds even when one of its sources is inaccessible. Subsequently, if the primary volume is made accessible for operation, I/O operations may fail on the volume as the source of the volume is inaccessible. Deletion of snapshots would also fail due to the dependency of the primary volume on the snapshots. In such a case, the following error is thrown when trying to remove any snapshot using the 'vxedit rm' command: "VxVM vxedit ERROR V-5-1-XXXX Volume YYYYYY has dependent volumes"
DESCRIPTION: When a volume is restored from a snapshot, the snapshot becomes the source of data for regions on the primary volume that differ between the two volumes. If the snapshot itself depends on some other volume and that volume is not accessible, effectively the primary volume becomes inaccessible after the restore operation. In such a case, the snapshots cannot be deleted as the primary volume depends on them.
RESOLUTION: If a snapshot or any later cascaded snapshot is inaccessible, restore from that snapshot is prevented.

* 2924188 (Tracking ID: 2858853)
SYMPTOM: In a CVM (Cluster Volume Manager) environment, after a master switch, vxconfigd dumps core on the slave node (old master) when a disk is removed from the disk group. dbf_fmt_tbl() voldbf_fmt_tbl() voldbsup_format_record() voldb_format_record() format_write() ddb_update() dg_set_copy_state() dg_offline_copy() dasup_dg_unjoin() dapriv_apply() auto_apply() da_client_commit() client_apply() commit() dg_trans_commit() slave_trans_commit() slave_response() fillnextreq() vold_getrequest() request_loop() main()
DESCRIPTION: During a master switch, the disk group configuration copy related flags are not cleared on the old master; hence, when a disk is removed from a disk group, vxconfigd dumps core.
RESOLUTION: Necessary code changes have been made to clear the configuration copy related flags during the master switch.

* 2924207 (Tracking ID: 2886402)
SYMPTOM: When re-configuring DMP devices, typically using the command 'vxdisk scandisks', a vxconfigd hang is observed. Since it is in the hang state, no VxVM (Veritas Volume Manager) commands are able to respond. The following process stack of vxconfigd was observed: dmp_unregister_disk dmp_decode_destroy_dmpnode dmp_decipher_instructions dmp_process_instruction_buffer dmp_reconfigure_db gendmpioctl dmpioctl dmp_ioctl dmp_compat_ioctl compat_blkdev_ioctl compat_sys_ioctl cstar_dispatch
DESCRIPTION: When a DMP (dynamic multipathing) node is about to be destroyed, a flag is set to hold any I/O (read/write) on it. The I/Os which may come in between the process of setting the flag and the actual destruction of the DMP node are placed in the DMP queue and are never served. So the hang is observed.
RESOLUTION: An appropriate flag is set for the node which is to be destroyed so that any I/O issued after the flag is marked is rejected, so as to avoid the hang condition.

* 2930399 (Tracking ID: 2930396)
SYMPTOM: The vxdmpasm/vxdmpraw command does not work on Solaris. For example:
#vxdmpasm enable user1 group1 600 emc0_02c8
expr: syntax error
/etc/vx/bin/vxdmpasm: test: argument expected
#vxdmpraw enable user1 group1 600 emc0_02c8
expr: syntax error
/etc/vx/bin/vxdmpraw: test: argument expected
DESCRIPTION: The "length" function of the expr command does not work on Solaris.
This function was used in the script and gave the error.
RESOLUTION: The expr command has been replaced by the awk command.

* 2933467 (Tracking ID: 2907823)
SYMPTOM: Unconfiguring devices in the 'failing' or 'unusable' state (as shown by the cfgadm utility) cannot be done using the VxVM Dynamic Reconfiguration (DR) tool.
DESCRIPTION: If devices are not removed properly, they can be in the 'failing' or 'unusable' state, as shown below:
c1::5006048c5368e580, 255 disk connected configured failing
c1::5006048c5368e580, 326 disk connected configured unusable
Such devices are ignored by the DR Tool, and they need to be manually unconfigured using the cfgadm utility.
RESOLUTION: To fix this, code changes are done so that the DR Tool asks the user whether they want to unconfigure 'failed' or 'unusable' devices, and takes action accordingly.

* 2933468 (Tracking ID: 2916094)
SYMPTOM: These are the issues for which enhancements are done: 1. All the DR operation logs are accumulated in one log file, 'dmpdr.log', and this file grows very large. 2. If a command takes a long time, the user may think the DR operations have become stuck. 3. Devices controlled by TPD are seen in the list of LUNs that can be removed in the 'Remove Luns' operation.
DESCRIPTION: 1. All the logs of DR operations accumulate and form one big log file, which makes it difficult for the user to get to the current DR operation logs. 2. If a command takes time, the user has no way to know whether the command has become stuck. 3. Devices controlled by TPD are visible to the user, which may lead the user to think those devices can be removed without first removing them from TPD control.
RESOLUTION: 1. Now, every time the user opens the DR Tool, a new log file of the form dmpdr_yyyymmdd_HHMM.log is generated. 2. A message is displayed to inform the user if a command takes longer than expected. 3. Changes are made so that devices controlled by TPD are not visible during DR operations.

* 2933469 (Tracking ID: 2919627)
SYMPTOM: While doing the 'Remove Luns' operation of the Dynamic Reconfiguration Tool, there is no feasible way to remove a large number of LUNs, since the only way to do so is to enter all LUN names separated by commas.
DESCRIPTION: When removing LUNs in bulk during the 'Remove Luns' option of the Dynamic Reconfiguration Tool, it is not feasible to enter all the LUNs separated by commas.
RESOLUTION: Code changes are done in the Dynamic Reconfiguration scripts to accept a file containing the LUNs to be removed as input.

* 2934259 (Tracking ID: 2930569)
SYMPTOM: The LUNs in the 'error' state in the output of 'vxdisk list' cannot be removed through the DR (Dynamic Reconfiguration) Tool.
DESCRIPTION: The LUNs seen in the 'error' state in the VM (Volume Manager) tree are not listed by the DR (Dynamic Reconfiguration) Tool while doing the 'Remove LUNs' operation.
RESOLUTION: Necessary changes have been made to display LUNs in the error state while doing the 'Remove LUNs' operation in the DR (Dynamic Reconfiguration) Tool.

* 2940447 (Tracking ID: 2940446)
SYMPTOM: I/O can hang on a volume with a space-optimized snapshot if the underlying cache object is of very large size. It can also lead to data corruption in the cache object.
DESCRIPTION: The cache volume maintains a B+ tree for mapping the offset and its actual location in the cache object. Copy-on-write I/O generated on snapshot volumes needs to determine the offset of a particular I/O in the cache object. Due to incorrect type-casting, the value calculated for a large offset truncates to a smaller value due to overflow, leading to data corruption.
RESOLUTION: Code changes are done to avoid overflow during the offset calculation in the cache object.
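For reference to the 'error' state mentioned in incident 2934259 above, a device in that state typically appears in the 'vxdisk list' output similar to the following; the device name shown here is hypothetical:
# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
emc0_00a1    auto            -            -            error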
* 2941167 (Tracking ID: 2915751)
SYMPTOM: A Solaris machine panics while resizing a CDS-EFI LUN, or in the CDS VTOC to EFI conversion case, where the new size of the resize is greater than 1TB.
DESCRIPTION: While resizing a disk having the CDS-EFI format, or while resizing a CDS disk from less than 1TB to >= 1TB, the machine panics because of the incorrect use of device numbers. VxVM uses the whole slice number s0 instead of s7, which represents the whole device for the EFI format. Hence, the device open fails and an incorrect disk maxiosize is populated. While doing an I/O, the machine panics with a divide-by-zero error.
RESOLUTION: While resizing a disk having the CDS-EFI format, or while resizing a CDS disk from less than 1TB to >= 1TB, VxVM now correctly uses the device number corresponding to partition 7 of the device.

* 2941193 (Tracking ID: 1982965)
SYMPTOM: "vxdg import DGNAME" fails when the "da-name" used as an input to the vxdg command is based on a naming scheme which is different from the prevailing naming scheme on the host. The error messages seen are: VxVM vxdg ERROR V-5-1-530 Device c6t50060E801002BC73d240 not found in configuration VxVM vxdg ERROR V-5-1-10978 Disk group x86dg: import failed: Not a disk access record
DESCRIPTION: vxconfigd stores Disk Access (DA) records based on DMP names. If "vxdg" passes a name other than the DMP name for the device, vxconfigd cannot map it to a DA record. As vxconfigd cannot locate a DA record corresponding to the input name passed from vxdg, it fails the import operation.
RESOLUTION: The vxdg command now converts the input name to the DMP name before passing it to vxconfigd for further processing.

* 2941226 (Tracking ID: 2915063)
SYMPTOM: System panic with the following stack during detaching a plex of a volume in a CVM environment. vol_klog_findent() vol_klog_detach() vol_mvcvm_cdetsio_callback() vol_klog_start() voliod_iohandle() voliod_loop()
DESCRIPTION: During a plex-detach operation, VxVM searches for the plex object to be detached in the kernel. If some transaction is in progress on any diskgroup in the system, an incorrect plex object sometimes gets selected, which results in dereferencing an invalid address and panics the system.
RESOLUTION: Code changes are done to make sure that the correct plex object is selected.

* 2941234 (Tracking ID: 2899173)
SYMPTOM: In a CVR environment, an SRL failure may result in a vxconfigd hang, eventually resulting in a 'vradmin stoprep' command hang.
DESCRIPTION: The 'vradmin stoprep' command hangs because vxconfigd is waiting indefinitely in a transaction. The transaction was waiting for I/O completion on the SRL. An error handler is generated to handle the I/O failure on the SRL, but if we are in a transaction, this error was not getting handled properly, resulting in the transaction hang.
RESOLUTION: A fix is provided such that when an SRL failure is encountered, the transaction itself handles the I/O error on the SRL.

* 2941237 (Tracking ID: 2919318)
SYMPTOM: In a CVM environment with fencing enabled, wrong fencing keys are registered for opaque disks during node join or dg import operations.
DESCRIPTION: During the CVM node join and shared dg import code path, when opaque disk registration happens, the fencing keys in the internal dg records are not in sync with the actual keys generated. This was causing wrong fencing keys to be registered for opaque disks. For the rest of the disks, fencing key registration happens correctly.
RESOLUTION: The fix is to copy the correctly generated key to the internal dg record for the current dg import/node join scenario and use it for disk registration.
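As background for incident 2941193 above, the device naming scheme prevailing on a host can be checked before device names are passed to vxdg. The command below is shown only as a hedged illustration; consult the vxddladm(1M) man page for the exact options on your release:
# vxddladm get namingscheme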
* 2941252 (Tracking ID: 1973983)
SYMPTOM: Relocation fails with the following error when a DCO (Data Change Object) plex is in the disabled state: VxVM vxrelocd ERROR V-5-2-600 Failure recovering in disk group
DESCRIPTION: When a mirror-plex is added to a volume using "vxassist snapstart", the attached DCO plex can be in the DISABLED/DCOSNP state. While recovering such DCO plexes, if the enclosure is disabled, the plex can get into the DETACHED/DCOSNP state and relocation fails.
RESOLUTION: Code changes are made to handle DCO plexes in the disabled state during relocation.

* 2942166 (Tracking ID: 2942609)
SYMPTOM: The following message is seen as an error message when quitting from the Dynamic Reconfiguration Tool: "FATAL: Exiting the removal operation."
DESCRIPTION: When the user quits an operation, the Dynamic Reconfiguration Tool displays the quit message as an error.
RESOLUTION: Changes are made to display the message as informational.

* 2944708 (Tracking ID: 1725593)
SYMPTOM: The 'vxdmpadm listctlr' command does not show the count of device paths seen through each controller.
DESCRIPTION: The 'vxdmpadm listctlr' command currently does not show the number of device paths seen through each controller. The CLI option has been enhanced to provide this information as an additional column at the end of each line in the CLI's output.
RESOLUTION: The number of paths under each controller is counted and the value is displayed as the last column in the 'vxdmpadm listctlr' CLI output.

* 2944710 (Tracking ID: 2744004)
SYMPTOM: When VVR is configured, vxconfigd on the secondary gets hung. Any vx commands issued during this time do not complete.
DESCRIPTION: vxconfigd is waiting for I/Os to drain before allowing a configuration change command to proceed. The I/Os never drain completely, resulting in the hang. This is because there is a deadlock where pending I/Os are unable to start and vxconfigd keeps waiting for their completion.
RESOLUTION: Changed the code so that this deadlock does not arise. The I/Os can be started properly and complete, allowing vxconfigd to function properly.

* 2944714 (Tracking ID: 2833498)
SYMPTOM: The vxconfigd daemon hangs in vol_ktrans_commit() while a reclaim operation is in progress on volumes having instant snapshots. The stack trace is given below: vol_ktrans_commit volconfig_ioctl
DESCRIPTION: Storage reclaim leads to the generation of special I/Os (termed reclaim I/Os), which can be very large in size (>4G) and, unlike application I/Os, are not broken into smaller sized I/Os. Reclaim I/Os need to be tracked in snapshot maps if the volume has full snapshots configured. The mechanism to track reclaim I/Os is not capable of handling such large I/Os, causing the hang.
RESOLUTION: Code changes are made to use the alternative mechanism in Volume Manager to track the reclaim I/Os.

* 2944717 (Tracking ID: 2851403)
SYMPTOM: The system panics while unloading the 'vxio' module when the VxVM SmartMove feature is used and the "vxportal" module gets reloaded (for example, during a VxFS package upgrade). The stack trace looks like: vxportalclose() vxfs_close_portal() vol_sr_unload() vol_unload()
DESCRIPTION: During a smart-move operation like plex attach, VxVM opens the 'vxportal' module to read in-use file system map information. This file descriptor gets closed only when the 'vxio' module is unloaded. If the 'vxportal' module is unloaded and reloaded before 'vxio', the file descriptor with 'vxio' becomes invalid and results in a panic.
RESOLUTION: Code changes are made to close the file descriptor for 'vxportal' after reading the free/invalid file system map information.
This ensures that stale file descriptors do not get used for 'vxportal'.

* 2944722 (Tracking ID: 2869594)
SYMPTOM: The master node would panic with the following stack after a space-optimized snapshot is refreshed or deleted and the master node is selected using 'vxclustadm setmaster': volilock_rm_from_ils vol_cvol_unilock vol_cvol_bplus_walk vol_cvol_rw_start voliod_iohandle voliod_loop thread_start In addition to this, all space-optimized snapshots on the corresponding cache object may be corrupted.
DESCRIPTION: In CVM, the master node owns the responsibility of maintaining the cache object indexing structure for providing the space-optimized functionality. When a space-optimized snapshot is refreshed or deleted, the indexing structure gets rebuilt in the background after the operation returns. When the master node is switched using 'vxclustadm setmaster' before the index rebuild is complete, both the old master and the new master nodes rebuild the index in parallel, which results in index corruption. Since the index is corrupted, the data stored on space-optimized snapshots should not be trusted. I/Os issued on the corrupted index would lead to a panic.
RESOLUTION: When the master role is switched using 'vxclustadm setmaster', the index rebuild on the old master node is safely aborted. Only the new master node is allowed to rebuild the index.

* 2944724 (Tracking ID: 2892983)
SYMPTOM: The vxvol command dumps core with the following stack trace if executed in parallel with the vxsnap addmir command: strcmp() do_link_recovery trans_resync_phase1() vxvmutil_trans() trans() common_start_resync() do_noderecover() main()
DESCRIPTION: During creation of a link between two volumes, if vxrecover is triggered, the vxvol command may not have information about the newly created links. This leads to a NULL pointer dereference and dumps core.
RESOLUTION: The code has been modified to check whether the link information is properly present with the vxvol command and to fail the operation with an appropriate error message if it is not.

* 2944725 (Tracking ID: 2910043)
SYMPTOM: Frequent swapin/swapout is seen due to higher order memory requests.
DESCRIPTION: VxVM operations such as plex attach and snapshot resync/reattach issue ATOMIC_COPY ioctls. The default I/O size for these operations is 1MB and VxVM allocates this memory from the operating system. Memory allocations of such large size can result in swapin/swapout of pages and are not very efficient. In the presence of a lot of such operations, the system may not work very efficiently.
RESOLUTION: VxVM has its own I/O memory management module, which allocates pages from the operating system and manages them efficiently. The ATOMIC_COPY code is modified to make use of VxVM's internal I/O memory pool instead of directly allocating memory from the operating system.

* 2944727 (Tracking ID: 2919720)
SYMPTOM: vxconfigd dumps core in the rec_lock1_5() function. rec_lock1_5() rec_lock1() rec_lock() client_trans_start() req_vol_trans() request_loop() main()
DESCRIPTION: During any configuration change in VxVM, vxconfigd locks all the objects involved in the operation to avoid any unexpected modification. Some objects which do not belong to the context of the current transactions are not handled properly, which results in a core dump. This case is particularly seen during snapshot operations of cross-dg linked volume snapshots.
RESOLUTION: Code changes are done to avoid locking of records which are not yet part of the committed VxVM configuration.
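For reference, the master switch referred to in incident 2944722 above (and in other incidents in this document) is performed with a command of the following form; the node name used here is hypothetical:
# vxclustadm setmaster node2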
* 2944729 (Tracking ID: 2933138)
SYMPTOM: The system panics with the stack trace given below: voldco_update_itemq_chunk() voldco_chunk_updatesio_start() voliod_iohandle() voliod_loop()
DESCRIPTION: While tracking I/Os in snapshot maps, information is stored in in-memory pages. For large sized I/Os (such as reclaim I/Os), this information can span multiple pages. Sometimes the pages are not properly referenced in the map update for I/Os of larger size, which leads to a panic because of invalid page addresses.
RESOLUTION: The code is modified to properly reference the pages during the map update for large sized I/Os.

* 2944741 (Tracking ID: 2866059)
SYMPTOM: When a disk resize fails, the following messages can appear on the screen: 1. "VxVM vxdisk ERROR V-5-1-8643 Device : resize failed: One or more subdisks do not fit in pub reg" or 2. "VxVM vxdisk ERROR V-5-1-8643 Device : resize failed: Cannot remove last disk in disk group"
DESCRIPTION: In the first message, extra information should be provided, such as which subdisk is under consideration and what the subdisk and public region lengths are. After vxdisk resize fails with the second message, the resize operation succeeds if the -f (force) option is used. This message can be improved by suggesting that the user use the -f (force) option for resizing.
RESOLUTION: Code changes are made to improve the error messages.

* 2962257 (Tracking ID: 2898547)
SYMPTOM: vradmind dumps core on the VVR (Veritas Volume Replicator) Secondary site in a CVR (Clustered Volume Replicator) environment. The stack trace would look like: __kernel_vsyscall raise abort fmemopen malloc_consolidate delete delete[] IpmHandle::~IpmHandle IpmHandle::events main
DESCRIPTION: When the Logowner Service Group is moved across nodes on the Primary Site, it induces deletion of the IpmHandle of the old Logowner Node, as the IpmHandle of the new Logowner Node gets created. During destruction of the IpmHandle object, a pointer '_cur_rbufp' is not set to NULL, which can lead to freeing memory which is already freed and thus cause 'vradmind' to dump core.
RESOLUTION: The destructor of IpmHandle is modified to set the pointer to NULL after it is deleted.

* 2964567 (Tracking ID: 2964547)
SYMPTOM: Whenever the system reboots, the following messages are logged on the system console: Oct 10 19:10:01 sol11_server unix: [ID 779321 kern.notice] vxdmp: unable to resolve dependency, Oct 10 19:10:01 sol11_server unix: [ID 969242 kern.notice] cannot load module 'misc/ted'
DESCRIPTION: The module 'misc/ted' is part of the debug package. It was wrongly getting linked with the vxdmp driver for non-debug builds. These are harmless messages.
RESOLUTION: The source makefile was modified to remove this dependency for non-debug packages.

* 2974870 (Tracking ID: 2935771)
SYMPTOM: Rlinks disconnect after switching the master.
DESCRIPTION: Sometimes switching the master on the primary can cause the Rlinks to disconnect. vradmin repstatus would show "paused due to network disconnection" as the replication status. VVR uses a connection to check if the secondary is alive. The secondary responds to these requests by replying back, indicating that it is alive. On a master switch, the old master fails to close this connection with the secondary. Thus, after the master switch, the old master as well as the new master send requests to the secondary. This causes a mismatch of connection numbers on the secondary, and the secondary does not reply to the requests of the new master. This causes the Rlinks to disconnect.
RESOLUTION: The solution is to close the connection of the old master with the secondary, so that it does not keep sending connection requests to the secondary.

* 2976946 (Tracking ID: 2919714)
SYMPTOM: On a THIN LUN, vxevac returns 0 without migrating unmounted VxFS volumes. The following error messages are displayed when an unmounted VxFS volume is processed: VxVM vxsd ERROR V-5-1-14671 Volume v2 is configured on THIN luns and not mounted. Use 'force' option, to bypass smartmove. To take advantage of smartmove for supporting thin luns, retry this operation after mounting the volume. VxVM vxsd ERROR V-5-1-407 Attempting to cleanup after failure ...
DESCRIPTION: On a THIN LUN, VM will not move or copy data on unmounted VxFS volumes unless smartmove is bypassed. The vxevac command needs to be enhanced to detect unmounted VxFS volumes on THIN LUNs and to support a force option that allows the user to bypass smartmove.
RESOLUTION: The vxevac script has been modified to check for unmounted VxFS volumes on THIN LUNs prior to performing the migration. If an unmounted VxFS volume is detected, the command fails with a non-zero return code and displays a message notifying the user to mount the volumes or bypass smartmove by specifying the force option: VxVM vxevac ERROR V-5-2-0 The following VxFS volume(s) are configured on THIN luns and not mounted: v2 To take advantage of smartmove support on thin luns, retry this operation after mounting the volume(s). Otherwise, bypass smartmove by specifying the '-f' force option.

* 2976956 (Tracking ID: 1289985)
SYMPTOM: vxconfigd dumps core upon running the "vxdctl enable" command, as vxconfigd does not check the status value returned by the device when it sends the SCSI mode sense command to the device.
DESCRIPTION: vxconfigd sends the SCSI mode sense command to the device to obtain device information, but it only checks the return value of ioctl(). The return value of ioctl() only indicates whether there was an error while sending the command to the target device. vxconfigd should also check the value of the SCSI status byte returned by the device to get the real status of the SCSI command execution.
RESOLUTION: The code has been changed to check the value of the SCSI status byte returned by the device and take appropriate action if the status value is nonzero.

* 2976974 (Tracking ID: 2875962)
SYMPTOM: When an upgrade install is performed from VxVM 5.0MPx to VxVM 5.1 (and higher), the installation script may give the following message: The following files are already installed on the system and are being used by another package: /usr/lib/vxvm/root/kernel/drv/vxapm/dmpsvc.SunOS_5.10 Do you want to install these conflicting files [y, n,?, q]
DESCRIPTION: A VxVM 5.0MPx patch incorrectly packaged the IBM SanVC APM with a VxVM patch, which was subsequently corrected in a later patch. Any upgrade performed from that 5.0MPx patch to 5.1 or higher will result in this packaging message.
RESOLUTION: Added code to the packaging script of the VxVM package to remove the APM files so that the conflict between the VRTSaslapm and VRTSvxvm packages is resolved.

* 2978189 (Tracking ID: 2948172)
SYMPTOM: Execution of the command "vxdisk -o thin, fssize list" can cause a hang or panic.
The hang stack trace might look like: pse_block_thread pse_sleep_thread .hkey_legacy_gate volsiowait vol_objioctl vol_object_ioctl voliod_ioctl volsioctl_real volsioctl The panic stack trace might look like: voldco_breakup_write_extents volfmr_breakup_extents vol_mv_indirect_write_start volkcontext_process volsiowait vol_objioctl vol_object_ioctl voliod_ioctl volsioctl_real vols_ioctl vols_compat_ioctl compat_sys_ioctl sysenter_dispatch
DESCRIPTION: The command "vxdisk -o thin, fssize list" triggers reclaim I/Os to get the file system usage from the Veritas File System on Veritas Volume Manager mounted volumes. Reclamation is currently not supported on volumes with space-optimized (SO) snapshots. But because of a bug, reclaim I/Os continue to execute for volumes with SO snapshots, leading to a system panic/hang.
RESOLUTION: Code changes are made so that reclamation I/Os are not allowed to proceed on volumes with SO snapshots.

* 2979767 (Tracking ID: 2798673)
SYMPTOM: A system panic is observed with the stack trace given below: voldco_alloc_layout voldco_toc_updatesio_done voliod_iohandle voliod_loop
DESCRIPTION: The DCO (Data Change Object) contains metadata information required to start the DCO volume and decode further information from the DCO volume. This information is stored in the first block of the DCO volume. If this metadata information is incorrect or corrupted, further processing of the volume start results in a panic due to a divide-by-zero error in the kernel.
RESOLUTION: Code changes are made to verify the correctness of the DCO volume's metadata information during startup. If the information read is incorrect, the volume start operation fails.

* 2983679 (Tracking ID: 2970368)
SYMPTOM: SRDF-R2 WD (write-disabled) devices are shown in the error state, and lots of path enable/disable messages are generated in the /etc/vx/dmpevents.log file.
DESCRIPTION: DMP (dynamic multi-pathing driver) disables the paths of write-protected devices. Therefore these devices are shown in the error state. The vxattachd daemon tries to online these devices and executes partial device discovery for them. As part of the partial device discovery, enabling and disabling the paths of such write-protected devices generates lots of path enable/disable messages in the /etc/vx/dmpevents.log file.
RESOLUTION: This issue is addressed by not disabling paths of write-protected devices in DMP.

* 3004823 (Tracking ID: 2692012)
SYMPTOM: When moving subdisks using vxassist move (or using the vxevac command, which in turn calls vxassist move), if the disk tags are not the same for the source and destination, the command fails with a generic message which does not convey exactly why the operation failed. The following generic message is seen: VxVM vxassist ERROR V-5-1-438 Cannot allocate space to replace subdisks
DESCRIPTION: When moving subdisks using vxassist move, it uses available disks from the disk group for the move, if no target disk is specified. If these disks have a site tag set and the value of the site tag attribute is not the same, then vxassist move is expected to fail. But it fails with a generic message that does not specify why the operation failed. The expectation is to introduce a message that precisely conveys to the user why the operation failed.
RESOLUTION: A new message is introduced which precisely conveys that the failure is due to a site tag attribute mismatch.
The following message is now displayed along with the generic message, conveying the actual reason for the failure: VxVM vxassist ERROR V-5-1-0 Source and/or target disk belongs to site, can not move over sites

* 3004852 (Tracking ID: 2886333)
SYMPTOM: The "vxdg(1M) join" command allowed mixing clone and non-clone disk groups. A subsequent import of the newly joined disk group fails.
DESCRIPTION: Mixing of clone and non-clone disk groups is not allowed. The part of the code where the join operation is done was not validating the mix of clone and non-clone disk groups and was going ahead with the operation. This resulted in the newly joined disk group having a mix of clone and non-clone disks. A subsequent import of the newly joined disk group fails.
RESOLUTION: During the disk group join operation, both disk groups are checked; if a mix of clone and non-clone disk groups is found, the join operation is failed.

* 3005921 (Tracking ID: 1901838)
SYMPTOM: After addition of a license key that enables multi-pathing, the state of the controller is still shown as DISABLED in the vxdmpadm CLI output.
DESCRIPTION: When the multi-pathing license key is added, the state of the active paths of a LUN is changed to ENABLED, but the state of the controller is not updated.
RESOLUTION: As a fix, whenever the multipathing license key is installed, the operation updates the state of the controller in addition to that of the LUN paths.

* 3006262 (Tracking ID: 2715129)
SYMPTOM: vxconfigd hangs during Master takeover in a CVM (Clustered Volume Manager) environment. This results in a vx command hang.
DESCRIPTION: During Master takeover, the VxVM (Veritas Volume Manager) kernel signals vxconfigd with the information of the new Master. vxconfigd then proceeds with a vxconfigd-level handshake with the nodes across the cluster. Before the kernel could signal vxconfigd, the vxconfigd handshake mechanism got started, resulting in the hang.
RESOLUTION: Code changes are done to ensure that the vxconfigd handshake gets started only upon receipt of the signal from the kernel.

* 3011391 (Tracking ID: 2965910)
SYMPTOM: vxassist dumps core with the following stack: setup_disk_order() volume_alloc_basic_setup() fill_volume() setup_new_volume() make_trans() vxvmutil_trans() trans() transaction() do_make() main()
DESCRIPTION: When -o ordered is used, vxassist handles non-disk parameters in a different way. This scenario may result in an invalid comparison, leading to a core dump.
RESOLUTION: Code changes are made to handle the parameter comparison logic properly.

* 3011444 (Tracking ID: 2398416)
SYMPTOM: vxassist dumps core with the following stack: merge_attributes() get_attributes() do_make() main() _start()
DESCRIPTION: vxassist dumps core while creating a volume when the attribute 'wantmirror=ctlr' is added to the '/etc/default/vxassist' file. vxassist reads this default file initially and uses the attributes specified to allocate the storage during the volume creation. However, during the merging of attributes specified in the default file, it accesses a NULL attribute structure, causing the core dump.
RESOLUTION: Necessary code changes have been done to check the attribute structure pointer before accessing it.

* 3020087 (Tracking ID: 2619600)
SYMPTOM: Live migration of a virtual machine having the SFHA/SFCFSHA stack with data disk fencing enabled causes service groups configured on the virtual machine to fault.
DESCRIPTION: After live migration of a virtual machine having the SFHA/SFCFSHA stack with data disk fencing enabled is done, I/O fails on shared SAN devices with a reservation conflict and causes service groups to fault. Live migration causes a SCSI initiator change. Hence, I/O coming from the migrated server to the shared SAN storage fails with a reservation conflict.
RESOLUTION: Code changes are added to check whether the host is fenced off from the cluster. If the host is fenced off, then the registration key is re-registered for the dmpnode through the migrated server and I/O is restarted.

* 3025973 (Tracking ID: 3002770)
SYMPTOM: The system panics with the following stack trace: vxdmp:dmp_aa_recv_inquiry vxdmp:dmp_process_scsireq vxdmp:dmp_daemons_loop unix:thread_start
DESCRIPTION: The panic happens while handling the SCSI response for the SCSI Inquiry command. In order to determine if the path on which the SCSI Inquiry command was issued is read-only, the code needs to check the error buffer. However, the error buffer is not always prepared, so the code should examine whether the error buffer is valid before further checking. Without such error buffer examination, the system may panic with a NULL pointer.
RESOLUTION: The source code is modified to verify that the error buffer is valid.

* 3026288 (Tracking ID: 2962262)
SYMPTOM: When DMP Native Stack support is enabled and some devices are being managed by a multipathing solution other than DMP, uninstalling DMP fails with an error for not being able to turn off DMP Native Stack support.
Performing DMP prestop tasks ...................................... Done
The following errors were discovered on the systems:
CPI ERROR V-9-40-3436 Failed to turn off dmp_native_support tunable on pilotaix216. Refer to Dynamic Multi-Pathing Administrator's guide to determine the reason for the failure and take corrective action.
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more volume groups
The CLI 'vxdmpadm settune dmp_native_support=off' also fails with the following error:
# vxdmpadm settune dmp_native_support=off
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more volume groups
DESCRIPTION: With DMP Native Stack support it is expected that devices which are being used by LVM are multipathed by DMP. Co-existence with other multipath solutions in such cases is not supported. Having some other multipath solution results in this error.
RESOLUTION: Code changes have been made to not error out while turning off DMP Native Support if a device is not being managed by DMP.

* 3027482 (Tracking ID: 2273190)
SYMPTOM: The device discovery commands 'vxdisk scandisks' or 'vxdctl enable' issued just after license key installation may fail and abort.
DESCRIPTION: After addition of a license key that enables multi-pathing, the state of the paths maintained at the user level is incorrect.
RESOLUTION: As a fix, whenever a multi-pathing license key is installed, the operation updates the state of the paths both at the user level and the kernel level.

INSTALLATION PRE-REQUISITES
---------------------------
VRTSvxvm 6.0.300.200 requires VRTSaslapm version 06.00.0100.0202 or higher as a prerequisite. Make sure to install VRTSaslapm version 06.00.0100.0202, available at https://sort.symantec.com/asl/details/644.

INSTALLING THE PATCH
--------------------
o Before the upgrade:
(a) Stop applications using any VxVM volumes.
(b) Stop I/Os to all the VxVM volumes.
(c) Unmount any file systems with VxVM volumes (see the example after this list).
(d) In case of multiple boot environments, boot using the BE you wish to install the patch on.
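The following is a minimal illustration of steps (b) and (c) above: each VxFS file system configured on a VxVM volume is unmounted, and the volumes in the affected disk group are then stopped. The mount point and disk group names used here are hypothetical and should be replaced with the ones in use on the system:
# umount /data01
# vxvol -g datadg stopall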
For the Solaris 11 release, refer to the man pages for instructions on using the install and uninstall options of the 'pkg' command provided with Solaris. Any other special or non-generic installation instructions are described below as special instructions.
The following example installs the patch on a standalone machine:
example# pkg install --accept -g /patch_location/VRTSvxvm.p5p VRTSvxvm
After 'pkg install', follow the mandatory configuration steps mentioned in the SPECIAL INSTRUCTIONS section below.

REMOVING THE PATCH
------------------
The following example removes the patch from a standalone system:
example# pkg uninstall VRTSvxvm
Note: Uninstalling the patch will remove the entire package. If you need an earlier version of the package, install it from the original source media.

SPECIAL INSTRUCTIONS
--------------------
1) Delete '.vxvm-configured':
# rm /etc/vx/reconfig.d/state.d/.vxvm-configured
2) Refresh vxvm-configure:
# svcadm refresh vxvm-configure
3) Delete 'install-db':
# rm /etc/vx/reconfig.d/state.d/install-db
4) Reboot the system using the shutdown command. You need to use the shutdown command to reboot the system after patch installation or de-installation.

OTHERS
------