* * * READ ME * * *
* * * Symantec Storage Foundation HA 6.2.1 * * *
* * * Patch 200 * * *
Patch Date: 2019-04-02


This document provides the following information:

* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH


PATCH NAME
----------
Symantec Storage Foundation HA 6.2.1 Patch 200


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
Solaris 11 SPARC


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSamf
VRTSllt
VRTSodm
VRTSvxfs
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* Symantec Cluster Server 6.2
* Symantec Dynamic Multi-Pathing 6.2
* Symantec File System 6.2
* Symantec Storage Foundation 6.2
* Symantec Storage Foundation Cluster File System HA 6.2
* Symantec Storage Foundation for Oracle RAC 6.2
* Symantec Storage Foundation HA 6.2
* Symantec Volume Manager 6.2


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: VRTSvxvm-6.2.1.8300
* 3967041 (3945411) The system was not able to boot after DMP native support was enabled for ZFS boot devices.
* 3967070 (3966965) VxVM (Veritas Volume Manager) disk reclaiming failed because disk array attributes were obtained incorrectly.
* 3967114 (3953481) A stale entry of the old disk is left under /dev/[r]dsk even after the disk is replaced.
* 3971459 (3964779) Changes to support Solaris 11.4 with Volume Manager.
* 3971577 (3968020) (Solaris 11.4) Appropriate modules are not loaded on the system after an OS upgrade.
* 3972253 (3953523) vxdisk list does not show DMP-managed disks after reboot on a Solaris 10 LDOM guest.
* 3972369 (3928114) Plex inconsistency is reported after rebooting a node in an FSS environment.
* 3972434 (3917636) Filesystems from the /etc/fstab file are not mounted automatically on boot through systemd on RHEL7 and SLES12.
* 3972437 (3952042) vxdmp iostat memory allocation might cause memory fragmentation and pagecache drop.
* 3972457 (3959091) A volume is in DISABLED state after reboot on a Solaris 10 LDOM guest.
* 3972978 (3965962) Auto recovery gets triggered at the time of slave join.

Patch ID: VRTSvxvm-6.2.1.500
* 3920545 (3795739) In a split brain scenario, cluster formation takes a very long time.
* 3922253 (3919642) IO hang after switching log owner because IOs are not completely quiesced during the log owner change.
* 3922254 (3919640) IO hang along with vxconfigd hang on the master node because metadata request SIOs (Staged IO) hog the CPU.
* 3922255 (3919644) IO hang when switching log owner because of stale flags in the RV (Replication Volume) kernel structure.
* 3922256 (3919641) IO hang when pausing rlink because of a deadlock situation.
* 3922257 (3919645) IO hang when switching log owner because of stale information in the RV (Replication Volume) kernel structure.
* 3922258 (3919643) IO hang when switching log owner because of stale information in the RV (Replication Volume) kernel structure.
* 3927482 (3895950) vxconfigd hang observed due to accessing a stale/uninitialized lock.
* 3931027 (3911930) Provide a way to clear the PGR_FLAG_NOTSUPPORTED flag on the device instead of using exclude/include commands.
* 3931028 (3918356) zpools are imported automatically when DMP (Dynamic Multi-Pathing) native support is set to on, which may lead to zpool corruption.
* 3931040 (3893150) VxDMP vxdmpadm native ls command sometimes does not report the pool name of imported disks.

Patch ID: VRTSvxvm-6.2.1.400
* 3913126 (3910432) When a mirror volume is created, two log plexes are created by default.
* 3915780 (3912672) "vxddladm assign names" causes ASM disks to lose their user-group ownership/permissions, affecting Oracle databases on the system.
* 3915786 (3890602) The OS cfgadm command hangs after reboot when hundreds of devices are under DMP (Dynamic Multi-Pathing) control.
* 3917323 (3917786) Storage of cold data on dedicated SAN storage spaces increases storage cost and maintenance. Move cold data from local storage to cloud storage.

Patch ID: VRTSvxvm-6.2.1.300
* 3802857 (3726110) On systems with a high number of CPUs, Dynamic Multi-Pathing (DMP) devices may perform considerably slower than OS device paths.
* 3803497 (3802750) VxVM (Veritas Volume Manager) volume I/O-shipping functionality is not disabled even after the user issues the correct command to disable it.
* 3812192 (3764326) VxDMP (Veritas Dynamic Multi-Pathing) repeatedly reports "failed to get devid".
* 3847745 (3899198) VxDMP (Veritas Dynamic Multi-Pathing) causes system panic after a shutdown/reboot.
* 3850890 (3603792) The first boot after live upgrade to a new version of Solaris 11 and VxVM (Veritas Volume Manager) takes a long time.
* 3851117 (3662392) In the Cluster Volume Manager (CVM) environment, if I/Os are executing on a slave node, corruption can happen when the vxdisk resize(1M) command is executing on the master node.
* 3852148 (3852146) Shared DiskGroup (DG) fails to import when the "-c" and "-o noreonline" options are specified together.
* 3854788 (3783356) After the Dynamic Multi-Pathing (DMP) module fails to load, dmp_idle_vector is not NULL.
* 3863971 (3736502) A memory leak occurs when a transaction aborts.
* 3871040 (3868444) Disk header timestamp is updated even if the disk group (DG) import fails.
* 3874737 (3874387) Disk header information is sometimes not logged to the syslog even if the disk is missing and dg import fails.
* 3875933 (3737585) "Uncorrectable write error" or panic with IOHINT in a VVR (Veritas Volume Replicator) environment.
* 3877637 (3878030) Enhance VxVM DR tool to clean up OS and VxDMP device trees without user interaction.
* 3879334 (3879324) VxVM DR tool fails to handle the busy device problem while LUNs are removed from the OS.
* 3880573 (3886153) vradmind daemon core dump occurs in a VVR primary-primary configuration because of an assert() failure.
* 3881334 (3864063) Application I/O hangs because of a race between the Master Pause SIO (Staging I/O) and the Error Handler SIO.
* 3881335 (3867236) Application IO hang happens because of a race between the Master Pause SIO (Staging IO) and the RVWRITE1 SIO.
* 3889284 (3878153) VVR 'vradmind' daemon core dump.
* 3889850 (3878911) QLogic driver returns an error due to an incorrect aiusize in the FC header.
* 3890666 (3882326) vxconfigd core dump when a slice of a device is exported from the control domain to an LDOM (Logical Domain).
* 3893134 (3864318) Memory consumption keeps increasing when reading/writing data against a VxVM volume with a big block size.
* 3893950 (3841242) Use of deprecated APIs provided by Oracle may result in a system hang.
* 3894783 (3628743) New BE takes too much time to start up during live upgrade on Solaris 11.2.
* 3897764 (3741003) After removing storage from one of multiple plexes in a mirrored DCO (Data Change Object) volume, the entire DCO volume is detached and the DCO object has the BADLOG flag set because a flag reset is missing.
* 3898129 (3790136) File system hang observed due to IOs in Dirty Region Logging (DRL).
* 3898169 (3740730) While creating a volume using the vxassist CLI, the dco log volume length specified at the command line was not honored.
* 3898296 (3767531) In a layered volume layout with FSS configuration, when a few of the FSS_Hosts are rebooted, a full resync happens for non-affected disks on the master.
* 3902626 (3795739) In a split brain scenario, cluster formation takes a very long time.
* 3903647 (3868934) System panic happens while deactivating the SIO (staging IO).
* 3904008 (3856146) On Solaris SPARC 11.2 with the latest SRUs and on Solaris SPARC 11.3, the system panics during reboot and fails to come up after dmp_native_support is turned off.
* 3904017 (3853151) I/O error occurs when vxrootadm join is triggered.
* 3904790 (3795788) Performance degrades when many application sessions open the same data file on the VxVM volume.
* 3904796 (3853049) The display of stats is delayed beyond the set interval for vxstat, and multiple sessions of vxstat impact IO performance.
* 3904797 (3857120) Commands like vxdg deport that try to close a VxVM volume might hang.
* 3904800 (3860503) Poor performance of vxassist mirroring is observed on some high-end servers.
* 3904801 (3686698) vxconfigd was getting hung due to a deadlock between two threads.
* 3904802 (3721565) vxconfigd hang is seen.
* 3904804 (3486861) Primary node panics when storage is removed while replication is going on with heavy IOs.
* 3904805 (3788644) Reuse raw device number when checking for available raw devices.
* 3904806 (3807879) User data is corrupted because the backup EFI GPT disk label is written during the VxVM disk-group flush operation.
* 3904807 (3867145) When VVR SRL occupation is greater than 90%, the output shows the SRL occupation by 10 percent.
* 3904810 (3871750) Parallel VxVM vxstat commands report abnormal disk IO statistic data.
* 3904811 (3875563) While dumping the disk header information, the human-readable timestamp was not converted correctly from the corresponding epoch time.
* 3904819 (3811946) When the "vxsnap make" command is invoked with the cachesize option to create a space-optimized snapshot, the command succeeds but a plex I/O error message is displayed in syslog.
* 3904822 (3755209) The Veritas Dynamic Multi-Pathing (VxDMP) device configured in a Solaris Logical Domains (LDOM) guest is disabled when an active controller of an ALUA array fails.
* 3904824 (3795622) With Dynamic Multi-Pathing (DMP) Native Support enabled, the Logical Volume Manager (LVM) global_filter is not updated properly in the lvm.conf file.
* 3904825 (3859009) global_filter of lvm.conf is not updated because some paths of an LVM dmpnode are reused during the DDL (Device Discovery Layer) discovery cycle.
* 3904833 (3729078) VVR (Veritas Volume Replication) secondary site panic occurs during patch installation because of a flag overlap issue.
* 3904834 (3819670) When smartmove with the 'vxevac' command is run in the background by pressing 'ctrl-z' and then using the 'bg' command, the execution of 'vxevac' is terminated abruptly.
* 3904840 (3769927) "vxdmpadm settune dmp_native_support=off" command fails on Solaris.
* 3904851 (3804214) VxDMP (Dynamic Multi-Pathing) path enable operation fails after the disk label is changed from a guest LDOM. Open fails with error 5 on the path being enabled.
* 3904858 (3899568) Added tunable dmp_compute_iostats to start/stop the iostat gathering persistently.
* 3904859 (3901633) vxrsync reports an error during rvg sync because of an incorrect volume end offset calculation.
* 3904861 (3904538) IO hang happens during slave node leave or master node switch because of a race between the RV (Replicate Volume) recovery SIO (Staged IO) and incoming IOs.
* 3904863 (3851632) Some VxVM commands fail when you use the localized messages.
* 3904864 (3769303) System panics when the Cluster Volume Manager (CVM) group is brought online.
* 3905471 (3868533) IO hang happens because of a deadlock situation.
* 3906251 (3806909) Due to a modification in licensing, the DMP keyless license was not working for standalone DMP.
* 3907017 (3877571) Disk header is updated even if the dg import operation fails.
* 3907593 (3660869) Enhance the Dirty Region Logging (DRL) dirty-ahead logging for sequential write workloads.

Patch ID: VRTSvxvm-6.2.1.200
* 3795710 (3508122) After one node preempts the SCSI-3 reservation for the other node, the I/O from the victim node does not fail.
* 3802857 (3726110) On systems with a high number of CPUs, Dynamic Multi-Pathing (DMP) devices may perform considerably slower than OS device paths.
* 3803497 (3802750) VxVM (Veritas Volume Manager) volume I/O-shipping functionality is not disabled even after the user issues the correct command to disable it.
* 3804299 (3804298) The setting/unsetting of the 'lfailed/lmissing' flag is not recorded in the syslog.
* 3812192 (3764326) VxDMP (Veritas Dynamic Multi-Pathing) repeatedly reports "failed to get devid".
* 3835562 (3835560) Auto-import of the diskgroup fails if some of the disks in the diskgroup are missing.
* 3847745 (3677359) VxDMP (Veritas Dynamic Multi-Pathing) causes system panic after a shutdown or reboot.
* 3850890 (3603792) The first boot after live upgrade to a new version of Solaris 11 and VxVM (Veritas Volume Manager) takes a long time.
* 3851117 (3662392) In the Cluster Volume Manager (CVM) environment, if I/Os are executing on a slave node, corruption can happen when the vxdisk resize(1M) command is executing on the master node.
* 3852148 (3852146) Shared DiskGroup (DG) fails to import when the "-c" and "-o noreonline" options are specified together.
* 3854788 (3783356) After the Dynamic Multi-Pathing (DMP) module fails to load, dmp_idle_vector is not NULL.
* 3859226 (3287880) In a clustered environment, if a node does not have storage connectivity to clone disks, vxconfigd on the node may dump core during the clone disk group import.
* 3862240 (3856146) On Solaris SPARC 11.2 with the latest SRUs and on Solaris SPARC 11.3, the system panics during reboot and fails to come up after dmp_native_support is turned off.
* 3862632 (3769927) "vxdmpadm settune dmp_native_support=off" command fails on Solaris.

Patch ID: VRTSodm-6.2.1.8300
* 3972666 (3968788) ODM module failed to load on Solaris 11.4.
* 3972873 (3832329) The stat() on /dev/odm/ctl in Solaris 11.2 results in a system panic.
* 3972887 (3652500) ODM's ktrace command fails with "bad address" error on the Solaris platform.

Patch ID: VRTSodm-6.2.1.300
* 3906065 (3757609) CPU usage is high because of contention over ODM_IO_LOCK.

Patch ID: VRTSvxfs-6.2.1.8300
* 3971860 (3926972) A recovery event can result in a cluster-wide hang.
* 3972642 (3972641) Handle deprecated Solaris function calls in VxFS code.
* 3972763 (3968785) VxFS module failed to load on Solaris 11.4.
* 3972772 (3929854) Enabling event notification support on CFS for WebLogic watchService on the Solaris platform.
* 3972860 (3898565) Solaris no longer supports F_SOFTLOCK.
* 3972960 (3905099) VxFS unmount panicked in deactive_super().
* 3972997 (3844820) Removing/adding a vCPU on Solaris could trigger a system panic.

Patch ID: VRTSvxfs-6.2.1.300
* 3734750 (3608239) System panics when deinitializing voprwlock in Solaris.
* 3817229 (3762174) fsfreeze and vxdump commands may not work together.
* 3896150 (3833816) Read returns stale data on one node of the CFS.
* 3896151 (3827491) Data relocation is not executed correctly if the IOTEMP policy is set to AVERAGE.
* 3896154 (1428611) 'vxcompress' can spew many GLM block lock messages over the LLT network.
* 3896156 (3633683) vxfs thread consumes high CPU while running an application that makes excessive sync() calls.
* 3896160 (3808033) When using 6.2.1 ODM on RHEL7, the Oracle resource cannot be killed after a forced umount via VCS.
* 3896218 (3751049) The umountall operation fails on Solaris.
* 3896223 (3735697) vxrepquota reports an error.
* 3896248 (3876223) Truncate (fcntl F_FREESP*) on a newly created file does not update the timestamp.
* 3896249 (3861713) High %sys CPU seen on large CPU/memory configurations.
* 3896250 (3870832) Panic due to a race between force umount and the nfs lock manager vnode get operation.
* 3896261 (3855726) Panic in vx_prot_unregister_all().
* 3896267 (3861271) Missing an inode clear operation when a Linux inode is being de-initialized on SLES11.
* 3896269 (3879310) File system may get corrupted after a failed vxupgrade.
* 3896270 (3707662) Race between reorg processing and the fsadm timer thread (alarm expiry) leads to a panic in vx_reorg_emap.
* 3896273 (3558087) The ls -l command and other commands that use the stat system call may take a long time to complete.
* 3896277 (3691633) Remove RCQ Full messages.
* 3896281 (3830300) Degraded CPU performance during backup of Oracle archive logs on CFS vs. local filesystem.
* 3896285 (3757609) CPU usage is high because of contention over ODM_IO_LOCK.
* 3896303 (3762125) Directory size increases abnormally.
* 3896304 (3846521) "cp -p" fails if the modification time in nanoseconds has 10 digits.
* 3896306 (3790721) High CPU usage caused by vx_send_bcastgetemapmsg_remaus.
* 3896308 (3695367) Unable to remove a volume from multi-volume VxFS using the "fsvoladm" command.
* 3896310 (3859032) System panics in vx_tflush_map() due to a NULL pointer dereference.
* 3896311 (3779916) vxfsconvert fails to upgrade the layout version for a vxfs file system with a large number of inodes.
* 3896312 (3811849) On a cluster file system (CFS), while executing the lookup() function in a directory with Large Directory Hash (LDH), the system panics and displays an error.
* 3896313 (3817734) The message displayed to the user when a file system mount fails mentioned a direct command to run fsck with the -y|Y option.
* 3896314 (3856363) Filesystem inodes have incorrect blocks.
* 3901379 (3897793) Panic happens because of a race where the mntlock ID is cleared while the mntlock flag is still set.
* 3903583 (3905607) Internal assert failed during migration.
* 3905055 (3880113) Mapbad scenario in case of deletion of cloned files having shared ZFOD extents.
* 3905056 (3879761) Performance issue observed due to contention on the vxfs spin lock vx_worklist_lk.
* 3906148 (3894712) ACL permissions are not inherited correctly on a cluster file system.
* 3907038 (3879799) Due to an inconsistent LCT (Link Count Table), Veritas File System (VxFS) mount prompts for a full fsck every time.
* 3907350 (3817734) The message displayed to the user when a file system mount fails mentioned a direct command to run fsck with the -y|Y option.

Patch ID: VRTSvxfs-6.2.1.100
* 3754492 (3761603) Internal assert failure because of invalid extop processing at mount time.
* 3756002 (3764824) Internal cluster file system (CFS) testing hit a debug assert.
* 3769992 (3729158) Deadlock due to incorrect locking order between write advise and the dalloc flusher thread.
* 3817120 (3804400) VRTS/bin/cp does not return any error when the quota hard limit is reached and a partial write is encountered.

Patch ID: VRTSvxfs-6.2.1.000
* 3657150 (3604071) High CPU usage consumed by the vxfs thread process.
* 3657152 (3602322) Panic while flushing the dirty pages of the inode.
* 3657153 (3622323) A cluster file system mounted as read-only panics when it gets sharing and/or compression statistics with the fsadm_vxfs(1M) command.
* 3657156 (3604750) The kernel loops during the extent re-org.
* 3657157 (3617191) Checkpoint creation takes a lot of time.
* 3657158 (3601943) Truncating a corrupted block map of a file may lead to an infinite loop.
* 3657491 (3657482) Stress test on cluster file system fails due to data corruption.
* 3665980 (2059611) The system panics due to a NULL pointer dereference while flushing bitmaps to the disk.
* 3665984 (2439261) When the vx_fiostats_tunable value is changed from zero to non-zero, the system panics.
* 3665990 (3567027) During the file system resize operation, the "fullfsck" flag is set.
* 3666009 (3647749) On Solaris, an obsolete v_path is created for the VxFS vnode.
* 3666010 (3233276) With a large file system, primary to secondary migration takes a long time.
* 3677165 (2560032) System panics after SFHA is upgraded from 5.1SP1 to 5.1SP1RP2 or from 6.0.1 to 6.0.5.
* 3688210 (3689104) The module version of the vxcafs module is not displayed after the modinfo vxcafs command is run on Solaris.
* 3697966 (3697964) The vxupgrade(1M) command fails to retain the fs_flags after upgrading a file system.
* 3699953 (3702136) Link Count Table (LCT) corruption is observed while mounting a file system on a secondary node.
* 3715567 (3715566) VxFS fails to report an error when the maxlink and nomaxlink options are set on file systems having a disk layout version (DLV) lower than 10.
* 3718542 (3269553) VxFS returns an inappropriate message for a read of a hole via Oracle Disk Manager (ODM).
* 3721458 (3721466) After a file system is upgraded from version 6 to 7, the vxupgrade(1M) command fails to set the VX_SINGLEDEV flag on a superblock.
* 3725347 (3725346) Trimming of the underlying SSD volume was not supported for AIX and Solaris using the "fsadm -R -o ssd" command.
* 3725569 (3731678) During an internal test, a debug assert was observed while handling the error scenario.
* 3726403 (3739618) The sfcache command with the "-i" option may not show filesystem cache statistics periodically.
* 3729111 (3729104) Man page changes are missing for the smartiomode option of mount_vxfs(1M).
* 3729704 (3719523) 'vxupgrade' retains the superblock replica of old layout versions.
* 3736133 (3736772) The sfcache(1M) command does not automatically enable write-back caching on the file system once the cache size is increased to enable write-back caching.
* 3743913 (3743912) Users could create more than 64K sub-directories for disk layouts having versions lower than 10.
* 3755796 (3756750) VxFS may leak memory when the File Design Driver (FDD) module is unloaded before the cache file system is taken offline.

Patch ID: VRTSvxfs-6.2.0.100
* 3703631 (3615043) Data loss when writing to a file while dalloc is on.
Patch ID: VRTSamf-6.2.1.1100
* 3973131 (3970679) Veritas Infoscale Availability does not support Oracle Solaris 11.4.

Patch ID: VRTSamf-6.2.1.100
* 3864321 (3862933) On Solaris 11.2 SRU 8 or above with Asynchronous Monitoring Framework (AMF) enabled, VCS agent processes may not respond or may encounter AMF errors during registration.

Patch ID: VRTSllt-6.2.1.1100
* 3973130 (3970679) Veritas Infoscale Availability does not support Oracle Solaris 11.4.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: VRTSvxvm-6.2.1.8300

* 3967041 (Tracking ID: 3945411)

SYMPTOM:
The system kept rebooting in a cycle after DMP (Dynamic Multi-Pathing) native support was enabled for ZFS boot devices, with the following errors:

NOTICE: VxVM vxdmp V-5-0-1990 driver version VxVM Multipathing Driver installed
WARNING: VxVM vxdmp V-5-3-2103 dmp_claim_device: Boot device not found in OS tree
NOTICE: zfs_parse_bootfs: error 19
Cannot mount root on rpool/40 fstype zfs
panic[cpu0]/thread=20012000: vfs_mountroot: cannot mount root
Warning - stack not written to the dumpbuf
000000002000fa00 genunix:main+1dc ()

DESCRIPTION:
After DMP native support was enabled, the boot device was under DMP control. As a result, DMP failed to get the device number of the boot device by inquiring the device under OS control, hence the issue.

RESOLUTION:
Code changes were made to get the correct device number of the boot device.

* 3967070 (Tracking ID: 3966965)

SYMPTOM:
Reclaiming disks from the Infinibox disk array failed.

DESCRIPTION:
The disk array attributes should be composed as name-value pairs separated by '=', but the '=' separator was missing; hence disk reclaiming reported a wrong attribute and failed.

RESOLUTION:
Code changes have been made to correct this problem.

* 3967114 (Tracking ID: 3953481)

SYMPTOM:
A stale entry representing a replaced disk was left behind under /dev/[r]dsk.
DESCRIPTION:
Whenever a disk is removed from the DMP view, the driver property information of the disk has to be removed from the kernel; otherwise a stale entry is left under /dev/[r]dsk. When a new disk with the same minor number replaces the old one, the stale information is kept instead of the property being refreshed.

RESOLUTION:
Code is modified to remove the stale device property when a disk is removed.

* 3971459 (Tracking ID: 3964779)

SYMPTOM:
Loading of the VxVM modules (vxio and vxspec) fails on Solaris 11.4.

DESCRIPTION:
The function page_numtopp_nolock has been replaced and renamed as pp_for_pfn_canfail. The _depends_on attribute has been deprecated and cannot be used; VxVM was using this attribute to specify the dependencies between its modules.

RESOLUTION:
The changes are mainly around the way unmapped bufs are handled in the vxio driver. The Solaris API that was being used is no longer valid and is a private API. The hat_getpfnum() -> ppmapin/ppmapout calls were replaced with bp_copyin/bp_copyout in the I/O code path. In I/O shipping, they were replaced with the miter approach and hat_kpm_paddr_mapin()/hat_kpm_paddr_mapout().

* 3971577 (Tracking ID: 3968020)

SYMPTOM:
If the OS of the system is upgraded from Solaris 11.3 to 11.4, the newer modules fail to load.

DESCRIPTION:
If the OS of the system is upgraded from 11.3 to 11.4, the newer modules for Solaris 11.4 should be loaded on reboot. But because of an existing check for a file, loading of the modules is skipped on boot.

RESOLUTION:
Changes have been made to verify the cksum of the modules and then decide whether new module replacement is required, even if the .vxvm-configure file is present.

* 3972253 (Tracking ID: 3953523)

SYMPTOM:
vxdisk list does not show DMP (Dynamic Multi-Pathing) managed disks after reboot on a Solaris 10 LDOM guest.

DESCRIPTION:
Due to some dependency issues between the LDOM drivers and the DMP driver, DMP failed to manage all the devices.

RESOLUTION:
Code changes have been made to refresh the OS device tree before startup.
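The 3971577 resolution decides whether to replace a module by comparing checksums instead of relying on the presence of the .vxvm-configure file. The following is a minimal Python sketch of that decision, offered as an illustration and not as the VxVM implementation. It substitutes SHA-256 from hashlib for the cksum utility the fix actually uses, and all function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def checksum(data: bytes) -> str:
    """Content digest; SHA-256 stands in here for the cksum utility."""
    return hashlib.sha256(data).hexdigest()


def needs_replacement(installed: Optional[bytes], shipped: bytes) -> bool:
    """True if no module is installed or the shipped module differs.

    The decision depends only on module contents, not on whether a
    marker file such as .vxvm-configure already exists.
    """
    if installed is None:
        return True
    return checksum(installed) != checksum(shipped)


def module_needs_replacement(installed_path: Path, shipped_path: Path) -> bool:
    """File-based wrapper (hypothetical paths supplied by the caller)."""
    installed = installed_path.read_bytes() if installed_path.exists() else None
    return needs_replacement(installed, shipped_path.read_bytes())
```

Keying the decision on content means a module left over from an OS upgrade (11.3 to 11.4) is detected as different and replaced, even though the marker file from the earlier install is still present.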
* 3972369 (Tracking ID: 3928114)

SYMPTOM:
The file system was corrupted right after a node joined the cluster in an FSS configuration.

DESCRIPTION:
When a node is rebooted in an FSS environment, the corresponding plex is detached, and subsequent I/O is marked dirty in the corresponding detach map. Due to a code error, the offset into the detach map was calculated incorrectly, so regions that should have been dirtied were not marked. Hence, during recovery after the reboot, dirty regions that should have been synced after reattach were not synced, resulting in plex inconsistency and data corruption.

RESOLUTION:
Code changes have been made to calculate the correct offset into the detach map.

* 3972434 (Tracking ID: 3917636)

SYMPTOM:
Filesystems from the /etc/fstab file are not mounted automatically on boot through systemd on RHEL7 and SLES12.

DESCRIPTION:
During bootup, when systemd tries to mount the devices listed in the /etc/fstab file, the devices are not accessible, so the mount operation fails. Because device discovery happens through the udev infrastructure, the udev rules for those devices need to run when the volumes are created so that the devices get registered with systemd. In this case, the udev rules are executed before the devices in the "/dev/vx/dsk" directory are created. Since the devices do not exist yet, they are not registered with systemd, and the mount operation fails.

RESOLUTION:
Run "udevadm trigger" to execute all the udev rules once all volumes are created, so that the devices are registered.
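For 3972369, the following Python sketch illustrates how a detach map can record which regions a write dirties. It is hypothetical and is not the VxVM implementation. It assumes a fixed region size of 64 KiB per map entry and models the map as a set of region indices. If the starting index is computed incorrectly, some touched regions are never recorded, and a later resync skips them, which matches the failure described.

```python
from typing import Set

REGION_SIZE = 64 * 1024  # hypothetical: bytes covered by one detach-map entry


def dirty_regions(offset: int, length: int, region_size: int = REGION_SIZE) -> range:
    """Return the indices of every region touched by a write of `length` bytes at `offset`."""
    if offset < 0 or region_size <= 0:
        raise ValueError("offset must be >= 0 and region_size must be > 0")
    if length <= 0:
        return range(0)
    first = offset // region_size                  # region holding the first byte
    last = (offset + length - 1) // region_size    # region holding the last byte
    return range(first, last + 1)


def mark_dirty(detach_map: Set[int], offset: int, length: int) -> None:
    """Record every region a write touches while the plex is detached."""
    detach_map.update(dirty_regions(offset, length))
```

For example, a 2-byte write at offset 65535 straddles a region boundary, so both regions 0 and 1 must be marked for the resync after reattach to copy them.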
* 3972437 (Tracking ID: 3952042)

SYMPTOM:
dmpevents.log is flooded with messages like the following:

Tue Jul 11 09:28:36.620: Lost 12 DMP I/O statistics records
Tue Jul 11 10:05:44.257: Lost 13 DMP I/O statistics records
Tue Jul 11 10:10:05.088: Lost 6 DMP I/O statistics records
Tue Jul 11 11:28:24.714: Lost 6 DMP I/O statistics records
Tue Jul 11 11:46:35.568: Lost 1 DMP I/O statistics records
Tue Jul 11 12:04:10.267: Lost 13 DMP I/O statistics records
Tue Jul 11 12:04:16.298: Lost 5 DMP I/O statistics records
Tue Jul 11 12:44:05.656: Lost 31 DMP I/O statistics records
Tue Jul 11 12:44:38.855: Lost 2 DMP I/O statistics records

DESCRIPTION:
When DMP (Dynamic Multi-Pathing) expands the iostat table, it allocates a new, larger table, replaces the old table with the new one, and frees the old one. This increases the possibility of memory fragmentation.

RESOLUTION:
The code is modified to increase the initial size of the iostat table.

* 3972457 (Tracking ID: 3959091)

SYMPTOM:
A volume was in DISABLED state after reboot on a Solaris 10 LDOM guest.

DESCRIPTION:
Due to some dependency issues between the LDOM drivers and the DMP (Dynamic Multi-Pathing) driver, DMP was not able to manage all devices when volumes were started during system boot. If any of those devices belonged to a volume, that volume might be marked disabled before startup, hence the issue.

RESOLUTION:
Code changes have been made to reattach the devices and restart the volume before startup.

* 3972978 (Tracking ID: 3965962)

SYMPTOM:
The following process is seen in the "ps -ef" output:
/bin/sh - /usr/sbin/auto_recover

DESCRIPTION:
When there are plexes in the configuration that need recovery, events such as master initialization or master takeover cause auto recovery to be triggered at the time of slave join.

RESOLUTION:
Code changes are done to allow admins to prevent auto recovery at the time of slave join.
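When triaging 3972437, it can help to total the lost records reported in dmpevents.log. The following small Python sketch assumes that the message text matches the sample lines shown above. The function name is hypothetical, and the caller supplies the log path.

```python
import re
from typing import Iterable

# Matches the message text shown in the 3972437 symptom above.
LOST_RE = re.compile(r"Lost (\d+) DMP I/O statistics records")


def total_lost_records(lines: Iterable[str]) -> int:
    """Sum the counts from all 'Lost N DMP I/O statistics records' lines."""
    total = 0
    for line in lines:
        match = LOST_RE.search(line)
        if match:
            total += int(match.group(1))
    return total
```

Any iterable of lines works, including an open file object. Applied to the nine sample lines in the symptom, it returns 89.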
Patch ID: VRTSvxvm-6.2.1.500

* 3920545 (Tracking ID: 3795739)

SYMPTOM:
In a split brain scenario, cluster formation takes a very long time.

DESCRIPTION:
In a split brain scenario, the surviving nodes in the cluster try to preempt the keys of the nodes leaving the cluster. If the keys have already been preempted by one of the surviving nodes, the other surviving nodes receive a Unit Attention. DMP (Dynamic Multi-Pathing) then retries the preempt command after a delay of 1 second. Cluster formation cannot complete until the PGR keys of all the leaving nodes are removed from all the disks. If the number of disks is very large, preemption of the keys takes a lot of time, which makes cluster formation very slow.

RESOLUTION:
The code is modified to avoid adding a delay for the first couple of retries when reading PGR keys. This allows faster cluster formation with arrays that clear the Unit Attention condition sooner.

* 3922253 (Tracking ID: 3919642)

SYMPTOM:
IO hang may happen after a log owner switch.

DESCRIPTION:
IOs are only partly quiesced during the log owner change, so the log end processing of some updates is lost and subsequent IOs hang.

RESOLUTION:
Code changes have been made to quiesce IO completely during the log owner switch.

* 3922254 (Tracking ID: 3919640)

SYMPTOM:
When the log owner is on a slave node and SRL overflow is triggered, IO hang and vxconfigd hang may happen if there is heavy IO load from the master node.

DESCRIPTION:
After an SRL (Storage Replicator Log) overflow, the log owner on the slave node sends a DCM (Data Change Map) active request to the master node. The master node processes the request only when there are idle IO daemons. However, all IO daemons are busy sending, and retrying, metadata requests to the log owner node. Because the log owner is migrating to DCM mode, these requests cannot be processed, hence the hang.

RESOLUTION:
Code changes have been made to fix the issue.
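The 3920545 resolution skips the 1-second delay for the first couple of retries. The following hypothetical Python sketch shows that retry schedule. NO_DELAY_RETRIES = 2 is an assumed value for "first couple". The command callable stands in for the PGR request and returns False on Unit Attention. The sleep function is injectable so that the schedule can be checked without waiting.

```python
import time
from typing import Callable

NO_DELAY_RETRIES = 2  # assumption: "first couple of retries" taken as 2
RETRY_DELAY_S = 1.0   # the 1-second delay described for 3920545


def retry_delay(attempt: int) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return 0.0 if attempt <= NO_DELAY_RETRIES else RETRY_DELAY_S


def run_with_retries(command: Callable[[], bool], max_retries: int = 5,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run `command` until it succeeds or the retries are exhausted.

    `command` returns True on success and False when the target reports
    a Unit Attention condition (hypothetical stand-in for a PGR request).
    """
    if command():
        return True
    for attempt in range(1, max_retries + 1):
        sleep(retry_delay(attempt))
        if command():
            return True
    return False
```

If the command fails three times and then succeeds, the recorded sleeps are 0.0, 0.0 and 1.0 seconds. Arrays that clear the condition quickly therefore avoid the fixed delay entirely.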
* 3922255 (Tracking ID: 3919644)

SYMPTOM:
When the log owner is switched while the rlink is in DCM (Data Change Map) mode, IO hang may happen when the log owner switches back.

DESCRIPTION:
When a log owner switch happens in DCM mode, the reset of some RV kernel flags is missed. When the log owner switches back after DCM replay, the resulting data inconsistency activates DCM again, and IO hang may happen.

RESOLUTION:
Code changes have been made to clear stale information during the log owner switch.

* 3922256 (Tracking ID: 3919641)

SYMPTOM:
When the log owner is configured on a slave node, pausing the rlink from the master node while IO is issued from the log owner node results in an IO hang on the log owner node.

DESCRIPTION:
A deadlock may happen between the Master Pause SIO (Staged IO) and the Error Handler SIO. The Master Pause SIO needs to disconnect the rlink, which serializes the RV (Replicate Volume) by invoking the Error Handler SIO. The serialized RV prevents the Master Pause SIO from restarting. Because the Error Handler SIO depends on the Master Pause SIO completing, the RV can never get out of the serialized state.

RESOLUTION:
Code changes have been made to fix the deadlock issue.

* 3922257 (Tracking ID: 3919645)

SYMPTOM:
When the log owner is switched while the rlink is in DCM (Data Change Map) mode, IO hang may happen when the log owner switches back.

DESCRIPTION:
When a log owner switch happens in DCM mode, the reset of some RV kernel information is missed. When the log owner switches back after DCM replay, the resulting data inconsistency activates DCM again, and IO hang may happen.

RESOLUTION:
Code changes have been made to clear stale information during the log owner switch.

* 3922258 (Tracking ID: 3919643)

SYMPTOM:
When the log owner is switched while the rlink is in SRL (Storage Replicator Log) flush state, IO hang may happen when the log owner switches back.

DESCRIPTION:
The SRL flush start/end positions are stored in the RV kernel structure, and the reset of these positions is missed during the log owner switch. As a result, when the log owner switches back, this node continues to perform the SRL flush operation.
Since the SRL flush has already been completed by the other node and the SRL may contain new updates, the data read back from the SRL may contain invalid updates. The SRL flush cannot continue, hence the hang.

RESOLUTION:
Code changes have been made to clear stale information during a log owner switch.

* 3927482 (Tracking ID: 3895950)

SYMPTOM:
vxconfigd may hang suddenly. The following stack is seen in the threadlist:
slock()
.disable_lock()
volopen_isopen_cluster()
vol_get_dev()
volconfig_ioctl()
volsioctl_real()
volsioctl()
vols_ioctl()
rdevioctl()
spec_ioctl()
vnop_ioctl()
vno_ioctl()
common_ioctl(??, ??, ??, ??)

DESCRIPTION:
Some critical structures in the code are protected by a lock to prevent simultaneous modification. One such lock structure gets copied to local stack memory. The copy carries the state of the lock, and at the time of the copy the lock may be in an intermediate state. When a function then accesses the lock in the copy, the result can be a panic or a hang because the lock is in an unknown state.

RESOLUTION:
No other thread can modify the local copy of the structure, so acquiring the lock is not required while accessing it. The code is modified to access the local copy without acquiring the lock.

* 3931027 (Tracking ID: 3911930)

SYMPTOM:
Valid PGR operations sometimes fail on a dmpnode.

DESCRIPTION:
As part of the PGR operations, if the inquiry command finds that PGR is not supported on the dmpnode, the flag PGR_FLAG_NOTSUPPORTED is set on the dmpnode. Further PGR operations check this flag and issue PGR commands only if the flag is NOT set. The flag remains set even if the hardware is later changed to support PGR.

RESOLUTION:
A new command, enablepr, is provided in the vxdmppr utility to clear this flag on the specified dmpnode.

* 3931028 (Tracking ID: 3918356)

SYMPTOM:
zpools are imported automatically when DMP native support is set to on, which may lead to zpool corruption.
DESCRIPTION:
When DMP native support is set to on, all zpools are imported using DMP devices, so that any later import of the same zpool also uses the DMP device automatically. In a clustered environment, if imports of the same zpool are triggered on two different nodes at the same time, the zpool can be corrupted. A way is needed to prevent these automatic imports.

RESOLUTION:
Changes are made so that customers can prevent the zpools from being imported if required. To do so, set the variable auto_import_exported_pools to off in the file /var/adm/vx/native_input, as shown below:
bash:~# cat /var/adm/vx/native_input
auto_import_exported_pools=off

* 3931040 (Tracking ID: 3893150)

SYMPTOM:
The VxDMP (Veritas Dynamic Multi-Pathing) command 'vxdmpadm native ls' sometimes does not report the pool name of imported disks.

DESCRIPTION:
When a Solaris pool is imported with extra options such as -d or -R, the paths in 'zpool status ' can be full disk paths. The 'vxdmpadm native ls' command does not handle this case and therefore fails to report the pool name.

RESOLUTION:
Code changes have been made to correctly handle full disk paths when looking up the pool name.

Patch ID: VRTSvxvm-6.2.1.400

* 3913126 (Tracking ID: 3910432)

SYMPTOM:
When a mirror volume is created, two log plexes are created by default.

DESCRIPTION:
A log plex is required for mirror and RAID-5 volumes. Due to a bug in the code, two log plex records were created during volume creation.

RESOLUTION:
Code changes are done to create a single log plex by default.

* 3915780 (Tracking ID: 3912672)

SYMPTOM:
The "vxddladm assign names" command causes ASM disks to lose their ownership/permission settings, which may affect Oracle databases.

DESCRIPTION:
The "vxddladm assign names" command calls a function that creates raw and block device nodes and sets the user-group ownership/permissions according to the mode stored in an in-memory record structure.
However, the in-memory records have not been created (they are NULL) when that function is called, so no permissions are set after the device nodes are created.

RESOLUTION:
Code changes are done to make sure that the in-memory records of the user-group ownership/permissions of each dmpnode are created from the vxdmprawdev file before calling the function that creates the device nodes and sets permissions on them.

* 3915786 (Tracking ID: 3890602)

SYMPTOM:
The OS command cfgadm hangs after a reboot when hundreds of devices are under DMP control.

DESCRIPTION:
DMP generates the same entry for each of the 8 partitions of a device. The resulting large number of vxdmp properties that devfsadmd has to process causes anything that touches devlinks to hang temporarily behind it.

RESOLUTION:
Code changes have been done to reduce the property count by a factor of 8.

* 3917323 (Tracking ID: 3917786)

SYMPTOM:
Storing cold data on dedicated SAN storage increases storage cost and maintenance.

DESCRIPTION:
Local storage capacity is consumed by cold or legacy files that are rarely accessed or processed. These files occupy dedicated SAN storage space, which is expensive. Moving such files to public or private S3 cloud storage services is a more cost-effective solution. Additionally, cloud storage is elastic, allowing varying service levels as needs change. Operational charges apply for managing objects in buckets of public cloud services using the Storage Transfer Service.

RESOLUTION:
You can now migrate or move legacy data from local SAN storage to a target private or public cloud.

Patch ID: VRTSvxvm-6.2.1.300

* 3802857 (Tracking ID: 3726110)

SYMPTOM:
On systems with a high number of CPUs, DMP devices may perform considerably slower than OS device paths.

DESCRIPTION:
In high-CPU configurations, the I/O statistics functionality in DMP takes more CPU time because DMP statistics are collected on a per-CPU basis.
This statistics collection happens in the DMP I/O code path, so it reduces I/O performance. As a result, DMP devices perform slower than OS device paths.

RESOLUTION:
The code is modified to remove some of the statistics collection from the DMP I/O code path. In addition, the following settings need to be turned off:
1. Turn off idle LUN probing:
# vxdmpadm settune dmp_probe_idle_lun=off
2. Turn off statistics gathering:
# vxdmpadm iostat stop

Note:
Apply this patch if the system has a large number of CPUs and DMP performs considerably slower than OS device paths. This issue does not apply to systems with ordinary configurations.

* 3803497 (Tracking ID: 3802750)

SYMPTOM:
Once VxVM (Veritas Volume Manager) volume I/O-shipping is turned on, it is not disabled even after the user issues the correct command to disable it.

DESCRIPTION:
VxVM volume I/O-shipping is turned off by default. The following two commands turn it on and off:
vxdg -g <dgname> set ioship=on
vxdg -g <dgname> set ioship=off
The command to turn off I/O-shipping does not work as intended because the I/O-shipping flags are not reset properly.

RESOLUTION:
The code is modified to correctly reset the I/O-shipping flags when the user issues the CLI command.

* 3812192 (Tracking ID: 3764326)

SYMPTOM:
VxDMP repeatedly reports warning messages in the system log:
WARNING: VxVM vxdmp V-5-0-2046 : Failed to get devid for device 0x70259720
WARNING: VxVM vxdmp V-5-3-2065 dmp_devno_to_devidstr ldi_get_devid failed for devno 0x13800000a60

DESCRIPTION:
Due to a VxDMP code issue, the device path name is inconsistent between creation and deletion, which leaves stale device files under /devices. Also, because some devices do not support Solaris devid operations, the devid-related functions fail against such devices, and VxDMP does not skip such devices when creating or removing minor nodes.
RESOLUTION:
The code is modified to make the device path name consistent and to skip devid manipulation for third-party devices.

* 3847745 (Tracking ID: 3899198)

SYMPTOM:
VxDMP causes a system panic after a shutdown or reboot, with the following stack trace:
vpanic
volinfo_ioctl()
volsioctl_real()
ldi_ioctl()
dmp_signal_vold()
dmp_throttle_paths()
dmp_process_stats()
dmp_daemons_loop()
thread_start()

DESCRIPTION:
In a special system shutdown/reboot scenario, the DMP (Dynamic Multi-Pathing) restore daemon tries to call ioctl functions in the VXIO module while that module is being unloaded, which causes a system panic.

RESOLUTION:
The code is modified to stop the DMP I/O restore daemon before system shutdown/reboot.

* 3850890 (Tracking ID: 3603792)

SYMPTOM:
The first boot after a live upgrade to a new version of Solaris 11 and VxVM takes a long time because post-installation stalls for a long time.

DESCRIPTION:
On Solaris 11, the OS command devlinks, which was used to add /dev entries, stalled for a long time during VxVM post-installation. The OS command devfsadm should be used in the post-install script instead.

RESOLUTION:
The code is modified to replace devlinks with devfsadm in the VxVM post-installation process.

* 3851117 (Tracking ID: 3662392)

SYMPTOM:
In a CVM environment, if I/Os are executing on a slave node, corruption can occur while the vxdisk resize(1M) command is executing on the master node.

DESCRIPTION:
During the first stage of the resize transaction, the master node re-adjusts the disk offsets and the public/private partition device numbers. On a slave node, the public/private partition device numbers are not adjusted properly. Because of this, the partition starting offset is added twice, which causes the corruption. The window during which the public/private partition device numbers are adjusted is small, and corruption is observed only if I/O occurs during this window. After the resize operation completes, no further corruption occurs.
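The corruption mechanism in incident 3851117 can be illustrated with a small model. This is a minimal sketch, not the VxVM fix: IORequest, offset_applied, and the block numbers are hypothetical. It shows that the partition starting offset must be added to each I/O exactly once; if it is added twice, the I/O lands one full partition offset past its intended block and overwrites unrelated data.

```python
from dataclasses import dataclass


@dataclass
class IORequest:
    """Hypothetical I/O descriptor: a block number relative to the partition."""
    block: int
    offset_applied: bool = False  # has the partition start been added yet?


def apply_partition_offset(io: IORequest, partition_start: int) -> IORequest:
    """Translate io.block to an absolute disk block, adding the offset only once."""
    if not io.offset_applied:
        io.block += partition_start
        io.offset_applied = True
    return io


io = IORequest(block=100)
apply_partition_offset(io, 2048)
apply_partition_offset(io, 2048)  # a second translation pass is a no-op
print(io.block)  # 2148; adding the offset twice would have given 4196
```

Tracking whether the offset has been applied is only one way to make the translation safe to repeat; the patch notes do not say how VxVM itself guards against the double addition.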
RESOLUTION:
The code has been changed to add the partition starting offset correctly to an I/O on a slave node while a resize command is executing.

* 3852148 (Tracking ID: 3852146)

SYMPTOM:
In a CVM cluster, when a shared disk group is imported with both the -c and -o noreonline options, the following error may be returned:
VxVM vxdg ERROR V-5-1-10978 Disk group : import failed: Disk for disk group not found.

DESCRIPTION:
The -c option updates the disk ID and disk group ID in the private region of the disks in the disk group being imported. The slave does not yet see this updated information, because the disks have not been re-onlined (the noreonline option is specified). As a result, the slave cannot identify the disk(s) from the updated information sent by the master, and the import fails with the error "Disk for disk group not found."

RESOLUTION:
The code is modified so that the "-c" and "-o noreonline" options work together.

* 3854788 (Tracking ID: 3783356)

SYMPTOM:
After the DMP module fails to load, dmp_idle_vector is not NULL.

DESCRIPTION:
After a DMP module load failure, the DMP resources are not cleared from system memory, so some of them retain non-NULL values. When the system retries the load, it frees this invalid data, which leads to a system panic with the error message BAD FREE.

RESOLUTION:
The code is modified to clean up the DMP resources when a module load failure happens.

* 3863971 (Tracking ID: 3736502)

SYMPTOM:
When FMR is configured in a VVR environment, 'vxsnap refresh' fails with the following error message:
"VxVM VVR vxsnap ERROR V-5-1-10128 DCO experienced IO errors during the operation. Re-run the operation after ensuring that DCO is accessible".
Multiple connection/disconnection messages for the replication link (rlink) are also seen.

DESCRIPTION:
Internally triggered rlink connections/disconnections cause transaction retries.
During a transaction, memory is allocated for DCO (Data Change Object) maps, and this memory is not freed when the transaction aborts. This leads to a memory leak and eventually to exhaustion of the maps.

RESOLUTION:
A fix has been added to free the allocated DCO maps when a transaction aborts.

* 3871040 (Tracking ID: 3868444)

SYMPTOM:
The disk header timestamp is updated even if the disk group import fails.

DESCRIPTION:
During a dg import, the disk header timestamps are updated as part of the join operation. If the dg import fails, this makes it difficult for support to determine which disk has the latest config copy, and therefore whether a forced dg import is safe.

RESOLUTION:
The old disk header timestamp and sequence number are dumped to the syslog, where they can be consulted when deciding whether a forced dg import is safe.

* 3874737 (Tracking ID: 3874387)

SYMPTOM:
Disk header information is sometimes not logged to the syslog even if the disk is missing and the dg import fails.

DESCRIPTION:
When a disk has a config copy enabled and its disk record is active, the disk header information was not logged, even though the disk was missing and the dg import subsequently failed.

RESOLUTION:
The disk header information is now dumped even if the disk record is active and attached to the disk group.

* 3875933 (Tracking ID: 3737585)

SYMPTOM:
In a VVR environment, customers encounter an "Uncorrectable write error" or the following panic:
[000D50D0]xmfree+000050
[051221A0]vol_tbmemfree+0000B0
[05122294]vol_memfreesio_start+00001C
[05131324]voliod_iohandle+000050
[05131740]voliod_loop+0002D0
[05126438]vol_kernel_thread_init+000024

DESCRIPTION:
The IOHINT structure allocated by VxFS is also freed by VxFS once VxVM reports the IO as done.
IOs to VxVM with VVR need 2 phases: the SRL (Storage Replicator Log) write and the data volume write. VxFS considers the IO done after the SRL write and does not wait for the data volume write to complete. So if the data volume write starts or completes after VxFS has freed the IOHINT, the corrupted IOHINT information can cause a write IO error, or a double-free panic because memory is freed based on it.

RESOLUTION:
Code changes were done to clone the IOHINT structure before writing to the data volume.

* 3877637 (Tracking ID: 3878030)

SYMPTOM:
Enhance the VxVM (Veritas Volume Manager) DR (Dynamic Reconfiguration) tool to clean up the OS and VxDMP (Veritas Dynamic Multi-Pathing) device trees without user interaction.

DESCRIPTION:
When users add or remove LUNs, stale entries in the OS or VxDMP device trees can prevent VxVM from discovering the changed LUNs correctly. Under certain conditions, this even causes the VxVM vxconfigd process to dump core, and users have to reboot the system to restart vxconfigd. VxVM has a DR tool to help users add or remove LUNs properly, but it requires user input during operations.

RESOLUTION:
The VxVM DR tool has been enhanced. It accepts the '-o refresh' option to clean up the OS and VxDMP device trees without user interaction.

* 3879334 (Tracking ID: 3879324)

SYMPTOM:
The VxVM (Veritas Volume Manager) DR (Dynamic Reconfiguration) tool fails to handle busy devices when LUNs are removed from the OS.

DESCRIPTION:
OS devices may still be busy after they are removed from the OS. This makes the 'luxadm -e offline' operation fail and leaves stale entries in the 'vxdisk list' output, such as:
emc0_65535 auto - - error
emc0_65536 auto - - error

RESOLUTION:
Code changes have been done to handle busy devices.
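The IOHINT problem in incident 3875933 is a use-after-free across two completion phases: the owner releases the structure after phase 1 while phase 2 still needs it. The following sketch models it under stated assumptions: IOHint, replicated_write, and fs_release are hypothetical, and the clone is taken before the caller is told that the SRL write is done, which is one way to guarantee the copy is made while the original is still valid. The patch notes say only that the structure is cloned before the data volume write.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class IOHint:
    """Hypothetical stand-in for the IOHINT structure owned by the file system."""
    flags: int
    tags: List[str] = field(default_factory=list)


def replicated_write(hint: IOHint, on_srl_done: Callable[[IOHint], None]) -> IOHint:
    """Model a two-phase VVR write: SRL write first, then the data volume write."""
    private = copy.deepcopy(hint)  # clone while the caller's hint is still valid
    # Phase 1: SRL write (not modelled). The caller is then told the IO is done
    # and may free or reuse its hint immediately.
    on_srl_done(hint)
    # Phase 2: the data volume write reads only the private clone.
    return private


def fs_release(hint: IOHint) -> None:
    """Simulate the file system reclaiming its hint after the SRL write."""
    hint.flags = 0
    hint.tags.clear()


original = IOHint(flags=0x4, tags=["sync"])
used_for_data_write = replicated_write(original, fs_release)
print(used_for_data_write)  # IOHint(flags=4, tags=['sync'])
```

Because the data volume phase never touches the caller's object, releasing it early no longer affects the second write.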
* 3880573 (Tracking ID: 3886153)

SYMPTOM:
In a VVR primary-primary configuration, if the 'vrstat' command is running, vradmind may dump core with a stack like the following:
__assert_c99
StatsSession::sessionInitReq
StatsSession::processOpReq
StatsSession::processOpMsgs
RDS::processStatsOpMsg
DBMgr::processStatsOpMsg
process_message

DESCRIPTION:
The vrstat command initiates a StatsSession, which needs to send an initialization request to the secondary. On the secondary, an assert() ensures that it is a secondary that processes the request. In a primary-primary configuration, this assert fails and vradmind dumps core.

RESOLUTION:
Code changes have been made to fix the issue by returning a failure to the StatsSession initiator.

* 3881334 (Tracking ID: 3864063)

SYMPTOM:
Application I/O hangs after the Master Pause command is issued.

DESCRIPTION:
Some flags (VOL_RIFLAG_DISCONNECTING or VOL_RIFLAG_REQUEST_PENDING) in the VVR (Veritas Volume Replicator) kernel are not cleared because of a race between the Master Pause SIO and the Error Handler SIO. This prevents the RU (Replication Update) SIO from proceeding, which leads to an I/O hang.

RESOLUTION:
The code is modified to handle the race condition.

* 3881335 (Tracking ID: 3867236)

SYMPTOM:
Application IO hangs after the Master Pause command is issued.

DESCRIPTION:
The flag VOL_RIFLAG_REQUEST_PENDING in the VVR (Veritas Volume Replicator) kernel is not cleared because of a race between the Master Pause SIO and the RVWRITE1 SIO. This prevents the RU (Replication Update) SIO from proceeding, causing an IO hang.

RESOLUTION:
Code changes have been made to handle the race condition.

* 3889284 (Tracking ID: 3878153)

SYMPTOM:
The VVR (Veritas Volume Replicator) 'vradmind' daemon dumps core with the following stack:
#0 __kernel_vsyscall ()
#1 raise () from /lib/libc.so.6
#2 abort () from /lib/libc.so.6
#3 __libc_message () from /lib/libc.so.6
#4 malloc_printerr () from /lib/libc.so.6
#5 _int_free () from /lib/libc.so.6
#6 free () from /lib/libc.so.6
#7 operator delete(void*) () from /usr/lib/libstdc++.so.6
#8 operator delete[](void*) () from /usr/lib/libstdc++.so.6
#9 IpmHandle::~IpmHandle (this=0x838a1d8, __in_chrg=) at Ipm.C:2946
#10 IpmHandle::events (handlesp=0x838ee80, vlistsp=0x838e5b0, ms=100) at Ipm.C:644
#11 main (argc=1, argv=0xffffd3d4) at srvmd.C:703

DESCRIPTION:
Under certain circumstances, the 'vradmind' daemon may dump core while freeing a variable allocated on the stack.

RESOLUTION:
Code changes have been done to address the issue.

* 3889850 (Tracking ID: 3878911)

SYMPTOM:
The QLogic driver returns the following error due to an incorrect aiusize in the FC header:
FC_ELS_MALFORMED, cnt=c60h, size=314h

DESCRIPTION:
When the CT pass-through command to be sent is created, the ct_aiusize specified in the request header does not conform to the FT standard. Hence, the sanity check of the FT header in the OS layer reports an error, and get_topology() fails.

RESOLUTION:
Code changes have been done so that ct_aiusize complies with the FT standard.

* 3890666 (Tracking ID: 3882326)

SYMPTOM:
vxconfigd dumps core when a slice of a device is exported from the control domain to an LDOM (Logical Domain), with the following stack:
ddl_process_failed_node()
ddl_migration_devlist_removed()
ddl_reconfig_full()
ddl_reconfigure_all()
ddl_find_devices_in_system()
find_devices_in_system()
mode_set()
req_vold_enable()
request_loop()
main()
_start()

DESCRIPTION:
The core dump is caused by a NULL pointer dereference during reconfiguration. The issue occurs because ldi_get_devid fails for slice devices and works only for full devices. Even if ldi_get_devid fails, vxconfigd should not dump core.

RESOLUTION:
Code changes have been done to prevent the vxconfigd core dump when the failure happens.
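The fix for incident 3890666 amounts to treating a failed device-ID lookup as an expected outcome rather than dereferencing a missing result. The sketch below illustrates that pattern only; build_devid_map and get_devid are hypothetical (get_devid plays the role of ldi_get_devid failing for slice devices, with failure modelled as None), and the device names and IDs are made up.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def build_devid_map(
    devices: Iterable[str],
    get_devid: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Map device names to device IDs, collecting devices whose lookup fails."""
    mapped: Dict[str, str] = {}
    failed: List[str] = []
    for dev in devices:
        devid = get_devid(dev)
        if devid is None:
            # Record the failure and keep going instead of using a missing value.
            failed.append(dev)
            continue
        mapped[dev] = devid
    return mapped, failed


# Hypothetical data: the full device has an ID, the exported slice does not.
known_ids = {"c1d0": "devid-of-c1d0"}
mapped, failed = build_devid_map(["c1d0", "c1d0s4"], known_ids.get)
print(mapped, failed)
```

Returning the failed devices separately lets the caller log them or retry later, while reconfiguration continues for the devices that did resolve.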
* 3893134 (Tracking ID: 3864318)

SYMPTOM:
Memory consumption keeps increasing when reading/writing data on a VxVM volume with a large block size.

DESCRIPTION:
If an incoming IO is too large for a disk to handle, VxVM splits it into smaller IOs and allocates memory for those split IOs. Due to a code defect, the allocated memory is not freed when the split IOs complete.

RESOLUTION:
The code is modified to free the VxVM-allocated memory after the split IOs complete.

* 3893950 (Tracking ID: 3841242)

SYMPTOM:
Threads hang, and their stacks contain any of the following functions:
ddi_pathname_to_dev_t()
ddi_find_devinfo()
ddi_install_driver()
devinfo_tree_lock
e_ddi_get_dev_info()
The stack may look like the following:
void genunix:cv_wait
void genunix:ndi_devi_enter
int genunix:devi_config_one
int genunix:ndi_devi_config_one
int genunix:resolve_pathname_noalias
int genunix:resolve_pathname
dev_t genunix:ddi_pathname_to_dev_t
void vxdmp:dmp_setbootdev
int vxdmp:_init
int genunix:modinstall

DESCRIPTION:
Oracle has deprecated some APIs, namely ddi_pathname_to_dev_t(), ddi_find_devinfo(), ddi_install_driver(), devinfo_tree_lock, and e_ddi_get_dev_info(), which were used by VxVM (Veritas Volume Manager) and which are not thread safe. If VxVM modules are loaded in parallel with other OS modules while using these APIs, a deadlock may result and a hang can be observed.

RESOLUTION:
The deprecated ddi_x() API calls have been replaced with thread-safe ldi_x() calls.

* 3894783 (Tracking ID: 3628743)

SYMPTOM:
On Solaris 11.2, a new boot environment takes a long time to start up during live upgrade. A deadlock is seen in ndi_devi_enter() when the VxDMP driver is loaded; such deadlocks are caused by VxVM drivers using the Solaris private interfaces ddi_pathname_to_dev_t or ddi_hold_devi_by_path.
DESCRIPTION:
The deadlocks are caused by VxVM drivers using the Solaris private interfaces ddi_pathname_to_dev_t or e_ddi_hold_devi_by_path. These routines are for Solaris internal use only and are not multi-thread safe. Normally this is not a problem, because the various VxVM drivers do not unload or detach; however, under certain conditions the VxVM _init routines may be called, which can expose this deadlock.

RESOLUTION:
The code is modified to resolve the deadlock.

* 3897764 (Tracking ID: 3741003)

SYMPTOM:
In a CVM (Cluster Volume Manager) environment, after the storage is removed from one of the plexes of a mirrored DCO volume, the DCO volume is detached and the DCO object is marked with the BADLOG flag.

DESCRIPTION:
When the storage of one plex of a mirrored volume is removed, only that plex should be detached, not the entire volume. While a read IO is in progress on a failed DCO plex, the locally failed IO is restarted and shipped to other nodes for retry. It fails there too, because the storage has been removed from those nodes as well. Because a flag reset is missing, the failed IO returns an error, which causes the entire volume to be detached and marked with the BADLOG flag, even though the IO succeeds from an alternate plex.

RESOLUTION:
Code changes are added to handle this case, improving the resiliency of VxVM in partial-storage outage scenarios.

* 3898129 (Tracking ID: 3790136)

SYMPTOM:
A file system hang can sometimes be observed due to IOs hung in the DRL.

DESCRIPTION:
Some IOs may hang in the DRL of a mirrored volume because of an incorrect calculation of the number of outstanding IOs on the volume and of the number of active IOs currently in progress on the DRL. The outstanding IO count on the volume can be modified incorrectly, so the IOs on the DRL cannot progress, which results in a hang.

RESOLUTION:
Code changes have been done to avoid incorrect modification of the outstanding IO count on the volume and to prevent the hang.
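Incident 3898129 is a bookkeeping error: once the count of outstanding IOs drifts from reality, anything waiting for it to reach zero waits forever. The sketch below is a generic illustration, not the VxVM implementation; OutstandingIO and its methods are hypothetical. It keeps the count correct by updating it only under one lock and lets a drain wait, with a timeout, until the count reaches zero.

```python
import threading


class OutstandingIO:
    """In-flight IO count for one volume; every update happens under one lock."""

    def __init__(self) -> None:
        self._count = 0
        self._cond = threading.Condition()

    def start(self) -> None:
        with self._cond:
            self._count += 1

    def done(self) -> None:
        with self._cond:
            if self._count == 0:
                # A completion with no matching start would corrupt the count.
                raise RuntimeError("IO completion without a matching start")
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()  # wake anyone draining the volume

    def drain(self, timeout: float) -> bool:
        """Wait until no IO is outstanding; return False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout)

    @property
    def count(self) -> int:
        with self._cond:
            return self._count


def worker(counter: OutstandingIO, n: int) -> None:
    for _ in range(n):
        counter.start()
        counter.done()


counter = OutstandingIO()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.count, counter.drain(timeout=1.0))  # 0 True
```

Rejecting an unmatched completion makes the bookkeeping error fail loudly instead of silently leaving a count that can never drain.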
* 3898169 (Tracking ID: 3740730)

SYMPTOM:
When a volume is created using the vxassist CLI, the DCO log length specified on the command line is not honored. For example:
bash # vxassist -g make logtype=dco dcolen=
VxVM vxassist ERROR V-5-1-16707 Specified dcologlen() is less than minimum dcologlen(17152)

DESCRIPTION:
When a volume is created with the dcologlength attribute of the DCO volume in the vxassist CLI, the specified DCO log size is not parsed correctly. The code therefore compares an incorrectly calculated size against the minimum and reports that the specified size is insufficient.

RESOLUTION:
The code is changed to check whether the user-specified value meets the minimum threshold, so that the DCO log length specified in the vxassist CLI is honored.

* 3898296 (Tracking ID: 3767531)

SYMPTOM:
In a layered volume layout with an FSS configuration, when a few of the FSS hosts are rebooted, a full resync happens for the unaffected disks on the master.

DESCRIPTION:
In a configuration with multiple FSS hosts and a layered volume created on those hosts, when the slave nodes are rebooted, some sub-volumes on unaffected disks are fully resynchronized on the master.

RESOLUTION:
Code changes have been made to sync only the needed part of the sub-volume.

* 3902626 (Tracking ID: 3795739)

SYMPTOM:
In a split-brain scenario, cluster formation takes a very long time.

DESCRIPTION:
In a split-brain scenario, the surviving nodes in the cluster try to preempt the keys of the nodes leaving the cluster. If the keys have already been preempted by one of the surviving nodes, the other surviving nodes receive a Unit Attention. On receiving a Unit Attention, DMP (Dynamic Multi-Pathing) retries the preempt command after a delay of 1 second. Cluster formation cannot complete until the PGR keys of all the leaving nodes are removed from all the disks.
If the number of disks is very large, the preemption of keys takes a long time, which makes cluster formation very slow.

RESOLUTION:
The code is modified to avoid adding a delay for the first couple of retries when reading PGR keys. This allows faster cluster formation with arrays that clear the Unit Attention condition sooner.

* 3903647 (Tracking ID: 3868934)

SYMPTOM:
The system panics with a stack like the following while deactivating a VVR (Veritas Volume Replicator) batch write SIO:
panic_trap+000000
vol_cmn_err+000194
vol_rv_inactive+000090
vol_rv_batch_write_start+001378
voliod_iohandle+000050
voliod_loop+0002D0
vol_kernel_thread_init+000024

DESCRIPTION:
When VVR performs a batch write SIO and fails to reserve VVR IO memory, the SIO is put on a queue for restart and is then deactivated. If the deactivation is blocked for some time because it cannot get the lock, and during this period the SIO is restarted because the IO memory reservation request has been satisfied, the SIO is deactivated twice and becomes corrupted. This causes the system panic.

RESOLUTION:
Code changes have been made to remove the unnecessary SIO deactivation after a VVR IO memory reservation failure.

* 3904008 (Tracking ID: 3856146)

SYMPTOM:
Two issues are hit on the latest SRUs of Solaris 11.2.8 and later, and on Solaris SPARC 11.3, when dmp_native_support is on:
1. Turning dmp_native_support on or off requires a reboot. The system panics during the reboot when dmp_native_support is set to off.
2. Sometimes the system comes up after the reboot with dmp_native_support set to off. In that case, a panic is observed when the system is rebooted after SF is uninstalled, and the system fails to boot.
The panic string is the same for both issues:
panic[cpu0]/thread=20012000: read_binding_file: /etc/name_to_major file not found

DESCRIPTION:
The issue is caused by the handling of the /etc/system and /etc/name_to_major files.
According to the discussion with Oracle through SR (3-11640878941), removing these 2 files from the boot archive causes this panic. The files /etc/name_to_major and /etc/system are included in the SPARC boot_archive of Solaris 11.2.8.4.0 (and later versions) and should not be removed; the system fails to come up if they are removed.

RESOLUTION:
The code has been modified to avoid the panic when dmp_native_support is set to off.

* 3904017 (Tracking ID: 3853151)

SYMPTOM:
In a Root Disk Encapsulation (RDE) environment, vxrootadm join/split operations cause DMP I/O errors in the syslog, as follows:
NOTICE: VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x5) on dmpnode

DESCRIPTION:
When the split root disk group is joined back using the vxrootadm join dgname command, a DMP I/O error message is recorded in the syslog. This occurs because ioctls are issued to the disk while its partitions are being deleted.

RESOLUTION:
The code is modified so that the DMP I/O error message is not recorded in the syslog.

* 3904790 (Tracking ID: 3795788)

SYMPTOM:
Performance degradation is seen when many application sessions open the same data file on a Veritas Volume Manager (VxVM) volume.

DESCRIPTION:
This issue occurs because of lock contention. When many application sessions open the same data file on the VxVM volume, the exclusive lock is acquired on all CPUs. If there are many CPUs in the system, this can be quite time-consuming, which degrades performance when the applications first start.

RESOLUTION:
The code is modified to take a shared lock instead of the exclusive lock when the data file on the volume is opened.

* 3904796 (Tracking ID: 3853049)

SYMPTOM:
On a server with a large number of CPUs, the vxstat output is delayed beyond the set interval. Also, multiple vxstat sessions impact IO performance.

DESCRIPTION:
vxstat acquires an exclusive lock on each CPU to gather the statistics.
This affects the consolidation and display of the statistics in an environment with a huge number of CPUs and disks: the output for a 1-second interval can be delayed beyond the set interval. Also, the locks are acquired in the IO path, so contention on them affects IO performance.

RESOLUTION:
The code is modified to remove the exclusive spin lock.

* 3904797 (Tracking ID: 3857120)

SYMPTOM:
If the volume is shared in a CVM configuration, the following stack traces are seen under a vxiod daemon, indicating an attempt to drain I/O. In this case, the CVM halt is blocked and eventually times out. The stack trace may appear as:
sleep+0x3f0
vxvm_delay+0xe0
volcvm_iodrain_dg+0x150
volcvmdg_abort_complete+0x200
volcvm_abort_sio_start+0x140
voliod_iohandle+0x80
or
cv_wait+0x3c()
delay_common+0x6c
vol_mv_close_check+0x68
vol_close_device+0x1e4
vxioclose+0x24
spec_close+0x14c
fop_close+0x8c
closef2+0x11c
closeall+0x3c
proc_exit+0x46c
exit+8
post_syscall+0x42c
syscall_trap+0x188
Because vxconfigd is busy in a transaction trying to close the volume or drain the IO, all other threads that send requests to vxconfigd hang.

DESCRIPTION:
VxVM maintains a count of the in-progress I/O on the volume. When two VxVM threads manipulate the I/O count on the volume asynchronously, a race between them can leave a stale I/O count on the volume even though the volume has actually completed all I/Os. Because of this invalid pending I/O count, the volume cannot be closed.

RESOLUTION:
The VxVM code that manipulates the I/O count has been fixed to avoid the race condition between the two threads.

* 3904800 (Tracking ID: 3860503)

SYMPTOM:
vxassist mirroring performs poorly compared to mirroring with the raw dd utility.
DESCRIPTION:
There is heavy lock contention on high-end servers with a large number of CPUs, because copying each region requires obtaining some unnecessary CPU locks.

RESOLUTION:
The VxVM code has been changed to reduce the lock contention.

* 3904801 (Tracking ID: 3686698)

SYMPTOM:
vxconfigd hangs due to a deadlock between two threads.

DESCRIPTION:
Two threads wait for the same lock, causing a deadlock between them, which blocks all vx commands. The untimeout function does not return until the pending callback (set through the timeout function) is cancelled or, if it has already started, has completed its execution. Therefore, locks acquired by the callback routine must not be held across a call to untimeout, or a deadlock may result.
Thread 1:
untimeout_generic()
untimeout()
voldio()
volsioctl_real()
fop_ioctl()
ioctl()
syscall_trap32()
Thread 2:
mutex_vector_enter()
voldsio_timeout()
callout_list_expire()
callout_expire()
callout_execute()
taskq_thread()
thread_start()

RESOLUTION:
Code changes have been made to call untimeout outside the lock taken by the callback handler.

* 3904802 (Tracking ID: 3721565)

SYMPTOM:
vxconfigd hangs with the following stack:
genunix:cv_wait_sig_swap_core
genunix:cv_wait_sig_swap
genunix:pause
unix:syscall_trap32

DESCRIPTION:
In an FMR environment, a write is done on a source volume that has a space-optimized (SO) snapshot. Memory is acquired first, and then ILOCKs are acquired on the individual SO volumes for the pushed writes. In contrast, a user write on the SO snapshot first acquires the ILOCK and then acquires memory. This opposite ordering causes a deadlock.

RESOLUTION:
The code is modified to resolve the deadlock.

* 3904804 (Tracking ID: 3486861)

SYMPTOM:
The primary node panics with the following stack when storage is removed while replication is in progress with heavy IO.
Stack:
oops_end
no_context
page_fault
vol_rv_async_done
vol_rv_flush_loghdr_done
voliod_iohandle
voliod_loop

DESCRIPTION:
In a VVR environment, when a write to a data volume fails on the primary node, error handling is initiated. As part of it, the SRL header is flushed. Because the primary storage has been removed, the flush fails, and the panic is hit when invalid values are accessed while logging the error message.

RESOLUTION:
The code is modified to resolve the issue.

* 3904805 (Tracking ID: 3788644)

SYMPTOM:
When DMP (Dynamic Multi-Pathing) native support is enabled for an Oracle ASM environment, constantly adding and removing DMP devices causes errors like:

/etc/vx/bin/vxdmpraw enable oracle dba 775 emc0_3f84
VxVM vxdmpraw INFO V-5-2-6157 Device enabled : emc0_3f84
Error setting raw device (Invalid argument)

DESCRIPTION:
There is a limit (8192) on the maximum raw device number N (exclusive) of /dev/raw/rawN. This limit is defined in the boot configuration file. When binding a raw device to a dmpnode, /dev/raw/rawN is used to bind the dmpnode. The rawN number is calculated by a one-way incremental process, so even if the device is unbound later, the "released" rawN number is not reused in the next binding. When the rawN number grows beyond the maximum limit, the error is reported.

RESOLUTION:
The code has been changed to always use the smallest available rawN number instead of calculating it by a one-way incremental process.

* 3904806 (Tracking ID: 3807879)

SYMPTOM:
Writing the backup EFI GPT disk label during the disk-group flush operation may cause data corruption on volumes in the disk group. The backup label could incorrectly get flushed to the disk public region and overwrite the user data with the backup disk label.
DESCRIPTION:
For EFI disks initialized under VxVM (Veritas Volume Manager), it is observed that during a disk-group flush operation, vxconfigd (Veritas configuration daemon) could incorrectly write the EFI GPT backup label to the volume public region, thereby causing user data corruption. When this issue happens, the real user data is replaced with the backup EFI disk label.

RESOLUTION:
The code is modified to prevent the writing of the EFI GPT backup label during the VxVM disk-group flush operation.

* 3904807 (Tracking ID: 3867145)

SYMPTOM:
When VVR SRL occupancy is above 90%, the SRL occupancy is reported only in 10 percent steps.

DESCRIPTION:
This is an enhancement. SRL occupancy above 90% was previously shown with a 10 percent gap; the enhancement shows it with 1 percent granularity.

RESOLUTION:
Changes are done to show the syslog messages with 1 percent granularity when the SRL is more than 90% full.

* 3904810 (Tracking ID: 3871750)

SYMPTOM:
Parallel VxVM (Veritas Volume Manager) vxstat commands report abnormal disk I/O statistics data, like below:

# /usr/sbin/vxstat -g -u k -dv -i 1 -S
......
dm emc0_2480 4294967210 4294962421 -382676k 4294967.38 4294972.17
......

DESCRIPTION:
After VxVM I/O statistics were optimized for huge numbers of CPUs and disks, there is a race condition when multiple vxstat commands are running to collect disk I/O statistics data. It causes a disk's latest I/O statistic value to become smaller than the previous one, so VxVM treats the value as having overflowed and prints an abnormally large I/O statistic value.

RESOLUTION:
Code changes are done to eliminate the race condition.

* 3904811 (Tracking ID: 3875563)

SYMPTOM:
While dumping the disk header information, the human-readable timestamp was not converted correctly from the corresponding epoch time.

DESCRIPTION:
When a disk group import fails because one of the disks is missing, the disk header information is dumped to the syslog.
However, the human-readable timestamp was not converted correctly from the corresponding epoch time.

RESOLUTION:
Code changes are done to dump the disk header information correctly.

* 3904819 (Tracking ID: 3811946)

SYMPTOM:
When the "vxsnap make" command is invoked with the cachesize option to create a space-optimized snapshot, the command succeeds but the following error messages are displayed in syslog:

kernel: VxVM vxio V-5-0-603 I/O failed. Subcache object <subcache-name> does not have a valid sdid allocated by cache object <cache-name>.
kernel: VxVM vxio V-5-0-1276 error on Plex <plex-name> while writing volume <volume-name> offset 0 length 2048

DESCRIPTION:
When a space-optimized snapshot is created using the "vxsnap make" command with the cachesize option, the cache and subcache objects are created by the same command. During the creation of the snapshot, I/Os from the volumes may be pushed onto a subcache even though the subcache ID has not yet been allocated. As a result, the I/O fails.

RESOLUTION:
The code is modified to make sure that I/Os are pushed onto the subcache only after the subcache ID has been allocated.

* 3904822 (Tracking ID: 3755209)

SYMPTOM:
The Dynamic Multi-Pathing (DMP) device configured in a Solaris LDOM guest is disabled when an active controller of an ALUA array fails.

DESCRIPTION:
DMP in the guest environment monitors the cached target port IDs of virtual paths in the LDOM. If a controller of an ALUA array fails for some reason, the active/primary target port ID of the array changes in the I/O domain, leaving a stale entry in the guest. DMP in the guest wrongly interprets this target port change and marks the path as unavailable. This causes I/O on the path to fail, and as a result the DMP device is disabled in the LDOM.

RESOLUTION:
The code is modified to not use the cached target port IDs for LDOM virtual disks.
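The rawN allocation change in incident 3904805 (a one-way incrementing counter replaced by reuse of the smallest free number) can be illustrated with a small Python model. This is a sketch only: the class names and `churn()` helper are hypothetical, not VxVM code, the limit of 8192 (exclusive) comes from the incident description, and starting N at 1 is an assumption.

```python
import errno


class IncrementalAllocator:
    """Pre-fix behaviour as described: N only grows, freed N is never reused."""

    def __init__(self, limit=8192, first=1):
        self.limit = limit   # N must stay below this value (exclusive)
        self.next_n = first  # assumed lowest usable N

    def bind(self):
        if self.next_n >= self.limit:
            raise OSError(errno.EINVAL, "rawN would reach the limit")
        n = self.next_n
        self.next_n += 1
        return n

    def unbind(self, n):
        pass  # the released number is forgotten and never handed out again


class SmallestFreeAllocator:
    """Post-fix behaviour as described: always reuse the smallest free N."""

    def __init__(self, limit=8192, first=1):
        self.limit = limit
        self.first = first
        self.in_use = set()

    def bind(self):
        # Linear scan for the smallest N that is not currently bound.
        for n in range(self.first, self.limit):
            if n not in self.in_use:
                self.in_use.add(n)
                return n
        raise OSError(errno.EINVAL, "all rawN numbers below the limit are bound")

    def unbind(self, n):
        self.in_use.discard(n)


def churn(alloc, cycles):
    """Bind and immediately unbind one device `cycles` times.

    Returns how many bind calls succeeded before the allocator gave up.
    """
    done = 0
    for _ in range(cycles):
        try:
            n = alloc.bind()
        except OSError:
            break
        alloc.unbind(n)
        done += 1
    return done
```

For example, `churn(IncrementalAllocator(), 10000)` stops after 8191 successful binds even though at most one device is bound at any time, while `churn(SmallestFreeAllocator(), 10000)` completes all 10000 cycles. The linear scan keeps the sketch obvious; a real allocator might prefer a bitmap or heap.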
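The abnormal figures in the vxstat output shown for incident 3904810 are what unsigned 32-bit arithmetic produces when a counter appears to go backwards: 4294967210 is 2^32 - 86. The Python sketch below reproduces only that arithmetic, assuming 32-bit unsigned counters; the function name and the sample values 186 and 100 are invented for illustration, and the fix described in the incident removes the race rather than changing this calculation.

```python
MASK32 = 0xFFFFFFFF  # counters assumed to be unsigned 32-bit


def counter_delta_u32(previous, latest):
    """Per-interval delta of an unsigned 32-bit counter.

    Taking the difference modulo 2**32 is correct for a real wrap-around
    (the counter passed 0xFFFFFFFF and restarted at 0), but if `latest`
    is only smaller because a racing reader saw an older value, the same
    arithmetic produces a value just below 2**32.
    """
    return (latest - previous) & MASK32


# Genuine wrap: 0xFFFFFFF0 -> 16 is 32 increments.
print(counter_delta_u32(0xFFFFFFF0, 16))  # 32
# Counter appears to go backwards by 86 because of the race.
print(counter_delta_u32(186, 100))        # 4294967210
```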
* 3904824 (Tracking ID: 3795622)

SYMPTOM:
With Dynamic Multi-Pathing (DMP) Native Support enabled, the LVM global_filter is not updated properly in the lvm.conf file to reject newly added paths.

DESCRIPTION:
With DMP Native Support enabled, when new paths are added to existing LUNs, the LVM global_filter is not updated properly in the lvm.conf file to reject the newly added paths. This can lead to "duplicate PV (physical volume) found" errors reported by LVM commands.

RESOLUTION:
The code is modified to properly update the global_filter field in the lvm.conf file when new paths are added to existing disks.

* 3904825 (Tracking ID: 3859009)

SYMPTOM:
The pvs command shows duplicate PV messages because the global_filter of lvm.conf is not updated after a fiber switch or storage controller is rebooted.

DESCRIPTION:
When a fiber switch or storage controller reboots, some path device numbers may get reused during the DDL reconfiguration cycle. In this case, VxDMP (Veritas Dynamic Multi-Pathing) does not treat them as newly added devices, so for devices belonging to an LVM dmpnode, VxDMP does not trigger an lvm.conf update. As a result, the global_filter of lvm.conf is not updated.

RESOLUTION:
The code has been changed to update lvm.conf correctly.

* 3904833 (Tracking ID: 3729078)

SYMPTOM:
In a VVR environment, a panic may occur after SF (Storage Foundation) patch installation or uninstallation on the secondary site.

DESCRIPTION:
The VXIO kernel reset invoked by SF patch installation removes all disk group objects that do not have the preserve flag set. Because the preserve flag overlaps with the RVG (Replicated Volume Group) logging flag, the RVG object is not removed but its rlink object is, resulting in a system panic when VVR is started.

RESOLUTION:
Code changes have been made to fix this issue.

* 3904834 (Tracking ID: 3819670)

SYMPTOM:
When smartmove with the 'vxevac' command is run in the background by pressing Ctrl-Z and then running 'bg', the execution of 'vxevac' is terminated abruptly.
DESCRIPTION:
As part of the "vxevac" command for data movement, VxVM submits the data movement as a task in the kernel and uses the select() primitive on the task file descriptor to wait for task-completion events to arrive. When Ctrl-Z and bg are used to run vxevac in the background, select() returns -1 with errno EINTR. VxVM wrongly interprets this as a user termination action, and hence vxevac is terminated. Instead of terminating vxevac, select() should be retried until the task completes.

RESOLUTION:
The code is modified so that when select() returns with errno EINTR, it checks whether the vxevac task is finished. If not, select() is retried.

* 3904840 (Tracking ID: 3769927)

SYMPTOM:
Turning off the dmp_native_support tunable fails with the following errors:

VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more zpools
VxVM vxdmpadm ERROR V-5-1-15686 The following zpool(s) could not be migrated as they are not healthy - <zpool_name>.

DESCRIPTION:
Turning off the dmp_native_support tunable fails even if the zpools are healthy. The vxnative script does not allow turning off dmp_native_support if it detects that a zpool is unhealthy, that is, the zpool state is ONLINE and some action is required to be taken on the zpool. "upgrade zpool" is considered one of the actions indicating an unhealthy zpool state, which is not correct.

RESOLUTION:
The code is modified to treat the "upgrade zpool" action as expected. Turning off the dmp_native_support tunable is supported if the action is "upgrade zpool".

* 3904851 (Tracking ID: 3804214)

SYMPTOM:
The VxDMP (Dynamic Multi-Pathing) path enable operation fails after the disk label is changed from the guest LDOM. Open fails with error 5 (EIO) on the path being enabled.
The following error messages can be seen in /var/adm/messages:

vxdmp: [ID 808364 kern.notice] NOTICE: VxVM vxdmp V-5-3-0 dmp_open_path: Open failed with 5 for path 237/0x30
vxdmp: [ID 382146 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 [Warn] disabled path 237/0x30 belonging to the dmpnode 307/0x38 due to open failure

DESCRIPTION:
While a disk is exported to the Solaris LDOM, Solaris OS in the control/IO domain holds a NORMAL mode open on the existing partitions of the DMP node. If the disk partitions/label are changed from the LDOM such that some of the older partitions are removed, Solaris OS in the control/IO domain does not know about this change and continues to hold the NORMAL mode open on those deleted partitions. If a disabled DMP path is enabled in this scenario, the NORMAL mode open on the path fails and the path enable operation errors out. This can be worked around by detaching and reattaching the disk to the LDOM. Due to a problem in DMP code, the stale NORMAL mode open flag was not reset even when the DMP disk was detached from the LDOM. This prevented the DMP path from being enabled even after the DMP disk was detached from the LDOM.

RESOLUTION:
The code was fixed to reset the NORMAL mode open when the DMP disk is detached from the LDOM. With this fix, the DMP disk has to be reattached to the LDOM only once after the disk labels change. When the disk is reattached, it gets the correct open mode (NORMAL/NDELAY) on the partitions that exist after the label change.

* 3904858 (Tracking ID: 3899568)

SYMPTOM:
By design, "vxdmpadm iostat stop" cannot stop iostat gathering persistently. To avoid performance and memory-crunch issues, it is generally recommended to stop iostat gathering, so there is a requirement to provide the ability to stop/start iostat gathering persistently in those cases.

DESCRIPTION:
Currently, the DMP iostat daemon is stopped using "vxdmpadm iostat stop", but this is not a persistent setting.
The setting is lost after a reboot, so customers also have to put the command in init scripts at the appropriate place for a persistent effect.

RESOLUTION:
The code is modified to provide a tunable, "dmp_compute_iostats", which can start/stop iostat gathering persistently.

Notes:
Use the following command to start/stop iostat gathering persistently:
# vxdmpadm settune dmp_compute_iostats=on/off

* 3904859 (Tracking ID: 3901633)

SYMPTOM:
Many error messages like the following are reported while performing RVG sync:

VxVM VVR vxrsync ERROR V-5-52-2027 getdigest response err [192.168.10.101:/dev/vx/dsk/testdg/v1 <- 192.168.10.105:/dev/vx/dsk/testdg/v1] [[ndigests sent=-1 ndigests received=0]]
VxVM VVR vxrsync ERROR V-5-52-2027 getdigest response err [192.168.10.101:/dev/vx/dsk/testdg/v1 <- 192.168.10.105:/dev/vx/dsk/testdg/v1] [[ndigests sent=-2 ndigests received=0]]

DESCRIPTION:
While reading and syncing the last volume region, the volume end offset is calculated incorrectly, which may lead to reading and syncing past the volume end. As a result, an internal variable becomes negative and vxrsync reports errors. This can happen if the volume size is not a multiple of 512KB and the last 512KB volume region is partly in use by VxFS.

RESOLUTION:
Code changes have been done to fix the issue.

* 3904861 (Tracking ID: 3904538)

SYMPTOM:
An RV (Replicated Volume) I/O hang happens during slave node leave or master node switch.

DESCRIPTION:
The RV I/O hang happens because the SRL (Storage Replicator Log) header is updated by the RV recovery SIO. After a slave node leaves or the master node switches, RV recovery can be initiated. During RV recovery, all new incoming I/Os should be quiesced by setting the NEED RECOVERY flag on the RV to avoid races. Due to a code defect, this flag is removed by transaction commit, resulting in a conflict between new I/Os and the RV recovery SIO.

RESOLUTION:
Code changes have been made to fix this issue.
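The last-region problem in incident 3904859 comes down to clamping the final region at the volume end. Below is a minimal Python sketch of that calculation, assuming fixed 512KB regions as in the description; the function name is hypothetical and this is not the vxrsync implementation.

```python
REGION_SIZE = 512 * 1024  # 512KB sync region, per the incident description


def sync_regions(volume_size, region_size=REGION_SIZE):
    """Yield (offset, length) pairs that cover [0, volume_size) exactly.

    The final region is clamped to the volume end, so nothing past the
    end of the volume is read or synced and no length goes negative.
    """
    offset = 0
    while offset < volume_size:
        length = min(region_size, volume_size - offset)
        yield offset, length
        offset += length
```

For a volume of 2 x 512KB + 1000 bytes, `list(sync_regions(1049576))` gives `[(0, 524288), (524288, 524288), (1048576, 1000)]`: the last region is 1000 bytes rather than a full 512KB.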
* 3904863 (Tracking ID: 3851632)

SYMPTOM:
When localized messages are used, some VxVM commands fail while mirroring the volume through vxdiskadm. The error message is similar to the following:

? [y, n, q,?] (: y) y
/usr/lib/vxvm/voladm.d/bin/disk.repl: test: unknown operator 1

DESCRIPTION:
The issue occurs when the output of the vxdisk list command appears in a localized format. When the output is not in English, it does not match the expected messages and the command fails.

RESOLUTION:
The code is modified to convert the output of the necessary commands in the scripts into English before comparing it with the expected output.

* 3904864 (Tracking ID: 3769303)

SYMPTOM:
The system panics when the CVM group is brought online, with the following stack:

voldco_acm_pagein
voldco_write_pervol_maps_instant
voldco_map_update
voldco_write_pervol_maps
volfmr_copymaps_instant
vol_mv_get_attmir
vol_subvolume_get_attmir
vol_plex_get_attmir
vol_mv_fmr_precommit
vol_mv_precommit
vol_commit_iolock_objects
vol_ktrans_commit
volconfig_ioctl
ns_capable
volsioctl_real
mntput
path_put
vfs_fstatat
from_kgid_munged
read_tsc
vols_ioctl
vols_compat_ioctl
compat_sys_ioctl
sysenter_dispatch
voldco_get_accumulator

DESCRIPTION:
In the case of layered volumes, when the 'vxvol' command is triggered through the 'vxrecover' command with the '-Z vols' (implicit) option, only the volumes passed through the CLI are started; the respective top-level volumes remain unstarted. As a result, the associated DCO volumes also remain unstarted. At this point, if any plex of a sub-volume needs to be attached back, vxrecover triggers it. With DCO version 30, the vxplex command tries to perform some map manipulation as part of the plex-attach transaction. If the DCO volume is not started before the plex attach, the in-core DCO contents are improperly loaded, which leads to the panic.

RESOLUTION:
The code is modified to handle starting the appropriate associated volumes of a layered volume group.
* 3905471 (Tracking ID: 3868533)

SYMPTOM:
An I/O hang happens when starting replication. The VXIO daemon hangs with a stack like the following:

vx_cfs_getemap at ffffffffa035e159 [vxfs]
vx_get_freeexts_ioctl at ffffffffa0361972 [vxfs]
vxportalunlockedkioctl at ffffffffa06ed5ab [vxportal]
vxportalkioctl at ffffffffa06ed66d [vxportal]
vol_ru_start at ffffffffa0b72366 [vxio]
voliod_iohandle at ffffffffa09f0d8d [vxio]
voliod_loop at ffffffffa09f0fe9 [vxio]

DESCRIPTION:
While performing DCM replay with the Smart Move feature enabled, the VxIO kernel needs to issue an IOCTL to the VxFS kernel to get the file system's free regions. To complete this IOCTL, the VxFS kernel needs to clone the map by issuing I/O to the VxIO kernel. If an RLINK disconnection happens at that moment, the RV is serialized to complete the disconnection. While the RV is serialized, all I/Os, including the clone map I/O from VxFS, are queued to rv_restartq, hence the deadlock.

RESOLUTION:
Code changes have been made to handle the deadlock situation.

* 3906251 (Tracking ID: 3806909)

SYMPTOM:
During installation of Volume Manager using CPI in key-less mode, the following logs were observed:

VxVM vxconfigd DEBUG V-5-1-5736 No BASIC license
VxVM vxconfigd ERROR V-5-1-1589 enable failed: License has expired or is not available for operation transactions are disabled.

DESCRIPTION:
While using CPI for a STANDALONE DMP installation in key-less mode, the Volume Manager daemon (vxconfigd) could not be started, because a modification in the DMP NATIVE license string used for license verification caused the verification to fail.

RESOLUTION:
Appropriate code changes are incorporated so that the DMP key-less license works with STANDALONE DMP.

* 3907017 (Tracking ID: 3877571)

SYMPTOM:
The disk header is updated even if the dg import operation fails.

DESCRIPTION:
When a dg import fails because of a disk failure, forcefully importing the dg requires checking which disks have the latest configuration copy.
However, it is very difficult to decide which disk to choose without disk header update logs.

RESOLUTION:
Improved the logging to track disk header changes.

* 3907593 (Tracking ID: 3660869)

SYMPTOM:
Enhance the DRL dirty-ahead logging for sequential write workloads.

DESCRIPTION:
With the current DRL implementation, when sequential hints are passed by the file system layer above, further regions in the DRL are dirtied ahead so that a DRL write is saved when new I/O on those regions arrives. However, the current design has a flaw: the number of I/Os on the DRL is similar to the number of I/Os on the data volume, because the same region is dirtied again and again as part of the DRL I/O. This can also cause a performance hit.

RESOLUTION:
To improve performance, the number of I/Os on the DRL is reduced by enhancing the implementation of dirty-ahead logging with DRL.

Patch ID: VRTSvxvm-6.2.1.200

* 3795710 (Tracking ID: 3508122)

SYMPTOM:
When running vxfentsthdw, the I/O on the victim node is expected to fail during the preempt key operation, but sometimes it does not fail.

DESCRIPTION:
After node 1 preempts the SCSI-3 reservations for node 2, write I/Os from victim node 2 are expected to fail. It is observed that sometimes the storage does not preempt all the keys of the victim node fast enough, but it fails the I/O with a reservation conflict. In such a case, the victim node cannot correctly identify that it has been preempted and can still re-register keys to perform the I/O.

RESOLUTION:
The code is modified to correctly identify that the SCSI-3 keys of a node have been preempted.

* 3802857 (Tracking ID: 3726110)

SYMPTOM:
On systems with a high number of CPUs, DMP devices may perform considerably slower than OS device paths.

DESCRIPTION:
In high-CPU configurations, the I/O statistics functionality in DMP takes more CPU time because DMP statistics are collected on a per-CPU basis.
This stat collection happens in the DMP I/O code path, so it reduces I/O performance. Because of this, DMP devices perform slower than OS device paths.

RESOLUTION:
The code is modified to remove some of the stats collection functionality from the DMP I/O code path. Along with this, the following tunables need to be turned off:
1. Turn off idle LUN probing.
# vxdmpadm settune dmp_probe_idle_lun=off
2. Turn off the statistics gathering functionality.
# vxdmpadm iostat stop

Notes:
1. Please apply this patch if the system configuration has a large number of CPUs and DMP performs considerably slower than OS device paths. For normal systems this issue is not applicable.

* 3803497 (Tracking ID: 3802750)

SYMPTOM:
Once VxVM (Veritas Volume Manager) volume I/O-shipping functionality is turned on, it does not get disabled even after the user issues the correct command to disable it.

DESCRIPTION:
VxVM volume I/O-shipping functionality is turned off by default. The following two commands can be used to turn it on and off:
vxdg -g <dgname> set ioship=on
vxdg -g <dgname> set ioship=off
The command to turn off I/O-shipping does not work as intended because the I/O-shipping flags are not reset properly.

RESOLUTION:
The code is modified to correctly reset the I/O-shipping flags when the user issues the CLI command.

* 3804299 (Tracking ID: 3804298)

SYMPTOM:
In a CVM (Cluster Volume Manager) environment, the setting/unsetting of the 'lfailed/lmissing' flag is not recorded in the syslog.

DESCRIPTION:
In a CVM environment, when VxVM (Veritas Volume Manager) discovers that a disk is not accessible from a node of the CVM cluster, it marks the LFAILED (locally failed) flag on the disk. When VxVM discovers that a disk is not discovered by DMP (Dynamic Multipathing) on a node of the CVM cluster, it marks the LMISSING (locally missing) flag on the disk. Messages about setting and unsetting the 'lmissing/lfailed' flag are not recorded in the syslog.
RESOLUTION:
The code is modified to record the setting and unsetting of the 'lfailed/lmissing' flag in syslog.

* 3812192 (Tracking ID: 3764326)

SYMPTOM:
VxDMP repeatedly reports warning messages in the system log:

WARNING: VxVM vxdmp V-5-0-2046 : Failed to get devid for device 0x70259720
WARNING: VxVM vxdmp V-5-3-2065 dmp_devno_to_devidstr ldi_get_devid failed for devno 0x13800000a60

DESCRIPTION:
Due to a VxDMP code issue, the device path name is inconsistent between creation and deletion, which leaves a stale device file under /devices. Because some devices do not support Solaris devid operations, devid-related functions fail against such devices, and VxDMP does not skip such devices when creating or removing minor nodes.

RESOLUTION:
The code is modified to address the device path name inconsistency and to skip devid manipulation for third-party devices.

* 3835562 (Tracking ID: 3835560)

SYMPTOM:
Auto-import of the diskgroup fails if some of the disks in the diskgroup are missing.

DESCRIPTION:
Veritas Volume Manager (VxVM) does not auto-import a diskgroup if a few of its disks are missing, because this can lead to data corruption. In some situations, however, auto-import of the diskgroup may be required even with missing disks, and a switch for auto-import is helpful in such cases.

RESOLUTION:
The code is modified to add the "forceautoimport" tunable for cases where auto-import of the diskgroup is required even with missing disks. It can be set with "vxtune forceautoimport on"; its value is off by default.
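The goal of the DRL change in incident 3907593 (one DRL write per batch of newly dirtied regions, not one per data write) can be illustrated with a toy Python model. It is a hypothetical sketch, not the VxVM implementation: the class, the `dirty_ahead` look-ahead count, and the region size are assumptions, and the model never cleans dirty bits (a real DRL clears regions after the writes to them complete).

```python
class DirtyRegionLog:
    """Toy dirty region log with dirty-ahead for sequential writes."""

    def __init__(self, region_size, dirty_ahead=4):
        self.region_size = region_size
        self.dirty_ahead = dirty_ahead  # extra regions dirtied per DRL write
        self.dirty = set()
        self.drl_writes = 0

    def write(self, offset, sequential=False):
        region = offset // self.region_size
        if region in self.dirty:
            return  # already logged: this data write needs no DRL I/O
        extra = self.dirty_ahead if sequential else 0
        self.dirty.update(range(region, region + 1 + extra))
        self.drl_writes += 1  # one DRL write records all newly dirty regions


def run_sequential(drl, n_writes, io_size, sequential=True):
    """Issue back-to-back writes of io_size bytes; return the DRL write count."""
    for i in range(n_writes):
        drl.write(i * io_size, sequential=sequential)
    return drl.drl_writes
```

With 64KB regions and a look-ahead of 4, `run_sequential(DirtyRegionLog(65536, 4), 100, 65536)` issues 20 DRL writes for 100 sequential data writes, while the same workload without the sequential hint issues 100.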
* 3847745 (Tracking ID: 3677359)

SYMPTOM:
VxDMP causes a system panic after a shutdown or reboot with the following stack trace:

mutex_enter()
volinfo_ioct()
volsioctl_real()
cdev_ioctl()
dmp_signal_vold()
dmp_throttle_paths()
dmp_process_stats()
dmp_daemons_loop()
thread_start()

or

panicsys()
vpanic_common()
panic+0x1c()
mutex_enter()
cdev_ioctl()
dmp_signal_vold()
dmp_check_path_state()
dmp_restore_callback()
dmp_process_scsireq()
dmp_daemons()
thread_start()

DESCRIPTION:
In a special scenario of system shutdown or reboot, the DMP (Dynamic MultiPathing) I/O statistics daemon tries to call ioctl functions in the VxIO module, which is being unloaded. As a result, the system panics.

RESOLUTION:
The code is modified to stop the DMP I/O statistics daemon and the DMP restore daemon before system shutdown or reboot. Also, the code is modified to avoid other probes to VxIO devices during shutdown.

* 3850890 (Tracking ID: 3603792)

SYMPTOM:
The first boot after a live upgrade to a new version of Solaris 11 and VxVM takes a long time because post-installation stalls for a long time.

DESCRIPTION:
In Solaris 11, the OS command devlinks, which was used to add /dev entries, stalled for a long time during VxVM post-installation. The OS command devfsadm should be used in the post-install script instead.

RESOLUTION:
The code is modified to replace devlinks with devfsadm in the VxVM post-installation process.

* 3851117 (Tracking ID: 3662392)

SYMPTOM:
In a CVM environment, if I/Os are being executed on a slave node, corruption can happen while the vxdisk resize(1M) command is executing on the master node.

DESCRIPTION:
During the first stage of the resize transaction, the master node re-adjusts the disk offsets and public/private partition device numbers. On a slave node, the public/private partition device numbers are not adjusted properly. Because of this, the partition starting offset is added twice, which causes the corruption.
The window during which the public/private partition device numbers are adjusted is small; corruption is observed only if I/O occurs during this window. After the resize operation completes, no further corruption happens.

RESOLUTION:
The code has been changed to add the partition starting offset properly to an I/O on the slave node during execution of a resize command.

* 3852148 (Tracking ID: 3852146)

SYMPTOM:
A shared DiskGroup fails to import when the "-c" and "-o noreonline" options are specified together, with the following error:

VxVM vxdg ERROR V-5-1-10978 Disk group : import failed: Disk for disk group not found

DESCRIPTION:
When the "-c" option is specified, the DISKID and DGID of the disks in the DG are updated. When the information about the disks in the DG is passed to the slave node, the slave node does not have the latest information, because the disks are not onlined when "-o noreonline" is specified. Since the slave node does not have the latest information, it cannot identify the disks belonging to the DG, which leads to the DG import failing with "Disk for disk group not found".

RESOLUTION:
Code changes have been done so that "-c" and "-o noreonline" work together.

* 3854788 (Tracking ID: 3783356)

SYMPTOM:
After the DMP module fails to load, dmp_idle_vector is not NULL.

DESCRIPTION:
After a DMP module load failure, DMP resources are not cleared from system memory, so some of the resources still hold non-NULL values. When the system retries the load, it frees invalid data, leading to a system panic with the error message BAD FREE, because the data being freed is not valid at that point.

RESOLUTION:
The code is modified to clear up the DMP resources when a module load failure happens.

* 3859226 (Tracking ID: 3287880)

SYMPTOM:
In a clustered environment, if a node does not have storage connectivity to clone disks, vxconfigd on that node may dump core during the clone disk group import.
The stack trace is as follows:

chosen_rlist_delete()
dg_import_complete_clone_tagname_update()
req_dg_import()
vold_process_request()

DESCRIPTION:
In a clustered environment, if a node does not have storage connectivity to clone disks, improper cleanup handling in the clone database can cause vxconfigd on that node to dump core during the clone disk group import.

RESOLUTION:
The code has been modified to properly clean up the clone database.

* 3862240 (Tracking ID: 3856146)

SYMPTOM:
Two issues are hit on the latest SRUs of Solaris 11.2.8 and greater, and on Solaris SPARC 11.3, when dmp_native_support is on:
1. Turning dmp_native_support on and off requires a reboot. The system panics during the reboot as part of setting dmp_native_support off.
2. Sometimes the system comes up after the reboot when dmp_native_support is set to off. In that case, a panic is observed when the system is rebooted after uninstallation of SF, and it fails to boot up.
The panic string is the same for both issues:
panic[cpu0]/thread=20012000: read_binding_file: /etc/name_to_major file not found

DESCRIPTION:
The issue happened because of the /etc/system and /etc/name_to_major files. As per the discussion with Oracle through SR (3-11640878941), removing these two files from the boot-archive causes this panic. The files /etc/name_to_major and /etc/system are included in the SPARC boot_archive of Solaris 11.2.8.4.0 (and greater versions) and should not be removed; the system fails to come up if they are removed.

RESOLUTION:
The code has been modified to avoid the panic while setting dmp_native_support to off.

* 3862632 (Tracking ID: 3769927)

SYMPTOM:
Turning off the dmp_native_support tunable fails with the following errors:

VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more zpools
VxVM vxdmpadm ERROR V-5-1-15686 The following zpool(s) could not be migrated as they are not healthy - <zpool_name>.
DESCRIPTION:
Turning off the dmp_native_support tunable fails even if the zpools are healthy. The vxnative script does not allow turning off dmp_native_support if it detects that a zpool is unhealthy, that is, the zpool state is ONLINE and some action is required to be taken on the zpool. "upgrade zpool" is considered one of the actions indicating an unhealthy zpool state, which is not correct.

RESOLUTION:
The code is modified to treat the "upgrade zpool" action as expected. Turning off the dmp_native_support tunable is supported if the action is "upgrade zpool".

Patch ID: VRTSodm-6.2.1.8300

* 3972666 (Tracking ID: 3968788)

SYMPTOM:
The ODM module failed to load on Solaris 11.4.

DESCRIPTION:
The ODM module failed to load on the Solaris 11.4 release due to kernel-level changes in 11.4.

RESOLUTION:
Added ODM support for the Solaris 11.4 release.

* 3972873 (Tracking ID: 3832329)

SYMPTOM:
On Solaris 11.2, a system panic occurs because fop_getattr() finds a vnode with a NULL v_vfsp.

DESCRIPTION:
ODM is unmounted in the global zone while it is mounted in the local zone. When the file /dev/odm/ctl is accessed from the global zone, its vnode has a NULL v_vfsp. The stack is as follows:

- die
- trap
- ktl0
- fop_getattr
- cstat
- cstatat
- syscall_trap

RESOLUTION:
The code is modified to make sure that whenever ODM is mounted in a local zone, it has already been mounted in the global zone.

* 3972887 (Tracking ID: 3652500)

SYMPTOM:
ODM's "ktrace" command fails with a "bad address" error like the following:

# /opt/VRTSodm/sbin/ktrace -e 0xffffffff
Bad address: ioctl(K0FF, +koff:on)

DESCRIPTION:
ktrace issues an ioctl that was failing because wrong arguments were passed to it.

RESOLUTION:
Modified the source so that the ioctl does not fail and the ktrace commands work.
Patch ID: VRTSodm-6.2.1.300

* 3906065 (Tracking ID: 3757609)

SYMPTOM:
High CPU usage because of contention over ODM_IO_LOCK.

DESCRIPTION:
While performing ODM I/O, ODM_IO_LOCK is taken to update some of the ODM counters, which leads to contention when multiple iodones try to update these counters at the same time. This results in high CPU usage.

RESOLUTION:
Code modified to remove the lock contention.

Patch ID: VRTSvxfs-6.2.1.8300

* 3971860 (Tracking ID: 3926972)

SYMPTOM:
Once a node reboots or goes out of the cluster, the whole cluster can hang.

DESCRIPTION:
This is a three-way deadlock, in which a glock grant could block the recovery while trying to cache the grant against an inode. When it tries for the ilock, if the lock is held by an hlock revoke that is waiting to get a GLM lock (in this case the cbuf lock), it cannot get that lock because a recovery is in progress. The recovery cannot proceed because the glock grant thread blocked it. Hence the whole cluster hangs.

RESOLUTION:
The fix is to avoid taking the ilock in GLM context if it is not available.

* 3972642 (Tracking ID: 3972641)

SYMPTOM:
Handle deprecated Solaris function calls in VxFS code.

DESCRIPTION:
The page_numtopp(_nolock) and hat_getpfnum functions have been deprecated, so they can no longer be used in VxFS code.

RESOLUTION:
Appropriate code changes are done in VxFS.

* 3972763 (Tracking ID: 3968785)

SYMPTOM:
The VxFS module failed to load on Solaris 11.4.

DESCRIPTION:
The VxFS module failed to load on the Solaris 11.4 release due to kernel-level changes in the 11.4 kernel.

RESOLUTION:
Added VxFS support for the Solaris 11.4 release.

* 3972772 (Tracking ID: 3929854)

SYMPTOM:
Event notification was not supported on CFS mount points, so the following errors were seen in the log file:
-bash-4.1# /usr/jdk/jdk1.8.0_121/bin/java test1
myWatcher: sun.nio.fs.SolarisWatchService@70dea4e
filesystem provider is : sun.nio.fs.SolarisFileSystemProvider@5c647e05
java.nio.file.FileSystemException: /mnt1: Operation not supported
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
        at sun.nio.fs.UnixException.asIOException(UnixException.java:111)
        at sun.nio.fs.SolarisWatchService$Poller.implRegister(SolarisWatchService.java:311)
        at sun.nio.fs.AbstractPoller.processRequests(AbstractPoller.java:260)
        at sun.nio.fs.SolarisWatchService$Poller.processEvent(SolarisWatchService.java:425)
        at sun.nio.fs.SolarisWatchService$Poller.run(SolarisWatchService.java:397)
        at java.lang.Thread.run(Thread.java:745)

DESCRIPTION:
The WebLogic watch service failed to register with a directory on a CFS mount point, which resulted in "/mnt1: Operation not supported" on the CFS mount point.

RESOLUTION:
Added a new module parameter, "vx_cfsevent_notify", to enable event notification support on CFS. By default, vx_cfsevent_notify is disabled. This works only in an Active-Passive scenario:
- The primary node (Active) that has set this tunable receives notifications for the respective events that happen on the CFS mount point directory.
- The secondary node (Passive) does not receive any notifications.

* 3972860 (Tracking ID: 3898565)

SYMPTOM:
The system panicked with this stack:

- panicsys
- panic_common
- panic
- segshared_fault
- as_fault
- vx_memlock
- vx_dio_zero
- vx_dio_rdwri
- vx_dio_read
- vx_read_common_inline
- vx_read_common_noinline
- vx_read1
- vx_read
- fop_read

DESCRIPTION:
Solaris no longer supports F_SOFTLOCK, but vx_memlock() uses F_SOFTLOCK to fault in the page.

RESOLUTION:
Changed the VxFS code to avoid using F_SOFTLOCK.
* 3972960 (Tracking ID: 3905099)

SYMPTOM:
VxFS unmount panicked in deactivate_super(). The panic stack looks like the following:
#9 vx_fsnotify_flush [vxfs]
#10 vx_softcnt_flush [vxfs]
#11 vx_idrop [vxfs]
#12 vx_detach_fset [vxfs]
#13 vx_unmount [vxfs]
#14 generic_shutdown_super
#15 kill_block_super
#16 vx_kill_sb
#17 amf_kill_sb
#18 deactivate_super
#19 mntput_no_expire
#20 sys_umount
#21 system_call_fastpath

DESCRIPTION:
A race is suspected between unmount and a user-space notifier install for the root inode.

RESOLUTION:
Added diagnostic code and a defensive check for fsnotify_flush in vx_softcnt_flush.

* 3972997 (Tracking ID: 3844820)

SYMPTOM:
A system panic was triggered by a stress test that removed and added VCPUs in the guest domain while VxFS I/O continued. The stack looks like this:
- panicsys
- vpanic_common
- panic
- die
- trap
- ktl0

DESCRIPTION:
Adding or removing vCPUs can change the addresses referenced through the Solaris global array cpu[]. VxFS saved the addresses of cpu[].cpu_stat at initialization, so updating through these stale addresses triggered the panic.

RESOLUTION:
Update the addresses of cpu[].cpu_stat before vx_sar_cpu_update().

Patch ID: VRTSvxfs-6.2.1.300

* 3734750 (Tracking ID: 3608239)

SYMPTOM:
The system panics when deinitializing voprwlock on Solaris.

DESCRIPTION:
On earlier Solaris releases, it was mandatory to implement the vnode rwlock operation (voprwlock). This exposes the lock to other modules through VOPs such as VOP_RWLOCK. If a module takes this lock and misses an unlock, the system can panic when voprwlock is deinitialized. On the latest Solaris releases, implementing this lock is optional, so it can be removed. Removing it also eliminates extra locking, which improves performance.

RESOLUTION:
The code is modified to remove the voprwlock implementation.
* 3817229 (Tracking ID: 3762174)

SYMPTOM:
When fsfreeze is used together with vxdump, the fsfreeze command times out and the vxdump command fails.

DESCRIPTION:
The vxdump command may read the mount list file to get information about the corresponding mount points. This requires taking a file system active level in order to synchronize with file system reinit. During fsfreeze, taking the active level never succeeds because the file system is already frozen, so this causes a deadlock that finally results in the fsfreeze timeout.

RESOLUTION:
Do not use the fsfreeze and vxdump commands together.

* 3896150 (Tracking ID: 3833816)

SYMPTOM:
In a CFS cluster, one node returns stale data.

DESCRIPTION:
In a 2-node CFS cluster, when node 1 opens a file and writes to it, the locks are used with the CFS_MASTERLESS flag set. When node 2 then opens the file and writes to it, the locks on node 1 are normalized as part of the HLOCK revoke. However, after the HLOCK revoke on node 1, when node 2 takes the PG lock grant to write, there is no PG lock revoke on node 1, so the dirty pages on node 1 are not flushed and invalidated. As a result, reads on node 1 return stale data.

RESOLUTION:
The code is modified to cache the PG lock before normalizing it in vx_hlock_putdata, so that after normalization the cached grant is still with node 1. When node 2 requests the PG lock, a revoke occurs on node 1, which flushes and invalidates the pages.

* 3896151 (Tracking ID: 3827491)

SYMPTOM:
Data relocation is not executed correctly if the IOTEMP policy is set to AVERAGE.

DESCRIPTION:
A database table is not created correctly, which results in an error on the database query. This affects the data relocation policy, and files are not relocated properly.

RESOLUTION:
The code is modified to fix the database table creation issue, so the relocation-policy calculations are now done correctly.
* 3896154 (Tracking ID: 1428611)

SYMPTOM:
The 'vxcompress' command can cause many GLM block lock messages to be sent over the network. This can be observed in 'glmstat -m' output under the section "proxy recv", as shown in the example below:

bash-3.2# glmstat -m
message          all   rw   g  pg   h   buf   oth  loop
master send:
  GRANT          194    0   0   0   2     0   192    98
  REVOKE         192    0   0   0   0     0   192    96
  subtotal       386    0   0   0   2     0   384   194
master recv:
  LOCK           193    0   0   0   2     0   191    98
  RELEASE        192    0   0   0   0     0   192    96
  subtotal       385    0   0   0   2     0   383   194
master total     771    0   0   0   4     0   767   388
proxy send:
  LOCK            98    0   0   0   2     0    96    98
  RELEASE         96    0   0   0   0     0    96    96
  BLOCK_LOCK    2560    0   0   0   0  2560     0     0
  BLOCK_RELEASE 2560    0   0   0   0  2560     0     0
  subtotal      5314    0   0   0   2  5120   192   194

DESCRIPTION:
'vxcompress' creates placeholder inodes (called IFEMR inodes) to hold the compressed data of files. After compression finishes, each IFEMR inode exchanges its bmap with the original file and is later given to inactive processing. Inactive processing truncates the IFEMR extents (the original extents of the regular file, which is now compressed) by sending cluster-wide buffer invalidation requests. These invalidations need GLM block locks. Regular file data need not be invalidated across the cluster, which makes these GLM block lock requests unnecessary.

RESOLUTION:
The pertinent code is modified to skip the invalidation for IFEMR inodes created during compression.

* 3896156 (Tracking ID: 3633683)

SYMPTOM:
"top" command output shows a vxfs thread consuming high CPU while running an application that makes excessive sync() calls.

DESCRIPTION:
To process a sync() system call, VxFS scans through the inode cache, which is a costly operation. If a user application issues excessive sync() calls while VxFS file systems are mounted, the VxFS sync processing thread can consume high CPU.

RESOLUTION:
All sync() requests issued in the last 60 seconds are combined into a single request.
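
The coalescing described for incident 3896156 can be illustrated with a small model. This is a hypothetical sketch, not VxFS code: the class SyncCoalescer, its methods, and the injectable clock are assumptions for illustration. In this simplified model a coalesced caller does not wait for a fresh scan; the README does not describe exactly how VxFS completes the folded requests.

```python
import time
from typing import Callable, Optional


class SyncCoalescer:
    """Hypothetical model: run the costly inode-cache scan at most once per
    window and fold every sync() request inside the window into that scan."""

    def __init__(self, scan: Callable[[], None], window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._scan = scan          # stands in for the inode-cache walk
        self._window = window      # 60 seconds, per the RESOLUTION text
        self._clock = clock        # injectable so the model can be tested
        self._last_scan: Optional[float] = None
        self.coalesced = 0         # requests absorbed by an earlier scan

    def request_sync(self) -> bool:
        """Return True if this call ran a scan, False if it was coalesced."""
        now = self._clock()
        if self._last_scan is not None and now - self._last_scan < self._window:
            self.coalesced += 1
            return False
        self._scan()
        self._last_scan = now
        return True
```

With a fake clock, requests at t=0, 30, 59.9, and 60 seconds produce only two scans (at t=0 and t=60); the two requests in between are coalesced.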
* 3896160 (Tracking ID: 3808033)

SYMPTOM:
After a service group is set offline via VOM or VCS, the Oracle process is left in an unkillable state.

DESCRIPTION:
Whenever ODM issues an async request to FDD, FDD is required to do iodone processing on it, regardless of how far the request gets. A forced unmount causes FDD to take one of the early error branches, which misses the iodone routine for this particular async request. From ODM's perspective, the request is submitted, but iodone is never called. This has several bad consequences, one of which is that a user thread waiting for the request is blocked uninterruptibly forever.

RESOLUTION:
The code is modified to call the iodone routine in the error handling code.

* 3896218 (Tracking ID: 3751049)

SYMPTOM:
The umountall operation fails on Solaris with the error "V-3-20358: cannot open mnttab".

DESCRIPTION:
On Solaris, fopen() normally returns an EMFILE error for 32-bit applications if it attempts to associate a stream with a file accessed by a file descriptor with a value greater than 255. When umountall is used to unmount more than 256 file systems, the command forks child processes and opens more than 256 file descriptors at the same time. This crosses the 256 file descriptor limit and causes the operation to fail.

RESOLUTION:
Use the "F" mode in the fopen() call to avoid the 256 file descriptor limitation.

* 3896223 (Tracking ID: 3735697)

SYMPTOM:
vxrepquota reports errors like the following:
# vxrepquota -u /vx/fs1
UX:vxfs vxrepquota: ERROR: V-3-20002: Cannot access /dev/vx/dsk/sfsdg/fs1:ckpt1: No such file or directory
UX:vxfs vxrepquota: ERROR: V-3-24996: Unable to get disk layout version

DESCRIPTION:
vxrepquota checks each mount point entry in the mounted file system table. If any checkpoint mount point entry is present before the mount point specified in the vxrepquota command, vxrepquota reports errors, although the command can still succeed.

RESOLUTION:
Skip checkpoint mount points in the mounted file system table.
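
The rule behind incident 3896160 (every submitted async request must reach iodone, even on an early error branch) can be sketched as below. This is a hypothetical model, not FDD or ODM code: AsyncRequest, submit(), iodone(), and the "EIO" error value are assumptions for illustration. The try/finally structure guarantees the completion routine runs on every exit path, so no waiter is left blocked.

```python
from typing import Callable, Optional


class AsyncRequest:
    """Hypothetical async I/O request carrying a completion callback."""

    def __init__(self, on_done: Callable[["AsyncRequest"], None]) -> None:
        self.on_done = on_done
        self.error: Optional[str] = None
        self.completed = False


def iodone(req: AsyncRequest) -> None:
    """Mark the request complete and wake whoever waits on it."""
    req.completed = True
    req.on_done(req)


def submit(req: AsyncRequest, fs_unmounted: bool) -> None:
    """Submit req; every exit path, including early errors, calls iodone."""
    try:
        if fs_unmounted:
            # Early error branch (for example, a forced unmount). Before the
            # fix, a branch like this returned without completing the request.
            req.error = "EIO"  # hypothetical error value
            return
        # ... the real I/O would be issued here; this model completes at once ...
    finally:
        iodone(req)
```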
* 3896248 (Tracking ID: 3876223)

SYMPTOM:
Truncate (fcntl F_FREESP*) on a newly created file does not update the time stamp.

DESCRIPTION:
On Solaris, F_FREESP64 (truncate) does nothing and simply returns if the "truncate from" size matches the size of the file.

RESOLUTION:
The code is modified to update the mtime and ctime of the file in the above scenario.

* 3896249 (Tracking ID: 3861713)

SYMPTOM:
Contention is observed on the vx_sched_lk and vx_worklist_lk spinlocks when profiled using lockstat.

DESCRIPTION:
Internal worker threads take a lock to sleep on a CV while waiting for work. This lock is global. If there are large numbers of CPUs and large numbers of worker threads, contention can be seen on vx_sched_lk and vx_worklist_lk using lockstat, along with increased %sys CPU.

RESOLUTION:
The lock is made more scalable in configurations with many CPUs.

* 3896250 (Tracking ID: 3870832)

SYMPTOM:
System panic due to a race between a forced unmount and the NFS lock manager thread trying to get a vnode, with the stack as below:
vx_active_common_flush
vx_do_vget
vx_vget
fsop_vget
lm_nfs3_fhtovp
lm_get_vnode
lm_unlock
lm_nlm4_dispatch
svc_getreq
svc_run
svc_do_run
nfssys

DESCRIPTION:
When an NFS-mounted file system is unshared and force unmounted, a panic can occur if a file was locked from the NFS client before that. In NFSv3, unshare does not clear the existing locks or clear/kill the lock manager threads, so when the forced unmount wins the race, it frees the vx_fsext and vx_vfs structures. Later, when the lock manager threads try to get the vnode of this force-unmounted file system, the system panics on the freed vx_fsext structure.

RESOLUTION:
The code is modified to set the Solaris VFS_UNMOUNTED flag on the vfs during a forced unmount. This flag is later checked in the vx_vget function when the lock manager thread comes to get the vnode; if the flag is set, vx_vget returns an error.

* 3896261 (Tracking ID: 3855726)

SYMPTOM:
A panic occurs in vx_prot_unregister_all().
The stack looks like this:
- vx_prot_unregister_all
- vxportalclose
- __fput
- fput
- filp_close
- sys_close
- system_call_fastpath

DESCRIPTION:
The panic is caused by a NULL fileset pointer, which results from referencing the fileset before it is loaded. In addition, there is a race on the fileset identity array.

RESOLUTION:
Skip the fileset if it is not loaded yet, and take the identity array lock to prevent the possible race.

* 3896267 (Tracking ID: 3861271)

SYMPTOM:
Due to the missing inode clear action, a page can be left in an inconsistent state. Also, the inode is not fully quiescent, which leads to races in the inode code. Sometimes this can cause a panic from iput_final().

DESCRIPTION:
An inode clear operation is missing when a Linux inode is being de-initialized on SLES11.

RESOLUTION:
Add the inode clear operation on SLES11.

* 3896269 (Tracking ID: 3879310)

SYMPTOM:
The file system may get corrupted after a file system freeze during vxupgrade. Full fsck gives the following errors:
UX:vxfs fsck: ERROR: V-3-20451: No valid device inodes found
UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate

DESCRIPTION:
vxupgrade requires the file system to be frozen during its operation. Corruption may be detected while the freeze is in progress, and the full fsck flag can then be set on the file system. However, this does not stop vxupgrade from proceeding. At a later stage of vxupgrade, after structures related to the new disk layout are updated on disk, VxFS frees and zeroes out some of the old metadata inodes. If an error occurs after this point (because full fsck is set), the file system needs to go back entirely to the previous version at the time of full fsck. Since the metadata corresponding to the previous version is already cleared, full fsck cannot proceed and gives an error.

RESOLUTION:
Check for the full fsck flag after freezing the file system during vxupgrade. Also, disable the file system if an error occurs after writing new metadata on disk.
This forces the newly written metadata to be loaded in memory on the next mount.

* 3896270 (Tracking ID: 3707662)

SYMPTOM:
A race between reorg processing and the fsadm timer thread (alarm expiry) leads to a panic in vx_reorg_emap with the following stack:
vx_iunlock
vx_reorg_iunlock_rct_reorg
vx_reorg_emap
vx_extmap_reorg
vx_reorg
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
fop_ioctl
ioctl

DESCRIPTION:
When the timer expires (fsadm with the -t option), vx_do_close() calls vx_reorg_clear() on a local mount, which performs cleanup on the reorg rct inode. Another thread currently active in vx_reorg_emap() then panics due to a NULL pointer dereference.

RESOLUTION:
When fop_close is called in alarm handler context, the cleanup is deferred until the kernel thread performing the reorg completes its operation.

* 3896273 (Tracking ID: 3558087)

SYMPTOM:
When the stat system call is executed on a VxFS file system with the delayed allocation feature enabled, it may take a long time or cause high CPU consumption.

DESCRIPTION:
When the delayed allocation (dalloc) feature is turned on, the flushing process takes a long time. The process keeps the get page lock held and needs writers to keep the inode reader-writer lock held. The stat system call may keep waiting for the inode reader-writer lock.

RESOLUTION:
The delayed allocation code is redesigned to keep the get page lock unlocked while flushing.

* 3896277 (Tracking ID: 3691633)

SYMPTOM:
Remove RCQ Full messages.

DESCRIPTION:
Too many unnecessary RCQ Full messages were logged in the system log.

RESOLUTION:
The RCQ Full messages are removed from the code.

* 3896281 (Tracking ID: 3830300)

SYMPTOM:
Heavy CPU usage while Oracle archiver processes are running on a clustered file system.

DESCRIPTION:
The poor read performance in this case was caused by fragmentation, which mainly happens when multiple archivers run on the same node. The allocation pattern of the Oracle archiver processes is:
1. Write the header with O_SYNC.
2. ftruncate the file up to its final size (typically a few GBs).
3. Do lio_listio with 1MB iocbs.
The problem occurs because all allocations made this way are internal allocations, i.e. allocations below the file size, instead of allocations past the file size. Internal allocations are done at most 8 pages at a time, so if multiple processes do this, they each get these 8 pages alternately and the file system becomes very fragmented.

RESOLUTION:
Added a tunable that allocates ZFOD extents when ftruncate increases the size of the file, instead of creating a hole. This eliminates allocations internal to the file size and thus the fragmentation. Also fixed the earlier implementation of the same fix, which ran into locking issues, and fixed a performance issue when writing from a secondary node.

* 3896285 (Tracking ID: 3757609)

SYMPTOM:
High CPU usage because of contention over ODM_IO_LOCK.

DESCRIPTION:
While performing ODM IO, ODM takes ODM_IO_LOCK to update some of the ODM counters. When multiple iodones try to update these counters at the same time, they contend for this lock, which results in high CPU usage.

RESOLUTION:
The code is modified to remove the lock contention.

* 3896303 (Tracking ID: 3762125)

SYMPTOM:
Directory size sometimes keeps increasing even though the number of files inside it does not increase.

DESCRIPTION:
This happens only on CFS. A variable in the directory inode structure marks the start of the directory free space. When the directory ownership changes, this variable may become stale, which can cause this issue.

RESOLUTION:
The code is modified to reset this free-space marker when the ownership changes. The space search now starts from the beginning of the directory inode.

* 3896304 (Tracking ID: 3846521)

SYMPTOM:
cp -p fails with EINVAL for files with a 10-digit modification time. EINVAL is returned if the value in the tv_nsec field is outside the range 0 to 999,999,999.
VxFS records time updates in microseconds, and the microseconds are converted to nanoseconds when copied to user space. In this case, the usec value had crossed its upper limit of 999,999.

DESCRIPTION:
In a cluster, the time may differ across nodes. When updating mtime, VxFS checks whether the inode is a cluster inode and whether its mtime is newer than the current node's time; if so, it increments tv_usec instead of changing mtime to an older value. The tv_usec counter could overflow here, which resulted in a 10-digit mtime.tv_nsec.

RESOLUTION:
The code is modified to reset the usec counter for mtime/atime/ctime when the upper limit (999999) is reached.

* 3896306 (Tracking ID: 3790721)

SYMPTOM:
High CPU usage on the vxfs thread process. The backtrace of such threads usually looks like this:
schedule
schedule_timeout
__down
down
vx_send_bcastgetemapmsg_remaus
vx_send_bcastgetemapmsg
vx_recv_getemapmsg
vx_recvdele
vx_msg_recvreq
vx_msg_process_thread
vx_kthread_init
kernel_thread

DESCRIPTION:
The locking mechanism in vx_send_bcastgetemapmsg_process() is inefficient: every time vx_send_bcastgetemapmsg_process() is called, it performs a series of down-up operations on a certain semaphore. This can result in a huge CPU cost when multiple threads contend on this semaphore.

RESOLUTION:
The locking mechanism in vx_send_bcastgetemapmsg_process() is optimized so that it performs the down-up operation on the semaphore only once.

* 3896308 (Tracking ID: 3695367)

SYMPTOM:
Unable to remove a volume from a multi-volume VxFS using the "fsvoladm" command. It fails with an "Invalid argument" error.

DESCRIPTION:
Volumes are not added to the in-core volume list structure correctly. Therefore, removing a volume from a multi-volume VxFS using "fsvoladm" fails.

RESOLUTION:
The code is modified to add volumes to the in-core volume list structure correctly.
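
The microsecond overflow in incident 3896304 can be illustrated with a small model of the "never move mtime backwards" rule. This is a hypothetical sketch: bump_mtime() and to_timespec() are illustrative names, and carrying the wrapped microsecond into tv_sec is an assumed way of keeping the value in range (the README only states that the usec counter is reset at 999999). The point is that tv_usec must stay within 0..999999 so the nanosecond value seen by user space stays within 0..999,999,999.

```python
from typing import Tuple

USEC_PER_SEC = 1_000_000

TimeVal = Tuple[int, int]  # (tv_sec, tv_usec)


def bump_mtime(stored: TimeVal, local: TimeVal) -> TimeVal:
    """Return a new mtime that never moves backwards.

    If the stored mtime (possibly set by a node with a faster clock) is not
    older than local time, advance it by one microsecond instead. The
    microsecond field wraps and carries into tv_sec (an assumption).
    """
    if local > stored:
        return local
    sec, usec = stored
    usec += 1
    if usec >= USEC_PER_SEC:  # upper limit reached: reset and carry
        sec, usec = sec + 1, 0
    return sec, usec


def to_timespec(tv: TimeVal) -> Tuple[int, int]:
    """Convert (sec, usec) to (sec, nsec), as user space sees it."""
    sec, usec = tv
    return sec, usec * 1000
```

Without the wrap, repeated bumps from (100, 999999) would yield tv_usec = 1,000,000 and a 10-digit tv_nsec, which is exactly the EINVAL case described above.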
* 3896310 (Tracking ID: 3859032)

SYMPTOM:
The system panics in vx_tflush_map() due to a NULL pointer dereference.

DESCRIPTION:
When a file system is converted using vxconvert, new blocks are allocated to structural files such as smap, which can contain garbage. This is done with the expectation that fsck will rebuild the correct smap. However, fsck failed to distinguish between EAUs that are fully EXPANDED and those that are only ALLOCATED. Because of this, if an allocation is made to a file whose last allocation came from such an affected EAU, a sub-transaction is created on an EAU that is in the allocated state. Map buffers of such EAUs are not initialized properly in the VxFS private buffer cache; as a result, these buffers are released back as stale during the transaction commit. Later, if any file-system-wide sync tries to flush the metadata, it can refer to these buffer pointers and panic, because the buffers have already been released and reused.

RESOLUTION:
The code is modified in fsck to correctly set the state of EAUs on disk. The involved code paths are also modified to avoid doing transactions on unexpanded EAUs.

* 3896311 (Tracking ID: 3779916)

SYMPTOM:
vxfsconvert fails to upgrade the layout version for a VxFS file system with a large number of inodes. The error message shows an inode discrepancy.

DESCRIPTION:
vxfsconvert walks through the ilist and converts each inode. It stores chunks of inodes in a buffer and processes them as a batch. The inode number parameter for this inode buffer is of type unsigned integer. The offset of a particular inode in the ilist is calculated by multiplying the inode number by the size of the inode structure. For large inode numbers, the product inode_number * inode_size can overflow the unsigned integer limit, giving a wrong offset within the ilist file. vxfsconvert therefore reads the wrong inode and eventually fails.

RESOLUTION:
The inode number parameter is defined as unsigned long to avoid the overflow.
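
The overflow in incident 3896311 can be reproduced by emulating fixed-width unsigned arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, and the 256-byte inode size is an assumed value chosen only to make the numbers easy to follow. Masking with 2^32 - 1 mimics a 32-bit unsigned int; masking with 2^64 - 1 mimics a 64-bit unsigned long, which is its width under the LP64 model used by 64-bit Solaris.

```python
UINT32_MASK = 0xFFFF_FFFF
UINT64_MASK = 0xFFFF_FFFF_FFFF_FFFF

INODE_SIZE = 256  # assumed on-disk inode size in bytes, for illustration


def ilist_offset_u32(ino: int, inode_size: int = INODE_SIZE) -> int:
    """Offset computed in 32-bit unsigned arithmetic (the buggy variant)."""
    return (ino * inode_size) & UINT32_MASK


def ilist_offset_u64(ino: int, inode_size: int = INODE_SIZE) -> int:
    """Offset computed in 64-bit unsigned arithmetic (the fixed variant)."""
    return (ino * inode_size) & UINT64_MASK
```

For small inode numbers both variants agree (inode 1000 sits at offset 256000). At inode 16,777,216 (2^24), the true offset is 2^32 bytes, but the 32-bit product wraps to 0, so the converter would read inode 0's slot instead.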
* 3896312 (Tracking ID: 3811849)

SYMPTOM:
On a cluster file system (CFS), due to a size mismatch in the cluster-wide buffers containing a hash bucket for large directory hashing (LDH), the system panics with the following stack trace:
vx_populate_bpdata()
vx_getblk_clust()
vx_getblk()
vx_exh_getblk()
vx_exh_get_bucket()
vx_exh_lookup()
vx_dexh_lookup()
vx_dirscan()
vx_dirlook()
vx_pd_lookup()
vx_lookup_pd()
vx_lookup()
On some platforms, LDH corruption is reported instead of a panic. Full fsck reports some metadata inconsistencies, as displayed in the following sample messages:
fileset 999 primary-ilist inode 263 has invalid alternate directory index
(fileset 999 attribute-ilist inode 8193), clear index? (ynq)y

DESCRIPTION:
On a highly fragmented file system with a block size of 1K, 2K, or 4K, the bucket(s) of an LDH inode, which have a fixed size of 8K, can spread across multiple small extents. Currently, in-core allocation for the bucket of an LDH inode happens in parallel with on-disk allocation, which results in small in-core buffer allocations. These small in-core allocations are then merged for the final in-memory representation of the LDH inode's bucket. On two CFS nodes, this may result in the same LDH metadata/bucket being represented as in-core buffers of different sizes. This may cause a system panic as LDH inode buckets are passed around the cluster, or on-disk corruption of the LDH inode's buckets if these buffers are flushed to disk.

RESOLUTION:
The code is modified to separate the on-disk allocation and the in-core buffer initialization in LDH code paths, so that an in-core LDH bucket is always represented by a single 8K buffer.

* 3896313 (Tracking ID: 3817734)

SYMPTOM:
If a file system with the full fsck flag set is mounted, a message with a direct command to clean the file system with full fsck is printed to the user.
DESCRIPTION:
When a file system with the full fsck flag set is mounted, the mount fails and a message is printed asking the user to clean the file system with full fsck. This message contains the direct command to run; if that command is run without first collecting a file system metasave, evidence is lost. Also, since fsck removes the file system inconsistencies, it may lead to undesired data loss.

RESOLUTION:
A more generic error message is given instead of the direct command.

* 3896314 (Tracking ID: 3856363)

SYMPTOM:
VxFS reports mapbad errors in the syslog as below:
vxfs: msgcnt 15 mesg 003: V-2-3: vx_mapbad - vx_extfind - /dev/vx/dsk/vgems01/lvems01 file system free extent bitmap in au 0 marked bad.
Full fsck also reports the following metadata inconsistencies:
fileset 999 primary-ilist inode 6 has invalid number of blocks (18446744073709551583)
fileset 999 primary-ilist inode 6 failed validation clear? (ynq)n
pass2 - checking directory linkage
fileset 999 directory 8192 block devid/blknum 0/393216 offset 68 references free inode ino 6 remove entry? (ynq)n
fileset 999 primary-ilist inode 8192 contains invalid directory blocks clear? (ynq)n
pass3 - checking reference counts
fileset 999 primary-ilist inode 5 unreferenced file, reconnect? (ynq)n
fileset 999 primary-ilist inode 5 clear? (ynq)n
fileset 999 primary-ilist inode 8194 unreferenced file, reconnect? (ynq)n
fileset 999 primary-ilist inode 8194 clear? (ynq)n
fileset 999 primary-ilist inode 8195 unreferenced file, reconnect? (ynq)n
fileset 999 primary-ilist inode 8195 clear? (ynq)n
pass4 - checking resource maps

DESCRIPTION:
While processing the VX_IEZEROEXT extop, VxFS frees the extent without setting the VX_TLOGDELFREE flag. Similarly, there are other cases where VX_TLOGDELFREE is not set for a delayed extent free. This can result in mapbad errors and invalid block counts.
RESOLUTION:
Since the VX_TLOGDELFREE flag needs to be set on every extent free, the code is modified to discard this flag and implicitly treat every extent free as a delayed extent free.

* 3901379 (Tracking ID: 3897793)

SYMPTOM:
A panic occurs because of a race in which the mntlock ID is cleared while the mntlock flag is still set.

DESCRIPTION:
The panic happened because of a race in which the mntlock ID is NULL even though the mntlock flag is set. The race is between the fsadm thread and the proc mount show_option thread. The fsadm thread deinitializes the mntlock ID first and then removes the mntlock flag. If another thread races with the fsadm thread, it is possible to see the mntlock flag set while the mntlock ID is NULL. The fix is to remove the flag first and deinitialize the mntlock ID later.

RESOLUTION:
The code is modified to remove the mntlock flag first.

* 3903583 (Tracking ID: 3905607)

SYMPTOM:
An internal assert failed during migration.

DESCRIPTION:
In the latest Solaris release, the Solaris vnode structure has changed and a new field is added at its end. The migration vnode contains a Solaris vnode, and because of this newly added field, the migration vnode's forw field is corrupted, resulting in the assert failure.

RESOLUTION:
The code is modified to introduce a new padding field in the migration vnode structure, so that forw is not corrupted.

* 3905055 (Tracking ID: 3880113)

SYMPTOM:
Mapbad scenario when deleting cloned files that have shared ZFOD extents.

DESCRIPTION:
In a cloned file system, problems can arise if two overlay inodes have shared ZFOD extents. If writes happen on both inodes, the same ZFOD extents may get allocated to both files. This may further lead to mapbad scenarios while deleting both files.

RESOLUTION:
The code is modified so that shared ZFOD extents are not pushed on a cloned file system.

* 3905056 (Tracking ID: 3879761)

SYMPTOM:
A performance issue is observed due to contention on the VxFS spinlock vx_worklist_lk.
DESCRIPTION:
ODM IOs are performed asynchronously by queuing ODM work items to the worker threads. After the ODM work items are enqueued, more worker threads are woken up than required, which leads to contention on the vx_worklist_lk spinlock.

RESOLUTION:
The code is modified to wake up only one worker thread if only one work item is enqueued.

* 3906148 (Tracking ID: 3894712)

SYMPTOM:
ACL permissions are not inherited correctly on a cluster file system.

DESCRIPTION:
The ACL counts stored on a directory inode are reset every time the directory inode's ownership is switched between nodes. When ownership of the directory inode comes back to the node that previously abdicated it, ACL permissions were not inherited correctly by newly created files.

RESOLUTION:
The source is modified so that ACLs are inherited correctly.

* 3907038 (Tracking ID: 3879799)

SYMPTOM:
Due to an inconsistent LCT (Link Count Table), the Veritas File System (VxFS) mount prompts for full fsck every time and displays the following error message:
UX:vxfs mount: ERROR: V-3-26881: Cannot be mounted until it has been cleaned by fsck. Please run "fsck -F vxfs -y " before mounting.

DESCRIPTION:
LCT corruption occurs due to events taking place outside of the VxFS code. The mount command fails, and the VxFS fsck utility is unable to handle and correct these kinds of LCT inconsistencies.

RESOLUTION:
The fsck(1M) command is modified to handle LCT inconsistencies and rectify the state of the file system.

* 3907350 (Tracking ID: 3817734)

SYMPTOM:
If a file system with the full fsck flag set is mounted, a message with a direct command to clean the file system with full fsck is printed to the user.

DESCRIPTION:
When a file system with the full fsck flag set is mounted, the mount fails and a message is printed asking the user to clean the file system with full fsck. This message contains the direct command to run; if that command is run without first collecting a file system metasave, evidence is lost.
Also, since fsck removes the file system inconsistencies, it may lead to undesired data loss.

RESOLUTION:
A more generic error message is given instead of the direct command.

Patch ID: VRTSvxfs-6.2.1.100

* 3754492 (Tracking ID: 3761603)

SYMPTOM:
The full fsck flag is set incorrectly at mount time.

DESCRIPTION:
Extop processing may be deferred during umount (for example, in case of a crash or disk failure) and kept on disk so that mount can process it. During mount, an inode can have multiple extops set. Previously, if an inode had both the trim and reorg extops set during mount, fullfsck was set incorrectly. This patch avoids this situation.

RESOLUTION:
The code is modified to avoid such unnecessary setting of fullfsck.

* 3756002 (Tracking ID: 3764824)

SYMPTOM:
Internal cluster file system (CFS) testing hit a debug assert.

DESCRIPTION:
An internal debug assert is seen when a GLM recovery occurs while one of the secondary nodes is mounting, specifically when GLM recovery happens between attaching a file system and mounting it.

RESOLUTION:
The code is modified to handle the GLM reconfiguration issue.

* 3769992 (Tracking ID: 3729158)

SYMPTOM:
fuser and other commands hang on VxFS file systems.

DESCRIPTION:
The hang is seen when two threads contend for two locks, ILOCK and PLOCK. The writeadvise thread owns the ILOCK and is waiting for the PLOCK, while the dalloc thread owns the PLOCK and is waiting for the ILOCK.

RESOLUTION:
The correct locking order is PLOCK followed by ILOCK.

* 3817120 (Tracking ID: 3804400)

SYMPTOM:
VRTS/bin/cp does not return any error when the quota hard limit is reached and a partial write is encountered.

DESCRIPTION:
When the quota hard limit is reached, VRTS/bin/cp may encounter a partial write, but it may not return any error to the upper-layer application in such a situation.

RESOLUTION:
VRTS/bin/cp is adjusted to detect a partial write caused by the quota limit and return a proper error to the upper-layer application.
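
The deadlock in incident 3769992 is a classic lock-order inversion, and the fix is a single global order (PLOCK before ILOCK). The sketch below shows one common way to enforce such an order: tag each lock with a rank and refuse any acquisition that does not strictly increase the rank held by the current thread, similar in spirit to kernel lock-order checkers. It is a hypothetical illustration, not VxFS code; RankedLock, acquire(), release(), and the rank values are assumptions.

```python
import threading
from typing import List


class RankedLock:
    """A mutex tagged with a rank; locks must be taken in ascending rank."""

    def __init__(self, name: str, rank: int) -> None:
        self.name = name
        self.rank = rank
        self._lock = threading.Lock()

    def locked(self) -> bool:
        return self._lock.locked()


_held = threading.local()  # per-thread stack of currently held RankedLocks


def acquire(lock: RankedLock) -> None:
    stack: List[RankedLock] = getattr(_held, "stack", [])
    if stack and stack[-1].rank >= lock.rank:
        # Taking ILOCK then PLOCK (the writeadvise pattern) is rejected here.
        raise RuntimeError(
            f"lock order violation: {lock.name} after {stack[-1].name}")
    lock._lock.acquire()
    stack.append(lock)
    _held.stack = stack


def release(lock: RankedLock) -> None:
    stack: List[RankedLock] = getattr(_held, "stack", [])
    if not stack or stack[-1] is not lock:
        raise RuntimeError(f"{lock.name} released out of order")
    stack.pop()
    lock._lock.release()


# Hypothetical ranks encoding the documented order: PLOCK, then ILOCK.
PLOCK = RankedLock("PLOCK", rank=1)
ILOCK = RankedLock("ILOCK", rank=2)
```

If every thread follows this order, the ILOCK-held-while-waiting-for-PLOCK state seen in the hang cannot arise, so the circular wait between writeadvise and dalloc is impossible.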
Patch ID: VRTSvxfs-6.2.1.000

* 3657150 (Tracking ID: 3604071)

SYMPTOM:
With the thin reclaim feature turned on, high CPU usage can be observed on the vxfs thread process. The backtrace of such threads usually looks like this:
- vx_dalist_getau
- vx_recv_bcastgetemapmsg
- vx_recvdele
- vx_msg_recvreq
- vx_msg_process_thread
- vx_kthread_init

DESCRIPTION:
The locking mechanism is inefficient in the routine that gets the broadcast information of a node, which contains maps of the Allocation Units (AUs) for which the node holds delegations. Every time this routine is called, it performs a series of down-up operations on a certain semaphore. This can result in a huge CPU cost when many threads call the routine in parallel.

RESOLUTION:
The code is modified to optimize the locking mechanism in this routine so that it performs the down-up operation on the semaphore only once.

* 3657152 (Tracking ID: 3602322)

SYMPTOM:
Panic while flushing the dirty pages of an inode, with the following backtrace:
do_page_fault
error_exit
vx_iflush
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init
kernel_thread

DESCRIPTION:
There is a race between vx_iflush and vx_ilist_chunkclean on the same inode. vx_ilist_chunkclean takes the inode and clears the inode pointers while deinitializing it, which causes a NULL pointer dereference in the flusher thread.

RESOLUTION:
The race is resolved by taking the ilock, along with the icache lock, whenever a pointer in the inode is dereferenced. If the inode pointer is already NULL or deinitialized, the flusher moves on to the next inode and tries to flush it.
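
The resolution of incident 3657152 follows a common pattern: the teardown path clears the inode's pointers only under a lock, and the flusher checks those pointers under the same lock, skipping any inode already torn down. The sketch below is a hypothetical model, not VxFS code: Inode, chunkclean(), and flush_all() are illustrative names, and only the per-inode ilock is modeled (the icache lock mentioned above is omitted for brevity).

```python
import threading
from typing import List, Optional


class Inode:
    """Hypothetical in-core inode; 'pages' becomes None once deinitialized."""

    def __init__(self, ino: int) -> None:
        self.ino = ino
        self.ilock = threading.Lock()
        self.pages: Optional[List[bytes]] = [b"dirty"]


def chunkclean(ip: Inode) -> None:
    """Deinitialize an inode, clearing its pointers only while holding ilock."""
    with ip.ilock:
        ip.pages = None


def flush_all(inodes: List[Inode]) -> List[int]:
    """Flush dirty pages, skipping torn-down inodes; return flushed inos."""
    flushed = []
    for ip in inodes:
        with ip.ilock:  # the same lock chunkclean holds while clearing
            if ip.pages is None:
                continue  # already deinitialized: go to the next inode
            ip.pages.clear()  # stands in for writing the pages to disk
            flushed.append(ip.ino)
    return flushed
```

Because the check and the dereference happen under the lock that the teardown path also takes, the flusher can never observe a pointer that is cleared between the check and its use.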
* 3657153 (Tracking ID: 3622323)

SYMPTOM:
A cluster file system mounted as read-only panics when it gets sharing and/or compression statistics using the fsadm_vxfs(1M) command, with the following stack:
- vx_irwlock
- vx_clust_fset_curused
- vx_getcompstats
- vx_aioctl_getcompstats
- vx_aioctl_common
- vx_aioctl
- vx_unlocked_ioctl
- vx_ioctl
- vfs_ioctl
- do_vfs_ioctl
- sys_ioctl
- system_call_fastpath

DESCRIPTION:
When a file system is mounted as read-only, part of the initial setup is skipped, including loading a few internal structures. These structures are referenced while gathering statistics for sharing and/or compression. As a result, a panic occurs.

RESOLUTION:
The code is modified to only allow "fsadm -HS all" to gather sharing and/or compression statistics on read-write file systems. On read-only file systems, this command fails.

* 3657156 (Tracking ID: 3604750)

SYMPTOM:
The kernel loops during the extent re-org with the following stack trace:
vx_bmap_enter()
vx_reorg_enter_zfod()
vx_reorg_emap()
vx_extmap_reorg()
vx_reorg()
vx_aioctl_full()
$cold_vx_aioctl_common()
vx_aioctl()
vx_ioctl()
vno_ioctl()
ioctl()
syscall()

DESCRIPTION:
The extent re-org minimizes file system fragmentation. When a re-org request is issued for an inode with a lot of ZFOD extents, it reallocates the extents of the original inode to the re-org inode. During this, the ZFOD extents are preserved and entered into the re-org inode in a transaction. If the allocated extent is big, the transaction that enters the ZFOD extents becomes too big and returns an error. Even when the transaction is retried, the same issue occurs. As a result, the kernel loops during the extent re-org.

RESOLUTION:
The code is modified to enter the Bmap (block map) of the allocated extent and then perform the ZFOD processing. If a committable error occurs during the ZFOD enter, the transaction is committed and the ZFOD enter continues.

* 3657157 (Tracking ID: 3617191)

SYMPTOM:
Checkpoint creation may take hours.
DESCRIPTION:
During checkpoint creation, when an inode is marked for removal and is being overlaid, there may be a downstream clone, and VxFS starts pulling all the data. The problem is most evident with Oracle, because temporary files are deleted during checkpoint creation.

RESOLUTION:
The code is modified to pull the data selectively, only if a downstream push inode exists for the file.

* 3657158 (Tracking ID: 3601943)

SYMPTOM:
The block map tree of a file is corrupted across the levels, and truncating the inode of the file may lead to an infinite loop. There are various

DESCRIPTION:
For files larger than 64G, the truncation code first walks through the bmap tree to find the optimal offset from which to begin the truncation. If this offset falls within a corrupted range of the bmap, the actual truncation code, which relies on a binary search to find this offset, cannot find it and returns empty. The empty result makes the truncation code submit a dummy transaction, which updates the inode of the file with the latest ctime without freeing the allocated extents.

RESOLUTION:
The truncation code is modified to detect the corruption, mark the inode bad, and mark the file system for full fsck. This allows full fsck to complete the truncation: the next time it runs, the truncation code is able to throw out the inode and free the extents.

* 3657491 (Tracking ID: 3657482)

SYMPTOM:
A stress test on a cluster file system fails due to data corruption.

DESCRIPTION:
In the direct I/O write code path, there is an optimization that avoids invalidating the in-core pages in the range. Instead, the in-core pages are updated with the new data together with the disk write. This optimization comes into play when cached QIO is enabled on the file. When an in-core page is modified this way, it is not marked dirty. If the page was not already dirty, the in-core changes might be lost if the page is reused.
This can cause corruption if the page is read again before the disk update completes.

RESOLUTION:
In the case of cached QIO/ODM, the page overwrite optimization is disabled.

* 3665980 (Tracking ID: 2059611)

SYMPTOM:
The system panics due to a NULL pointer dereference while flushing the bitmaps to the disk, and the following stack trace is displayed:
vx_unlockmap+0x10c
vx_tflush_map+0x51c
vx_fsq_flush+0x504
vx_fsflush_fsq+0x190
vx_workitem_process+0x1c
vx_worklist_process+0x2b0
vx_worklist_thread+0x78

DESCRIPTION:
The vx_unlockmap() function unlocks a map structure of the file system. If the map is being used, the hold count is incremented. The vx_unlockmap() function attempts to check whether the mlink doubly linked list is empty. However, the asynchronous vx_mapiodone routine can change the link at random even though the hold count is zero.

RESOLUTION:
The code is modified to change the evaluation rule inside the vx_unlockmap() function, so that further evaluation is skipped when the map hold count is zero.

* 3665984 (Tracking ID: 2439261)

SYMPTOM:
When vx_fiostats_tunable is changed from zero to non-zero, the system panics with the following stack trace:
vx_fiostats_do_update
vx_fiostats_update
vx_read1
vx_rdwr
vno_rw
rwuio
pread

DESCRIPTION:
When vx_fiostats_tunable is changed from zero to non-zero, all the in-core inode fiostats attributes are set to NULL. When these attributes are accessed, the system panics due to the NULL pointer dereference.

RESOLUTION:
The code has been modified to check that the file I/O stat attributes are present before dereferencing the pointers.

* 3665990 (Tracking ID: 3567027)

SYMPTOM:
During a file system resize operation, the fullfsck flag is set with the following message:
vxfs: msgcnt 183168 mesg 096: V-2-96: vx_setfsflags - /dev/vx/dsk/sfsdg/vol file system fullfsck flag set - vx_fs_upgrade_reorg

DESCRIPTION:
File system resize requires some temporary inodes to swap the old inode and the converted inode.
However, before a structural inode is processed, the fullfsck flag is set in case a failure occurs during the metadata change. The flag is cleared after the swap completes successfully. If the temporary inode allocation fails, VxFS leaves the fullfsck flag on the disk. However, all temporary inodes can be cleaned up when they are not in use, so these temporary inodes do not result in corruption.

RESOLUTION:
The code is modified to clear the fullfsck flag if the structural inode conversion cannot create its temporary inode.

* 3666009 (Tracking ID: 3647749)

SYMPTOM:
An obsolete v_path is created for the VxFS node when the following steps are performed:
1) Create a file (file1).
2) Delete the file (file1).
3) Create a new file (file2, which has the same inode number as file1).
4) The vnode of file2 has an obsolete v_path: it still shows file1.

DESCRIPTION:
When VxFS reuses an inode, it performs some clear or reset operations to clean up the obsolete information. However, the corresponding Solaris vnode may not be properly handled, which leads to the obsolete v_path.

RESOLUTION:
The code is modified to call the vn_recycle() function in the VxFS inode clear routine to reset the corresponding Solaris vnode.

* 3666010 (Tracking ID: 3233276)

SYMPTOM:
On a 40 TB file system, the fsclustadm setprimary command takes more than 2 minutes to execute. An unmount operation that causes a primary migration also takes longer.

DESCRIPTION:
While migrating from primary to secondary, the old primary needs to process the delegated allocation units. The inefficient implementation of the allocation unit list consumes a lot of time when removing an element from the list. As the file system size increases, the allocation unit list also grows, which results in additional migration time.

RESOLUTION:
The code is modified to process the allocation unit list efficiently. With this modification, the primary migration completes in 1 second on the 40 TB file system.
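The README for incident 3666010 says only that the allocation unit list is now processed efficiently; it does not describe the new data structure. One common way to make element removal cheap is an intrusive, circular doubly linked list with a sentinel head, where unlinking a known element is O(1) instead of a walk from the head to find its predecessor. The sketch below assumes that technique; struct au, struct au_list and the au_list_* functions are hypothetical and are not taken from VxFS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocation-unit (AU) record, not the real VxFS structure. */
struct au {
    unsigned   id;
    struct au *next;   /* next AU in the list */
    struct au *prev;   /* previous AU: makes unlinking O(1) */
};

/* Circular list with a sentinel head node; an empty list points to itself. */
struct au_list {
    struct au head;
    size_t    count;
};

void au_list_init(struct au_list *l)
{
    l->head.next = l->head.prev = &l->head;
    l->count = 0;
}

/* Insert an AU at the front of the list in O(1). */
void au_list_push(struct au_list *l, struct au *a)
{
    a->next = l->head.next;
    a->prev = &l->head;
    l->head.next->prev = a;
    l->head.next = a;
    l->count++;
}

/* Unlink a known AU in O(1): no walk from the head to find its predecessor. */
void au_list_remove(struct au_list *l, struct au *a)
{
    assert(a->prev->next == a && a->next->prev == a); /* a must be linked */
    a->prev->next = a->next;
    a->next->prev = a->prev;
    a->next = a->prev = NULL;
    l->count--;
}
```

With a singly linked list, removing an arbitrary element means searching for its predecessor, so removing n elements one by one can cost O(n^2) in total; the prev pointer keeps each removal constant-time regardless of list length.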
* 3677165 (Tracking ID: 2560032)

SYMPTOM:
The system may panic while upgrading VRTSvxfs in the presence of a zone mounted on VxFS.

DESCRIPTION:
When the upgrade happens from the base version to the target version, the postinstall script unloads the base-level fdd module and loads the target-level fdd module while the VxFS module is still at the base version level. This leads to an inconsistency between the file device driver (fdd) and VxFS modules.

RESOLUTION:
The postinstall script is modified to avoid this inconsistency.

* 3688210 (Tracking ID: 3689104)

SYMPTOM:
The module version of the vxcafs module is not displayed when the "modinfo vxcafs" command is run.

DESCRIPTION:
When the "modinfo vxcafs" command is run, the output does not include the module version, although the version is displayed for VxFS, fdd, vxportal, and the other VxFS kernel modules.

RESOLUTION:
The code is modified so that the module version for vxcafs is displayed in the same way as for the other VxFS kernel modules.

* 3697966 (Tracking ID: 3697964)

SYMPTOM:
When a file system is upgraded, the upgraded layout clears the superblock flags (fs_flags).

DESCRIPTION:
When a file system is upgraded, the new superblock structure is populated with field values, most of which are inherited from the old superblock. In this process, the fs_flags values are overwritten, and flags such as VX_SINGLEDEV are deleted from the superblock.

RESOLUTION:
The code is modified to restore the old superblock flags while upgrading the disk layout of a file system.

* 3699953 (Tracking ID: 3702136)

SYMPTOM:
LCT corruption is observed while mounting the file system on a secondary node.

DESCRIPTION:
While the file system is being mounted on the secondary node, the primary node allocates the extent for the PNOLT (per-node OLT) entry. The primary node can either allocate a new extent or extend the previously allocated extent. If the primary node opts for the second option, it allocates the whole extent, including the part that was previously allocated.
As a result, the primary node erases the existing valid data, which results in LCT corruption.

RESOLUTION:
The code is modified so that when an existing PNOLT extent is expanded to make space for a PNOLT entry, only the newly allocated part of the extent is zeroed out.

* 3715567 (Tracking ID: 3715566)

SYMPTOM:
VxFS fails to report an error when the maxlink and nomaxlink options are set for a disk layout version (DLV) lower than 10.

DESCRIPTION:
The maxlink and nomaxlink options enable and disable the maxlink support feature, respectively. The maxlink support feature operates only on DLV 10. Due to an issue, the maxlink and nomaxlink options are wrongly accepted for DLVs lower than 10, even though they do not take effect there.

RESOLUTION:
The code is modified so that VxFS reports an error when you attempt to set the maxlink and nomaxlink options for a DLV lower than 10.

* 3718542 (Tracking ID: 3269553)

SYMPTOM:
VxFS returns an inappropriate message for a read of a hole via ODM.

DESCRIPTION:
Sometimes sparse files, such as temporary or backup/restore files, are created outside the Oracle database, and Oracle can read these files only through ODM. When such a read covers a hole, ODM fails with an ENOTSUP error.

RESOLUTION:
The code is modified to return zeros instead of an error.

* 3721458 (Tracking ID: 3721466)

SYMPTOM:
After a file system is upgraded from disk layout version 6 to 7, the vxupgrade(1M) command fails to set the VX_SINGLEDEV flag on the superblock.

DESCRIPTION:
The VX_SINGLEDEV flag was introduced in disk layout version 7. The flag indicates whether a file system resides only on a single device or volume. When the disk layout is upgraded from version 6 to 7, the flag is not inherited along with the other values, since it was not supported in version 6.

RESOLUTION:
The code is modified to set the VX_SINGLEDEV flag when the disk layout is upgraded from version 6 to 7.
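For incident 3699953, the difference between the faulty and the fixed behavior comes down to which byte range is zeroed when a PNOLT extent is expanded. The following C sketch models an extent as a plain in-memory buffer; struct extent, extent_expand_zero_all() and extent_expand() are hypothetical names and do not reflect the on-disk VxFS format or its actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of an extent holding PNOLT entries. */
struct extent {
    unsigned char *buf;  /* backing storage of at least cap bytes */
    size_t         len;  /* current extent length; bytes [0, len) are valid */
    size_t         cap;  /* largest length the extent may grow to */
};

/* Faulty pattern: zero the whole expanded extent, wiping valid entries. */
void extent_expand_zero_all(struct extent *e, size_t new_len)
{
    assert(new_len >= e->len && new_len <= e->cap);
    memset(e->buf, 0, new_len);
    e->len = new_len;
}

/* Fixed pattern: zero only the newly allocated range [len, new_len). */
void extent_expand(struct extent *e, size_t new_len)
{
    assert(new_len >= e->len && new_len <= e->cap);
    memset(e->buf + e->len, 0, new_len - e->len);
    e->len = new_len;
}
```

Expanding a 4-byte extent to 8 bytes with extent_expand() leaves bytes 0 to 3 untouched and zeroes only bytes 4 to 7, whereas extent_expand_zero_all() also clears the existing entries, which corresponds to the data loss described above.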
* 3725347 (Tracking ID: 3725346)

SYMPTOM:
Trimming an underlying SSD volume with the "fsadm -R -o ssd" command was not supported on AIX and Solaris.

DESCRIPTION:
The fsadm command with the -o ssd option ("fsadm -R -o ssd") is used to issue the TRIM command on an underlying SSD volume, which was not supported on AIX and Solaris.

RESOLUTION:
The code is modified on AIX and Solaris to support the TRIM command on an underlying SSD volume.

* 3725569 (Tracking ID: 3731678)

SYMPTOM:
During an internal test, a debug assert was observed while handling an error scenario.

DESCRIPTION:
The issue occurred during write stabilization (writing data to both the fscache and the HDD). The asserting function finds the entry corresponding to the current I/O request and sets the error bits appropriately. In this scenario, a mismatch of addresses is observed between the buffer used for the I/O and the buffer used by the original user. The bp_baddr field stores the address of the buffer corresponding to the I/O, and the bp_origbaddr field stores the original buffer address corresponding to the user request.

RESOLUTION:
The code is modified to handle the error scenario.

* 3726403 (Tracking ID: 3739618)

SYMPTOM:
The sfcache command with the "-i" option may not show file system cache statistics periodically.

DESCRIPTION:
The sfcache command with the "-i" option may not show file system cache statistics periodically.

RESOLUTION:
The code is modified to add a loop that prints sfcache statistics at the specified interval.

* 3729111 (Tracking ID: 3729104)

SYMPTOM:
Man page changes for the smartiomode option of mount_vxfs(1M) are missing.

DESCRIPTION:
The smartiomode option for mount_vxfs is missing from the man page.

RESOLUTION:
The man page is modified to document the smartiomode option for mount_vxfs.

* 3729704 (Tracking ID: 3719523)

SYMPTOM:
'vxupgrade' does not clear the superblock replica of old layout versions.
DESCRIPTION:
While upgrading the file system to a new layout version, a new superblock inode is allocated and an extent is allocated for the replica superblock. After writing the new superblock (primary + replica), VxFS frees the extent of the old superblock replica. If the primary superblock is later corrupted, full fsck searches for a replica to repair the file system. If it finds the replica of the old superblock, it restores the file system to the old layout instead of the new one. This behavior is wrong. To move the file system to a new version, the replica of the old superblock should be cleared as part of vxupgrade, so that full fsck does not detect it later.

RESOLUTION:
The replica of the old superblock is cleared as part of vxupgrade.

* 3736133 (Tracking ID: 3736772)

SYMPTOM:
The sfcache(1M) command does not automatically enable write-back caching on a file system once the cache size is increased enough to enable write-back caching.

DESCRIPTION:
When a file system mounted with write-back caching enabled does not have sufficient cache space for write-back, read caching is enabled instead. This behavior persists even when the cache area is grown, and write-back fails to be activated automatically on a file system mounted with the smartiomode=writeback option.

RESOLUTION:
The code is modified so that whenever the cache area grows, all the file systems are scanned and a write-back enable message is sent to the file systems that are mounted in write-back mode.

* 3743913 (Tracking ID: 3743912)

SYMPTOM:
Users could create more than 64K sub-directories on disk layouts with versions lower than 10.

DESCRIPTION:
In this release, the maxlink feature enables users to create more than 64K sub-directories. This feature is supported on disk layouts whose versions are higher than or equal to 10. The macro VX_TUNEMAXLINK denotes the maximum limit on sub-directories, and its value was changed from 64K to 4 billion.
Due to this, users could create more than 64K sub-directories for layout versions lower than 10 as well, which is undesirable. This fix is applicable only on platforms other than AIX.

RESOLUTION:
The code is modified so that the sub-directory limit is set to 64K for layouts whose versions are lower than 10.

* 3755796 (Tracking ID: 3756750)

SYMPTOM:
VxFS may leak memory when the File Device Driver (FDD) module is unloaded before the cache file system is taken offline.

DESCRIPTION:
When the FDD module is unloaded before the cache file system is taken offline, a few FDD-related structures in the cache file system are not freed. As a result, a memory leak is observed.

RESOLUTION:
The code is modified so that FDD-related structures are not initialized for cache file systems.

Patch ID: VRTSvxfs-6.2.0.100

* 3703631 (Tracking ID: 3615043)

SYMPTOM:
At times, while writing to a file, data could be missed.

DESCRIPTION:
While writing to a file with delayed allocation turned on, Solaris could disregard the NON_CLUSTERING flag and cluster pages beyond the range for which the flush was issued, leading to data loss.

RESOLUTION:
In the case of delayed allocation (dalloc), the code now makes sure that the flag is cleared and that exactly the requested range is flushed.

Patch ID: VRTSamf-6.2.1.1100

* 3973131 (Tracking ID: 3970679)

SYMPTOM:
Veritas InfoScale Availability does not support Oracle Solaris 11.4.

DESCRIPTION:
Veritas InfoScale Availability does not support Oracle Solaris versions later than 11.3.

RESOLUTION:
Veritas InfoScale Availability now supports Oracle Solaris 11.4.

Patch ID: VRTSamf-6.2.1.100

* 3864321 (Tracking ID: 3862933)

SYMPTOM:
On Solaris 11.2 SRU 8 or above with the Asynchronous Monitoring Framework (AMF) enabled, VCS agent processes may not respond or may encounter AMF errors during registration.

DESCRIPTION:
From Solaris 11.2 SRU 8 onward, the kernel headers are modified.
These kernel header changes affect AMF functionality and can cause AMF-enabled agent processes, such as Mount, Oracle, Netlistener, and Process, to hang or report errors during AMF registration.

RESOLUTION:
The AMF code is modified to handle the changes in the kernel headers for Solaris 11.2 SRU 8, which addresses the issue during reaper registration.

Patch ID: VRTSllt-6.2.1.1100

* 3973130 (Tracking ID: 3970679)

SYMPTOM:
Veritas InfoScale Availability does not support Oracle Solaris 11.4.

DESCRIPTION:
Veritas InfoScale Availability does not support Oracle Solaris versions later than 11.3.

RESOLUTION:
Veritas InfoScale Availability now supports Oracle Solaris 11.4.

INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Please note that the installation of this P-Patch will cause downtime.

To install the patch, perform the following steps on at least one node in the cluster:
1. Copy the patch sfha-sol11_sparc-Patch-6.2.1.200.tar.gz to /tmp
2. Untar sfha-sol11_sparc-Patch-6.2.1.200.tar.gz to /tmp/hf
    # mkdir /tmp/hf
    # cd /tmp/hf
    # gunzip /tmp/sfha-sol11_sparc-Patch-6.2.1.200.tar.gz
    # tar xf /tmp/sfha-sol11_sparc-Patch-6.2.1.200.tar
3. Install the hotfix (note that the installation of this P-Patch will cause downtime):
    # pwd
    /tmp/hf
    # ./installSFHA621P2 [ ...]

You can also install this patch together with the 6.2.1 maintenance release using Install Bundles:
1. Download this patch and extract it to a directory.
2. Change to the Veritas InfoScale 6.2.1 directory and invoke the installmr script with the -patch_path option, where -patch_path points to the patch directory:
    # ./installmr -patch_path [] [ ...]

Install the patch manually:
--------------------------
Manual installation is not recommended.

REMOVING THE PATCH
------------------
Manual uninstallation is not recommended.
SPECIAL INSTRUCTIONS
--------------------
After a stack + OS upgrade, if ODM and vxfsckd are not online after the reboot, run the following commands and then start the services using CPI:
    # /lib/svc/method/odm restart
    # /usr/sbin/devfsadm -i vxportal
    # ./installer -start

OTHERS
------
NONE