                          * * * READ ME * * *
                      * * * InfoScale 7.3.1 * * *
                         * * * Patch 200 * * *
                         Patch Date: 2018-12-18


This document provides the following information:

   * PATCH NAME
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * SUMMARY OF INCIDENTS FIXED BY THE PATCH
   * DETAILS OF INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
InfoScale 7.3.1 Patch 200


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL7 x86-64


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSamf
VRTSaslapm
VRTSdbac
VRTSgab
VRTSglm
VRTSgms
VRTSllt
VRTSodm
VRTSveki
VRTSvxfen
VRTSvxfs
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * InfoScale Availability 7.3.1
   * InfoScale Enterprise 7.3.1
   * InfoScale Foundation 7.3.1
   * InfoScale Storage 7.3.1


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: VRTSvxvm-7.3.1.2100
* 3933888 (3868533) IO hang happens because of a deadlock situation.
* 3937544 (3929246) VxVM installation with yum fails in the chroot environment.
* 3953920 (3949954) Dumpstack messages are printed when the vxio module is loaded for the first time and calls blk_register_queue.
* 3958860 (3953681) Data corruption is seen when more than one plex of a volume is detached.
* 3959451 (3913949) DG import fails with split brain after the system is rebooted or when a storage disturbance is seen.
* 3959452 (3931678) Memory allocation and locking optimizations during CVM (Cluster Volume Manager) IO shipping.
* 3959453 (3932241) VxVM (Veritas Volume Manager) creates some required files under the /tmp and /var/tmp directories. These directories could be modified by non-root users, which may affect Veritas Volume Manager functioning.
* 3959455 (3932496) In an FSS environment, volume creation might fail on SSD devices if vxconfigd was earlier restarted.
* 3959458 (3936535) Poor performance due to frequent cache drops.
* 3959460 (3942890) IO hang as DRL flush gets into an infinite loop.
* 3959461 (3946350) kmalloc-1024 and kmalloc-2048 memory consumption keeps increasing when the VVR IO size is more than 256K.
* 3959462 (3947265) The delay added in the vxvm-startup script to wait for InfiniBand devices to be discovered leads to various issues.
* 3959463 (3954787) Data corruption may occur in a GCO along with FSS environment on the RHEL 7.5 operating system.
* 3959465 (3956732) systemd-udevd messages can be seen in journalctl logs.
* 3959469 (3922529) VxVM (Veritas Volume Manager) creates some required files under the /tmp and /var/tmp directories. These directories could be modified by non-root users, which may affect Veritas Volume Manager functioning.
* 3959471 (3932356) vxconfigd dumps core while importing a DG.
* 3959473 (3945115) The VxVM (Veritas Volume Manager) vxassist relayout command fails for volumes with a RAID layout.
* 3959475 (3950384) In a scenario where volume encryption at rest is enabled, data corruption may occur if the file system size exceeds 1TB.
* 3959476 (3950675) vxdg import appears to hang forever.
* 3959477 (3953845) IO hang can be experienced in a memory pressure situation because of "vxencryptd".
* 3959478 (3956027) System panicked while removing disks from a disk group because of a race condition between the IO stats and disk removal code.
* 3959479 (3956727) In Solaris DDL discovery, when the SCSI ioctl fails, direct disk IO on the device can lead to high memory consumption and a vxconfigd hang.
* 3959480 (3957227) Disk group import succeeded, but with an error message. This may cause confusion.
* 3960383 (3958062) When the boot LUN is migrated, enabling and disabling dmp_native_support fails.
* 3961353 (3950199) System may panic during DMP (Dynamic Multipathing) path restoration.
* 3961355 (3952529) The vxdmpadm settune dmp_native_support command fails with a "vg is in use" error if the VG is in a mounted state.
* 3961356 (3953481) A stale entry of the old disk is left under /dev/[r]dsk even after replacing it.
* 3961358 (3955101) Panic observed in a GCO environment (cluster-to-cluster replication) during replication.
* 3961359 (3955725) Utility to clear the "failio" flag on a disk after storage connectivity is back.
* 3961468 (3926067) vxassist relayout/convert commands may fail in a Campus Cluster environment.
* 3961469 (3948140) System panic can occur if the size of the RTPG (Report Target Port Groups) data returned by the underlying array is greater than 255.
* 3961480 (3957549) Server panicked while tracing an event because of a missing NULL pointer check.
* 3964315 (3952042) vxdmp iostat memory allocation might cause memory fragmentation and pagecache drops.
* 3966132 (3960576) The ODM cfgmgr rule for vxdisk scandisks is added twice in the database.
* 3934330 (3941109) During VRTSaslapm package installation for SLES12SP3, APMs (array policy modules) are not loading properly.
* 3943620 (3938549) Volume creation fails with error: Unexpected kernel error in configuration update, on RHEL 7.5.
* 3932464 (3926976) Frequent loss of VxVM functionality because vxconfigd is unable to validate the license.
* 3933874 (3852146) A shared disk group (DG) fails to import when the "-c" and "-o noreonline" options are specified together.
* 3933875 (3872585) System panics with a storage key exception.
* 3933876 (3894657) VxVM commands may hang when using space-optimized snapshots.
* 3933877 (3914789) System may panic when reclaiming on the secondary in a VVR environment.
* 3933878 (3918408) Data corruption when volume grow is attempted on thin reclaimable disks whose space has just been freed.
* 3933880 (3864063) Application I/O hangs because of a race between the Master Pause SIO (Staging I/O) and the Error Handler SIO.
* 3933882 (3865721) vxconfigd may hang while pausing replication in a CVR (Cluster Veritas Volume Replicator) environment.
* 3933883 (3867236) Application IO hang happens because of a race between the Master Pause SIO (Staging IO) and the RVWRITE1 SIO.
* 3933884 (3868154) When DMP Native Support is set to ON, a dmpnode with multiple VGs cannot be listed properly in the 'vxdmpadm native ls' command.
* 3933889 (3879234) dd read on the Veritas Volume Manager (VxVM) character device fails with an Input/Output error while accessing the end of the device.
* 3933890 (3879324) The VxVM DR tool fails to handle a busy device problem while LUNs are removed from the OS.
* 3933897 (3907618) vxdisk resize leads to data corruption on the filesystem.
* 3933898 (3908987) False vxrelocd messages are generated by a joining CVM slave.
* 3933900 (3915523) A local disk from another node belonging to a private DG (disk group) is exported to the node when a private DG is imported on the current node.
* 3933904 (3921668) The vxrecover command with the -m option fails when executed on the slave nodes.
* 3933907 (3873123) If a disk with a CDS EFI label is used as a remote disk on a cluster node, restarting the vxconfigd daemon on that particular node causes vxconfigd to go into the disabled state.
* 3933910 (3910228) Registration of GAB (Global Atomic Broadcast) port u fails on slave nodes after multiple new devices are added to the system.
* 3933911 (3925377) Not all disks could be discovered by DMP after first startup.
* 3937540 (3906534) After enabling DMP (Dynamic Multipathing) Native Support, enable /boot to be mounted on the DMP device.
* 3937541 (3911930) Provide a way to clear PGR_FLAG_NOTSUPPORTED on the device instead of using the exclude/include commands.
* 3937550 (3935232) Replication and IO hang during master takeover because of a race between log owner change and master switch.
* 3937808 (3931936) VxVM (Veritas Volume Manager) commands hang on the master node after restarting a slave node.
* 3937811 (3935974) When a client process shuts down abruptly or resets the connection during communication with the vxrsyncd daemon, it may terminate the vxrsyncd daemon.
* 3940039 (3897047) Filesystems are not mounted automatically on boot through systemd on RHEL7 and SLES12.
* 3940143 (3941037) VxVM (Veritas Volume Manager) creates some required files under the /tmp and /var/tmp directories. These directories could be modified by non-root users, which may affect Veritas Volume Manager functioning.
* 3937549 (3934910) DRL map leaks during the snapshot creation/removal cycle with DG reimport.
* 3937542 (3917636) Filesystems from the /etc/fstab file are not mounted automatically on boot through systemd on RHEL7 and SLES12.

Patch ID: VRTSveki-7.3.1.100
* 3967357 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.

Patch ID: VRTSdbac-7.3.1.200
* 3955987 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.
* 3944197 (3944179) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 4 (RHEL7.4).

Patch ID: VRTSamf-7.3.1.300
* 3955986 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.
* 3947657 (3948217) AMF mnton/mntoff registration fails and causes a cluster node to panic.
* 3944196 (3944179) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 4 (RHEL7.4).

Patch ID: VRTSvxfen-7.3.1.300
* 3955985 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.
* 3944195 (3944179) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 5 (RHEL7.5).

Patch ID: VRTSgab-7.3.1.200
* 3955984 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.
* 3944182 (3944179) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 5 (RHEL7.5).

Patch ID: VRTSllt-7.3.1.400
* 3955983 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.
* 3933242 (3948201) Kernel panics in case of FSS with LLT over RDMA during heavy data transfer.
* 3944181 (3944179) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 4 (RHEL7.4).

Patch ID: VRTSgms-7.3.1.1100
* 3955058 (3951761) GMS support for RHEL6.10 and RHEL/SUSE retpoline kernels.

Patch ID: VRTSglm-7.3.1.1100
* 3944902 (3944901) A file system unmount operation might hang waiting for client locks when the restart and scope-leave APIs collide.
* 3952305 (3939996) In a CFS environment, GLM performance enhancement for the gp_gen_lock bottleneck/contention.
* 3952307 (3940838) Print the master node ID information of the stuck lock in the output of glmdump.
* 3955056 (3951759) GLM support for RHEL6.10 and RHEL/SUSE retpoline kernels.
* 3955901 (3955899) A node fails to join due to the dependency of GLM services on the multi-user.target and graphical.target services.
* 3959327 (3959325) In a cascaded node reboot scenario, a NULL pointer dereference was hit in the GLM code path.

Patch ID: VRTSodm-7.3.1.2100
* 3936184 (3897161) Oracle Database on a Veritas filesystem with the Veritas ODM library has a high log file sync wait time.
* 3952306 (3940492) ODM is breaking an ordering cycle in systemd.
* 3955054 (3951754) ODM support for RHEL6.10 and RHEL/SUSE retpoline kernels.
* 3960064 (3922681) System panic with odm_tsk_exit.
* 3966263 (3958865) The ODM module failed to load on RHEL7.6.
* 3943732 (3938546) The ODM module failed to load on RHEL7.5.
* 3939411 (3941018) The VRTSodm driver will not load with the 7.3.1.100 VRTSvxfs patch.

Patch ID: VRTSvxfs-7.3.1.2100
* 3929952 (3929854) Enabling event notification support on CFS for the WebLogic watchService on the Solaris platform.
* 3933816 (3902600) Contention observed on the vx_worklist_lk lock in a cluster-mounted file system with ODM.
* 3943715 (3944884) ZFOD extents should not be pushed on clones in case of logged writes.
* 3947560 (3947421) The DLV upgrade operation fails while upgrading the filesystem from DLV 9 to DLV 10.
* 3947561 (3947433) While adding a volume (part of a vset) to an already mounted filesystem, fsvoladm displays an error.
* 3947651 (3947648) Mistuning of vxfs_ninode and vx_bc_bufhwm to a very small value.
* 3952304 (3925281) Hexdump the incore inode data and piggyback data when inode revalidation fails.
* 3952309 (3941942) Unable to handle a kernel NULL pointer dereference while freeing fiostats.
* 3952324 (3943232) System panic in vx_unmount_cleanup_notify when unmounting a file system.
* 3952325 (3943529) System panicked because the watchdog timer detected a hard lockup on a CPU when trying to release a dentry.
* 3952840 (3952837) VxFS mount fails during startup as its dependency, autofs.service, is not up.
* 3955051 (3951752) VxFS support for RHEL6.10 and RHEL/SUSE retpoline kernels.
* 3955886 (3955766) CFS hung when doing extent allocation.
* 3958475 (3958461) Man page changes to support updated DLVs.
* 3960380 (3934175) A 4-node FSS CFS experienced an IO hang on all nodes.
* 3960468 (3957092) System panic with spin_lock_irqsave through splunkd.
* 3960470 (3958688) System panic when VxFS was force unmounted.
* 3966262 (3958853) The VxFS module failed to load on RHEL7.6.
* 3933810 (3830300) Degraded CPU performance during backup of Oracle archive logs on CFS versus a local filesystem.
* 3933819 (3879310) The file system may get corrupted after a failed vxupgrade.
* 3933820 (3894712) ACL permissions are not inherited correctly on a cluster file system.
* 3933824 (3908785) System panic observed because of a null page address in the writeback structure in case of the kswapd process.
* 3933828 (3921152) Performance drop caused by vx_dalloc_flush().
* 3933834 (3931761) A cluster-wide hang may be observed in case of high workload.
* 3933843 (3926972) A recovery event can result in a cluster-wide hang.
* 3933844 (3922259) Force umount hang in vx_idrop.
* 3933912 (3922986) Deadlock issue with the buffer cache iodone routine in CFS.
* 3934841 (3930267) Deadlock between fsq flush threads and writer threads.
* 3936286 (3936285) The fscdsconv command may fail the conversion for disk layout version (DLV) 12 and above.
* 3937536 (3940516) The file resize thread loops infinitely for a file resize operation crossing the 32-bit boundary.
* 3938258 (3938256) When checking file size through seek_hole, an incorrect offset/size is returned when delayed allocation is enabled on the file.
* 3939406 (3941034) A VxFS worker thread may continuously spin on a CPU.
* 3940266 (3940235) A hang might be observed in case the filesystem gets disabled while no-space handling is being performed by inactive processing.
* 3940368 (3940268) The file system might get disabled in case the size of the directory surpasses the vx_dexh_sz value.
* 3940652 (3940651) The vxupgrade command might fail while upgrading Disk Layout Version (DLV) 10 to any higher DLV version.
* 3940830 (3937042) Data corruption seen when issuing writev with a mixture of named-page and anonymous-page buffers.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: VRTSvxvm-7.3.1.2100

* 3933888 (Tracking ID: 3868533)

SYMPTOM:
IO hang happens when starting replication. The VXIO daemon hangs with a stack like the following:
vx_cfs_getemap at ffffffffa035e159 [vxfs]
vx_get_freeexts_ioctl at ffffffffa0361972 [vxfs]
vxportalunlockedkioctl at ffffffffa06ed5ab [vxportal]
vxportalkioctl at ffffffffa06ed66d [vxportal]
vol_ru_start at ffffffffa0b72366 [vxio]
voliod_iohandle at ffffffffa09f0d8d [vxio]
voliod_loop at ffffffffa09f0fe9 [vxio]

DESCRIPTION:
While performing DCM replay with the Smart Move feature enabled, the VxIO kernel needs to issue an IOCTL to the VxFS kernel to get the file system free region. To complete this IOCTL, the VxFS kernel needs to clone the map by issuing IO to the VxIO kernel. If an RLINK disconnection happens at just that time, the RV is serialized to complete the disconnection. As the RV is serialized, all IOs, including the clone map IO from VxFS, are queued to rv_restartq, hence the deadlock.

RESOLUTION:
Code changes have been made to handle the deadlock situation.
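The circular wait behind incident 3868533 can be modeled as a small wait-for graph. This is an illustrative sketch only; the node labels below are descriptive names for the steps in the description above, not actual VxVM symbols.

```python
# Model the wait-for relationships from the description as a directed
# graph (each entity waits for exactly one other) and walk it to show
# that the waits form a cycle, i.e. a deadlock.
waits_for = {
    "DCM replay SIO": "free-region ioctl to VxFS",
    "free-region ioctl to VxFS": "clone-map IO to vxio",
    "clone-map IO to vxio": "rv_restartq drain",
    "rv_restartq drain": "DCM replay SIO",  # RV stays serialized
}

def find_cycle(graph, start):
    # Follow the single outgoing edge from each node until a node repeats.
    seen = []
    node = start
    while node not in seen:
        seen.append(node)
        node = graph[node]
    return seen[seen.index(node):]

cycle = find_cycle(waits_for, "DCM replay SIO")
print(" -> ".join(cycle))
```

Every entity in the chain waits on the next, and the last waits on the first, so no IO can ever complete; the fix breaks one of these edges.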
* 3937544 (Tracking ID: 3929246)

SYMPTOM:
While installing the VxVM package in a chroot environment, the rpm installation fails with the following error:
WARNING: Veki driver is not loaded.
error: %pre(VRTSvxvm-7.3.0.000-RHEL7.x86_64) scriptlet failed, exit status 1
error: VRTSvxvm-7.3.0.000-RHEL7.x86_64: install failed

DESCRIPTION:
When VxVM is installed in a chroot environment, the veki module cannot be loaded. However, since other VxVM drivers are dependent on veki, the VxVM installation gets aborted due to the failure in loading the veki module.

RESOLUTION:
The installation scripts are fixed to allow the installation of VxVM even if the veki module cannot be loaded, but only in a chroot environment.

* 3953920 (Tracking ID: 3949954)

SYMPTOM:
Dumpstack messages are printed when the vxio module is loaded for the first time and calls blk_register_queue.

DESCRIPTION:
In RHEL 7.5, a new check was added in the kernel code in blk_register_queue: if QUEUE_FLAG_REGISTERED was already set on the queue, a dumpstack warning message was printed. In VxVM, the flag was already set because it was copied from the device queue that had earlier been registered by the OS.

RESOLUTION:
Changes are done in the VxVM code to avoid copying QUEUE_FLAG_REGISTERED and fix the dumpstack warnings.

* 3958860 (Tracking ID: 3953681)

SYMPTOM:
Data corruption is seen when more than one plex of a volume is detached.

DESCRIPTION:
When a plex of a volume gets detached, the DETACH map gets enabled in the DCO (Data Change Object). Incoming IOs are tracked in the DRL (Dirty Region Log) and then asynchronously copied to the DETACH map for tracking. If one more plex gets detached, some of the new incoming regions may be missed in the DETACH map of the previously detached plex. This leads to corruption when the disk comes back and the plex resync happens using the corrupted DETACH map.

RESOLUTION:
Code changes are done to correctly track the IOs in the DETACH map of the previously detached plex and avoid corruption.
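The tracking the fix for incident 3953681 restores can be sketched in a few lines. This is an illustrative model only (sets standing in for region bitmaps; the function and plex names are hypothetical), not VxVM code: a region dirtied while any plex is detached must be recorded in the DETACH map of every detached plex, including the earlier-detached one.

```python
# Sketch: dirty regions must propagate to the detach maps of ALL
# currently detached plexes, not only the most recently detached one.
def record_write(region, drl, detach_maps):
    drl.add(region)                        # DRL tracks the in-flight region
    for plex_map in detach_maps.values():  # copy into every detach map
        plex_map.add(region)

drl = set()
detach_maps = {"plex-01": set()}           # plex-01 detaches first
record_write(7, drl, detach_maps)

detach_maps["plex-02"] = set()             # a second plex detaches later
record_write(9, drl, detach_maps)

# Region 9 must also land in plex-01's map; if it is missed, plex-01's
# resync uses an incomplete map and the region is silently left stale.
print(sorted(detach_maps["plex-01"]))
```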
* 3959451 (Tracking ID: 3913949)

SYMPTOM:
DG import fails with split brain after the system is rebooted or when a storage disturbance is seen, with the following messages in syslog:
V-5-1-9576 Split Brain. da id is 0.1, while dm id is 0.0 for dm B000F8BF40FF000043042DD4A5
V-5-1-9576 Split Brain. da id is 0.1, while dm id is 0.0 for dm B000F8BF40FF00004003FE9356

DESCRIPTION:
When a disk is detached, the SSB ID of the remaining DA and DM records is incremented. Unfortunately, for some reason, only the SSB ID of the DA record is incremented; the SSB ID of the DM record is NOT updated. One probable reason may be that the disks get detached before the DM records are updated.

RESOLUTION:
A work-around option is provided to bypass the SSB checks while importing the DG: if a false split brain happens, the user can import the DG with the 'vxdg -o overridessb import <dgname>' command. Before using '-o overridessb', one should confirm that all DA records of the DG are available in the ENABLED state and differ from the DM records in SSB by 1.

* 3959452 (Tracking ID: 3931678)

SYMPTOM:
There was a performance issue while shipping IO to the remote disks, due to non-cached memory allocation and redundant locking.

DESCRIPTION:
There was redundant locking while checking whether flow control is enabled by GAB during IO shipping. The redundant locking is optimized. Additionally, the memory allocation during IO shipping is optimized.

RESOLUTION:
Changes are done in the VxVM code to optimize the memory allocation and reduce the redundant locking to improve performance.

* 3959453 (Tracking ID: 3932241)

SYMPTOM:
VxVM (Veritas Volume Manager) creates some required files under the /tmp and /var/tmp directories.

DESCRIPTION:
VxVM (Veritas Volume Manager) creates some required files under the /tmp and /var/tmp directories. Non-root users have access to these folders, and they may accidentally modify, move or delete those files.
Such actions may interfere with the normal functioning of Veritas Volume Manager.

RESOLUTION:
This hotfix addresses the issue by moving the required Veritas Volume Manager files to a secure location.

* 3959455 (Tracking ID: 3932496)

SYMPTOM:
In an FSS environment, volume creation might fail on SSD devices if vxconfigd was earlier restarted on the master node.

DESCRIPTION:
In an FSS environment, if a shared disk group is created using SSD devices and vxconfigd is restarted, volume creation might fail. The problem is that the mediatype attribute of the disk was NOT propagated from the kernel to vxconfigd while restarting the vxconfigd daemon.

RESOLUTION:
Changes are done in the VxVM code to correctly propagate mediatype for remote devices during vxconfigd startup.

* 3959458 (Tracking ID: 3936535)

SYMPTOM:
IO performance is poor due to frequent cache drops on a system with snapshots configured.

DESCRIPTION:
On a system with VxVM snapshots configured, DCO map updates happen along with the ongoing IO and could allocate many chunks of page memory, which triggered kswapd to swap the cache memory out, so cache drops were seen.

RESOLUTION:
Code changes are done to allocate a large block of memory for the DCO map update without triggering memory swap-out.

* 3959460 (Tracking ID: 3942890)

SYMPTOM:
In case a Data Change Object (DCO) is configured, an IO hang may happen under heavy IO load combined with a slow Storage Replicator Log (SRL) flush.

DESCRIPTION:
Application IO needs to wait for the DRL flush to complete before proceeding. Due to a defect in the DRL code, the DRL flush could not proceed when there was a large amount of IO exceeding the available DRL chunks, hence the IO hang.

RESOLUTION:
Code changes have been made to fix the issue.

* 3959461 (Tracking ID: 3946350)

SYMPTOM:
kmalloc-1024 and kmalloc-2048 memory consumption keeps increasing when the Veritas Volume Replicator (VVR) IO size is more than 256K.

DESCRIPTION:
In case of VVR, if the I/O size is more than 256K, the IO is broken into child IOs.
Due to a code defect, the allocated space is not freed when the split IOs are completed.

RESOLUTION:
The code is modified to free the VxVM-allocated memory after the split IOs are completed.

* 3959462 (Tracking ID: 3947265)

SYMPTOM:
vxfen tends to fail and creates split-brain issues.

DESCRIPTION:
Currently, to check whether InfiniBand devices are present, we check for some modules which, on RHEL 7.4, are present by default.

RESOLUTION:
To check for InfiniBand devices, we now check for the /sys/class/infiniband directory, in which device information is populated if InfiniBand devices are present.

* 3959463 (Tracking ID: 3954787)

SYMPTOM:
In a RHEL 7.5 FSS environment with GCO configured, having NVMe devices and an InfiniBand network, data corruption might occur when sending IO from the master to the slave node.

DESCRIPTION:
In the recent RHEL 7.5 release, Linux stopped allowing IO on an underlying NVMe device which has gaps between BIO vectors. In case of VVR, the SRL header of 3 blocks is added to the BIO. When the BIO is sent through LLT to the other node, the LLT limitation of 32 fragments can leave the BIO vectors unaligned. When this unaligned BIO is sent to the underlying NVMe device, the last 3 blocks of the BIO are skipped and not written to the disk on the slave node. This leads to incomplete data written on the slave node, which leads to data corruption.

RESOLUTION:
Code changes have been done to handle this case and send the BIO aligned to the underlying NVMe device.

* 3959465 (Tracking ID: 3956732)

SYMPTOM:
systemd-udevd messages like the following can be seen in journalctl logs:
systemd-udevd[7506]: inotify_add_watch(7, /dev/VxDMP8, 10) failed: No such file or directory
systemd-udevd[7511]: inotify_add_watch(7, /dev/VxDMP9, 10) failed: No such file or directory

DESCRIPTION:
When changes are made to the underlying VxDMP device, these messages are displayed in the journalctl logs.
The reason for the message is that the change event of the VxDMP device was not handled in our UDEV rule.

RESOLUTION:
Code changes have been done to handle the change event of the VxDMP device in our UDEV rule.

* 3959469 (Tracking ID: 3922529)

SYMPTOM:
VxVM (Veritas Volume Manager) creates some required files under the /tmp and /var/tmp directories.

DESCRIPTION:
During creation of the VxVM (Veritas Volume Manager) rpm package, some files are created under the /usr/lib/vxvm/voladm.d/lib/vxkmiplibs/ directory. Non-root users have access to these folders, and they may accidentally modify, move or delete those files.

RESOLUTION:
This hotfix addresses the issue by assigning proper permissions to the directory during creation of the rpm.

* 3959471 (Tracking ID: 3932356)

SYMPTOM:
In a two-node cluster, vxconfigd dumps core while importing the DG:
dapriv_da_alloc ()
in setup_remote_disks ()
in volasym_remote_getrecs ()
req_dg_import ()
vold_process_request ()
start_thread () from /lib64/libpthread.so.0
from /lib64/libc.so.6

DESCRIPTION:
vxconfigd is dumping core due to an address alignment issue.

RESOLUTION:
The alignment issue is fixed.

* 3959473 (Tracking ID: 3945115)

SYMPTOM:
The VxVM vxassist relayout command fails for volumes with a RAID layout with the following message:
VxVM vxassist ERROR V-5-1-2344 Cannot update volume
VxVM vxassist ERROR V-5-1-4037 Relayout operation aborted. (7)

DESCRIPTION:
During a relayout operation, the target volume inherits the attributes from the original volume. One of those attributes is the read policy. If the layout of the original volume is RAID, the RAID read policy is set. The RAID read policy expects the target volume to have the appropriate log required for the RAID policy. Since the target volume is of a different layout, it does not have the log present, and hence the relayout operation fails.

RESOLUTION:
Code changes have been made to set the read policy to SELECT for target volumes, rather than inheriting it from the original volume when the original volume has a RAID layout.
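The policy-selection rule from the fix for incident 3945115 can be sketched as follows. The function name is hypothetical; "RAID", "SELECT" and "ROUND" are VxVM read-policy names, but the actual code path is of course kernel C, not this sketch.

```python
# Hypothetical sketch of the fix: a relayout target volume must not
# inherit the RAID read policy, because the RAID policy requires a log
# that a differently laid-out target volume does not have.
def target_read_policy(source_policy):
    # Fall back to SELECT when the source volume used the RAID policy;
    # any other policy is safe to inherit unchanged.
    return "SELECT" if source_policy == "RAID" else source_policy

print(target_read_policy("RAID"))   # SELECT
print(target_read_policy("ROUND"))  # ROUND
```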
* 3959475 (Tracking ID: 3950384)

SYMPTOM:
In a scenario where volume data encryption at rest is enabled, data corruption may occur if the file system size exceeds 1TB and the data is located in a file extent which has an extent size bigger than 256KB.

DESCRIPTION:
In a scenario where data encryption at rest is enabled, data corruption may occur when both of the following conditions are satisfied:
- The file system size is over 1TB
- The data is located in a file extent which has an extent size bigger than 256KB
This issue occurs due to a bug which causes an integer overflow for the offset.

RESOLUTION:
As a part of this fix, appropriate code changes have been made to improve the data encryption behavior such that the data corruption does not occur.

* 3959476 (Tracking ID: 3950675)

SYMPTOM:
The following command does not progress and appears stuck:
# vxdg import <dgname>

DESCRIPTION:
The DG import command was found to be non-progressing at times on a backup system. Analysis of the situation showed that devices belonging to the DG were reporting "devid mismatch" more than once, due to graceful DR steps not being followed. Erroneous processing of this situation resulted in IOs not being allowed on those devices, leading to the DG import hang.

RESOLUTION:
The code processing the "devid mismatch" is rectified.

* 3959477 (Tracking ID: 3953845)

SYMPTOM:
An IO hang can be experienced in a memory pressure situation because of "vxencryptd".

DESCRIPTION:
For large IOs, VxVM (Veritas Volume Manager) tries to acquire contiguous pages in memory for some of its internal data structures. In heavy memory pressure scenarios, contiguous pages may not be available. In such a case, it waits until the required pages are available for allocation and does not process the request further. This causes an IO-hang-like situation where IO cannot progress further or progresses very slowly.

RESOLUTION:
Code changes are done to avoid the IO hang situation.
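The kind of offset overflow behind incident 3950384 above can be illustrated with 32-bit arithmetic. Representing the offset as a signed 32-bit count of 512-byte sectors is an assumption made for illustration (it matches the 1TB threshold: 2^31 sectors x 512 bytes = 1TB); it is not the actual VxVM variable.

```python
import ctypes

SECTOR = 512  # bytes per sector (assumed unit for the offset)

def sector_offset_as_int32(offset_bytes):
    # Simulate storing a sector offset in a signed 32-bit integer,
    # as a C 'int' would hold it.
    return ctypes.c_int32(offset_bytes // SECTOR).value

ONE_TB = 1 << 40
print(sector_offset_as_int32(ONE_TB - SECTOR))       # 2147483647, still fits
print(sector_offset_as_int32(ONE_TB + (256 << 10)))  # negative: wrapped around
```

Once the stored offset wraps negative, the IO lands at the wrong location on disk, which is exactly a silent-corruption pattern.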
* 3959478 (Tracking ID: 3956027)

SYMPTOM:
The system panicked while removing disks from a disk group, with a stack like the following:
[0000F4C4]___memmove64+0000C4 ()
[077ED5FC]vol_get_one_io_stat+00029C ()
[077ED8FC]vol_get_io_stats+00009C ()
[077F1658]volinfo_ioctl+0002B8 ()
[07809954]volsioctl_real+0004B4 ()
[079014CC]volsioctl+00004C ()
[07900C40]vols_ioctl+000120 ()
[00605730]rdevioctl+0000B0 ()
[008012F4]spec_ioctl+000074 ()
[0068FE7C]vnop_ioctl+00005C ()
[0069A5EC]vno_ioctl+00016C ()
[006E2090]common_ioctl+0000F0 ()
[00003938]mfspurr_sc_flih01+000174 ()

DESCRIPTION:
The IO stats function was trying to access a freed disk that was being removed from the disk group, resulting in a panic due to illegal memory access.

RESOLUTION:
Code changes have been made to resolve this race condition.

* 3959479 (Tracking ID: 3956727)

SYMPTOM:
In Solaris DDL discovery, when the SCSI ioctl fails, direct disk IO on the device can lead to high memory consumption and a vxconfigd hang.

DESCRIPTION:
In Solaris DDL discovery, when SCSI ioctls on the disk for private region IO fail, we attempt a direct disk read/write to the disk. Due to a compiler issue, this direct read/write gets invalid arguments, which leads to high memory consumption and a vxconfigd hang.

RESOLUTION:
Changes are done in the VxVM code to ensure correct arguments are passed to the disk read/write.

* 3959480 (Tracking ID: 3957227)

SYMPTOM:
Disk group import succeeded, but with the following error message:
vxvm:vxconfigd: [ID ** daemon.error] V-5-1-0 dg_import_name_to_dgid: Found dgid = **

DESCRIPTION:
During disk group import, two configuration copies may be found. Volume Manager uses the latest configuration copy and prints a message to indicate this scenario. Due to a wrong log level, this message was printed in the error category.

RESOLUTION:
Code changes have been made to suppress this harmless message.

* 3960383 (Tracking ID: 3958062)

SYMPTOM:
After migrating the boot LUN, disabling dmp_native_support fails with the following error:
VxVM vxdmpadm ERROR V-5-1-15883 check_bosboot open failed /dev/r errno 2
VxVM vxdmpadm ERROR V-5-1-15253 bosboot would not succeed, please run manually to find the cause of failure
VxVM vxdmpadm ERROR V-5-1-15251 bosboot check failed
VxVM vxdmpadm INFO V-5-1-18418 restoring protofile
+ final_ret=18
+ f_exit 18
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more volume groups
VxVM vxdmpadm ERROR V-5-1-15686 The following VG(s) could not be migrated as could not disable DMP support for LVM bootability - rootvg

DESCRIPTION:
After the boot LUN migration, while enabling/disabling DMP Native Support, VxVM was performing the 'bosboot' verification with the old boot disk name instead of the migrated disk. The reason was that an AIX OS command was returning the old boot disk name.

RESOLUTION:
The code is changed to use the correct OS command to get the boot disk name after migration.

* 3961353 (Tracking ID: 3950199)

SYMPTOM:
The system may panic with the following stack during DMP (Dynamic Multipathing) path restoration:
 #0 [ffff880c65ea73e0] machine_kexec at ffffffff8103fd6b
 #1 [ffff880c65ea7440] crash_kexec at ffffffff810d1f02
 #2 [ffff880c65ea7510] oops_end at ffffffff8154f070
 #3 [ffff880c65ea7540] no_context at ffffffff8105186b
 #4 [ffff880c65ea7590] __bad_area_nosemaphore at ffffffff81051af5
 #5 [ffff880c65ea75e0] bad_area at ffffffff81051c1e
 #6 [ffff880c65ea7610] __do_page_fault at ffffffff81052443
 #7 [ffff880c65ea7730] do_page_fault at ffffffff81550ffe
 #8 [ffff880c65ea7760] page_fault at ffffffff8154e2f5
    [exception RIP: _spin_lock_irqsave+31]
    RIP: ffffffff8154dccf RSP: ffff880c65ea7818 RFLAGS: 00210046
    RAX: 0000000000010000 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: 0000000000200246 RSI: 0000000000000040 RDI: 00000000000000e8
    RBP: ffff880c65ea7818 R8: 0000000000000000 R9: ffff8824214ddd00
    R10: 0000000000000002 R11: 0000000000000000 R12: ffff88302d2ce400
    R13: 0000000000000000 R14: ffff880c65ea79b0 R15: ffff880c65ea79b7
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #9 [ffff880c65ea7820] dmp_open_path at ffffffffa07be2c5 [vxdmp]
#10 [ffff880c65ea7980] dmp_restore_node at ffffffffa07f315e [vxdmp]
#11 [ffff880c65ea7b00] dmp_revive_paths at ffffffffa07ccee3 [vxdmp]
#12 [ffff880c65ea7b40] gendmpopen at ffffffffa07cbc85 [vxdmp]
#13 [ffff880c65ea7c10] dmpopen at ffffffffa07cc51d [vxdmp]
#14 [ffff880c65ea7c20] dmp_open at ffffffffa07f057b [vxdmp]
#15 [ffff880c65ea7c50] __blkdev_get at ffffffff811d7f7e
#16 [ffff880c65ea7cb0] blkdev_get at ffffffff811d82a0
#17 [ffff880c65ea7cc0] blkdev_open at ffffffff811d8321
#18 [ffff880c65ea7cf0] __dentry_open at ffffffff81196f22
#19 [ffff880c65ea7d50] nameidata_to_filp at ffffffff81197294
#20 [ffff880c65ea7d70] do_filp_open at ffffffff811ad180
#21 [ffff880c65ea7ee0] do_sys_open at ffffffff81196cc7
#22 [ffff880c65ea7f30] compat_sys_open at ffffffff811eee9a
#23 [ffff880c65ea7f40] symev_compat_open at ffffffffa0c9b08f

DESCRIPTION:
A system panic can be encountered due to a race condition. A path picked by the DMP restore daemon for processing may be deleted before the restoration process is complete. Hence, when the restore daemon tries to access the path properties, it leads to a system panic, as the path properties are already freed.

RESOLUTION:
Code changes are done to handle the race condition.

* 3961355 (Tracking ID: 3952529)

SYMPTOM:
Enabling and disabling DMP (Dynamic Multipathing) Native Support using the command "vxdmpadm settune dmp_native_support" fails with the following error:
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more volume groups
VxVM vxdmpadm ERROR V-5-1-15686 The following vgs could not be migrated as they are in use -

DESCRIPTION:
While enabling/disabling DMP Native Support, the VGs need to migrate from OS devices to dmpnodes when native support is enabled, and vice versa when it is disabled. To complete the migration, the vgexport/vgimport commands are used. If the VG is in a mounted state, the vgexport command fails, indicating the VG is in use.
Because of this failure, the VG migration fails and the command "vxdmpadm settune dmp_native_support" fails with the error "VG is in use".
RESOLUTION: Code changes have been done to use vgchange instead of vgexport/vgimport to solve the problem.
* 3961356 (Tracking ID: 3953481)
SYMPTOM: A stale entry of a replaced disk was left behind under /dev/[r]dsk to represent the replaced disk.
DESCRIPTION: Whenever a disk is removed from the DMP view, the driver property information of the disk has to be removed from the kernel; otherwise a stale entry is left behind under /dev/[r]dsk. When a new disk with the same minor number replaces the old one, the stale information is retained instead of being refreshed.
RESOLUTION: Code is modified to remove the stale device property when a disk is removed.
* 3961358 (Tracking ID: 3955101)
SYMPTOM: The server might panic in a GCO environment with the following stack:
nmcom_server_main_tcp()
ttwu_do_wakeup()
ttwu_do_activate.constprop.90()
try_to_wake_up()
update_curr()
update_curr()
account_entity_dequeue()
__schedule()
nmcom_server_proc_tcp()
kthread()
kthread_create_on_node()
ret_from_fork()
kthread_create_on_node()
DESCRIPTION: Recent code changes allow dynamic port changes, i.e., ports can now be deleted and added dynamically. It might happen that while a port is being accessed, it is deleted in the background by another thread. This leads to a panic, since the port being accessed has already been deleted.
RESOLUTION: Code changes have been done to check that the port is still available before accessing it.
* 3961359 (Tracking ID: 3955725)
SYMPTOM: A utility is needed to clear the "failio" flag on disks after storage connectivity is back.
DESCRIPTION: If I/Os to the disks time out due to hardware failures such as a weak Storage Area Network (SAN) cable link or a Host Bus Adapter (HBA) failure, VxVM assumes that the disk is bad or slow and sets the "failio" flag on the disk.
Because of this flag, all subsequent I/Os fail with the "No such device" error. After connectivity is back, the "failio" flag needs to be cleared using "vxdisk failio=off". A new utility, "vxcheckfailio", clears the "failio" flag on all disks whose paths are all enabled.
RESOLUTION: Code changes are done to add the utility "vxcheckfailio", which clears the "failio" flag on the disks.
* 3961468 (Tracking ID: 3926067)
SYMPTOM: In a Campus Cluster environment, the vxassist relayout command may fail with the following error:
VxVM vxassist ERROR V-5-1-13124 Site offline or detached
VxVM vxassist ERROR V-5-1-4037 Relayout operation aborted. (20)
The vxassist convert command might also fail with the following error:
VxVM vxassist ERROR V-5-1-10128 No complete plex on the site.
DESCRIPTION: For the vxassist "relayout" and "convert" operations in a Campus Cluster environment, VxVM (Veritas Volume Manager) needs to sort the plexes of the volume according to sites. When the number of plexes of a volume is greater than 100, the sorting fails due to a bug in the code, causing the vxassist relayout/convert operations to fail.
RESOLUTION: Code changes are done to properly sort the plexes according to site.
* 3961469 (Tracking ID: 3948140)
SYMPTOM: The system may panic if the RTPG data returned by the array is greater than 255 bytes, with the below stack:
dmp_alua_get_owner_state()
dmp_alua_get_path_state()
dmp_get_path_state()
dmp_check_path_state()
dmp_restore_callback()
dmp_process_scsireq()
dmp_daemons_loop()
DESCRIPTION: The buffer given to the RTPG SCSI command is currently 255 bytes, but the RTPG data returned by the underlying array can be larger. As a result only the first 255 bytes are retrieved, and reading the incomplete RTPG data causes invalid memory access, resulting in errors while claiming the devices. This invalid memory access may lead to a system panic.
RESOLUTION: The RTPG buffer size has been increased to 1024 bytes to handle this.
* 3961480 (Tracking ID: 3957549)
SYMPTOM: The server panicked when resyncing a mirror volume, with the following stack:
voliot_object_event+0x2e0
vol_oes_sio_start+0x80
voliod_iohandle+0x30
voliod_loop+0x248
thread_start+4
DESCRIPTION: If an I/O error occurs during a mirror resync, a trace event needs to be logged for it. As the I/O comes from the mirror resync, the KIO should be NULL, but the NULL pointer check for the KIO was missing while logging the trace event, hence the panic.
RESOLUTION: Code changes have been made to fix the issue.
* 3964315 (Tracking ID: 3952042)
SYMPTOM: dmpevents.log is flooded with messages such as:
Tue Jul 11 09:28:36.620: Lost 12 DMP I/O statistics records
Tue Jul 11 10:05:44.257: Lost 13 DMP I/O statistics records
Tue Jul 11 10:10:05.088: Lost 6 DMP I/O statistics records
Tue Jul 11 11:28:24.714: Lost 6 DMP I/O statistics records
Tue Jul 11 11:46:35.568: Lost 1 DMP I/O statistics records
Tue Jul 11 12:04:10.267: Lost 13 DMP I/O statistics records
Tue Jul 11 12:04:16.298: Lost 5 DMP I/O statistics records
Tue Jul 11 12:44:05.656: Lost 31 DMP I/O statistics records
Tue Jul 11 12:44:38.855: Lost 2 DMP I/O statistics records
DESCRIPTION: When DMP (Dynamic Multi-Pathing) expands the iostat table, it allocates a new, larger table, replaces the old table with the new one, and frees the old one. This increases the possibility of memory fragmentation.
RESOLUTION: The code is modified to increase the initial size of the iostat table.
* 3966132 (Tracking ID: 3960576)
SYMPTOM: With the installation of VxVM patch 7.3.1.100, one more rule was added incorrectly.
DESCRIPTION: The rules are added directly through the file vxdmp.PdDv. The install/upgrade scripts take care of adding these rules using the odmadd command, so in principle every rule would be added twice, like the rule in question.
However, in the install/upgrade scripts the other rules are removed before they are added using the odmadd command.
RESOLUTION: A similar removal entry is added for the "vxdisk scandisks" rule in the install/upgrade script. Once that is done, the duplicate-entry error goes away.
* 3934330 (Tracking ID: 3941109)
SYMPTOM: lsmod does not show the required APM modules as loaded.
DESCRIPTION: To support the SLES12SP3 update, the dmp module is recompiled with the latest SLES12SP3 kernel version. During post-install of the package, the APM modules fail to load due to a mismatch between the dmp and additional APM module kernel versions.
RESOLUTION: The ASLAPM package is recompiled with the SLES12SP3 kernel.
* 3943620 (Tracking ID: 3938549)
SYMPTOM: The command "vxassist -g make vol" fails on RHEL 7.5 with the error: Unexpected kernel error in configuration update.
DESCRIPTION: Due to changes in the RHEL 7.5 source code, the vxassist make volume command failed to create the volume and returned the error "Unexpected kernel error in configuration update".
RESOLUTION: Changes are done in the VxVM code to solve the issue for volume creation.
* 3932464 (Tracking ID: 3926976)
SYMPTOM: An excessive number of connections are found in the open state, causing an FD leak and eventually license errors.
DESCRIPTION: vxconfigd reports license errors because it fails to open the license files. The open failure is due to FD exhaustion, caused by excessive FIFO connections left in the open state. The FIFO connections are used by clients (vx commands) to communicate with vxconfigd, and should normally be closed when the client exits. One such client, the "vxdclid" daemon, connects frequently and leaves the connection open, causing the FD leak. This issue is applicable to the Solaris platform only.
RESOLUTION: A library call used by the client API was leaving the connection open on exit; this is fixed.
* 3933874 (Tracking ID: 3852146)
SYMPTOM: In a CVM cluster, when importing a shared diskgroup specifying both the -c and -o noreonline options, the following error may be returned:
VxVM vxdg ERROR V-5-1-10978 Disk group : import failed: Disk for disk group not found.
DESCRIPTION: The -c option updates the disk ID and disk group ID in the private region of the disks in the disk group being imported. This updated information is not yet seen by the slave because the disks have not been re-onlined (given that the noreonline option is specified). As a result, the slave cannot identify the disk(s) from the updated information sent by the master, causing the import to fail with the error "Disk for disk group not found".
RESOLUTION: The code is modified to handle the "-c" and "-o noreonline" options working together.
* 3933875 (Tracking ID: 3872585)
SYMPTOM: A system running with VxFS and VxVM panics with a storage key exception, with the following stack:
simple_lock
dispatch
flih_util
touchrc
pin_seg_range
pin_com
pinx_plock
plock_pinvec
plock
mfspurr_sc_flih01
DESCRIPTION: The xntpd process, running from a VxFS file system, could panic with a storage key exception. The xntpd binary page faulted and did an I/O, after which the storage key exception was detected by the OS because it could not locate its keyset. Code review found that in a few error cases in VxVM, the storage keys may not be restored after they are replaced.
RESOLUTION: The storage keys are now restored even in the error cases in the vxio and DMP layers.
* 3933876 (Tracking ID: 3894657)
SYMPTOM: VxVM commands may hang when using space-optimized snapshots.
DESCRIPTION: If a volume with DRL (Dirty Region Logging) enabled has a space-optimized snapshot whose mirrored cache object volume also has DRL enabled, VxVM commands may hang. If the I/O load on the volume is high, it can lead to a memory crunch, because memory stabilization is done when DRL is enabled. The I/Os in the queue may wait for memory to become free.
In the meantime, other VxVM commands which require changing the configuration of the volumes may hang because the I/O cannot proceed.
RESOLUTION: Memory stabilization is not required for VxVM-generated internal I/Os to the cache object volume. Code changes have been done to eliminate memory stabilization for cache object I/Os.
* 3933877 (Tracking ID: 3914789)
SYMPTOM: The system may panic when reclaiming on the secondary in a VVR (Veritas Volume Replicator) environment. The panic is due to accessing an invalid address; the error message is similar to "data access MMU miss".
DESCRIPTION: VxVM maintains a linked list to keep memory segment information, which is traversed when accessing a segment's contents at a given offset. Due to a code defect, when the offset is equal to the segment chunk size, the end of that segment is returned instead of the start of the next segment. This can result in silent memory corruption, because memory outside the segment's boundary is accessed; the system panics when the out-of-boundary address is not yet allocated.
RESOLUTION: Code changes have been made to fix the out-of-boundary access.
* 3933878 (Tracking ID: 3918408)
SYMPTOM: Data corruption occurs when a volume grow is attempted on thin reclaimable disks whose space was just freed.
DESCRIPTION: When space in a volume is freed by deleting data or subdisks, the corresponding subdisks are marked for reclamation. It might take some time for the periodic reclaim task to start if it is not issued manually. In the meantime, if the same disks are used for growing another volume, the reclaim task can go ahead and overwrite the data written to the new volume. Because of this race between the reclaim and volume grow operations, data corruption occurs.
RESOLUTION: Code changes are done to handle the race condition between the reclaim and volume grow operations. Reclaim is also skipped for disks that have already become part of the new volume.
* 3933880 (Tracking ID: 3864063)
SYMPTOM: Application I/O hangs after the Master Pause command is issued.
DESCRIPTION: Some flags (VOL_RIFLAG_DISCONNECTING or VOL_RIFLAG_REQUEST_PENDING) in the VVR (Veritas Volume Replicator) kernel are not cleared because of a race between the Master Pause SIO and the Error Handler SIO. This causes the RU (Replication Update) SIO to fail to proceed, which leads to an I/O hang.
RESOLUTION: The code is modified to handle the race condition.
* 3933882 (Tracking ID: 3865721)
SYMPTOM: vxconfigd hangs in a transaction while pausing replication in a Clustered VVR environment.
DESCRIPTION: In a Clustered VVR (CVM VVR) environment, while pausing replication which is in DCM (Data Change Map) mode, the master pause SIO (staging I/O) cannot finish serialization because there are metadata shipping SIOs in the throttle queue with the activesio count added. Meanwhile, because the master pause SIO's SERIALIZE flag is set, the DCM flush SIO cannot be started to flush the throttle queue. This leads to a deadlock. Since the master pause routine needs to sync up with the transaction routine, vxconfigd hangs in the transaction.
RESOLUTION: Code changes were made to flush the metadata shipping throttle queue if the master pause SIO cannot finish serialization.
* 3933883 (Tracking ID: 3867236)
SYMPTOM: Application I/O hangs after the Master Pause command is issued.
DESCRIPTION: The VOL_RIFLAG_REQUEST_PENDING flag in the VVR (Veritas Volume Replicator) kernel is not cleared because of a race between the Master Pause SIO and the RVWRITE1 SIO, causing the RU (Replication Update) SIO to fail to proceed and thereby an I/O hang.
RESOLUTION: Code changes have been made to handle the race condition.
* 3933884 (Tracking ID: 3868154)
SYMPTOM: When DMP Native Support is set to ON and a dmpnode has multiple VGs, 'vxdmpadm native ls' shows incorrect VG entries for dmpnodes.
DESCRIPTION: When DMP Native Support is set to ON, multiple VGs can be created on a disk, as Linux supports creating a VG on a whole disk as well as on a partition of a disk. This possibility was not handled in the code, hence the output of 'vxdmpadm native ls' was garbled.
RESOLUTION: The code now handles multiple VGs on a single disk.
* 3933889 (Tracking ID: 3879234)
SYMPTOM: A dd read on the Veritas Volume Manager (VxVM) character device fails with an Input/Output error while accessing the end of the device, as below:
[root@dn pmansukh_debug]# dd if=/dev/vx/rdsk/hfdg/vol1 of=/dev/null bs=65K
dd: reading `/dev/vx/rdsk/hfdg/vol1': Input/output error
15801+0 records in
15801+0 records out
1051714560 bytes (1.1 GB) copied, 3.96065 s, 266 MB/s
DESCRIPTION: The issue occurs because of changes in the Linux API generic_file_aio_read, which no longer handles end-of-device reads/writes properly. Newer Linux code uses blkdev_aio_read instead, but it is a GPL symbol and hence cannot be used.
RESOLUTION: Made changes in the code to handle end-of-device reads/writes properly.
* 3933890 (Tracking ID: 3879324)
SYMPTOM: The VxVM (Veritas Volume Manager) DR (Dynamic Reconfiguration) tool fails to handle the busy-device problem when LUNs are removed from the OS.
DESCRIPTION: OS devices may still be busy after removing them from the OS, which fails the 'luxadm -e offline' operation and leaves stale entries in the 'vxdisk list' output, such as:
emc0_65535 auto - - error
emc0_65536 auto - - error
RESOLUTION: Code changes have been done to address the busy-device issue.
* 3933897 (Tracking ID: 3907618)
SYMPTOM: vxdisk resize leads to data corruption on a file system with an MSDOS-labelled disk having the VxVM sliced format.
DESCRIPTION: vxdisk resize changes the geometry of the device if required. When vxdisk resize is in progress, absolute offsets, i.e., offsets starting from the start of the device, are used.
For an MSDOS-labelled disk, the full disk is represented by Slice 4, not Slice 0. Thus, when I/O is scheduled on the device, an extra 32 sectors are added to the I/O, which is not required since the I/O already starts from the start of the device. This leads to data corruption, because the I/O on the device is shifted by 32 sectors.
RESOLUTION: Code changes have been made to not add the 32 sectors to the I/O when vxdisk resize is in progress, to avoid the corruption.
* 3933898 (Tracking ID: 3908987)
SYMPTOM: The following unnecessary message is printed to inform the customer that hot relocation will be performed on the master node:
VxVM vxrelocd INFO V-5-2-6551 hot-relocation operation for shared disk group will be performed on master node.
DESCRIPTION: The message should be printed only when there are failed disks. Because the related code is not placed in the right position, it is printed even when there are no failed disks.
RESOLUTION: Code changes have been made to fix the issue.
* 3933900 (Tracking ID: 3915523)
SYMPTOM: A local disk from another node, belonging to a private DG, is exported to the current node when a private DG is imported on the current node.
DESCRIPTION: When a DG is imported, all the disks belonging to the DG are automatically exported to the current node to make sure the DG gets imported; this gives local disks the same behavior as SAN disks. Since all disks in the DG are exported based on the DG name, disks on another node that belong to a different private DG with the same name get exported to the current node as well. This leads to the wrong disk being selected while the DG is imported.
RESOLUTION: Instead of the DG name, the DGID (diskgroup ID) is used to decide whether a disk needs to be exported or not.
* 3933904 (Tracking ID: 3921668)
SYMPTOM: Running the vxrecover command with the -m option fails when run on the slave node, with the message "The command can be executed only on the master."
DESCRIPTION: The issue occurs because the vxrecover -g -m command on shared disk groups is not shipped using the command shipping framework from the CVM (Cluster Volume Manager) slave node to the master node.
RESOLUTION: Implemented a code change to ship the vxrecover -m command to the master node when it is triggered from the slave node.
* 3933907 (Tracking ID: 3873123)
SYMPTOM: When a remote disk on a node is an EFI disk, vold enable fails, the following message gets logged, and eventually vxconfigd goes into the disabled state:
Kernel and on-disk configurations don't match; transactions are disabled.
DESCRIPTION: This is because one of the cases of an EFI remote disk is not properly handled in the disk recovery part when vxconfigd is enabled.
RESOLUTION: Code changes have been done to set the EFI flag on darec in the recovery code.
* 3933910 (Tracking ID: 3910228)
SYMPTOM: Registration of GAB (Global Atomic Broadcast) port u fails on slave nodes after multiple new devices are added to the system.
DESCRIPTION: vxconfigd sends a command to GAB for port u registration and waits for a response from GAB. If during this timeframe vxconfigd is interrupted by any other module apart from GAB, it will not receive GAB's signal of successful registration. Since the signal is not received, vxconfigd assumes the registration did not succeed and treats it as a failure.
RESOLUTION: The signals which vxconfigd can receive are now masked before waiting for the GAB port u registration signal.
* 3933911 (Tracking ID: 3925377)
SYMPTOM: Not all disks can be discovered by Dynamic Multi-Pathing (DMP) after first startup.
DESCRIPTION: DMP is started too early in the boot process if iSCSI and raw have not been installed. At that point the FC devices are not yet recognized by the OS, hence DMP misses the FC devices.
RESOLUTION: The code is modified to make sure DMP is started after OS disk discovery.
* 3937540 (Tracking ID: 3906534)
SYMPTOM: After enabling DMP (Dynamic Multipathing) Native support, enable /boot to be mounted on the DMP device.
DESCRIPTION: Currently /boot is mounted on top of the OS (Operating System) device. When DMP Native support is enabled, only VGs (Volume Groups) are migrated from the OS device to the DMP device; this is why /boot is not migrated. As a result, if the OS device path is not available, the system becomes unbootable since /boot is not available. It is therefore necessary to mount /boot on the DMP device to provide multipathing and resiliency.
RESOLUTION: Code changes have been done to migrate /boot on top of the DMP device when DMP Native support is enabled.
Note - The code changes are currently implemented for RHEL-6 only. On other Linux platforms, /boot will still not be mounted on the DMP device.
* 3937541 (Tracking ID: 3911930)
SYMPTOM: Valid PGR operations sometimes fail on a dmpnode.
DESCRIPTION: As part of the PGR operations, if the inquiry command finds that PGR is not supported on the dmpnode, the flag PGR_FLAG_NOTSUPPORTED is set on the dmpnode. Further PGR operations check this flag and issue PGR commands only if it is NOT set. The flag remains set even if the hardware is later changed to support PGR.
RESOLUTION: A new command (namely enablepr) is provided in the vxdmppr utility to clear this flag on the specified dmpnode.
* 3937550 (Tracking ID: 3935232)
SYMPTOM: Replication and I/O may hang on the new master node during a master takeover.
DESCRIPTION: If a log owner change kicks in while a master switch is in progress, the VOLSIO_FLAG_RVC_ACTIVE flag is set by the log owner change SIO. The RVG (Replicated Volume Group) recovery initiated by the master switch clears VOLSIO_FLAG_RVC_ACTIVE once the RVG recovery is done. When the log owner change completes, because VOLSIO_FLAG_RVC_ACTIVE has already been cleared, resetting the VOLOBJ_TFLAG_VVR_QUIESCE flag is skipped.
The presence of the VOLOBJ_TFLAG_VVR_QUIESCE flag leaves replication and application I/O on the RVG permanently pending.
RESOLUTION: Code changes have been done to make the log owner change wait until the master switch is completed.
* 3937808 (Tracking ID: 3931936)
SYMPTOM: In an FSS (Flexible Storage Sharing) environment, after restarting a slave node, VxVM commands on the master node hang, and the failed disks on the slave node cannot rejoin the disk group.
DESCRIPTION: When lost remote disks on the slave node come back, the operations to online these disks and add them to the disk group are performed on the master node. The disk online involves operations on both the master and the slave node. On the slave node these disks should be offlined and then re-onlined, but due to a code defect the re-online was missed, leaving the disks stuck in the re-onlining state. The subsequent add-disk-to-disk-group operation needs to issue private region I/Os on the disk, which are shipped to the slave node to complete. As the disks are in the re-online state, a busy error is returned and the remote I/Os keep retrying, hence the VxVM command hangs on the master node.
RESOLUTION: Code changes have been made to fix the issue.
* 3937811 (Tracking ID: 3935974)
SYMPTOM: While communicating with a client process, the vxrsyncd daemon terminates; after some time it gets restarted, or may require a reboot to start.
DESCRIPTION: When the client process shuts down abruptly and the vxrsyncd daemon attempts to write to the client socket, a SIGPIPE signal is generated. The default action for this signal is to terminate the process, so vxrsyncd gets terminated.
RESOLUTION: The SIGPIPE signal is now handled in order to prevent the termination of vxrsyncd.
* 3940039 (Tracking ID: 3897047)
SYMPTOM: File systems are not mounted automatically on boot through systemd on RHEL7 and SLES12.
DESCRIPTION: When the systemd service tries to start all the file systems in /etc/fstab, the Veritas Volume Manager (VxVM) volumes are not yet started, since vxconfigd is still not up.
The VxVM volumes are started a little later in the boot process. Since the volumes are not available, the file systems are not mounted automatically at boot.
RESOLUTION: The VxVM volumes are registered with the UDEV daemon of Linux, so that the file systems are mounted when the VxVM volumes are started and discovered by udev.
* 3940143 (Tracking ID: 3941037)
SYMPTOM: VxVM (Veritas Volume Manager) creates some required files under the /tmp and /var/tmp directories.
DESCRIPTION: VxVM (Veritas Volume Manager) creates some .lock files under the /etc/vx directory. Non-root users have access to these .lock files and may accidentally modify, move, or delete them. Such actions may interfere with the normal functioning of Veritas Volume Manager.
RESOLUTION: This fix addresses the issue by masking the write permission on these .lock files for non-root users.
* 3937549 (Tracking ID: 3934910)
SYMPTOM: I/O errors on a data volume or file system occur after some cycles of snapshot creation/removal with a dg reimport.
DESCRIPTION: After the snapshot of the data volume is removed and the dg is reimported, the DRL map stays active rather than being inactivated. When a new snapshot is created, DRL is re-enabled and a new DRL map is allocated on the first write to the data volume; the original active DRL map remains unused and leaks. After some such cycles, the extents of the DCO volume are exhausted by the active but unused DRL maps; no more DRL maps can be allocated, and I/Os fail or cannot be issued on the data volume.
RESOLUTION: Code changes are done to inactivate the DRL map if DRL is disabled during the volume start, so that it can safely be reused later.
* 3937542 (Tracking ID: 3917636)
SYMPTOM: File systems from the /etc/fstab file are not mounted automatically on boot through systemd on RHEL7 and SLES12.
DESCRIPTION: During bootup, when systemd tries to mount the devices mentioned in the /etc/fstab file, the devices are not yet accessible, leading to the failure of the mount operation. As device discovery happens through the udev infrastructure, the udev rules for those devices need to be run when the volumes are created, so that the devices get registered with systemd. In this case the udev rules are executed even before the devices in the "/dev/vx/dsk" directory are created. Since the devices are not created, they are not registered with systemd, leading to the failure of the mount operation.
RESOLUTION: "udevadm trigger" is run to execute all the udev rules once all volumes are created, so that the devices are registered.
Patch ID: VRTSveki-7.3.1.100
* 3967357 (Tracking ID: 3967265)
SYMPTOM: RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported.
DESCRIPTION: Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x is now introduced.
Patch ID: VRTSdbac-7.3.1.200
* 3955987 (Tracking ID: 3967265)
SYMPTOM: RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported.
DESCRIPTION: Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x is now introduced.
* 3944197 (Tracking ID: 3944179)
SYMPTOM: Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 5 (RHEL7.5).
DESCRIPTION: Veritas InfoScale Availability does not support Red Hat Enterprise Linux versions later than RHEL7 Update 4.
RESOLUTION: Veritas InfoScale Availability support for Red Hat Enterprise Linux 7 Update 5 (RHEL7.5) is now introduced.
Patch ID: VRTSamf-7.3.1.300
* 3955986 (Tracking ID: 3967265)
SYMPTOM: RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported.
DESCRIPTION: Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x is now introduced.
* 3947657 (Tracking ID: 3948217)
SYMPTOM: AMF mnton/mntoff registration fails and causes a cluster node to panic.
DESCRIPTION: The AMF node panics during mnton/mntoff registration when an incorrect module version is loaded on a minor kernel version.
RESOLUTION: The source code is modified to load the appropriate module version when an exact module version for the kernel is not found.
* 3944196 (Tracking ID: 3944179)
SYMPTOM: Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 5 (RHEL7.5).
DESCRIPTION: Veritas InfoScale Availability does not support Red Hat Enterprise Linux versions later than RHEL7 Update 4.
RESOLUTION: Veritas InfoScale Availability support for Red Hat Enterprise Linux 7 Update 5 (RHEL7.5) is now introduced.
Patch ID: VRTSvxfen-7.3.1.300
* 3955985 (Tracking ID: 3967265)
SYMPTOM: RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported.
DESCRIPTION: Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x is now introduced.
* 3944195 (Tracking ID: 3944179)
SYMPTOM: Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 5 (RHEL7.5).
DESCRIPTION: Veritas InfoScale Availability does not support Red Hat Enterprise Linux versions later than RHEL7 Update 4.
RESOLUTION: Veritas InfoScale Availability support for Red Hat Enterprise Linux 7 Update 5 (RHEL7.5) is now introduced.
Patch ID: VRTSgab-7.3.1.200
* 3955984 (Tracking ID: 3967265)
SYMPTOM: RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported.
DESCRIPTION: Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x is now introduced.
* 3944182 (Tracking ID: 3944179)
SYMPTOM: Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 5 (RHEL7.5).
DESCRIPTION: Veritas InfoScale Availability does not support Red Hat Enterprise Linux versions later than RHEL7 Update 4.
RESOLUTION: Veritas InfoScale Availability support for Red Hat Enterprise Linux 7 Update 5 (RHEL7.5) is now introduced.
Patch ID: VRTSllt-7.3.1.400
* 3955983 (Tracking ID: 3967265)
SYMPTOM: RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported.
DESCRIPTION: Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x is now introduced.
* 3933242 (Tracking ID: 3948201)
SYMPTOM: The kernel panics in case of FSS with LLT over RDMA during heavy data transfer.
DESCRIPTION: In case of FSS using LLT over RDMA, the kernel may sometimes panic because of an issue in the buffer advertisement logic for RDMA buffers. The case arises when the buffer advertisement for a particular RDMA buffer reaches the sender LLT node before the hardware ACK reaches LLT.
RESOLUTION: The LLT module is modified to fix the panic by using a separate temporary queue for such buffers.
* 3944181 (Tracking ID: 3944179)
SYMPTOM: Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 5 (RHEL7.5).
DESCRIPTION: Veritas InfoScale Availability does not support Red Hat Enterprise Linux versions later than RHEL7 Update 4.
RESOLUTION: Veritas InfoScale Availability support for Red Hat Enterprise Linux 7 Update 5 (RHEL7.5) is now introduced.
Patch ID: VRTSgms-7.3.1.1100
* 3955058 (Tracking ID: 3951761)
SYMPTOM: GMS support for RHEL6.10 and RHEL/SUSE retpoline kernels.
DESCRIPTION: RHEL6.10 is a new release with a Retpoline kernel, and Red Hat and SUSE have also released Retpoline kernels for older RHEL/SUSE releases. The GMS module should be recompiled with a Retpoline-aware GCC to support Retpoline kernels.
RESOLUTION: Compiled GMS with Retpoline GCC.
Patch ID: VRTSglm-7.3.1.1100
* 3944902 (Tracking ID: 3944901)
SYMPTOM: "hastop -all" initiates an unmount operation on all mount point resources. This operation might hang in InfoScale 7.3.1.
DESCRIPTION: Before starting the recovery, the GLM scope flag is cleared in order to force lock consumers to wait, since local recovery will soon follow. A recovery initiated by RESTART can exit in between if some other recovery is initiated. Due to a collision of the SCOPE_LEAVE and RESTART GLM APIs, there can be a case where no one performs the recovery and all other operations hang waiting for the recovery to complete.
RESOLUTION: Code changes have been done to ensure that during such races, one of them always completes the local recovery.
* 3952305 (Tracking ID: 3939996)
SYMPTOM: Performance degradation when multiple threads share a single file and at least one thread modifies it, on a cluster with a large number of CPUs doing CFS operations on the shared files. Most probable on AIX with a large number of CPUs.
DESCRIPTION: GLM has a global per-port lock, gp_gen_lock, which protects recovery-related counters. Although gp_gen_lock is held only for a very short time, on the LOCK request side it is taken frequently and from many threads, causing contention on high-end servers.
RESOLUTION: Code changes have been made to avoid the bottleneck/contention.

* 3952307 (Tracking ID: 3940838)
SYMPTOM: Print the master node ID of a stuck lock in the output of glmdump.
DESCRIPTION: In case of CFS, if any GLM lock gets stuck, the glmdump utility is used to find the possibly stuck lock, but the current glmdump utility does not print the node ID of the master lock.
RESOLUTION: Code changes have been made to print the master lock's node ID.

* 3955056 (Tracking ID: 3951759)
SYMPTOM: GLM support for RHEL 6.10 and RHEL/SUSE retpoline kernels.
DESCRIPTION: RHEL 6.10 is a new release that ships a retpoline kernel, and Red Hat and SUSE have also released retpoline kernels for older RHEL/SUSE releases. The GLM module must be recompiled with a retpoline-aware GCC to support retpoline kernels.
RESOLUTION: Compiled GLM with a retpoline-aware GCC.

* 3955901 (Tracking ID: 3955899)
SYMPTOM: A node fails to join due to the dependency of GLM services on the multi-user.target and graphical.target services. If this issue occurs, CVM on the concerned node goes into a faulted state and is not able to join.
DESCRIPTION: The multi-user.target and graphical.target services were added to the LLT service, and GLM, being dependent on LLT, had to add these services as well; not adding them led to systemd errors on some servers. However, the dependency of LLT on the multi-user.target and graphical.target services was later removed, leaving the GLM services waiting for these services to come up.
RESOLUTION: Code changes have been made to remove the dependency of GLM on the multi-user.target and graphical.target services.
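The gp_gen_lock contention described in incident 3952305 above is a classic single-hot-lock problem. A minimal sketch of one common way to relieve that kind of contention (illustrative only; the source does not state that GLM uses this exact technique) is to shard the protected counters so the hot path only locks one shard, and aggregate on read:

```python
import threading

# Sketch only: a counter sharded across several locks, so concurrent
# incrementers rarely contend, unlike a single global lock (gp_gen_lock-style).
class ShardedCounter:
    def __init__(self, nshards=8):
        self._shards = [0] * nshards
        self._locks = [threading.Lock() for _ in range(nshards)]

    def incr(self, shard_hint):
        i = shard_hint % len(self._shards)
        with self._locks[i]:          # contends only within one shard
            self._shards[i] += 1

    def value(self):
        # Slow path: take each shard lock in turn and sum.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._shards[i]
        return total

c = ShardedCounter()

def worker(tid, n=1000):
    for _ in range(n):
        c.incr(tid)

threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(c.value())  # 4000
```

The trade-off is that reads become slower and are only approximately consistent while increments are in flight, which is acceptable for statistics-style counters.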
* 3959327 (Tracking ID: 3959325)
SYMPTOM: Machine panics due to a NULL pointer dereference.
DESCRIPTION: In case of a cascaded node reboot, a NULL pointer dereference was seen in the nmsg processing code path. There was a race between recovery and message processing: during message processing the scope is looked up, but no hold is taken on the scope after it is found. In parallel, recovery is in progress and releases the scope. Once the scope has been released, any access to it can cause memory corruption; in this case a NULL pointer dereference was hit while accessing one field of the scope.
RESOLUTION: Code changes have been made to take a hold on the scope so that freeing of the scope during recovery is delayed.

Patch ID: VRTSodm-7.3.1.2100

* 3936184 (Tracking ID: 3897161)
SYMPTOM: An Oracle Database on a Veritas file system with the Veritas ODM library has high log file sync wait times.
DESCRIPTION: The ODM_IOP lock would not be held for long, so instead of attempting a trylock and deferring the I/O when the trylock fails, it is better to take the non-try lock and finish the I/O in interrupt context. This is safe on Solaris since this "sleep" lock is actually an adaptive mutex.
RESOLUTION: Call ODM_IOP_LOCK() instead of ODM_IOP_TRYLOCK() in odm_iodone and finish the I/O. With this fix, no I/O is deferred.

* 3952306 (Tracking ID: 3940492)
SYMPTOM: ODM creates an ordering cycle in systemd, due to which either vxodm.service does not get loaded or the system itself does not boot.
DESCRIPTION: If a systemd service is ordered after vxodm.service and has 'WantedBy=multi-user.target', a cycle forms between vxodm.service, that service, and multi-user.target. As a result, either the system does not boot or vxodm does not get loaded. Removing multi-user.target from vxodm.service breaks the cycle.
RESOLUTION: Removed 'multi-user.target' and 'graphical.target' from the 'After' section of the vxodm.service file to avoid cycle formation at startup.
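The ordering cycle fixed in incident 3952306 can be pictured with minimal unit-file fragments (the app.service name is hypothetical, and the real vxodm.service contains more than shown; this is a sketch of the ordering relationships only):

```
# Hypothetical consumer unit, app.service: ordered after vxodm.service
# and pulled in by multi-user.target.
[Unit]
After=vxodm.service

[Install]
WantedBy=multi-user.target

# vxodm.service before the fix. Since a target is by default ordered after
# the units it wants, this closes the loop:
#   multi-user.target -> app.service -> vxodm.service -> multi-user.target
[Unit]
After=multi-user.target graphical.target   # the two targets removed by this fix
```

With multi-user.target and graphical.target dropped from vxodm.service's After= line, systemd can order all three units without a cycle.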
* 3955054 (Tracking ID: 3951754)
SYMPTOM: ODM support for RHEL 6.10 and RHEL/SUSE retpoline kernels.
DESCRIPTION: RHEL 6.10 is a new release that ships a retpoline kernel, and Red Hat and SUSE have also released retpoline kernels for older RHEL/SUSE releases. The ODM module must be recompiled with a retpoline-aware GCC to support retpoline kernels.
RESOLUTION: Compiled ODM with a retpoline-aware GCC.

* 3960064 (Tracking ID: 3922681)
SYMPTOM: RHEL 7.3 system panic with RIP at odm_tsk_exit+0x20.
DESCRIPTION: RHEL 7.3 changed the definition of vm_operations_struct, while the ODM module was built against an earlier RHEL version. Since ODM also uses vm_operations_struct, this mismatch in structure size caused the panic.
RESOLUTION: ODM code changes have been made to ensure that this structure change does not introduce any issue.

* 3966263 (Tracking ID: 3958865)
SYMPTOM: The ODM module failed to load on RHEL 7.6.
DESCRIPTION: RHEL 7.6 is a new release, so the ODM module failed to load on it.
RESOLUTION: Added ODM support for RHEL 7.6.

* 3943732 (Tracking ID: 3938546)
SYMPTOM: The ODM module failed to load on RHEL 7.5.
DESCRIPTION: RHEL 7.5 is a new release, so the ODM module failed to load on it.
RESOLUTION: Added ODM support for RHEL 7.5.

* 3939411 (Tracking ID: 3941018)
SYMPTOM: The VRTSodm driver will not load with the 7.3.1.100 VRTSvxfs patch.
DESCRIPTION: VRTSodm needed recompilation due to recent changes in the VRTSvxfs header files, because of which some symbols were not being resolved.
RESOLUTION: Recompiled VRTSodm against the updated VRTSvxfs header files.

Patch ID: VRTSvxfs-7.3.1.2100

* 3929952 (Tracking ID: 3929854)
SYMPTOM: Event notification was not supported on CFS mount points, so the following errors appear in the log file:
-bash-4.1# /usr/jdk/jdk1.8.0_121/bin/java test1
myWatcher: sun.nio.fs.SolarisWatchService@70dea4e
filesystem provider is : sun.nio.fs.SolarisFileSystemProvider@5c647e05
java.nio.file.FileSystemException: /mnt1: Operation not supported
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.asIOException(UnixException.java:111)
at sun.nio.fs.SolarisWatchService$Poller.implRegister(SolarisWatchService.java:311)
at sun.nio.fs.AbstractPoller.processRequests(AbstractPoller.java:260)
at sun.nio.fs.SolarisWatchService$Poller.processEvent(SolarisWatchService.java:425)
at sun.nio.fs.SolarisWatchService$Poller.run(SolarisWatchService.java:397)
at java.lang.Thread.run(Thread.java:745)
DESCRIPTION: The WebLogic watch service was failing to register with a CFS mount point directory, which resulted in "/mnt1: Operation not supported" on the CFS mount point.
RESOLUTION: Added a new module parameter, "vx_cfsevent_notify", to enable event notification support on CFS. By default vx_cfsevent_notify is disabled. This works only in an Active-Passive scenario:
- The primary (active) node that has set this tunable receives notifications for the respective events on the CFS mount point directory.
- Secondary (passive) nodes do not receive any notifications.

* 3933816 (Tracking ID: 3902600)
SYMPTOM: Contention observed on the vx_worklist_lk lock in a cluster-mounted file system with ODM.
DESCRIPTION: In a CFS environment, for ODM async I/O reads, iodones are completed immediately, calling into ODM itself from the interrupt handler. But all CFS writes are currently processed in a delayed fashion, where the requests are queued and processed later by a worker thread. This added delays to ODM writes.
RESOLUTION: Optimized the I/O processing of ODM work items on CFS so that they are processed in the same context where possible.

* 3943715 (Tracking ID: 3944884)
SYMPTOM: ZFOD extents are being pushed to the clones.
DESCRIPTION: In case of logged writes on ZFOD extents on the primary, ZFOD extents are unexpectedly pushed to the clones, which results in internal write test failures.
RESOLUTION: Code has been modified not to push ZFOD extents to the clones.

* 3947560 (Tracking ID: 3947421)
SYMPTOM: A DLV upgrade operation fails while upgrading the file system from DLV 9 to DLV 10 with the following error message:
ERROR: V-3-22567: cannot upgrade /dev/vx/rdsk/metadg/metavol - Invalid argument
DESCRIPTION: If the file system was created with DLV 5 or lower and later successfully upgraded step by step (5 to 6, 6 to 7, 7 to 8, 8 to 9), the newly written code tries to find the "mkfs" record in the history log. There was no concept of logging the mkfs operation in the history log for DLV 5 or lower, so the upgrade operation fails while upgrading from DLV 9 to 10.
RESOLUTION: Code changes have been made to complete the upgrade operation even when no mkfs record is found in the history log.

* 3947561 (Tracking ID: 3947433)
SYMPTOM: While adding a volume (part of a vset) to an already mounted file system, fsvoladm displays the following error:
UX:vxfs fsvoladm: ERROR: V-3-28487: Could not find the volume in vset
DESCRIPTION: The code that finds the volume in the vset requires the file descriptor of the character special device, but in the concerned code path the file descriptor being passed is that of the block device.
RESOLUTION: Code changes have been made to pass the file descriptor of the character special device.

* 3947651 (Tracking ID: 3947648)
SYMPTOM: Due to wrong auto-tuning of vxfs_ninode/the inode cache, a hang could be observed under heavy memory pressure.
DESCRIPTION: If kernel heap memory is very large (particularly observed on SOLARIS T7 servers), an overflow can occur due to a smaller-sized data type.
RESOLUTION: Changed the code to handle the overflow.

* 3952304 (Tracking ID: 3925281)
SYMPTOM: Hexdump the in-core inode data and piggyback data when inode revalidation fails.
DESCRIPTION: While assuming inode ownership, if inode revalidation fails with piggyback data, the piggyback and in-core inode data were not hexdumped, losing the current state of the inode. An inode revalidation failure message has been added along with a hexdump of the in-core inode data and piggyback data.
RESOLUTION: Code is modified to print a hexdump of the in-core inode and piggyback data when revalidation of the inode fails.

* 3952309 (Tracking ID: 3941942)
SYMPTOM: If a file system is created with fiostats enabled and ODM writes are in progress, forcefully unmounting the file system can panic the system:
crash_kexec
oops_end
no_context
__bad_area_nosemaphore
bad_area_nosemaphore
__do_page_fault
do_page_fault
vx_fiostats_free
fdd_chain_inactive_common
fdd_chain_inactive
fdd_odm_close
odm_vx_close
odm_fcb_free
odm_fcb_rel
odm_ident_close
odm_exit
odm_tsk_daemon_deathlist
odm_tsk_daemon
odm_kthread_init
kernel_thread
DESCRIPTION: When freeing the fiostats assigned to an inode during a forceful unmount, the fs field must be validated. Otherwise we may end up dereferencing a NULL pointer in the checks in this code path, which panics the system.
RESOLUTION: Code is modified to add checks to validate fs in such forced unmount scenarios.

* 3952324 (Tracking ID: 3943232)
SYMPTOM: System panic in vx_unmount_cleanup_notify when unmounting the file system.
DESCRIPTION: Every vnode having watches on it gets attached to the root vnode of the file system via the v_inotify_list vnode hook during dentry purge. When the user removes all watches from a vnode, the vnode is destroyed and VxFS frees its associated memory, but it is possible that this vnode is still attached to the root vnode's list. During unmount, if VxFS picks this vnode from the root vnode's list, it can hit a NULL pointer dereference when trying to access the freed memory. To fix this issue, VxFS now removes such vnodes from the root vnode's list.
RESOLUTION: Code is modified to remove the vnode from the root vnode's list.
* 3952325 (Tracking ID: 3943529)
SYMPTOM: System panicked because the watchdog timer detected a hard lockup on a CPU while trying to release a dentry.
DESCRIPTION: When purging dentries, there is a possible race with an iget thread which can lead to corrupted vnode flags. Because of these corrupted flags, VxFS tries to purge the dentry again and gets stuck waiting for the vnode lock, which was taken in the current thread's context, leading to a deadlock/soft lockup.
RESOLUTION: Code is modified to protect the vnode flags with the vnode lock.

* 3952840 (Tracking ID: 3952837)
SYMPTOM: If a VxFS file system is to be mounted during boot-up in a systemd environment and its dependency autofs.service is not up, the service trying to mount VxFS enters a failed state.
DESCRIPTION: vxfs.service has a dependency on autofs.service. If autofs starts after vxfs.service and another service tries to mount a VxFS file system, the mount fails and the service enters a failed state.
RESOLUTION: The dependency of vxfs.service on autofs.service and systemd-remountfs.service has been removed to solve the issue.

* 3955051 (Tracking ID: 3951752)
SYMPTOM: VxFS support for RHEL 6.10 and RHEL/SUSE retpoline kernels.
DESCRIPTION: RHEL 6.10 is a new release that ships a retpoline kernel, and Red Hat and SUSE have also released retpoline kernels for older RHEL/SUSE releases. The VxFS module must be recompiled with a retpoline-aware GCC to support retpoline kernels.
RESOLUTION: Compiled VxFS with a retpoline-aware GCC.
* 3955886 (Tracking ID: 3955766)
SYMPTOM: CFS hung while doing extent allocation; a thread like the following loops forever doing extent allocation:
#0 [ffff883fe490fb30] schedule at ffffffff81552d9a
#1 [ffff883fe490fc18] schedule_timeout at ffffffff81553db2
#2 [ffff883fe490fcc8] vx_delay at ffffffffa054e4ee [vxfs]
#3 [ffff883fe490fcd8] vx_searchau at ffffffffa036efc6 [vxfs]
#4 [ffff883fe490fdf8] vx_extentalloc_device at ffffffffa036f945 [vxfs]
#5 [ffff883fe490fea8] vx_extentalloc_device_proxy at ffffffffa054c68f [vxfs]
#6 [ffff883fe490fec8] vx_worklist_process_high_pri_locked at ffffffffa054b0ef [vxfs]
#7 [ffff883fe490fee8] vx_worklist_dedithread at ffffffffa0551b9e [vxfs]
#8 [ffff883fe490ff28] vx_kthread_init at ffffffffa055105d [vxfs]
#9 [ffff883fe490ff48] kernel_thread at ffffffff8155f7d0
DESCRIPTION: In the current code of emtran_process_commit(), it is possible for the EAU summary to be updated without delegation of the corresponding EAU, because the VX_AU_SMAPFREE flag is cleared before updating the EAU summary, which can lead to a hang. In addition, improper error handling in case of a bad map can also cause hang situations.
RESOLUTION: To avoid the potential hang, the code is modified to clear the VX_AU_SMAPFREE flag after updating the EAU summary, and error handling in emtran_commit/undo is improved.

* 3958475 (Tracking ID: 3958461)
SYMPTOM: The mkfs_vxfs(1m) and vxupgrade_vxfs(1m) man pages contain outdated information about supported DLVs.
DESCRIPTION: The mkfs_vxfs(1m) and vxupgrade_vxfs(1m) man pages list older DLV information.
RESOLUTION: Code changes have been made to reflect the updated DLV support in the mkfs_vxfs(1m) and vxupgrade_vxfs(1m) man pages.

* 3960380 (Tracking ID: 3934175)
SYMPTOM: A 4-node FSS CFS experienced an I/O hang on all nodes.
DESCRIPTION: When I/O requests are handed off to a worker thread because of low stack space, they are processed in LIFO order, which is not expected; they should be processed in FIFO order.
RESOLUTION: Modified the code to pick up the older work items from the tail of the queue.

* 3960468 (Tracking ID: 3957092)
SYMPTOM: System panic with spin_lock_irqsave via splunkd in the rddirahead path.
DESCRIPTION: Based on the current state, a spinlock appears to be getting re-initialized somehow in the rddirahead path, which causes this deadlock.
RESOLUTION: Code changes have been made to avoid this situation.

* 3960470 (Tracking ID: 3958688)
SYMPTOM: System panic when VxFS is force unmounted; the panic stack trace can look like the following:
#8 [ffff88622a497c10] do_page_fault at ffffffff81691fc5
#9 [ffff88622a497c40] page_fault at ffffffff8168e288
[exception RIP: vx_nfsd_encode_fh_v2+89]
RIP: ffffffffa0c505a9 RSP: ffff88622a497cf8 RFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff883e5c731558 RCX: 0000000000000000
RDX: 0000000000000010 RSI: 0000000000000000 RDI: ffff883e5c731558
RBP: ffff88622a497d48 R8: 0000000000000010 R9: 000000000000fffe
R10: 0000000000000000 R11: 000000000000000f R12: ffff88622a497d6c
R13: 00000000000203d6 R14: ffff88622a497d78 R15: ffff885ffd60ec00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff88622a497d50] exportfs_encode_inode_fh at ffffffff81285cb0
#11 [ffff88622a497d60] show_mark_fhandle at ffffffff81243ed4
#12 [ffff88622a497de0] inotify_fdinfo at ffffffff8124411d
#13 [ffff88622a497e18] inotify_show_fdinfo at ffffffff812441b0
#14 [ffff88622a497e50] seq_show at ffffffff81273ec7
#15 [ffff88622a497e90] seq_read at ffffffff8122253a
#16 [ffff88622a497f00] vfs_read at ffffffff811fe0ee
#17 [ffff88622a497f38] sys_read at ffffffff811fecbf
#18 [ffff88622a497f80] system_call_fastpath at ffffffff816967c9
DESCRIPTION: There is no error handling for the situation where the file system gets disabled or unmounted in the nfsd_encode code path, which can lead to a panic.
RESOLUTION: Added error handling in vx_nfsd_encode_fh_v2() to avoid the panic in case the file system gets unmounted or disabled.
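The hand-off ordering fix in incident 3960380 above comes down to draining a shared work list from the opposite end. A minimal sketch (illustrative only, not the VxFS code): if hand-off pushes new items at the head of the list, consuming from the head yields LIFO order and can starve old requests, while consuming from the tail restores FIFO order.

```python
from collections import deque

# Illustrative sketch of incident 3960380's fix, not actual VxFS code:
# I/O requests are handed off by pushing at the head of a shared work list.
work = deque()
for req in ["io-1", "io-2", "io-3"]:
    work.appendleft(req)            # hand-off pushes at the head

lifo_order = list(work)             # head-first drain: newest request first
fifo_order = []
while work:
    fifo_order.append(work.pop())   # tail-first drain: oldest request first

print(lifo_order)  # ['io-3', 'io-2', 'io-1']  -> newest first, old I/O starves
print(fifo_order)  # ['io-1', 'io-2', 'io-3']  -> arrival order preserved
```

Under sustained load, the LIFO drain lets the oldest requests wait indefinitely, which matches the "I/O hang on all nodes" symptom; picking work items from the tail services them in arrival order.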
* 3966262 (Tracking ID: 3958853)
SYMPTOM: The VxFS module failed to load on RHEL 7.6.
DESCRIPTION: RHEL 7.6 is a new release, so the VxFS module failed to load on it.
RESOLUTION: Added VxFS support for RHEL 7.6.

* 3933810 (Tracking ID: 3830300)
SYMPTOM: Heavy CPU usage while Oracle archive processes are running on a clustered file system.
DESCRIPTION: The cause of the poor read performance in this case was fragmentation, which mainly happens when multiple archivers run on the same node. The allocation pattern of the Oracle archiver processes is:
1. Write the header with O_SYNC.
2. ftruncate the file up to its final size (a few GBs typically).
3. Do lio_listio with 1MB iocbs.
The problem occurs because all allocations done in this manner go through internal allocations, i.e. allocations below the file size instead of allocations past the file size. Internal allocations are done at most 8 pages at a time, so when multiple processes do this, they all receive these 8-page chunks alternately and the file system becomes very fragmented.
RESOLUTION: Added a tunable which allocates ZFOD extents when ftruncate tries to increase the size of the file, instead of creating a hole. This eliminates the allocations internal to the file size and thus the fragmentation. Fixed the earlier implementation of the same fix, which ran into locking issues, and also fixed a performance issue when writing from the secondary node.

* 3933819 (Tracking ID: 3879310)
SYMPTOM: The file system may get corrupted after a file system freeze during vxupgrade. The full fsck gives the following errors:
UX:vxfs fsck: ERROR: V-3-20451: No valid device inodes found
UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate
DESCRIPTION: vxupgrade requires the file system to be frozen during its functional operation. It may happen that corruption is detected while the freeze is in progress and the full fsck flag is set on the file system; however, this does not stop vxupgrade from proceeding.
At a later stage of vxupgrade, after structures related to the new disk layout are updated on disk, VxFS frees up and zeroes out some of the old metadata inodes. If any error occurs after this point (because of full fsck being set), the file system needs to go back completely to the version in place at the time of the full fsck. Since the metadata corresponding to the previous version has already been cleared, the full fsck cannot proceed and gives the error.
RESOLUTION: The code is modified to check for the full fsck flag after freezing the file system during vxupgrade. Also, the file system is disabled if an error occurs after writing the new metadata on disk. This forces the newly written metadata to be loaded into memory on the next mount.

* 3933820 (Tracking ID: 3894712)
SYMPTOM: ACL permissions are not inherited correctly on a cluster file system.
DESCRIPTION: The ACL counts stored on a directory inode get reset every time the directory inode's ownership is switched between nodes. When ownership of the directory inode comes back to a node which previously abdicated it, ACL permissions were not being inherited correctly for newly created files.
RESOLUTION: Modified the source so that the ACLs are inherited correctly.

* 3933824 (Tracking ID: 3908785)
SYMPTOM: System panic observed because of a NULL page address in the writeback structure in case of the kswapd process.
DESCRIPTION: The secfs2/encryptfs layers used the write VOP as a hook when kswapd is triggered to free a page. Ideally kswapd should call the writepage() routine, where the writeback structures are correctly filled. When the write VOP is called because of the hook in secfs2/encryptfs, the writeback structures are cleared, resulting in a NULL page address.
RESOLUTION: Code changes have been made to call the VxFS kswapd routine only if a valid page address is present.

* 3933828 (Tracking ID: 3921152)
SYMPTOM: Performance drop. Core dump shows threads doing vx_dalloc_flush().
DESCRIPTION: An implicit typecast error in vx_dalloc_flush() can cause this performance issue.
RESOLUTION: The code is modified to do an explicit typecast.

* 3933834 (Tracking ID: 3931761)
SYMPTOM: A cluster-wide hang may be observed in a race scenario if a freeze gets initiated while there are multiple pending lazy isize update workitems in the worklist.
DESCRIPTION: If the lazy_isize_enable tunable is ON and "ls -l" is executed frequently from a non-writing node of the cluster, a huge number of workitems accumulate for the worker threads to process. If a workitem holding active level 1 is enqueued after these workitems and a cluster-wide freeze is initiated, a deadlock results: the worker threads get exhausted processing the lazy isize update workitems, and the workitem enqueued in the worklist never gets a chance to be processed.
RESOLUTION: Code changes have been made to handle this race condition.

* 3933843 (Tracking ID: 3926972)
SYMPTOM: Once a node reboots or goes out of the cluster, the whole cluster can hang.
DESCRIPTION: This is a three-way deadlock: a glock grant can block recovery while trying to cache the grant against an inode. When it tries for the ilock, if that lock is held by an hlock revoke that is waiting to get a GLM lock (in our case the cbuf lock), it cannot get it because a recovery is in progress. The recovery cannot proceed because the glock grant thread blocked it. Hence the whole cluster hangs.
RESOLUTION: The fix is to avoid taking the ilock in GLM context if it is not available.
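Two incidents in this patch, the implicit typecast in vx_dalloc_flush() (3933828) and the vx_u32_t typecast in vx_odm_resize() fixed by incident 3937536, are the same class of bug: a 64-bit size forced through a 32-bit type loses its high bits once the value passes 4TB. A minimal sketch of the failure mode (illustrative only; not the VxFS code, and the helper names here are hypothetical):

```python
U32_MASK = 0xFFFFFFFF          # what a vx_u32_t-style cast keeps

def to_u32(n):
    """Model an unintended narrowing cast to a 32-bit unsigned value."""
    return n & U32_MASK

def buggy_resize(target, chunk=1 << 30, max_steps=10_000):
    """Grow a size in 1GB chunks, but compare through a 32-bit view.
    The 32-bit view of `size` wraps around and can never reach a
    target above 4TB, so the loop would spin forever; we bail out
    after max_steps to keep the demo finite."""
    size, steps = 0, 0
    while to_u32(size) < target:           # BUG: truncated comparison
        size += chunk
        steps += 1
        if steps >= max_steps:
            return None, steps             # "infinite" loop detected
    return size, steps

def fixed_resize(target, chunk=1 << 30):
    """Same loop with the cast removed: terminates normally."""
    size = 0
    while size < target:
        size += chunk
    return size

five_tb = 5 * 1024**4
print(buggy_resize(five_tb)[0])            # None: the loop never completes
print(fixed_resize(five_tb) == five_tb)    # True: 5120 chunks, then done
```

This matches the reported symptom for incident 3937536: resize requests above 4TB spin indefinitely until the narrowing cast is removed.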
* 3933844 (Tracking ID: 3922259)
SYMPTOM: A force umount hangs with a stack like this:
- vx_delay
- vx_idrop
- vx_quotaoff_umount2
- vx_detach_fset
- vx_force_umount
- vx_aioctl_common
- vx_aioctl
- vx_admin_ioctl
- vxportalunlockedkioctl
- vxportalunlockedioctl
- do_vfs_ioctl
- SyS_ioctl
- system_call_fastpath
DESCRIPTION: An opened external quota file was preventing the force umount from continuing.
RESOLUTION: Code has been changed so that an opened external quota file is processed properly during the force umount.

* 3933912 (Tracking ID: 3922986)
SYMPTOM: System panic after the Linux NMI watchdog detected a LOCKUP in CFS.
DESCRIPTION: The VxFS buffer cache iodone routine interrupted the inode flush thread, which was trying to acquire the CFS buffer hash lock while releasing the CFS buffer, and the iodone routine was blocked by other threads on acquiring the free list lock. In this cycle, the other threads were contending for the CFS buffer hash lock with the inode flush thread. On Linux, the spinlock is a FIFO ticket lock, so if the inode flush thread set its ticket on the spinlock earlier, the other threads cannot acquire the lock. This caused a deadlock.
RESOLUTION: Code changes are made to acquire the CFS buffer hash lock with IRQs disabled.

* 3934841 (Tracking ID: 3930267)
SYMPTOM: Deadlock between fsq flush threads and writer threads.
DESCRIPTION: On Linux, under certain circumstances (i.e. to account for dirty pages), a writer thread takes a lock on an inode and starts flushing dirty pages, which requires the page lock. In this case, if an fsq flush thread starts flushing a transaction on the same inode, it needs the inode lock held by the writer thread. The page lock is held by another writer thread which is waiting for transaction space that can only be freed by the fsq flush thread. This leads to a deadlock between these three threads.
RESOLUTION: Code is modified to add a new flag which skips the dirty page accounting.
* 3936286 (Tracking ID: 3936285)
SYMPTOM: The fscdsconv command may fail the conversion for disk layout version 12 and above. After exporting the file system for use on the specified target, it fails to mount on that target with the error below:
# /opt/VRTS/bin/mount
UX:vxfs mount: ERROR: V-3-20012: not a valid vxfs file system
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version
When importing the file system on the target for use on the same system, it asks for a 'fullfsck' during mount. After the 'fullfsck', the file system mounts successfully, but fsck gives the messages below:
# /opt/VRTS/bin/fsck -y -o full /dev/vx/rdsk/mydg/myvol
log replay in progress
intent log does not contain valid log entries
pass0 - checking structural files
fileset 1 primary-ilist inode 34 (SuperBlock) failed validation clear? (ynq)y
pass1 - checking inode sanity and blocks
rebuild structural files? (ynq)y
pass0 - checking structural files
pass1 - checking inode sanity and blocks
pass2 - checking directory linkage
pass3 - checking reference counts
pass4 - checking resource maps
corrupted CUT entries, clear? (ynq)y
au 0 emap incorrect - fix? (ynq)y
OK to clear log? (ynq)y
flush fileset headers? (ynq)y
set state to CLEAN? (ynq)y
DESCRIPTION: While checking the file system version in fscdsconv, the check for DLV 12 and above was missing, which triggered this issue.
RESOLUTION: Code changes have been made so that the fscdsconv command handles file system version 12 and above.

* 3937536 (Tracking ID: 3940516)
SYMPTOM: The file resize thread loops infinitely when attempting to resize a file to a size greater than 4TB.
DESCRIPTION: Because of a vx_u32_t typecast in the vx_odm_resize function, resize threads get stuck in an infinite loop.
RESOLUTION: Removed the vx_u32_t typecast in vx_odm_resize() to handle such scenarios.

* 3938258 (Tracking ID: 3938256)
SYMPTOM: When checking file size through seek_hole, an incorrect offset/size is returned when delayed allocation is enabled on the file.
DESCRIPTION: From recent RHEL 7 versions onwards, the grep command uses the seek_hole feature to check the current file size and then reads data depending on this size. In VxFS, when dalloc is enabled, the extent is allocated to the file later, but the file size is incremented as soon as the write completes. When checking the file size in seek_hole, VxFS did not fully consider the dalloc case and returned a stale size based on the extents allocated to the file instead of the actual file size, which resulted in reading less data than expected.
RESOLUTION: Code is modified so that VxFS now returns the correct size when dalloc is enabled on a file and seek_hole is called on it.

* 3939406 (Tracking ID: 3941034)
SYMPTOM: During a forced umount, a VxFS worker thread may continuously spin on a CPU.
DESCRIPTION: During a forced unmount, a VxFS worker thread needs a semaphore to drop the super block reference, but that semaphore is held by the vxumount thread, which is waiting for an event to happen. This situation causes a soft lockup panic on the system because the VxFS worker thread continuously spins on a CPU trying to grab the semaphore.
RESOLUTION: Code changes have been made to fix this issue.

* 3940266 (Tracking ID: 3940235)
SYMPTOM: A hang might be observed if the file system gets disabled while ENOSPC handling is being taken care of by inactive processing. The stack trace might look like:
cv_wait+0x3c()
delay_common+0x70()
vx_extfree1+0xc08()
vx_extfree+0x228()
vx_te_trunc_data+0x125c()
vx_te_trunc+0x878()
vx_trunc_typed+0x230()
vx_trunc_tran2+0x104c()
vx_trunc_tran+0x22c()
vx_trunc+0xcf0()
vx_inactive_remove+0x4ec()
vx_inactive_tran+0x13a4()
vx_local_inactive_list+0x14()
vx_inactive_list+0x6e4()
vx_workitem_process+0x24()
vx_worklist_process+0x1ec()
vx_worklist_thread+0x144()
thread_start+4()
DESCRIPTION: In the smapchange function, it is possible in case of races that the SMAP records the old state as VX_EAU_FREE or VX_EAU_ALLOCATED.
However, the corresponding EMAP is not updated. This happens if the concerned flag gets reset to 0 by some other thread in between, which leads to an fm_dirtycnt leak that causes a hang some time afterwards.
RESOLUTION: Code changes have been made to fix the issue by using a local variable instead of the global dflag variable directly, which can get reset to 0.

* 3940368 (Tracking ID: 3940268)
SYMPTOM: A file system with disk layout version 13 might get disabled if the size of the directory surpasses the vx_dexh_sz value.
DESCRIPTION: When the LDH (large directory hash) hash directory fills up and its buckets are full, the size of the hash directory is extended: a reorg inode is created and the extent map of the LDH attribute inode is copied into the reorg inode using the extent map reorg function. In that function, a check verifies whether the extent reorg structure was passed for the same inode; if not, the extent copying does not proceed. The extent reorg structure is set up accordingly, but while setting up the fileset index, the inode's i_fsetindex is used. From disk layout version 13 onwards, the attribute inode has been overlaid, and because of these changes i_fsetindex is no longer set in the attribute inode and remains 0. Hence the check in the extent map reorg function fails, resulting in the file system being disabled.
RESOLUTION: Code has been modified to pass the correct fileset.

* 3940652 (Tracking ID: 3940651)
SYMPTOM: A hang might be observed during a Disk Layout Version (DLV) upgrade with the vxupgrade command.
DESCRIPTION: vxupgrade does a lookup on the history inode to identify the mkfs version. In case of CFS, the lookup requires the RWLOCK or GLOCK on the inode.
RESOLUTION: Code changes have been made to take the RWLOCK and GLOCK on the inode.

* 3940830 (Tracking ID: 3937042)
SYMPTOM: Data corruption seen when issuing writev with a mixture of named-page and anonymous-page buffers.
DESCRIPTION: During writes, VxFS prefaults all of the user buffers into the kernel and decides the write length based on this prefault length. In case of mixed page buffers, VxFS issues the prefault separately for each page, i.e. for the named page and the anonymous page. This reduces the length to be written and triggers the page-create optimization. Since VxFS erroneously enabled the page-create optimization, data corruption was seen on disk.
RESOLUTION: Code is modified such that VxFS does not enable the page-create optimization when a short prefault is seen.

INSTALLING THE PATCH
--------------------
Run the installer script to automatically install the patch:
-----------------------------------------------------------
Please note that the installation of this P-Patch will cause downtime.
To install the patch, perform the following steps on at least one node in the cluster:
1. Copy the patch infoscale-rhel7_x86_64-Patch-7.3.1.200.tar.gz to /tmp
2. Untar infoscale-rhel7_x86_64-Patch-7.3.1.200.tar.gz to /tmp/hf
# mkdir /tmp/hf
# cd /tmp/hf
# gunzip /tmp/infoscale-rhel7_x86_64-Patch-7.3.1.200.tar.gz
# tar xf /tmp/infoscale-rhel7_x86_64-Patch-7.3.1.200.tar
3. Install the hotfix (please note that the installation of this P-Patch will cause downtime):
# pwd
/tmp/hf
# ./installVRTSinfoscale731P200 [ ...]

You can also install this patch together with the 7.3.1 maintenance release using Install Bundles:
1. Download this patch and extract it to a directory.
2. Change to the Veritas InfoScale 7.3.1 directory and invoke the installmr script with the -patch_path option, where -patch_path should point to the patch directory:
# ./installmr -patch_path [] [ ...]

Install the patch manually:
--------------------------
Manual installation is not supported.

REMOVING THE PATCH
------------------
Manual uninstallation is not supported.

SPECIAL INSTRUCTIONS
--------------------
NONE

OTHERS
------
NONE