infoscale-rhel7_x86_64-Patch-7.2.0.500

 Basic information
Release type: Patch
Release date: 2019-05-31
OS update support: RHEL7 x86-64 Update 6
Technote: None
Documentation: None
Popularity: 5298 viewed
Download size: 210.78 MB
Checksum: 687007944

 Applies to one or more of the following products:
InfoScale Availability 7.2 On RHEL7 x86-64
InfoScale Enterprise 7.2 On RHEL7 x86-64
InfoScale Foundation 7.2 On RHEL7 x86-64
InfoScale Storage 7.2 On RHEL7 x86-64

 Obsolete patches, incompatibilities, superseded patches, or other requirements:

This patch supersedes the following patches (with release dates):
infoscale-rhel7.4_x86_64-Patch-7.2.0.200 (obsolete) 2017-09-17
odm-rhel7_x86_64-Patch-7.2.0.100 (obsolete) 2017-04-25
vm-rhel7_x86_64-Patch-7.2.0.100 (obsolete) 2017-04-24
fs-rhel7_x86_64-Patch-7.2.0.100 (obsolete) 2017-04-24
amf-rhel7_x86_64-Patch-7.2.0.100 (obsolete) 2017-04-17
gab-rhel7_x86_64-Patch-7.2.0.100 (obsolete) 2017-04-17
llt-rhel7_x86_64-Patch-7.2.0.100 (obsolete) 2017-03-08

 Fixes the following incidents:
3898155, 3906300, 3908392, 3908988, 3909937, 3909938, 3909939, 3909940, 3909941, 3909943, 3909946, 3909992, 3910000, 3910083, 3910084, 3910085, 3910086, 3910088, 3910090, 3910093, 3910094, 3910095, 3910096, 3910097, 3910098, 3910101, 3910103, 3910105, 3910356, 3910426, 3910586, 3910588, 3910590, 3910591, 3910592, 3910593, 3911290, 3911407, 3911718, 3911719, 3911732, 3911926, 3911964, 3911968, 3912529, 3912532, 3912604, 3912988, 3912989, 3912990, 3913004, 3913119, 3913424, 3914384, 3914871, 3915003, 3915963, 3916912, 3917814, 3917961, 3920932, 3921996, 3922321, 3925379, 3926159, 3926163, 3926224, 3926301, 3926868, 3927030, 3927031, 3927032, 3928824, 3949175, 3949176, 3949177, 3949178, 3949189, 3950828, 3950829, 3950830, 3950831, 3950832, 3950989, 3964360, 3968917, 3968928, 3969182, 3969307, 3969467, 3969468, 3969821, 3969822, 3969833, 3969837, 3969839, 3969898, 3970414, 3971156, 3971157, 3971161, 3971162, 3971163, 3971587, 3971590, 3971592, 3971594, 3971596, 3971598, 3971600, 3971602, 3971604, 3971608, 3971873, 3971874, 3971877, 3971878, 3972087, 3972246, 3972347, 3972351, 3972360, 3972466, 3972563, 3972872, 3972911, 3972961, 3973009, 3973012, 3973020, 3973021, 3973022, 3973077, 3973079, 3973080, 3973403, 3973405, 3973415, 3973421, 3973424, 3973432, 3973434, 3973435, 3973441, 3973445, 3973526, 3973529, 3973530, 3973567, 3973574, 3973578, 3973661, 3973666, 3973669, 3973759, 3973769, 3973778, 3973878, 3974105, 3974120, 3974148, 3974326, 3974334, 3974569

 Patch ID:
VRTSaslapm-7.2.0.5100-RHEL7
VRTSveki-7.2.0.200-RHEL7
VRTSvxvm-7.2.0.5200-RHEL7
VRTSllt-7.2.0.7100-RHEL7
VRTSgab-7.2.0.5100-RHEL7
VRTSvxfen-7.2.0.6100-RHEL7
VRTSamf-7.2.0.5100-RHEL7
VRTSdbac-7.2.0.5100-RHEL7
VRTSvxfs-7.2.0.4200-RHEL7
VRTSodm-7.2.0.4200-RHEL7
VRTSglm-7.2.0.3100-RHEL7
VRTSgms-7.2.0.3100-RHEL7

Readme file
                          * * * READ ME * * *
                       * * * InfoScale 7.2 * * *
                         * * * Patch 500 * * *
                         Patch Date: 2019-04-08


This document provides the following information:

   * PATCH NAME
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * SUMMARY OF INCIDENTS FIXED BY THE PATCH
   * DETAILS OF INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
InfoScale 7.2 Patch 500


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL7 x86-64


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSamf
VRTSaslapm
VRTSdbac
VRTSgab
VRTSglm
VRTSgms
VRTSllt
VRTSodm
VRTSveki
VRTSvxfen
VRTSvxfs
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * InfoScale Availability 7.2
   * InfoScale Enterprise 7.2
   * InfoScale Foundation 7.2
   * InfoScale Storage 7.2


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: VRTSvxvm-7.2.0.5200
* 3908988 (3908987) False vxrelocd messages being generated by joining CVM slave.
* 3921996 (3921994) Disk group backup fails; temporary files such as <DiskGroup>.bslist, .cslist, and .perm are seen in the /var/temp directory.
* 3964360 (3964359) The DG import is failing with Split Brain after the system is rebooted or when a storage 
disturbance is seen.
* 3968917 (3968915) VxVM support on RHEL 7.6
* 3969182 (3897047) Filesystems are not mounted automatically on boot through systemd on RHEL7 and
SLES12.
* 3969821 (3917636) Filesystems from /etc/fstab file are not mounted automatically on boot 
through systemd on RHEL7 and SLES12.
* 3969822 (3956732) systemd-udevd message can be seen in journalctl logs.
* 3969833 (3947265) The delay added in the vxvm-startup script to wait for InfiniBand devices to be discovered leads to various issues.
* 3969839 (3925377) Not all disks could be discovered by DMP after first startup.
* 3969898 (3913949) The DG import is failing with Split Brain after the system is rebooted or when a storage 
disturbance is seen.
* 3971587 (3953681) Data corruption issue is seen when more than one plex of volume is detached.
* 3971590 (3931936) VxVM (Veritas Volume Manager) commands hang on the master node after a slave node is restarted.
* 3971592 (3935232) Replication and IO hang during master takeover because of racing between log 
owner change and master switch.
* 3971594 (3939891) In VVR (Veritas Volume Replicator), the replication might 
fail with header checksum errors when using the UDP protocol.
* 3971596 (3935974) When a client process shuts down abruptly or resets the connection while communicating with the vxrsyncd daemon, it may terminate the vxrsyncd daemon.
* 3971598 (3906119) Failback did not happen when the optimal path came back in a cluster environment.
* 3971600 (3919559) IO hangs after pulling out all cables, when VVR is reconfigured.
* 3971602 (3904538) IO hang happens during slave node leave or master node switch because of racing 
between RV(Replicate Volume) recovery SIO(Staged IO) and new coming IOs.
* 3971604 (3907641) Panic in volsal_remove_saldb during reconfig in FSS configuration.
* 3971608 (3955725) Utility to clear the "failio" flag on disks after storage connectivity is back.
* 3971873 (3932356) vxconfigd dumping core while importing DG
* 3971874 (3957227) Disk group import succeeded, but with error message. This may cause confusion.
* 3971877 (3922159) Thin reclamation may fail on XtremIO SSD disks.
* 3971878 (3922529) VxVM (Veritas Volume Manager) creates some required files under the /tmp and /var/tmp directories. These directories can be modified by non-root users, which can affect Veritas Volume Manager functioning.
* 3972246 (3946350) kmalloc-1024 and kmalloc-2048 memory consumption keeps increasing when the VVR IO size is more than 256K.
* 3972911 (3918408) Data corruption when volume grow is attempted on thin reclaimable disks whose space is just freed.
* 3973009 (3965962) Auto recovery gets triggered at the time of slave join.
* 3973012 (3921572) vxconfigd dumps core during cable pull test.
* 3973020 (3754715) When dmp_native_support is enabled, kdump functionality does not work properly.
* 3973021 (3927439) VxVM vxassist relayout command doesn't honor read policy.
* 3973022 (3873123) If a disk with a CDS EFI label is used as a remote disk on a cluster node, restarting the vxconfigd daemon on that node causes vxconfigd to go into the disabled state.
* 3973077 (3945115) VxVM (Veritas Volume Manager) vxassist relayout command fails for volumes with 
RAID layout.
* 3973079 (3950199) System may panic during DMP (Dynamic Multi-Pathing) path restoration.
* 3973080 (3936535) Poor performance due to frequent cache drops.
* 3973405 (3931048) VxVM (Veritas Volume Manager) creates particular log files with write permission
to all users.
* 3974326 (3948140) System panic can occur if size of RTPG (Report Target Port Groups) data returned
by underlying array is greater than 255.
* 3974334 (3899568) Adding tunable dmp_compute_iostats to start/stop the iostat gathering
persistently.
Patch ID: VRTSvxvm-7.2.0.400
* 3949189 (3938549) Volume creation fails with error: Unexpected kernel error in configuration update, on RHEL 7.5.
Patch ID: VRTSvxvm-7.2.0.300
* 3913119 (3902769) System may panic while running vxstat in CVM (Clustered
Volume Manager) environment.
* 3920932 (3915523) A local disk from another node belonging to a private DG (disk group) is exported to the node when a private DG is imported on the current node.
* 3925379 (3919902) The vxdmpadm iopolicy switch command can fail, and standby paths are not honored by some iopolicies.
* 3926224 (3925400) During VRTSaslapm package installation on RHEL7.4, the APM (Array Policy Modules) do not load properly.
* 3926301 (3925398) VxVM modules failed to load on RHEL7.4.
Patch ID: VRTSvxvm-7.2.0.100
* 3909992 (3898069) System panic may happen in dmp_process_stats routine.
* 3910000 (3893756) 'vxconfigd' holds a task device for a long time; after the kernel counter wraps around, this may create a boundary issue.
* 3910426 (3868533) IO hang happens because of a deadlock situation.
* 3910586 (3852146) A shared DiskGroup (DG) fails to import when the "-c" and "-o noreonline" options are specified together.
* 3910588 (3868154) When DMP Native Support is set to ON, a dmpnode with multiple VGs cannot be listed properly in the 'vxdmpadm native ls' command.
* 3910590 (3878030) Enhance VxVM DR tool to clean up OS and VxDMP device trees without user 
interaction.
* 3910591 (3867236) Application IO hang happens because of a race between Master Pause SIO(Staging IO) 
and RVWRITE1 SIO.
* 3910592 (3864063) Application IO hang happens because of a race between Master Pause SIO(Staging IO) 
and Error Handler SIO.
* 3910593 (3879324) VxVM DR tool fails to handle the busy device problem while LUNs are removed from the OS.
* 3912529 (3878153) VVR 'vradmind' daemon dumps core.
* 3912532 (3853144) VxVM mirror volume's stale plex is incorrectly marked as "Enable Active" after 
it comes back.
* 3915963 (3907034) The mediatype is not shown as ssd in vxdisk -e list command for SSD (solid 
state devices) devices.
Patch ID: VRTSaslapm-7.2.0.5100
* 3968928 (3967122) Retpoline support for the ASLAPM rpm on the RHEL 7.6 retpoline kernel.
Patch ID: VRTSgms-7.2.0.3100
* 3974569 (3932849) Temporary files are being created in /tmp.
Patch ID: VRTSgms-7.2.0.200
* 3949177 (3947812) GMS support for RHEL7.5
Patch ID: VRTSveki-7.2.0.200
* 3969307 (3955519) VRTSvxvm upgrade fails since VRTSveki upgrade fails while using yum upgrade command.
* 3969837 (3923934) Modules loading in incorrect order on SLES12SP2.
* 3970414 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.
Patch ID: VRTSveki-7.2.0.100
* 3950989 (3944179) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 5 (RHEL7.5).
Patch ID: VRTSdbac-7.2.0.5100
* 3971163 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.
Patch ID: VRTSdbac-7.2.0.300
* 3950832 (3944179) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 5 (RHEL7.5).
Patch ID: VRTSdbac-7.2.0.200
* 3928824 (3925832) vcsmm module does not load with RHEL7.4
Patch ID: VRTSamf-7.2.0.5100
* 3971162 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.
Patch ID: VRTSamf-7.2.0.300
* 3950831 (3944179) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 5 (RHEL7.5).
Patch ID: VRTSamf-7.2.0.200
* 3927030 (3923100) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 4 (RHEL7.4).
Patch ID: VRTSamf-7.2.0.100
* 3908392 (3896877) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 3 (RHEL7.3).
Patch ID: VRTSvxfen-7.2.0.6100
* 3971161 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.
Patch ID: VRTSvxfen-7.2.0.400
* 3950830 (3944179) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 5 (RHEL7.5).
Patch ID: VRTSvxfen-7.2.0.300
* 3927032 (3923100) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 4 (RHEL7.4).
Patch ID: VRTSvxfen-7.2.0.200
* 3915003 (2852872) Fencing sometimes shows "replaying" RFSM state for some nodes
in the cluster.
Patch ID: VRTSgab-7.2.0.5100
* 3971157 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.
Patch ID: VRTSgab-7.2.0.300
* 3950829 (3944179) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 5 (RHEL7.5).
Patch ID: VRTSgab-7.2.0.200
* 3927031 (3923100) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 4 (RHEL7.4).
Patch ID: VRTSgab-7.2.0.100
* 3913424 (3896877) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 3 (RHEL7.3).
Patch ID: VRTSllt-7.2.0.7100
* 3971156 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.
Patch ID: VRTSllt-7.2.0.500
* 3950828 (3944179) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 5 (RHEL7.5).
Patch ID: VRTSllt-7.2.0.400
* 3922321 (3922320) Kernel panics in case of FSS with LLT over RDMA during heavy data transfer.
* 3926868 (3923100) Veritas InfoScale Availability does not support Red Hat Enterprise Linux 7 Update 4 (RHEL7.4).
Patch ID: VRTSllt-7.2.0.200
* 3898155 (3896875) Veritas InfoScale Availability (VCS) does not support SUSE Linux Enterprise Server 12 Service Pack 2 (SLES 12 SP2).
Patch ID: VRTSllt-7.2.0.100
* 3906300 (3905430) Application IO hangs in case of FSS with LLT over RDMA during heavy data transfer.
Patch ID: VRTSglm-7.2.0.3100
* 3974120 (3932845) Temporary files are being created in /tmp.
Patch ID: VRTSglm-7.2.0.200
* 3949178 (3947815) GLM support for RHEL7.5
Patch ID: VRTSodm-7.2.0.4200
* 3969468 (3958865) ODM module failed to load on RHEL7.6.
* 3973530 (3932804) Temporary files are being created in /tmp.
* 3973769 (3897161) Oracle Database on Veritas filesystem with Veritas ODM
library has high log file sync wait time.
Patch ID: VRTSodm-7.2.0.300
* 3949176 (3938546) ODM module failed to load on RHEL7.5.
Patch ID: VRTSodm-7.2.0.200
* 3926163 (3923310) ODM module failed to load on RHEL7.4.
Patch ID: VRTSodm-7.2.0.100
* 3910095 (3757609) CPU usage going high because of contention over ODM_IO_LOCK
* 3911964 (3907933) ODM module failed to load on SLES12 SP2.
Patch ID: VRTSvxfs-7.2.0.4200
* 3969467 (3958853) VxFS module failed to load on RHEL7.6.
* 3972087 (3914782) Performance drop caused by too many VxFS worker threads
* 3972347 (3938256) When checking file size through seek_hole, it will return incorrect offset/size 
when delayed allocation is enabled on the file.
* 3972351 (3928046) VxFS kernel panic BAD TRAP: type=34 in vx_assemble_hdoffset().
* 3972360 (3943529) System panicked because watchdog timer detected hard lock up on CPU when trying to 
release dentry.
* 3972466 (3926972) A recovery event can result in a cluster wide hang.
* 3972563 (3922986) Deadlock issue with the buffer cache iodone routine in CFS.
* 3972872 (3929854) Enabling event notification support on CFS for the WebLogic watchService on the Solaris platform.
* 3972961 (3921152) Performance drop caused by vx_dalloc_flush().
* 3973403 (3947433) While adding a volume (part of vset) in already mounted filesystem, fsvoladm
displays error.
* 3973415 (3958688) System panic when VxFS got force unmounted.
* 3973421 (3955766) CFS hung when doing extent allocating.
* 3973424 (3959305) Fix a bug in security attribute initialisation of files with named attributes.
* 3973432 (3940268) File system might get disabled in case the size of the directory surpasses the
vx_dexh_sz value.
* 3973434 (3940235) A hang might be observed if the filesystem gets disabled while ENOSPC handling is being performed by inactive processing.
* 3973435 (3922259) Force umount hang in vx_idrop
* 3973441 (3973440) VxFS mount failed with error "no security xattr handler" on RHEL 7.6 when SELinux enabled (both permissive and enforcing)
* 3973445 (3959299) Improve file creation time on systems with SELinux enabled.
* 3973526 (3932163) Temporary files are being created in /tmp.
* 3973529 (3943232) System panic in vx_unmount_cleanup_notify when unmounting file system.
* 3973567 (3947648) Mistuning of vxfs_ninode and vx_bc_bufhwm to a very small value.
* 3973574 (3925281) Hexdump the incore inode data and piggyback data when inode 
revalidation fails.
* 3973578 (3941942) Unable to handle kernel NULL pointer dereference while freeing fiostats.
* 3973661 (3931761) Cluster wide hang may be observed in case of high workload.
* 3973666 (3973668) On Linux, during system startup, boot.vxfs fails to load the vxfs modules and throws the following error:
* 3973669 (3944884) ZFOD extents shouldn't be pushed on clones in case of logged 
writes.
* 3973759 (3902600) Contention observed on vx_worklist_lk lock in cluster 
mounted file system with ODM
* 3973778 (3926061) System panic in vx_idalloc_off() due to a NULL pointer dereference
* 3973878 (3917319) Random panics in VxFS stack
* 3974105 (3941620) Memory starvation during heavy write activity.
* 3974148 (3957285) job promote operation from replication target node fails.
Patch ID: VRTSvxfs-7.2.0.300
* 3917814 (3918285) During extent allocation for a write operation in a locally mounted file system, if an error occurs during the commit of the State Map transaction, the count of pending delayed allocations may become inaccurate.
* 3949175 (3938544) VxFS module failed to load on RHEL7.5.
Patch ID: VRTSvxfs-7.2.0.200
* 3911407 (3917013) fsck command throws error message.
* 3912604 (3914488) On a Local Mount Filesystem, while mounting the filesystem, it can be marked for FULLFSCK.
* 3914871 (3915125) File system kernel threads deadlock while allocating/freeing blocks.
* 3916912 (3916914) On Disk Layout Version 11, FileSystem may run into ENOSPC 
condition.
* 3917961 (3919130) Failures observed while setting named attribute using
nxattrset command.
* 3926159 (3923307) VxFS module failed to load on RHEL7.4.
Patch ID: VRTSvxfs-7.2.0.100
* 3909937 (3908954) Some writes could be missed causing data loss.
* 3909938 (3905099) VxFS unmount panicked in deactive_super().
* 3909939 (3906548) Start up VxFS after the local file system is mounted read-write through systemd.
* 3909940 (3894712) ACL permissions are not inherited correctly on cluster 
file system.
* 3909941 (3868609) High CPU usage seen because of a vxfs thread while applying Oracle redo logs.
* 3909943 (3729030) The fsdedupschd daemon failed to start on RHEL7.
* 3909946 (3685391) Execute permissions for a file not honored correctly.
* 3910083 (3707662) Race between reorg processing and fsadm timer thread (alarm expiry) leads to panic in vx_reorg_emap.
* 3910084 (3812330) slow ls -l across cluster nodes
* 3910085 (3779916) vxfsconvert fails to upgrade the layout version for a VxFS file system with a large number of inodes.
* 3910086 (3830300) Degraded CPU performance during backup of Oracle archive logs
on CFS vs local filesystem
* 3910088 (3855726) Panic in vx_prot_unregister_all().
* 3910090 (3790721) High cpu usage caused by vx_send_bcastgetemapmsg_remaus
* 3910093 (1428611) 'vxcompress' can spew many GLM block lock messages over the 
LLT network.
* 3910094 (3879310) The file system may get corrupted after a failed vxupgrade.
* 3910096 (3757609) CPU usage going high because of contention over ODM_IO_LOCK
* 3910097 (3817734) A direct command to run fsck with the -y|Y option was mentioned in the message displayed to the user when a file system mount fails.
* 3910098 (3861271) Missing an inode clear operation when a Linux inode is being de-initialized on
SLES11.
* 3910101 (3846521) "cp -p" fails if the modification time in nanoseconds has 10 digits.
* 3910103 (3817734) A direct command to run fsck with the -y|Y option was mentioned in the message displayed to the user when a file system mount fails.
* 3910105 (3907902) System panic observed due to race between dalloc off thread
and getattr thread.
* 3910356 (3908785) System panic observed because of a null page address in the writeback structure in the case of the kswapd process.
* 3911290 (3910526) fsadm fails with error number 28 during resize operation
* 3911718 (3905576) CFS hang during a cluster wide freeze
* 3911719 (3909583) Disable partitioning of a directory if the directory size is greater than the upper threshold value.
* 3911732 (3896670) Intermittent CFS hang-like situation with many CFS pglock grant messages pending on the LLT layer.
* 3911926 (3901318) VxFS module failed to load on RHEL7.3.
* 3911968 (3911966) VxFS module failed to load on SLES12 SP2.
* 3912988 (3912315) EMAP corruption while freeing up the extent
* 3912989 (3912322) The vxfs tunable max_seqio_extent_size cannot be tuned to any value less than 32768.
* 3912990 (3912407) CFS hang on VxFS 7.2 while a thread on a CFS node waits for EAU delegation.
* 3913004 (3911048) LDH corruption and filesystem hang.
* 3914384 (3915578) The vxfs module fails to load after reboot.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: VRTSvxvm-7.2.0.5200

* 3908988 (Tracking ID: 3908987)

SYMPTOM:
The following unnecessary message is printed to inform the customer that hot
relocation will be performed on the master node.

VxVM vxrelocd INFO V-5-2-6551
hot-relocation operation for shared disk group will be performed on master 
node.

DESCRIPTION:
The message should be printed only when there are failed disks. Because the
related code is not placed in the right position, it is printed even when
there are no failed disks.

RESOLUTION:
Code changes have been made to fix the issue.

* 3921996 (Tracking ID: 3921994)

SYMPTOM:
Temporary files such as <DiskGroup>.bslist, .cslist, and .perm are seen in the
/var/temp directory.

DESCRIPTION:
When ADD and REMOVE operations on the disks of a disk group are done in the
interval between two backups, the next backup of the same disk group fails,
which is why the files are left behind in the directory specified above.

RESOLUTION:
Corrected the syntax errors in the code to handle the vxconfigbackup issue.

* 3964360 (Tracking ID: 3964359)

SYMPTOM:
The DG import is failing with Split Brain after the system is rebooted or when a storage 
disturbance is seen.

The DG import may fail due to split brain with following messages in syslog:
V-5-1-9576 Split Brain. da id is 0.1, while dm id is 0.0 for dm
B000F8BF40FF000043042DD4A5
V-5-1-9576 Split Brain. da id is 0.1, while dm id is 0.0 for dm
B000F8BF40FF00004003FE9356

DESCRIPTION:
When a disk is detached, the SSB ID of the remaining DA and DM records
shall be incremented. Unfortunately for some reason, the SSB ID of DA 
record is only incremented, but the SSB ID of DM record is NOT updated. 
One probable reason may be because the disks get detached before updating
the DM records.

RESOLUTION:
The code changes are done in the DG import process to identify the false split brain condition and correct the
disk SSB IDs during the import. With this fix, the import shall NOT fail due to a false split brain condition.

Additionally, one more improvement is done in the -o overridessb option to correct the disk
SSB IDs during import.

Ideally, with this fix the disk group import shall NOT fail due to false split brain conditions.
But if the disk group import still fails with a false split brain condition, the user can try the -o overridessb option.
For using '-o overridessb', one should confirm that all the DA records of the DG are available in the ENABLED state
and differ from the DM records in SSB ID by 1.
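
For reference, a false split brain can then be overridden at import time with the following command (the disk group name is a placeholder; the same command appears later in this document):
# vxdg -o overridessb import <dgname>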

* 3968917 (Tracking ID: 3968915)

SYMPTOM:
VxVM support on RHEL 7.6

DESCRIPTION:
RHEL 7.6 is a new release, and hence the VxVM module is compiled with the RHEL 7.6 kernel.

RESOLUTION:
Compiled VxVM with RHEL 7.6 kernel bits.

* 3969182 (Tracking ID: 3897047)

SYMPTOM:
Filesystems are not mounted automatically on boot through systemd on RHEL7 and
SLES12.

DESCRIPTION:
When the systemd service tries to start all the FS in /etc/fstab, the Veritas Volume
Manager (VxVM) volumes are not started since vxconfigd is still not up. The VxVM
volumes are started a little bit later in the boot process. Since the volumes are
not available, the FS are not mounted automatically at boot.

RESOLUTION:
Registered the VxVM volumes with UDEV daemon of Linux so that the FS would be 
mounted when the VxVM volumes are started and discovered by udev.

* 3969821 (Tracking ID: 3917636)

SYMPTOM:
Filesystems from /etc/fstab file are not mounted automatically on boot 
through systemd on RHEL7 and SLES12.

DESCRIPTION:
During bootup, when systemd tries to mount using the devices mentioned in the
/etc/fstab file, the device is not accessible, leading to the failure of the
mount operation. As the device discovery happens through the udev
infrastructure, the udev rules for those devices need to be run when volumes
are created so that the devices get registered with systemd. In this case, the
udev rules are executed even before the devices in the /dev/vx/dsk directory
are created.
Since the devices are not created, they are not registered with systemd,
leading to the failure of the mount operation.

RESOLUTION:
Run "udevadm trigger" to execute all the udev rules once all volumes are 
created so that devices are registered.
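
As a minimal illustration of this resolution (the patch runs the equivalent step automatically during volume startup), the udev rules can also be re-run manually for already-created devices:
# udevadm trigger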

* 3969822 (Tracking ID: 3956732)

SYMPTOM:
systemd-udevd messages like below can be seen in journalctl logs:

systemd-udevd[7506]: inotify_add_watch(7, /dev/VxDMP8, 10) failed: No such file or directory
systemd-udevd[7511]: inotify_add_watch(7, /dev/VxDMP9, 10) failed: No such file or directory

DESCRIPTION:
When changes are made to the underlying VxDMP device, these messages are displayed in the journalctl logs. The reason is that the change event of the VxDMP device was not handled in our UDEV rule.

RESOLUTION:
Code changes have been done to handle change event of VxDMP device in our UDEV rule.
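
As an illustrative sketch only (not the exact rule shipped with the patch; the nowatch handling is an assumption), a udev rule that reacts to the change event of VxDMP devices uses standard udev match keys:
ACTION=="change", KERNEL=="VxDMP*", OPTIONS:="nowatch"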

* 3969833 (Tracking ID: 3947265)

SYMPTOM:
vxfen tends to fail and creates split brain issues.

DESCRIPTION:
Currently to check whether the infiniband devices are present or not
we check for some modules which on rhel 7.4 comes by default.

RESOLUTION:
To check for InfiniBand devices, we now check for the /sys/class/infiniband
directory, in which device information is populated when InfiniBand devices
are present.
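
A minimal shell equivalent of the new check (illustrative only):
# [ -d /sys/class/infiniband ] && echo "InfiniBand devices present"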

* 3969839 (Tracking ID: 3925377)

SYMPTOM:
Not all disks could be discovered by Dynamic Multi-Pathing (DMP) after first
startup.

DESCRIPTION:
DMP is started too early in the boot process if iSCSI and raw have not been
installed. At that point the FC devices are not yet recognized by the OS, hence
DMP misses the FC devices.

RESOLUTION:
The code is modified to make sure DMP gets started after OS disk discovery.

* 3969898 (Tracking ID: 3913949)

SYMPTOM:
The DG import is failing with Split Brain after the system is rebooted or when a storage 
disturbance is seen.

The DG import may fail due to split brain with following messages in syslog:
V-5-1-9576 Split Brain. da id is 0.1, while dm id is 0.0 for dm
B000F8BF40FF000043042DD4A5
V-5-1-9576 Split Brain. da id is 0.1, while dm id is 0.0 for dm
B000F8BF40FF00004003FE9356

DESCRIPTION:
When a disk is detached, the SSB IDs of the remaining DA and DM records
should be incremented. For some reason, only the SSB ID of the DA record
is incremented; the SSB ID of the DM record is NOT updated. One probable
reason may be that the disks get detached before the DM records are updated.

RESOLUTION:
A workaround option is provided to bypass the SSB checks while importing the DG. The user
can import the DG with the 'vxdg -o overridessb import <dgname>' command if a false split brain
happens.
For using '-o overridessb', one should confirm that all DA records
of the DG are available in the ENABLED state and differ from the DM records
in SSB ID by 1.

* 3971587 (Tracking ID: 3953681)

SYMPTOM:
Data corruption issue is seen when more than one plex of volume is detached.

DESCRIPTION:
When a plex of a volume gets detached, the DETACH map gets enabled in the DCO (Data Change Object). The incoming IOs are tracked in the DRL (Dirty Region Log) and then asynchronously copied to the DETACH map for tracking.
If one more plex gets detached, it might happen that some of the new incoming regions are missed in the DETACH map of the previously detached plex.
This leads to corruption when the disk comes back and the plex resync happens using the corrupted DETACH map.

RESOLUTION:
Code changes are done to correctly track the IOs in the DETACH map of the previously detached plex and avoid corruption.

* 3971590 (Tracking ID: 3931936)

SYMPTOM:
In an FSS (Flexible Storage Sharing) environment, after restarting a slave node,
VxVM commands on the master node hang, and as a result the failed disks on the
slave node cannot rejoin the disk group.

DESCRIPTION:
When lost remote disks on the slave node come back, the operations to online
these disks and add them to the disk group are performed on the master node.
Disk online includes operations on both the master and slave nodes. On the
slave node these disks should be offlined and then reonlined, but due to a code
defect the disk reonline is missed, so these disks are kept in the reonlining
state. The following add-disk-to-disk-group operation needs to issue private
region IOs on the disk. These IOs are shipped to the slave node to complete. As
the disks are in the reonline state, a busy error gets returned and the remote
IOs keep retrying, hence the VxVM command hang on the master node.

RESOLUTION:
Code changes have been made to fix the issue.

* 3971592 (Tracking ID: 3935232)

SYMPTOM:
Replication and IO hang may happen on new master node during master 
takeover.

DESCRIPTION:
While a master switch is in progress, if a log owner change kicks in, the flag
VOLSIO_FLAG_RVC_ACTIVE is set by the log owner change SIO.
RVG (Replicated Volume Group) recovery initiated by the master switch clears
the flag VOLSIO_FLAG_RVC_ACTIVE after RVG recovery is done. When the log owner
change completes, as the flag VOLSIO_FLAG_RVC_ACTIVE has been cleared, resetting
the flag VOLOBJ_TFLAG_VVR_QUIESCE is skipped. The presence of the flag
VOLOBJ_TFLAG_VVR_QUIESCE makes replication and application IO on the RVG
remain in the pending state forever.

RESOLUTION:
Code changes have been done to make the log owner change wait until the master
switch completes.

* 3971594 (Tracking ID: 3939891)

SYMPTOM:
In VVR, the replication might fail with header checksum errors 
when using UDP protocol for replication.
The rlink might be disconnected with below errors:
Jan  5 16:24:54 <hostname> kernel: VxVM VVR vxio V-5-0-830 Header checksum 
error

DESCRIPTION:
An incorrect address was used while sending some data to the secondary node
when using the UDP protocol for replication, causing the header checksum errors
on the VVR secondary site. This resulted in constant reconnects/disconnects
from primary to secondary and stopped the replication.

RESOLUTION:
Changes are done in VxVM code to fix the bug which resulted 
in the header checksum errors.

* 3971596 (Tracking ID: 3935974)

SYMPTOM:
While communicating with a client process, the vxrsyncd daemon terminates;
after some time it gets restarted, or it may require a reboot to start.

DESCRIPTION:
When the client process shuts down abruptly and the vxrsyncd daemon attempts to
write on the client socket, a SIGPIPE signal is generated. The default action
for this signal is to terminate the process. Hence vxrsyncd gets terminated.

RESOLUTION:
The SIGPIPE signal is now handled to prevent the termination of vxrsyncd.

* 3971598 (Tracking ID: 3906119)

SYMPTOM:
In a CVM (Cluster Volume Manager) environment, failback did not happen when the
optimal path came back.

DESCRIPTION:
An ALUA (Asymmetric Logical Unit Access) array supports implicit and explicit
asymmetric logical unit access management methods. In a CVM environment,
DMP (Dynamic Multi-Pathing) failed to start failback for implicit-ALUA-only
mode arrays, hence the issue.

RESOLUTION:
Code changes are added to handle this case for implicit-ALUA-only mode arrays.

* 3971600 (Tracking ID: 3919559)

SYMPTOM:
IO hangs after pulling out all cables, when VVR (Veritas Volume Replicator) is
reconfigured.

DESCRIPTION:
When VVR is configured and the SRL (Storage Replicator Log) batch feature is
enabled, after pulling out all cables, if more than one IO gets queued in VVR
before a header error, at least one IO is not handled due to a bug in VVR,
hence the issue.

RESOLUTION:
Code has been modified to get every queued IO in VVR handled properly.

* 3971602 (Tracking ID: 3904538)

SYMPTOM:
RV(Replicate Volume) IO hang happens during slave node leave or master node 
switch.

DESCRIPTION:
The RV IO hang happens because the SRL (Storage Replicator Log) header is
updated by the RV recovery SIO. After a slave node leaves or a master node
switch occurs, RV recovery can be initiated. During RV recovery, all new
incoming IOs should be quiesced by setting the NEED RECOVERY flag on the RV to
avoid racing. Due to a code defect, this flag is removed by a transaction
commit, resulting in a conflict between new IOs and the RV recovery SIO.

RESOLUTION:
Code changes have been made to fix this issue.

* 3971604 (Tracking ID: 3907641)

SYMPTOM:
Panic in volsal_remove_saldb during reconfig in an FSS configuration; the stack
trace is like the following:

machine_kexec
crash_kexec
oops_end
die
do_trap
do_invalid_op
invalid_op
kfree
vol_free
volsal_remove_saldb
volcvm_vxreconfd_thread
kthread
ret_from_fork

DESCRIPTION:
While deleting the cluster-wide SAL device database, there is a bug that
deletes an item that has already been deleted.

RESOLUTION:
Modified the code to prevent deleting an item that is already deleted.

* 3971608 (Tracking ID: 3955725)

SYMPTOM:
Utility to clear "failio" flag on disk after storage connectivity is back.

DESCRIPTION:
If I/Os to the disks time out due to hardware failures such as a weak Storage Area Network (SAN) cable link or a Host Bus Adapter (HBA) failure, VxVM assumes
that the disk is bad or slow and sets the "failio" flag on the disk. Because of this flag, all subsequent I/Os fail with the "No such device" error. After the connectivity is back, the "failio" flag needs to be cleared using "vxdisk <disk_name> failio=off". A new utility, "vxcheckfailio", is introduced that
clears the "failio" flag for all disks whose paths are all enabled.

RESOLUTION:
Code changes are done to add utility "vxcheckfailio" that will clear the "failio" flag on the disks.
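
Usage sketch, based on the resolution above (the per-disk form assumes the standard 'vxdisk set' syntax, and <disk_name> is a placeholder):
# vxcheckfailio
This replaces clearing the flag disk by disk with:
# vxdisk set <disk_name> failio=off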

* 3971873 (Tracking ID: 3932356)

SYMPTOM:
In a two-node cluster, vxconfigd dumps core while importing the DG:

 dapriv_da_alloc ()
 in setup_remote_disks ()
 in volasym_remote_getrecs ()
 req_dg_import ()
 vold_process_request ()
 start_thread () from /lib64/libpthread.so.0
 from /lib64/libc.so.6

DESCRIPTION:
vxconfigd dumps core due to an address alignment issue.

RESOLUTION:
The alignment issue is fixed.

* 3971874 (Tracking ID: 3957227)

SYMPTOM:
Disk group import succeeded, but with the following error message:

vxvm:vxconfigd: [ID ** daemon.error] V-5-1-0 dg_import_name_to_dgid: Found dgid = **

DESCRIPTION:
During disk group import, two configuration copies may be found. Volume Manager uses the latest configuration copy and prints a message to indicate this scenario. Due to a wrong log level, this message was printed in the error category.

RESOLUTION:
Code changes have been made to suppress this harmless message.

* 3971877 (Tracking ID: 3922159)

SYMPTOM:
Thin reclamation may fail on XtremIO SSD disks with the following error:
Reclaiming storage on:
Disk <disk_name> : Failed. Failed to reclaim <directory_name>.

DESCRIPTION:
VxVM (Veritas Volume Manager) uses the thin-reclamation method to reclaim
space on XtremIO SSD disks. A few SSD arrays use the TRIM method for
reclamation instead of thin-reclamation.
A condition in the code which checks whether TRIM is supported or not was
incorrect, and it was leading to reclaim failure on XtremIO disks.

RESOLUTION:
Corrected the condition in the code which checks whether the TRIM method is
supported for reclamation.

* 3971878 (Tracking ID: 3922529)

SYMPTOM:
VxVM (Veritas Volume Manager) creates some required files under /tmp
and /var/tmp directories.

DESCRIPTION:
During creation of VxVM (Veritas Volume Manager) rpm package, some files are
created under /usr/lib/vxvm/voladm.d/lib/vxkmiplibs/ directory.

The non-root users have access to these folders, and they may accidentally modify,
move or delete those files.

RESOLUTION:
This hotfix addresses the issue by assigning proper permissions to the
directory during creation of the rpm.

* 3972246 (Tracking ID: 3946350)

SYMPTOM:
kmalloc-1024 and kmalloc-2048 memory consumption keeps increasing when the
Veritas Volume Replicator (VVR) IO size is more than 256K.

DESCRIPTION:
In the case of VVR, if the I/O size is more than 256K, the IO is broken into
child IOs. Due to a code defect, the allocated space does not get freed when
the split IOs are completed.

RESOLUTION:
The code is modified to free the VxVM-allocated memory after the split IOs are completed.

* 3972911 (Tracking ID: 3918408)

SYMPTOM:
Data corruption when volume grow is attempted on thin reclaimable disks whose space is just freed.

DESCRIPTION:
When space in the volume is freed by deleting some data or subdisks, the corresponding subdisks are marked for
reclamation. It might take some time for the periodic reclaim task to start if it is not issued manually. In the meantime, if
the same disks are used for growing another volume, the reclaim task can go ahead and overwrite the data
written on the new volume. Because of this race condition between the reclaim and volume grow operations, data corruption
occurs.

RESOLUTION:
Code changes are done to handle the race condition between the reclaim and volume grow operations. Also, reclaim is skipped for
those disks which have already become part of a new volume.

* 3973009 (Tracking ID: 3965962)

SYMPTOM:
The following process is seen in the "ps -ef" output:
/bin/sh - /usr/sbin/auto_recover

DESCRIPTION:
When there are plexes in the configuration which need recovery, events such as
master initialization and master takeover trigger auto recovery at the time of slave join.

RESOLUTION:
Code changes are done to allow admins to prevent auto recovery at the time of slave join.

* 3973012 (Tracking ID: 3921572)

SYMPTOM:
The vxconfigd dumps core when an array is disconnected.

DESCRIPTION:
In a configuration where a Disk Group has disks from more than one array,
vxconfigd dumps core when an array is disconnected, followed by a command
which attempts to get the details of all the disks of the Disk Group.

Once the array is disconnected, vxconfigd removes all the Disk Access (DA)
records. While servicing the command which needs the details of the disks in
the DG, vxconfigd goes through the DA list. The code which services the
command has a defect causing the core.

RESOLUTION:
The code is rectified to ignore the NULL records to avoid the core.

* 3973020 (Tracking ID: 3754715)

SYMPTOM:
When dmp_native_support is enabled and kdump is triggered, the system hangs
while collecting the crash dump.

DESCRIPTION:
When kdump is triggered with native support enabled, the issue occurs when
booting into the kdump kernel using the kdump initrd. Since the kdump kernel
has limited memory, loading the VxVM modules into the kdump kernel causes the
system to hang because of memory allocation failure.

RESOLUTION:
VxVM modules are added to the blacklist in the kdump.conf file, which prevents
them from loading when kdump is triggered.
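
As an illustrative sketch only (the exact directive name and module list may vary by RHEL release; treat this as an assumption, not the shipped change), the kdump.conf entry resembles:
blacklist vxdmp vxio vxspec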

* 3973021 (Tracking ID: 3927439)

SYMPTOM:
The VxVM (Veritas Volume Manager) vxassist relayout command does not honor the
read policy. The read policy is forcibly reset to SELECT when the relayout
finishes, regardless of the volume's read policy before the relayout operation.

DESCRIPTION:
There are some places in the vxassist relayout code where the SELECT policy is hard-coded.

RESOLUTION:
Code changes have been made to inherit volume's read policy before relayout
operation.

* 3973022 (Tracking ID: 3873123)

SYMPTOM:
When remote disk on node is EFI disk, vold enable fails.
And following message get logged, and eventually causing the vxconfigd to go 
into disabled state:
Kernel and on-disk configurations don't match; transactions are disabled.

DESCRIPTION:
This is because one of the cases of an EFI remote disk is not properly handled
in the disk recovery part when vxconfigd is enabled.

RESOLUTION:
Code changes have been done to set the EFI flag on the darec in the recovery code.

* 3973077 (Tracking ID: 3945115)

SYMPTOM:
VxVM vxassist relayout command fails for volumes with RAID layout with the 
following message:
VxVM vxassist ERROR V-5-1-2344 Cannot update volume <vol-name>
VxVM vxassist ERROR V-5-1-4037 Relayout operation aborted. (7)

DESCRIPTION:
During the relayout operation, the target volume inherits the attributes from the
original volume. One of those attributes is the read policy. If the layout
of the original volume is RAID, it will set the RAID read policy. The RAID read
policy expects the target volume to have the appropriate log required for the
RAID policy. Since the target volume is of a different layout, it does not have
the log present, and hence the relayout operation fails.

RESOLUTION:
Code changes have been made to set the read policy to SELECT for target 
volumes rather than inheriting it from original volume in case original volume 
is of RAID layout.

* 3973079 (Tracking ID: 3950199)

SYMPTOM:
System may panic with the following stack during DMP (Dynamic Multi-Pathing)
path restoration:

#0 [ffff880c65ea73e0] machine_kexec at ffffffff8103fd6b
 #1 [ffff880c65ea7440] crash_kexec at ffffffff810d1f02
 #2 [ffff880c65ea7510] oops_end at ffffffff8154f070
 #3 [ffff880c65ea7540] no_context at ffffffff8105186b
 #4 [ffff880c65ea7590] __bad_area_nosemaphore at ffffffff81051af5
 #5 [ffff880c65ea75e0] bad_area at ffffffff81051c1e
 #6 [ffff880c65ea7610] __do_page_fault at ffffffff81052443
 #7 [ffff880c65ea7730] do_page_fault at ffffffff81550ffe
 #8 [ffff880c65ea7760] page_fault at ffffffff8154e2f5
    [exception RIP: _spin_lock_irqsave+31]
    RIP: ffffffff8154dccf  RSP: ffff880c65ea7818  RFLAGS: 00210046
    RAX: 0000000000010000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000200246  RSI: 0000000000000040  RDI: 00000000000000e8
    RBP: ffff880c65ea7818   R8: 0000000000000000   R9: ffff8824214ddd00
    R10: 0000000000000002  R11: 0000000000000000  R12: ffff88302d2ce400
    R13: 0000000000000000  R14: ffff880c65ea79b0  R15: ffff880c65ea79b7
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880c65ea7820] dmp_open_path at ffffffffa07be2c5 [vxdmp]
#10 [ffff880c65ea7980] dmp_restore_node at ffffffffa07f315e [vxdmp]
#11 [ffff880c65ea7b00] dmp_revive_paths at ffffffffa07ccee3 [vxdmp]
#12 [ffff880c65ea7b40] gendmpopen at ffffffffa07cbc85 [vxdmp]
#13 [ffff880c65ea7c10] dmpopen at ffffffffa07cc51d [vxdmp]
#14 [ffff880c65ea7c20] dmp_open at ffffffffa07f057b [vxdmp]
#15 [ffff880c65ea7c50] __blkdev_get at ffffffff811d7f7e
#16 [ffff880c65ea7cb0] blkdev_get at ffffffff811d82a0
#17 [ffff880c65ea7cc0] blkdev_open at ffffffff811d8321
#18 [ffff880c65ea7cf0] __dentry_open at ffffffff81196f22
#19 [ffff880c65ea7d50] nameidata_to_filp at ffffffff81197294
#20 [ffff880c65ea7d70] do_filp_open at ffffffff811ad180
#21 [ffff880c65ea7ee0] do_sys_open at ffffffff81196cc7
#22 [ffff880c65ea7f30] compat_sys_open at ffffffff811eee9a
#23 [ffff880c65ea7f40] symev_compat_open at ffffffffa0c9b08f

DESCRIPTION:
A system panic can be encountered due to a race condition. There is a
possibility that a path picked by the DMP restore daemon for processing
may be deleted before the restoration process is complete. Hence, when the
restore daemon tries to access the path properties, it leads to a system panic
as the path properties are already freed.

RESOLUTION:
Code changes are done to handle the race condition.

* 3973080 (Tracking ID: 3936535)

SYMPTOM:
The IO performance is poor due to frequent cache drops on a system with
snapshots configured.

DESCRIPTION:
On a system with VxVM snapshots configured, DCO map updates happen along with
the ongoing IO and can allocate lots of chunks of page memory, which triggers
kswapd to swap the cache memory out, so cache drops are seen.

RESOLUTION:
Code changes are done to allocate large memory for DCO map updates without
triggering memory swap-out.

* 3973405 (Tracking ID: 3931048)

SYMPTOM:
A few VxVM log files, listed below, are created with write permission for all
users, which might lead to security issues.

/etc/vx/log/vxloggerd.log
/var/adm/vx/logger.txt
/var/adm/vx/kmsg.log

DESCRIPTION:
The log files are created with write permissions for all users, which is a
security hole.
The files are created with the default rw-rw-rw- (666) permission because the
umask is set to 0 while creating these files.

RESOLUTION:
Changed the umask to 022 while creating these files and fixed an incorrect open
system call. The log files now have rw-r--r-- (644) permissions.
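
The fix can be verified by checking the resulting permissions on the files listed in the symptom above; each should now show rw-r--r-- (644):
# ls -l /etc/vx/log/vxloggerd.log /var/adm/vx/logger.txt /var/adm/vx/kmsg.log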

* 3974326 (Tracking ID: 3948140)

SYMPTOM:
System may panic if RTPG data returned by the array is greater than 255 with
below stack:

dmp_alua_get_owner_state()
dmp_alua_get_path_state()
dmp_get_path_state()
dmp_check_path_state()
dmp_restore_callback()
dmp_process_scsireq()
dmp_daemons_loop()

DESCRIPTION:
The size of the buffer given to the RTPG SCSI command is currently 255 bytes,
but the size of the data returned by the underlying array for RTPG can be
greater than 255 bytes. As a result, incomplete data is retrieved (only the
first 255 bytes), and when trying to read the RTPG data, it causes invalid
access of memory, resulting in an error while claiming the devices. This
invalid access of memory may lead to a system panic.

RESOLUTION:
The RTPG buffer size has been increased to 1024 bytes for handling this.

* 3974334 (Tracking ID: 3899568)

SYMPTOM:
"vxdmpadm iostat stop" as per design cannot stop the iostat gathering
persistently. To avoid Performance & Memory crunch related issues, it is
generally recommended to stop the iostat gathering.There is a requirement
to provide such ability to stop/start the iostat gathering persistently
in those cases.

DESCRIPTION:
Today the DMP iostat daemon is stopped using "vxdmpadm iostat stop", but this
is not a persistent setting. After a reboot the setting is lost, so the
customer would also have to put the command in init scripts at an appropriate
place for a persistent effect.

RESOLUTION:
The code is modified to provide a tunable "dmp_compute_iostats" which can
start/stop the iostat gathering persistently.

Notes:
Use the following command to start or stop the iostat gathering persistently:
# vxdmpadm settune dmp_compute_iostats=on
# vxdmpadm settune dmp_compute_iostats=off
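
The current value can be confirmed with the gettune counterpart (assuming this tunable is queried like other DMP tunables):
# vxdmpadm gettune dmp_compute_iostats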

Patch ID: VRTSvxvm-7.2.0.400

* 3949189 (Tracking ID: 3938549)

SYMPTOM:
The command 'vxassist -g <dg> make vol <size>' fails with error:
Unexpected kernel error in configuration update, on RHEL 7.5.

DESCRIPTION:
Due to changes in the RHEL 7.5 source code, the vxassist make volume command
failed to create the volume and returned the error "Unexpected kernel error in
configuration update".

RESOLUTION:
Changes are done in the VxVM code to solve the issue for volume creation.

Patch ID: VRTSvxvm-7.2.0.300

* 3913119 (Tracking ID: 3902769)

SYMPTOM:
While running vxstat in CVM (Clustered Volume Manager) environment, system may
panic with following stack:

machine_kexec
__crash_kexec
crash_kexec
oops_end
no_context
do_page_fault
page_fault
   [exception RIP: vol_cvm_io_stats_common+143]                        
__wake_up_common
__wake_up_sync_key
unix_destruct_scm
__alloc_pages_nodemask
volinfo_ioctl
volsioctl_real
vols_ioctl at
vols_unlocked_ioctl
do_vfs_ioctl
sys_ioctl
entry_SYSCALL_64_fastpath

DESCRIPTION:
The system panic occurs because while fetching the IO statistics from the VxVM
(Veritas Volume Manager) kernel, an illegal address in the IO stats data is
accessed which is not yet populated.

RESOLUTION:
The code is fixed to correctly populate the address in the IO stats data before
fetching the IO stats.

* 3920932 (Tracking ID: 3915523)

SYMPTOM:
A local disk from another node belonging to a private DG is exported to the
node when a private DG is imported on the current node.

DESCRIPTION:
When we try to import a DG, all the disks belonging to the DG are automatically
exported to the current node so as to make sure that the DG gets imported. This
is done to have the same behaviour as SAN with local disks as well. Since we
export all disks in the DG, disks which belong to a DG with the same name but a
different private DG on another node get exported to the current node as well.
This leads to the wrong disk getting selected while the DG gets imported.

RESOLUTION:
Instead of the DG name, the DGID (disk group ID) is used to decide whether a
disk needs to be exported or not.
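
For reference, the disk group name and ID on which this decision is based are both visible in the disk details (the disk name is a placeholder; the 'group:' line of the output shows the name= and id= fields):
# vxdisk list <disk_name>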

* 3925379 (Tracking ID: 3919902)

SYMPTOM:
The VxVM (Veritas Volume Manager) vxdmpadm iopolicy switch command may not work.
When the issue happens, vxdmpadm setattr iopolicy finishes without any error,
but a subsequent vxdmpadm getattr command shows that the iopolicy is not
correctly updated:
# vxdmpadm getattr enclosure emc-vplex0 iopolicy
ENCLR_NAME     DEFAULT        CURRENT
============================================
emc-vplex0     Balanced       Balanced
# vxdmpadm setattr arraytype VPLEX-A/A iopolicy=minimumq
# vxdmpadm getattr enclosure emc-vplex0 iopolicy
ENCLR_NAME     DEFAULT        CURRENT
============================================
emc-vplex0     Balanced       Balanced

Also, standby paths are not honored by some iopolicies (for example, the
balanced iopolicy). Read/write IOs against standby paths are seen in the
vxdmpadm iostat output.

DESCRIPTION:
The array's iopolicy field becomes stale when the vxdmpadm setattr arraytype
iopolicy command is used; hence, when the iopolicy is later set back to the
stale value, it does not actually take effect. Also, when paths are evaluated
for issuing IOs, the standby flag is not taken into consideration, so standby
paths are used for read/write IOs.

RESOLUTION:
Code changes have been done to address these issues.

* 3926224 (Tracking ID: 3925400)

SYMPTOM:
lsmod is not showing the required APM modules loaded.

DESCRIPTION:
To support the RHEL7.4 update, the dmp module is recompiled with the latest
RHEL7.4 kernel version. During post-install of the package, the APM modules
fail to load due to a mismatch between the dmp and additional APM module kernel
versions.

RESOLUTION:
ASLAPM package is recompiled with RHEL7.4 kernel.

* 3926301 (Tracking ID: 3925398)

SYMPTOM:
VxVM modules failed to load on RHEL7.4.

DESCRIPTION:
Since RHEL7.4 is a new release, the VxVM module failed to load on it.

RESOLUTION:
The VxVM has been re-compiled with RHEL 7.4 build environment.

Patch ID: VRTSvxvm-7.2.0.100

* 3909992 (Tracking ID: 3898069)

SYMPTOM:
System panic may happen in dmp_process_stats routine with the following stack:

dmp_process_stats+0x471/0x7b0 
dmp_daemons_loop+0x247/0x550 
kthread+0xb4/0xc0
ret_from_fork+0x58/0x90

DESCRIPTION:
When aggregating the pending IOs per DMP path over all CPUs, an out-of-bounds
access happened due to a wrong index into the statistics table, which could
cause a system panic.

RESOLUTION:
Code changes have been done to correct the wrong index.

* 3910000 (Tracking ID: 3893756)

SYMPTOM:
Under certain circumstances, after vxconfigd has been running for a long time, a task might be left dangling in the system, which may be seen by issuing 'vxtask -l list'.

DESCRIPTION:
- voltask_dump() gets a task id by calling 'vol_task_dump' in the kernel (ioctl) as the minor number of the taskdev.
- The task id (or minor number) increases by 1 when a new task is registered.
- Task ids start from 160 and wrap around when they reach 65536; a global counter 'vxtask_next_minor' indicates the next task id.
- When vxconfigd opens a taskdev by calling voltask_dump() and holds it, it gets a task id too (say 165). From then on, a
  vnode with this minor number (major=273, minor=165) exists in the kernel.
- As time goes by, the task id increases and reaches 65536; it then wraps around and starts from 160 again.
- When the task id passes 165 again with a CLI command (say 'vxdisk -othin, fssize list'), its taskdev gets the same major and minor
  number (165) as vxconfigd's.
- At the same time, vxconfigd is still holding this vnode. vxdisk does not know this and opens the taskdev, registering a task structure in
  the kernel hash table; this adds a reference to the same vnode which vxconfigd is holding, so the reference count of the common snode becomes 2.
- When vxdisk (fsusage_collect_stats_task) has done its job, it calls voltask_complete->close()->spec_close(), trying to remove this task
  (165). But the OS function spec_close() (from specfs) gets in the way: it checks the reference count of the common snode (vnode->v_data-
  >snode->s_commonvp->v_data->common snode). spec_close() finds that the value of s_count is 2, so it only drops the reference by one
  and returns success to the caller, without calling the actual closing function 'volsclose()'.
- Since volsclose() is not called by spec_close(), its subsequent functions are not called either: volsclose_real()->voltask_close()
  ->vxtask_rm_task(); among those, vxtask_rm_task() does the actual job of removing a task from the kernel hash table.
- After calling close(), fsusage_collect_stats_task returns, and the vxdisk command exits. From this point on, the task is dangling in the
  kernel hash table until vxconfigd exits.

RESOLUTION:
The source is changed to avoid vxconfigd holding the task device.

* 3910426 (Tracking ID: 3868533)

SYMPTOM:
IO hang happens when starting replication. The VXIO daemon hangs with a stack
like the following:

vx_cfs_getemap at ffffffffa035e159 [vxfs]
vx_get_freeexts_ioctl at ffffffffa0361972 [vxfs]
vxportalunlockedkioctl at ffffffffa06ed5ab [vxportal]
vxportalkioctl at ffffffffa06ed66d [vxportal]
vol_ru_start at ffffffffa0b72366 [vxio]
voliod_iohandle at ffffffffa09f0d8d [vxio]
voliod_loop at ffffffffa09f0fe9 [vxio]

DESCRIPTION:
While performing DCM replay when the Smart Move feature is enabled, the VxIO
kernel needs to issue an IOCTL to the VxFS kernel to get the file system free
region. The VxFS kernel needs to clone the map by issuing IO to the VxIO kernel
to complete this IOCTL. Just at that time an RLINK disconnection happened, so
the RV is serialized to complete the disconnection. As the RV is serialized,
all IOs including the clone map IO from VxFS are queued to rv_restartq, hence
the deadlock.

RESOLUTION:
Code changes have been made to handle the deadlock situation.

* 3910586 (Tracking ID: 3852146)

SYMPTOM:
In a CVM cluster, when importing a shared diskgroup specifying both -c and -o
noreonline options, the following error may be returned: 
VxVM vxdg ERROR V-5-1-10978 Disk group <dgname>: import failed: Disk for disk
group not found.

DESCRIPTION:
The -c option will update the disk ID and disk group ID on the private region
of the disks in the disk group being imported. Such updated information is not
yet seen by the slave because the disks have not been re-onlined (given that
noreonline option is specified). As a result, the slave cannot identify the
disk(s) based on the updated information sent from the master, causing the
import to fail with the error Disk for disk group not found.

RESOLUTION:
The code is modified to handle the working of the "-c" and "-o noreonline"
options together.

* 3910588 (Tracking ID: 3868154)

SYMPTOM:
When DMP Native Support is set to ON, and if a dmpnode has multiple VGs,
'vxdmpadm native ls' shows incorrect VG entries for dmpnodes.

DESCRIPTION:
When DMP Native Support is set to ON, multiple VGs can be created on a disk, as
Linux supports creating a VG on a whole disk as well as on a partition of
a disk. This possibility was not handled in the code, hence the display of
'vxdmpadm native ls' was incorrect.

RESOLUTION:
The code now handles the situation of multiple VGs on a single disk.

* 3910590 (Tracking ID: 3878030)

SYMPTOM:
Enhance VxVM(Veritas Volume Manager) DR(Dynamic Reconfiguration) tool to 
clean up OS and VxDMP(Veritas Dynamic Multi-Pathing) device trees without 
user interaction.

DESCRIPTION:
When users add or remove LUNs, stale entries in the OS or VxDMP device trees can
prevent VxVM from discovering the changed LUNs correctly. Under certain
conditions it even causes the VxVM vxconfigd process to dump core, and users
have to reboot the system to restart vxconfigd.
VxVM has a DR tool to help users add or remove LUNs properly, but it requires
user input during operations.

RESOLUTION:
Enhancement has been done to VxVM DR tool. It accepts '-o refresh' option to 
clean up OS and VxDMP device trees without user interaction.

* 3910591 (Tracking ID: 3867236)

SYMPTOM:
Application IO hang happens after issuing Master Pause command.

DESCRIPTION:
The flag VOL_RIFLAG_REQUEST_PENDING in the VVR (Veritas Volume Replicator)
kernel is not cleared because of a race between the Master Pause SIO and the
RVWRITE1 SIO, resulting in the RU (Replication Update) SIO failing to proceed,
thereby causing the IO hang.

RESOLUTION:
Code changes have been made to handle the race condition.

* 3910592 (Tracking ID: 3864063)

SYMPTOM:
Application I/O hangs after the Master Pause command is issued.

DESCRIPTION:
Some flags (VOL_RIFLAG_DISCONNECTING or VOL_RIFLAG_REQUEST_PENDING) in the VVR
(Veritas Volume Replicator) kernel are not cleared because of a race between
the Master Pause SIO and the Error Handler SIO, which prevents the RU
(Replication Update) SIO from proceeding and thereby causes the I/O hang.

RESOLUTION:
Code changes have been made to handle the race condition.

* 3910593 (Tracking ID: 3879324)

SYMPTOM:
The VxVM (Veritas Volume Manager) DR (Dynamic Reconfiguration) tool fails to
handle the busy device problem while LUNs are removed from the OS.

DESCRIPTION:
OS devices may still be busy after they are removed from the OS. This causes the
'luxadm -e offline <disk>' operation to fail and leaves stale entries in the
'vxdisk list' output, like:
emc0_65535   auto            -            -            error
emc0_65536   auto            -            -            error

RESOLUTION:
Code changes have been done to address busy devices issue.

* 3912529 (Tracking ID: 3878153)

SYMPTOM:
The VVR (Veritas Volume Replicator) 'vradmind' daemon dumps core.

DESCRIPTION:
Under certain circumstances the 'vradmind' daemon may dump core while freeing a
variable allocated on the stack.

RESOLUTION:
Code change has been done to address the issue.

* 3912532 (Tracking ID: 3853144)

SYMPTOM:
A VxVM (Veritas Volume Manager) mirror volume's stale plex is incorrectly
marked as "Enable Active" after it comes back, which prevents resync of the
stale plex from up-to-date ones. It can cause data corruption if the stale plex
happens to be the preferred or selected plex, or if the read policy "round" is
set for the volume.

DESCRIPTION:
When a volume plex is detached abruptly while vxconfigd is unavailable, VxVM
kernel logging records the detach activity along with its detach transaction ID
for future resync or recovery. Because of a code defect, the detach transaction
ID could be wrongly selected in certain situations.

RESOLUTION:
Code changes have been done to correctly select the detach transaction id.

* 3915963 (Tracking ID: 3907034)

SYMPTOM:
The mediatype is not shown as 'ssd' in the 'vxdisk -e list' output for SSD
(solid state drive) devices.

DESCRIPTION:
Some SSD devices do not have an ASL (Array Support Library) to claim them and
are claimed as JBOD (Just a Bunch Of Disks). Since there is no ASL in this
case, device attributes such as mediatype are not known. This is why mediatype
is not shown in the 'vxdisk -e list' output.

RESOLUTION:
The code now detects the mediatype by checking the value stored in
/sys/block/<device>/queue/rotational, which indicates whether or not the
device is an SSD.
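
As an illustration, a minimal user-space sketch of the same check is shown
below; the device name and error handling are illustrative, not VxVM source:

#include <stdio.h>

/* Return 1 if the kernel reports the device as non-rotational (SSD). */
int is_ssd(const char *dev)
{
    char path[256];
    int rotational = 1;   /* assume rotational if the attribute is unreadable */
    FILE *fp;

    snprintf(path, sizeof(path), "/sys/block/%s/queue/rotational", dev);
    fp = fopen(path, "r");
    if (fp != NULL) {
        if (fscanf(fp, "%d", &rotational) != 1)
            rotational = 1;
        fclose(fp);
    }
    return rotational == 0;   /* sysfs reports 0 for non-rotational devices */
}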

Patch ID: VRTSaslapm-7.2.0.5100

* 3968928 (Tracking ID: 3967122)

SYMPTOM:
Retpoline support for ASLAPM on RHEL 7.6 kernels.

DESCRIPTION:
RHEL7.6 is a new release and it has a Retpoline kernel. The APM module
must be recompiled with a Retpoline-aware GCC to support the Retpoline
kernel.

RESOLUTION:
Compiled the APM module with a Retpoline-aware GCC.

Patch ID: VRTSgms-7.2.0.3100

* 3974569 (Tracking ID: 3932849)

SYMPTOM:
Temporary files are being created in /tmp.

DESCRIPTION:
The GMS component is creating temporary files in /tmp.

RESOLUTION:
Added code to redirect temporary files to /etc/vx/tmp/.
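
A minimal sketch of the approach; the 'gms' file-name prefix is illustrative,
not the actual GMS source:

#include <stdlib.h>
#include <unistd.h>

/* Create a temporary file under the private directory instead of /tmp. */
int make_private_tmpfile(void)
{
    char tmpl[] = "/etc/vx/tmp/gmsXXXXXX";   /* previously /tmp/... */
    int fd = mkstemp(tmpl);                  /* creates the file mode 0600 */
    if (fd >= 0)
        unlink(tmpl);   /* keep the descriptor, drop the name */
    return fd;
}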

Patch ID: VRTSgms-7.2.0.200

* 3949177 (Tracking ID: 3947812)

SYMPTOM:
GMS support for RHEL7.5

DESCRIPTION:
RHEL7.5 is a new release and it has a Retpoline kernel, so the GMS
module must be recompiled with a Retpoline-aware GCC.

RESOLUTION:
Compiled GMS with a Retpoline-aware GCC for RHEL7.5 support.

Patch ID: VRTSveki-7.2.0.200

* 3969307 (Tracking ID: 3955519)

SYMPTOM:
The VRTSvxvm upgrade fails because the VRTSveki upgrade fails when the 'yum upgrade' command is used.

DESCRIPTION:
When yum upgrades the packages, dependent packages are updated first, so VRTSveki is upgraded before VRTSvxvm. The VRTSveki upgrade fails because the veki module cannot be unloaded while the other vxvm/vxfs modules are still loaded. Since the upgrade fails, the link under the directory /lib/modules/*/veritas/veki is not created, and the VRTSvxvm upgrade then fails because the vxdmp module cannot be unloaded.

RESOLUTION:
Code changes have been made to create the link in the directory even if the module unload does not succeed.

* 3969837 (Tracking ID: 3923934)

SYMPTOM:
VxFS automounts fail after reboot.

DESCRIPTION:
The LSB header is missing from the veki init.d script, which causes the systemd
service generator to create incorrect dependencies. Because of this, veki was
coming up later than VxVM, after which VxFS was loading. Hence automounts after
reboot were failing on SLES12 SP2.

RESOLUTION:
The dependencies of veki are now clearly defined in the script so that
automounts work correctly after reboot.
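
For reference, an LSB header of the kind the fix adds looks like the following;
the exact dependency names used by the veki script are illustrative:

### BEGIN INIT INFO
# Provides:          veki
# Required-Start:    $local_fs
# Required-Stop:     $local_fs
# Default-Start:     2 3 5
# Default-Stop:      0 1 6
# Description:       Load the Veritas kernel interface (veki) module
### END INIT INFO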

* 3970414 (Tracking ID: 3967265)

SYMPTOM:
RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported

DESCRIPTION:
Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server
kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x kernels is now introduced.

Patch ID: VRTSveki-7.2.0.100

* 3950989 (Tracking ID: 3944179)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 7 
Update 5(RHEL7.5).

DESCRIPTION:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 
versions later than RHEL7 Update 4.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 7 Update 
5(RHEL7.5) is now introduced.

Patch ID: VRTSdbac-7.2.0.5100

* 3971163 (Tracking ID: 3967265)

SYMPTOM:
RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported

DESCRIPTION:
Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server
kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x kernels is now introduced.

Patch ID: VRTSdbac-7.2.0.300

* 3950832 (Tracking ID: 3944179)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 7 
Update 5(RHEL7.5).

DESCRIPTION:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 
versions later than RHEL7 Update 4.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 7 Update 
5(RHEL7.5) is now introduced.

Patch ID: VRTSdbac-7.2.0.200

* 3928824 (Tracking ID: 3925832)

SYMPTOM:
vcsmm module does not load with RHEL7.4

DESCRIPTION:
Since RHEL7.4 is a new release, the vcsmm module failed to load
on it.

RESOLUTION:
The VRTSdbac package is re-compiled with the RHEL7.4 kernel
(3.10.0-693.el7.x86_64) in the build environment to mitigate the failure.

Patch ID: VRTSamf-7.2.0.5100

* 3971162 (Tracking ID: 3967265)

SYMPTOM:
RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported

DESCRIPTION:
Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server
kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x kernels is now introduced.

Patch ID: VRTSamf-7.2.0.300

* 3950831 (Tracking ID: 3944179)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 7 
Update 5(RHEL7.5).

DESCRIPTION:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 
versions later than RHEL7 Update 4.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 7 Update 
5(RHEL7.5) is now introduced.

Patch ID: VRTSamf-7.2.0.200

* 3927030 (Tracking ID: 3923100)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 7 
Update 4(RHEL7.4).

DESCRIPTION:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 
versions later than RHEL7 Update 3.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 7 Update 
4(RHEL7.4) is now introduced.

Patch ID: VRTSamf-7.2.0.100

* 3908392 (Tracking ID: 3896877)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 7 
Update 3(RHEL7.3).

DESCRIPTION:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 
versions later than RHEL7 Update 2.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 7 Update 
3(RHEL7.3) is now introduced.

Patch ID: VRTSvxfen-7.2.0.6100

* 3971161 (Tracking ID: 3967265)

SYMPTOM:
RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported

DESCRIPTION:
Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server
kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x kernels is now introduced.

Patch ID: VRTSvxfen-7.2.0.400

* 3950830 (Tracking ID: 3944179)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 7 
Update 5(RHEL7.5).

DESCRIPTION:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 
versions later than RHEL7 Update 4.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 7 Update 
5(RHEL7.5) is now introduced.

Patch ID: VRTSvxfen-7.2.0.300

* 3927032 (Tracking ID: 3923100)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 7 
Update 4(RHEL7.4).

DESCRIPTION:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 
versions later than RHEL7 Update 3.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 7 Update 
4(RHEL7.4) is now introduced.

Patch ID: VRTSvxfen-7.2.0.200

* 3915003 (Tracking ID: 2852872)

SYMPTOM:
Veritas Fencing command "vxfenadm -d" sometimes shows "replaying" RFSM
state for some nodes in the cluster.

DESCRIPTION:
During cluster startup, fencing RFSM sometimes keeps showing the
"replaying" state for a node even though the node has in fact entered the "running" state.

RESOLUTION:
The code is modified so that now fencing does not show incorrect RFSM
state for a node.

Patch ID: VRTSgab-7.2.0.5100

* 3971157 (Tracking ID: 3967265)

SYMPTOM:
RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported

DESCRIPTION:
Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server
kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x kernels is now introduced.

Patch ID: VRTSgab-7.2.0.300

* 3950829 (Tracking ID: 3944179)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 7 
Update 5(RHEL7.5).

DESCRIPTION:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 
versions later than RHEL7 Update 4.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 7 Update 
5(RHEL7.5) is now introduced.

Patch ID: VRTSgab-7.2.0.200

* 3927031 (Tracking ID: 3923100)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 7 
Update 4(RHEL7.4).

DESCRIPTION:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 
versions later than RHEL7 Update 3.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 7 Update 
4(RHEL7.4) is now introduced.

Patch ID: VRTSgab-7.2.0.100

* 3913424 (Tracking ID: 3896877)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 7 
Update 3(RHEL7.3).

DESCRIPTION:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 
versions later than RHEL7 Update 2.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 7 Update 
3(RHEL7.3) is now introduced.

Patch ID: VRTSllt-7.2.0.7100

* 3971156 (Tracking ID: 3967265)

SYMPTOM:
RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported

DESCRIPTION:
Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server
kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x kernels is now introduced.

Patch ID: VRTSllt-7.2.0.500

* 3950828 (Tracking ID: 3944179)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 7 
Update 5(RHEL7.5).

DESCRIPTION:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 
versions later than RHEL7 Update 4.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 7 Update 
5(RHEL7.5) is now introduced.

Patch ID: VRTSllt-7.2.0.400

* 3922321 (Tracking ID: 3922320)

SYMPTOM:
Kernel panics in case of FSS with LLT over RDMA during heavy data transfer.

DESCRIPTION:
In the case of FSS using LLT over RDMA, the kernel may sometimes panic because of an
issue in the buffer advertisement logic for RDMA buffers. The case arises when the
buffer advertisement for a particular RDMA buffer reaches the sender LLT node before
the hardware ACK reaches LLT.

RESOLUTION:
LLT module is modified to fix the panic by using a different temporary queue for such buffers.

* 3926868 (Tracking ID: 3923100)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 7 
Update 4(RHEL7.4).

DESCRIPTION:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 
versions later than RHEL7 Update 3.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 7 Update 
4(RHEL7.4) is now introduced.

Patch ID: VRTSllt-7.2.0.200

* 3898155 (Tracking ID: 3896875)

SYMPTOM:
Veritas Infoscale Availability (VCS) does not support SUSE Linux Enterprise Server 
12 Service Pack 2 (SLES 12 SP2).

DESCRIPTION:
Veritas Infoscale Availability did not support SUSE Linux Enterprise Server
versions released after SLES 12 SP1.

RESOLUTION:
Veritas Infoscale Availability support for SUSE Linux Enterprise Server 12 SP2 is
now introduced.

Patch ID: VRTSllt-7.2.0.100

* 3906300 (Tracking ID: 3905430)

SYMPTOM:
Application IO hangs in case of FSS with LLT over RDMA during heavy data transfer.

DESCRIPTION:
In the case of FSS using LLT over RDMA, I/O may sometimes hang because of race
conditions in the LLT code.

RESOLUTION:
LLT module is modified to fix the race conditions arising due to heavy load with multiple 
application threads.

Patch ID: VRTSglm-7.2.0.3100

* 3974120 (Tracking ID: 3932845)

SYMPTOM:
Temporary files are being created in /tmp.

DESCRIPTION:
The GLM component is creating temporary files in /tmp.

RESOLUTION:
Added code to redirect temporary files to a common location.

Patch ID: VRTSglm-7.2.0.200

* 3949178 (Tracking ID: 3947815)

SYMPTOM:
GLM support for RHEL7.5

DESCRIPTION:
RHEL7.5 is a new release and it has a Retpoline kernel, so the GLM
module must be recompiled with a Retpoline-aware GCC.

RESOLUTION:
Compiled GLM with a Retpoline-aware GCC for RHEL7.5 support.

Patch ID: VRTSodm-7.2.0.4200

* 3969468 (Tracking ID: 3958865)

SYMPTOM:
ODM module failed to load on RHEL7.6.

DESCRIPTION:
Since RHEL7.6 is a new release, the ODM module failed to load
on it.

RESOLUTION:
Added ODM support for RHEL7.6.

* 3973530 (Tracking ID: 3932804)

SYMPTOM:
Temporary files are being created in /tmp.

DESCRIPTION:
The ODM component is creating temporary files in /tmp.

RESOLUTION:
Added code to redirect temporary files to a common location.

* 3973769 (Tracking ID: 3897161)

SYMPTOM:
An Oracle database on a Veritas file system with the Veritas ODM library shows
high 'log file sync' wait times.

DESCRIPTION:
The ODM_IOP lock is not held for long, so instead of attempting a trylock
and deferring the I/O when the trylock fails, it is better to take the
non-trylock lock and finish the I/O in the interrupt context. This is safe on
Solaris because this "sleep" lock is actually an adaptive mutex.

RESOLUTION:
odm_iodone now calls ODM_IOP_LOCK() instead of ODM_IOP_TRYLOCK() and finishes
the I/O. With this fix, no I/O is deferred.

Patch ID: VRTSodm-7.2.0.300

* 3949176 (Tracking ID: 3938546)

SYMPTOM:
ODM module failed to load on RHEL7.5.

DESCRIPTION:
Since RHEL7.5 is a new release, the ODM module failed to load
on it.

RESOLUTION:
Added ODM support for RHEL7.5.

Patch ID: VRTSodm-7.2.0.200

* 3926163 (Tracking ID: 3923310)

SYMPTOM:
ODM module failed to load on RHEL7.4.

DESCRIPTION:
Since RHEL7.4 is a new release, the ODM module failed to load
on it.

RESOLUTION:
Added ODM support for RHEL7.4.

Patch ID: VRTSodm-7.2.0.100

* 3910095 (Tracking ID: 3757609)

SYMPTOM:
High CPU usage because of contention over ODM_IO_LOCK

DESCRIPTION:
While performing ODM I/O, the ODM_IO_LOCK is taken to update some of the ODM
counters, which leads to contention among multiple iodones trying to update
these counters at the same time. This results in high CPU usage.

RESOLUTION:
Code modified to remove the lock contention.

* 3911964 (Tracking ID: 3907933)

SYMPTOM:
ODM module failed to load on SLES12 SP2.

DESCRIPTION:
Since SLES12 SP2 is a new release, the ODM module failed to load
on it.

RESOLUTION:
Added ODM support for SLES12 SP2.

Patch ID: VRTSvxfs-7.2.0.4200

* 3969467 (Tracking ID: 3958853)

SYMPTOM:
VxFS module failed to load on RHEL7.6.

DESCRIPTION:
Since RHEL7.6 is a new release, the VxFS module failed to load
on it.

RESOLUTION:
Added VxFS support for RHEL7.6.

* 3972087 (Tracking ID: 3914782)

SYMPTOM:
VxFS 7.1 performance drops: many VxFS threads consume excessive CPU, and there
are many vx_worklist_* threads even when there is no activity on the file
system at all.

DESCRIPTION:
In VxFS 7.1, when the NAIO load is high, vx_naio_load_check work items are added
to the pending work list. These work items put themselves back on the list when
processed, so the total count of such items keeps increasing, which eventually
results in a high count of worker threads.

RESOLUTION:
Modifications have been made to ensure that the added items are not put back on
the work list.

* 3972347 (Tracking ID: 3938256)

SYMPTOM:
When the file size is checked through seek_hole, an incorrect offset/size is
returned if delayed allocation is enabled on the file.

DESCRIPTION:
From recent RHEL7 versions onwards, the grep command uses the seek_hole feature to
check the current file size and then reads data based on that size. In VxFS, when dalloc
is enabled, the extent is allocated to the file later, but the file size is incremented as
soon as the write completes. When checking the file size through seek_hole, VxFS did not
fully consider the dalloc case and returned a stale size, based on the extents allocated
to the file, instead of the actual file size, which resulted in reading less data than
expected.

RESOLUTION:
The code is modified so that VxFS now returns the correct size when dalloc is
enabled on a file and seek_hole is called on that file.
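
A minimal sketch of how an application probes the file with seek_hole (standard
lseek(2) usage; the file name argument is illustrative):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0)
        return 1;
    off_t hole = lseek(fd, 0, SEEK_HOLE);  /* offset of the first hole */
    off_t size = lseek(fd, 0, SEEK_END);   /* logical file size */
    printf("first hole at %lld, file size %lld\n",
           (long long)hole, (long long)size);
    close(fd);
    return 0;
}

With the fix, a fully written dalloc file reports its first hole at the true
end of file rather than at a stale extent boundary.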

* 3972351 (Tracking ID: 3928046)

SYMPTOM:
VxFS panics with a stack like the one below due to a misaligned memory address:
void vxfs:vx_assemble_hdoffset+0x18
void vxfs:vx_assemble_opts+0x8c
void vxfs:vx_assemble_rwdata+0xf4
void vxfs:vx_gather_rwdata+0x58
void vxfs:vx_rwlock_putdata+0x2f8
void vxfs:vx_glm_cbfunc+0xe4
void vxfs:vx_glmlist_thread+0x164
unix:thread_start+4

DESCRIPTION:
The panic happened while copying piggyback data from the inode to the data
buffer for the rwlock under revoke processing. After some data had been copied,
the buffer position reached an address that is only 32-bit aligned, but the
value accessed there (the large directory freespace offset) is defined as a
64-bit data type. Accessing it directly at that address caused the system panic
for a misaligned memory address.

RESOLUTION:
The code is changed to copy the data at the 32-bit-aligned address through
bcopy() rather than accessing it directly.
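
The general pattern of the fix, sketched below, is standard C;
vx_assemble_hdoffset() itself is VxFS-internal:

#include <stdint.h>
#include <string.h>

/* Read a 64-bit value from a possibly 32-bit-aligned address. */
uint64_t read_u64_unaligned(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));   /* safe replacement for *(uint64_t *)p */
    return v;
}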

* 3972360 (Tracking ID: 3943529)

SYMPTOM:
The system panicked because the watchdog timer detected a hard lockup on a CPU
while trying to release a dentry.

DESCRIPTION:
When purging dentries, there is a possible race with the iget thread, which can lead
to corrupted vnode flags. Because of these corrupted flags, VxFS tries to purge the
dentry again and gets stuck on the vnode lock, which was taken in the current thread
context, leading to a deadlock/soft lockup.

RESOLUTION:
The code is modified to protect the vnode flags with the vnode lock.

* 3972466 (Tracking ID: 3926972)

SYMPTOM:
Once a node reboots or goes out of the cluster, the whole cluster can hang.

DESCRIPTION:
This is a three-way deadlock, in which a glock grant can block recovery while trying to
cache the grant against an inode. When the grant tries to take the ilock, if that lock
is held by an hlock revoke that is itself waiting for a GLM lock (in this case the cbuf
lock), the revoke cannot get it because a recovery is in progress, and the recovery
cannot proceed because the glock grant thread has blocked it.

Hence the whole cluster hangs.

RESOLUTION:
The fix is to avoid taking the ilock in GLM context if it is not available.

* 3972563 (Tracking ID: 3922986)

SYMPTOM:
The system panics because the Linux NMI watchdog detected a lockup in CFS.

DESCRIPTION:
The VxFS buffer cache iodone routine interrupted the inode flush thread, which
was trying to acquire the CFS buffer hash lock while releasing a CFS buffer.
The iodone routine was in turn blocked by other threads on the free list lock,
and in this cycle those threads were contending with the inode flush thread for
the CFS buffer hash lock. On Linux, a spinlock is a FIFO ticket lock, so once
the inode flush thread had set its ticket on the spinlock earlier, the other
threads could not acquire the lock. This caused a deadlock.

RESOLUTION:
Code changes are made to acquire the CFS buffer hash lock with interrupts
disabled.
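
A sketch of the Linux kernel idiom the fix applies; the lock and function
names are illustrative, not VxFS source:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(buf_hash_lock);

static void hash_update(void)
{
    unsigned long flags;

    /* The irqsave variant keeps an iodone interrupt on this CPU from
     * spinning on a lock its own interrupted thread already holds. */
    spin_lock_irqsave(&buf_hash_lock, flags);
    /* ... manipulate the buffer hash chain ... */
    spin_unlock_irqrestore(&buf_hash_lock, flags);
}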

* 3972872 (Tracking ID: 3929854)

SYMPTOM:
Event notification was not supported on a CFS mount point, so the following
errors appear in the log file.
-bash-4.1# /usr/jdk/jdk1.8.0_121/bin/java test1
myWatcher: sun.nio.fs.SolarisWatchService@70dea4e filesystem provider is : sun.nio.fs.SolarisFileSystemProvider@5c647e05
java.nio.file.FileSystemException: /mnt1: Operation not supported
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
        at sun.nio.fs.UnixException.asIOException(UnixException.java:111)
        at sun.nio.fs.SolarisWatchService$Poller.implRegister(SolarisWatchService.java:311)
        at sun.nio.fs.AbstractPoller.processRequests(AbstractPoller.java:260)
        at sun.nio.fs.SolarisWatchService$Poller.processEvent(SolarisWatchService.java:425)
        at sun.nio.fs.SolarisWatchService$Poller.run(SolarisWatchService.java:397)
        at java.lang.Thread.run(Thread.java:745)

DESCRIPTION:
The WebLogic watch service was failing to register with a CFS mount point
directory, which resulted in "/mnt1: Operation not supported" on the CFS mount point.

RESOLUTION:
Added a new module parameter, "vx_cfsevent_notify", to enable event notification
support on CFS. By default, vx_cfsevent_notify is disabled.

This works only in an Active-Passive scenario:

- The primary (Active) node that has set this tunable receives notifications
for the respective events on the CFS mount point directory.

- The secondary (Passive) node does not receive any notifications.
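
For reference, one standard way to set a Linux kernel module parameter like
this at module load time is via a modprobe options file; whether this exact
mechanism applies to a given VxFS installation is an assumption:

# echo "options vxfs vx_cfsevent_notify=1" > /etc/modprobe.d/vxfs.conf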

* 3972961 (Tracking ID: 3921152)

SYMPTOM:
A performance drop is observed; a core dump shows threads in vx_dalloc_flush().

DESCRIPTION:
An implicit typecast error in vx_dalloc_flush() can cause this performance issue.

RESOLUTION:
The code is modified to do an explicit typecast.
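
An illustration of the bug class (not the VxFS source): an implicit
signed-to-unsigned conversion silently changes a comparison, so a flush loop
never sees its terminating condition.

#include <stdio.h>

int main(void)
{
    long flushed = -1;           /* e.g. "nothing flushed yet" */
    unsigned long target = 10;

    if (flushed < target)        /* implicit cast: -1 becomes ULONG_MAX, */
        printf("not reached\n"); /* so this branch is never taken */

    if (flushed < (long)target)  /* explicit cast keeps the compare signed */
        printf("reached\n");
    return 0;
}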

* 3973403 (Tracking ID: 3947433)

SYMPTOM:
While adding a volume (part of a vset) to an already mounted file system, fsvoladm
displays the following error:
UX:vxfs fsvoladm: ERROR: V-3-28487: Could not find the volume <volume name> in vset

DESCRIPTION:
The code that finds the volume in the vset requires a file descriptor for the
character special device, but in the affected code path the file descriptor
being passed is that of the block device.

RESOLUTION:
Code changes have been done to pass the file descriptor of character special device.

* 3973415 (Tracking ID: 3958688)

SYMPTOM:
The system panics when VxFS is force unmounted. The panic stack trace can look like the following:

#8 [ffff88622a497c10] do_page_fault at ffffffff81691fc5
#9 [ffff88622a497c40] page_fault at ffffffff8168e288
    [exception RIP: vx_nfsd_encode_fh_v2+89]
    RIP: ffffffffa0c505a9  RSP: ffff88622a497cf8  RFLAGS: 00010202
    RAX: 0000000000000002  RBX: ffff883e5c731558  RCX: 0000000000000000
    RDX: 0000000000000010  RSI: 0000000000000000  RDI: ffff883e5c731558
    RBP: ffff88622a497d48   R8: 0000000000000010   R9: 000000000000fffe
    R10: 0000000000000000  R11: 000000000000000f  R12: ffff88622a497d6c
    R13: 00000000000203d6  R14: ffff88622a497d78  R15: ffff885ffd60ec00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff88622a497d50] exportfs_encode_inode_fh at ffffffff81285cb0
#11 [ffff88622a497d60] show_mark_fhandle at ffffffff81243ed4
#12 [ffff88622a497de0] inotify_fdinfo at ffffffff8124411d
#13 [ffff88622a497e18] inotify_show_fdinfo at ffffffff812441b0
#14 [ffff88622a497e50] seq_show at ffffffff81273ec7
#15 [ffff88622a497e90] seq_read at ffffffff8122253a
#16 [ffff88622a497f00] vfs_read at ffffffff811fe0ee
#17 [ffff88622a497f38] sys_read at ffffffff811fecbf
#18 [ffff88622a497f80] system_call_fastpath at ffffffff816967c9

DESCRIPTION:
There is no error handling for the situation where the file system gets disabled/unmounted in the nfsd_encode code path, which can lead to a panic.

RESOLUTION:
Added error handling in vx_nfsd_encode_fh_v2() to avoid the panic in case the file system gets unmounted/disabled.

* 3973421 (Tracking ID: 3955766)

SYMPTOM:
CFS hangs while allocating extents; a thread like the following loops forever doing extent allocation:

#0 [ffff883fe490fb30] schedule at ffffffff81552d9a
#1 [ffff883fe490fc18] schedule_timeout at ffffffff81553db2
#2 [ffff883fe490fcc8] vx_delay at ffffffffa054e4ee [vxfs]
#3 [ffff883fe490fcd8] vx_searchau at ffffffffa036efc6 [vxfs]
#4 [ffff883fe490fdf8] vx_extentalloc_device at ffffffffa036f945 [vxfs]
#5 [ffff883fe490fea8] vx_extentalloc_device_proxy at ffffffffa054c68f [vxfs]
#6 [ffff883fe490fec8] vx_worklist_process_high_pri_locked at ffffffffa054b0ef [vxfs]
#7 [ffff883fe490fee8] vx_worklist_dedithread at ffffffffa0551b9e [vxfs]
#8 [ffff883fe490ff28] vx_kthread_init at ffffffffa055105d [vxfs]
#9 [ffff883fe490ff48] kernel_thread at ffffffff8155f7d0

DESCRIPTION:
In the current code of emtran_process_commit(), it is possible for the EAU summary to be updated without delegation of the corresponding EAU, because the VX_AU_SMAPFREE flag is cleared before the EAU summary is updated; this can lead to a hang. In addition, improper error handling in the case of a bad map can also cause hang situations.

RESOLUTION:
To avoid the potential hang, the code is modified to clear the VX_AU_SMAPFREE flag after updating the EAU summary, and the error handling in emtran_commit/undo is improved.

* 3973424 (Tracking ID: 3959305)

SYMPTOM:
When a large number of files with named attributes are created, written to, and deleted
in a loop, along with other operations on an SELinux-enabled system, some files may
end up without security attributes. This may lead to access to such files being denied later.

DESCRIPTION:
On an SELinux-enabled system, security initialization happens during file creation and
the security attributes are stored. However, when there are parallel create/write/delete
operations on multiple files, or on the same files multiple times, involving named
attributes, a race condition can cause the security attribute initialization to be
skipped for some files. Since these files do not have security attributes set, the
SELinux security module later prevents access to them for other operations, which
fail with an access denied error.

RESOLUTION:
In a file creation context, security initialization of the file is now also attempted
while writing named attributes, by explicitly calling the security initialization
routine. This is an additional provision (in addition to the security initialization in
the default file create code) to ensure that security initialization always happens,
notwithstanding race conditions, in the named attribute write code path.

* 3973432 (Tracking ID: 3940268)

SYMPTOM:
A file system with disk layout version 13 might get disabled if the size of a
directory surpasses the vx_dexh_sz value.

DESCRIPTION:
When the LDH (Large Directory Hash) directory and its buckets fill up, the size
of the hash directory is extended. For this, a reorg inode is created and the
extent map of the LDH attribute inode is copied into the reorg inode using the
extent map reorg function. That function checks whether the extent reorg
structure was passed for the same inode; if not, the extent copy does not
proceed. The extent reorg structure is set up accordingly, but the fileset
index is taken from the inode's i_fsetindex. From disk layout version 13
onwards, the attribute inode is overlaid, so i_fsetindex is no longer set in
the attribute inode and remains 0. Hence the checks in the extent map reorg
function fail, resulting in the file system being disabled.

RESOLUTION:
The code has been modified to pass the correct fileset.

* 3973434 (Tracking ID: 3940235)

SYMPTOM:
A hang might be observed if the file system gets disabled while ENOSPC
handling is being performed by inactive processing.
The stack trace might look like:

 cv_wait+0x3c() ]
 delay_common+0x70()
 vx_extfree1+0xc08()
 vx_extfree+0x228()
 vx_te_trunc_data+0x125c()
 vx_te_trunc+0x878()
 vx_trunc_typed+0x230()
 vx_trunc_tran2+0x104c()
 vx_trunc_tran+0x22c()
 vx_trunc+0xcf0()
 vx_inactive_remove+0x4ec()
 vx_inactive_tran+0x13a4()
 vx_local_inactive_list+0x14()
 vx_inactive_list+0x6e4()
 vx_workitem_process+0x24()
 vx_worklist_process+0x1ec()
 vx_worklist_thread+0x144()
 thread_start+4()

DESCRIPTION:
In the smapchange function, it is possible in case of races that the SMAP
records the old state as VX_EAU_FREE or VX_EAU_ALLOCATED while the corresponding
EMAP is not updated. This happens if the concerned flag gets reset to 0
by some other thread in between. This leads to an fm_dirtycnt leak, which causes
a hang some time afterwards.

RESOLUTION:
Code changes have been made to fix the issue by using a local variable instead of
directly using the global dflag variable, which can get reset to 0.

* 3973435 (Tracking ID: 3922259)

SYMPTOM:
A force unmount hangs with a stack like this:
- vx_delay
- vx_idrop
- vx_quotaoff_umount2
- vx_detach_fset
- vx_force_umount
- vx_aioctl_common
- vx_aioctl
- vx_admin_ioctl
- vxportalunlockedkioctl
- vxportalunlockedioctl
- do_vfs_ioctl
- SyS_ioctl
- system_call_fastpath

DESCRIPTION:
An opened external quota file was preventing the force umount from continuing.

RESOLUTION:
The code has been changed so that an opened external quota file is processed
properly during the force unmount.

* 3973441 (Tracking ID: 3973440)

SYMPTOM:
VxFS mount fails with the error "no security xattr handler" on RHEL 7.6 when SELinux is enabled (both permissive and enforcing):

# mount -t vxfs /dev/vx/dsk/mydg/myvol /my
UX:vxfs mount.vxfs: ERROR: V-3-23731: mount failed.  

/var/log/messages:
Jan 7 12:18:57 server102 kernel: SELinux: (dev VxVM10000, type vxfs) has no security xattr handler

DESCRIPTION:
On RHEL7.6, the VxFS mount fails if SELinux is enabled (both permissive and enforcing), because the kernel reports that the vxfs file system type has no security xattr handler, as shown in the symptom above.

RESOLUTION:
Added code to allow VxFS mounts when SELinux is enabled.

* 3973445 (Tracking ID: 3959299)

SYMPTOM:
When a large number of files are created at once on a system with SELinux enabled,
file creation may take longer than on a system with SELinux disabled.

DESCRIPTION:
On an SELinux-enabled system, SELinux security labels need to be stored as
extended attributes during file creation. This requires allocation of an attribute
inode and its data extent. The contents of the extent are read synchronously into the
buffer. If this is a newly allocated extent, its contents are garbage anyway, and they
get overwritten with the attribute data containing the SELinux security labels. Thus,
for newly allocated attribute extents, the read operation is redundant.

RESOLUTION:
As a fix, for a newly allocated attribute extent, reading the data from that extent is
skipped. However, if the allocated extent gets merged with a previously allocated
extent, the extent returned by the allocator can be a combined extent. In such cases,
the read of the entire extent is allowed, to ensure that previously written data is
correctly loaded in-core.

* 3973526 (Tracking ID: 3932163)

SYMPTOM:
Temporary files are being created in /tmp.

DESCRIPTION:
The VxFS component is creating temporary files in /tmp.

RESOLUTION:
Added code to redirect temporary files to a common location.

* 3973529 (Tracking ID: 3943232)

SYMPTOM:
The system panics in vx_unmount_cleanup_notify when unmounting a file system.

DESCRIPTION:
Every vnode with watches on it gets attached to the root vnode of the file system via
the v_inotify_list vnode hook during dentry purge. When the user removes all watches
from a vnode, the vnode is destroyed and VxFS frees its associated memory, but the
vnode may still be attached to the root vnode list. During unmount, if VxFS picks this
vnode from the root vnode list, a NULL pointer dereference can occur when the freed
memory is accessed. To fix this issue, VxFS now removes such vnodes from the root
vnode list.

RESOLUTION:
The code is modified to remove the vnode from the root vnode list.

* 3973567 (Tracking ID: 3947648)

SYMPTOM:
Due to wrong auto-tuning of vxfs_ninode/the inode cache, a hang can be observed
under heavy memory pressure.

DESCRIPTION:
If kernel heap memory is very large (particularly observed on Solaris T7
servers), an overflow can occur because of a smaller-sized data type.

RESOLUTION:
Changed the code to handle the overflow.

* 3973574 (Tracking ID: 3925281)

SYMPTOM:
Hexdump the incore inode data and piggyback data when inode revalidation fails.

DESCRIPTION:
While assuming inode ownership, if inode revalidation fails with piggyback
data, the piggyback and in-core inode data were not hexdumped, so the current
state of the inode was lost. An inode revalidation failure message has been
added, along with a hexdump of the in-core inode data and the piggyback data.

RESOLUTION:
The code is modified to print a hexdump of the in-core inode and piggyback data
when revalidation of the inode fails.

* 3973578 (Tracking ID: 3941942)

SYMPTOM:
If a file system is created with fiostats enabled and ODM writes are in
progress, forcefully unmounting the file system can panic the system:

crash_kexec
oops_end
no_context
__bad_area_nosemaphore
bad_area_nosemaphore
__do_page_fault
do_page_fault
vx_fiostats_free
fdd_chain_inactive_common
fdd_chain_inactive
fdd_odm_close
odm_vx_close
odm_fcb_free
odm_fcb_rel
odm_ident_close
odm_exit
odm_tsk_daemon_deathlist
odm_tsk_daemon
odm_kthread_init
kernel_thread

DESCRIPTION:
When the fiostats assigned to an inode are freed during a forceful unmount of
the file system, the fs field must be validated. Otherwise, a NULL pointer may
be dereferenced by the checks in this code path, which panics the system.

RESOLUTION:
The code is modified to add checks that validate fs in such forced unmount
scenarios.

* 3973661 (Tracking ID: 3931761)

SYMPTOM:
A cluster-wide hang may be observed in a race scenario if a freeze gets
initiated while there are multiple pending lazy isize update work items in the
worklist.

DESCRIPTION:
If the lazy_isize_enable tunable is ON and "ls -l" is executed frequently from a
non-writing node of the cluster, a huge number of work items accumulate for the
worker threads to process. If a work item holding active level 1 is enqueued
after these work items and a cluster-wide freeze gets initiated, a deadlock
results: the worker threads are exhausted processing the lazy isize update work
items, and the enqueued work item never gets a chance to be processed.

RESOLUTION:
Code changes have been made to handle this race condition.

* 3973666 (Tracking ID: 3973668)

SYMPTOM:
The following error is thrown by the modinst script:
/etc/vx/modinst-vxfs: line 251: /var/VRTS/vxfs/sort.xxx: No such file or directory

DESCRIPTION:
After the changes made through e3935401, the files created by the modinst-vxfs.sh script are placed in
/var/VRTS/vxfs. If /var happens to be a separate file system, it is mounted by the boot.localfs script,
and boot.localfs starts after boot.vxfs (evident from the boot logs).
Hence the file creation fails and boot.vxfs does not load the modules.

RESOLUTION:
Adding a dependency on boot.localfs in the LSB header of boot.vxfs causes
localfs to run before boot.vxfs, thereby fixing the issue.

* 3973669 (Tracking ID: 3944884)

SYMPTOM:
ZFOD extents are being pushed on the clones.

DESCRIPTION:
In the case of logged writes on ZFOD extents on the primary, ZFOD extents are
pushed onto the clones, which is not expected and results in internal write test
failures.

RESOLUTION:
Code has been modified not to push ZFOD extents on clones.

* 3973759 (Tracking ID: 3902600)

SYMPTOM:
Contention is observed on the vx_worklist_lk lock in a cluster-mounted file
system with ODM.

DESCRIPTION:
In a CFS environment, for ODM async I/O reads, iodones are completed
immediately, calling into ODM itself from the interrupt handler. But all
CFS writes are currently processed in a delayed fashion, where the requests
are queued and processed later by a worker thread. This was adding delays
to ODM writes.

RESOLUTION:
Optimized the I/O processing of ODM work items on CFS so that they are
processed in the same context where possible.

* 3973778 (Tracking ID: 3926061)

SYMPTOM:
The file system got disabled, followed by a system panic due to a NULL pointer
dereference, with a stack like this:
vx_idalloc_off
vx_dalloc_do_flush_file
vx_dalloc_flush_file
vx_workitem_process
vx_worklist_process
vx_worklist_thread

DESCRIPTION:
There is a possible race condition between two threads doing vx_idalloc_off()
when the file system is getting disabled.

RESOLUTION:
Code is modified to eliminate the race possibility.

* 3973878 (Tracking ID: 3917319)

SYMPTOM:
The server panics randomly with a VxFS stack, which may look like the one below:

vx_putpage_dirty+0xfc()
vx_do_putpage+0xbc()
vx_putpage1+0x2e8()
vxfs:vx_putpage()
fop_putpage+0x4c()
fsflush_do_pages+0x344()
fsflush+0x368()
thread_start+4()

DESCRIPTION:
Because of a race between the thread that turns delayed allocation off and the
file system flusher thread, a delayed-allocation-related structure is accessed
after it has already been torn down by the other thread. This leads to a NULL
pointer dereference, resulting in the panic.

RESOLUTION:
Fixed the race between the threads to avoid panics.

* 3974105 (Tracking ID: 3941620)

SYMPTOM:
High memory usage is seen on Solaris system with delayed allocation enabled.

DESCRIPTION:
With VxFS delayed allocation enabled, under certain conditions some putpage
operations could return early, before the pages were flushed and without a
later retry. Then only one VxFS worker thread performed page flushing through
the delayed allocation flush path. This made dirty page flushing very slow and
caused memory starvation on the system.

RESOLUTION:
A code change has been made so that page flushing works through multiple
threads as normal.

* 3974148 (Tracking ID: 3957285)

SYMPTOM:
A job promote operation executed on the replication target node fails with an error message like:

# /opt/VRTS/bin/vfradmin job promote myjob1 /mnt2
UX:vxfs vfradmin: INFO: V-3-28111: Current replication direction:
<machine1>:/mnt1 -> <machine2>:/mnt2
UX:vxfs vfradmin: INFO: V-3-28112: If you continue this command, replication direction will change to:
<machine2>:/mnt2 -> <machine1>:/mnt1
UX:vxfs vfradmin: QUESTION: V-3-28101: Do you want to continue? [ynq]y
UX:vxfs vfradmin: INFO: V-3-28090: Performing final sync for job myjob1 before promoting...
UX:vxfs vfradmin: INFO: V-3-28099: Job promotion failed. If you continue, replication will be stopped and the filesystem will be made available on this host for 
use. To resume replication when <machine1> returns, use the vfradmin job recover command.
UX:vxfs vfradmin: INFO: V-3-28100: Continuing may result in data loss.
UX:vxfs vfradmin: QUESTION: V-3-28101: Do you want to continue? [ynq]y
UX:vxfs vfradmin: INFO: V-3-28227: Unable to unprotect filesystem.

DESCRIPTION:
A job promote from the target node sends a promote message to the source node. After this message is processed on the source side, the 'seqno' file is
updated/written. The 'seqno' file is created on the target side and is not present on the source side, so the 'seqno' file update returns an error and the promote fails.

RESOLUTION:
The 'seqno' file write is not required as part of the promote message. The SKIP_SEQNO_UPDATE flag is now passed in the promote message so that the seqno
file write is skipped on the source side during promote processing.
Note: the job should be stopped on the source node before promoting from the target node.

Patch ID: VRTSvxfs-7.2.0.300

* 3917814 (Tracking ID: 3918285)

SYMPTOM:
During extent allocation for a write operation on a locally mounted file
system, if an error occurs during the commit of the State Map transaction, the
count of pending delayed allocations may remain inaccurate.

DESCRIPTION:
During extent allocation for a write operation on a locally mounted file
system, if an error occurs during the commit of the State Map transaction, the
count of pending delayed allocations may remain inaccurate because of wrong
handling in the error code path.

RESOLUTION:
The code is modified to correct the count of pending delayed allocations if an
error occurs during the commit of the State Map transaction.

* 3949175 (Tracking ID: 3938544)

SYMPTOM:
VxFS module failed to load on RHEL7.5.

DESCRIPTION:
Since RHEL7.5 is a new release, the VxFS module failed to load
on it.

RESOLUTION:
Added VxFS support for RHEL7.5.

Patch ID: VRTSvxfs-7.2.0.200

* 3911407 (Tracking ID: 3917013)

SYMPTOM:
fsck may fail with the following error message.

# fsck -t vxfs -o full -n /dev/vx/dsk/testdg/vol1

UX:vxfs fsck.vxfs: ERROR: V-3-28484: File system block size is not aligned 
with supported sector size.

DESCRIPTION:
If the superblock of the file system is corrupted along with the block size
field, so that the block size is no longer aligned with a supported sector
size, fsck may fail while recovering the file system.

RESOLUTION:
Code changes have been made to add a new flag that skips the sector and
block size alignment check.

* 3912604 (Tracking ID: 3914488)

SYMPTOM:
A local mount may fail with the following error message, and as a result the FULLFSCK flag gets set on the file system:
vx_lctbad - <filesystemname> file system link count table bad

DESCRIPTION:
During extop processing while locally mounting a file system (primary fileset), the LCT may get
marked bad and FULLFSCK may get set on the file system. This is a corner case where an earlier
unmount operation on the file system was not clean.

RESOLUTION:
Code changes have been made to merge the LCT while mounting the primary fileset.

* 3914871 (Tracking ID: 3915125)

SYMPTOM:
A file system freeze was stuck because of a deadlock among kernel threads
while writing to a file.

DESCRIPTION:
While writing to a file, the file system allocates space on disk. It needs
to search for free blocks and hence scans through various per-allocation-unit
(AU) metadata, such as the state map (SMAP) and extent bitmap (EMAP).
The problem was that the global lock (GLM) was not held while changes were
made to a particular allocation unit (AU).
Because of this, the allocator thread was continuously looping.

RESOLUTION:
An error is now returned from the allocation function if the state map metadata
was modified but the appropriate GLM lock could not be obtained.

* 3916912 (Tracking ID: 3916914)

SYMPTOM:
On disk layout version 11, the file system may run into an ENOSPC condition
even if it still has space available.

DESCRIPTION:
The file system can run into an ENOSPC condition with DLV 11 and log version 12:
although it has space, bitmaps are marked allocated instead of free.

RESOLUTION:
Code changes have been done to reflect the correct log record conversion.

* 3917961 (Tracking ID: 3919130)

SYMPTOM:
The following failure may be observed while setting a named attribute
using the nxattrset command:
INFO: V-3-28410:  nxattrset failed for <filename> with error 48

DESCRIPTION:
On the Solaris platform, the userland structures are 32-bit and the kernel
structures are 64-bit. The transition from user space to kernel space reads
attribute information from the 32-bit structures. A flag field in the 64-bit
structure is used for further processing, but because the flag field is missing
from the 32-bit structure, the corresponding field in the 64-bit structure does
not get initialized and contains a garbage value. This leads to the failure of
the nxattrset command.

RESOLUTION:
The required flag field has been introduced in the 32-bit structure.

* 3926159 (Tracking ID: 3923307)

SYMPTOM:
VxFS module failed to load on RHEL7.4.

DESCRIPTION:
Since RHEL7.4 is a new release, the VxFS module failed to load
on it.

RESOLUTION:
Added VxFS support for RHEL7.4.

Patch ID: VRTSvxfs-7.2.0.100

* 3909937 (Tracking ID: 3908954)

SYMPTOM:
While performing vectored writes using writev(), where two iovec writes go to
different offsets within the same 4K page-aligned range of a file, it is
possible to find null data at the beginning of the 4K range when reading the
data back.

DESCRIPTION:
While multiple processes are performing vectored writes to a file using
writev(), the following situation can occur.

There are two iovecs; the first is 448 bytes and the second is 30000 bytes. The
first iovec of 448 bytes completes, but the second iovec finds that the source
page is no longer in memory. As it cannot fault the page in during uiomove, it
has to undo both iovecs. It then faults the page back in and retries only the
second iovec. However, as the undo operation also undid the first iovec, the
first 448 bytes of the page are left populated with nulls. When reading the
file back, it appears that no data was written for the first iovec; hence nulls
are found in the file.

RESOLUTION:
Code has been changed to handle the unwind of multiple iovecs accordingly in the
scenarios where certain amount of data is written out of a particular iovec and
some from other.
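
A minimal sketch of the I/O pattern that exposed the problem (the file path is
illustrative); the fix itself is inside VxFS:

#include <fcntl.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    static char a[448], b[30000];
    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = sizeof(a) },
        { .iov_base = b, .iov_len = sizeof(b) },
    };
    int fd = open("/mnt1/testfile", O_WRONLY | O_CREAT, 0644);

    if (fd < 0)
        return 1;
    memset(a, 'A', sizeof(a));
    memset(b, 'B', sizeof(b));
    /* Both iovecs go down in a single vectored write; with the bug, a
     * fault-in retry on the second iovec could leave the first 448 bytes
     * of the page as nulls. */
    ssize_t n = writev(fd, iov, 2);
    close(fd);
    return n < 0;
}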

* 3909938 (Tracking ID: 3905099)

SYMPTOM:
A VxFS unmount panicked in deactivate_super(); the panic stack looks like the
following:

 #9 vx_fsnotify_flush [vxfs]
#10 vx_softcnt_flush [vxfs]
#11 vx_idrop  [vxfs]
#12 vx_detach_fset [vxfs]
#13 vx_unmount  [vxfs]
#14 generic_shutdown_super 
#15 kill_block_super
#16 vx_kill_sb
#17 amf_kill_sb
#18 deactivate_super
#19 mntput_no_expire
#20 sys_umount
#21 system_call_fastpath

DESCRIPTION:
A race is suspected between unmount and a user-space notifier install for the
root inode.

RESOLUTION:
Added diagnostic code and a defensive check for fsnotify_flush in vx_softcnt_flush.

* 3909939 (Tracking ID: 3906548)

SYMPTOM:
Read-only file system errors are reported while loading drivers to start up
VxFS through systemd.

DESCRIPTION:
On SLES systems, VxFS startup can be invoked by another systemd unit while the
root and local file systems are not yet mounted read/write. In such cases,
read-only file system errors can be reported during system startup.

RESOLUTION:
VxFS is now started up after the local file systems are mounted read/write
through systemd; VxFS file system mounting is also delayed by adding the
"_netdev" mount option in /etc/fstab.
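
An /etc/fstab entry using the option looks like this; the device and mount
point names are examples:

/dev/vx/dsk/mydg/myvol  /mnt1  vxfs  _netdev  0  0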

* 3909940 (Tracking ID: 3894712)

SYMPTOM:
ACL permissions are not inherited correctly on a cluster file system.

DESCRIPTION:
The ACL counts stored on a directory inode get reset every time the directory
inode's ownership is switched between nodes. When ownership of the directory
inode comes back to the node that previously abdicated it, ACL permissions
were not being inherited correctly for newly created files.

RESOLUTION:
Modified the source such that the ACLs are inherited correctly.

* 3909941 (Tracking ID: 3868609)

SYMPTOM:
High CPU usage is seen because of a VxFS thread.

DESCRIPTION:
To avoid memory deadlocks, and to track exiting threads with outstanding ODM
requests, VxFS needs to hook into the kernel's memory management. While
rescheduling happens for Oracle threads, they hold the mmap_sem on which FDD
threads keep waiting, causing contention and high CPU usage.

RESOLUTION:
Removed the bouncing of the spinlock between CPUs, thereby reducing the CPU
spike.

* 3909943 (Tracking ID: 3729030)

SYMPTOM:
The fsdedupschd daemon failed to start on RHEL7.

DESCRIPTION:
The dedup service daemon failed to start because RHEL 7 changed the service
management mechanism. The daemon uses the new systemctl to start and stop
the service. For systemctl to properly start, stop, or query the service, it
needs a service definition file under /usr/lib/systemd/system.

RESOLUTION:
The code is modified to create the fsdedupschd.service file while installing 
the VRTSfsadv package.
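
For illustration, a minimal unit file of the kind systemctl requires looks like
the following; the actual fsdedupschd.service shipped in VRTSfsadv may differ,
and the ExecStart/ExecStop paths here are assumptions:

[Unit]
Description=VxFS deduplication scheduler

[Service]
Type=forking
ExecStart=/opt/VRTS/bin/fsdedupschd start
ExecStop=/opt/VRTS/bin/fsdedupschd stop

[Install]
WantedBy=multi-user.target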

* 3909946 (Tracking ID: 3685391)

SYMPTOM:
Execute permissions for a file not honored correctly.

DESCRIPTION:
The user was able to execute the file despite not having the execute permissions.

RESOLUTION:
The code is modified such that an error is reported when the execute permissions are not applied.

* 3910083 (Tracking ID: 3707662)

SYMPTOM:
A race between reorg processing and the fsadm timer thread (alarm expiry) leads to a panic in vx_reorg_emap with the following stack:

vx_iunlock
vx_reorg_iunlock_rct_reorg
vx_reorg_emap
vx_extmap_reorg
vx_reorg
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
fop_ioctl
ioctl

DESCRIPTION:
When the timer expires (fsadm with the -t option), vx_do_close() calls vx_reorg_clear() on the local mount, which performs cleanup on the reorg rct inode. Another thread currently active in vx_reorg_emap() then panics due to a NULL pointer dereference.

RESOLUTION:
When fop_close is called in alarm handler context, the cleanup is now deferred until the kernel thread performing the reorg completes its operation.

* 3910084 (Tracking ID: 3812330)

SYMPTOM:
'ls -l' is slow across cluster nodes.

DESCRIPTION:
When "ls -l" is issued, a VOP getattr is issued, through which VxFS updates the
necessary stats of an inode whose owner is some other node in the cluster.
Ideally this update should be done through an asynchronous message passing
mechanism, which is not happening in this case. Instead, the non-owner node
where "ls -l" is issued tries to pull strong ownership towards itself to update
the inode stats. Hence a lot of time is consumed in this ping-pong of
ownership.

RESOLUTION:
Strong ownership is no longer pulled for the inode when 'ls' is run from a node
that is not the current owner; the inode stats update is done through an
asynchronous message passing mechanism, controlled by the module parameter
"vx_lazyisiz".

* 3910085 (Tracking ID: 3779916)

SYMPTOM:
vxfsconvert fails to upgrade the layout version for a VxFS file system with a
large number of inodes. The error message shows some inode discrepancy.

DESCRIPTION:
vxfsconvert walks through the ilist and converts inodes. It stores chunks of
inodes in a buffer and processes them as a batch. The inode number parameter
for this inode buffer is of type unsigned integer. The offset of a particular
inode in the ilist is calculated by multiplying the inode number by the size of
the inode structure. For large inode numbers, the product inode_number *
inode_size can overflow the unsigned integer limit, giving a wrong offset
within the ilist file. vxfsconvert therefore reads the wrong inode and
eventually fails.

RESOLUTION:
The inode number parameter is now defined as unsigned long to avoid the
overflow.
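
A self-contained illustration of the overflow; the inode size value is
illustrative:

#include <stdio.h>

int main(void)
{
    unsigned int ino = 20000000;     /* large inode number */
    unsigned int isize = 256;        /* on-disk inode size */

    unsigned long bad  = ino * isize;                 /* 32-bit multiply wraps */
    unsigned long good = (unsigned long)ino * isize;  /* widened, no overflow */

    printf("bad offset:  %lu\n", bad);    /* 825032704 on LP64 */
    printf("good offset: %lu\n", good);   /* 5120000000 */
    return 0;
}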

* 3910086 (Tracking ID: 3830300)

SYMPTOM:
Heavy CPU usage while Oracle archiver processes are running on a clustered
file system.

DESCRIPTION:
The cause of the poor read performance in this case was fragmentation, which
mainly happens when multiple archivers run on the same node. The allocation
pattern of the Oracle archiver processes is:

1. write the header with O_SYNC
2. ftruncate-up the file to its final size (a few GBs typically)
3. do lio_listio with 1MB iocbs

The problem occurs because all the allocations done in this manner go through
internal allocations, i.e. allocations below the file size instead of
allocations past the file size. Internal allocations are done at most 8 pages
at a time, so if multiple processes do this, they all get these 8 pages
alternately and the file system becomes very fragmented.

RESOLUTION:
Added a tunable which allocates ZFOD extents when ftruncate tries to increase
the size of the file, instead of creating a hole. This eliminates the
allocations internal to the file size and thus the fragmentation. The earlier
implementation of the same fix, which ran into locking issues, has been
corrected. Also fixed the performance issue while writing from the secondary node.

* 3910088 (Tracking ID: 3855726)

SYMPTOM:
Panic happens in vx_prot_unregister_all(). The stack looks like this:

- vx_prot_unregister_all
- vxportalclose
- __fput
- fput
- filp_close
- sys_close
- system_call_fastpath

DESCRIPTION:
The panic is caused by a NULL fileset pointer, which is due to referencing the
fileset before it's loaded, plus, there's a race on fileset identity array.

RESOLUTION:
Skip the fileset if it is not loaded yet, and add the identity array lock to
prevent the possible race.

* 3910090 (Tracking ID: 3790721)

SYMPTOM:
High CPU usage in VxFS thread processes. The backtrace of such threads usually
looks like this:

schedule
schedule_timeout
__down
down
vx_send_bcastgetemapmsg_remaus
vx_send_bcastgetemapmsg
vx_recv_getemapmsg
vx_recvdele
vx_msg_recvreq
vx_msg_process_thread
vx_kthread_init
kernel_thread

DESCRIPTION:
The locking mechanism in vx_send_bcastgetemapmsg_process() is inefficient:
every time it is called, it performs a series of down-up operations on a
certain semaphore. This can incur a huge CPU cost when multiple threads contend
on the semaphore.

RESOLUTION:
Optimized the locking mechanism in vx_send_bcastgetemapmsg_process() so that it
performs the down-up operation on the semaphore only once.
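
A sketch of the locking change as a generic Linux kernel idiom; the names are
illustrative, not VxFS source:

#include <linux/semaphore.h>

static DEFINE_SEMAPHORE(emap_sem);

static void send_getemap_batch(int nmsgs)
{
    int i;

    down(&emap_sem);           /* one down-up around the whole batch ... */
    for (i = 0; i < nmsgs; i++) {
        /* ... build and send one broadcast getemap message ... */
    }
    up(&emap_sem);             /* ... instead of one pair per message */
}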

* 3910093 (Tracking ID: 1428611)

SYMPTOM:
The 'vxcompress' command can cause many GLM block lock messages to be
sent over the network. This can be observed in 'glmstat -m' output under the
"proxy recv" section, as shown in the example below:

bash-3.2# glmstat -m
         message     all      rw       g      pg       h     buf     oth    loop
master send:
           GRANT     194       0       0       0       2       0     192      98
          REVOKE     192       0       0       0       0       0     192      96
        subtotal     386       0       0       0       2       0     384     194

master recv:
            LOCK     193       0       0       0       2       0     191      98
         RELEASE     192       0       0       0       0       0     192      96
        subtotal     385       0       0       0       2       0     383     194

    master total     771       0       0       0       4       0     767     388

proxy send:
            LOCK      98       0       0       0       2       0      96      98
         RELEASE      96       0       0       0       0       0      96      96
      BLOCK_LOCK    2560       0       0       0       0    2560       0       0
   BLOCK_RELEASE    2560       0       0       0       0    2560       0       0
        subtotal    5314       0       0       0       2    5120     192     194

DESCRIPTION:
'vxcompress' creates placeholder inodes (called IFEMR inodes) to hold the
compressed data of files. After the compression is finished, IFEMR inodes
exchange their bmap with the original file and are later given to inactive
processing. Inactive processing truncates the IFEMR extents (the original
extents of the regular file, which is now compressed) by sending cluster-wide
buffer invalidation requests. These invalidations need a GLM block lock.
Regular file data does not need to be invalidated across the cluster, making
these GLM block lock requests unnecessary.

RESOLUTION:
Pertinent code has been modified to skip the invalidation for the 
IFEMR inodes created during compression.

* 3910094 (Tracking ID: 3879310)

SYMPTOM:
The file system may get corrupted after the file system freeze during 
vxupgrade. The full fsck gives the following errors:

UX:vxfs fsck: ERROR: V-3-20451: No valid device inodes found
UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate

DESCRIPTION:
The vxupgrade command requires the file system to be frozen during its
functional operation. It may happen that corruption is detected while the freeze
is in progress and the full fsck flag is set on the file system. However,
this does not stop vxupgrade from proceeding.
At a later stage of vxupgrade, after structures related to the new disk layout
are updated on the disk, vxfs frees up and zeroes out some of the old metadata
inodes. If any error occurs after this point (because of the full fsck flag
being set), the file system needs to go back completely to the previous version,
as it was at the time the full fsck flag was set. Since the metadata
corresponding to the previous version is already cleared, the full fsck cannot
proceed and gives the error.

RESOLUTION:
The code is modified to check for the full fsck flag after freezing the file 
system during vxupgrade. Also, disable the file system if an error occurs after 
writing new metadata on the disk. This will force the newly written metadata to 
be loaded in memory on the next mount.
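
A minimal C sketch of the control flow (illustrative names; the callback stands
in for the on-disk metadata rewrite): check the full fsck flag after the freeze,
and disable the file system if writing the new metadata fails.

#include <errno.h>
#include <stdbool.h>

struct fs_like { bool fullfsck; bool disabled; };

static int upgrade_layout(struct fs_like *fs,
                          int (*write_new_metadata)(struct fs_like *))
{
    if (fs->fullfsck)
        return EINVAL;          /* corruption found during freeze: abort early */

    if (write_new_metadata(fs) != 0) {
        fs->disabled = true;    /* forces the new metadata to be reloaded
                                   on the next mount */
        return EIO;
    }
    return 0;
}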

* 3910096 (Tracking ID: 3757609)

SYMPTOM:
High CPU usage because of contention over ODM_IO_LOCK

DESCRIPTION:
While performing ODM I/O, ODM_IO_LOCK is taken to update some of the ODM
counters. Multiple iodones trying to update these counters at the same time
contend on this lock, which results in high CPU usage.

RESOLUTION:
Code modified to remove the lock contention.
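
A minimal user-space C sketch of a common technique for such fixes (this is an
assumption about the approach, not VxFS source): replace the lock-protected
counter updates in the iodone path with lock-free atomic increments.

#include <stdatomic.h>

struct odm_stats {
    atomic_ulong ios_done;
    atomic_ulong bytes_done;
};

static void iodone_update(struct odm_stats *st, unsigned long nbytes)
{
    /* no lock: each counter is updated with a single atomic RMW */
    atomic_fetch_add_explicit(&st->ios_done, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&st->bytes_done, nbytes, memory_order_relaxed);
}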

* 3910097 (Tracking ID: 3817734)

SYMPTOM:
If a file system with the full fsck flag set is mounted, a message containing
the direct command to clean the file system with full fsck is printed to the
user.

DESCRIPTION:
When mounting a file system with the full fsck flag set, the mount fails
and a message is printed asking the user to clean the file system with full
fsck. This message contains the direct command to run; if the command is run
without first collecting a file system metasave, the evidence of the corruption
is lost. Also, since fsck removes the file system inconsistencies, it may lead
to undesired data loss.

RESOLUTION:
A more generic message is now given in the error message instead of the direct
command.

* 3910098 (Tracking ID: 3861271)

SYMPTOM:
Due to the missing inode clear action, a page can be left in a strange state.
Also, the inode is not fully quiescent, which leads to races in the inode code.
Sometimes this can cause a panic from iput_final().

DESCRIPTION:
An inode clear operation is missing when a Linux inode is being de-initialized
on SLES11.

RESOLUTION:
Add the inode clear operation on SLES11.

* 3910101 (Tracking ID: 3846521)

SYMPTOM:
cp -p fails with EINVAL for files with a 10-digit modification time. EINVAL is
returned if the value in the tv_nsec field is outside the range 0 to
999,999,999. VxFS maintains the timestamp in usec, but when copying in user
space the usec value is converted to nsec. In this case, the usec value has
crossed its upper boundary limit, i.e. 999,999.

DESCRIPTION:
In a cluster, it is possible that time differs across nodes. When updating
mtime, VxFS checks whether the inode is a cluster inode; if another node's
mtime is newer than the current node's time, it increments tv_usec instead of
moving mtime back to an older value. The tv_usec counter can overflow here,
which results in a 10-digit mtime.tv_nsec.

RESOLUTION:
Code is modified to reset the usec counter for mtime/atime/ctime when the
upper boundary limit, i.e. 999999, is reached.
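
A minimal C sketch of the boundary handling (illustrative helper, not VxFS
source; whether the overflow carries into tv_sec or simply resets is an
assumption here). The point is that tv_usec never leaves the 0..999999 range:

#include <sys/time.h>

#define USEC_MAX 999999L

static void bump_usec(struct timeval *ts)
{
    if (ts->tv_usec >= USEC_MAX) {
        ts->tv_sec += 1;     /* carry, so the usec-to-nsec conversion (x1000)
                                stays within 0..999999999 */
        ts->tv_usec = 0;
    } else {
        ts->tv_usec += 1;
    }
}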

* 3910103 (Tracking ID: 3817734)

SYMPTOM:
If file system with full fsck flag set is mounted, direct command message
is printed to the user to clean the file system with full fsck.

DESCRIPTION:
When mounting file system with full fsck flag set, mount will fail
and a message will be printed to clean the file system with full fsck. This
message contains direct command to run, which if run without collecting file
system metasave will result in evidences being lost. Also since fsck will remove
the file system inconsistencies it may lead to undesired data being lost.

RESOLUTION:
More generic message is given in error message instead of direct
command.

* 3910105 (Tracking ID: 3907902)

SYMPTOM:
System panic observed due to a race between the dalloc off thread and the
getattr thread.

DESCRIPTION:
With the 7.2 release of VxFS, dalloc states are stored in a new structure.
When getting the attributes of a file, dalloc blocks are calculated and stored
in this new structure. If a dalloc off thread races with a getattr thread, the
getattr thread may dereference a NULL dalloc structure.

RESOLUTION:
Code changes have been made to take the appropriate dalloc lock while
calculating dalloc blocks in the getattr function, to avoid the race.
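
A minimal user-space C sketch of the race fix (a pthread mutex stands in for
the dalloc lock): getattr re-checks the dalloc pointer under the lock before
dereferencing it.

#include <pthread.h>
#include <stddef.h>

struct dalloc_state { unsigned long dalloc_blocks; };

struct inode_like {
    pthread_mutex_t dalloc_lock;
    struct dalloc_state *dalloc;   /* set to NULL by the dalloc off thread */
};

static unsigned long getattr_dalloc_blocks(struct inode_like *ip)
{
    unsigned long blocks = 0;

    pthread_mutex_lock(&ip->dalloc_lock);
    if (ip->dalloc != NULL)        /* re-check under the lock */
        blocks = ip->dalloc->dalloc_blocks;
    pthread_mutex_unlock(&ip->dalloc_lock);
    return blocks;
}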

* 3910356 (Tracking ID: 3908785)

SYMPTOM:
System panic observed because of a NULL page address in the writeback structure
in the case of the kswapd process.

DESCRIPTION:
The secfs2/encryptfs layers use the write VOP as a hook when kswapd is
triggered to free a page. Ideally kswapd should call the writepage() routine,
where the writeback structures are correctly filled. When the write VOP is
called because of the hook in secfs2/encryptfs, the writeback structures are
cleared, resulting in a NULL page address.

RESOLUTION:
Code changes have been made to call the VxFS kswapd routine only if a valid
page address is present.

* 3911290 (Tracking ID: 3910526)

SYMPTOM:
In case of a full filesystem, an fsadm resize operation to increase the
filesystem size may fail with the following error:
Attempt to resize <volume-name> failed with errno 28

DESCRIPTION:
If there is no space available in the filesystem and a resize operation is
initiated, intent log extents are used for the metadata setup needed to
continue the resize operation. If the resize is successful, the superblock is
updated with the new size and a new intent log inode of the same size is
reallocated. If the resize size is smaller than the log inode size, ENOSPC is
hit when reallocating the intent log inode extents. Due to this failure, the
resize command fails and shrinks the volume back to its original size, but the
superblock continues to have the new filesystem size.

RESOLUTION:
The code has been modified to fail the resize operation up front if the resize
size is less than the intent log inode size.
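
A minimal C sketch of the precheck (illustrative structure; treating "resize
size" as the requested growth is an assumption): reject the request before any
metadata is touched.

#include <errno.h>

struct fs_geom {
    unsigned long long size;        /* current fs size, in blocks */
    unsigned long long logino_size; /* intent log inode size, in blocks */
};

static int resize_precheck(const struct fs_geom *g, unsigned long long newsize)
{
    if (newsize <= g->size)
        return EINVAL;              /* not a grow request */
    if (newsize - g->size < g->logino_size)
        return ENOSPC;              /* intent log inode could not be
                                       reallocated; fail up front */
    return 0;
}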

* 3911718 (Tracking ID: 3905576)

SYMPTOM:
Cluster file system hangs. On one node, all worker threads are blocked due to a
file system freeze, and there is another thread blocked with a stack like this:

- __schedule
- schedule
- vx_svar_sleep_unlock
- vx_event_wait
- vx_olt_iauinit
- vx_olt_iasinit
- vx_loadv23_fsetinit
- vx_loadv23_fset
- vx_fset_reget
- vx_fs_reinit
- vx_recv_cwfa
- vx_msg_process_thread
- vx_kthread_init

DESCRIPTION:
The frozen CFS won't thaw because the mentioned thread is waiting for a work item
to be processed in vx_olt_iauinit(). Since all the worker threads are blocked,
there is no free thread to process this work item.

RESOLUTION:
Change the code in vx_olt_iauinit(), so that the work item will be processed even
with all worker threads blocked.
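
A minimal C sketch of the fallback (illustrative names): if the work item
cannot be handed to a worker thread, the waiting thread runs it in its own
context instead of blocking forever.

#include <stdbool.h>

struct workitem { void (*fn)(void *); void *arg; };

/* stub: returns false when every worker thread is blocked by the freeze */
static bool queue_to_worker(struct workitem *wi) { return false; }

static void run_iauinit_item(struct workitem *wi)
{
    if (!queue_to_worker(wi))
        wi->fn(wi->arg);   /* no free worker: process the item inline */
}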

* 3911719 (Tracking ID: 3909583)

SYMPTOM:
Disable partitioning of a directory if the directory size is greater than the
upper threshold value.

DESCRIPTION:
If PD (partitioned directories) is enabled during mount, the mount may take a
long time to complete, because it tries to partition all the directories and
hence looks hung. To avoid such hangs, a new upper threshold value for PD is
added, which disables partitioning of a directory if the directory size is
above that value.

RESOLUTION:
Code is modified to disable partitioning of a directory if the directory size
is greater than the upper threshold value.
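
A minimal C sketch of the check (illustrative names and threshold value, not
VxFS source): directories larger than the upper threshold are simply left
unpartitioned.

struct dir_like { unsigned long long size; };

/* example value only; the real tunable and its default are not shown here */
static unsigned long long pd_upper_threshold = 128ULL << 20;

static int should_partition(const struct dir_like *dp)
{
    if (pd_upper_threshold != 0 && dp->size > pd_upper_threshold)
        return 0;   /* too large: skip partitioning so mount stays fast */
    return 1;
}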

* 3911732 (Tracking ID: 3896670)

SYMPTOM:
Intermittent CFS hang-like situation with many CFS pglock grant messages
pending on the LLT layer.

DESCRIPTION:
To optimize CFS locking, VxFS may send greedy pglock grant messages to speed up
upcoming write operations. In certain scenarios, created by a particular read
and write pattern across nodes, one node can send these greedy messages far
faster than the responses arrive. This can build up a large number of messages
at the CFS layer, delaying the responses to other messages and causing
slowness in CFS operations.

RESOLUTION:
The fix is to send the next greedy message only after receiving the response to
the previous one. This way, at any given time only one pglock greedy message
will be in flight.
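
A minimal user-space C sketch of the flow control (illustrative names): a flag
allows at most one greedy message in flight, and the next send is enabled only
when the response arrives.

#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool greedy_in_flight = false;

static void send_greedy_msg(void) { /* hand the message to the transport */ }

static bool try_send_greedy(void)
{
    bool expected = false;

    /* only the caller that flips the flag gets to send */
    if (!atomic_compare_exchange_strong(&greedy_in_flight, &expected, true))
        return false;          /* one is already in flight: do not send */
    send_greedy_msg();
    return true;
}

static void greedy_response_received(void)
{
    atomic_store(&greedy_in_flight, false);   /* allow the next greedy msg */
}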

* 3911926 (Tracking ID: 3901318)

SYMPTOM:
VxFS module failed to load on RHEL7.3.

DESCRIPTION:
Since RHEL7.3 is a new release, the VxFS module failed to load on it.

RESOLUTION:
Added VxFS support for RHEL7.3.

* 3911968 (Tracking ID: 3911966)

SYMPTOM:
VxFS module failed to load on SLES12 SP2.

DESCRIPTION:
Since SLES12 SP2 is a new release, the VxFS module failed to load on it.

RESOLUTION:
Added VxFS support for SLES12 SP2.

* 3912988 (Tracking ID: 3912315)

SYMPTOM:
EMAP corruption while freeing up the extent.
Feb  4 15:10:45 localhost kernel: vxfs: msgcnt 2 mesg 056: V-2-56: vx_mapbad - 
vx_smap_stateupd - file system extent allocation unit state bitmap number 0 
marked bad

Feb  4 15:10:45 localhost kernel: Call Trace:
vx_setfsflags+0x103/0x140 [vxfs]
vx_mapbad+0x74/0x2d0 [vxfs]
vx_smap_stateupd+0x113/0x130 [vxfs]
vx_extmapupd+0x552/0x580 [vxfs]
vx_alloc+0x3d6/0xd10 [vxfs]
vx_extsub+0x0/0x5f0 [vxfs]
vx_semapclone+0xe1/0x190 [vxfs]
vx_clonemap+0x14d/0x230 [vxfs]
vx_unlockmap+0x299/0x330 [vxfs]
vx_smap_dirtymap+0xea/0x120 [vxfs]
vx_do_extfree+0x2b8/0x2e0 [vxfs]
vx_extfree1+0x22e/0x7c0 [vxfs]
vx_extfree+0x9f/0xd0 [vxfs]
vx_exttrunc+0x10d/0x2a0 [vxfs]
vx_trunc_ext4+0x65f/0x7a0 [vxfs]
vx_validate_ext4+0xcc/0x1a0 [vxfs]
vx_trunc_tran2+0xb7f/0x1450 [vxfs]
vx_trunc_tran+0x18f/0x1e0 [vxfs]
vx_trunc+0x66a/0x890 [vxfs]
vx_iflush_list+0xaee/0xba0 [vxfs]
vx_iflush+0x67/0x80 [vxfs]
vx_workitem_process+0x24/0x50 [vxfs]

DESCRIPTION:
The issue occurs due to wrongly validating the SMAP and changing the AU state
from free to allocated.

RESOLUTION:
Skip the validation and change the EAU state only if it is an SMAP update.

* 3912989 (Tracking ID: 3912322)

SYMPTOM:
The vxfs tunable max_seqio_extent_size cannot be tuned to any 
value less than 32768.

DESCRIPTION:
In vxfs 7.1, the default for max_seqio_extent_size was changed from 2048 to
32768. Due to a bug, the tunable cannot be set to any value less than 32768.

RESOLUTION:
The fix is to allow the tunable to be set to any value >= 2048.
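
A minimal C sketch of the corrected range check (illustrative names): any value
from 2048 upward is accepted.

#include <errno.h>

#define MAX_SEQIO_EXTENT_SIZE_FLOOR 2048L

static int set_max_seqio_extent_size(long val, long *tunable)
{
    if (val < MAX_SEQIO_EXTENT_SIZE_FLOOR)
        return EINVAL;   /* below the supported floor */
    *tunable = val;      /* 2048 <= val, including values below 32768 */
    return 0;
}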

* 3912990 (Tracking ID: 3912407)

SYMPTOM:
CFS hang on VxFS 7.2 while a thread on a CFS node waits for an EAU delegation:

__schedule at ffffffff8163b46d
schedule at ffffffff8163bb09
vx_svar_sleep_unlock at ffffffffa0a99eb5 [vxfs]
vx_event_wait at ffffffffa0a9a68f [vxfs]
vx_async_waitmsg at ffffffffa09aa680 [vxfs]
vx_msg_send at ffffffffa09aa829 [vxfs]
vx_get_dele at ffffffffa09d5be1 [vxfs]
vx_extentalloc_device at ffffffffa0923bb0 [vxfs]
vx_extentalloc at ffffffffa0925272 [vxfs]
vx_bmap_ext4 at ffffffffa0953755 [vxfs]
vx_bmap_alloc_ext4 at ffffffffa0953e14 [vxfs]
vx_bmap_alloc at ffffffffa0950a2a [vxfs]
vx_write_alloc3 at ffffffffa09c281e [vxfs]
vx_tran_write_alloc at ffffffffa09c3321 [vxfs]
vx_cfs_prealloc at ffffffffa09b4220 [vxfs]
vx_write_alloc2 at ffffffffa09c25ad [vxfs]
vx_write_alloc at ffffffffa0b708fb [vxfs]
vx_write1 at ffffffffa0b712ff [vxfs]
vx_write_common_slow at ffffffffa0b7268c [vxfs]
vx_write_common at ffffffffa0b73857 [vxfs]
vx_write at ffffffffa0af04a6 [vxfs]
vfs_write at ffffffff811ded3d
sys_write at ffffffff811df7df
system_call_fastpath at ffffffff81646b09

A corresponding delegation receiver thread should be seen looping on the CFS
primary:
 
PID: 18958  TASK: ffff88006c776780  CPU: 0   COMMAND: "vx_msg_thread"
__schedule at ffffffff8163a26d
mutex_lock at ffffffff81638b42
vx_emap_lookup at ffffffffa0ecf0eb [vxfs]
vx_extchkmaps at ffffffffa0e9c7b4 [vxfs]
vx_searchau_downlevel at ffffffffa0ea0923 [vxfs]
vx_searchau at ffffffffa0ea0e22 [vxfs]
vx_dele_get_freespace at ffffffffa0f53b6d [vxfs]
vx_getedele_size at ffffffffa0f54c4b [vxfs]
vx_pri_getdele at ffffffffa0f54edc [vxfs]
vx_recv_getdele at ffffffffa0f5690d [vxfs]
vx_recvdele at ffffffffa0f59800 [vxfs]
vx_msg_process_thread at ffffffffa0f2aead [vxfs]
vx_kthread_init at ffffffffa105cba4 [vxfs]
kthread at ffffffff810a5aef
ret_from_fork at ffffffff81645858

DESCRIPTION:
In the VxFS 7.2 release, a performance optimization was made in the way the
allocation state map is updated. Prior to 7.2 the updates were done
synchronously, and in 7.2 the updates were made transactional. The hang happens
when an EAU needs to be converted back to the FREE state after all the
allocations from it are freed. In such a case, if the corresponding EAU
delegation times out before the process completes, it can result in an
inconsistent state. Because of this inconsistency, when a node later tries to
get the delegation of this EAU, the primary may loop forever, causing the
secondary to wait infinitely for the EAU delegation.

RESOLUTION:
The code is fixed so that the delegation is not allowed to time out until the
free processing is complete.
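
A minimal C sketch of the idea (illustrative structure; the actual mechanism in
VxFS is not shown here): the delegation timeout path refuses to revoke while
free processing is still converting the EAU back to FREE.

#include <stdbool.h>

struct eau_dele {
    bool free_processing;   /* conversion back to FREE in progress */
    bool held;              /* node currently holds the delegation */
};

static bool dele_timeout_try_revoke(struct eau_dele *d)
{
    if (d->free_processing)
        return false;       /* pin the delegation until conversion is done */
    d->held = false;
    return true;
}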

* 3913004 (Tracking ID: 3911048)

SYMPTOM:
The LDH bucket validation failure message is logged and the system hangs.

DESCRIPTION:
When modifying a large directory, vxfs needs to find a new bucket in the LDH
for the directory, and once a bucket is full, it is split to get more buckets
to use. When a bucket has been split the maximum number of times, an overflow
bucket is allocated. Under some conditions, the lookup for an available bucket
in the overflow bucket may get an incorrect result and overwrite the existing
bucket entry, thus corrupting the LDH file. Another problem is that when the
bucket invalidation fails, the bucket buffer is released without checking
whether the buffer is already part of a previous transaction; this may cause
the transaction flush thread to hang and finally stall the whole filesystem.

RESOLUTION:
Correct the LDH bucket entry change code to avoid the corruption, and release
the bucket buffer without throwing it out of memory, to avoid blocking the
transaction flush.

* 3914384 (Tracking ID: 3915578)

SYMPTOM:
The vxfs module fails to load after reboot, with the following dmesg log:
[    3.689385] systemd-sysv-generator[518]: Overwriting existing symlink
/run/systemd/generator.late/vxfs.service with real service.
[    4.738143] vxfs: Unknown symbol ki_get_boot (err 0)
[    4.793301] vxfs: Unknown symbol ki_get_boot (err 0)

DESCRIPTION:
The vxfs module depends on the veki module. During boot, veki is not loaded
before vxfs, so the vxfs module fails to load.

RESOLUTION:
During boot, the veki module should be started before the vxfs module is
loaded.



INSTALLING THE PATCH
--------------------
Manual installation is not recommended.


REMOVING THE PATCH
------------------
Manual uninstallation is not recommended.


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE