infoscale-sles12_x86_64-Patch-7.4.2.3400

 Basic information
Release type: Patch
Release date: 2023-07-12
OS update support: None
Technote: None
Documentation: None
Download size: 201.65 MB
Checksum: 941452599

 Applies to one or more of the following products:
InfoScale Availability 7.4.2 On SLES12 x86-64
InfoScale Enterprise 7.4.2 On SLES12 x86-64
InfoScale Foundation 7.4.2 On SLES12 x86-64
InfoScale Storage 7.4.2 On SLES12 x86-64

 Obsolete patches, incompatibilities, superseded patches, or other requirements:
None.

 Fixes the following incidents:
4011971, 4012765, 4013169, 4013420, 4014720, 4015287, 4015834, 4015835, 4016721, 4017282, 4017818, 4017820, 4018173, 4018178, 4018182, 4019877, 4020055, 4020056, 4020207, 4020438, 4020912, 4021238, 4021240, 4021346, 4021366, 4023095, 4031342, 4037283, 4037288, 4040238, 4040608, 4040612, 4040618, 4042038, 4042686, 4044184, 4046265, 4046266, 4046267, 4046271, 4046272, 4046829, 4046906, 4046907, 4046908, 4047568, 4047592, 4047695, 4047722, 4048120, 4049091, 4049097, 4049268, 4050870, 4051703, 4052119, 4054311, 4056329, 4056919, 4058873, 4060549, 4060566, 4060585, 4060805, 4060839, 4060962, 4060966, 4061004, 4061036, 4061055, 4061057, 4061203, 4061298, 4061317, 4061509, 4061527, 4062461, 4062577, 4062746, 4062747, 4062751, 4062755, 4063374, 4064523, 4066930, 4067706, 4067710, 4067712, 4067713, 4067715, 4067717, 4067914, 4067915, 4069522, 4069523, 4069524, 4070099, 4070186, 4070253, 4071105, 4071131, 4072874, 4074298, 4075873, 4075875, 4079532, 4083792, 4083948, 4084881, 4086043, 4088078, 4089394, 4090311, 4090411, 4090415, 4090442, 4090541, 4090573, 4090599, 4090600, 4090601, 4090604, 4090617, 4090639, 4090932, 4090946, 4090960, 4090970, 4091248, 4091580, 4091588, 4091910, 4091911, 4091912, 4091963, 4091989, 4092002, 4093306, 4099550, 4102424, 4106001, 4106702, 4108085

 Patch ID:
VRTSaslapm-7.4.2.3700-SLES12
VRTSvxvm-7.4.2.3700-SLES12
VRTSvxfs-7.4.2.4100-SLES12
VRTSpython-3.7.4.38-SLES12

Readme file
                          * * * READ ME * * *
                      * * * InfoScale 7.4.2 * * *
                         * * * Patch 3400 * * *
                         Patch Date: 2023-02-22


This document provides the following information:

   * PATCH NAME
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * SUMMARY OF INCIDENTS FIXED BY THE PATCH
   * DETAILS OF INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
InfoScale 7.4.2 Patch 3400


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
SLES12 x86-64


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSaslapm
VRTSpython
VRTSvxfs
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * InfoScale Availability 7.4.2
   * InfoScale Enterprise 7.4.2
   * InfoScale Foundation 7.4.2
   * InfoScale Storage 7.4.2


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: VRTSvxvm-7.4.2.3700
* 4106001 (4102501) A security vulnerability exists in the third-party component libcurl.
Patch ID: VRTSvxvm-7.4.2.3600
* 4052119 (4045871) vxconfigd crashed at ddl_get_disk_given_path.
* 4086043 (4072241) vxdiskadm functionality is failing due to changes in dmpdr script
* 4090311 (4039690) Change the logger files size and do the gzip on logger files.
* 4090411 (4054685) In a CVR environment, RVG recovery hangs on Linux platforms.
* 4090415 (4071345) Unplanned fallback synchronisation is unresponsive
* 4090442 (4078537) Connection to s3-fips bucket is failing
* 4090541 (4058166) Increase DCM log size based on volume size without exceeding region size limit of 4mb.
* 4090599 (4080897) Performance drop on raw VxVM volume in RHEL 8.x compared to RHEL7.X
* 4090604 (4044529) DMP is unable to display PWWN details for some LUNs by "vxdmpadm getportids".
* 4090932 (3996634) System boots slowly because the Linux lsblk command takes a long time to return.
* 4090946 (4023297) Smartmove functionality was not being used after VVR Rlink was paused and resumed during VVR initial sync or DCM resync operation.
* 4090960 (4087770) NBFS: Data corruption due to skipped full-resync of detached mirrors of volume after DCO repair operation
* 4090970 (4017036) After enabling DMP (Dynamic Multipathing) Native support, enable /boot to be
mounted on DMP device when Linux is booting with systemd.
* 4091248 (4040808) df command hung in clustered environment
* 4091588 (3966157) SRL batching feature is broken
* 4091910 (4090321) Increase timeout for vxvm-boot systemd service
* 4091911 (4090192) Increase number of DDL threads for faster discovery
* 4091912 (4090234) Volume Manager Boot service fails after the system is rebooted.
* 4091963 (4067191) In CVR environment after rebooting Slave node, Master node may panic
* 4091989 (4090930) [NBFS-3.1]: MASTER FS corruption is seen in loop reboot (-f) test
* 4092002 (4081740) vxdg flush command is slow because too many LUNs needlessly access /proc/partitions.
* 4099550 (4065145) multivolume and vset not able to overwrite encryption tags on secondary.
* 4102424 (4103350) vxvm-encrypt.service goes into a failed state on the secondary site when the "vradmin -g <dg> -encrypted addsec <rvg> <prim_ip> <sec_ip>" command is run.
Patch ID: VRTSvxvm-7.4.2.3300
* 4083792 (4082799) A security vulnerability exists in the third-party component libcurl.
Patch ID: VRTSvxvm-7.4.2.3200
* 4011971 (3991668) In a Veritas Volume Replicator (VVR) configuration where secondary logging is enabled, data inconsistency is reported after the "No IBC message arrived" error is encountered.
* 4013169 (4011691) High CPU consumption on the VVR secondary nodes because of high pending IO load.
* 4037288 (4034857) VxVM support on SLES 15 SP2
* 4048120 (4031452) vxesd core dump in esd_write_fc()
* 4051703 (4010794) When storage activity was going on, Veritas Dynamic Multi-Pathing (DMP) caused system panic in a cluster.
* 4052119 (4045871) vxconfigd crashed at ddl_get_disk_given_path.
* 4054311 (4040701) Some warnings are observed while installing vxvm package.
* 4056329 (4056156) VxVM Support for SLES15 Sp3
* 4056919 (4056917) Import of disk group in Flexible Storage Sharing (FSS) with missing disks can lead to data corruption.
* 4058873 (4057526) Adding check for init while accessing /var/lock/subsys/ path in vxnm-vxnetd.sh script.
* 4060839 (3975667) Softlock in vol_ioship_sender kernel thread
* 4060962 (3915202) Reporting repeated disk failures & DCPA events for other internal disks
* 4060966 (3959716) System may panic with sync replication with VVR configuration, when the RVG is in DCM mode.
* 4061004 (3993242) vxsnap prepare command when run on vset sometimes fails.
* 4061036 (4031064) Master switch operation is hung in VVR secondary environment.
* 4061055 (3999073) The file system corrupts when the cfsmount group goes into offline state.
* 4061057 (3931583) Node may panic while unloading the vxio module due to race condition.
* 4061298 (3982103) I/O hang is observed in VVR.
* 4061317 (3925277) DLE (Dynamic Lun Expansion) of single path GPT disk may corrupt disk public region.
* 4061509 (4043337) logging fixes for VVR
* 4062461 (4066785) create new option usereplicatedev=only to import the replicated LUN only.
* 4062577 (4062576) hastop -local never finishes on Rhel8.4 and RHEL8.5 servers with latest minor kernels due to hang in vxdg deport command.
* 4062746 (3992053) Data corruption may happen with layered volumes due to some data not re-synced while attaching a plex.
* 4062747 (3943707) vxconfigd reconfig hang when joining a cluster
* 4062751 (3989185) In a Veritas Volume Replicator (VVR) environment, the vxrecover command can hang.
* 4062755 (3978453) Reconfig hang during master takeover
* 4063374 (4005121) Application IOPS drop in DCM mode with DCO-integrated DCM
* 4064523 (4049082) I/O read error is displayed when remote FSS node rebooting.
* 4066930 (3951527) Data loss on DR site seen while upgrading from Infoscale 7.3.1 or before to 7.4.x or later versions.
* 4067706 (4060462) Nidmap information is not cleared after a node leaves, resulting in add node failure subsequently.
* 4067710 (4064208) Node failed to join the existing cluster after bits are upgraded to a newer version.
* 4067712 (3868140) VVR primary site node might panic if the rlink disconnects while some data is getting replicated to secondary.
* 4067713 (3997531) Fail to start the VVR replication as vxnetd threads are not running
* 4067715 (4008740) Access to freed memory
* 4067717 (4009151) Auto-import of diskgroup on system reboot fails with error 'Disk for diskgroup not found'.
* 4067914 (4037757) Add a tunable to control auto start VVR services on boot up.
* 4067915 (4059134) Resync takes too long on raid-5 volume
* 4069522 (4043276) vxattachd is onlining previously offlined disks.
* 4069523 (4056751) Import read only cloned disk corrupts private region
* 4069524 (4056954) Vradmin addsec failures when encryption is enabled over wire
* 4070099 (3159650) Implemented vol_vvr_use_nat tunable support for vxtune.
* 4070186 (4041822) In an SRDF/Metro array setup, the last path is in the enabled state even after all the host and the array-side switch ports are disabled.
* 4070253 (3911930) Provide a way to clear the PGR_FLAG_NOTSUPPORTED flag on the device instead of using exclude and include commands.
* 4071131 (4071605) A security vulnerability exists in the third-party component libxml2.
* 4072874 (4046786) FS becomes NOT MOUNTED after powerloss/poweron on all nodes.
Patch ID: VRTSvxvm-7.4.2.2200
* 4018173 (3852146) A shared disk group (DG) fails to be imported when "-c" and "-o noreonline" are specified together.
* 4018178 (3906534) After Dynamic Multi-Pathing (DMP) Native support is enabled, /boot should be mounted on the DMP device.
* 4031342 (4031452) vxesd core dump in esd_write_fc()
* 4037283 (4021301) Data corruption issue observed in VxVM on RHEL8.
* 4042038 (4040897) Add support for HPE MSA 2060 arrays in the current ASL.
* 4046906 (3956607) A core dump occurs when you run the vxdisk reclaim command.
* 4046907 (4041001) In a VxVM environment, a system hangs when some nodes are rebooted.
* 4046908 (4038865) System panic in the vxdmp module in IRQ stack.
* 4047592 (3992040) bi_error - bi_status conversion map added for proper interpretation of errors at FS side.
* 4047695 (3911930) Provide a way to clear the PGR_FLAG_NOTSUPPORTED flag on the device instead of using exclude and include commands.
* 4047722 (4023390) vxconfigd keeps dumping core because of an invalid private region offset on a disk.
* 4049268 (4044583) A system goes into the maintenance mode when DMP is enabled to manage native devices.
Patch ID: VRTSvxvm-7.4.2.1500
* 4018182 (4008664) System panics when a signal is sent to a vxlogger daemon that has ended.
* 4020207 (4018086) system hang was observed when RVG was in DCM resync with SmartMove as ON.
* 4020438 (4020046) DRL log plex gets detached unexpectedly.
* 4021238 (4008075) With the ASL changes for NVMe, the machine panicked on every reboot, resulting in a reboot loop.
* 4021240 (4010612) On NVMe and SSD devices, every disk has a separate enclosure (nvme0, nvme1, and so on), so the disk names become hostprefix_enclosurname0_disk0, hostprefix_enclosurname1_disk0, and so on.
* 4021346 (4010207) System panicked due to hard-lockup due to a spinlock not released properly during the vxstat collection.
* 4021366 (4008741) VxVM device files are not correctly labeled to prevent unauthorized modification - device_t
* 4023095 (4007920) Control auto snapshot deletion when cache obj is full.
Patch ID: VRTSpython-3.7.4.38
* 4108085 (4108043) Need to remove multiple rest modules from VRTSpython for 7.4.2
Patch ID: VRTSvxfs-7.4.2.4100
* 4106702 (4106701) A security vulnerability exists in the third-party component sqlite.
Patch ID: VRTSvxfs-7.4.2.3900
* 4050870 (3987720) vxms test is having failures.
* 4071105 (4067393) Panic "BUG: unable to handle kernel NULL pointer dereference at 00000000000009e0."
* 4074298 (4069116) fsck got stuck in pass1 inode validation.
* 4075873 (4075871) Utility to find possible pending stuck messages.
* 4075875 (4018783) Metasave collection and restore takes significant amount of time.
* 4084881 (4084542) Enhance fsadm defrag report to display if FS is badly fragmented.
* 4088078 (4087036) The fsck binary has been updated to fix a failure while running with the "-o metasave" option on a shared volume.
* 4090573 (4056648) Metasave collection can be executed on a mounted filesystem.
* 4090600 (4090598) Utility to detect culprit nodes while cfs hang is observed.
* 4090601 (4068143) fsck->misc is having failures.
* 4090617 (4070217) Command fsck might fail with 'cluster reservation failed for volume' message for a disabled cluster-mounted filesystem.
* 4090639 (4086084) VxFS mount operation causes system panic.
* 4091580 (4056420) VFR  Hardlink file is not getting replicated after modification in incremental sync.
* 4093306 (4090127) CFS hang in vx_searchau().
Patch ID: VRTSvxfs-7.4.2.3600
* 4089394 (4089392) Security vulnerabilities exist in the OpenSSL third-party components used by VxFS.
Patch ID: VRTSvxfs-7.4.2.3500
* 4083948 (4070814) Security Vulnerability in VxFS third party component Zlib
Patch ID: VRTSvxfs-7.4.2.3400
* 4079532 (4079869) Security Vulnerability in VxFS third party components
Patch ID: VRTSvxfs-7.4.2.2600
* 4015834 (3988752) Use ldi_strategy() routine instead of bdev_strategy() for IO's in solaris.
* 4040612 (4033664) Multiple different issues occur with hardlink replication using VFR.
* 4040618 (4040617) Veritas file replicator is not performing as per the expectation.
* 4060549 (4047921) Replication job getting into hung state when pause/resume operations performed repeatedly.
* 4060566 (4052449) Cluster goes in an 'unresponsive' mode while invalidating pages due to duplicate page entries in iowr structure.
* 4060585 (4042925) Intermittent Performance issue on commands like df and ls.
* 4060805 (4042254) A new feature has been added in vxupgrade which fails disk-layout upgrade if sufficient space is not available in the filesystem.
* 4061203 (4005620) Internal counter of inodes from Inode Allocation Unit (IAU) can be negative if IAU is marked bad.
* 4061527 (4054386) If systemd service fails to load vxfs module, the service still shows status as active instead of failed.
Patch ID: VRTSvxfs-7.4.2.2200
* 4013420 (4013139) The abort operation on an ongoing online migration from the native file system to VxFS on RHEL 8.x systems.
* 4040238 (4035040) vfradmin stats command failed to show all the fields in the command output when a job is paused and resumed.
* 4040608 (4008616) fsck command got hung.
* 4042686 (4042684) ODM resize fails for size 8192.
* 4044184 (3993140) Compclock was not giving accurate results.
* 4046265 (4037035) Added new tunable "vx_ninact_proc_threads" to control the number of inactive processing threads.
* 4046266 (4043084) panic in vx_cbdnlc_lookup
* 4046267 (4034910) Asynchronous access or update of the global list large_dirinfo can corrupt its values in multi-threaded execution.
* 4046271 (3993822) fsck stops running on a file system
* 4046272 (4017104) Deleting a lot of files can cause resource starvation, causing panic or momentary hangs.
* 4046829 (3993943) The fsck utility hit the coredump due to segmentation fault in get_dotdotlst()
* 4047568 (4046169) On RHEL8, while doing a directory move from one FS (ext4 or vxfs) to migration VxFS, the migration can fail and the FS will be disabled.
* 4049091 (4035057) On RHEL8, IOs done on FS, while other FS to VxFS migration is in progress can cause panic.
* 4049097 (4049096) Dalloc change ctime in background while extent allocation
Patch ID: VRTSvxfs-7.4.2.1600
* 4012765 (4011570) WORM attribute replication support in VxFS.
* 4014720 (4011596) Multiple issues were observed during glmdump using hacli for communication
* 4015287 (4010255) "vfradmin promote" fails to promote target FS with selinux enabled.
* 4015835 (4015278) System panics during vx_uiomove_by_hand.
* 4016721 (4016927) For multi cloud tier scenario, system panic with NULL pointer dereference when we try to remove second cloud tier
* 4017282 (4016801) filesystem mark for fullfsck
* 4017818 (4017817) VFR performance enhancement changes.
* 4017820 (4017819) Adding cloud tier operation fails while trying to add AWS GovCloud.
* 4019877 (4019876) Remove license library dependency from vxfsmisc.so library
* 4020055 (4012049) Documented "metasave" option and added one new option in fsck binary.
* 4020056 (4012049) Documented "metasave" option and added one new option in fsck binary.
* 4020912 (4020758) Filesystem mount or fsck with -y may see hang during log replay


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: VRTSvxvm-7.4.2.3700

* 4106001 (Tracking ID: 4102501)

SYMPTOM:
A security vulnerability exists in the third-party component libcurl.

DESCRIPTION:
VxVM uses a third-party component named libcurl in which a security vulnerability exists.

RESOLUTION:
VxVM is updated to use a newer version of libcurl in which the security vulnerability has been addressed.

Patch ID: VRTSvxvm-7.4.2.3600

* 4052119 (Tracking ID: 4045871)

SYMPTOM:
vxconfigd crashed at ddl_get_disk_given_path with the following stack:
ddl_get_disk_given_path
ddl_reconfigure_all
ddl_find_devices_in_system
find_devices_in_system
mode_set
setup_mode
startup
main
_start

DESCRIPTION:
In some situations, duplicate paths can be added to one dmpnode in vxconfigd. If the duplicate paths are then removed, an empty path entry can be left behind for that dmpnode. When vxconfigd later accesses the empty path entry, it crashes due to a NULL pointer dereference.

RESOLUTION:
Code changes have been made to prevent duplicate paths from being added.

* 4086043 (Tracking ID: 4072241)

SYMPTOM:
-bash-5.1# /usr/lib/vxvm/voladm.d/bin/dmpdr
Dynamic Reconfiguration Operations

WARN: Please Do not Run any Device Discovery Operations outside the Tool during Reconfiguration operations
INFO: The logs of current operation can be found at location /var/log/vx/dmpdr_20220420_1042.log
ERROR: Failed to open lock file for /usr/lib/vxvm/voladm.d/bin/dmpdr, No such file or directory. Exit.

Exiting the Current DMP-DR Run of the Tool

DESCRIPTION:
The VxVM log location was changed for Linux, which impacted vxdiskadm functionality on Solaris.

RESOLUTION:
The required changes have been made so that the code works across platforms.

* 4090311 (Tracking ID: 4039690)

SYMPTOM:
The logger file size needs to be increased to collect the large volume of logs generated on the system.

DESCRIPTION:
The logger file size limit is doubled, and the on-disk footprint of the logs is reduced by compressing the logger files with gzip.

RESOLUTION:
The required code changes have been made for this enhancement.

* 4090411 (Tracking ID: 4054685)

SYMPTOM:
RVG recovery hangs during reconfiguration scenarios in CVR environments, causing vx commands to hang on the master node.

DESCRIPTION:
As part of RVG recovery, DCM recovery and data volume recovery are performed. Data volume recovery takes a long time because of incorrect IOD handling on Linux platforms.

RESOLUTION:
The IOD handling mechanism has been fixed to resolve the RVG recovery hang.

* 4090415 (Tracking ID: 4071345)

SYMPTOM:
Replication is unresponsive after the failed site comes back up.

DESCRIPTION:
Autosync and unplanned fallback synchronisation had issues when an RVG contained a mix of cloud and non-cloud volumes.
After a cloud volume was found, the rest of the volumes were ignored for synchronisation.

RESOLUTION:
Fixed the condition so that it iterates over all volumes.

* 4090442 (Tracking ID: 4078537)

SYMPTOM:
When a connection to an s3-fips bucket is made, the following error messages are observed:
2022-05-31 03:53:26 VxVM ERROR V-5-1-19512 amz_request_perform: PUT request failed, url: https://s3-fips.us-east-2.amazonaws.com/fipstier334f3956297c8040078280000d91ab70a/2.txt_27ffff625eff0600442d000013ffff5b_999_7_1077212296_0_1024_38, errno 11
2022-05-31 03:53:26 VxVM ERROR V-5-1-19333 amz_upload_object: amz_request_perform failed for obj:2.txt_27ffff625eff0600442d000013ffff5b_999_7_1077212296_0_1024_38
2022-05-31 03:53:26 VxVM WARNING V-5-1-19752 Try upload_object(fipstier334f3956297c8040078280000d91ab70a/2.txt_27ffff625eff0600442d000013ffff5b_999_7_1077212296_0_1024_38) again, number of requests attempted: 3.
2022-05-31 03:53:26 VxVM ERROR V-5-1-19358 curl_send_request: curl_easy_perform() failed: Couldn't resolve host name
2022-05-31 03:53:26 VxVM ERROR V-5-1-0 curl_request_perform: Error in curl_request_perform 6
2022-05-31 03:53:26 VxVM ERROR V-5-1-19357 curl_request_perform: curl_send_request failed with error: 6

DESCRIPTION:
For s3-fips bucket endpoints, AWS has made it mandatory to connect by using the virtual-hosted style method instead of the path-style method that InfoScale currently uses.

RESOLUTION:
Code changes have been made to send cloud requests to the s3-fips bucket successfully.
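
The difference between the two addressing styles can be sketched as follows. This is an illustration only, not InfoScale code: the function names, the bucket name "example-bucket" and the object key are hypothetical. Only the endpoint host is taken from the log messages above.

```python
# Illustrative sketch: how the two S3 addressing styles place the bucket name.
# Function names, bucket and key are hypothetical, not part of InfoScale.

def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    """Path-style: the bucket is the first component of the URL path."""
    return f"https://{endpoint}/{bucket}/{key}"


def virtual_hosted_url(endpoint: str, bucket: str, key: str) -> str:
    """Virtual-hosted style: the bucket is a prefix of the host name."""
    return f"https://{bucket}.{endpoint}/{key}"


if __name__ == "__main__":
    endpoint = "s3-fips.us-east-2.amazonaws.com"
    print(path_style_url(endpoint, "example-bucket", "2.txt"))
    print(virtual_hosted_url(endpoint, "example-bucket", "2.txt"))
```

In the virtual-hosted form the bucket name becomes part of the host name that the client has to resolve, so both the request URL and the host name used for the connection change.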

* 4090541 (Tracking ID: 4058166)

SYMPTOM:
While setting up VVR/CVR on large size data volumes (size > 3TB) with filesystems mounted on them, initial autosync operation takes a lot of time to complete.

DESCRIPTION:
While performing autosync on a VVR/CVR setup for a volume with a filesystem mounted, if the smartmove feature is enabled, the operation does smartsync by syncing only the regions dirtied by the filesystem instead of syncing the entire volume, which completes faster than the normal case. However, for large volumes (size > 3TB), the smartmove feature does not get enabled even with a filesystem mounted on them, and hence the autosync operation syncs the entire volume.

This behaviour is caused by the smaller DCM plexes allocated for such large volumes. As a result, autosync ends up performing a complete volume sync, which takes much longer to complete.

RESOLUTION:
Increase the limit of DCM plex size (loglen) beyond 2MB so that smart move feature can be utilised properly.

* 4090599 (Tracking ID: 4080897)

SYMPTOM:
Observed Performance drop on raw VxVM volume in RHEL 8.x compared to RHEL7.X

DESCRIPTION:
The file_operations used for character devices changed between the RHEL 7.x and RHEL 8.x releases. In RHEL 7.x, the aio_read and aio_write function pointers are implemented, whereas in RHEL 8.x these have changed to read_iter and write_iter respectively. For RHEL 8.x, the VxVM code called generic_file_write_iter(). The problem is that this function takes an inode lock, and in multi-threaded write operations this semaphore effectively serializes I/O submission, which degrades performance.

RESOLUTION:
Use of __generic_file_write_iter() function helps to resolve the issue and vxvm_generic_write_sync() function is implemented which handles the SYNCing part of the write similar to functions like blkdev_write_iter() and generic_file_write_iter().

* 4090604 (Tracking ID: 4044529)

SYMPTOM:
DMP is unable to display PWWN details for some LUNs by "vxdmpadm getportids".

DESCRIPTION:
The udev rules file (/usr/lib/udev/rules.d/63-fc-wwpn-id.rules) in newer RHEL releases generates an additional hardware path for an FC device, so there are two hardware paths for the same device. However, the vxpath_links script considers only a single hardware path for an FC device. When there are two hardware paths, vxpath_links may not treat the device as an FC device and therefore fails to populate the PWWN-related information.

RESOLUTION:
Code changes have been made so that vxpath_links correctly detects an FC device even when there are multiple hardware paths.

* 4090932 (Tracking ID: 3996634)

SYMPTOM:
A system with a large number of LUNs managed by VxDMP takes a long time to boot.

DESCRIPTION:
When a system has a large number of LUNs managed by VxDMP that are mounted on a primary partition or formatted with a file system, the DMP device is removed during boot and a udev event is triggered against the OS device. The OS device was then read by using the lsblk command. The lsblk command is slow, and when it is issued against multiple devices in parallel it may get stuck, so the system takes a long time to boot.

RESOLUTION:
Code has been changed to read the OS device from blkid command rather than lsblk command.

* 4090946 (Tracking ID: 4023297)

SYMPTOM:
Smartmove functionality was not being used after VVR Rlink was paused and resumed during VVR initial sync or DCM resync operation. This was resulting in more data transfer to VVR secondary site than needed.

DESCRIPTION:
The transactions for VVR pause and resume operations were being considered as phases after which smartmove is not necessary to be used. This was resulting in smartmove not being used after the resume operation.

RESOLUTION:
Fixed the condition so that smartmove continues to work beyond pause/resume operations.

* 4090960 (Tracking ID: 4087770)

SYMPTOM:
Data corruption post mirror attach operation seen after complete storage fault for DCO volumes.

DESCRIPTION:
DCO (data change object) tracks delta changes for faulted mirrors. During complete storage loss of the DCO volume mirrors, the DCO object is marked as BADLOG and becomes unusable for bitmap tracking.
After storage reconnects (such as a node rejoin in FSS environments), the DCO is repaired for subsequent tracking. If VxVM finds any detached mirrors of data volumes during this repair, they are expected to be marked for full-resync because the bitmap in the DCO has no valid information. A bug in the repair DCO operation logic prevented mirrors from being marked for full-resync when the repair DCO operation was triggered before the data volume was started. As a result, the mirror was attached without any data being copied from the good mirrors, so reads serviced from such mirrors returned stale data, resulting in file-system corruption and data loss.

RESOLUTION:
Code has been added to ensure repair DCO operation is performed only if volume object is enabled so as to ensure detached mirrors are marked for full-resync appropriately.

* 4090970 (Tracking ID: 4017036)

SYMPTOM:
After enabling DMP (Dynamic Multipathing) Native support, enable /boot to be
mounted on DMP device when Linux is booting with systemd.

DESCRIPTION:
Currently /boot is mounted on top of the OS (Operating System) device. When DMP
Native support is enabled, only VGs (Volume Groups) are migrated from the OS
device to the DMP device. This is the reason /boot is not migrated to the DMP
device. As a result, if the OS device path is not available, the system becomes
unbootable because /boot is not available. Thus it becomes necessary to mount
/boot on the DMP device to provide multipathing and resiliency.
The current fix works only on configurations with a single boot partition.

RESOLUTION:
Code changes have been done to migrate /boot on top of DMP device when DMP
Native support is enabled and when Linux is booting with systemd.

* 4091248 (Tracking ID: 4040808)

SYMPTOM:
df command hung in clustered environment

DESCRIPTION:
The df command hung in a clustered environment because DRL updates were not completing, which caused application I/Os to hang.

RESOLUTION:
A fix has been added to complete the in-core DRL updates and drive the corresponding application I/Os.

* 4091588 (Tracking ID: 3966157)

SYMPTOM:
The SRL batching feature was broken and could not be enabled because it might cause problems.

DESCRIPTION:
Updates need to be batched so that multiple updates are processed together and performance improves.

RESOLUTION:
The design has been simplified: each small update within a batch is now padded to a 4K boundary. Because every update is 4K-aligned, the batch as a whole is also 4K-aligned by default. No bookkeeping is therefore needed for the last update in a batch, which reduces the overhead of the associated calculations.
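
The padding arithmetic described above can be sketched as follows. This is a minimal illustration, assuming that updates are laid out back to back in a batch. The function names and the sample update sizes are hypothetical and do not come from the VVR code.

```python
# Illustrative sketch of padding each update in a batch to a 4K boundary.
# Names and sample sizes are hypothetical, not taken from the VVR source.

ALIGN = 4096  # 4K alignment unit


def padded_size(nbytes: int, align: int = ALIGN) -> int:
    """Round nbytes up to the next multiple of align."""
    return (nbytes + align - 1) // align * align


def batch_layout(update_sizes):
    """Return ([(offset, padded_length), ...], total_batch_length)."""
    layout = []
    offset = 0
    for size in update_sizes:
        plen = padded_size(size)
        layout.append((offset, plen))
        offset += plen
    return layout, offset


if __name__ == "__main__":
    layout, total = batch_layout([512, 4096, 5000])
    print(layout, total)
```

Because every padded length is a multiple of 4096, every offset and the batch total are multiples of 4096 as well, which is why the last update needs no special handling.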

* 4091910 (Tracking ID: 4090321)

SYMPTOM:
vxvm-boot service startup failure

DESCRIPTION:
The vxvm-boot service takes a long time to start and times out. With a larger number of devices, device discovery takes longer to finish.

RESOLUTION:
Increase timeout for service so that discovery gets more time to finish

* 4091911 (Tracking ID: 4090192)

SYMPTOM:
vxvm-boot service startup failure

DESCRIPTION:
The vxvm-boot service takes a long time to start and times out. With a larger number of devices, device discovery takes longer to finish.

RESOLUTION:
Increase device discovery threads to range of 128 to 256 depending on CPUs available on system

* 4091912 (Tracking ID: 4090234)

SYMPTOM:
The vxvm-boot service takes a long time to start and times out in setups with a large number of LUNs.

DESCRIPTION:
The device discovery layer and InfiniBand devices (default 120s) take a long time to discover
the devices, which causes the Volume Manager service to time out.
Messages logged:
Jul 28 19:52:52 nb-appliance vxvm-boot[17711]: VxVM general startup...
Jul 28 19:57:51 nb-appliance systemd[1]: vxvm-boot.service: start operation timed out. Terminating.
Jul 28 19:57:51 nb-appliance vxvm-boot[17711]: Terminated
Jul 28 19:57:51 nb-appliance systemd[1]: vxvm-boot.service: Control process exited, code=exited status=100
Jul 28 19:59:22 nb-appliance systemd[1]: vxvm-boot.service: State 'stop-final-sigterm' timed out. Killing.
Jul 28 19:59:23 nb-appliance systemd[1]: vxvm-boot.service: Killing process 209714 (vxconfigd) with signal SIGKILL.
Jul 28 20:00:30 nb-appliance systemd[1]: vxvm-boot.service: Failed with result 'timeout'.
Jul 28 20:00:30 nb-appliance systemd[1]: Failed to start VERITAS Volume Manager Boot service.

RESOLUTION:
Completed required changes to fix this issue.

NBA:
https://jira.community.veritas.com/browse/STESC-7281

Flex:
https://jira.community.veritas.com/browse/FLEX-7003

The vxvm-boot service timeout is suspected to have multiple causes:
1. The OS takes a long time to discover devices.
2. vxvm-startup sleeps for 120 seconds when InfiniBand devices or controllers are present in the setup.

Issue1:
The vxvm-boot service takes a long time to start and times out. The main issue lies in the device discovery layer, which takes more time.
There are suspected issues on the OS side as well, where the OS has also been seen to take a long time to discover devices.

http://codereview.engba.veritas.com/r/42003/

Issue2:
Earlier, vxvm-startup slept for 120 seconds whenever InfiniBand devices or controllers were present. Now it sleeps for 120 seconds only if InfiniBand devices are claimed by an ASL.

* 4091963 (Tracking ID: 4067191)

SYMPTOM:
In CVR environment after rebooting Slave node, Master node may panic with below stack:

Call Trace:
dump_stack+0x66/0x8b
panic+0xfe/0x2d7
volrv_free_mu+0xcf/0xd0 [vxio]
vol_ru_free_update+0x81/0x1c0 [vxio]
volilock_release_internal+0x86/0x440 [vxio]
vol_ru_free_updateq+0x35/0x70 [vxio]
vol_rv_write2_done+0x191/0x510 [vxio]
voliod_iohandle+0xca/0x3d0 [vxio]
wake_up_q+0xa0/0xa0
voliod_iohandle+0x3d0/0x3d0 [vxio]
voliod_loop+0xc3/0x330 [vxio]
kthread+0x10d/0x130
kthread_park+0xa0/0xa0
ret_from_fork+0x22/0x40

DESCRIPTION:
As part of a CVM Master switch, an rvg_recovery is triggered. In this step, a race
condition can occur between the VVR objects, due to which the object value
is not updated properly and can cause a panic.

RESOLUTION:
Code changes are done to handle the race condition between VVR objects.

* 4091989 (Tracking ID: 4090930)

SYMPTOM:
Relocation of failed data disk of mirror volume leads to data corruption.

DESCRIPTION:
The existing volume had another faulted mirror, and the detached mirror was being tracked in the detach map of the data change object (DCO). At the same time, the VxVM relocation daemon decided to relocate another failed disk of the volume. This was expected to be a full copy of the data. Due to a bug in the relocation code, the relocation operation was allowed even when the volume was in the DISABLED state. When the volume became ENABLED, the task that copies data to the new mirror incorrectly used the detach map instead of a full sync, resulting in data loss for the new mirror.

RESOLUTION:
Code has been changed to block triggering relocation of disks when top-level volume is not in ENABLED state.



* 4092002 (Tracking ID: 4081740)

SYMPTOM:
The vxdg flush command is slow because /proc/partitions is accessed needlessly for a large number of LUNs.

DESCRIPTION:
Linux BLOCK_EXT_MAJOR (block major 259) is used as the extended devt for block devices. When the partition number of a device is greater than 15, the partition device is assigned under major 259 to work around the sd limitation (16 minors per device), which allows more partitions per sd device. During "vxdg flush", for each LUN in the disk group, vxconfigd reads the file /proc/partitions line by line through fgets() to find all the partition devices with major number 259. This causes vxconfigd to respond sluggishly if the disk group contains a large number of LUNs.

RESOLUTION:
Code has been changed to remove the needless access to /proc/partitions for LUNs that do not use the extended devt.
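
The idea behind this change can be illustrated with a small Python sketch. It is not the vxconfigd implementation: the function names, the `uses_extended_devt` flag, and the way the file contents are supplied are assumptions made for illustration. It shows the /proc/partitions scan for major 259 being skipped entirely for LUNs that have no partitions under the extended devt.

```python
BLOCK_EXT_MAJOR = 259  # Linux extended devt major for block devices


def ext_partitions(proc_partitions_text, disk_name):
    """Return the partitions of disk_name that are listed under major 259."""
    found = []
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data rows have four columns: major, minor, #blocks, name.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        major, name = int(fields[0]), fields[3]
        suffix = name[len(disk_name):]
        if major == BLOCK_EXT_MAJOR and name.startswith(disk_name) and suffix.isdigit():
            found.append(name)
    return found


def scan_disk_group(luns, read_partitions):
    """luns: list of (disk_name, uses_extended_devt) pairs (hypothetical input).

    read_partitions is a callable returning the text of /proc/partitions;
    it is only invoked for LUNs that actually use the extended devt.
    """
    result = {}
    for disk_name, uses_extended_devt in luns:
        if not uses_extended_devt:
            continue  # skip the needless read for this LUN
        result[disk_name] = ext_partitions(read_partitions(), disk_name)
    return result
```

On a real system, `read_partitions` could be `lambda: open("/proc/partitions").read()`. How a LUN is known to use the extended devt is left open here.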

* 4099550 (Tracking ID: 4065145)

SYMPTOM:
During addsec, encrypted volume tags could not be processed for multiple volumes and vsets.
Error we saw:

$ vradmin -g dg2 -encrypted addsec dg2_rvg1 10.210.182.74 10.210.182.75

Error: Duplicate tag name vxvm.attr.enckeytype provided in input.

DESCRIPTION:
The number of tags was not defined, so all the tags were processed at once instead of processing only the maximum number of tags for a volume.

RESOLUTION:
Introduced a number-of-tags variable that depends on the cipher method (CBC/GCM), and fixed minor code issues.

* 4102424 (Tracking ID: 4103350)

SYMPTOM:
The following error message is seen when running the vradmin -encrypted addsec command.

# vradmin -g enc_dg2 -encrypted addsec enc_dg2_rvg1 123.123.123.123 234.234.234.234
Message from Host 234.234.234.234:
Job for vxvm-encrypt.service failed.
See "systemctl status vxvm-encrypt.service" and "journalctl -xe" for details.
VxVM vxvol ERROR V-5-1-18863 Failed to start vxvm-encrypt service. Error:1.

DESCRIPTION:
"vradmin -encrypted addsec" command fails on primary because vxvm-encrypt.service goes into failed state on secondary site. On secondary master, vxvm-encrypt.service tries to restart 5 times and goes into failed state.

RESOLUTION:
Code changes have been done to prevent vxvm-encrypt.service from going into failed state.

Patch ID: VRTSvxvm-7.4.2.3300

* 4083792 (Tracking ID: 4082799)

SYMPTOM:
A security vulnerability exists in the third-party component libcurl.

DESCRIPTION:
VxVM uses a third-party component named libcurl in which a security vulnerability exists.

RESOLUTION:
VxVM is updated to use a newer version of libcurl in which the security vulnerability has been addressed.

Patch ID: VRTSvxvm-7.4.2.3200

* 4011971 (Tracking ID: 3991668)

SYMPTOM:
In a VVR configuration with secondary logging enabled, data inconsistency is reported after the "No IBC message arrived" error is encountered.

DESCRIPTION:
It might happen that the VVR secondary node handles updates with larger sequence IDs before the In-Band Control (IBC) update arrives. In this case, VVR drops the IBC update. Due to the updates with the larger sequence IDs than the one for the IBC update, data writes cannot be started, and they get queued. Data loss may occur after the VVR secondary receives an atomic commit and frees the queue. If this situation occurs, the "vradmin verifydata" command reports data inconsistency.

RESOLUTION:
VVR is modified to trigger updates as they are received in order to start data volume writes.

* 4013169 (Tracking ID: 4011691)

SYMPTOM:
Observed high CPU consumption on the VVR secondary nodes because of high pending IO load.

DESCRIPTION:
High replication-related IO load on the VVR secondary, together with the requirement of maintaining write order fidelity with limited memory pools, created contention. This resulted in multiple VxVM kernel threads contending for shared resources, thereby increasing the CPU consumption.

RESOLUTION:
Limited the way in which VVR consumes its resources so that a high pending IO load does not result in high CPU consumption.

* 4037288 (Tracking ID: 4034857)

SYMPTOM:
Loading of the VxVM modules failed on SLES15 SP2 (kernel 5.3.18-22.2-default).

DESCRIPTION:
With the new kernel (5.3.18-22.2-default), the following functions and structures were deprecated:
1. gettimeofday()
2. struct timeval
3. bio_segments()
4. iov_for_each()
5. req->next_rq field
Also, there was potential data corruption with large IOs (>1M) processed by the Linux kernel IO splitting.

RESOLUTION:
Code changes are made mainly to support kernel 5.3.18 and to replace the deprecated functions.
The dependency on the req->next_rq field in the blk-mq code is removed.
Changes are also made to bypass the Linux kernel IO split functions, which seem redundant for VxVM IO processing.

* 4048120 (Tracking ID: 4031452)

SYMPTOM:
The add node operation fails with the error "Error found while invoking '' in the new node, and rollback done in both nodes".

DESCRIPTION:
The stack showed a valid address for the pointer ptmap2, but a core was still generated.
This suggested a possible double-free case. The issue lies in the freeing of a pointer.
RESOLUTION:
Added handling for such cases by assigning NULL to pointers wherever they are freed.

* 4051703 (Tracking ID: 4010794)

SYMPTOM:
Veritas Dynamic Multi-Pathing (DMP) caused system panic in a cluster with below stack when storage activities were going on:
dmp_start_cvm_local_failover+0x118()
dmp_start_failback+0x398()
dmp_restore_node+0x2e4()
dmp_revive_paths+0x74()
gen_update_status+0x55c()
dmp_update_status+0x14()
gendmpopen+0x4a0()

DESCRIPTION:
The system panic occurred because the current primary path of the dmpnode was invalid when disks were attached/detached in a cluster. When DMP accessed the current primary path without a sanity check, the system panicked due to an invalid pointer.

RESOLUTION:
Code changes have been made to avoid accessing any invalid pointer.

* 4052119 (Tracking ID: 4045871)

SYMPTOM:
vxconfigd crashed at ddl_get_disk_given_path with the following stack:
ddl_get_disk_given_path
ddl_reconfigure_all
ddl_find_devices_in_system
find_devices_in_system
mode_set
setup_mode
startup
main
_start

DESCRIPTION:
Under some situations, duplicate paths can be added in one dmpnode in vxconfigd. If the duplicate paths are removed then the empty path entry can be generated for that dmpnode. Thus, later when vxconfigd accesses the empty path entry, it crashes due to NULL pointer reference.

RESOLUTION:
Code changes have been done to avoid adding duplicate paths.

* 4054311 (Tracking ID: 4040701)

SYMPTOM:
The following warnings are observed while installing the VxVM package.
WARNING: libhbalinux/libhbaapi is not installed. vxesd will not capture SNIA HBA API library events.
mv: '/var/adm/vx/cmdlog' and '/var/log/vx/cmdlog' are the same file
mv: '/var/adm/vx/cmdlog.1' and '/var/log/vx/cmdlog.1' are the same file
mv: '/var/adm/vx/cmdlog.2' and '/var/log/vx/cmdlog.2' are the same file
mv: '/var/adm/vx/cmdlog.3' and '/var/log/vx/cmdlog.3' are the same file
mv: '/var/adm/vx/cmdlog.4' and '/var/log/vx/cmdlog.4' are the same file
mv: '/var/adm/vx/ddl.log' and '/var/log/vx/ddl.log' are the same file
mv: '/var/adm/vx/ddl.log.0' and '/var/log/vx/ddl.log.0' are the same file
mv: '/var/adm/vx/ddl.log.1' and '/var/log/vx/ddl.log.1' are the same file
mv: '/var/adm/vx/ddl.log.10' and '/var/log/vx/ddl.log.10' are the same file
mv: '/var/adm/vx/ddl.log.11' and '/var/log/vx/ddl.log.11' are the same file

DESCRIPTION:
Some warnings are observed while installing vxvm package.

RESOLUTION:
Appropriate code changes are done to avoid the warnings.

* 4056329 (Tracking ID: 4056156)

SYMPTOM:
VxVM package fails to load on SLES15 SP3

DESCRIPTION:
Changes introduced in SLES15 SP3 impacted VxVM block IO functionality. This included changes in block layer structures in kernel.

RESOLUTION:
Changes have been done to handle the impacted functionalities.

* 4056919 (Tracking ID: 4056917)

SYMPTOM:
In Flexible Storage Sharing (FSS) environments, a disk group import operation with a few disks missing leads to data corruption.

DESCRIPTION:
In FSS environments, importing a disk group with missing disks is not allowed. If the disk with the most recently updated configuration information was not present during import, the import operation incorrectly incremented the config TID on the remaining disks before failing. When the missing disk(s) with the latest configuration came back, the import succeeded. However, because of the earlier failed transaction, the import operation chose the wrong configuration to import the disk group, leading to data corruption.

RESOLUTION:
Code logic in the disk group import operation is modified to ensure that the check for failed/missing disks happens early, before any on-disk update is attempted as part of the import.
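
The ordering that this fix relies on can be sketched as follows. This is a simplified Python illustration, not VxVM code: the function, the exception class, and the dictionary of per-disk config TIDs are hypothetical. The point is only that the missing-disk check runs before anything is modified, so a failed import leaves the TIDs untouched.

```python
class DiskGroupImportError(Exception):
    """Raised when an import is rejected before any on-disk update."""


def import_disk_group(expected_disks, present_disks, config_tids):
    """Validate first, update afterwards.

    expected_disks / present_disks: iterables of disk names (hypothetical).
    config_tids: dict mapping disk name -> config TID, standing in for
    the on-disk state.
    """
    missing = sorted(set(expected_disks) - set(present_disks))
    if missing:
        # Fail early: nothing has been written at this point.
        raise DiskGroupImportError("missing disks: " + ", ".join(missing))
    for disk in present_disks:
        config_tids[disk] += 1  # the "on-disk update" happens only now
    return config_tids
```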

* 4058873 (Tracking ID: 4057526)

SYMPTOM:
Whenever vxnm-vxnetd is loaded, it reports "Cannot touch '/var/lock/subsys/vxnm-vxnetd': No such file or directory" in /var/log/messages.

DESCRIPTION:
A newer systemd update removed support for the "/var/lock/subsys/" directory. Thus, whenever vxnm-vxnetd is loaded on systems that use systemd, it reports "Cannot touch '/var/lock/subsys/vxnm-vxnetd': No such file or directory".

RESOLUTION:
Added a check to validate if the /var/lock/subsys/ directory is supported in vxnm-vxnetd.sh
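
A check of this kind might look like the sketch below. The actual fix is in the vxnm-vxnetd.sh shell script; this Python version only illustrates the logic, and the function name and return convention are assumptions.

```python
import os


def touch_subsys_lock(name, lock_dir="/var/lock/subsys"):
    """Create the lock file only if the lock directory exists.

    Returns True if the lock file was touched, False if the directory
    is not available (for example, on systems where it was removed).
    """
    if not os.path.isdir(lock_dir):
        return False  # nothing to touch, and no error message is logged
    path = os.path.join(lock_dir, name)
    with open(path, "a"):
        pass  # equivalent of "touch": create the file if it is absent
    return True
```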

* 4060839 (Tracking ID: 3975667)

SYMPTOM:
NMI watchdog: BUG: soft lockup

DESCRIPTION:
When flow control is set on the ioshipping channel, there is a window in the code where the vol_ioship_sender thread can go into a tight loop.
This causes a soft lockup.

RESOLUTION:
The vol_ioship_sender() thread now relinquishes the CPU so that other processes can be scheduled, and restarts after some delay.

* 4060962 (Tracking ID: 3915202)

SYMPTOM:
vxconfigd hangs during "vxconfigd -k -r reset".

DESCRIPTION:
The vxconfigd hang is observed because all the file descriptors available to the process have been used up due to an fd leak.
The fix for this issue had not been integrated, hence the issue was encountered.

RESOLUTION:
Appropriate code changes are done to handle scenario of the fd leak.

* 4060966 (Tracking ID: 3959716)

SYMPTOM:
In a VVR configuration with sync replication, the system may panic when the VVR RVG is in DCM mode, with the following panic stack:
volsync_wait [vxio]
voliod_iohandle [vxio]
volted_getpinfo [vxio]
voliod_loop [vxio]
voliod_kiohandle [vxio]
kthread

DESCRIPTION:
With sync replication, if ACK for data message is delayed from the secondary site, the 
primary site might incorrectly free the message from the waiting queue at primary site.
Due to incorrect handling of the message, a system panic may happen.

RESOLUTION:
Required code changes are done to resolve the panic issue.

* 4061004 (Tracking ID: 3993242)

SYMPTOM:
vxsnap prepare on a vset might fail with the error: "VxVM vxsnap ERROR V-5-1-19171 Cannot perform prepare operation on cloud volume"

DESCRIPTION:
Some wrong volume-record entries were being fetched for the VSET, due to which the required validations failed and triggered the issue.

RESOLUTION:
Code changes have been done to resolve the issue.

* 4061036 (Tracking ID: 4031064)

SYMPTOM:
During master switch with replication in progress, cluster wide hang is seen on VVR secondary.

DESCRIPTION:
With application running on primary, and replication setup between VVR primary & secondary, when master switch operation is attempted on secondary, it gets hung permanently.

RESOLUTION:
Appropriate code changes are done to handle scenario of master switch operation and replication data on secondary.

* 4061055 (Tracking ID: 3999073)

SYMPTOM:
Data corruption occurred when the fast mirror resync (FMR) was enabled and the failed plex of striped-mirror layout was attached.

DESCRIPTION:
A plex attach operation with FMR tracking enabled uses the contents of the detach maps to determine and recover the regions of the volume.

When the DCO region size was larger than the stripe unit of the volume, the code logic in the plex attach code path incorrectly skipped bits in the detach maps for a given volume region. Thus, some of the regions (offset-len) of the volume were not synced to the attached plex, leading to inconsistent mirror contents.

RESOLUTION:
To resolve the data corruption issue, the code has been modified to consider all the bits for a given region (offset-len) in the plex attach code path.

* 4061057 (Tracking ID: 3931583)

SYMPTOM:
Node may panic while uninstalling or upgrading the VxVM package or during reboot.

DESCRIPTION:
Due to a race condition in Volume Manager (VxVM), IO may be queued for processing while the vxio module is being unloaded. This results in VxVM acquiring and accessing a lock which is currently being freed and it may panic the system with the following backtrace:

 #0 [ffff88203da089f0] machine_kexec at ffffffff8105d87b
 #1 [ffff88203da08a50] __crash_kexec at ffffffff811086b2
 #2 [ffff88203da08b20] panic at ffffffff816a8665
 #3 [ffff88203da08ba0] nmi_panic at ffffffff8108ab2f
 #4 [ffff88203da08bb0] watchdog_overflow_callback at ffffffff81133885
 #5 [ffff88203da08bc8] __perf_event_overflow at ffffffff811727d7
 #6 [ffff88203da08c00] perf_event_overflow at ffffffff8117b424
 #7 [ffff88203da08c10] intel_pmu_handle_irq at ffffffff8100a078
 #8 [ffff88203da08e38] perf_event_nmi_handler at ffffffff816b7031
 #9 [ffff88203da08e58] nmi_handle at ffffffff816b88ec
#10 [ffff88203da08eb0] do_nmi at ffffffff816b8b1d
#11 [ffff88203da08ef0] end_repeat_nmi at ffffffff816b7d79
 [exception RIP: _raw_spin_unlock_irqrestore+21]
 RIP: ffffffff816b6575 RSP: ffff88203da03d98 RFLAGS: 00000283
 RAX: 0000000000000283 RBX: ffff882013f63000 RCX: 0000000000000080
 RDX: 0000000000000001 RSI: 0000000000000283 RDI: 0000000000000283
 RBP: ffff88203da03d98 R8: 00000000005d1cec R9: ffff8810e8ec0000
 R10: 0000000000000002 R11: ffff88203da03da8 R12: ffff88103af95560
 R13: ffff882013f630c8 R14: 0000000000000001 R15: 0000000000000ca5
 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#12 [ffff88203da03d98] _raw_spin_unlock_irqrestore at ffffffff816b6575
#13 [ffff88203da03da0] voliod_qsio at ffffffffc0fd14c3 [vxio]
#14 [ffff88203da03dd0] vol_sample_timeout at ffffffffc101d8df [vxio]
#15 [ffff88203da03df0] __voluntimeout at ffffffffc0fd34be [vxio]
#16 [ffff88203da03e18] voltimercallback at ffffffffc0fd3568 [vxio]
...
...

RESOLUTION:
Code changes are made to handle the race condition and prevent access to resources that are being freed.

* 4061298 (Tracking ID: 3982103)

SYMPTOM:
When the available memory in the system is low, an I/O hang is seen.

DESCRIPTION:
In a low memory situation, the memory allocated to some VVR IOs was NOT released properly, due to which new application IO
could NOT be served because the VVR memory pool was fully utilized. This resulted in an IO hang.

RESOLUTION:
Code changes are done to properly release the memory in the low memory situation.

* 4061317 (Tracking ID: 3925277)

SYMPTOM:
vxdisk resize corrupts the disk public region and causes the file system mount to fail.

DESCRIPTION:
While resizing a single-path disk with a GPT label, the policy data is not updated according to the changes made to da/dmrec during the two transactions of the resize. Hence, the private region IOs are sent to the old private region device, which is on partition 3. This may cause the private region data to be written to the public region and cause corruption.

RESOLUTION:
Code changes have been made to fix the problem.

* 4061509 (Tracking ID: 4043337)

SYMPTOM:
rp_rv.log file uses space for logging.

DESCRIPTION:
The rp_rv log files need to be removed, and the logger file should use 16 MB rotating log files.

RESOLUTION:
The code changes are implemented to disable logging for rp_rv.log files.

* 4062461 (Tracking ID: 4066785)

SYMPTOM:
When the replicated disks are in SPLIT mode, importing its disk group failed with "Device is a hardware mirror".

DESCRIPTION:
When the replicated disks are in SPLIT mode, in which they are readable and writable, importing their disk group failed with "Device is a hardware mirror". The third-party vendor does not expose a disk attribute that shows when a disk is in SPLIT mode. With this enhancement, the replicated disk group can be imported with the option `-o usereplicatedev=only`.

RESOLUTION:
The code is enhanced to import the replicated disk group with option `-o usereplicatedev=only`.

* 4062577 (Tracking ID: 4062576)

SYMPTOM:
When hastop -local is used to stop the cluster, dg deport command hangs. Below stack trace is observed in system logs :

#0 [ffffa53683bf7b30] __schedule at ffffffffa834a38d
 #1 [ffffa53683bf7bc0] schedule at ffffffffa834a868
 #2 [ffffa53683bf7bd0] blk_mq_freeze_queue_wait at ffffffffa7e4d4e6
 #3 [ffffa53683bf7c18] blk_cleanup_queue at ffffffffa7e433b8
 #4 [ffffa53683bf7c30] vxvm_put_gendisk at ffffffffc3450c6b [vxio]   
 #5 [ffffa53683bf7c50] volsys_unset_device at ffffffffc3450e9d [vxio]
 #6 [ffffa53683bf7c60] vol_rmgroup_devices at ffffffffc3491a6b [vxio]
 #7 [ffffa53683bf7c98] voldg_delete at ffffffffc34932fc [vxio]
 #8 [ffffa53683bf7cd8] vol_delete_group at ffffffffc3494d0d [vxio]
 #9 [ffffa53683bf7d18] volconfig_ioctl at ffffffffc3555b8e [vxio]
#10 [ffffa53683bf7d90] volsioctl_real at ffffffffc355fc8a [vxio]
#11 [ffffa53683bf7e60] vols_ioctl at ffffffffc124542d [vxspec]
#12 [ffffa53683bf7e78] vols_unlocked_ioctl at ffffffffc124547d [vxspec]
#13 [ffffa53683bf7e80] do_vfs_ioctl at ffffffffa7d2deb4
#14 [ffffa53683bf7ef8] ksys_ioctl at ffffffffa7d2e4f0
#15 [ffffa53683bf7f30] __x64_sys_ioctl at ffffffffa7d2e536

DESCRIPTION:
This issue is seen due to changes on the kernel side with respect to request queue handling. Existing VxVM code sets the request handling function (make_request_fn) to vxvm_gen_strategy, and this functionality is impacted.

RESOLUTION:
Code changes are added to handle the request queues using blk_mq_init_allocated_queue.

* 4062746 (Tracking ID: 3992053)

SYMPTOM:
Data corruption may happen with layered volumes due to some data not re-synced while attaching a plex. This is due to 
inconsistent data across the plexes after attaching a plex in layered volumes.

DESCRIPTION:
When a plex is detached in a layered volume, the regions which are dirty/modified are tracked in DCO (Data change object) map.
When the plex is attached back, the data corresponding to these dirty regions is re-synced to the plex being attached.
There was a defect in the code due to which some particular regions were NOT re-synced when a plex was attached.
This issue happens only when the offset of the sub-volume is NOT aligned with the region size of the DCO (Data change object) volume.

RESOLUTION:
The code defect is fixed to correctly copy the data for dirty regions when the sub-volume offset is NOT aligned with the DCO region size.
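
To see why alignment matters, consider how a range (offset, length) maps onto DCO regions. The following Python sketch is an illustration only, not the VxVM implementation, and the function name and units are assumptions. It computes the region indices from both the first and the last block of the range. A calculation that only counts length / region_size regions starting from the first one would miss the trailing region whenever the offset is not region-aligned.

```python
def dco_regions(offset, length, region_size):
    """Return the indices of all regions touched by [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // region_size
    last = (offset + length - 1) // region_size
    return list(range(first, last + 1))
```

For example, with a region size of 64, an aligned range at offset 128 of length 64 touches only region 2, whereas an unaligned range at offset 96 of the same length touches regions 1 and 2.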

* 4062747 (Tracking ID: 3943707)

SYMPTOM:
vxconfigd reconfig hangs when joining a cluster, with the following stack:
volsync_wait [vxio]
_vol_syncwait [vxio]
voldco_await_shared_tocflush [vxio]
volcvm_ktrans_fmr_cleanup [vxio]
vol_ktrans_commit [vxio]
volconfig_ioctl [vxio]
volsioctl_real [vxio]
vols_ioctl [vxspec]
vols_unlocked_ioctl [vxspec]
vfs_ioctl  
do_vfs_ioctl

DESCRIPTION:
There is a race condition due to which the current seqno on tocsio does not get incremented on one of the nodes. While the master and the other slaves move to the next stage with a higher seqno, this slave drops the DISTRIBUTE message. The message is retried from the master and the slave keeps dropping it, leading to a hang.

RESOLUTION:
Code changes have been made to avoid the race condition.

* 4062751 (Tracking ID: 3989185)

SYMPTOM:
In a Veritas Volume Replicator (VVR) environment, the vxrecover command can hang.

DESCRIPTION:
When vxrecover is triggered after a storage failure, the vxrecover operation may hang.
This is because vxrecover performs the RVG recovery. As part of this recovery, dummy updates are written to the SRL.
Due to a bug in the code, these updates were written incorrectly to the SRL, which caused the flush operation from the SRL to the data volume to hang.

RESOLUTION:
Code changes are done so that the dummy updates are written correctly to the SRL.

* 4062755 (Tracking ID: 3978453)

SYMPTOM:
Reconfig hang during master takeover with below stack:
volsync_wait+0xa7/0xf0 [vxio]
volsiowait+0xcb/0x110 [vxio]
vol_commit_iolock_objects+0xd3/0x270 [vxio]
vol_ktrans_commit+0x5d3/0x8f0 [vxio]
volconfig_ioctl+0x6ba/0x970 [vxio]
volsioctl_real+0x436/0x510 [vxio]
vols_ioctl+0x62/0xb0 [vxspec]
vols_unlocked_ioctl+0x21/0x30 [vxspec]
do_vfs_ioctl+0x3a0/0x5a0

DESCRIPTION:
There is a hang in the dcotoc protocol on the slave, which causes a couple of slave nodes to not respond with LEAVE_DONE to the master, hence the issue.

RESOLUTION:
Code changes have been made to handle a transaction overlapping with a shared toc update. The toc update sio flag is passed from the old object to the new object during the transaction, so that recovery can resume if required.

* 4063374 (Tracking ID: 4005121)

SYMPTOM:
Application IOs appear hung or progress slowly until SRL to DCM flush finished.

DESCRIPTION:
When the VVR SRL became full, DCM protection was triggered, and application IOs appeared hung until the SRL to DCM flush finished.

RESOLUTION:
Added fix that avoided duplicate DCM tracking through vol_rvdcm_log_update(), which reduced the IOPS drop comparatively.

* 4064523 (Tracking ID: 4049082)

SYMPTOM:
An I/O read error is displayed when a remote FSS node is rebooting.

DESCRIPTION:
When a remote FSS node is rebooting, I/O read requests to a mirror volume that are scheduled on the remote disk from the FSS node should be redirected to the remaining plex. However, VxVM did not handle this correctly. The retried I/O requests could still be sent to the offline remote disk, which caused the final I/O read failure.

RESOLUTION:
Code changes have been done to schedule the retrying read request on the remaining plex.

* 4066930 (Tracking ID: 3951527)

SYMPTOM:
Data loss issue is seen because of incorrect version check handling done as a part of SRL 4k update alignment changes in 7.4 release.

DESCRIPTION:
On primary, rv_target_rlink field always is set to NULL which internally skips checking the 4k version  in VOL_RU_INIT_UPDATE macro. It causes SRL writes to be written in a 4k aligned manner even though remote rvg version is <= 7.3.1. This resulted in data loss.

RESOLUTION:
Changes are done to use rv_replicas rather than rv_target_rlink to check the version appropriately for all sites, so that SRL IOs are not written in a 4k-aligned manner when the remote RVG version is <= 7.3.1.
Also, RVG version is not upgraded as part of diskgroup upgrades if rlinks are in attached state. RVG version can be upgraded using vxrvg upgrade command after detaching the rlinks and also when all sites are upgraded.
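
The version check described above can be sketched in Python as follows. This is a hypothetical illustration rather than the VOL_RU_INIT_UPDATE logic itself: the function names and the representation of versions as dotted strings are assumptions. The 4k-aligned SRL format is chosen only when every replica is newer than 7.3.1.

```python
LAST_NON_4K_VERSION = (7, 3, 1)  # per the description, <= 7.3.1 must not get 4k-aligned writes


def parse_version(text):
    """Turn a dotted version string such as '7.4.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def use_4k_aligned_srl(replica_versions):
    """Return True only if all replicas (all sites) are newer than 7.3.1."""
    return all(parse_version(v) > LAST_NON_4K_VERSION for v in replica_versions)
```

Checking every replica, instead of a single target that may be unset, is what keeps a mixed-version configuration on the older format.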

* 4067706 (Tracking ID: 4060462)

SYMPTOM:
System is unresponsive while adding new nodes.

DESCRIPTION:
After a node is removed, if adding a node with a different node name is attempted, the system becomes
unresponsive. When a node leaves a cluster, in-memory information related to the node is not cleared due to a race condition.

RESOLUTION:
Fixed race condition to clear in-memory information of the node that leaves the cluster.

* 4067710 (Tracking ID: 4064208)

SYMPTOM:
Node is unresponsive while it gets added to the cluster.

DESCRIPTION:
While a node joins the cluster, if the bits on the node are upgraded, the size
of the object is interpreted incorrectly. The issue is observed when the number of objects is high, on
InfoScale 7.3.1 and above.

RESOLUTION:
Correct sizes are calculated for the data received from the master node.

* 4067712 (Tracking ID: 3868140)

SYMPTOM:
VVR primary site node might panic if the rlink disconnects while some data is getting replicated to secondary with below stack: 

dump_stack()
panic()
vol_rv_service_message_start()
update_curr()
put_prev_entity()
voliod_iohandle()
voliod_loop()
voliod_iohandle()

DESCRIPTION:
If rlink disconnects, VVR will clear some handles to the in-progress updates in memory, but if some IOs are still getting acknowledged from secondary to primary, then accessing updates for these IOs might result in panic at primary node.

RESOLUTION:
Code fix is implemented to correctly access the primary node updates in order to avoid the panic.

* 4067713 (Tracking ID: 3997531)

SYMPTOM:
VVR replication is not working, as vxnetd does not start properly.

DESCRIPTION:
If vxnetd restarts, a race condition blocks the completion of vxnetd start function after the shutdown process is completed.

RESOLUTION:
To avoid the race condition, the vxnetd start and stop functions are synchronized.

* 4067715 (Tracking ID: 4008740)

SYMPTOM:
System panic

DESCRIPTION:
Due to a race condition there was code accessing freed VVR update which resulted in system panic

RESOLUTION:
Fixed race condition to avoid incorrect memory access

* 4067717 (Tracking ID: 4009151)

SYMPTOM:
Auto-import of diskgroup on system reboot fails with error:
"Disk for diskgroup not found"

DESCRIPTION:
When diskgroup is auto-imported, VxVM (Veritas Volume Manager) tries to find the disk with latest configuration copy. During this the DG import process searches through all the disks. The procedure also tries to find out if the DG contains clone disks or standard disks. While doing this calculation the DG import process incorrectly determines that current DG contains cloned disks instead of standard disks because of the stale value being there for the previous DG selected. Since VxVM incorrectly decides to import cloned disks instead of standard disks the import fails with "Disk for diskgroup not found" error.

RESOLUTION:
Code has been modified to accurately determine whether the DG contains standard or cloned disks and accordingly use those disks for DG import.

* 4067914 (Tracking ID: 4037757)

SYMPTOM:
VVR services always get started on boot up even if VVR is not being used.

DESCRIPTION:
VVR services are started automatically because they are integrated into the system init.d or a similar framework.

RESOLUTION:
Added a tunable to prevent VVR services from starting on boot up.

* 4067915 (Tracking ID: 4059134)

SYMPTOM:
The resync task takes too long on a large raid-5 volume.

DESCRIPTION:
The resync of a raid-5 volume is done in small regions, and a checkpoint is set up for each region. If the raid-5 volume is large, it is divided into a large number of regions for resync, and checkpoint setup is issued for each region in a loop. In each cycle, the resync utility opens a connection to the vxconfigd daemon to do that, so a client is created in vxconfigd's context for each region. Because the number of created clients is large, vxconfigd takes a long time to traverse the client list, which introduces the performance issue for resync.

RESOLUTION:
Code changes are made so that only one client is created during the whole resync process, so that little time is spent traversing the client list.
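
The effect of reusing one client can be illustrated with a toy model in Python. Everything here is hypothetical (the FakeConfigDaemon class, the cost model of one step per registered client, and the assumption that clients are not removed between regions). It is not how vxconfigd is implemented, but it shows why the traversal cost grows quadratically with one client per region and only linearly with a single client.

```python
class FakeConfigDaemon:
    """Toy stand-in for a daemon that scans its client list on every request."""

    def __init__(self):
        self.clients = []
        self.traversal_steps = 0

    def connect(self):
        client = object()
        self.clients.append(client)
        return client

    def set_checkpoint(self, client, region):
        # Model the lookup cost as one step per registered client.
        self.traversal_steps += len(self.clients)


def resync_client_per_region(daemon, regions):
    """Old behaviour in this model: a new client for every region."""
    for region in range(regions):
        client = daemon.connect()
        daemon.set_checkpoint(client, region)
    return daemon.traversal_steps


def resync_single_client(daemon, regions):
    """New behaviour in this model: one client for the whole resync."""
    client = daemon.connect()
    for region in range(regions):
        daemon.set_checkpoint(client, region)
    return daemon.traversal_steps
```

With 100 regions, the first variant accumulates 1 + 2 + ... + 100 = 5050 steps in this model, while the second accumulates 100.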

* 4069522 (Tracking ID: 4043276)

SYMPTOM:
If the admin has offlined disks with "vxdisk offline <disk_name>", vxattachd may bring the disks back to the online state.

DESCRIPTION:
The "offline" state of VxVM disks is not stored persistently; it is recommended to use "vxdisk define" to persistently offline a disk.
For Veritas NetBackup Appliance, there is a requirement that vxattachd should not online disks that were offlined with the "vxdisk offline" operation.
To cater to this request, a tunable-based enhancement has been added to vxattachd for the NetBackup Appliance use case.
The enhancements are made specifically so that the NetBackup Appliance script can use them.
The tunable details are as follows:
If the skip_offline tunable is set, vxattachd avoids bringing offlined disks back into the online state.
If handle_invalid_disk is set, vxattachd offlines the "online invalid" SAN disks.
If remove_disable_dmpnode is set, vxattachd cleans up stale entries from the disk.info file and the VxVM layer.
By default these tunables are off. We do NOT recommend that InfoScale users enable these vxattachd tunables.

RESOLUTION:
Code changes are done in vxattachd to cater to NetBackup Appliance use cases.

* 4069523 (Tracking ID: 4056751)

SYMPTOM:
When importing a disk group containing all cloned disks with the cloneoff option (-c) while some of the disks are in read-only status, the import fails and some of the writable disks are unexpectedly removed from the disk group.

DESCRIPTION:
During disk group import with the cloneoff option, the DA_VFLAG_ASSOC_DG flag gets set because the dgid needs to be updated. When the da records of the read-only disks are associated, the private region TOC update fails because of the write failure, so the pending associations are aborted for all disks. During the abort, the disks that carry the DA_VFLAG_ASSOC_DG flag are removed from the dg and their config copies are offlined. This looks like private region corruption on the writable disks, but they were actually removed from the dg.

RESOLUTION:
The issue has been fixed by failing the import at early stage if some of disks are read-only.

* 4069524 (Tracking ID: 4056954)

SYMPTOM:
When performing addsec using VIPs with SSL enabled, a hang is observed.

DESCRIPTION:
The issue occurs when, on the primary side, vradmin tries to create a local socket with the local VIP and the interface IP as endpoints, ends up calling SSL_accept, and gets stuck indefinitely.

RESOLUTION:
Code changes are done to use the vvr_islocalip() function to identify whether the IP is local to the node.
With vvr_islocalip(), the SSL_accept() function is now called only if the IP is a remote IP.

* 4070099 (Tracking ID: 3159650)

SYMPTOM:
vxtune did not support vol_vvr_use_nat.

DESCRIPTION:
Platform specific methods were required to set vol_vvr_use_nat tunable, as its support for vxtune command was not present.

RESOLUTION:
Added vol_vvr_use_nat support for vxtune command.

* 4070186 (Tracking ID: 4041822)

SYMPTOM:
In an SRDF/Metro array setup, the last path is in the enabled state even after all the host and the array-side switch ports are disabled.

DESCRIPTION:
In case of an SRDF/Metro array setup, if the path connectivity is disrupted, one path may still appear to be in the enabled state until the connectivity is restored. For example: 
# vxdmpadm getsubpaths dmpnodename=emc0_02e0
NAME                      STATE[A]    PATH-TYPE[M]  CTLR-NAME  ENCLR-TYPE  ENCLR-NAME  ATTRS  PRIORITY
=======================================================================================================
c7t50000975B00AD80Ad9s2   DISABLED       -           c7        EMC         emc0        -      -
c7t50000975B00AD84Ad9s2   DISABLED       -           c7        EMC         emc0        -      -
c7t50000975B00AF40Ad70s2  DISABLED       -           c7        EMC         emc0        -      -
c7t50000975B00AF44Ad70s2  DISABLED       -           c7        EMC         emc0        -      -
c8t50000975B00AD80Ad9s2   DISABLED       -           c8        EMC         emc0        -      -
c8t50000975B00AD84Ad9s2   DISABLED       -           c8        EMC         emc0        -      -
c8t50000975B00AF40Ad70s2  ENABLED(A)     -           c8        EMC         emc0        -      -
c8t50000975B00AF44Ad70s2  DISABLED       -           c8        EMC         emc0        -      -
Note: This situation does not occur if I/Os are in progress through this DMP node. DMP identifies the disruption in the connectivity and correctly updates the state of the path.

RESOLUTION:
Code changes are made to fix this issue. To refresh the state of the paths sooner, run the 'vxdisk scandisks' command.

* 4070253 (Tracking ID: 3911930)

SYMPTOM:
Valid PGR operations sometimes fail on a DMP node.

DESCRIPTION:
As part of the PGR operations, if the inquiry command finds that PGR is not
supported on the DMP node, the PGR_FLAG_NOTSUPPORTED flag is set on the node. Further PGR operations check this value and issue PGR commands only if the flag is not set. PGR_FLAG_NOTSUPPORTED remains set even if the hardware is changed so as to support PGR.

RESOLUTION:
A new command, enablepr, is provided in the vxdmppr utility to clear this flag on the specified DMP node.

* 4071131 (Tracking ID: 4071605)

SYMPTOM:
A security vulnerability exists in the third-party component libxml2.

DESCRIPTION:
VxVM uses a third-party component named libxml2 in which a security vulnerability exists.

RESOLUTION:
VxVM is updated to use a newer version of libxml2 in which the security vulnerability has been addressed.

* 4072874 (Tracking ID: 4046786)

SYMPTOM:
During reboot, nodes go out of the cluster and the file system is not mounted.

DESCRIPTION:
The NVMe ASL can sometimes report a different UDID during discovery. It differs from the actual UDID in that the space characters are absent.

RESOLUTION:
The use of the NVMe ioctl to fetch data has been removed; sysfs is used instead.

Patch ID: VRTSvxvm-7.4.2.2200

* 4018173 (Tracking ID: 3852146)

SYMPTOM:
In a CVM cluster, when a shared DG is imported by specifying both the "-c" and the "-o noreonline" options, you may encounter the following error:
VxVM vxdg ERROR V-5-1-10978 Disk group <disk_group_name>: import failed: Disk for disk group not found.

DESCRIPTION:
The "-c" option updates the disk ID and the DG ID on the private region of the disks in the DG that is being imported. Such updated information is not yet seen by the slave because the disks have not been brought online again because the "noreonline" option was specified. As a result, the slave cannot identify the disk(s) based on the updated information sent from the master, which caused the import to fail with the error: Disk for disk group not found.

RESOLUTION:
VxVM is updated so that a shared DG import completes successfully even when the "-c" and the "-o noreonline" options are specified together.

* 4018178 (Tracking ID: 3906534)

SYMPTOM:
After Dynamic Multi-Pathing (DMP) Native support is enabled, /boot should be mounted on the DMP device.

DESCRIPTION:
Typically, /boot is mounted on top of an Operating System (OS) device. When DMP Native support is enabled, only the volume groups (VGs) are migrated from the OS device to the DMP device, but /boot is not migrated. If the OS device path then becomes unavailable, the system becomes unbootable, because /boot is not available. Thus, it is necessary to mount /boot on the DMP device to provide multipathing and resiliency.

RESOLUTION:
The module is updated to migrate /boot on top of a DMP device when DMP Native support is enabled. Note: This fix is available for RHEL 6 only. For other Linux platforms, /boot will still not be mounted on the DMP device.

* 4031342 (Tracking ID: 4031452)

SYMPTOM:
The add node operation fails with the error "Error found while invoking '' in the new node, and rollback done in both nodes".

DESCRIPTION:
The stack showed a valid address for the pointer ptmap2, yet a core dump was still generated.
This suggests a double-free case. The issue lies in the freeing of a pointer.

RESOLUTION:
Added handling for this case by assigning NULL to pointers wherever they are freed.

* 4037283 (Tracking ID: 4021301)

SYMPTOM:
Data corruption occurs on RHEL8 when large I/Os are processed by the Linux kernel I/O split functions.

DESCRIPTION:
On RHEL8, or as of Linux kernel 3.13, changes were introduced in the Linux kernel block layer: a new field of the bio iterator structure represents the start offset of a bio or its bio vectors after the I/O has been processed by the Linux kernel I/O split functions. In addition, recent versions of VxFS can generate a bio that is larger than the size limit defined in the Linux kernel block layer and in VxVM, so I/O from VxFS may be split by the Linux kernel. For such split I/Os, VxVM does not take the new bio iterator field into account while processing them, so data is written to the wrong position on the volume or disk. This results in data corruption.

RESOLUTION:
Code changes have been made to bypass the Linux kernel IO split functions, which seems redundant for VxVM IO processing.
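
The effect of ignoring the iterator start offset on a split I/O can be illustrated with a simplified model. The Bio class and its fields below are hypothetical stand-ins, not the kernel structures, and the model shows the mismatch as wrong data landing at a given position rather than reproducing the exact VxVM behaviour.

```python
from dataclasses import dataclass


@dataclass
class Bio:
    buf: bytes     # payload shared by all pieces of a split I/O
    offset: int    # start offset into buf (stand-in for the bio iterator field)
    size: int      # number of bytes this piece covers
    disk_pos: int  # target position on the volume


def split(bio, max_size):
    """Split one bio into pieces that share buf but differ in offset."""
    parts, done = [], 0
    while done < bio.size:
        n = min(max_size, bio.size - done)
        parts.append(Bio(bio.buf, bio.offset + done, n, bio.disk_pos + done))
        done += n
    return parts


def write(disk, bio, honour_offset):
    """Copy a piece to disk; ignoring the offset reads from the buffer start."""
    start = bio.offset if honour_offset else 0
    disk[bio.disk_pos:bio.disk_pos + bio.size] = bio.buf[start:start + bio.size]
```

With an 8-byte payload split into two 4-byte pieces, a writer that ignores the offset stores the first half twice, so the second half of the data never reaches the disk.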

* 4042038 (Tracking ID: 4040897)

SYMPTOM:
HPE MSA 2060 is a new array, and support for claiming it needs to be added.

DESCRIPTION:
HPE MSA 2060 is a new array that the current ASL does not support, so it is not claimed by the current ASL. Support for this array has now been added to the ASL.

RESOLUTION:
Code changes to support HPE MSA 2060 array have been done.

* 4046906 (Tracking ID: 3956607)

SYMPTOM:
When removing a VxVM disk using the 'vxdg rmdisk' operation, the following error occurs while requesting a disk reclaim:
VxVM vxdg ERROR V-5-1-0 Disk <device_name> is used by one or more subdisks which are pending to be reclaimed.
Use "vxdisk reclaim <device_name>" to reclaim space used by these subdisks, and retry "vxdg rmdisk" command.
Note: The reclamation operation is irreversible. However, a core dump occurs when 'vxdisk reclaim' is executed.

DESCRIPTION:
This issue occurs due to a memory allocation failure in the disk-reclaim code, which fails to be detected and causes an invalid address to be referenced. Consequently, a core dump occurs.

RESOLUTION:
The disk-reclaim code is updated to handle memory allocation failures properly.

* 4046907 (Tracking ID: 4041001)

SYMPTOM:
When some nodes of a system are rebooted, they cannot join back because the required disk attach transactions fail.

DESCRIPTION:
In a VxVM environment, when some nodes are rebooted, some plexes of the volume are detached. It may happen that all the plexes of a volume are disabled. In this case, if all the plexes of some DCO volume become inaccessible, that DCO volume state does not get marked as BADLOG. Consequently, transactions fail with the following error:
VxVM ERROR V-5-1-10128  DCO experienced IO errors during the operation. Re-run the operation after ensuring that DCO is accessible.
The system hangs and the nodes cannot join, because the transactions fail.

RESOLUTION:
VxVM is updated to address this issue. When all the plexes of a DCO become inaccessible during I/O load, the DCO state is marked as BADLOG.

* 4046908 (Tracking ID: 4038865)

SYMPTOM:
The system panics in the vxdmp module with the following call trace in the IRQ stack:
native_queued_spin_lock_slowpath
queued_spin_lock_slowpath
_raw_spin_lock_irqsave
dmp_get_shared_lock
gendmpiodone
dmpiodone
bio_endio
blk_update_request
scsi_end_request
scsi_io_completion
scsi_finish_command
scsi_softirq_done
blk_done_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
do_IRQ
 <IRQ stack>

DESCRIPTION:
A deadlock can occur between inode_hash_lock and the DMP shared lock when one process holds inode_hash_lock and acquires the DMP shared lock in IRQ context while, in the meantime, another process that holds the DMP shared lock tries to acquire inode_hash_lock.

RESOLUTION:
Code changes have been done to avoid the deadlock issue.

* 4047592 (Tracking ID: 3992040)

SYMPTOM:
VxFS Testing CFS Stress hits a kernel panic, f:vx_dio_bio_done:2

DESCRIPTION:
In the RHEL 8.0/SLES 15 kernel code, the value in bi_status is not a standard error code. It uses a completely separate set of values that are all small positive integers (for example, BLK_STS_OK and BLK_STS_IOERR), whereas the errors sent by VM are different. Hence, VM must send a proper bi_status to FS on newer kernels. This fix avoids further kernel crashes.

RESOLUTION:
Code changes have been made to add a map for converting between bi_status and bi_error, as is done in the Linux kernel code (blk-core.c).
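
A conversion map of this kind might look like the following sketch. The status names and numeric values are a subset recalled from the Linux kernel block layer and should be treated as assumptions; the function names mirror the kernel helpers, but this is not the VxVM code.

```python
import errno

# Subset of block-layer status codes (assumed values; verify against
# include/linux/blk_types.h for the kernel in use).
BLK_STS_OK = 0
BLK_STS_NOTSUPP = 1
BLK_STS_TIMEOUT = 2
BLK_STS_NOSPC = 3
BLK_STS_RESOURCE = 9
BLK_STS_IOERR = 10

_STATUS_TO_ERRNO = {
    BLK_STS_OK: 0,
    BLK_STS_NOTSUPP: -errno.EOPNOTSUPP,
    BLK_STS_TIMEOUT: -errno.ETIMEDOUT,
    BLK_STS_NOSPC: -errno.ENOSPC,
    BLK_STS_RESOURCE: -errno.ENOMEM,
    BLK_STS_IOERR: -errno.EIO,
}


def blk_status_to_errno(status):
    """Translate a block status to a negative errno; unknown maps to -EIO."""
    return _STATUS_TO_ERRNO.get(status, -errno.EIO)


def errno_to_blk_status(err):
    """Translate a negative errno to a block status; unknown maps to IOERR."""
    for status, mapped in _STATUS_TO_ERRNO.items():
        if mapped == err:
            return status
    return BLK_STS_IOERR
```

Falling back to the generic I/O error for anything unmapped keeps an unexpected errno from being passed through as if it were a valid status value.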

* 4047695 (Tracking ID: 3911930)

SYMPTOM:
Valid PGR operations sometimes fail on a DMP node.

DESCRIPTION:
As part of the PGR operations, if the inquiry command finds that PGR is not supported on the DMP node, the PGR_FLAG_NOTSUPPORTED flag is set on the node. Further PGR operations check this value and issue PGR commands only if the flag is not set. PGR_FLAG_NOTSUPPORTED remains set even if the hardware is changed so as to support PGR.

RESOLUTION:
A new command, enablepr, is provided in the vxdmppr utility to clear this flag on the specified DMP node.

* 4047722 (Tracking ID: 4023390)

SYMPTOM:
Vxconfigd crashes because a disk contains an invalid privoffset (160), which is smaller than the minimum required offset (VTOC 265, GPT 208).

DESCRIPTION:
Disk label corruption or stale information residing in the disk header may cause an unexpected label to be written.

RESOLUTION:
An assert is added when updating the CDS label to ensure that a valid privoffset is written to the disk header.
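
A check of this kind can be sketched as follows. The minimum offsets are the ones quoted in the symptom above; the function name and the use of an exception in place of a C-style assert are illustrative assumptions.

```python
# Minimum private-region offsets quoted in the symptom text.
MIN_PRIVOFFSET = {"VTOC": 265, "GPT": 208}


def check_privoffset(label_type, privoffset):
    """Refuse to accept a privoffset below the minimum for the label type."""
    minimum = MIN_PRIVOFFSET[label_type]
    if privoffset < minimum:
        raise ValueError(
            f"invalid privoffset {privoffset} for {label_type} "
            f"(minimum {minimum})"
        )
    return privoffset
```

With these values, the offset of 160 from the symptom is rejected for both label types before it can be written.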

* 4049268 (Tracking ID: 4044583)

SYMPTOM:
A system goes into the maintenance mode when DMP is enabled to manage native devices.

DESCRIPTION:
The "vxdmpadm settune dmp_native_support=on" command is used to enable DMP to manage native devices. After you change the value of the dmp_native_support tunable, you need to reboot the system for the changes to take effect. However, the system goes into the maintenance mode after it reboots. The issue occurs because the local liblicmgr72.so file is copied instead of the original one while creating the vx_initrd image.

RESOLUTION:
Code changes have been made to copy the correct liblicmgr72.so file. The system successfully reboots without going into maintenance mode.

Patch ID: VRTSvxvm-7.4.2.1500

* 4018182 (Tracking ID: 4008664)

SYMPTOM:
System panic occurs with the following stack:

void genunix:psignal+4()
void vxio:vol_logger_signal_gen+0x40()
int vxio:vollog_logentry+0x84()
void vxio:vollog_logger+0xcc()
int vxio:voldco_update_rbufq_chunk+0x200()
int vxio:voldco_chunk_updatesio_start+0x364()
void vxio:voliod_iohandle+0x30()
void vxio:voliod_loop+0x26c((void *)0)
unix:thread_start+4()

DESCRIPTION:
Vxio keeps vxloggerd proc_t that is used to send a signal to vxloggerd. In case vxloggerd has been ended for some reason, the signal may be sent to an unexpected process, which may cause panic.

RESOLUTION:
Code changes have been made to correct the problem.

* 4020207 (Tracking ID: 4018086)

SYMPTOM:
The vxiod with ID 128 was stuck with the following stack:

 #2 [] vx_svar_sleep_unlock at [vxfs]
 #3 [] vx_event_wait at [vxfs]
 #4 [] vx_async_waitmsg at [vxfs]
 #5 [] vx_msg_send at [vxfs]
 #6 [] vx_send_getemapmsg at [vxfs]
 #7 [] vx_cfs_getemap at [vxfs]
 #8 [] vx_get_freeexts_ioctl at [vxfs]
 #9 [] vxportalunlockedkioctl at [vxportal]
 #10 [] vxportalkioctl at [vxportal]
 #11 [] vxfs_free_region at [vxio]
 #12 [] vol_ru_start_replica at [vxio]
 #13 [] vol_ru_start at [vxio]
 #14 [] voliod_iohandle at [vxio]
 #15 [] voliod_loop at [vxio]

DESCRIPTION:
With the SmartMove feature enabled, the vxiod with ID 128 may start replication while the RVG is in DCM mode. This vxiod then waits for the file system to respond whether a given region is in use by the file system. The file system triggers MDSHIP I/O on the logowner. Due to a bug in the code, MDSHIP I/O always gets queued on the vxiod with ID 128, which results in a deadlock.

RESOLUTION:
Code changes have been made to avoid handling MDSHIP IO in vxiod whose ID is bigger than 127.

* 4020438 (Tracking ID: 4020046)

SYMPTOM:
The following I/O errors are reported on VxVM subdisks, and the DRL log is detached even though no SCSI errors are detected:

VxVM vxio V-5-0-1276 error on Subdisk [xxxx] while writing volume [yyyy][log] offset 0 length [zzzz]
VxVM vxio V-5-0-145 DRL volume yyyy[log] is detached

DESCRIPTION:
DRL plexes were detached because an atomic write flag (BIT_ATOMIC) was unexpectedly set on the BIO. The BIT_ATOMIC flag is set on a bio only if the VOLSIO_BASEFLAG_ATOMIC_WRITE flag is set in the sio_base_flags of the SUBDISK SIO and of its parent MVWRITE SIO. When the MVWRITE SIO is generated, its sio_base_flags is copied from a gio structure. Because the gio structure memory is not initialized, it may contain garbage values, hence the issue.

RESOLUTION:
Code changes have been made to fix the issue.

* 4021238 (Tracking ID: 4008075)

SYMPTOM:
This issue was observed with the ASL changes for NVMe in a reboot scenario. The machine panicked on every reboot, which resulted in a reboot loop.

DESCRIPTION:
The panic was hit for split bios. The root cause is that RHEL8 introduced a new field named __bi_remaining,
which maintains the count of chained bios. On every endio, __bi_remaining is atomically decremented in the bio_endio() function.
While decrementing __bi_remaining, the OS checks that it is not <= 0. In this case __bi_remaining was always 0, so the OS BUG_ON was hit.

RESOLUTION:
>>> For scsi devices maxsize is 4194304,
[   26.919333] DMP_BIO_SIZE(orig_bio) : 16384, maxsize: 4194304
[   26.920063] DMP_BIO_SIZE(orig_bio) : 262144, maxsize: 4194304

>>>and for NVMe devices maxsize is 131072
[  153.297387] DMP_BIO_SIZE(orig_bio) : 262144, maxsize: 131072
[  153.298057] DMP_BIO_SIZE(orig_bio) : 262144, maxsize: 131072
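
The log lines above show a 262144-byte bio against a maxsize of 131072 on NVMe, which means the bio has to be handled in more than one piece. The sketch below illustrates the sizes involved and a simplified model of the __bi_remaining check. The class and function names are hypothetical, and this is not the actual code change.

```python
def split_sizes(bio_size, maxsize):
    """Sizes of the pieces needed to carry bio_size bytes within maxsize."""
    return [min(maxsize, bio_size - off) for off in range(0, bio_size, maxsize)]


class ChainedBio:
    """Simplified model of a parent bio with a completion counter."""

    def __init__(self, remaining):
        self.remaining = remaining  # stand-in for __bi_remaining

    def endio(self):
        # Model of the kernel check: completing with a count <= 0 is a bug.
        if self.remaining <= 0:
            raise RuntimeError("BUG_ON: __bi_remaining <= 0")
        self.remaining -= 1
        return self.remaining == 0  # True once the last piece has completed
```

In this model, a parent whose counter was never raised to match the number of pieces trips the check on the first completion, which corresponds to the panic described above.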

* 4021240 (Tracking ID: 4010612)

SYMPTOM:
$ vxddladm set namingscheme=ebn lowercase=no
This issue is observed for NVMe and SSD devices, where every disk has a separate enclosure such as nvme0, nvme1, and so on. The NVMe/SSD disk names would therefore be
hostprefix_enclosurename0_disk0, hostprefix_enclosurename1_disk0, and so on.

DESCRIPTION:
$ vxddladm set namingscheme=ebn lowercase=no
This issue is observed for NVMe and SSD devices, where every disk has a separate enclosure such as nvme0, nvme1, and so on.
The NVMe/SSD disk names would therefore be hostprefix_enclosurename0_disk0, hostprefix_enclosurename1_disk0, and so on.
eg.
smicro125_nvme0_0 <--- disk1
smicro125_nvme1_0 <--- disk2

With lowercase=no, the current code suppresses the suffix digit of the enclosure name, so multiple disks get the same name and udid_mismatch is shown,
because the UDID in the private region does not match the one in DDL. The DDL database shows wrong information because multiple disks get the same name.

smicro125_nvme_0 <--- disk1   <<<<<<<-----here suffix digit of nvme enclosure suppressed
smicro125_nvme_0 <--- disk2

RESOLUTION:
The suffix integer is now appended while making da_name.
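
The name collision can be reproduced with a small sketch. The function and its parameters are hypothetical; only the example names come from the description above.

```python
import re


def da_name(hostprefix, enclosure, disk_index, keep_suffix):
    """Build a disk access name, optionally dropping the enclosure's suffix digit."""
    encl = enclosure if keep_suffix else re.sub(r"\d+$", "", enclosure)
    return f"{hostprefix}_{encl}_{disk_index}"


enclosures = ("nvme0", "nvme1")
old_names = [da_name("smicro125", e, 0, keep_suffix=False) for e in enclosures]
new_names = [da_name("smicro125", e, 0, keep_suffix=True) for e in enclosures]
```

Here old_names holds the same name twice, while new_names holds two distinct names, one per enclosure.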

* 4021346 (Tracking ID: 4010207)

SYMPTOM:
System panic occurred with the below stack:

native_queued_spin_lock_slowpath()
queued_spin_lock_slowpath()
_raw_spin_lock_irqsave()
volget_rwspinlock()
volkiodone()
volfpdiskiodone()
voldiskiodone_intr()
voldmp_iodone()
bio_endio()
gendmpiodone()
dmpiodone()
bio_endio()
blk_update_request()
scsi_end_request()
scsi_io_completion()
scsi_finish_command()
scsi_softirq_done()
blk_done_softirq()
__do_softirq()
call_softirq()

DESCRIPTION:
As part of collecting I/O statistics, the vxstat thread acquires a spinlock and tries to copy data to user space. If a page fault happens during the data copy, the thread relinquishes the CPU to another thread. If the thread that gets scheduled on the CPU requests the same spinlock that the vxstat thread holds, a hard lockup results.

RESOLUTION:
Code has been changed to properly release the spinlock before copying out the data to the user space during vxstat collection.
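
The general pattern of taking a snapshot under the lock and doing the slow copy-out only after the lock is released can be sketched as follows. This uses a user-space lock as an analogy for the kernel spinlock, and all names are hypothetical.

```python
import threading


class IoStats:
    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {"reads": 0, "writes": 0}

    def record(self, kind):
        with self._lock:
            self._counters[kind] += 1

    def snapshot(self):
        # Hold the lock only long enough to take a private copy.
        with self._lock:
            return dict(self._counters)


def report(stats, sink):
    snap = stats.snapshot()  # the lock is already released when this returns
    sink(snap)               # the slow copy-out runs without holding the lock
```

Because sink() runs outside the lock, a stall during the copy cannot block another thread that needs the same lock.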

* 4021366 (Tracking ID: 4008741)

SYMPTOM:
VxVM device files appear to have the device_t SELinux label.

DESCRIPTION:
If an unauthorized or modified device is allowed to exist on the system, there is the possibility the system may perform unintended or unauthorized operations.
eg: ls -LZ
...
...
/dev/vx/dsk/testdg/vol1   system_u:object_r:device_t:s0
/dev/vx/dmpconfig         system_u:object_r:device_t:s0
/dev/vx/vxcloud           system_u:object_r:device_t:s0

RESOLUTION:
Code changes have been made to change the device labels to misc_device_t and fixed_disk_device_t.

* 4023095 (Tracking ID: 4007920)

SYMPTOM:
Even when the vol_snap_fail_source tunable is set, the largest and oldest snapshot is automatically deleted when the cache object becomes full.

DESCRIPTION:
If the vol_snap_fail_source tunable is set, the oldest snapshot should not be deleted when the cache object is full. Flex requires these snapshots for rollback.

RESOLUTION:
Added a fix to stop automatic snapshot deletion in vxcached.

Patch ID: VRTSpython-3.7.4.38

* 4108085 (Tracking ID: 4108043)

SYMPTOM:
VRTSpython has multiple security issues in unused Python modules.

DESCRIPTION:
VRTSrest is not supported for IS 7.4.2, and the Python modules required for REST have multiple security issues.

RESOLUTION:
All Python modules required for REST are removed from VRTSpython 3.7.4.38.

Patch ID: VRTSvxfs-7.4.2.4100

* 4106702 (Tracking ID: 4106701)

SYMPTOM:
A security vulnerability exists in the third-party component sqlite.

DESCRIPTION:
VxFS uses a third-party component named sqlite in which a security vulnerability exists.

RESOLUTION:
VxFS is updated to use a newer version of sqlite in which the security vulnerability has been addressed.

Patch ID: VRTSvxfs-7.4.2.3900

* 4050870 (Tracking ID: 3987720)

SYMPTOM:
vxms test is having failures.

DESCRIPTION:
vxms test is having failures.

RESOLUTION:
updated vxms.

* 4071105 (Tracking ID: 4067393)

SYMPTOM:
System panicked with the following stack trace:

page_fault 
[exception RIP: vx_ckptdir_nmspc_match+29]
vx_nmspc_resolve
vx_drevalidate 
lookup_dcache 
do_last 
path_openat 
do_filp_open 
do_sys_open 
sys_open

DESCRIPTION:
A negative path lookup on a force-unmounted file system was not handled, which led to a NULL pointer dereference when the already-freed fs structure of the force-unmounted file system was accessed.

RESOLUTION:
The force-unmounted case is now handled before the vx_nmspc_resolve() call, so that the NULL pointer dereference is avoided.

* 4074298 (Tracking ID: 4069116)

SYMPTOM:
fsck got stuck in pass1 inode validation.

DESCRIPTION:
fsck could land in an infinite retry loop during inode validation with the following stack trace:

pthread_mutex_unlock()
bc_getfreebuf()
sl_getblk()
bc_rgetblk()
fs_getblk()
bmap_bread()
fs_bmap_typ()
fs_callback_bmap()
fsck_callback_bmap()
bmap_check_overlay()
ivalidate()
pass1()
iproc_do_work()
start_thread()

This is because the inode is completely corrupted in such a way that it matches a known inode type in ivalidate() and goes ahead to verify the inode bmap. While trying to do so it requests for a buffer size larger than maximum fsck buffer cache memory and hence gets stuck in a loop.

RESOLUTION:
Added code changes to skip bmap validation if the inode mode bits are corrupted.

* 4075873 (Tracking ID: 4075871)

SYMPTOM:
Utility to find possible pending stuck messages.

DESCRIPTION:
Utility to find possible pending stuck messages.

RESOLUTION:
Added utility to find possible pending stuck messages.

* 4075875 (Tracking ID: 4018783)

SYMPTOM:
Metasave collection and restore takes significant amount of time.

DESCRIPTION:
Metasave collection and restore takes significant amount of time.

RESOLUTION:
Code changes have been done in metasave code base to improve metasave collection and metasave restore in the range of 30-40%.

* 4084881 (Tracking ID: 4084542)

SYMPTOM:
Enhance fsadm defrag report to display if FS is badly fragmented.

DESCRIPTION:
Enhance fsadm defrag report to display if FS is badly fragmented.

RESOLUTION:
Added method to identify if FS needs defragmentation.

* 4088078 (Tracking ID: 4087036)

SYMPTOM:
FSCK utility exits with an error while running it with the "-o metasave" option on a shared volume.

DESCRIPTION:
FSCK utility exits with an error while running it with the "-o metasave" option on a shared volume. Besides this, while running this utility with "-n" and either "-o metasave" or "-o dumplog", it silently ignores the latter option(s).

RESOLUTION:
Code changes have been done to resolve the above-mentioned failure and also warning messages have been added to inform users regarding mutually exclusive behavior of "-n" and either of "metasave" and "dumplog" options instead of silently ignoring them.
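
The option check described above might be modelled as follows. The function name and the warning text are hypothetical; the actual fsck messages are not quoted in this document.

```python
# "-o" sub-options that are mutually exclusive with "-n", per the text above.
EXCLUSIVE_WITH_N = ("metasave", "dumplog")


def fsck_option_warnings(no_changes, o_options):
    """Return one warning per "-o" sub-option that conflicts with "-n"."""
    if not no_changes:
        return []
    return [
        f'warning: "-o {opt}" cannot be combined with "-n"'  # illustrative text
        for opt in o_options
        if opt in EXCLUSIVE_WITH_N
    ]
```

Returning the warnings instead of dropping the options silently gives the caller something to print before it proceeds.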

* 4090573 (Tracking ID: 4056648)

SYMPTOM:
Metasave collection can be executed on a mounted filesystem.

DESCRIPTION:
If metasave image is collected from a mounted filesystem then it might be an inconsistent state of the filesystem as there could be ongoing changes happening on the filesystem.

RESOLUTION:
Code changes have been done to fail default metasave collection for a mounted filesystem. If metasave needs to be collected from mounted filesystem then this can still be achieved with option "-o inconsistent".

* 4090600 (Tracking ID: 4090598)

SYMPTOM:
Utility to detect culprit nodes while cfs hang is observed.

DESCRIPTION:
Utility to detect culprit nodes while a CFS hang is observed. The customer can reboot and collect a crash dump from those nodes to get the application up and running. The msgdump and glmdump utilities are integrated with cfshang_check.

RESOLUTION:
Integrated the msgdump and glmdump utilities with cfshang_check.

* 4090601 (Tracking ID: 4068143)

SYMPTOM:
fsck->misc is having failures.

DESCRIPTION:
fsck->misc is having failures.

RESOLUTION:
Updated fsck->misc.

* 4090617 (Tracking ID: 4070217)

SYMPTOM:
Command fsck might fail with 'cluster reservation failed for volume' message for a disabled cluster-mounted filesystem.

DESCRIPTION:
On a disabled cluster-mounted filesystem, release of cluster reservation might fail during unmount operation resulting in a  failure of command fsck with 'cluster reservation failed for volume' message.

RESOLUTION:
Code is modified to release cluster reservation in unmount operation properly even for cluster-mounted filesystem.

* 4090639 (Tracking ID: 4086084)

SYMPTOM:
VxFS mount operation causes system panic when -o context is used.

DESCRIPTION:
VxFS mount operation supports context option to override existing extended attributes, or to specify a different, default context for file systems that do not support extended attributes. System panic observed when -o context is used.

RESOLUTION:
Required code changes are added to avoid panic.

* 4091580 (Tracking ID: 4056420)

SYMPTOM:
A VFR hardlink file is not replicated after it is modified during an incremental sync.

DESCRIPTION:
A VFR hardlink file is not replicated after it is modified during an incremental sync.

RESOLUTION:
Updated code to address: VFR Hardlink file is not getting replicated after modification in incremental sync.

* 4093306 (Tracking ID: 4090127)

SYMPTOM:
CFS hang in vx_searchau().

DESCRIPTION:
As part of the SMAP transaction changes, the allocator changed its logic to always call mdele_tryhold when getting the emap for a particular EAU, and it passes
nogetdele as 1 to mdele_tryhold, which indicates that mdele_tryhold should not ask for delegation when it detects a free EAU without delegation. In this case, the
allocator finds such an EAU in the device summary tree but without delegation, and it keeps retrying without asking for delegation, hence the hang.

RESOLUTION:
In case a FREE EAU is found without delegation, delegate it back to Primary.

Patch ID: VRTSvxfs-7.4.2.3600

* 4089394 (Tracking ID: 4089392)

SYMPTOM:
Security vulnerabilities exist in the OpenSSL third-party components used by VxFS.

DESCRIPTION:
VxFS uses the OpenSSL third-party components, in which some security vulnerabilities exist.

RESOLUTION:
VxFS is updated to use a newer version (1.1.1q) of these third-party components, in which the security vulnerabilities have been addressed.

Patch ID: VRTSvxfs-7.4.2.3500

* 4083948 (Tracking ID: 4070814)

SYMPTOM:
Security Vulnerability found in VxFS while running security scans.

DESCRIPTION:
In our internal security scans we found some Vulnerabilities in VxFS third party component Zlib.

RESOLUTION:
Upgrading the third party component Zlib to resolve these vulnerabilities.

Patch ID: VRTSvxfs-7.4.2.3400

* 4079532 (Tracking ID: 4079869)

SYMPTOM:
Security Vulnerability found in VxFS while running security scans.

DESCRIPTION:
In our internal security scans we found some vulnerabilities in VxFS third-party components. Attackers can exploit these security vulnerabilities to attack the system.

RESOLUTION:
Upgrading the third party components to resolve these vulnerabilities.

Patch ID: VRTSvxfs-7.4.2.2600

* 4015834 (Tracking ID: 3988752)

SYMPTOM:
Use ldi_strategy() routine instead of bdev_strategy() for IO's in solaris.

DESCRIPTION:
bdev_strategy() is deprecated from solaris code and was causing performance issues when used for IO's. Solaris has recommended to use LDI framework for all IO's.

RESOLUTION:
Code is modified to use ldi framework for all IO's in solaris.

* 4040612 (Tracking ID: 4033664)

SYMPTOM:
Multiple issues occur with hardlink replication using VFR.

DESCRIPTION:
Multiple different issues occur with hardlink replication using Veritas File Replicator (VFR).

RESOLUTION:
VFR is updated to fix issues with hardlink replication in the following cases:
1. Files with multiple links
2. Data inconsistency after hardlink file replication
3. Rename and move operations dumping core in multiple different scenarios
4. WORM feature support

* 4040618 (Tracking ID: 4040617)

SYMPTOM:
Veritas file replicator is not performing as per the expectation.

DESCRIPTION:
Veritas File Replicator had some bottlenecks at the networking layer as well as at the data transfer level. This caused additional throttling in the replication.

RESOLUTION:
Performance optimisations have been made in multiple places so that Veritas File Replicator makes proper use of the available resources.

* 4060549 (Tracking ID: 4047921)

SYMPTOM:
The replication job hangs because of a deadlock involving the following threads:

Thread 1:

#0  0x00007f160581854d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f1605813e9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f1605813d68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000043be1f in replnet_sess_bulk_free ()
#4  0x000000000043b1e3 in replnet_server_dropchan ()
#5  0x000000000043ca07 in replnet_client_connstate ()
#6  0x00000000004374e3 in replnet_conn_changestate ()
#7  0x0000000000437c18 in replnet_conn_evalpoll ()
#8  0x000000000044ac39 in vxev_loop ()
#9  0x0000000000405ab2 in main ()

Thread 2:

#0  0x00007f1605815a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000043902b in replnet_msgq_waitempty ()
#2  0x0000000000439082 in replnet_bulk_recv_func ()
#3  0x00007f1605811ea5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f1603ef29fd in clone () from /lib64/libc.so.6

DESCRIPTION:
When a replication job is paused and resumed multiple times in succession, a race condition may lead to a deadlock involving two threads.

RESOLUTION:
Fix the locking sequence and add additional holds on resources to avoid race leading to deadlock situation.

* 4060566 (Tracking ID: 4052449)

SYMPTOM:
Cluster goes in an 'unresponsive' mode while invalidating pages due to duplicate page entries in iowr structure.

DESCRIPTION:
While finding pages for invalidation of inodes, VxFS traverses the radix tree by taking the RCU lock and fills an array in the IO structure with the dirty/writeback pages that need to be invalidated. This lock is efficient for reads but does not protect against the parallel creation or deletion of a node. Hence, when VxFS finds a page, the consistency of the page is checked through radix_tree_exception()/radix_tree_deref_retry(). If the check fails, VxFS restarts the page search from the start offset. However, VxFS does not reset the array index, which leads to incorrect filling of the IO structure's array and causes duplicate page entries. While trying to destroy these pages, VxFS takes the page lock on each page. Because of the duplicate entries, VxFS tries to take the page lock more than once on the same page, leading to a self-deadlock.

RESOLUTION:
Code is modified to reset the array index correctly in case of failure to find pages.
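
The duplicate entries caused by a stale array index can be reproduced with a simplified model of the retry loop. The function and its parameters are hypothetical, and a single simulated consistency failure stands in for the radix tree retry.

```python
def collect_pages(pages, fail_at, reset_index):
    """Fill a fixed array with pages, restarting once when a check fails."""
    slots = [None] * (2 * len(pages))  # oversized so the buggy path fits
    idx = 0
    failed_once = False
    while True:
        if reset_index:
            idx = 0  # the fix: start filling from slot 0 on every (re)start
        restart = False
        for pos, page in enumerate(pages):
            if pos == fail_at and not failed_once:
                failed_once = True  # simulate one failed consistency check
                restart = True
                break
            slots[idx] = page
            idx += 1
        if not restart:
            return slots[:idx]
```

For pages p0, p1, p2 with a failure at position 2, the version without the reset returns p0 and p1 twice. Taking a non-reentrant page lock on each entry would then block on the second p0.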

* 4060585 (Tracking ID: 4042925)

SYMPTOM:
Intermittent Performance issue on commands like df and ls.

DESCRIPTION:
Commands like "df" and "ls" issue the stat system call on a node to calculate the statistics of the file system. In a CFS, when the stat system call is issued, it compiles statistics from all nodes. As an optimization, when multiple df or ls commands are issued within a specified time limit, VxFS returns the cached statistics instead of recalculating statistics from all nodes. If multiple such commands are issued in succession and one of the older callers of the stat system call takes time, this optimization fails and VxFS recompiles statistics from all nodes. This can lead to bad performance of the stat system call, leading to unresponsive situations for the df and ls commands.

RESOLUTION:
Code is modified to protect last modified time of stat system call with a sleep lock.

* 4060805 (Tracking ID: 4042254)

SYMPTOM:
vxupgrade sets fullfsck flag in the filesystem if it is unable to upgrade the disk layout version because of ENOSPC.

DESCRIPTION:
If the filesystem is 100% full and its disk layout version is upgraded by using vxupgrade, the utility starts the upgrade, later fails with ENOSPC, and ends up setting the fullfsck flag in the filesystem.

RESOLUTION:
Code changes have been introduced to first calculate the space required to perform the disk layout upgrade. If the required space is not available, the upgrade fails gracefully without setting the fullfsck flag.

* 4061203 (Tracking ID: 4005620)

SYMPTOM:
Inode count maintained in the inode allocation unit (IAU) can be negative when an IAU is marked bad. An error such as the following is logged.

V-2-4: vx_mapbad - vx_inoauchk - /fs1 file system free inode bitmap in au 264 marked bad

Due to the negative inode count, errors like the following might be observed and processes might be stuck at inode allocation with a stack trace as shown.

V-2-14: vx_iget - inode table overflow

	vx_inoauchk 
	vx_inofindau 
	vx_findino 
	vx_ialloc 
	vx_dirmakeinode 
	vx_dircreate 
	vx_dircreate_tran 
	vx_pd_create 
	vx_create1_pd 
	vx_do_create 
	vx_create1 
	vx_create0 
	vx_create 
	vn_open 
	open

DESCRIPTION:
The inode count can be negative if somehow VxFS tries to allocate an inode from an IAU where the counter for regular file and directory inodes is zero. In such a situation, the inode allocation fails and the IAU map is marked bad. But the code tries to further reduce the already-zero counters, resulting in negative counts that can cause subsequent unresponsive situation.

RESOLUTION:
Code is modified to not reduce inode counters in the vx_mapbad code path if the result would be negative. A diagnostic message like the following is displayed.
"vxfs: Error: Incorrect values of ias->ifree and Aus rifree detected."

* 4061527 (Tracking ID: 4054386)

SYMPTOM:
VxFS systemd service may show active status despite the module not being loaded.

DESCRIPTION:
If systemd service fails to load vxfs module, the service still shows status as active instead of failed.

RESOLUTION:
The script is modified to show the correct status in case of such failures.

Patch ID: VRTSvxfs-7.4.2.2200

* 4013420 (Tracking ID: 4013139)

SYMPTOM:
The abort operation on an ongoing online migration from the native file system to VxFS fails on RHEL 8.x systems.

DESCRIPTION:
The following error messages are logged when the abort operation fails:
umount: /mnt1/lost+found/srcfs: not mounted
UX:vxfs fsmigadm: ERROR: V-3-26835:  umount of source device: /dev/vx/dsk/testdg/vol1 failed, with error: 32

RESOLUTION:
The fsmigadm utility is updated to address the issue with the abort operation on an ongoing online migration.

* 4040238 (Tracking ID: 4035040)

SYMPTOM:
After a replication job is paused and resumed, some fields are missing from the stats command output, and the missing fields never appear on subsequent runs.

DESCRIPTION:
rs_start for the current stat is initialized to the start time of the replication, and the default value of rs_start is zero.
Stat does not show some fields if rs_start is zero.

        if (rs->rs_start && dis_type == VX_DIS_CURRENT) {
                if (!rs->rs_done) {
                        diff = rs->rs_update - rs->rs_start;
                }
                else {
                        diff = rs->rs_done - rs->rs_start;
                }

                /*
                 * The unit of time is in seconds, hence
                 * assigning 1 if the amount of data
                 * was too small
                 */

                diff = diff ? diff : 1;
                rate = rs->rs_file_bytes_synced /
                        (diff - rs->rs_paused_duration);
                printf("\t\tTransfer Rate: %s/sec\n", fmt_bytes(h, rate));
        }

In replication, rs_start is initialized to zero and then updated with the start time, but the stats are not saved to disk. That small window leaves a case where,
if the replication is paused and started again, rs_start is always seen as zero.

Now, after rs_start is initialized, it is written to disk in the same function. If rs_start is found to be zero on resume, it is initialized again to the
current replication start time.

RESOLUTION:
rs_start is now written to disk, and a check is added in the resume case to initialize rs_start if it is found to be 0.
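
The resume-time check can be sketched as follows. This is a minimal illustration and not the VFR source: the struct layout, the function names repl_resume_fixup() and repl_elapsed(), and the persist callback are hypothetical. Only the field names rs_start, rs_update, rs_done and rs_paused_duration are taken from the excerpt above. Unlike the excerpt, the sketch clamps the elapsed time after subtracting the paused duration, which is an assumption made here to keep the later division safe.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical subset of the replication stats record. */
struct repl_stats {
    time_t rs_start;            /* 0 means "never recorded" */
    time_t rs_update;           /* time of the last stats update */
    time_t rs_done;             /* 0 while the job is still running */
    time_t rs_paused_duration;  /* seconds spent paused */
};

/* Hypothetical persistence hook; returns 0 on success. */
typedef int (*persist_fn)(const struct repl_stats *rs);

/*
 * On resume: if rs_start was lost (still 0), set it to the current
 * replication start time and persist it right away.
 * Returns 1 if rs_start was repaired, 0 if it was already set,
 * and -1 if persisting the repaired value failed.
 */
int repl_resume_fixup(struct repl_stats *rs, time_t now, persist_fn persist)
{
    assert(rs != NULL);
    if (rs->rs_start != 0)
        return 0;
    rs->rs_start = now;
    if (persist != NULL && persist(rs) != 0)
        return -1;
    return 1;
}

/* Active seconds of the job, clamped to at least 1 for use as a divisor. */
time_t repl_elapsed(const struct repl_stats *rs)
{
    time_t end;
    time_t diff;

    assert(rs != NULL);
    end = rs->rs_done ? rs->rs_done : rs->rs_update;
    diff = end - rs->rs_start - rs->rs_paused_duration;
    return diff > 0 ? diff : 1;
}
```

A caller would run repl_resume_fixup() once when the job resumes and divide the synced byte count by repl_elapsed() when printing the transfer rate.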

* 4040608 (Tracking ID: 4008616)

SYMPTOM:
The fsck command hangs.

DESCRIPTION:
fsck hangs due to a deadlock: a thread that marked a buffer as aliased waits on itself for the reference drain, even though
the get-block code was called with the NOBLOCK flag.

RESOLUTION:
The code is modified to honour the NOBLOCK flag.

* 4042686 (Tracking ID: 4042684)

SYMPTOM:
Command fails to resize the file.

DESCRIPTION:
There is a window in which a parallel thread can clear the IDELXWRI flag when it should not.

RESOLUTION:
The delayed extending write flag is set again in case a parallel thread has cleared it.

* 4044184 (Tracking ID: 3993140)

SYMPTOM:
Every 60 seconds, compclock lagged approximately 1.44 seconds behind the actual elapsed time.

DESCRIPTION:
Every 60 seconds, compclock lagged approximately 1.44 seconds behind the actual elapsed time.

RESOLUTION:
The logic responsible for calculating and updating the compclock timer is adjusted.
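
To put the reported lag in perspective: 1.44 seconds per 60 seconds is a drift of 2.4%. If the drift is assumed to be uniform over time (an assumption, not something stated above), that is about 86.4 seconds per hour and roughly 34.6 minutes per day. The helper below is a hypothetical illustration of that arithmetic in integer milliseconds; it is not part of VxFS.

```c
#include <assert.h>

/*
 * Accumulated lag in milliseconds after elapsed_ms of real time, assuming
 * the clock loses lag_ms during every interval_ms (linear drift assumed).
 */
long long accumulated_lag_ms(long long elapsed_ms, long long lag_ms,
                             long long interval_ms)
{
    assert(interval_ms > 0 && elapsed_ms >= 0 && lag_ms >= 0);
    return elapsed_ms * lag_ms / interval_ms;
}
```

For example, accumulated_lag_ms(3600000, 1440, 60000) gives the lag accumulated over one hour.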

* 4046265 (Tracking ID: 4037035)

SYMPTOM:
Added new tunable "vx_ninact_proc_threads" to control the number of inactive processing threads.

DESCRIPTION:
On high-end servers, heavy lock contention was seen during inactive removal processing, caused by the large number of inactive worker threads spawned by VxFS. To avoid the contention, a new tunable "vx_ninact_proc_threads" was added so that customers can adjust the number of inactive processing threads based on their server configuration and workload.

RESOLUTION:
Added new tunable "vx_ninact_proc_threads" to control the number of inactive processing threads.

* 4046266 (Tracking ID: 4043084)

SYMPTOM:
Panic in vx_cbdnlc_lookup.

DESCRIPTION:
Panic observed in the following stack trace:
vx_cbdnlc_lookup+000140 ()
vx_int_lookup+0002C0 ()
vx_do_lookup2+000328 ()
vx_do_lookup+0000E0 ()
vx_lookup+0000A0 ()
vnop_lookup+0001D4 (??, ??, ??, ??, ??, ??)
getFullPath+00022C (??, ??, ??, ??)
getPathComponents+0003E8 (??, ??, ??, ??, ??, ??, ??)
svcNameCheck+0002EC (??, ??, ??, ??, ??, ??, ??)
kopen+000180 (??, ??, ??)
syscall+00024C ()

RESOLUTION:
Code changes are made to handle memory pressure while changing FC connectivity.

* 4046267 (Tracking ID: 4034910)

SYMPTOM:
Garbage values inside global list large_dirinfo.

DESCRIPTION:
Garbage values inside global list large_dirinfo, which will lead to fsck failure.

RESOLUTION:
Access and updates to the global list large_dirinfo are made synchronous throughout the fsck binary, so that garbage values caused by race conditions are avoided.

* 4046271 (Tracking ID: 3993822)

SYMPTOM:
Running fsck on a file system results in a core dump.

DESCRIPTION:
While getting a buffer from the freelist, one thread marked the buffer as busy without taking the buffer lock, while another thread
was accessing the same buffer through its local variable.

RESOLUTION:
The buffer is now marked busy while holding the buffer lock when a free buffer is obtained.

* 4046272 (Tracking ID: 4017104)

SYMPTOM:
Deleting a huge number of inodes can consume a lot of system resources during inactivation, which can cause hangs or even a panic.

DESCRIPTION:
Delicache inactivation dumps all the inodes in its inventory for inactivation at once. This causes a surge in resource consumption, due to which other processes can starve.

RESOLUTION:
Inode inactivation is now processed gradually.
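
Gradual processing can be pictured as bounded batches rather than one large dump. The sketch below is only an illustration under assumptions: the function inactivate_batch(), the callback type, and the idea of a per-pass limit are hypothetical and do not describe the actual delicache code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback that inactivates one inode, identified by number. */
typedef void (*inactivate_fn)(unsigned long ino, void *ctx);

/*
 * Inactivate at most batch_max entries from list[*cursor .. count) and
 * advance *cursor. Returns the number of entries processed in this pass.
 * The caller is expected to yield or sleep between passes so that other
 * work is not starved.
 */
size_t inactivate_batch(const unsigned long *list, size_t count,
                        size_t *cursor, size_t batch_max,
                        inactivate_fn fn, void *ctx)
{
    size_t done = 0;

    assert(list != NULL && cursor != NULL && fn != NULL);
    while (*cursor < count && done < batch_max) {
        fn(list[*cursor], ctx);
        (*cursor)++;
        done++;
    }
    return done;
}

/* Example callback: counts how many inodes were inactivated. */
void count_inactivation(unsigned long ino, void *ctx)
{
    (void)ino;
    (*(size_t *)ctx)++;
}
```

Calling inactivate_batch() repeatedly until it returns 0 drains the list while capping the work done in any single pass.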

* 4046829 (Tracking ID: 3993943)

SYMPTOM:
The fsck utility dumps core due to a segmentation fault in get_dotdotlst().

Below is the stack trace of the issue.

get_dotdotlst 
check_dotdot_tbl 
iproc_do_work
start_thread 
clone ()

DESCRIPTION:
Due to a bug in the fsck utility, a core dump was generated while running fsck on the file system, and the fsck operation aborted midway.

RESOLUTION:
Code changes are done to fix this issue

* 4047568 (Tracking ID: 4046169)

SYMPTOM:
On RHEL8, while doing a directory move from one FS (ext4 or vxfs) to migration VxFS, the migration can fail and the FS will be disabled. In debug testing, the issue was caught by an internal assert, with the following stack trace.

panic
ted_call_demon
ted_assert
vx_msgprint
vx_mig_badfile
vx_mig_linux_removexattr_int
__vfs_removexattr
__vfs_removexattr_locked
vfs_removexattr
removexattr
path_removexattr
__x64_sys_removexattr
do_syscall_64

DESCRIPTION:
Due to a different implementation of the "mv" operation in RHEL8 (as compared to RHEL7), there is a removexattr call on the target FS, which in the migration case is the migration VxFS. In this removexattr call, the kernel asks for the "system.posix_acl_default" attribute to be removed from the directory to be moved. But since the directory is not present on the target side yet (and hence has no extended attributes), the code returns ENODATA. When the code in vx_mig_linux_removexattr_int() encounters this error, it disables the FS and, in the debug package, calls assert.

RESOLUTION:
The fix is to ignore the ENODATA error and not assert or disable the FS.
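
The decision described above can be sketched as a small predicate. This is a hypothetical illustration, not the code in vx_mig_linux_removexattr_int(): the function name mig_removexattr_is_fatal() and the kernel-style convention of zero or negative error values are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>

/*
 * Decide whether the result of a removexattr call on the migration target
 * should be treated as fatal (disable the FS) or not.
 * err is 0 on success or a negative errno value, kernel style.
 * Returns 1 if the error is fatal, 0 otherwise.
 */
int mig_removexattr_is_fatal(int err)
{
    assert(err <= 0);
    if (err == 0)
        return 0;       /* success */
    if (err == -ENODATA)
        return 0;       /* attribute not present on the target yet: ignore */
    return 1;           /* any other error keeps the previous handling */
}
```

With this shape, a missing "system.posix_acl_default" attribute on a directory that has not been created on the target yet no longer takes the failure path.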

* 4049091 (Tracking ID: 4035057)

SYMPTOM:
On RHEL8, I/Os performed on the FS while a migration from another FS to VxFS is in progress can cause a panic, with the following stack trace.
 machine_kexec
 __crash_kexec
 crash_kexec
 oops_end
 no_context
 do_page_fault
 page_fault
 [exception RIP: memcpy+18]
 _copy_to_iter
 copy_page_to_iter
 generic_file_buffered_read
 new_sync_read
 vfs_read
 kernel_read
 vx_mig_read
 vfs_read
 ksys_read
 do_syscall_64

DESCRIPTION:
- As part of RHEL8 support changes, vfs_read, vfs_write calls were replaced with kernel_read, kernel_write as the vfs_ calls are no longer exported. The kernel_read, kernel_write calls internally set the memory segment of the thread to KERNEL_DS and expect the buffer passed to have been allocated in kernel space.
- In migration code, if the read/write operation cannot be completed using the target FS (VxFS), then the I/O is redirected to the source FS. In doing so, the code passes the same buffer, which is a user buffer, to the kernel call. This worked well with the vfs_read, vfs_write calls, but it does not work with the kernel_read, kernel_write calls, causing a panic.

RESOLUTION:
- Fix is to use vfs_iter_read, vfs_iter_write calls, which work with a user buffer. To use these methods, the user buffer needs to be passed as part of struct iovec.iov_base.
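
The idea of describing a caller's buffer through struct iovec.iov_base can be shown with a user-space analogue. The sketch below uses the POSIX readv() call as a stand-in; it is not kernel code and does not call vfs_iter_read. The function names read_via_iovec() and demo_pipe_roundtrip() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read up to len bytes from fd into buf by describing buf with an iovec. */
ssize_t read_via_iovec(int fd, void *buf, size_t len)
{
    struct iovec iov;

    assert(buf != NULL);
    iov.iov_base = buf;     /* the caller's buffer is referenced, not copied */
    iov.iov_len = len;
    return readv(fd, &iov, 1);
}

/* Round-trip four bytes through a pipe; returns 0 on success, -1 on failure. */
int demo_pipe_roundtrip(void)
{
    int fds[2];
    char buf[8];
    int rc = -1;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "vxfs", 4) == 4 &&
        read_via_iovec(fds[0], buf, sizeof(buf)) == 4 &&
        memcmp(buf, "vxfs", 4) == 0)
        rc = 0;
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

The point of the analogue is only the buffer description: the I/O call receives an iovec that points at the original buffer instead of a raw pointer whose address space it has to guess.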

* 4049097 (Tracking ID: 4049096)

SYMPTOM:
The tar command exits with status 1 and throws warnings.

DESCRIPTION:
This happens because dalloc changes the ctime of the file after allocating the extents `(worklist thread)->vx_dalloc_flush -> vx_dalloc_off` in between the 2 fsstat calls in tar.

RESOLUTION:
The ctime is no longer changed while allocating delayed extents in the background.

Patch ID: VRTSvxfs-7.4.2.1600

* 4012765 (Tracking ID: 4011570)

SYMPTOM:
WORM attribute replication support in VxFS.

DESCRIPTION:
WORM attribute replication is not supported in VFR. Modified code to replicate WORM attribute during attribute processing in VFR.

RESOLUTION:
Code is modified to replicate WORM attributes in VFR.

* 4014720 (Tracking ID: 4011596)

SYMPTOM:
An error saying "No such file or directory present" is thrown.

DESCRIPTION:
A bug was observed during parallel communication between all the nodes: some required temp files were not present on other nodes.

RESOLUTION:
The code is fixed to maintain consistency during parallel node communication. hacp is now used for transferring temp files.

* 4015287 (Tracking ID: 4010255)

SYMPTOM:
"vfradmin promote" fails to promote the target FS with SELinux enabled.

DESCRIPTION:
During the promote operation, VxFS remounts the FS at the target. When remounting the FS to remove the "protected on" flag from the target, VxFS first fetches the current mount options. With SELinux enabled (in either permissive or enforcing mode), the OS adds the default "seclabel" option to the mount. When VxFS fetched the current mount options, "seclabel" was not recognized by VxFS. Hence it failed to mount the FS.

RESOLUTION:
Code is modified to remove "seclabel" mount option during mount processing on target.
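
Removing one option from a comma-separated mount option string can be sketched as below. This is a generic illustration under assumptions, not the VxFS mount code: the function name strip_mount_option() and its interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the comma-separated option list in opts to out, skipping every
 * option that exactly equals drop. Returns 0 on success, or -1 if out
 * is too small to hold the result.
 */
int strip_mount_option(const char *opts, const char *drop,
                       char *out, size_t outlen)
{
    size_t used = 0;
    size_t droplen;
    const char *p;

    assert(opts != NULL && drop != NULL && out != NULL);
    if (outlen == 0)
        return -1;
    out[0] = '\0';
    droplen = strlen(drop);
    p = opts;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the current option */

        if (len > 0 && !(len == droplen && strncmp(p, drop, len) == 0)) {
            size_t need = len + (used > 0 ? 1 : 0);

            if (used + need + 1 > outlen)
                return -1;
            if (used > 0)
                out[used++] = ',';
            memcpy(out + used, p, len);
            used += len;
            out[used] = '\0';
        }
        p += len;
        if (*p == ',')
            p++;
    }
    return 0;
}
```

The comparison is on whole options, so an option that merely starts with the same letters as "seclabel" is left in place.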

* 4015835 (Tracking ID: 4015278)

SYMPTOM:
System panics during vx_uiomove_by_hand.

DESCRIPTION:
During uiomove, VxFS gets the pages from the OS through get_user_pages() to copy user data. Oracle uses hugetlbfs internally for performance reasons, which can allocate huge pages. Under low-memory conditions, get_user_pages() might return compound pages to VxFS. In a compound page, only the head page has a valid mapping set; all other pages are mapped as TAIL_MAPPING. During uiomove, if VxFS gets a compound page, it tries to check the writable mapping for all pages of that compound page. This can result in dereferencing an illegal address (TAIL_MAPPING), which caused the panic. VxFS doesn't support huge pages, but a compound page can be present on the system and VxFS might get one through get_user_pages().

RESOLUTION:
Code is modified to get head page in case of tail pages from compound page when VxFS checks writeable mapping.

* 4016721 (Tracking ID: 4016927)

SYMPTOM:
The remove tier command panics the system; the crash shows the panic reason "BUG: unable to handle kernel NULL pointer dereference at 0000000000000150".

DESCRIPTION:
When fsvoladm removes a device, the other devices are not moved within the array. The device count also remains the same unless the removed device is the last one in the array. A slot in the array can therefore be free, so the code must check for a free slot before trying to access a device.

RESOLUTION:
In the device list, each slot is checked for being free before the device in that slot is accessed.
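
The free-slot check can be sketched as follows. The structure and names here are hypothetical and chosen only for illustration; in particular, representing a free slot as a NULL pointer is an assumption and may differ from the real device list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device entry; a NULL slot represents a free slot. */
struct fs_device {
    int dev_id;
    unsigned long long dev_size;
};

/*
 * Sum the sizes of all devices in a slot array that may contain holes.
 * Free (NULL) slots are skipped instead of being dereferenced.
 */
unsigned long long total_device_size(struct fs_device *const *slots,
                                     size_t nslots)
{
    unsigned long long total = 0;
    size_t i;

    assert(slots != NULL || nslots == 0);
    for (i = 0; i < nslots; i++) {
        if (slots[i] == NULL)   /* slot freed by an earlier removal */
            continue;
        total += slots[i]->dev_size;
    }
    return total;
}
```

Because the slot count does not shrink when a middle device is removed, every loop over the array needs the same guard.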

* 4017282 (Tracking ID: 4016801)

SYMPTOM:
The file system is marked for fullfsck.

DESCRIPTION:
In a cluster environment, some operations can be performed on the primary node only. When such operations are executed from a secondary node, a message is
passed to the primary node. At that point, the sender node may have transactions that have not yet reached the disk. In such a scenario, if the sender node reboots,
the primary node can see stale data.

RESOLUTION:
Code is modified to make sure transactions are flushed to the log disk before sending the message to the primary.

* 4017818 (Tracking ID: 4017817)

SYMPTOM:
NA

DESCRIPTION:
In order to increase the overall throughput of VFR, code changes have been done
to replicate files in parallel.

RESOLUTION:
Code changes have been done to replicate a file's data and metadata in parallel over
multiple socket connections.

* 4017820 (Tracking ID: 4017819)

SYMPTOM:
The cloud tier add operation fails when the user tries to add AWS GovCloud.

DESCRIPTION:
Adding AWS GovCloud as a cloud tier was not supported in InfoScale. With these changes, users can add a cloud of type AWS GovCloud.

RESOLUTION:
Added support for AWS GovCloud

* 4019877 (Tracking ID: 4019876)

SYMPTOM:
vxfsmisc.so is a publicly shared library for Samba and does not require an InfoScale license for its usage.

DESCRIPTION:
vxfsmisc.so is a publicly shared library for Samba and does not require an InfoScale license for its usage.

RESOLUTION:
Removed license dependency in vxfsmisc library

* 4020055 (Tracking ID: 4012049)

SYMPTOM:
"fsck" supports the "metasave" option but it was not documented anywhere.

DESCRIPTION:
"fsck" supports the "metasave" option while executing with the "-y" option, but it is not documented anywhere. Also, it tries to store the metasave in a particular location, and the user doesn't have the option to specify that location. If the location doesn't have enough space, "fsck" fails to take the metasave and continues to change the file system state.

RESOLUTION:
Code changes have been done to add a new option with which the user can specify the location to store the metasave. The "metasave" and "target" options have been added to the "usage" message of the "fsck" binary.

* 4020056 (Tracking ID: 4012049)

SYMPTOM:
"fsck" supports the "metasave" option but it was not documented anywhere.

DESCRIPTION:
"fsck" supports the "metasave" option while executing with the "-y" option, but it is not documented anywhere. Also, it tries to store the metasave in a particular location, and the user doesn't have the option to specify that location. If the location doesn't have enough space, "fsck" fails to take the metasave and continues to change the file system state.

RESOLUTION:
Code changes have been done to add a new option with which the user can specify the location to store the metasave. The "metasave" and "target" options have been added to the "usage" message of the "fsck" binary.

* 4020912 (Tracking ID: 4020758)

SYMPTOM:
Filesystem mount or fsck with -y may see hang during log replay

DESCRIPTION:
The fsck utility is used to perform the log replay. This log replay is performed during the mount operation or during a file system check with the -y option, if needed. In certain cases, if there are a lot of logs that need to be replayed, the replay ends up consuming the entire buffer cache. This results in an out-of-buffer scenario and a hang.

RESOLUTION:
Code is modified to make sure enough buffers are always available.



INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Please note that the installation of this P-Patch will cause downtime.

To install the patch perform the following steps on at least one node in the cluster:
1. Copy the patch infoscale-sles12_x86_64-Patch-7.4.2.3400.tar.gz to /tmp
2. Untar infoscale-sles12_x86_64-Patch-7.4.2.3400.tar.gz to /tmp/hf
    # mkdir /tmp/hf
    # cd /tmp/hf
    # gunzip /tmp/infoscale-sles12_x86_64-Patch-7.4.2.3400.tar.gz
    # tar xf /tmp/infoscale-sles12_x86_64-Patch-7.4.2.3400.tar
3. Install the hotfix (please note that the installation of this P-Patch will cause downtime).
    # pwd
    /tmp/hf
    # ./installVRTSinfoscale742P3400 [<host1> <host2>...]

You can also install this patch together with the 7.4.2 base release using Install Bundles
1. Download this patch and extract it to a directory
2. Change to the Veritas InfoScale 7.4.2 directory and invoke the installer script
   with the -patch_path option, where -patch_path should point to the patch directory
    # ./installer -patch_path [<path to this patch>] [<host1> <host2>...]

Install the patch manually:
--------------------------
Manual installation is not recommended.


REMOVING THE PATCH
------------------
Manual uninstallation is not recommended.


SPECIAL INSTRUCTIONS
--------------------
Vulnerabilities Fixed :

The following vulnerabilities are fixed in this security SP:
BDSA-2022-3544, BDSA-2021-2585, BDSA-2021-4125, CVE-2022-35737 (BDSA-2022-2151), CVE-2022-42916 (BDSA-2022-3047), BDSA-2022-3660, CVE-2022-35260 (BDSA-2022-3051), CVE-2022-43551 (BDSA-2022-3659), CVE-2022-32221 (BDSA-2022-3049), CVE-2022-42915 (BDSA-2022-3050), CVE-2022-35252 (BDSA-2022-2385), CVE-2022-0391 (BDSA-2021-4119), CVE-2022-45061 (BDSA-2022-3175), CVE-2021-3737 (BDSA-2021-3183), CVE-2021-3733 (BDSA-2021-2824), CVE-2021-23336 (BDSA-2021-0380), CVE-2020-28493 (BDSA-2020-4134), BDSA-2019-0431, CVE-2021-3572 (BDSA-2021-1527), CVE-2022-29217 (BDSA-2022-1465), CVE-2022-40897 (BDSA-2022-3675), BDSA-2023-0324, CVE-2022-29361 (BDSA-2022-1461), BDSA-2023-0323.


OTHERS
------
NONE