* * * READ ME * * *
* * * Veritas File System 7.3.1 * * *
* * * Patch 100 * * *

Patch Date: 2018-03-01

This document provides the following information:

* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH
* KNOWN ISSUES

PATCH NAME
----------
Veritas File System 7.3.1 Patch 100

OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL6 x86-64

PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs

BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* Veritas InfoScale Foundation 7.3.1
* Veritas InfoScale Storage 7.3.1
* Veritas InfoScale Enterprise 7.3.1

SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 7.3.1.100

* 3933810 (3830300) Degraded CPU performance during backup of Oracle archive logs on CFS compared to a local file system.
* 3933819 (3879310) The file system may get corrupted after a failed vxupgrade.
* 3933820 (3894712) ACL permissions are not inherited correctly on a cluster file system.
* 3933824 (3908785) System panic caused by a null page address in the writeback structure when the kswapd process runs.
* 3933828 (3921152) Performance drop caused by vx_dalloc_flush().
* 3933834 (3931761) A cluster-wide hang may be observed under high workload.
* 3933843 (3926972) A recovery event can result in a cluster-wide hang.
* 3933844 (3922259) Forced umount hangs in vx_idrop.
* 3933912 (3922986) Deadlock involving the buffer cache iodone routine in CFS.
* 3934841 (3930267) Deadlock between fsq flush threads and writer threads.
* 3936286 (3936285) The fscdsconv command may fail the conversion for disk layout version (DLV) 12 and above.
* 3937536 (3940516) The file resize thread loops infinitely for a file resize operation crossing the 32-bit boundary.
* 3938258 (3938256) Checking file size through SEEK_HOLE returns an incorrect offset/size when delayed allocation is enabled on the file.
* 3939406 (3941034) A VxFS worker thread may continuously spin on a CPU.
* 3940266 (3940235) A hang might be observed if the file system gets disabled while ENOSPC handling is being done by inactive processing.
* 3940368 (3940268) The file system might get disabled if the size of a directory surpasses the vx_dexh_sz value.
* 3940652 (3940651) The vxupgrade command might fail while upgrading Disk Layout Version (DLV) 10 to any higher DLV version.
* 3940830 (3937042) Data corruption seen when issuing writev with a mixture of named page and anonymous page buffers.

DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: 7.3.1.100

* 3933810 (Tracking ID: 3830300)

SYMPTOM:
Heavy CPU usage while Oracle archive processes are running on a clustered file system.

DESCRIPTION:
The poor read performance in this case was caused by fragmentation, which mainly happens when multiple archivers run on the same node. The allocation pattern of the Oracle archiver processes is:
1. Write the header with O_SYNC.
2. ftruncate the file up to its final size (a few GBs, typically).
3. Issue lio_listio with 1 MB iocbs.
All allocations done in this manner are internal allocations, i.e. allocations below the file size instead of allocations past the file size. Internal allocations are done at most 8 pages at a time, so when multiple processes do this, they all receive these 8 pages alternately and the file system becomes very fragmented.

RESOLUTION:
Added a tunable which allocates ZFOD extents when ftruncate tries to increase the size of the file, instead of creating a hole. This eliminates the allocations internal to the file size and thus the fragmentation. Fixed the earlier implementation of the same fix, which ran into locking issues.
Also fixed the performance issue seen while writing from the secondary node.

* 3933819 (Tracking ID: 3879310)

SYMPTOM:
The file system may get corrupted after the file system is frozen during vxupgrade. The full fsck gives the following errors:
UX:vxfs fsck: ERROR: V-3-20451: No valid device inodes found
UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate

DESCRIPTION:
vxupgrade requires the file system to be frozen during part of its operation. Corruption can be detected while the freeze is in progress and the full fsck flag can be set on the file system; however, this does not stop vxupgrade from proceeding. At a later stage of vxupgrade, after structures related to the new disk layout have been updated on disk, VxFS frees up and zeroes out some of the old metadata inodes. If any error occurs after this point (because the full fsck flag is set), the file system needs to revert completely to the previous version as of the time of the full fsck. Since the metadata corresponding to the previous version has already been cleared, the full fsck cannot proceed and gives the error.

RESOLUTION:
The code is modified to check for the full fsck flag after freezing the file system during vxupgrade. Also, the file system is disabled if an error occurs after writing the new metadata on disk. This forces the newly written metadata to be loaded into memory on the next mount.

* 3933820 (Tracking ID: 3894712)

SYMPTOM:
ACL permissions are not inherited correctly on a cluster file system.

DESCRIPTION:
The ACL counts stored on a directory inode get reset every time the directory inode's ownership is switched between nodes. When ownership of the directory inode comes back to the node which previously gave it up, ACL permissions were not inherited correctly for newly created files.

RESOLUTION:
Modified the source so that the ACLs are inherited correctly.
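The inheritance behavior this fix restores can be sanity-checked with a short sketch. This is an illustration only: the directory is a temporary stand-in for a CFS directory (on a cluster you would use a VxFS/CFS mount point), and the root user is used purely so the sketch runs without creating accounts.

```shell
# Sketch: default (inheritable) ACL entries on a directory should
# propagate to newly created files. Paths here are examples; on the
# affected systems this would be run on a cluster file system mount.
dir=$(mktemp -d)                  # stand-in for a CFS directory
setfacl -d -m u:root:rwx "$dir"   # default ACL entry to be inherited
touch "$dir/newfile"
getfacl "$dir/newfile" | grep "user:root"   # inherited entry should appear
rm -rf "$dir"
```

On the unfixed cluster file system, the inherited `user:` entry could be missing after directory inode ownership moved between nodes and back.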
* 3933824 (Tracking ID: 3908785)

SYMPTOM:
System panic observed because of a null page address in the writeback structure when the kswapd process runs.

DESCRIPTION:
The secfs2/encryptfs layers used the write VOP as a hook when kswapd is triggered to free pages. Ideally kswapd should call the writepage() routine, where the writeback structures are filled in correctly. When the write VOP is called because of the hook in secfs2/encryptfs, the writeback structures are cleared, resulting in a null page address.

RESOLUTION:
Code changes have been made to call the VxFS kswapd routine only if a valid page address is present.

* 3933828 (Tracking ID: 3921152)

SYMPTOM:
Performance drop. Core dump shows threads doing vx_dalloc_flush().

DESCRIPTION:
An implicit typecast error in vx_dalloc_flush() can cause this performance issue.

RESOLUTION:
The code is modified to do an explicit typecast.

* 3933834 (Tracking ID: 3931761)

SYMPTOM:
A cluster-wide hang may be observed in a race scenario if a freeze gets initiated while there are multiple pending lazy isize update workitems in the worklist.

DESCRIPTION:
If the lazy_isize_enable tunable is ON and "ls -l" is executed frequently from a non-writing node of the cluster, a huge number of workitems accumulate for the worker threads to process. If a workitem holding active level 1 is enqueued after these workitems and a cluster-wide freeze gets initiated, a deadlock results: the worker threads are exhausted processing the lazy isize update workitems, and the workitem enqueued behind them never gets a chance to be processed.

RESOLUTION:
Code changes have been made to handle this race condition.

* 3933843 (Tracking ID: 3926972)

SYMPTOM:
Once a node reboots or leaves the cluster, the whole cluster can hang.

DESCRIPTION:
This is a three-way deadlock, in which a glock grant could block the recovery while trying to cache the grant against an inode.
When it tries for the ilock, if that lock is held by an hlock revoke which is waiting to get a GLM lock (in this case the cbuf lock), it will not be able to get it because a recovery is in progress. The recovery cannot proceed because the glock grant thread blocked it. Hence the whole cluster hangs.

RESOLUTION:
The fix is to avoid taking the ilock in GLM context if it is not available.

* 3933844 (Tracking ID: 3922259)

SYMPTOM:
A forced umount hangs with a stack like this:
- vx_delay
- vx_idrop
- vx_quotaoff_umount2
- vx_detach_fset
- vx_force_umount
- vx_aioctl_common
- vx_aioctl
- vx_admin_ioctl
- vxportalunlockedkioctl
- vxportalunlockedioctl
- do_vfs_ioctl
- SyS_ioctl
- system_call_fastpath

DESCRIPTION:
An opened external quota file was preventing the forced umount from continuing.

RESOLUTION:
Code has been changed so that an opened external quota file is processed properly during the forced umount.

* 3933912 (Tracking ID: 3922986)

SYMPTOM:
System panic because the Linux NMI watchdog detected a LOCKUP in CFS.

DESCRIPTION:
The VxFS buffer cache iodone routine interrupted the inode flush thread, which was trying to acquire the CFS buffer hash lock while releasing the CFS buffer. The iodone routine was in turn blocked by other threads on acquiring the free list lock, and those threads were contending for the CFS buffer hash lock with the inode flush thread. On Linux, the spinlock is a FIFO ticket lock, so once the inode flush thread had taken a ticket on the spinlock, the other threads could not acquire the lock. This caused a deadlock.

RESOLUTION:
Code changes have been made to acquire the CFS buffer hash lock with IRQs disabled.

* 3934841 (Tracking ID: 3930267)

SYMPTOM:
Deadlock between fsq flush threads and writer threads.

DESCRIPTION:
On Linux, under certain circumstances (i.e. to account dirty pages), a writer thread takes a lock on an inode and starts flushing dirty pages, which requires the page lock.
If an fsq flush thread then starts flushing a transaction on the same inode, it needs the inode lock held by the writer thread. The page lock is held by another writer thread, which is waiting for transaction space that can only be freed by the fsq flush thread. This leads to a deadlock between these three threads.

RESOLUTION:
Code is modified to add a new flag which skips the dirty page accounting.

* 3936286 (Tracking ID: 3936285)

SYMPTOM:
The fscdsconv command may fail the conversion for disk layout version 12 and above. After exporting the file system for use on the specified target, it fails to mount on that target with the error below:

# /opt/VRTS/bin/mount
UX:vxfs mount: ERROR: V-3-20012: not a valid vxfs file system
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version

When importing the file system on the target for use on the same system, it asks for a full fsck during mount. After the full fsck, the file system mounts successfully, but fsck gives the messages below:

# /opt/VRTS/bin/fsck -y -o full /dev/vx/rdsk/mydg/myvol
log replay in progress
intent log does not contain valid log entries
pass0 - checking structural files
fileset 1 primary-ilist inode 34 (SuperBlock) failed validation clear? (ynq)y
pass1 - checking inode sanity and blocks
rebuild structural files? (ynq)y
pass0 - checking structural files
pass1 - checking inode sanity and blocks
pass2 - checking directory linkage
pass3 - checking reference counts
pass4 - checking resource maps
corrupted CUT entries, clear? (ynq)y
au 0 emap incorrect - fix? (ynq)y
OK to clear log? (ynq)y
flush fileset headers? (ynq)y
set state to CLEAN? (ynq)y

DESCRIPTION:
While checking the file system version in fscdsconv, the check for DLV 12 and above was missing, which triggered this issue.

RESOLUTION:
Code changes have been made to handle file system version 12 and above in the fscdsconv command.
* 3937536 (Tracking ID: 3940516)

SYMPTOM:
The file resize thread loops infinitely when resizing a file to a size greater than 4 TB.

DESCRIPTION:
Because of a vx_u32_t typecast in the vx_odm_resize() function, the resize thread gets stuck inside an infinite loop.

RESOLUTION:
Removed the vx_u32_t typecast in vx_odm_resize() to handle such scenarios.

* 3938258 (Tracking ID: 3938256)

SYMPTOM:
Checking file size through SEEK_HOLE returns an incorrect offset/size when delayed allocation is enabled on the file.

DESCRIPTION:
From recent versions of RHEL7 onwards, the grep command uses the SEEK_HOLE feature to check the current file size and then reads data depending on that size. In VxFS, when dalloc is enabled, the extent is allocated to the file later, but the file size is incremented as soon as the write completes. When checking the file size via SEEK_HOLE, VxFS did not fully consider the dalloc case and returned a stale size based on the extents already allocated to the file instead of the actual file size, which resulted in reading less data than expected.

RESOLUTION:
Code is modified so that VxFS now returns the correct size when dalloc is enabled on a file and SEEK_HOLE is called on that file.

* 3939406 (Tracking ID: 3941034)

SYMPTOM:
During a forced umount, a VxFS worker thread may continuously spin on a CPU.

DESCRIPTION:
During a forced umount, a VxFS worker thread needs a semaphore to drop the super block reference, but that semaphore is held by the vxumount thread, which is itself waiting for an event to happen. This causes a softlockup panic on the system, because the VxFS worker thread continuously spins on a CPU trying to grab the semaphore.

RESOLUTION:
Code changes have been made to fix this issue.

* 3940266 (Tracking ID: 3940235)

SYMPTOM:
A hang might be observed if the file system gets disabled while ENOSPC handling is being done by inactive processing.
The stack trace might look like:
cv_wait+0x3c()
delay_common+0x70()
vx_extfree1+0xc08()
vx_extfree+0x228()
vx_te_trunc_data+0x125c()
vx_te_trunc+0x878()
vx_trunc_typed+0x230()
vx_trunc_tran2+0x104c()
vx_trunc_tran+0x22c()
vx_trunc+0xcf0()
vx_inactive_remove+0x4ec()
vx_inactive_tran+0x13a4()
vx_local_inactive_list+0x14()
vx_inactive_list+0x6e4()
vx_workitem_process+0x24()
vx_worklist_process+0x1ec()
vx_worklist_thread+0x144()
thread_start+4()

DESCRIPTION:
In the smapchange function, it is possible in case of races that the SMAP records the old state as VX_EAU_FREE or VX_EAU_ALLOCATED but the corresponding EMAP is not updated. This happens if the concerned flag gets reset to 0 by some other thread in between. This leads to an fm_dirtycnt leak, which causes a hang some time afterwards.

RESOLUTION:
Code changes have been made to fix the issue by using a local variable instead of the global dflag variable directly, which can get reset to 0.

* 3940368 (Tracking ID: 3940268)

SYMPTOM:
A file system with disk layout version 13 might get disabled if the size of a directory surpasses the vx_dexh_sz value.

DESCRIPTION:
When the LDH (large directory hash) directory fills up and its buckets are full, the size of the hash directory is extended. For this, a reorg inode is created and the extent map of the LDH attribute inode is copied into the reorg inode using the extent map reorg function. That function checks whether the extent reorg structure was passed for the same inode; if not, the extent copy does not proceed. The extent reorg structure is set up accordingly, but while setting up the fileset index, the inode's i_fsetindex is used. From disk layout version 13 onwards the attribute inode is overlaid, and because of these changes i_fsetindex is no longer set in the attribute inode and remains 0. Hence the check in the extent map reorg function fails, resulting in the file system being disabled.

RESOLUTION:
Code has been modified to pass the correct fileset.
* 3940652 (Tracking ID: 3940651)

SYMPTOM:
During a Disk Layout Version (DLV) upgrade, the vxupgrade command might observe a hang.

DESCRIPTION:
vxupgrade does a lookup on the history inode to identify the mkfs version. In the case of CFS, the lookup requires the RWLOCK or GLOCK on the inode.

RESOLUTION:
Code changes have been made to take the RWLOCK and GLOCK on the inode.

* 3940830 (Tracking ID: 3937042)

SYMPTOM:
Data corruption seen when issuing writev with a mixture of named page and anonymous page buffers.

DESCRIPTION:
During writes, VxFS prefaults all of the user buffers into the kernel and decides the write length depending on this prefault length. In the case of mixed page buffers, VxFS issues the prefault separately for each page type, i.e. for the named page and the anonymous page. This reduces the length to be written and triggers the page create optimization. Since VxFS erroneously enabled the page create optimization, data corruption was seen on disk.

RESOLUTION:
Code is modified so that VxFS does not enable the page create optimization when a short prefault is seen.

INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Please note that the installation of this P-Patch will cause downtime.

To install the patch, perform the following steps on at least one node in the cluster:
1. Copy the patch fs-rhel6_x86_64-Patch-7.3.1.100.tar.gz to /tmp
2. Untar fs-rhel6_x86_64-Patch-7.3.1.100.tar.gz to /tmp/hf
   # mkdir /tmp/hf
   # cd /tmp/hf
   # gunzip /tmp/fs-rhel6_x86_64-Patch-7.3.1.100.tar.gz
   # tar xf /tmp/fs-rhel6_x86_64-Patch-7.3.1.100.tar
3. Install the hotfix (note that the installation of this P-Patch will cause downtime):
   # pwd
   /tmp/hf
   # ./installVRTSvxfs731P100 [ ...]

You can also install this patch together with the 7.3.1 maintenance release using Install Bundles:
1. Download this patch and extract it to a directory
2.
Change to the Veritas InfoScale 7.3.1 directory and invoke the installer script with the -patch_path option, where -patch_path points to the patch directory
   # ./installer -patch_path [] [ ...]

Install the patch manually:
--------------------------
# rpm -Uvh VRTSvxfs-7.3.1.100-RHEL6.x86_64.rpm

REMOVING THE PATCH
------------------
# rpm -e rpm_name

KNOWN ISSUES
------------
* Tracking ID: 3933818

SYMPTOM:
Oracle database start failure, with a trace log like this:
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 304 (block # 722821)
ORA-01110: data file 304:
ORA-17500: ODM err:ODM ERROR V-41-4-2-231-28 No space left on device

WORKAROUND:
None

SPECIAL INSTRUCTIONS
--------------------
NONE

OTHERS
------
NONE