fs-aix-Patch-7.2.0.200
Obsolete
The latest patch(es): fs-aix-Patch-7.2.0.300

 Basic information
Release type: Patch
Release date: 2017-06-08
OS update support: None
Technote: None
Documentation: None
Popularity: 738 viewed
Download size: 22.96 MB
Checksum: 20830920

 Applies to one or more of the following products:
InfoScale Enterprise 7.2 On AIX 7.1
InfoScale Foundation 7.2 On AIX 7.1
InfoScale Storage 7.2 On AIX 7.1

 Obsolete patches, incompatibilities, superseded patches, or other requirements:

This patch is obsolete. It is superseded by: fs-aix-Patch-7.2.0.300 (release date: 2018-02-28)

 Fixes the following incidents:
3909937, 3909940, 3909948, 3909949, 3909950, 3909951, 3910083, 3910085, 3910086, 3910088, 3910090, 3910093, 3910094, 3910096, 3910097, 3910098, 3910101, 3910103, 3910105, 3911290, 3911718, 3911719, 3911732, 3911952, 3912604, 3912988, 3912989, 3912990, 3913004, 3914871, 3918551

 Patch ID:
VRTSvxfs.bff

Readme file
                          * * * READ ME * * *
                  * * * Veritas File System 7.2 * * *
                         * * * Patch 200 * * *
                         Patch Date: 2017-05-29


This document provides the following information:

   * PATCH NAME
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * SUMMARY OF INCIDENTS FIXED BY THE PATCH
   * DETAILS OF INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH
   * KNOWN ISSUES


PATCH NAME
----------
Veritas File System 7.2 Patch 200


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
AIX 7.1


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas InfoScale Foundation 7.2
   * Veritas InfoScale Storage 7.2
   * Veritas InfoScale Enterprise 7.2


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 7.2.0.200
* 3909937 (3908954) Some writes could be missed causing data loss.
* 3909940 (3894712) ACL permissions are not inherited correctly on cluster 
file system.
* 3909948 (3901346) VxFS creates unnamed process.
* 3909949 (3902812) Mounting a vxfs file system on a large-memory AIX server may fail.
* 3909950 (3890172) System panicked at vx_vnunmap.
* 3909951 (3894223) On servers having multiple terabytes of memory, the vxfs mount command takes a long time to finish.
* 3910083 (3707662) Race between reorg processing and fsadm timer thread (alarm expiry) leads to panic in vx_reorg_emap.
* 3910085 (3779916) vxfsconvert fails to upgrade the layout version for a vxfs file system with a large number of inodes.
* 3910086 (3830300) Degraded CPU performance during backup of Oracle archive logs on CFS compared to a local file system.
* 3910088 (3855726) Panic in vx_prot_unregister_all().
* 3910090 (3790721) High CPU usage caused by vx_send_bcastgetemapmsg_remaus.
* 3910093 (1428611) 'vxcompress' can spew many GLM block lock messages over the 
LLT network.
* 3910094 (3879310) The file system may get corrupted after a failed vxupgrade.
* 3910096 (3757609) CPU usage goes high because of contention on ODM_IO_LOCK.
* 3910097 (3817734) The message displayed when a file system mount fails mentioned a direct command to run fsck with the -y|Y option.
* 3910098 (3861271) Missing an inode clear operation when a Linux inode is being de-initialized on
SLES11.
* 3910101 (3846521) "cp -p" fails if the modification time in nanoseconds has 10 digits.
* 3910103 (3817734) The message displayed when a file system mount fails mentioned a direct command to run fsck with the -y|Y option.
* 3910105 (3907902) System panic observed due to a race between a dalloc off thread and a getattr thread.
* 3911290 (3910526) fsadm fails with error number 28 during resize operation
* 3911718 (3905576) CFS hang during a cluster wide freeze
* 3911719 (3909583) Disable partitioning of directory if directory size is greater than upper threshold value.
* 3911732 (3896670) Intermittent CFS hang-like situation with many CFS pglock grant messages pending on the LLT layer.
* 3911952 (3912203) The cafs module is not loaded automatically on VRTSvxfs patch installation.
* 3912604 (3914488) On a local mount file system, the file system can be marked for FULLFSCK while mounting.
* 3912988 (3912315) EMAP corruption while freeing up the extent
* 3912989 (3912322) vxfs tunable max_seqio_extent_size cannot be tuned to any 
value less than 32768.
* 3912990 (3912407) CFS hang on VxFS 7.2 while thread on a CFS node waits for EAU 
delegation.
* 3913004 (3911048) LDH corruption and file system hang.
* 3914871 (3915125) File system kernel threads deadlock while allocating/freeing blocks.
* 3918551 (3919135) vxtunefs man page changes for max_seqio_extent_size tunable


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: 7.2.0.200

* 3909937 (Tracking ID: 3908954)

SYMPTOM:
While performing vectored writes using writev(), where two iovec writes
land at different offsets within the same 4 KB page-aligned range of a
file, it is possible to find null data at the beginning of the 4 KB range
when reading the data back.

DESCRIPTION:
While multiple processes are performing vectored writes to a file using
writev(), the following situation can occur.

Suppose there are two iovecs: the first is 448 bytes and the second is
30000 bytes. The first iovec of 448 bytes completes, but the second iovec
finds that the source page is no longer in memory. As it cannot fault the
page in during uiomove, it has to undo both iovecs. It then faults the
page back in and retries the second iovec only. However, because the undo
operation also undid the first iovec, the first 448 bytes of the page are
populated with nulls. When reading the file back, it appears that no data
was written for the first iovec; hence nulls are found in the file.

RESOLUTION:
The code has been changed to handle the unwind of multiple iovecs
correctly in scenarios where some data has been written from one iovec
and some from another.
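
For reference, a minimal user-space sketch of the two-iovec writev()
pattern described above (not the VxFS kernel code; the 448/30000 byte
sizes are taken from the description and the file name is illustrative):

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/uio.h>

    int main(void)
    {
        /* Two buffers mirroring the sizes in the incident description. */
        static char hdr[448], body[30000];
        memset(hdr, 'A', sizeof(hdr));
        memset(body, 'B', sizeof(body));

        struct iovec iov[2] = {
            { .iov_base = hdr,  .iov_len = sizeof(hdr)  },
            { .iov_base = body, .iov_len = sizeof(body) },
        };

        int fd = open("testfile", O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* After the fix, reading the file back must never show nulls in
         * the first 448 bytes, even if the kernel had to unwind and
         * retry part of this vectored write internally. */
        ssize_t n = writev(fd, iov, 2);
        if (n < 0) { perror("writev"); return 1; }
        printf("wrote %zd bytes\n", n);

        close(fd);
        return 0;
    }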

* 3909940 (Tracking ID: 3894712)

SYMPTOM:
ACL permissions are not inherited correctly on cluster file system.

DESCRIPTION:
The ACL counts stored on a directory inode get reset every time ownership
of the directory inode is switched between nodes. When ownership of the
directory inode comes back to the node that previously gave it up, ACL
permissions were not inherited correctly for newly created files.

RESOLUTION:
Modified the source such that the ACLs are inherited correctly.

* 3909948 (Tracking ID: 3901346)

SYMPTOM:
"ps -efk" shows unnamed process created by VxFS.

DESCRIPTION:
VxFS creates multiple worker threads that need to be grouped under one
process. However, when "ps -efk" was run, the process created by VxFS was
unnamed.

RESOLUTION:
Code has been modified such that VxFS names the process correctly.

* 3909949 (Tracking ID: 3902812)

SYMPTOM:
The mount command for a vxfs file system fails with the following error
on an AIX system with very large physical memory:

UX:vxfs mount: ERROR: V-3-20: Not enough space
UX:vxfs mount: ERROR: V-3-21256: cannot mount /dev/vx/dsk/datadg/vol1

DESCRIPTION:
On large-memory systems, the number of auto-tuned VMM buffers will be
huge, which in turn increases the number of VMM buffers per PDT. This can
lead to mount failures because such a large chunk of memory cannot be
allocated from the pinned heap. To alleviate this problem, the number of
PDTs needs to be increased on large-memory systems to spread out the VMM
buffers per PDT.

RESOLUTION:
Tune up the number of PDTs based on the amount of physical memory on
large-memory systems.

* 3909950 (Tracking ID: 3890172)

SYMPTOM:
System panicked at vx_vnunmap; a typical stack trace looks like the
following:

vx_vnunmap+0000E8 ()
vnop_unmap+000110 (??, ??, ??)
vm_map_entry_delete+0005C4 (??, ??, ??)
vm_map_delete+000150 (??, ??, ??, ??)
vm_map_deallocate+000048 (??, ??)
kexitx+001288 (??)
kexit+00008C ()
issig+00052C (??, ??)
sig_deliver+000268 (??, ??)

DESCRIPTION:
If a forced unmount happens during unmap of a file, which can be
triggered when a thread gets killed while holding an mmap'ed file, some
fields of struct fsext get reset to NULL; a subsequent access of
fsext_info can then cause the panic.

RESOLUTION:
The code is modified to check whether the file system is already
unmounted before unmapping a file; EIO is returned in the case of a
forced unmount.

* 3909951 (Tracking ID: 3894223)

SYMPTOM:
On servers having multiple terabytes of memory, the vxfs mount command
takes a long time to finish.

DESCRIPTION:
VxFS calculates the amount of buffer cache it needs based on the
available physical memory. On large-memory systems, the buffer cache and
the headers associated with it will be huge, and the buffer headers need
to be traversed during the mount process. This can take a long time for a
big buffer cache.

RESOLUTION:
Put a high cap on the VxFS buffer cache size to keep it limited on
large-memory systems.

* 3910083 (Tracking ID: 3707662)

SYMPTOM:
Race between reorg processing and the fsadm timer thread (alarm expiry)
leads to a panic in vx_reorg_emap with the following stack:

vx_iunlock
vx_reorg_iunlock_rct_reorg
vx_reorg_emap
vx_extmap_reorg
vx_reorg
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
fop_ioctl
ioctl

DESCRIPTION:
When the timer expires (fsadm with the -t option), vx_do_close() calls
vx_reorg_clear() on a local mount, which performs cleanup on the reorg
rct inode. Another thread currently active in vx_reorg_emap() will panic
due to a null pointer dereference.

RESOLUTION:
When fop_close is called in alarm handler context, the cleanup is
deferred until the kernel thread performing the reorg completes its
operation.

* 3910085 (Tracking ID: 3779916)

SYMPTOM:
vxfsconvert fails to upgrade the layout version for a vxfs file system
with a large number of inodes. The error message shows some inode
discrepancy.

DESCRIPTION:
vxfsconvert walks through the ilist and converts inodes. It stores chunks
of inodes in a buffer and processes them as a batch. The inode number
parameter for this inode buffer is of type unsigned integer. The offset
of a particular inode in the ilist is calculated by multiplying the inode
number by the size of the inode structure. For large inode numbers, the
product inode_number * inode_size can overflow the unsigned integer
limit, giving a wrong offset within the ilist file. vxfsconvert therefore
reads the wrong inode and eventually fails.

RESOLUTION:
The inode number parameter is defined as unsigned long to avoid the
overflow.
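
As a hedged illustration (not the vxfsconvert source), the following
sketch shows how a 32-bit unsigned multiplication wraps around and how
widening one operand avoids it; the 256-byte inode size and the sample
inode number are assumptions for the example:

    #include <stdio.h>

    int main(void)
    {
        unsigned int inode_number = 20000000;  /* large inode number   */
        unsigned int inode_size   = 256;       /* assumed on-disk size */

        /* 32-bit arithmetic: 20000000 * 256 = 5120000000, which exceeds
         * UINT_MAX (4294967295) and silently wraps to 825032704. */
        unsigned int bad_offset = inode_number * inode_size;

        /* Widening one operand makes the multiplication 64-bit. */
        unsigned long long good_offset =
            (unsigned long long)inode_number * inode_size;

        printf("wrapped offset: %u\n", bad_offset);
        printf("correct offset: %llu\n", good_offset);
        return 0;
    }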

* 3910086 (Tracking ID: 3830300)

SYMPTOM:
Heavy CPU usage while Oracle archiver processes are running on a
clustered file system.

DESCRIPTION:
The cause of the poor read performance in this case was fragmentation,
which mainly happens when multiple archivers are running on the same
node. The allocation pattern of the Oracle archiver processes is:

1. Write the header with O_SYNC.
2. ftruncate-up the file to its final size (typically a few GBs).
3. Do lio_listio with 1 MB iocbs.

The problem occurs because all the allocations done in this manner go
through internal allocations, i.e. allocations below the file size
instead of allocations past the file size. Internal allocations are done
at most 8 pages at a time, so if multiple processes are doing this, they
all get these 8 pages alternately and the file system becomes very
fragmented. A sketch of this I/O pattern follows the resolution below.

RESOLUTION:
Added a tunable which allocates ZFOD extents when ftruncate tries to
increase the size of the file, instead of creating a hole. This
eliminates the allocations internal to the file size and thus the
fragmentation. The earlier implementation of the same fix, which ran into
locking issues, has been corrected, and a performance issue while writing
from a secondary node has also been fixed.
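
A minimal sketch of the three-step allocation pattern described above,
using a plain pwrite() loop in place of the archiver's lio_listio batch;
the file name and sizes are illustrative:

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define FINAL_SIZE (512LL * 1024 * 1024)  /* illustrative final size */
    #define CHUNK      (1024 * 1024)          /* 1 MB writes             */

    int main(void)
    {
        char hdr[512] = { 0 };
        char *buf = calloc(1, CHUNK);
        if (buf == NULL) return 1;

        /* Step 1: write the header with O_SYNC. */
        int fd = open("arch.log", O_CREAT | O_WRONLY | O_SYNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, hdr, sizeof(hdr)) < 0) { perror("write"); return 1; }

        /* Step 2: ftruncate-up to the final size. With the new tunable,
         * VxFS allocates ZFOD extents here instead of leaving a hole. */
        if (ftruncate(fd, FINAL_SIZE) < 0) { perror("ftruncate"); return 1; }

        /* Step 3: 1 MB writes below EOF; without the tunable each write
         * was an internal allocation of at most 8 pages. */
        for (off_t off = sizeof(hdr); off + CHUNK <= FINAL_SIZE; off += CHUNK)
            if (pwrite(fd, buf, CHUNK, off) < 0) { perror("pwrite"); return 1; }

        close(fd);
        free(buf);
        return 0;
    }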

* 3910088 (Tracking ID: 3855726)

SYMPTOM:
Panic happens in vx_prot_unregister_all(). The stack looks like this:

- vx_prot_unregister_all
- vxportalclose
- __fput
- fput
- filp_close
- sys_close
- system_call_fastpath

DESCRIPTION:
The panic is caused by a NULL fileset pointer, due to referencing the
fileset before it is loaded; in addition, there is a race on the fileset
identity array.

RESOLUTION:
Skip the fileset if it's not loaded yet. Add the identity array lock to prevent
the possible race.

* 3910090 (Tracking ID: 3790721)

SYMPTOM:
High CPU usage in a vxfs thread. The backtrace of such a thread usually
looks like this:

schedule
schedule_timeout
__down
down
vx_send_bcastgetemapmsg_remaus
vx_send_bcastgetemapmsg
vx_recv_getemapmsg
vx_recvdele
vx_msg_recvreq
vx_msg_process_thread
vx_kthread_init
kernel_thread

DESCRIPTION:
The locking mechanism in vx_send_bcastgetemapmsg_process() is
inefficient: every time it is called, it performs a series of down-up
operations on a certain semaphore. This can result in a huge CPU cost
when multiple threads contend on this semaphore.

RESOLUTION:
Optimize the locking mechanism in vx_send_bcastgetemapmsg_process() so
that it performs the down-up operation on the semaphore only once.
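
A generic illustration of the change (not the VxFS code; the names and
the POSIX semaphore are stand-ins for the kernel primitives): hoisting a
semaphore down/up pair out of a per-item loop so the semaphore is taken
once per call rather than once per item:

    #include <semaphore.h>

    /* Hypothetical shared state guarded by the semaphore. */
    extern sem_t map_sem;
    extern void process_entry(int i);

    /* Before: one down/up pair per entry -- heavy contention. */
    void send_msgs_slow(int n)
    {
        for (int i = 0; i < n; i++) {
            sem_wait(&map_sem);      /* down */
            process_entry(i);
            sem_post(&map_sem);      /* up   */
        }
    }

    /* After: a single down/up pair around the whole batch. */
    void send_msgs_fast(int n)
    {
        sem_wait(&map_sem);          /* down once */
        for (int i = 0; i < n; i++)
            process_entry(i);
        sem_post(&map_sem);          /* up once   */
    }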

* 3910093 (Tracking ID: 1428611)

SYMPTOM:
The 'vxcompress' command can cause many GLM block lock messages to be
sent over the network. This can be observed in 'glmstat -m' output under
the "proxy recv" section, as shown in the example below:

bash-3.2# glmstat -m
         message     all      rw       g      pg       h     buf     oth    loop
master send:
           GRANT     194       0       0       0       2       0     192      98
          REVOKE     192       0       0       0       0       0     192      96
        subtotal     386       0       0       0       2       0     384     194

master recv:
            LOCK     193       0       0       0       2       0     191      98
         RELEASE     192       0       0       0       0       0     192      96
        subtotal     385       0       0       0       2       0     383     194

    master total     771       0       0       0       4       0     767     388

proxy send:
            LOCK      98       0       0       0       2       0      96      98
         RELEASE      96       0       0       0       0       0      96      96
      BLOCK_LOCK    2560       0       0       0       0    2560       0       0
   BLOCK_RELEASE    2560       0       0       0       0    2560       0       0
        subtotal    5314       0       0       0       2    5120     192     194

DESCRIPTION:
'vxcompress' creates placeholder inodes (called IFEMR inodes) to hold the
compressed data of files. After the compression is finished, the IFEMR
inodes exchange their bmaps with the original files and are later given
to inactive processing. Inactive processing truncates the IFEMR extents
(the original extents of the regular file, which is now compressed) by
sending cluster-wide buffer invalidation requests. These invalidations
need the GLM block lock. Regular file data need not be invalidated across
the cluster, which makes these GLM block lock requests unnecessary.

RESOLUTION:
Pertinent code has been modified to skip the invalidation for the 
IFEMR inodes created during compression.

* 3910094 (Tracking ID: 3879310)

SYMPTOM:
The file system may get corrupted after the file system freeze during 
vxupgrade. The full fsck gives the following errors:

UX:vxfs fsck: ERROR: V-3-20451: No valid device inodes found
UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate

DESCRIPTION:
vxupgrade requires the file system to be frozen during its functional
operation. It may happen that corruption is detected while the freeze is
in progress and the full fsck flag is set on the file system. However,
this does not stop vxupgrade from proceeding.
At a later stage of vxupgrade, after the structures related to the new
disk layout are updated on disk, vxfs frees up and zeroes out some of the
old metadata inodes. If any error occurs after this point (because of
full fsck being set), the file system needs to go back completely to the
previous version at the time of the full fsck. Since the metadata
corresponding to the previous version is already cleared, the full fsck
cannot proceed and gives the error.

RESOLUTION:
The code is modified to check for the full fsck flag after freezing the file 
system during vxupgrade. Also, disable the file system if an error occurs after 
writing new metadata on the disk. This will force the newly written metadata to 
be loaded in memory on the next mount.

* 3910096 (Tracking ID: 3757609)

SYMPTOM:
High CPU usage because of contention over ODM_IO_LOCK

DESCRIPTION:
While performing ODM I/O, ODM_IO_LOCK is taken to update some of the ODM
counters, which leads to contention when multiple iodone routines try to
update these counters at the same time. This results in high CPU usage.

RESOLUTION:
Code modified to remove the lock contention.

* 3910097 (Tracking ID: 3817734)

SYMPTOM:
If a file system with the full fsck flag set is mounted, a message
containing the direct command to clean the file system with full fsck is
printed to the user.

DESCRIPTION:
When mounting a file system with the full fsck flag set, the mount will
fail and a message will be printed asking the user to clean the file
system with full fsck. This message contains the direct command to run,
which, if run without first collecting a file system metasave, will
result in evidence being lost. Also, since fsck removes the file system
inconsistencies, it may lead to undesired data loss.

RESOLUTION:
A more generic message is given in the error message instead of the
direct command.

* 3910098 (Tracking ID: 3861271)

SYMPTOM:
Due to the missing inode clear action, a page can be left in a strange
state. Also, the inode is not fully quiescent, which leads to races in
the inode code. Sometimes this can cause a panic from iput_final().

DESCRIPTION:
We're missing an inode clear operation when a Linux inode is being
de-initialized on SLES11.

RESOLUTION:
Add the inode clear operation on SLES11.

* 3910101 (Tracking ID: 3846521)

SYMPTOM:
cp -p fails with EINVAL for files with a 10-digit modification time.
EINVAL is returned if the value in the tv_nsec field is outside the range
0 to 999,999,999. VxFS supports the update in usec, but when copying in
user space the usec value is converted to nsec; in this case the usec
value had crossed its upper boundary limit, i.e. 999,999.

DESCRIPTION:
In a cluster, it is possible that the time differs across nodes. When
updating mtime, vxfs checks whether it is a cluster inode and, if the
inode's mtime is newer than the current node's time, increments tv_usec
instead of changing mtime to an older time value. There is a chance that
the tv_usec counter overflows here, which results in a 10-digit
mtime.tv_nsec.

RESOLUTION:
The code is modified to reset the usec counter for mtime/atime/ctime when
the upper boundary limit, i.e. 999999, is reached.
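
An illustrative sketch (not the VxFS code) of carrying an overflowing
microsecond counter into seconds so that the converted tv_nsec always
stays within 0..999,999,999:

    #include <stdio.h>
    #include <time.h>

    /* Normalize a (sec, usec) pair so usec stays in 0..999999, then
     * convert to a timespec whose tv_nsec is always a valid value. */
    struct timespec usec_to_timespec(long sec, long usec)
    {
        struct timespec ts;
        sec += usec / 1000000;       /* carry whole seconds out of usec */
        usec = usec % 1000000;
        ts.tv_sec  = sec;
        ts.tv_nsec = usec * 1000;    /* at most 999999000, never 10 digits */
        return ts;
    }

    int main(void)
    {
        /* An overflowed usec counter such as 1500000 would otherwise
         * convert to tv_nsec = 1500000000 (10 digits) and make system
         * calls like utimensat() fail with EINVAL. */
        struct timespec ts = usec_to_timespec(100, 1500000);
        printf("sec=%ld nsec=%ld\n", (long)ts.tv_sec, (long)ts.tv_nsec);
        return 0;
    }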

* 3910103 (Tracking ID: 3817734)

SYMPTOM:
If a file system with the full fsck flag set is mounted, a message
containing the direct command to clean the file system with full fsck is
printed to the user.

DESCRIPTION:
When mounting a file system with the full fsck flag set, the mount will
fail and a message will be printed asking the user to clean the file
system with full fsck. This message contains the direct command to run,
which, if run without first collecting a file system metasave, will
result in evidence being lost. Also, since fsck removes the file system
inconsistencies, it may lead to undesired data loss.

RESOLUTION:
A more generic message is given in the error message instead of the
direct command.

* 3910105 (Tracking ID: 3907902)

SYMPTOM:
System panic observed due to a race between a dalloc off thread and a
getattr thread.

DESCRIPTION:
With the 7.2 release of VxFS, dalloc states are stored in a new
structure. When getting the attributes of a file, dalloc blocks are
calculated and stored in this new structure. If a dalloc off thread races
with a getattr thread, the getattr thread can dereference a NULL dalloc
structure.

RESOLUTION:
Code changes have been made to take the appropriate dalloc lock while
calculating dalloc blocks in the getattr function, avoiding the race.

* 3911290 (Tracking ID: 3910526)

SYMPTOM:
When expanding a full file system through vxresize(1M), the operation
fails with ENOSPC (errno 28) and the following error is printed:

UX:vxfs fsadm: ERROR: V-3-20340: attempt to resize <volume-name> failed with
errno 28

Despite the failure, the file system remains expanded even after vxresize
has shrunk the volume back after getting the ENOSPC error. As a result,
the file system is marked for full fsck, but a full fsck would fail with
errors like the following:

UX:vxfs fsck: ERROR: V-3-26248: could not read from block offset devid/blknum
.... Device containing meta data may be missing in vset or device too big to be
read on a 32 bit system.
UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate file system check
failure, aborting ...

DESCRIPTION:
If there is no space available in the file system and a resize operation
is initiated, the intent log extents are used for the metadata setup
needed to continue the resize. If the resize is successful, the
superblock is updated with the expanded size. A new intent log must then
be allocated because the old one has been consumed by the resize. There
is a chance that the new intent log allocation fails with ENOSPC because
the expanded size is not big enough to return the space allocated to the
original intent log. The superblock update has already been made at this
stage and is not rolled back even if a failure is returned.

RESOLUTION:
The code has been modified to fail the resize operation if the resize
size is less than the size of the intent log.
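
A minimal sketch of the guard described in the resolution (the names and
block-count type are hypothetical, not the VxFS source):

    #include <errno.h>

    typedef unsigned long long fsblk_t;

    /* Refuse the expansion when the added space is smaller than the
     * intent log, since the old intent log extents are consumed by the
     * resize itself; failing early avoids a half-committed resize. */
    int resize_precheck(fsblk_t old_size, fsblk_t new_size, fsblk_t ilog_size)
    {
        if (new_size - old_size < ilog_size)
            return ENOSPC;
        return 0;
    }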

* 3911718 (Tracking ID: 3905576)

SYMPTOM:
Cluster file system hangs. On one node, all worker threads are blocked
due to a file system freeze, and another thread is blocked with a stack
like this:

- __schedule
- schedule
- vx_svar_sleep_unlock
- vx_event_wait
- vx_olt_iauinit
- vx_olt_iasinit
- vx_loadv23_fsetinit
- vx_loadv23_fset
- vx_fset_reget
- vx_fs_reinit
- vx_recv_cwfa
- vx_msg_process_thread
- vx_kthread_init

DESCRIPTION:
The frozen CFS won't thaw because the mentioned thread is waiting for a work item
to be processed in vx_olt_iauinit(). Since all the worker threads are blocked,
there is no free thread to process this work item.

RESOLUTION:
Change the code in vx_olt_iauinit(), so that the work item will be processed even
with all worker threads blocked.
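
A generic sketch of the pattern in the fix (illustrative only; the work
item and queueing primitives are hypothetical): if no worker thread is
available to run a queued work item, the waiting thread executes it
inline instead of blocking forever:

    #include <stdbool.h>

    /* Hypothetical work item and queue primitives for illustration. */
    struct workitem { void (*fn)(void *); void *arg; };

    extern bool queue_to_worker(struct workitem *wi); /* false if none free */

    void run_workitem(struct workitem *wi)
    {
        /* If every worker thread is blocked (e.g. by a freeze), fall
         * back to executing the item in the calling thread so that
         * initialization such as vx_olt_iauinit() can make progress. */
        if (!queue_to_worker(wi))
            wi->fn(wi->arg);
    }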

* 3911719 (Tracking ID: 3909583)

SYMPTOM:
Partitioning of a directory should be disabled if the directory size is
greater than the upper threshold value.

DESCRIPTION:
If PD (partitioned directories) is enabled during mount, the mount may
take a long time to complete, because it tries to partition all the
directories and therefore appears hung. To avoid such hangs, a new upper
threshold value for PD is added, which disables partitioning of a
directory if the directory size is above that value.

RESOLUTION:
The code is modified to disable partitioning of a directory if the
directory size is greater than the upper threshold value.

* 3911732 (Tracking ID: 3896670)

SYMPTOM:
Intermittent CFS hang-like situation with many CFS pglock grant messages
pending on the LLT layer.

DESCRIPTION:
To optimize CFS locking, VxFS may send greedy pglock grant messages to
speed up upcoming write operations. In certain scenarios created by a
particular read and write pattern across nodes, one node can send these
greedy messages far faster than the responses arrive. This can cause a
large build-up of messages at the CFS layer, delay the responses to other
messages, and cause slowness in CFS operations.

RESOLUTION:
The fix is to send the next greedy message only after receiving the
response to the previous one, so that at any given time only one greedy
pglock message is in flight.
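
A generic one-in-flight throttle sketch (illustrative only, not the VxFS
messaging code; transmit() is a hypothetical send primitive): the sender
waits on a condition variable until the previous message has been
acknowledged before sending the next one:

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
    static bool inflight = false;   /* one greedy message outstanding? */

    extern void transmit(int msg);  /* hypothetical send primitive */

    /* Called by the sender for each greedy grant message. */
    void send_greedy_msg(int msg)
    {
        pthread_mutex_lock(&lk);
        while (inflight)                 /* wait for the previous ack */
            pthread_cond_wait(&cv, &lk);
        inflight = true;
        pthread_mutex_unlock(&lk);

        transmit(msg);
    }

    /* Called from the receive path when the response arrives. */
    void on_greedy_response(void)
    {
        pthread_mutex_lock(&lk);
        inflight = false;                /* allow the next message */
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&lk);
    }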

* 3911952 (Tracking ID: 3912203)

SYMPTOM:
The cafs module is not loaded automatically on VRTSvxfs patch
installation.

DESCRIPTION:
The changes required to load the "cafs" driver automatically after
installation of a patch over the VxFS 7.2 release are missing from the
postinstall script on AIX. As a result, the "cafs" driver is not loaded
automatically while installing a patch over the 7.2.0 release, and
vxkextadm has to be run manually to load it.

RESOLUTION:
Code changes have been made to load the cafs module automatically.

* 3912604 (Tracking ID: 3914488)

SYMPTOM:
A local mount may fail with the following error message, and as a result
FULLFSCK gets set on the file system:
vx_lctbad - <filesystemname> file system link count table bad

DESCRIPTION:
During extop processing while locally mounting a file system (primary
fileset), the LCT may get marked bad and FULLFSCK may get set on the file
system. This is a corner case where an earlier unmount operation on the
file system was not clean.

RESOLUTION:
Code changes have been made to merge the LCT while mounting the primary
fileset.

* 3912988 (Tracking ID: 3912315)

SYMPTOM:
EMAP corruption while freeing up the extent.
Feb  4 15:10:45 localhost kernel: vxfs: msgcnt 2 mesg 056: V-2-56: vx_mapbad - 
vx_smap_stateupd - file system extent allocation unit state bitmap number 0 
marked bad

Feb  4 15:10:45 localhost kernel: Call Trace:
vx_setfsflags+0x103/0x140 [vxfs]
vx_mapbad+0x74/0x2d0 [vxfs]
vx_smap_stateupd+0x113/0x130 [vxfs]
vx_extmapupd+0x552/0x580 [vxfs]
vx_alloc+0x3d6/0xd10 [vxfs]
vx_extsub+0x0/0x5f0 [vxfs]
vx_semapclone+0xe1/0x190 [vxfs]
vx_clonemap+0x14d/0x230 [vxfs]
vx_unlockmap+0x299/0x330 [vxfs]
vx_smap_dirtymap+0xea/0x120 [vxfs]
vx_do_extfree+0x2b8/0x2e0 [vxfs]
vx_extfree1+0x22e/0x7c0 [vxfs]
vx_extfree+0x9f/0xd0 [vxfs]
vx_exttrunc+0x10d/0x2a0 [vxfs]
vx_trunc_ext4+0x65f/0x7a0 [vxfs]
vx_validate_ext4+0xcc/0x1a0 [vxfs]
vx_trunc_tran2+0xb7f/0x1450 [vxfs]
vx_trunc_tran+0x18f/0x1e0 [vxfs]
vx_trunc+0x66a/0x890 [vxfs]
vx_iflush_list+0xaee/0xba0 [vxfs]
vx_iflush+0x67/0x80 [vxfs]
vx_workitem_process+0x24/0x50 [vxfs]

DESCRIPTION:
The issue occurs due to wrongly validating the smap and changing the AU
state from free to allocated.

RESOLUTION:
Skip the validation and change the EAU state only if it is an SMAP
update.

* 3912989 (Tracking ID: 3912322)

SYMPTOM:
The vxfs tunable max_seqio_extent_size cannot be tuned to any 
value less than 32768.

DESCRIPTION:
In VxFS 7.1 the default for max_seqio_extent_size was changed from 2048
to 32768. Due to a bug, the tunable could not be set to any value less
than 32768.

RESOLUTION:
The fix allows the tunable to be set to any value >= 2048.
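
For example, after applying the fix, a value below 32768 (but at least
2048) can be set and verified with vxtunefs; the mount point /mnt1 is
illustrative:

    # vxtunefs -o max_seqio_extent_size=4096 /mnt1
    # vxtunefs -p /mnt1 | grep max_seqio_extent_size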

* 3912990 (Tracking ID: 3912407)

SYMPTOM:
CFS hang on VxFS 7.2 while a thread on a CFS node waits for EAU
delegation:

__schedule at ffffffff8163b46d
schedule at ffffffff8163bb09
vx_svar_sleep_unlock at ffffffffa0a99eb5 [vxfs]
vx_event_wait at ffffffffa0a9a68f [vxfs]
vx_async_waitmsg at ffffffffa09aa680 [vxfs]
vx_msg_send at ffffffffa09aa829 [vxfs]
vx_get_dele at ffffffffa09d5be1 [vxfs]
vx_extentalloc_device at ffffffffa0923bb0 [vxfs]
vx_extentalloc at ffffffffa0925272 [vxfs]
vx_bmap_ext4 at ffffffffa0953755 [vxfs]
vx_bmap_alloc_ext4 at ffffffffa0953e14 [vxfs]
vx_bmap_alloc at ffffffffa0950a2a [vxfs]
vx_write_alloc3 at ffffffffa09c281e [vxfs]
vx_tran_write_alloc at ffffffffa09c3321 [vxfs]
vx_cfs_prealloc at ffffffffa09b4220 [vxfs]
vx_write_alloc2 at ffffffffa09c25ad [vxfs]
vx_write_alloc at ffffffffa0b708fb [vxfs]
vx_write1 at ffffffffa0b712ff [vxfs]
vx_write_common_slow at ffffffffa0b7268c [vxfs]
vx_write_common at ffffffffa0b73857 [vxfs]
vx_write at ffffffffa0af04a6 [vxfs]
vfs_write at ffffffff811ded3d
sys_write at ffffffff811df7df
system_call_fastpath at ffffffff81646b09

And a corresponding delegation receiver thread should be seen looping on CFS 
primary:
 
PID: 18958  TASK: ffff88006c776780  CPU: 0   COMMAND: "vx_msg_thread"
__schedule at ffffffff8163a26d
mutex_lock at ffffffff81638b42
vx_emap_lookup at ffffffffa0ecf0eb [vxfs]
vx_extchkmaps at ffffffffa0e9c7b4 [vxfs]
vx_searchau_downlevel at ffffffffa0ea0923 [vxfs]
vx_searchau at ffffffffa0ea0e22 [vxfs]
vx_dele_get_freespace at ffffffffa0f53b6d [vxfs]
vx_getedele_size at ffffffffa0f54c4b [vxfs]
vx_pri_getdele at ffffffffa0f54edc [vxfs]
vx_recv_getdele at ffffffffa0f5690d [vxfs]
vx_recvdele at ffffffffa0f59800 [vxfs]
vx_msg_process_thread at ffffffffa0f2aead [vxfs]
vx_kthread_init at ffffffffa105cba4 [vxfs]
kthread at ffffffff810a5aef
ret_from_fork at ffffffff81645858

DESCRIPTION:
In the VxFS 7.2 release, a performance optimization was made in the way
the allocation state map is updated. Prior to 7.2 the updates were done
synchronously; in 7.2 they were made transactional. The hang happens when
an EAU needs to be converted back to the FREE state after all the
allocations from it have been freed. In such a case, if the corresponding
EAU delegation times out before the process can complete, the result is
an inconsistent state. Because of this inconsistency, when a node later
tries to get delegation of this EAU, the primary may loop forever,
causing the secondary to wait indefinitely for the AU delegation.

RESOLUTION:
A code fix is done to not allow the delegation to time out until the free
processing is complete.

* 3913004 (Tracking ID: 3911048)

SYMPTOM:
The LDH bucket validation failure message is logged and the system hangs.

DESCRIPTION:
When modifying a large directory, vxfs needs to find a new bucket in the
LDH for this directory, and once a bucket is full it is split to obtain
more buckets to use. When a bucket has been split the maximum number of
times, an overflow bucket is allocated. Under some conditions, the
available-bucket lookup on the overflow bucket may get an incorrect
result and overwrite an existing bucket entry, thus corrupting the LDH
file. Another problem is that when the bucket invalidation fails, the
bucket buffer is released without checking whether the buffer is already
part of a previous transaction; this may cause the transaction flush
thread to hang and finally stall the whole file system.

RESOLUTION:
Correct the LDH bucket entry change code to avoid the corruption, and
release the bucket buffer without throwing it out of memory to avoid
blocking the transaction flush.

* 3914871 (Tracking ID: 3915125)

SYMPTOM:
File system freeze was stuck because of a deadlock among kernel threads
while writing to a file.

DESCRIPTION:
While writing to any file, the file system allocates space on disk. It
needs to search for free blocks and therefore scans through various
per-allocation-unit (AU) metadata, such as the state map (SMAP) and the
extent bitmap (EMAP). The problem was that the global lock (GLM) was not
held while making changes to a particular allocation unit (AU). Because
of this, the allocator thread was continuously looping.

RESOLUTION:
Return an error from the function that does the allocation in case the
state map metadata was modified and the appropriate GLM lock could not be
acquired.

* 3918551 (Tracking ID: 3919135)

SYMPTOM:
The vxtunefs man page shows incorrect default value and minimum value
for 'max_seqio_extent_size' tunable.

DESCRIPTION:
The default value of the 'max_seqio_extent_size' tunable is 32K, but the
man page shows an incorrect value, i.e. 2097152 blocks (1 GB). Also, if
the tunable is set to a value less than 2048 blocks, it is reset to the
minimum value (2048 blocks), but according to the man page the reset
value is the default value, not the minimum value.

RESOLUTION:
The vxtunefs man page has been changed to correct the default value of
the max_seqio_extent_size tunable.



INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Please note that the installation of this P-Patch will cause downtime.

To install the patch perform the following steps on at least one node in the cluster:
1. Copy the patch fs-aix71-Patch-7.2.0.200.tar.gz to /tmp
2. Untar fs-aix71-Patch-7.2.0.200.tar.gz to /tmp/hf
    # mkdir /tmp/hf
    # cd /tmp/hf
    # gunzip /tmp/fs-aix71-Patch-7.2.0.200.tar.gz
    # tar xf /tmp/fs-aix71-Patch-7.2.0.200.tar
3. Install the hotfix (this step causes the downtime noted above):
    # cd /tmp/hf
    # ./installVRTSvxfs720P200 [<host1> <host2>...]

You can also install this patch together with the 7.2 base release using
Install Bundles:
1. Download this patch and extract it to a directory.
2. Change to the Veritas InfoScale 7.2 directory and invoke the installer
   script with the -patch_path option, where -patch_path points to the
   patch directory:
    # ./installer -patch_path [<path to this patch>] [<host1> <host2>...]

Install the patch manually:
--------------------------
AIX maintenance levels and APARs can be downloaded from the
    IBM Web site:
        http://techsupport.services.ibm.com
    Install the VRTSvxfs.bff patch if VRTSvxfs is
    already installed at fileset level 7.2.0.000.
    A system reboot is recommended after installing this patch.
    To apply the patch, first unmount all VxFS file systems,
    then enter these commands:
        # mount | grep vxfs
        # cd <patch_location> 
        # installp -aXd VRTSvxfs.bff VRTSvxfs
        # reboot


REMOVING THE PATCH
------------------
Run the Uninstaller script to automatically remove the patch:
------------------------------------------------------------
To uninstall the patch perform the following step on at least one node in the cluster:
    # /opt/VRTS/install/uninstallVRTSvxfs720P200 [<host1> <host2>...]

Remove the patch manually:
-------------------------
If you need to remove the patch, first unmount all VxFS
    file systems, then enter these commands:
        # mount | grep vxfs
        # installp -r VRTSvxfs 7.2.0.200
        # reboot


KNOWN ISSUES
------------
* Tracking ID: 3910100

SYMPTOM: Oracle database start failure, with trace log like this:

ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 304 (block # 722821)
ORA-01110: data file 304: <file_name>
ORA-17500: ODM err:ODM ERROR V-41-4-2-231-28 No space left on device

WORKAROUND: None.

* Tracking ID: 3918076

SYMPTOM: There may be some unused space on a local file system due to
incorrect accounting of space reservation. This issue is seen on local
file systems where the delayed allocation (dalloc) feature is enabled.

WORKAROUND:
1. Unmount the file system.
2. Mount the file system.
3. Disable the dalloc feature for the file system (using the vxtunefs
   command), as shown below.
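
For step 3, the delayed allocation feature can be turned off with
vxtunefs; the tunable name dalloc_enable and the mount point /mnt1 are
assumptions here, so verify them against the vxtunefs(1M) man page for
your release:

    # vxtunefs -o dalloc_enable=0 /mnt1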



SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE