* * * READ ME * * *
                 * * * Symantec File System 6.1.1 * * *
                      * * * Patch 6.1.1.400 * * *
                         Patch Date: 2016-07-25


This document provides the following information:

   * PATCH NAME
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * SUMMARY OF INCIDENTS FIXED BY THE PATCH
   * DETAILS OF INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Symantec File System 6.1.1 Patch 6.1.1.400


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL5 x86-64


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Symantec File System 6.1
   * Symantec Storage Foundation 6.1
   * Symantec Storage Foundation Cluster File System HA 6.1
   * Symantec Storage Foundation for Oracle RAC 6.1
   * Symantec Storage Foundation HA 6.1


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 6.1.1.400
* 3652109 (3553328) During internal testing, full fsck failed to clean the file
system.
* 3729811 (3719523) 'vxupgrade' retains the superblock replica of old layout versions.
* 3765326 (3736398) NULL pointer dereference panic in lazy unmount.
* 3852733 (3729158) Deadlock occurs due to incorrect locking order between write 
advise and dalloc flusher thread.
* 3852736 (3457801) Kernel panics in block_invalidatepage().
* 3861521 (3549057) The "relatime" mount option is shown in /proc/mounts but it is
not supported by VxFS.
* 3864007 (3558087) The ls -l command and other commands that use the stat system
call may take a long time to complete.
* 3864010 (3269553) VxFS returns inappropriate message for read of hole via 
Oracle Disk Manager (ODM).
* 3864013 (3811849) System panics while executing lookup() in a directory with large directory hash (LDH).
* 3864035 (3790721) High cpu usage caused by vx_send_bcastgetemapmsg_remaus
* 3864036 (3233276) With a large file system, primary to secondary migration takes longer duration.
* 3864037 (3616907) System is unresponsive causing the NMI watchdog service to 
stall.
* 3864038 (3596329) Fix native-aio races with exiting threads
* 3864040 (3633683) vxfs thread consumes high CPU while running an 
application that makes excessive sync() calls.
* 3864041 (3613048) Support vectored AIO on Linux
* 3864042 (3466020) File system is corrupted with an error message "vx_direrr: vx_dexh_keycheck_1".
* 3864146 (3690078) The system panics at vx_dev_strategy() routine due to stack overflow.
* 3864148 (3695367) Unable to remove volume from multi-volume VxFS using "fsvoladm" command.
* 3864150 (3602322) System panics while flushing the dirty pages of the inode.
* 3864153 (3685391) Execute permissions for a file not honored correctly.
* 3864154 (3689354) Users having write permission on file cannot open the file
with O_TRUNC if the file has setuid or setgid bit set.
* 3864155 (3707662) Race between reorg processing and fsadm timer thread (alarm expiry) leads to panic in vx_reorg_emap.
* 3864156 (3662284) File Change Log (FCL) read may return ENXIO.
* 3864160 (3691633) Remove RCQ Full messages
* 3864161 (3708836) fallocate causes data corruption
* 3864163 (3712961) Stack overflow is detected in vx_dio_physio() right after submitting the I/O.
* 3864164 (3762125) Directory size increases abnormally.
* 3864166 (3731844) umount -r option fails for vxfs 6.2.
* 3864167 (3735697) vxrepquota reports error
* 3864170 (3743572) File system may hang when reaching the 1 billion inode 
limit.
* 3864172 (3808091) fallocate caused data corruption
* 3864173 (3779916) vxfsconvert fails to upgrade layout version for a vxfs file 
system with large number of inodes.
* 3864177 (3808033) When using 6.2.1 ODM on RHEL7, Oracle resource cannot be killed after forced umount via VCS.
* 3864178 (1428611) 'vxcompress' can spew many GLM block lock messages over the 
LLT network.
* 3864179 (3622323) Cluster Filesystem mounted as read-only panics when it gets sharing and/or compression statistics with the fsadm_vxfs(1M) command.
* 3864182 (3853338) Files on VxFS are corrupted while running the sequential
asynchronous write workload under high memory pressure.
* 3864184 (3857444) The default permission of /etc/vx/vxfssystem file is incorrect.
* 3864185 (3859032) System panics in vx_tflush_map() due to NULL pointer 
de-reference.
* 3864186 (3855726) Panic in vx_prot_unregister_all().
* 3864247 (3861713) High %sys CPU seen on Large CPU/Memory configurations.
* 3864250 (3833816) Read returns stale data on one node of the CFS.
* 3864255 (3827491) Data relocation is not executed correctly if the IOTEMP policy is set to AVERAGE.
* 3864256 (3830300) Degraded CPU performance during backup of Oracle archive logs
on CFS vs local filesystem
* 3864259 (3856363) Filesystem inodes have incorrect blocks.
* 3864260 (3846521) "cp -p" fails if the modification time in nanoseconds has 10 
digits.
* 3866968 (3866962) Data corruption seen when dalloc writes are going on the file and 
simultaneously fsync started on the same file.
* 3870704 (3867131) Kernel panic in internal testing.
* 3872661 (3857254) Assert failure because of missed flush before taking 
filesnap of the file.
* 3874662 (3871489) Performance issue observed when number of HBAs increased
on high end servers.
* 3875458 (3616694) Internal assert failure because of race condition between forced 
unmount thread and inactive processing thread.
* 3875633 (3869174) Write system call deadlock on rhel5 and sles10.
* 3876065 (3867128) Assert failed in internal native AIO testing.
* 3877070 (3880121) Internal assert failure when coalescing the extents on clone.
* 3877142 (3891801) Internal test hit debug assert.
* 3878983 (3872202) VxFS internal test hits an assert.
* 3890556 (2919310) During stress testing on cluster file system, an assertion failure was hit 
because of a missing linkage between the directory and the associated 
attribute inode.
* 3890659 (3514407) Internal stress test hit debug assert.
Patch ID: 6.1.1.300
* 3851511 (3821686) VxFS module failed to load on SLES11 SP4.
* 3852733 (3729158) Deadlock due to incorrect locking order between write advise
and dalloc flusher thread.
* 3852736 (3457801) Kernel panics in block_invalidatepage().
Patch ID: 6.1.1.100
* 3520113 (3451284) Internal testing hits an assert "vx_sum_upd_efree1"
* 3521945 (3530435) Panic in Internal test with SSD cache enabled.
* 3529243 (3616722) System panics because of race between the writeback cache offline thread and the writeback data flush thread.
* 3536233 (3457803) File System gets disabled intermittently with metadata IO error.
* 3583963 (3583930) When the external quota file is restored or over-written, the old quota records are preserved.
* 3617774 (3475194) Veritas File System (VxFS) fscdsconv(1M) command fails with metadata overflow.
* 3617776 (3473390) The multiple stack overflows with Veritas File System (VxFS) on RHEL6 lead to panics or system crashes.
* 3617781 (3557009) After the fallocate() function reserves allocation space, it results in the wrong file size.
* 3617788 (3604071) High CPU usage consumed by the vxfs thread process.
* 3617790 (3574404) Stack overflow during rename operation.
* 3617793 (3564076) The MongoDB noSQL db creation fails with an ENOTSUP error.
* 3617877 (3615850) Write system call hangs with invalid buffer length
* 3620279 (3558087) The ls -l command hangs when the system takes backup.
* 3620284 (3596378) The copy of a large number of small files is slower on vxfs compared to ext4
* 3620288 (3469644) The system panics in the vx_logbuf_clean() function.
* 3621420 (3621423) The VxVM caching should not be disabled while mounting a file system in a situation where the VxFS cache area is not present.
* 3628867 (3595896) While creating OracleRAC 12.1.0.2 database, the node panics.
* 3636210 (3633067) While converting from ext3 file system to VxFS using vxfsconvert, it is observed that many inodes are missing.
* 3644006 (3451686) During internal stress testing on cluster file system(CFS),
debug assert is hit due to invalid cache generation count on incore inode.
* 3645825 (3622326) Filesystem is marked with fullfsck flag as an inode is
marked bad during checkpoint promote
Patch ID: 6.1.1.000
* 3370758 (3370754) Internal test with SmartIO write-back SSD cache hit debug asserts.
* 3383149 (3383147) The ACA operator precedence error may occur while turning off
delayed allocation.
* 3422580 (1949445) System is unresponsive when files were created on large directory.
* 3422584 (2059611) The system panics due to a NULL pointer dereference while
flushing bitmaps to the disk.
* 3422586 (2439261) When the vx_fiostats_tunable value is changed from zero to
non-zero, the system panics.
* 3422604 (3092114) The information output displayed by the "df -i" command may be inaccurate for 
cluster mounted file systems.
* 3422614 (3297840) A metadata corruption is found during the file removal process.
* 3422619 (3294074) System call fsetxattr() is slower on Veritas File System (VxFS) than ext3 file system.
* 3422624 (3352883) During the rename operation, lots of nfsd threads hang.
* 3422626 (3332902) While shutting down, the system running the fsclustadm(1M) 
command panics.
* 3422629 (3335272) The mkfs (make file system) command dumps core when the log 
size provided is not aligned.
* 3422634 (3337806) The find(1) command may panic the systems with Linux kernels
with versions greater than 3.0.
* 3422636 (3340286) After a file system is resized, the tunable setting of
dalloc_enable gets reset to a default value.
* 3422638 (3352059) High memory usage occurs when VxFS uses Veritas File Replicator (VFR) on the target even when no jobs are running.
* 3422649 (3394803) A panic is observed in VxFS routine vx_upgrade7() function
while running the vxupgrade command(1M).
* 3422657 (3412667) The RHEL 6 system panics with Stack Overflow.
* 3430467 (3430461) The nested unmounts fail if the parent file system is disabled.
* 3436431 (3434811) The vxfsconvert(1M) in VxFS 6.1 hangs.
* 3436433 (3349651) Veritas File System (VxFS) modules fail to load on RHEL 6.5 and display an error message.
* 3494534 (3402618) The mmap read performance on VxFS is slow
* 3502847 (3471245) The mongodb fails to insert any record.
* 3504362 (3472551) The attribute validation (pass 1d) of full fsck takes too much time to complete.
* 3506487 (3506485) The system does not allow write-back caching with Symantec Volume Replicator (VVR).
* 3512292 (3348520) In a Cluster File System (CFS) cluster having multi volume file system of a smaller size, execution of the fsadm command causes system hang if the free space in the file system is low.
* 3518943 (3534779) Internal stress testing on Cluster File System (CFS) hits a
debug assert.
* 3519809 (3463464) Internal kernel functionality conformance test hits a kernel panic due to null pointer dereference.
* 3522003 (3523316) The writeback cache feature does not work for write size of 2MB.
* 3528770 (3449152) Failed to set 'thin_friendly_alloc' tunable in case of cluster file system (CFS).
* 3529852 (3463717) Information regarding Cluster File System (CFS) that does not support the 'thin_friendly_alloc' tunable is not updated in the vxtunefs(1M) command  man page.
* 3530038 (3417321) The vxtunefs(1M) tunable man page gives an incorrect
* 3541125 (3541083) The vxupgrade(1M) command for layout version 10 creates
64-bit quota files with inappropriate permission configurations.
Patch ID: 6.1.0.200
* 3424575 (3349651) Veritas File System (VxFS) modules fail to load on RHEL 6.5 and display an error message.
Patch ID: 6.1.0.100
* 3418489 (3370720) Performance degradation is seen with Smart IO feature enabled.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following Symantec incidents:

Patch ID: 6.1.1.400

* 3652109 (Tracking ID: 3553328)

SYMPTOM:
During internal testing it was found that per node LCT file was
corrupted, due to which attribute inode reference counts were mismatching,
resulting in fsck failure.

DESCRIPTION:
During clone creation LCT from 0th pindex is copied to the new
clone's LCT. Any update to this LCT file from non-zeroth pindex can cause count
mismatch in the new fileset.

RESOLUTION:
The code is modified to handle this issue.

* 3729811 (Tracking ID: 3719523)

SYMPTOM:
'vxupgrade' does not clear the superblock replica of old layout versions.

DESCRIPTION:
While upgrading the file system to a new layout version, a new superblock inode is allocated and an extent is allocated for the replica superblock. After writing the new superblock (primary + replica), VxFS frees the extent of the old superblock replica.
Now, if the primary superblock corrupts, the full fsck searches for replica to repair the file system. If it finds the replica of old superblock, it restores the file system to the old layout, instead of creating a new one. This behavior is wrong.
In order to take the file system to a new version, we should clear the replica of old superblock as part of vxupgrade, so that full fsck won't detect it later.

RESOLUTION:
Clear the replica of old superblock as part of vxupgrade.

* 3765326 (Tracking ID: 3736398)

SYMPTOM:
Panic occurs in the lazy unmount path during deinit of VxFS-VxVM API.

DESCRIPTION:
The panic occurs when an exiting thread drops the last reference to 
a lazy-unmounted VxFS file system which is the last VxFS mount on the system. The 
exiting thread does unmount, which then makes a call into VxVM to de-initialize the 
private FS-VM API as it is the last VxFS mounted file system. The function to be 
called in VxVM is looked-up via the files under /proc. This requires a file to be 
opened, but the exit processing has removed the structures needed by the thread 
to open a file, because of which a panic is observed.

RESOLUTION:
The code is modified to pass the deinit work to a worker thread.

* 3852733 (Tracking ID: 3729158)

SYMPTOM:
The fuser and other commands hang on VxFS file systems.

DESCRIPTION:
The hang is seen when two threads contend for two locks, ILOCK and 
PLOCK. The writeadvise thread owns the ILOCK but is waiting for the PLOCK, 
while the dalloc thread owns the PLOCK and is waiting for the ILOCK.

RESOLUTION:
The code is modified to correct the order of locking. Now PLOCK is 
followed by ILOCK.
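
The corrected ordering can be illustrated with a minimal user-space sketch. This
is not VxFS code: the two Python locks only stand in for PLOCK and ILOCK, and
the thread names and helper functions are hypothetical. It shows that when every
path takes PLOCK before ILOCK, the circular wait described above cannot form.

```python
import threading

# Stand-ins for the two kernel locks named above (assumption: plain mutexes).
PLOCK = threading.Lock()
ILOCK = threading.Lock()


def with_both_locks(work):
    """Run work() while holding both locks, always PLOCK first, then ILOCK."""
    with PLOCK:
        with ILOCK:
            return work()


def run_demo(iterations=1000):
    """Start two threads that both need both locks and report the outcome."""
    state = {"count": 0}

    def bump():
        state["count"] += 1

    def worker():
        for _ in range(iterations):
            with_both_locks(bump)

    # Hypothetical thread names mirroring the two contenders in the description.
    threads = [threading.Thread(target=worker, name=name)
               for name in ("writeadvise", "dalloc-flusher")]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=30)
    finished = all(not t.is_alive() for t in threads)
    return state["count"], finished
```

Because neither thread ever holds ILOCK while waiting for PLOCK, both workers
run to completion regardless of how they interleave.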

* 3852736 (Tracking ID: 3457801)

SYMPTOM:
Kernel panics in block_invalidatepage().

DESCRIPTION:
The address-space struct of a page has "a_ops" as "vx_empty_aops".  
This is an empty structure, so do_invalidatepage() calls block_invalidatepage() - 
but these pages have VxFS's page buffer-heads attached, not kernel buffer-heads.  
So, block_invalidatepage() panics.

RESOLUTION:
Code is modified to fix this by flushing pages before 
vx_softcnt_flush.

* 3861521 (Tracking ID: 3549057)

SYMPTOM:
The "relatime" mount option is wrongly shown in /proc/mounts.

DESCRIPTION:
The "relatime" mount option is wrongly shown in /proc/mounts. VxFS does
not support the relatime mount option; it comes from the Linux kernel.

RESOLUTION:
Code is modified to handle the issue.

* 3864007 (Tracking ID: 3558087)

SYMPTOM:
When the stat system call is executed on a VxFS file system with the delayed
allocation feature enabled, it may take a long time or cause high CPU
consumption.

DESCRIPTION:
When the delayed allocation (dalloc) feature is turned on, the
flushing process takes a long time. The process keeps the get page lock held, and
needs writers to keep the inode reader-writer lock held. The stat system call may
keep waiting for the inode reader-writer lock.

RESOLUTION:
Delayed allocation code is redesigned to keep the get page lock
unlocked while flushing.

* 3864010 (Tracking ID: 3269553)

SYMPTOM:
VxFS returns inappropriate message for read of hole via ODM.

DESCRIPTION:
Sometimes sparse files containing temp or backup/restore files are
created outside the Oracle database, and Oracle can read these files only through
ODM. When such a read reaches a hole, ODM fails with an ENOTSUP error.

RESOLUTION:
The code is modified to return zeros instead of an error.

* 3864013 (Tracking ID: 3811849)

SYMPTOM:
System panics due to size mismatch in the cluster-wide buffers containing hash bucket data. Offending stack looks like below:

   $cold_vm_hndlr
   bubbledown
   as_ubcopy
   vx_populate_bpdata
   vx_getblk_clust
   $cold_vx_getblk
   vx_exh_getblk
   vx_exh_get_bucket
   vx_exh_lookup
   vx_dexh_lookup
   vx_dirscan
   vx_dirlook
   vx_pd_lookup
   vx_lookup_pd
   vx_lookup
   lookupname
   lstat
   syscall

On some platforms, LDH corruption can be reported instead of a panic. Full fsck can report some metadata inconsistencies, which look like the 
following sample messages: 

fileset 999 primary-ilist inode 263 has invalid alternate directory index
        (fileset 999 attribute-ilist inode 8193), clear index? (ynq)y
fileset 999 primary-ilist inode 29879 has invalid alternate directory index
        (fileset 999 attribute-ilist inode 8194), clear index? (ynq)y
fileset 999 primary-ilist inode 1070691 has invalid alternate directory index
        (fileset 999 attribute-ilist inode 24582), clear index? (ynq)y
fileset 999 primary-ilist inode 1262102 has invalid alternate directory index
        (fileset 999 attribute-ilist inode 8198), clear index? (ynq)y

DESCRIPTION:
On a very fragmented file system with FS block sizes 1K, 2K or 4K, any segment of the hash inode (i.e. buckets/CDF/directory segment with fixed size: 8K) can 
spread across multiple extents.

Instead of initializing the buffers on the final bmap after all allocations are finished, the LDH code allocates the buffer-cache buffers as the allocations come along. As a result, small allocations can be merged in the final bmap; for example, two CFS nodes can end up having buffers that represent the same metadata with different sizes. This leads to panics because the buffers are passed around the cluster, or the corruption reaches the LDH portions on the disk.

RESOLUTION:
The code is modified to separate the allocation and buffer initialization in LDH code paths.

* 3864035 (Tracking ID: 3790721)

SYMPTOM:
High CPU usage on the vxfs thread process. The backtrace of such threads
usually looks like this:

schedule
schedule_timeout
__down
down
vx_send_bcastgetemapmsg_remaus
vx_send_bcastgetemapmsg
vx_recv_getemapmsg
vx_recvdele
vx_msg_recvreq
vx_msg_process_thread
vx_kthread_init
kernel_thread

DESCRIPTION:
The locking mechanism in vx_send_bcastgetemapmsg_process() is inefficient. Every
time vx_send_bcastgetemapmsg_process() is called, it performs a series of
down-up operations on a certain semaphore. This can result in a huge CPU cost
when multiple threads contend for this semaphore.

RESOLUTION:
The locking mechanism in vx_send_bcastgetemapmsg_process() is optimized
so that it performs the down-up operation on the semaphore only once.
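
The difference between repeated and single down-up operations can be sketched in
user space. This is only an illustration under assumptions: the class and
function names are hypothetical, a Python semaphore stands in for the kernel
semaphore, and the actual message-sending logic in VxFS is not described in this
document.

```python
import threading


class CountingSemaphore:
    """Binary semaphore that records how many down operations were performed."""

    def __init__(self):
        self._sem = threading.Semaphore(1)
        self.downs = 0

    def __enter__(self):
        self._sem.acquire()   # "down"
        self.downs += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._sem.release()   # "up"
        return False


def send_per_item(sem, nodes, send):
    """Inefficient pattern: one down-up pair for every node."""
    for node in nodes:
        with sem:
            send(node)


def send_batched(sem, nodes, send):
    """Optimized pattern: a single down-up pair around the whole loop."""
    with sem:
        for node in nodes:
            send(node)
```

Both functions deliver the same messages; the batched form touches the
semaphore once per call, which reduces contention when many threads call it.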

* 3864036 (Tracking ID: 3233276)

SYMPTOM:
On a 40 TB file system, the fsclustadm setprimary command takes more than 2 minutes to execute. The unmount operation also takes longer when it causes a primary migration.

DESCRIPTION:
The old primary needs to process the delegated allocation units while migrating
from primary to secondary. The inefficient implementation of the allocation unit
list is consuming more time while removing the element from the list. As the file system size increases, the allocation unit list also increases, which results in additional migration time.

RESOLUTION:
The code is modified to process the allocation unit list efficiently. With this modification, the primary migration is completed in 1 second on the 40 TB file system.
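
Why removal cost can dominate on a large list is sketched below. This is a
generic illustration, not the VxFS data structure: the document does not say how
the allocation unit list was reimplemented, so the two containers and the
function names here are assumptions chosen only to contrast scan-based removal
with keyed removal.

```python
def remove_by_scan(au_list, au_id):
    """Linear-scan removal: cost grows with the length of the list."""
    for index, entry in enumerate(au_list):
        if entry == au_id:
            del au_list[index]
            return True
    return False


def remove_by_key(au_index, au_id):
    """Keyed removal: cost does not depend on how many units are tracked."""
    return au_index.pop(au_id, None) is not None


def build_containers(count):
    """Create the same set of hypothetical allocation unit IDs in both forms."""
    ids = list(range(count))
    return list(ids), dict.fromkeys(ids, True)
```

With the scan-based form, removing every element of an N-entry list costs on
the order of N squared comparisons in the worst case, which matches the
observation that migration time grows with file system size.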

* 3864037 (Tracking ID: 3616907)

SYMPTOM:
While performing the garbage collection operation, VxFS causes the non-maskable 
interrupt (NMI) service to stall.

DESCRIPTION:
With a highly fragmented Reference Count Table (RCT), when a garbage collection 
operation is performed, the CPU could be used for a longer duration. The CPU 
could be busy if a potential entry that could be freed is not identified.

RESOLUTION:
The code is modified such that the CPU is released after a specified time 
interval.

* 3864038 (Tracking ID: 3596329)

SYMPTOM:
System panic in aio codepaths

DESCRIPTION:
On Linux, there has always been an issue with threads exiting with async DIOs 
inflight.  With the kernel's export restrictions, it is not possible for VxFS 
to take/drop a hold on a thread's mm_struct.  This leads to the issue where 
VxFS can use the mm_struct after it has been destroyed (due to thread exit).
On RHEL7, the issue is worse in that no threads (cloned and non-cloned) wait 
for any AIO before the mm_struct/task_struct is destroyed.  This can lead to 
panic and memory corruption (as VxFS continues to use stale pointers).

RESOLUTION:
The code is modified to stop VxFS from referencing destroyed structures during AIO, 
while maintaining good I/O throughput.

* 3864040 (Tracking ID: 3633683)

SYMPTOM:
"top" command output shows vxfs thread consuming high CPU while 
running an application that makes excessive sync() calls.

DESCRIPTION:
To process the sync() system call, vxfs scans through the inode cache, 
which is a costly operation. If a user application issues excessive 
sync() calls and there are vxfs file systems mounted, this can make the vxfs 
sync processing thread consume high CPU.

RESOLUTION:
Combine all the sync() requests issued in the last 60 seconds into a 
single request.
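
The coalescing idea can be sketched as follows. This is a simplified user-space
model under assumptions, not the VxFS implementation: the class name, the
explicit "now" timestamps, and the periodic tick() call are hypothetical, and
the real code may track the 60-second window differently.

```python
class SyncCoalescer:
    """Fold all sync requests that arrive within one window into a single sync."""

    def __init__(self, do_sync, window=60.0):
        self._do_sync = do_sync          # callable that performs the real sync
        self._window = window            # window length in seconds
        self._pending = 0                # requests waiting in the current window
        self._window_start = None        # time of the first pending request

    def request(self, now):
        """Record a sync request; the first one opens a new window."""
        if self._pending == 0:
            self._window_start = now
        self._pending += 1

    def tick(self, now):
        """Run one sync for all pending requests once the window has elapsed.

        Returns the number of requests that were combined (0 if nothing ran).
        """
        if self._pending == 0 or now - self._window_start < self._window:
            return 0
        combined, self._pending = self._pending, 0
        self._do_sync()
        return combined
```

In this model, an application issuing a hundred requests within a minute
triggers one inode-cache scan instead of a hundred.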

* 3864041 (Tracking ID: 3613048)

SYMPTOM:
System can panic with the following stack:
 
machine_kexec
crash_kexec
oops_end
die
do_invalid_op
invalid_op
aio_complete
vx_naio_worker
vx_kthread_init

DESCRIPTION:
VxFS does not correctly support IOCB_CMD_PREADV and IOCB_CMD_PWRITEV, which 
causes a BUG to fire in the kernel code (in fs/aio.c:__aio_put_req()).

RESOLUTION:
Support is added for the vectored AIO commands, and the increment of ->ki_users 
is fixed so that it is guarded by the required spinlock.
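
For readers unfamiliar with vectored I/O, the sketch below shows the
scatter/gather pattern that the vectored commands carry. Note the assumption:
Python's os.pwritev() and os.preadv() are synchronous calls and do not use the
kernel AIO interface discussed above; they only illustrate writing from and
reading into several buffers in one call. The function name and the temporary
file are hypothetical.

```python
import os
import tempfile


def vectored_roundtrip(chunks):
    """Write several buffers with one call, then read them back with one call."""
    fd, path = tempfile.mkstemp()
    try:
        # Gather: all chunks are written contiguously starting at offset 0.
        written = os.pwritev(fd, chunks, 0)
        # Scatter: one read fills a list of separately allocated buffers.
        buffers = [bytearray(len(chunk)) for chunk in chunks]
        read = os.preadv(fd, buffers, 0)
        return written, read, [bytes(buf) for buf in buffers]
    finally:
        os.close(fd)
        os.unlink(path)
```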

* 3864042 (Tracking ID: 3466020)

SYMPTOM:
File system is corrupted with the following error message in the log:

WARNING: msgcnt 28 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren
 WARNING: msgcnt 27 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren
 WARNING: msgcnt 26 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren
 WARNING: msgcnt 25 mesg 096: V-2-96: vx_setfsflags -
 /dev/vx/dsk/a2fdc_cfs01/trace_lv01 file system fullfsck flag set - vx_direr
 WARNING: msgcnt 24 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren

DESCRIPTION:
In case an error is returned from the vx_dirbread() function via the 
vx_dexh_keycheck1() function, the FULLFSCK flag is set on the file system 
unconditionally. A corrupted Large Directory Hash (LDH) can lead to the 
incorrect block being read, which results in the FULLFSCK flag being set. The 
system does not verify whether it reads the incorrect value due to a corrupted 
LDH. Subsequently, the FULLFSCK flag is set unnecessarily, because a corrupted 
LDH is fixed online by recreating the hash.

RESOLUTION:
The code is modified such that when a LDH corruption is detected, the system 
removes the LDH, instead of setting FULLFSCK. The LDH is recreated the next 
time the directory is modified.

* 3864146 (Tracking ID: 3690078)

SYMPTOM:
The system panics at vx_dev_strategy() routine with the following stack trace:
vx_snap_strategy()
vx_logbuf_write() 
vx_logbuf_io()
vx_logbuf_flush() 
vx_logflush()
vx_mapstrategy() 
vx_snap_strategy() 
vx_clonemap() 
vx_unlockmap() 
vx_holdmap() 
vx_extmaptran() 
vx_extmapchange() 
vx_extprevfind() 
vx_extentalloc() 
vx_te_bmap_alloc() 
vx_bmap_alloc_typed() 
vx_bmap_alloc() 
vx_get_alloc() 
vx_cfs_pagealloc() 
vx_alloc_getpage() 
vx_do_getpage() 
vx_internal_alloc() 
vx_write_alloc() 
vx_write1() 
vx_write_common_slow()
vx_write_common() 
vx_vop_write()
vx_writev() 
vx_naio_write_v2() 
vfs_writev()

DESCRIPTION:
The issue was observed due to low handoff limit of vx_extprevfind.

RESOLUTION:
The code is modified to avoid the stack overflow.

* 3864148 (Tracking ID: 3695367)

SYMPTOM:
Unable to remove volume from multi-volume VxFS using "fsvoladm" command. It fails with "Invalid argument" error.

DESCRIPTION:
Volumes are not being added in the in-core volume list structure correctly. Therefore, removing a volume from a multi-volume VxFS file system using the "fsvoladm" command fails.

RESOLUTION:
The code is modified to add volumes in the in-core volume list structure correctly.

* 3864150 (Tracking ID: 3602322)

SYMPTOM:
System may panic while flushing the dirty pages of the inode.

DESCRIPTION:
Panic may occur due to the synchronization problem between one
thread that flushes the inode, and the other thread that frees the chunks that
contain the inodes on the freelist. 

The thread that frees the chunks of inodes on the freelist grabs an inode, and 
clears the inode pointer while deinitializing the inode. This may 
result in a NULL pointer dereference if the flusher thread is working on the 
same inode.

RESOLUTION:
The code is modified to resolve the race condition by taking proper
locks on the inode and freelist whenever a pointer in the inode is
dereferenced.

If the inode pointer is already de-initialized to NULL, then the flushing is 
attempted on the next inode.

* 3864153 (Tracking ID: 3685391)

SYMPTOM:
Execute permissions for a file not honored correctly.

DESCRIPTION:
The user was able to execute the file despite not having the execute permissions.

RESOLUTION:
The code is modified such that an error is reported when the execute permissions are not applied.

* 3864154 (Tracking ID: 3689354)

SYMPTOM:
Users having write permission on file cannot open the file with O_TRUNC
if the file has setuid or setgid bit set.

DESCRIPTION:
On Linux, the kernel triggers an explicit mode change as part of
O_TRUNC processing to clear the setuid/setgid bit. Only the file owner or a
privileged user is allowed to do a mode change operation. Hence, for a
non-privileged user who is not the file owner, the mode change operation fails,
making the open() system call return EPERM.

RESOLUTION:
The mode change request to clear the setuid/setgid bit that comes as part of
O_TRUNC processing is now allowed for other users.

* 3864155 (Tracking ID: 3707662)

SYMPTOM:
Race between reorg processing and fsadm timer thread (alarm expiry) leads to panic in vx_reorg_emap with the following stack:

vx_iunlock
vx_reorg_iunlock_rct_reorg
vx_reorg_emap
vx_extmap_reorg
vx_reorg
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
fop_ioctl
ioctl

DESCRIPTION:
When the timer expires (fsadm with -t option), vx_do_close() calls vx_reorg_clear() on local mount which performs cleanup on reorg rct inode. Another thread currently active in vx_reorg_emap() will panic due to null pointer dereference.

RESOLUTION:
When fop_close is called in the alarm handler context, the cleanup is deferred until the kernel thread performing the reorg completes its operation.

* 3864156 (Tracking ID: 3662284)

SYMPTOM:
File Change Log (FCL) read may return ENXIO as follows:

# file changelog 
changelog: ERROR: cannot read `changelog' (No such device or address)

DESCRIPTION:
VxFS reads FCL file and returns ENXIO when there is a HOLE in the file.

RESOLUTION:
The code is modified to zero out the user buffer when hitting a hole if FCL read
is from user space.
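
The behavior a reader normally expects from a hole, namely zeros rather than an
error, can be demonstrated on any POSIX file system with the sketch below. It
uses a temporary file rather than an FCL file, and the function name and sizes
are hypothetical.

```python
import os
import tempfile


def read_hole(hole_size=4096, payload=b"data"):
    """Create a file whose first hole_size bytes were never written, then read it."""
    fd, path = tempfile.mkstemp()
    try:
        # Writing at offset hole_size leaves [0, hole_size) unwritten: a hole on
        # file systems that support sparse files, zero-filled blocks otherwise.
        os.pwrite(fd, payload, hole_size)
        hole = os.pread(fd, hole_size, 0)
        tail = os.pread(fd, len(payload), hole_size)
        return hole, tail
    finally:
        os.close(fd)
        os.unlink(path)
```

Reading the unwritten range returns a buffer of zero bytes, which is the result
the fix provides for user-space FCL reads.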

* 3864160 (Tracking ID: 3691633)

SYMPTOM:
Remove RCQ Full messages

DESCRIPTION:
Too many unnecessary RCQ Full messages were being logged in the system log.

RESOLUTION:
The RCQ Full messages are removed from the code.

* 3864161 (Tracking ID: 3708836)

SYMPTOM:
When using fallocate together with delayed extending write, data corruption may happen.

DESCRIPTION:
When doing fallocate after EOF, vxfs grows the file by splitting the last extent of the file into two parts, then converts the part after EOF to a ZFOD extent. During this procedure, a stale file size is used to calculate the start offset of the newly zeroed extent. This may overwrite the blocks which contain the unflushed data generated by the extending write and cause data corruption.

RESOLUTION:
The code is modified to use up-to-date file size instead of the stale file size, to make sure the new ZFOD extent is created correctly.
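
The effect of a stale size on the offset calculation can be worked through with
simple arithmetic. This sketch rests on assumptions: it models the zeroed extent
as starting at the first block boundary at or after EOF, which is a
simplification, and all function names and the sample sizes are hypothetical.

```python
def round_up(value, block_size):
    """Round value up to the next multiple of block_size."""
    return -(-value // block_size) * block_size


def zfod_start(file_size, block_size):
    """First block-aligned offset at or after EOF, where the zeroed extent begins."""
    return round_up(file_size, block_size)


def clobbered_range(stale_size, current_size, block_size):
    """Byte range of unflushed data that a stale-size calculation would zero.

    Returns None when the stale size does not place the extent too early.
    """
    wrong = zfod_start(stale_size, block_size)
    right = zfod_start(current_size, block_size)
    return (wrong, right) if wrong < right else None
```

For example, with 1024-byte blocks, a stale size of 4096 and an actual size of
10000 after an extending write, the zeroed extent would begin at 4096 instead of
10240, covering the range that holds the newly written data.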

* 3864163 (Tracking ID: 3712961)

SYMPTOM:
An SFCFS cluster with ODM panics after the following steps:
1) Use RDMA heartbeat for LLT.
2) Use FSS.
3) Disconnect one LLT link. One machine then panics.

Panic stack is as follows:

vx_dio_physio
vx_dio_rdwri
fdd_write_end
fdd_rw
fdd_odm_rw
odm_vx_io
odm_io_start
odm_io_req
odm_io
odm_io_stat
odm_ioctl_ctl
odm_ioctl_ctl_unlocked
vfs_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath

DESCRIPTION:
VxFS detects stack overflow and calls BUG_ON() to panic the kernel. The stack
overflow is detected right after VxFS submits the I/O to lower layer. The stack
overflow does not happen in VxFS, but somewhere in lower layers.

RESOLUTION:
There are two kernel parameters which can be used to resolve this issue. By
configuring these two parameters, a thread hand-off can be added before
submitting the I/Os to VxVM when there's no sufficient stack space left. These
parameters are not run time parameters. These can be set at module load time only.

The following two module parameters need to be configured for this solution:

1. vxfs_io_proxy_vxvm  - 	If set, include VxVM devices in I/O hand-off decisions.

2. vxfs_io_proxy_level - 	When free stack space is less than this, an I/O is
handed-off to a proxy. The default value of vxfs_io_proxy_level is 4K bytes.

Set the above VxFS kernel parameters using a .conf file as follows:

a) Create the vxfs.conf file inside the /etc/modprobe.d directory.
    
      touch /etc/modprobe.d/vxfs.conf
       
b) Now copy the following lines into vxfs.conf file.
   options vxfs vxfs_io_proxy_vxvm=1
   options vxfs vxfs_io_proxy_level=6144

Use the crash debugger or any other debugger to check whether these values
have been set.

For example:

crash> vxfs_io_proxy_vxvm
vxfs_io_proxy_vxvm = $3 = 1
crash> vxfs_io_proxy_level
vxfs_io_proxy_level = $4 = 6144
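
Before reloading the module, the contents of the .conf file can be sanity-checked
with a small script. The sketch below is only a convenience under assumptions:
the function name is hypothetical, and it parses "options" lines as text; it does
not confirm that the running module picked up the values, which still requires a
debugger as described above.

```python
def parse_modprobe_options(text, module):
    """Collect key=value pairs from 'options <module> ...' lines in text."""
    params = {}
    for raw in text.splitlines():
        # Everything after '#' is a comment in modprobe.d files.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "options" and fields[1] == module:
            for item in fields[2:]:
                key, sep, value = item.partition("=")
                if sep:
                    params[key] = value
    return params


def read_vxfs_options(path="/etc/modprobe.d/vxfs.conf"):
    """Parse the file created in step a); the path matches the steps above."""
    with open(path) as handle:
        return parse_modprobe_options(handle.read(), "vxfs")
```

For the two lines shown in step b), parse_modprobe_options() yields
vxfs_io_proxy_vxvm set to "1" and vxfs_io_proxy_level set to "6144".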

* 3864164 (Tracking ID: 3762125)

SYMPTOM:
Directory size sometimes keeps increasing even though the number of files inside it doesn't 
increase.

DESCRIPTION:
This only happens to CFS. A variable in the directory inode structure marks the start of 
directory free space. But when the directory ownership changes, the variable may become stale, which 
could cause this issue.

RESOLUTION:
The code is modified to reset this free space marking variable when there's an 
ownership change. Now the space search starts from the beginning of the directory inode.

* 3864166 (Tracking ID: 3731844)

SYMPTOM:
umount -r option fails for vxfs 6.2 with error "invalid options"

DESCRIPTION:
Until 6.2, vxfs did not have a umount helper on Linux. A helper was added in 6.2;
as a result, each call to Linux's umount also invokes the umount
helper binary. Because of this, the -r option, which was previously handled only
by the Linux native umount, is forwarded to the umount.vxfs helper, which exits
while processing the option string because read-only remounts are not supported.

RESOLUTION:
The umount.vxfs code is changed to not exit on the
"-r" option. Read-only remounts are still not supported, so if umount -r
actually fails and the OS umount attempts a read-only remount, the mount.vxfs
binary then exits with an error. This solves the problem of Linux's default
scripts not working for VxFS.

* 3864167 (Tracking ID: 3735697)

SYMPTOM:
vxrepquota reports an error such as the following:
# vxrepquota -u /vx/fs1
UX:vxfs vxrepquota: ERROR: V-3-20002: Cannot access 
/dev/vx/dsk/sfsdg/fs1:ckpt1: 
No such file or directory
UX:vxfs vxrepquota: ERROR: V-3-24996: Unable to get disk layout version

DESCRIPTION:
vxrepquota checks each mount point entry in the mounted file system
table. If a checkpoint mount point entry appears before the mount point
specified in the vxrepquota command, vxrepquota reports errors, although the
command can still succeed.

RESOLUTION:
The code is modified to skip checkpoint mount point entries in the mounted file system table.

* 3864170 (Tracking ID: 3743572)

SYMPTOM:
The file system may hang when the 1 billion inode limit is reached. The
stack of the hung thread is as follows:

vx_svar_sleep_unlock
vx_event_wait
vx_async_waitmsg
vx_msg_send
llt_msgalloc
vx_cfs_getias
vx_update_ilist
vx_find_partial_au
vx_cfs_noinode
vx_noinode
vx_dircreate_tran
vx_pd_create
vx_dirlook
vx_create1_pd
vx_create1
vx_create_vp
vx_create

DESCRIPTION:
The maximum number of inodes supported by VxFS is 1 billion.
When the file system is running out of inodes and the maximum inode
allocation unit (IAU) limit is reached, VxFS can still create two extra IAUs
if there is a hole in the last IAU. Because of the hole, when a secondary
requests more inodes, the primary still thinks there is a hole available and
notifies the secondary to retry. However, the secondary fails to find a slot
because the 1 billion limit is hit, so it goes back to the primary to
request free inodes again. This loop repeats indefinitely, which causes the hang.

RESOLUTION:
When the maximum IAU number is reached, the primary is prevented from
creating the extra IAUs.

* 3864172 (Tracking ID: 3808091)

SYMPTOM:
When using fallocate together with delayed extending write, data corruption may 
happen.

DESCRIPTION:
When doing fallocate after EOF, VxFS grows the file by splitting the last
extent of the file into two parts and converting the part after EOF to a ZFOD
extent. During this procedure, a stale file size is used to calculate the start
offset of the newly zeroed extent. This may overwrite the blocks that
contain the unflushed data generated by the extending write, which causes data
corruption.

RESOLUTION:
The code is modified to use the up-to-date file size instead of the stale file
size, so that the new ZFOD extent is created correctly.
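
The effect of a stale file size on the start of the zeroed extent can be
illustrated with a small arithmetic sketch. The block size, the two file sizes,
and the helper name first_block_after_eof are hypothetical values chosen for
illustration; they do not come from the VxFS code.

```python
def first_block_after_eof(file_size, block_size):
    """Index of the first block that lies entirely past EOF (ceiling division)."""
    return -(-file_size // block_size)

BLOCK_SIZE = 8192      # assumed file system block size
stale_size = 8192      # assumed size recorded before the delayed extending write
current_size = 20000   # assumed size after the delayed extending write

# Start of the ZFOD extent computed from the stale and the current size.
stale_start = first_block_after_eof(stale_size, BLOCK_SIZE)
fresh_start = first_block_after_eof(current_size, BLOCK_SIZE)

# Blocks that would be zeroed although they still hold unflushed data.
blocks_at_risk = list(range(stale_start, fresh_start))
print(stale_start, fresh_start, blocks_at_risk)
```

In this model, using the stale size places the zeroed extent two blocks too
early, which corresponds to the overwrite described above.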

* 3864173 (Tracking ID: 3779916)

SYMPTOM:
vxfsconvert fails to upgrade the layout version for a VxFS file system with a
large number of inodes. The error message shows an inode discrepancy.

DESCRIPTION:
vxfsconvert walks through the ilist and converts inodes. It stores
chunks of inodes in a buffer and processes them as a batch. The inode number
parameter for this inode buffer is of type unsigned integer. The offset of a
particular inode in the ilist is calculated by multiplying the inode number by the
size of the inode structure. For large inode numbers, the product inode_number *
inode_size can overflow the unsigned integer limit, which gives a wrong offset
within the ilist file. vxfsconvert therefore reads the wrong inode and eventually
fails.

RESOLUTION:
The inode number parameter is defined as unsigned long to avoid 
overflow.
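
The overflow can be reproduced with a short sketch that models 32-bit and
64-bit unsigned arithmetic by masking. The inode size of 256 bytes and the
inode number are assumptions chosen for illustration; the release note does not
state the real values, and the function names are hypothetical.

```python
INODE_SIZE = 256                  # assumed inode structure size in bytes
U32_MASK = 0xFFFFFFFF             # models a 32-bit unsigned integer
U64_MASK = 0xFFFFFFFFFFFFFFFF     # models a 64-bit unsigned long

def ilist_offset_u32(inode_number, inode_size=INODE_SIZE):
    """Offset computed in 32-bit unsigned arithmetic; wraps on overflow."""
    return (inode_number * inode_size) & U32_MASK

def ilist_offset_u64(inode_number, inode_size=INODE_SIZE):
    """Offset computed in 64-bit unsigned arithmetic."""
    return (inode_number * inode_size) & U64_MASK

inode_number = 20_000_000         # assumed large inode number
wrapped = ilist_offset_u32(inode_number)
correct = ilist_offset_u64(inode_number)
print(wrapped, correct)
```

With these assumed values the true offset exceeds 2^32, so the 32-bit result
points at a different location in the ilist than the 64-bit result.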

* 3864177 (Tracking ID: 3808033)

SYMPTOM:
After a service group is set offline via VOM or VCS, the Oracle process is left in an unkillable state.

DESCRIPTION:
Whenever ODM issues an async request to FDD, FDD is required to do iodone processing on it, regardless of how far the request gets. The forced unmount causes FDD to take one of the early error branches, which misses the iodone routine for this particular async request. From ODM's perspective, the request is submitted, but iodone is never called. This has several bad consequences, one of which is that a user thread waiting for the request is blocked uninterruptibly forever.

RESOLUTION:
The code is modified to call the iodone routine in the error handling code.

* 3864178 (Tracking ID: 1428611)

SYMPTOM:
'vxcompress' command can cause many GLM block lock messages to be 
sent over the network. This can be observed with 'glmstat -m' output under the 
section "proxy recv", as shown in the example below -

bash-3.2# glmstat -m
         message     all      rw       g      pg       h     buf     oth    loop
master send:
           GRANT     194       0       0       0       2       0     192      98
          REVOKE     192       0       0       0       0       0     192      96
        subtotal     386       0       0       0       2       0     384     194

master recv:
            LOCK     193       0       0       0       2       0     191      98
         RELEASE     192       0       0       0       0       0     192      96
        subtotal     385       0       0       0       2       0     383     194

    master total     771       0       0       0       4       0     767     388

proxy send:
            LOCK      98       0       0       0       2       0      96      98
         RELEASE      96       0       0       0       0       0      96      96
      BLOCK_LOCK    2560       0       0       0       0    2560       0       0
   BLOCK_RELEASE    2560       0       0       0       0    2560       0       0
        subtotal    5314       0       0       0       2    5120     192     194

DESCRIPTION:
'vxcompress' creates placeholder inodes (called IFEMR inodes) to
hold the compressed data of files. After the compression is finished, the IFEMR
inodes exchange their bmap with the original file and are later given to inactive
processing. Inactive processing truncates the IFEMR extents (the original extents
of the regular file, which is now compressed) by sending cluster-wide buffer
invalidation requests. These invalidations need the GLM block lock. Regular file
data need not be invalidated across the cluster, which makes these GLM block
lock requests unnecessary.

RESOLUTION:
Pertinent code has been modified to skip the invalidation for the 
IFEMR inodes created during compression.

* 3864179 (Tracking ID: 3622323)

SYMPTOM:
Cluster Filesystem mounted as read-only panics when it gets sharing and/or compression statistics using the fsadm_vxfs(1M) command with the following stack:
	 
	- vx_irwlock
	- vx_clust_fset_curused
	- vx_getcompstats
	- vx_aioctl_getcompstats
	- vx_aioctl_common
	- vx_aioctl
	- vx_unlocked_ioctl
	- vx_ioctl
	- vfs_ioctl
	- do_vfs_ioctl
	- sys_ioctl
	- system_call_fastpath

DESCRIPTION:
When a file system is mounted as read-only, part of the initial setup is skipped, including the loading of a few internal structures. These structures are referenced while gathering statistics for sharing and/or compression. As a result, a panic occurs.

RESOLUTION:
The code is modified to only allow "fsadm -HS all" to gather sharing and/or compression statistics on read-write file systems. On read-only file systems, this command fails.

* 3864182 (Tracking ID: 3853338)

SYMPTOM:
Files on VxFS are corrupted while running the sequential write workload
under high memory pressure.

DESCRIPTION:
VxFS may occasionally miss writes under an excessive write workload.
Corruption occurs because of a race between the writer thread, which is doing
sequential asynchronous writes, and the flusher thread, which flushes the in-core
dirty pages. Because of an overlapping write, the two threads are serialized
over a page lock. Because of an optimization, this lock is released, which leaves
a small window in which the waiting thread can race.

RESOLUTION:
The code is modified to fix the race by reloading the inode write
size after taking the page lock.

* 3864184 (Tracking ID: 3857444)

SYMPTOM:
The default permission of /etc/vx/vxfssystem file is incorrect.

DESCRIPTION:
When the file "/etc/vx/vxfssystem" is created, no permission is passed, which results in the file having permission 000.

RESOLUTION:
The code is modified to create the file "/etc/vx/vxfssystem" with default permission as "600".
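
Creating a file with an explicit permission mode can be sketched as follows.
This is only an illustration of the general technique, not the VxFS code: the
helper name create_with_mode and the demo file name are hypothetical, and the
sketch writes to a temporary directory rather than to /etc/vx.

```python
import os
import stat
import tempfile

def create_with_mode(path, mode=0o600):
    """Create a new file with an explicit mode and return its permission bits."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL, mode)
    try:
        # The process umask can clear bits requested at creation time,
        # so set the mode again explicitly on the open descriptor.
        os.fchmod(fd, mode)
    finally:
        os.close(fd)
    return stat.S_IMODE(os.stat(path).st_mode)

with tempfile.TemporaryDirectory() as tmp:
    demo_mode = create_with_mode(os.path.join(tmp, "vxfssystem.demo"))
print(oct(demo_mode))
```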

* 3864185 (Tracking ID: 3859032)

SYMPTOM:
System panics in vx_tflush_map() due to NULL pointer dereference.

DESCRIPTION:
When a file system is converted to VxFS using vxconvert, new blocks are allocated to
structural files such as smap, and these blocks can contain garbage. This is done with
the expectation that fsck will rebuild the correct smap. However, fsck fails to
distinguish between an EAU that is fully EXPANDED and one that is ALLOCATED. As a
result, if an allocation is made to a file whose last allocation came from such an
affected EAU, a sub-transaction is created on an EAU that is in the allocated
state. Map buffers of such EAUs are not initialized properly in the VxFS
private buffer cache, so these buffers are released back as stale
during the transaction commit. Later, if a file-system-wide sync tries to
flush the metadata, it can refer to these buffer pointers and panic because the
buffers have already been released and reused.

RESOLUTION:
The code is modified in fsck to correctly set the state of the EAU on
disk. The involved code paths are also modified to avoid doing
transactions on unexpanded EAUs.

* 3864186 (Tracking ID: 3855726)

SYMPTOM:
Panic happens in vx_prot_unregister_all(). The stack looks like this:

- vx_prot_unregister_all
- vxportalclose
- __fput
- fput
- filp_close
- sys_close
- system_call_fastpath

DESCRIPTION:
The panic is caused by a NULL fileset pointer, which results from referencing the
fileset before it is loaded. In addition, there is a race on the fileset identity array.

RESOLUTION:
Skip the fileset if it's not loaded yet. Add the identity array lock to prevent
the possible race.

* 3864247 (Tracking ID: 3861713)

SYMPTOM:
Contention observed on vx_sched_lk and vx_worklist_lk spinlock when profiled using lockstats.

DESCRIPTION:
Internal worker threads take a lock to sleep on a CV while waiting
for work. This lock is global. If there are large numbers of CPUs and large numbers of worker threads, contention
can be seen on vx_sched_lk and vx_worklist_lk using lockstat, along with an increase in %sys CPU.

RESOLUTION:
The code is modified to make the lock more scalable in large CPU configurations.

* 3864250 (Tracking ID: 3833816)

SYMPTOM:
In a CFS cluster, one node returns stale data.

DESCRIPTION:
In a 2-node CFS cluster, when node 1 opens the file and writes to
it, the locks are used with CFS_MASTERLESS flag set. But when node 2 tries to
open the file and write to it, the locks on node 1 are normalized as part of
HLOCK revoke. But after the Hlock revoke on node 1, when node 2 takes the PG
Lock grant to write, there is no PG lock revoke on node 1, so the dirty pages on
node 1 are not flushed and invalidated. The problem results in reads returning
stale data on node 1.

RESOLUTION:
The code is modified to cache the PG lock before normalizing it in
vx_hlock_putdata, so that after the normalizing, the cache grant is still with
node 1. When node 2 requests the PG lock, there is a revoke on node 1 which flushes
and invalidates the pages.

* 3864255 (Tracking ID: 3827491)

SYMPTOM:
Data relocation is not executed correctly if the IOTEMP policy is set to AVERAGE.

DESCRIPTION:
Database table is not created correctly which results in an error on the database query. This affects the relocation policy of data and the files are not relocated properly.

RESOLUTION:
The code is modified to fix the database table creation issue. The relocation policy based calculations are now done correctly.

* 3864256 (Tracking ID: 3830300)

SYMPTOM:
Heavy CPU usage is seen while Oracle archive processes are running on a clustered
file system.

DESCRIPTION:
The poor read performance in this case is caused by fragmentation.
Fragmentation mainly happens when multiple archivers are running on the
same node. The allocation pattern of the Oracle archiver processes is:

1. write header with O_SYNC
2. ftruncate-up the file to its final size (typically a few GBs)
3. do lio_listio with 1MB iocbs

The problem occurs because all the allocations done in this manner go through
internal allocations, that is, allocations below the file size instead of allocations
past the file size. Internal allocations are done at most 8 pages at a time. If
multiple processes do this, they all get these 8 pages alternately,
and the file system becomes very fragmented.

RESOLUTION:
A tunable is added that allocates ZFOD extents when ftruncate
tries to increase the size of the file, instead of creating a hole. This
eliminates the allocations internal to the file size and thus the fragmentation.
The earlier implementation of the same fix, which ran into
locking issues, is also fixed, as is the performance issue while writing from a secondary node.

* 3864259 (Tracking ID: 3856363)

SYMPTOM:
vxfs reports mapbad errors in the syslog as below:
vxfs: msgcnt 15 mesg 003: V-2-3: vx_mapbad - vx_extfind - 
/dev/vx/dsk/vgems01/lvems01 file system free extent bitmap in au 0 marked 
bad.

And, full fsck reports following metadata inconsistencies:

fileset 999 primary-ilist inode 6 has invalid number of blocks (18446744073709551583)
fileset 999 primary-ilist inode 6 failed validation clear? (ynq)n
pass2 - checking directory linkage
fileset 999 directory 8192 block devid/blknum 0/393216 offset 68 references free inode
                                ino 6 remove entry? (ynq)n
fileset 999 primary-ilist inode 8192 contains invalid directory blocks
                                clear? (ynq)n
pass3 - checking reference counts
fileset 999 primary-ilist inode 5 unreferenced file, reconnect? (ynq)n
fileset 999 primary-ilist inode 5 clear? (ynq)n
fileset 999 primary-ilist inode 8194 unreferenced file, reconnect? (ynq)n
fileset 999 primary-ilist inode 8194 clear? (ynq)n
fileset 999 primary-ilist inode 8195 unreferenced file, reconnect? (ynq)n
fileset 999 primary-ilist inode 8195 clear? (ynq)n
pass4 - checking resource maps

DESCRIPTION:
While processing the VX_IEZEROEXT extop, VxFS frees the extent without 
setting VX_TLOGDELFREE flag. Similarly, there are other cases where the flag 
VX_TLOGDELFREE is not set in the case of the delayed extent free, this could 
result in mapbad errors and invalid block counts.

RESOLUTION:
Because the VX_TLOGDELFREE flag needs to be set on every extent free,
the code is modified to discard this flag and implicitly treat every extent free
as a delayed extent free.
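
The block count that fsck reports in the symptom above, 18446744073709551583,
equals 2^64 - 33. The following sketch shows that arithmetic. Reading the value
as a counter that was decremented below zero is an interpretation, not
something the release note states, and the helper names are hypothetical.

```python
U64 = 1 << 64

def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    return value - U64 if value >= (1 << 63) else value

def as_unsigned64(value):
    """Reinterpret a (possibly negative) integer as an unsigned 64-bit value."""
    return value % U64

reported_blocks = 18446744073709551583   # value from the fsck output
signed_view = as_signed64(reported_blocks)
print(signed_view)
```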

* 3864260 (Tracking ID: 3846521)

SYMPTOM:
cp -p fails with EINVAL for files with a 10-digit
modification time. The EINVAL error is returned if the value in the tv_nsec field is
outside the range of 0 to 999,999,999. VxFS supports the
update in usec, but when copying to user space, the usec value is converted to
nsec. In this case, the usec value has crossed its upper limit of
999,999.

DESCRIPTION:
In a cluster, the time on different nodes might differ. When updating
mtime, VxFS checks whether the inode is a cluster inode and whether another
node's mtime is newer than the current node's time. If so, VxFS increments
tv_usec instead of changing mtime to an older time value. The tv_usec
counter can overflow in this case, which results in a 10-digit mtime.tv_nsec.

RESOLUTION:
Code is modified to reset usec counter for mtime/atime/ctime when 
upper boundary limit i.e. 999999 is reached.
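
The relationship between the usec counter and the 10-digit tv_nsec value can
be sketched as follows. The helper names are hypothetical, and the reset
function is only one possible reading of the resolution: whether the real code
also adjusts the seconds field is not stated in this note.

```python
NSEC_PER_USEC = 1000
USEC_MAX = 999_999        # upper limit for a microsecond field
NSEC_MAX = 999_999_999    # upper limit for tv_nsec

def usec_to_nsec(usec):
    """Convert microseconds to nanoseconds, as done when copying to user space."""
    return usec * NSEC_PER_USEC

def is_valid_tv_nsec(nsec):
    """tv_nsec values outside this range are rejected with EINVAL."""
    return 0 <= nsec <= NSEC_MAX

def bump_usec_with_reset(usec):
    """Increment the usec counter, resetting it once the upper limit is reached."""
    return 0 if usec >= USEC_MAX else usec + 1

overflowed_usec = USEC_MAX + 1                 # counter incremented past its limit
bad_nsec = usec_to_nsec(overflowed_usec)       # 10-digit value
print(bad_nsec, is_valid_tv_nsec(bad_nsec))
```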

* 3866968 (Tracking ID: 3866962)

SYMPTOM:
Data corruption seen when dalloc writes are going on the file and 
simultaneously fsync started on the same file.

DESCRIPTION:
If dalloc writes are in progress on a file and synchronous flushing
is started on the same file at the same time, the synchronous flushing tries
to flush all the dirty pages of the file without considering the underlying allocation.
In this case, flushing can happen on unallocated blocks, which can result in
data loss.

RESOLUTION:
Code is modified to flush data till actual allocation in case of dalloc 
writes.

* 3870704 (Tracking ID: 3867131)

SYMPTOM:
Kernel panic in internal testing.

DESCRIPTION:
In internal testing the vdl_fsnotify_sb is found NULL because we
are not allocating and initializing it in initialization routine i.e
vx_fill_super(). The vdl_fsnotify_sb would be initialized in vx_fill_super()
only when the kernel's fsnotify feature is available. But fsnotify feature is
not available in RHEL5/SLES10 kernel.

RESOLUTION:
Code is added to check whether the fsnotify feature is available in the
running kernel.

* 3872661 (Tracking ID: 3857254)

SYMPTOM:
Assert failure because of missed flush before taking filesnap of the file.

DESCRIPTION:
If a delayed extending write on the file has not completed when the snap of the file is taken, the inode size is not updated correctly. This triggers an internal assert because of the incorrect inode size.

RESOLUTION:
The code is modified to flush the delayed extended write before taking filesnap.

* 3874662 (Tracking ID: 3871489)

SYMPTOM:
IO service times increased with IO intensive workload on high end 
server.

DESCRIPTION:
VxFS has worklist threads that sleep on a single condition
variable. While the worker threads are being woken up, contention can be seen on the
OS sleep dispatch locks, and the service time for the I/O can increase because of this
contention.

RESOLUTION:
The number of condition variables is scaled to reduce contention.
Padding is also added to the condition variable structure to avoid cache
allocation problems. In addition, only the exact number of threads that are
required is woken up.

* 3875458 (Tracking ID: 3616694)

SYMPTOM:
Internal assert failure because of race condition between forced unmount 
thread and inactive processing thread.

DESCRIPTION:
There is a possible race condition between forced unmount and inactive
processing. On Linux, the last close of a file is done at unmount, but this may not
trigger inactive processing because of a dentry hold. The dentry can be purged under
memory pressure, which triggers inactive processing, and this can happen during
unmount as well. If the inactivation happens after unmount has already gone over the
inactive list, and the entry is then added to the inactive list, the assert triggers
because of the wrong flags on the inode.

RESOLUTION:
Code is modified to resolve this race condition.

* 3875633 (Tracking ID: 3869174)

SYMPTOM:
The write system call might deadlock on RHEL5 and SLES10.

DESCRIPTION:
The issue is caused by page fault handling while a page lock is held.
On RHEL5 and SLES10, a write may hold page locks. If a page
fault happens at that point, the page fault handler waits on a lock that the
writer already holds, which results in a deadlock.

RESOLUTION:
The code is modified to prefault the pages so that the deadlock
is avoided.

* 3876065 (Tracking ID: 3867128)

SYMPTOM:
Assert failed in internal native AIO testing.

DESCRIPTION:
On RHEL5/SLES10, in NAIO, the iovec comes from the kernel stack. When
the work item is handed off to the worker thread, the work item points
to an iovec structure in a stack frame that no longer exists. As a result, the iovec
memory can be corrupted when it is reused for a new stack frame.

RESOLUTION:
Code is modified to allocate the iovec dynamically in naio hand-off code and 
copy it into the work item before doing handoff.

* 3877070 (Tracking ID: 3880121)

SYMPTOM:
Internal assert failure when coalescing the extents on clone.

DESCRIPTION:
Resolving overlay extents is not supported when coalescing extents on a clone,
but the code still tries to resolve these overlay extents. This
results in an internal assert failure.

RESOLUTION:
Code is modified to not resolve these overlay extents when 
coalescing.

* 3877142 (Tracking ID: 3891801)

SYMPTOM:
Internal test hit debug assert.

DESCRIPTION:
A debug assert is hit while creating a page in the shared page cache for a
ZFOD extent. This is the same as creating a page for a hole, which VxFS does not do.

RESOLUTION:
Added a check for page creation so that we don't create shared pages
for zfod extent.

* 3878983 (Tracking ID: 3872202)

SYMPTOM:
VxFS internal test hits an assert.

DESCRIPTION:
In page create case VxFS was taking the ipglock twice in a thread,
due to which the VxFS test hit the internal assert.

RESOLUTION:
Removed the ipglock from vx_wb_dio_write().

* 3890556 (Tracking ID: 2919310)

SYMPTOM:
During stress testing on cluster file system, an assertion failure was hit 
because of a missing linkage between the directory and the associated 
attribute inode.

DESCRIPTION:
As per the designed behavior, the node which owns the inode of the file, 
receives the request to remove the file from the directory. If the directory 
has an alternate index (hash directory) present, then in the file remove 
receive handler, the attribute inode is read from the disk. However, VxFS 
does not create a linkage between the directory and the corresponding inode, 
which results in an assert failure.

RESOLUTION:
The code is modified to set the directory inode's i_dirhash field to the
attribute inode. This change is exercised while bringing the inode incore
during file or directory removal.

* 3890659 (Tracking ID: 3514407)

SYMPTOM:
Internal stress test hit debug assert.

DESCRIPTION:
In deli-cache code, when it reuses the inode, it is updating the
inode generation count only for reorg inodes.

RESOLUTION:
Code is added to update inode generation count unconditionally.

Patch ID: 6.1.1.300

* 3851511 (Tracking ID: 3821686)

SYMPTOM:
VxFS module might not get loaded on SLES11 SP4.

DESCRIPTION:
SLES11 SP4 is a new release, so the VxFS module failed to load
on it.

RESOLUTION:
Added VxFS support for SLES11 SP4.

* 3852733 (Tracking ID: 3729158)

SYMPTOM:
fuser and other commands hang on vxfs file systems.

DESCRIPTION:
The hang is seen when two threads contend for two locks, ILOCK and
PLOCK. The writeadvise thread owns the ILOCK but is waiting for the PLOCK.
The dalloc thread owns the PLOCK and is waiting for the ILOCK.

RESOLUTION:
The code is modified to take the locks in the correct order: PLOCK followed by ILOCK.
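
A consistent lock order removes this kind of deadlock because no thread can
hold the second lock while waiting for the first. The sketch below illustrates
the general technique with two user-space locks that stand in for PLOCK and
ILOCK. All names are hypothetical and the code does not model VxFS internals.

```python
import threading

plock = threading.Lock()   # stands in for PLOCK (always taken first)
ilock = threading.Lock()   # stands in for ILOCK (always taken second)
completed = []             # appended to only while both locks are held

def critical_section(name):
    # Every thread takes PLOCK first and ILOCK second.
    with plock:
        with ilock:
            completed.append(name)

def worker(name, rounds):
    for _ in range(rounds):
        critical_section(name)

def run_demo(rounds=1000):
    names = ("writeadvise", "dalloc_flusher")
    threads = [threading.Thread(target=worker, args=(n, rounds)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=30)
    # True when both threads finished, that is, no deadlock occurred.
    return all(not t.is_alive() for t in threads)

finished = run_demo()
print(finished, len(completed))
```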

* 3852736 (Tracking ID: 3457801)

SYMPTOM:
Kernel panics in block_invalidatepage().

DESCRIPTION:
The address-space struct of a page has "a_ops" as "vx_empty_aops".  
This is an empty structure, so do_invalidatepage() calls block_invalidatepage() - 
but these pages have VxFS's page buffer-heads attached, not kernel buffer-heads.  
So, block_invalidatepage() panics.

RESOLUTION:
Code is modified to fix this by flushing pages before 
vx_softcnt_flush.

Patch ID: 6.1.1.100

* 3520113 (Tracking ID: 3451284)

SYMPTOM:
While allocating extent during write operation, if summary and bitmap data for
filesystem allocation unit get mismatched then the assert hits.

DESCRIPTION:
Consider an extent that was allocated using SMAP on a deleted inode, where part of the AU
space is moved from the deleted inode to a new inode. At this point, the SMAP state is
set to VX_EAU_ALLOCATED and the EMAP is not initialized. When more space is needed
for the new inode, VxFS tries to allocate from the same AU using the EMAP and can hit the
"f:vx_sum_upd_efree1:2a" assert, because the EMAP is not initialized.

RESOLUTION:
Code has been modified to expand AU while moving partial AU space from one inode
to other inode.

* 3521945 (Tracking ID: 3530435)

SYMPTOM:
Panic in internal test with SSD cache enabled.

DESCRIPTION:
The end of the writeback log record was wrongly modified
while adding a skip-list node in the punch-hole case where the expunge flag is set
and the insertion of the new node is skipped.

RESOLUTION:
The code is modified to skip modification of the writeback log record when the
expunge flag is set and the left end of the record is smaller than or equal to the end
offset of the next punch-hole request.

* 3529243 (Tracking ID: 3616722)

SYMPTOM:
Race between the writeback cache offline thread and the writeback data flush thread causes null pointer dereference, resulting in system panic.

DESCRIPTION:
While disabling writeback, the writeback cache information is deinitialized from each inode, which removes the writeback bmap lock pointer. If writeback flushing is still in progress in another thread that holds the writeback bmap lock, a null pointer dereference occurs while removing the writeback bmap lock, because the pointer was already removed by the previous thread.

RESOLUTION:
The code is modified to handle such race conditions.

* 3536233 (Tracking ID: 3457803)

SYMPTOM:
File System gets disabled with the following message in the system log:
WARNING: V-2-37: vx_metaioerr - vx_iasync_wait - /dev/vx/dsk/testdg/test  file system meta data write error in dev/block

DESCRIPTION:
The inode's incore information becomes inconsistent because one of its fields is modified without lock protection.

RESOLUTION:
The code is modified to protect the inode's field by taking the lock.

* 3583963 (Tracking ID: 3583930)

SYMPTOM:
When external quota file is over-written or restored from backup, new settings which were added after the backup still remain.

DESCRIPTION:
The internal quota file is not always updated with correct limits, so the quotaon operation copies the quota limits from the external to the internal quota file. To complete the copy operation, the extent of the external file is compared to the extent of the internal file at the corresponding offset.
If the external quota file is overwritten (or restored to its original copy) and the size of the internal file is more than that of the external file, the quotaon operation does not clear the additional (stale) quota records in the internal file. Later, the sync operation (part of quotaon) copies these stale records from the internal to the external file. Hence, both internal and external files contain stale records.

RESOLUTION:
The code has been modified to remove the stale records in the internal file at the time of quotaon.

* 3617774 (Tracking ID: 3475194)

SYMPTOM:
Veritas File System (VxFS) fscdsconv(1M) command fails with the following error message:
...
UX:vxfs fscdsconv: INFO: V-3-26130: There are no files violating the CDS limits for this target.
UX:vxfs fscdsconv: INFO: V-3-26047:  Byteswapping in progress ...
UX:vxfs fscdsconv: ERROR: V-3-25656:  Overflow detected
UX:vxfs fscdsconv: ERROR: V-3-24418: fscdsconv: error processing primary inode 
list for fset 999
UX:vxfs fscdsconv: ERROR: V-3-24430: fscdsconv: failed to copy metadata
UX:vxfs fscdsconv: ERROR: V-3-24426: fscdsconv: Failed to migrate.

DESCRIPTION:
The fscdsconv(1M) command takes a filename argument that is used as a recovery file, which restores the original file system if a failure occurs while the file system conversion is in progress. This file has two parts: a control part and a data part. The control part stores information about all the metadata, such as inodes and extents. In this instance, the length of the control part is underestimated for some file systems that have few inodes but a very large average number of extents per file (this can be seen in the fsadm -E report).

RESOLUTION:
Make recovery file sparse, start the data part after 1TB offset, and then the control part can do allocating writes to the hole from the beginning of the file.

* 3617776 (Tracking ID: 3473390)

SYMPTOM:
In memory pressure scenarios, you see panics or system crashes due to stack overflows.

DESCRIPTION:
Specifically on RHEL6, the memory allocation routines consume much more memory than other distributions like SLES, or even RHEL5. Due to this, multiple overflows are reported for the RHEL6 platform. Most of these overflows occur when VxFS tries to allocate memory under memory pressure.

RESOLUTION:
The code is modified to fix multiple overflows by adding handoff code paths, adjusting handoff limits, removing on-stack structures and reducing the number of function frames on stack wherever possible.

* 3617781 (Tracking ID: 3557009)

SYMPTOM:
When you run the fallocate command with the -l option to specify the length of the reserved allocation, the resulting file size is not the specified length but a multiple of the file system block size. For example:
If block size = 8K:
# fallocate -l 8860 testfile1
# ls -l
total 16
drwxr-xr-x. 2 root root    96 Jul  1 11:40 lost+found/
-rw-r--r--. 1 root root 16384 Jul  1 11:41 testfile1
The file size should be 8860, but it is 16384 (which is 2*8192).

DESCRIPTION:
The vx_fallocate() function on Veritas File System (VxFS) creates a larger file than specified because it allocates the extent in blocks. As a result, the reserved file size is a multiple of the block size instead of the length that the fallocate command specifies.

RESOLUTION:
The code is modified so that the vx_fallocate() function on VxFS sets the reserved file size to the specified length instead of a multiple of the block size.
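
The numbers in the example above follow from rounding the requested length up
to whole blocks. The sketch below reproduces that arithmetic. The function
names are hypothetical, and the "before" and "after" functions only model the
reported file size, not the actual allocation logic.

```python
def blocks_needed(length, block_size):
    """Number of whole blocks required to hold `length` bytes (ceiling division)."""
    return -(-length // block_size)

def reported_size_before_fix(length, block_size):
    """File size rounded up to a multiple of the block size."""
    return blocks_needed(length, block_size) * block_size

def reported_size_after_fix(length, block_size):
    """File size equal to the requested length; blocks are still allocated whole."""
    return length

BLOCK_SIZE = 8192
requested = 8860
print(reported_size_before_fix(requested, BLOCK_SIZE),
      reported_size_after_fix(requested, BLOCK_SIZE))
```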

* 3617788 (Tracking ID: 3604071)

SYMPTOM:
With the thin reclaim feature turned on, you can observe high CPU usage on the vxfs thread process. The backtrace of such threads usually looks like this:
	 
	 - vx_dalist_getau
	 - vx_recv_bcastgetemapmsg
	 - vx_recvdele
	 - vx_msg_recvreq
	 - vx_msg_process_thread
	 - vx_kthread_init

DESCRIPTION:
In the routine that gets the broadcast information of a node, which contains the maps of the Allocation Units (AUs) for which the node holds the delegations, the locking mechanism is inefficient. Every time this routine is called, it performs a series of down-up operations on a certain semaphore. This can result in a huge CPU cost when many threads call the routine in parallel.

RESOLUTION:
The code is modified to optimize the locking mechanism in the routine to get the broadcast information of a node which contains maps of Allocation Units (AUs) for which node holds the delegations, so that it only does down-up operation on the semaphore once.

* 3617790 (Tracking ID: 3574404)

SYMPTOM:
System panics because of a stack overflow during rename operation. The following stack trace can be seen during the panic:

machine_kexec 
crash_kexec 
oops_end 
no_context 
__bad_area_nosemaphore 
bad_area_nosemaphore 
__do_page_fault 
do_page_fault 
page_fault 
task_tick_fair 
scheduler_tick 
update_process_times 
tick_sched_timer 
__run_hrtimer 
hrtimer_interrupt 
local_apic_timer_interrupt 
smp_apic_timer_interrupt 
apic_timer_interrupt 
--- <IRQ stack> ---
apic_timer_interrupt 
mempool_free_slab 
mempool_free 
vx_pgbh_free  
vx_pgbh_detach  
vx_releasepage  
try_to_release_page 
shrink_page_list.clone.3 
shrink_inactive_list 
shrink_mem_cgroup_zone 
shrink_zone 
zone_reclaim 
get_page_from_freelist 
__alloc_pages_nodemask 
alloc_pages_current 
__get_free_pages 
vx_getpages  
vx_alloc  
vx_bc_getfreebufs  
vx_bc_getblk  
vx_getblk_bp  
vx_getblk_cmn  
vx_getblk  
vx_getmap  
vx_getemap  
vx_extfind  
vx_searchau_downlevel  
vx_searchau_downlevel  
vx_searchau_downlevel  
vx_searchau_downlevel  
vx_searchau_uplevel  
vx_searchau  
vx_extentalloc_device  
vx_extentalloc  
vx_bmap_ext4  
vx_bmap_alloc_ext4  
vx_bmap_alloc  
vx_write_alloc3  
vx_tran_write_alloc  
vx_idalloc_off1  
vx_idalloc_off  
vx_int_rename  
vx_do_rename  
vx_rename1  
vx_rename  
vfs_rename 
sys_renameat 
sys_rename 
system_call_fastpath

DESCRIPTION:
The stack overflows by 88 bytes in the rename code path. The thread_info structure is corrupted with VxFS page buffer head addresses.

RESOLUTION:
The code now uses dynamic allocation of local structures in vx_write_alloc3 and vx_int_rename. This saves 256 bytes and gives enough room.

* 3617793 (Tracking ID: 3564076)

SYMPTOM:
MongoDB NoSQL database creation fails with an ENOTSUP error. MongoDB uses
posix_fallocate to create a file first. When it writes at an offset that is not
aligned with the file system block boundary, an ENOTSUP error is returned.

DESCRIPTION:
On a file system with 8k bsize and 4k page size, the application creates a file
using posix_fallocate, and then writes at some offset which is not aligned with
fs block boundary. In this case, the pre-allocated extent is split at the
unaligned offset into two parts for the write. However the alignment requirement
of the split fails the operation.

RESOLUTION:
Split the extent down to block boundary.
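
One way to read "down to block boundary" is that the split point is rounded
down to the start of the block that contains the write offset. The sketch below
shows that rounding for the 8k block size and 4k page size mentioned above.
The function name and the sample offsets are hypothetical, and this is an
interpretation rather than the actual VxFS logic.

```python
def split_point(offset, block_size):
    """Round a byte offset down to the start of its file system block."""
    return (offset // block_size) * block_size

BLOCK_SIZE = 8192
PAGE_SIZE = 4096

unaligned_offset = 3 * PAGE_SIZE          # 12288: page aligned, not block aligned
aligned_split = split_point(unaligned_offset, BLOCK_SIZE)
print(unaligned_offset, aligned_split)
```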

* 3617877 (Tracking ID: 3615850)

SYMPTOM:
The write system call writes up to count bytes from the pointed buffer to the file referred to by the file descriptor field:

ssize_t write(int fd, const void *buf, size_t count);

When the count parameter is invalid, it can sometimes cause write() to hang on a VxFS file system. For example, with a 10000-byte buffer and the count set to 30000 by mistake, you may encounter this problem.

DESCRIPTION:
On recent Linux kernels, you cannot take a page fault while holding a page locked, so as to avoid a deadlock. This means uiomove can copy less than requested, and any partially populated pages created in the routine that establishes a virtual mapping for the page are destroyed.
This can cause an infinite loop in the write code path when the given user buffer is not aligned with a page boundary and the length given to write() causes an EFAULT: uiomove() does a partial copy, segmap_release destroys the partially populated pages and unwinds the uio, and the operation is then repeated.

RESOLUTION:
The code is modified to move the pre-faulting to the buffered I/O write loops. The system either shortens the length of the copy if not all of the requested pages can be faulted, or fails with EFAULT if no pages are pre-faulted. This prevents the infinite loop.
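
The decision described in this resolution can be sketched as follows. This is an illustration only: the function name prefault_length and the valid_end parameter are hypothetical stand-ins for the result of pre-faulting the user pages, and the addresses are made-up sample values.

```python
import errno


def prefault_length(buf_addr, count, valid_end):
    """Return how many bytes a write loop may copy after pre-faulting.

    buf_addr  -- start address of the user buffer
    count     -- length requested by the caller
    valid_end -- first address that cannot be faulted in (hypothetical
                 input standing in for the result of pre-faulting)
    """
    if buf_addr >= valid_end:
        # No page could be pre-faulted: fail instead of retrying forever.
        raise OSError(errno.EFAULT, "Bad address")
    # Shorten the copy so that it never runs into the unmapped region.
    return min(count, valid_end - buf_addr)


# The caller asks for 30000 bytes, but the mapping ends at address 53248.
print(prefault_length(41060, 30000, 53248))
```

Because the length is either shortened or the call fails, each pass through the write loop makes progress or ends, so the same partial copy is not repeated.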

* 3620279 (Tracking ID: 3558087)

SYMPTOM:
When simultaneous dd threads run on a mount point and the ls -l command is started on the same mount point, the system hangs.

DESCRIPTION:
When the delayed allocation (dalloc) feature is turned on, the flushing process takes a long time. The process keeps the glock held, and needs writers to keep the irwlock held. The ls -l command calls stat internally and keeps waiting for the irwlock to read ACLs.

RESOLUTION:
Redesign dalloc to keep the glock unlocked while flushing.

* 3620284 (Tracking ID: 3596378)

SYMPTOM:
The copy of a large number of small files is slower on Veritas File System (VxFS) compared to EXT4.

DESCRIPTION:
VxFS implements the fsetxattr() system call in a synchronous way. Hence, before returning from the system call, VxFS takes some time to flush the data to the disk. In this way, VxFS guarantees file system consistency in case of a file system crash. However, this implementation has the side effect that it serializes the whole processing, which takes more time.

RESOLUTION:
The code is modified to change the transaction to flush the data in a delayed way.

* 3620288 (Tracking ID: 3469644)

SYMPTOM:
The system panics in the vx_logbuf_clean() function when it traverses the chain of transactions off the intent log buffer. The stack trace is as follows:

vx_logbuf_clean ()
vx_logadd ()
vx_log()
vx_trancommit()
vx_exh_hashinit ()
vx_dexh_create ()
vx_dexh_init ()
vx_pd_rename ()
vx_rename1_pd()
vx_do_rename ()
vx_rename1 ()
vx_rename ()
vx_rename_skey ()

DESCRIPTION:
The system panics because the vx_logbuf_clean() function tries to access an already freed transaction from the transaction chain to flush it to the log.

RESOLUTION:
The code has been modified to make sure that the transaction gets flushed to the log before it is freed.

* 3621420 (Tracking ID: 3621423)

SYMPTOM:
Veritas Volume Manager (VxVM) caching is disabled or stopped after mounting a file system in a situation where the Veritas File System (VxFS) cache area is not present.

DESCRIPTION:
When the VxFS cache area is not present and the VxVM cache area is present and in ENABLED state, if you mount a file system on any of the volumes, the VxVM caching gets stopped for that volume, which is not an expected behavior.

RESOLUTION:
The code is modified not to disable VxVM caching for any mounted file system if the VxFS cache area is not present.

* 3628867 (Tracking ID: 3595896)

SYMPTOM:
While creating an Oracle RAC 12.1.0.2 database, the node panics with the following stack:
aio_complete()
vx_naio_do_work()
vx_naio_worker()
vx_kthread_init()

DESCRIPTION:
For a zero size request (with a correctly aligned buffer), Veritas File System (VxFS) wrongly queues the work internally and returns -EIOCBQUEUED. The kernel calls function aio_complete() for this zero size request. However, while VxFS is performing the queued work internally, the aio_complete() function gets called again. The double call of the aio_complete() function results in the panic.

RESOLUTION:
The code is modified so that the zero size requests will not queue elements inside VxFS work queue.

* 3636210 (Tracking ID: 3633067)

SYMPTOM:
While converting from ext3 file system to VxFS using vxfsconvert, it is observed that many inodes are missing.

DESCRIPTION:
When vxfsconvert(1M) is run on an ext3 file system, it misses an entire block group of inodes. This happens because of an incorrect calculation of the block group number of a given inode in a boundary case. The inode that is the last inode of a given block group is calculated to have the correct inode offset, but is calculated to be in the next block group. This causes the entire next block group to be skipped when the code attempts to find the next consecutive inode.

RESOLUTION:
The code is modified to correct the calculation of block group number.
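
The boundary case can be illustrated with the usual ext2/ext3 convention that inode numbers start at 1. The exact expression used by vxfsconvert is not shown in this document, so the faulty variant below is an assumed illustration of the off-by-one, and the value 8192 for inodes per group is a sample value.

```python
def block_group(ino, inodes_per_group):
    """Block group of an ext2/ext3 inode; inode numbers start at 1."""
    return (ino - 1) // inodes_per_group


def index_in_group(ino, inodes_per_group):
    """Zero-based position of the inode inside its block group."""
    return (ino - 1) % inodes_per_group


def block_group_off_by_one(ino, inodes_per_group):
    """Assumed faulty variant that forgets inode numbers are 1-based."""
    return ino // inodes_per_group


# Inode 8192 is the last inode of block group 0 with 8192 inodes per group.
last = 8192
print(block_group(last, 8192), index_in_group(last, 8192))
print(block_group_off_by_one(last, 8192))
```

For every inode except the last one in a group both variants agree, which is why the problem only appears at the block group boundary.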

* 3644006 (Tracking ID: 3451686)

SYMPTOM:
During internal stress testing on a cluster file system (CFS), a debug
assert is hit due to an invalid cache generation count on the incore inode.

DESCRIPTION:
The reset of the cache generation count in the incore inode used in Disk
Layout Version (DLV) 10 was missed during inode reuse, causing the debug assert.

RESOLUTION:
The code is modified to reset the cache generation count in incore
inode during inode reuse.

* 3645825 (Tracking ID: 3622326)

SYMPTOM:
The file system is marked with the fullfsck flag because an inode is marked bad
during checkpoint promote.

DESCRIPTION:
VxFS incorrectly skipped pushing data to the clone inode. As a result, the
inode is marked bad during checkpoint promote, which in turn results in the
file system being marked with the fullfsck flag.

RESOLUTION:
Code is modified to push the proper data to clone inode.

Patch ID: 6.1.1.000

* 3370758 (Tracking ID: 3370754)

SYMPTOM:
Internal testing with the SmartIO write-back SSD cache hits debug asserts.

DESCRIPTION:
The debug asserts are hit due to race conditions in various code segments of the write-back SSD cache feature.

RESOLUTION:
The code is modified to fix the race conditions.

* 3383149 (Tracking ID: 3383147)

SYMPTOM:
The "C" operator precedence error may occur while turning "off" delayed
allocation.

DESCRIPTION:
Due to the C operator precedence issue, VxFS evaluates a condition
wrongly.

RESOLUTION:
The code is modified to evaluate the condition correctly.

* 3422580 (Tracking ID: 1949445)

SYMPTOM:
The system is unresponsive when files are created in a large directory. The following stack is logged:

vxg_grant_sleep()                                             
vxg_cmn_lock()
vxg_api_lock()                                             
vx_glm_lock()
vx_get_ownership()                                                  
vx_exh_coverblk()  
vx_exh_split()                                                 
vx_dexh_setup() 
vx_dexh_create()                                              
vx_dexh_init() 
vx_do_create()

DESCRIPTION:
For large directories, large directory hash (LDH) is enabled to improve lookup. When a system takes ownership of the LDH inode twice in the same thread context (while building the hash for a directory), it becomes unresponsive.

RESOLUTION:
The code is modified to avoid taking ownership again if we already have the ownership of the LDH inode.

* 3422584 (Tracking ID: 2059611)

SYMPTOM:
The system panics due to a NULL pointer dereference while flushing the
bitmaps to the disk and the following stack trace is displayed:
...
...
vx_unlockmap+0x10c
vx_tflush_map+0x51c
vx_fsq_flush+0x504
vx_fsflush_fsq+0x190
vx_workitem_process+0x1c
vx_worklist_process+0x2b0
vx_worklist_thread+0x78

DESCRIPTION:
The vx_unlockmap() function unlocks a map structure of the file
system. If the map is being used, the hold count is incremented. The
vx_unlockmap() function attempts to check whether this is an empty mlink doubly
linked list. The asynchronous vx_mapiodone routine can change the link at random
even though the hold count is zero.

RESOLUTION:
The code is modified to change the evaluation rule inside the
vx_unlockmap() function, so that further evaluation can be skipped over when map
hold count is zero.

* 3422586 (Tracking ID: 2439261)

SYMPTOM:
When the vx_fiostats_tunable is changed from zero to non-zero, the
system panics with the following stack trace:
vx_fiostats_do_update
vx_fiostats_update
vx_read1
vx_rdwr
vno_rw
rwuio
pread

DESCRIPTION:
When vx_fiostats_tunable is changed from zero to non-zero, all the
incore-inode fiostats attributes are set to NULL. When these attributes are
accessed, the system panics due to the NULL pointer dereference.

RESOLUTION:
The code has been modified to check that the file I/O stat attributes are
present before dereferencing the pointers.

* 3422604 (Tracking ID: 3092114)

SYMPTOM:
The information output by the "df -i" command can often be inaccurate for 
cluster mounted file systems.

DESCRIPTION:
In the Cluster File System 5.0 release, the concept of delegating metadata to
nodes in the cluster was introduced. This delegation of metadata allows CFS
secondary nodes to update metadata without having to ask the CFS primary to do
it. This provides greater node scalability.
However, the "df -i" information is still collected by the CFS primary 
regardless of which node (primary or secondary) the "df -i" command is executed 
on.

For inodes, the granularity of each delegation is an Inode Allocation Unit
[IAU]; thus IAUs can be delegated to nodes in the cluster.
With a VxFS 1 KB file system block size, each IAU represents 8192 inodes.
With a VxFS 2 KB file system block size, each IAU represents 16384 inodes.
With a VxFS 4 KB file system block size, each IAU represents 32768 inodes.
With a VxFS 8 KB file system block size, each IAU represents 65536 inodes.
Each IAU contains a bitmap that records whether each inode it represents is
allocated or free. The IAU also contains a summary count of the number of
inodes that are currently free in the IAU.
The "df -i" information can be considered as a simple sum of all the IAU
summary counts.
Using a 1 KB block size, IAU-0 represents inode numbers      0 -  8191
Using a 1 KB block size, IAU-1 represents inode numbers   8192 - 16383
Using a 1 KB block size, IAU-2 represents inode numbers  16384 - 24575
etc.
The inaccurate "df -i" count occurs because the CFS primary has no visibility 
of the current IAU summary information for IAU that are delegated to Secondary 
nodes.
Therefore, the number of allocated inodes within an IAU that is currently
delegated to a CFS secondary node is not known to the CFS primary. As a
result, the "df -i" count information for the currently delegated IAUs is
collected from the primary's copy of the IAU summaries. Because the primary's
copy of these IAU summaries is stale, the "df -i" count is only accurate when
no IAUs are currently delegated to CFS secondary nodes.
In other words - the IAUs currently delegated to CFS secondary nodes will cause 
the "df -i" count to be inaccurate.
Once an IAU is delegated to a node, the delegation can "timeout" after 3
minutes of inactivity. However, not all IAU delegations time out. One IAU
always remains delegated to each node for performance reasons. Also, the
delegation of an IAU whose inodes are all allocated (so no free inodes remain
in the IAU) does not time out either.
The issue can be best summarized as:
The more IAUs that remain delegated to CFS secondary nodes, the greater the 
inaccuracy of the "df -i" count.

RESOLUTION:
Allow the delegations for IAUs whose inodes are all allocated (so no free
inodes in the IAU) to "timeout" after 3 minutes of inactivity.
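
The IAU arithmetic from the description can be sketched as follows. The function names are hypothetical, and the summary counts in the example are made-up sample values. The only facts taken from the description are the inodes-per-IAU figures (8192 inodes for a 1 KB block size up to 65536 for 8 KB) and the statement that "df -i" is a simple sum of the IAU summary counts.

```python
def inodes_per_iau(block_size):
    """Each IAU covers 8 inodes per byte of block size (8192 for 1 KB)."""
    return block_size * 8


def iau_range(iau, block_size):
    """First and last inode number represented by the given IAU."""
    per_iau = inodes_per_iau(block_size)
    first = iau * per_iau
    return first, first + per_iau - 1


def df_free_inodes(summaries):
    """Free-inode count as a simple sum of the IAU summary counts."""
    return sum(summaries)


print(iau_range(2, 1024))

# Sample values: IAU-1 is delegated to a secondary node, so the primary
# still holds an old summary (8192 free) while only 5000 are really free.
primary_view = [100, 8192, 50]
actual = [100, 5000, 50]
print(df_free_inodes(primary_view) - df_free_inodes(actual))
```

The difference printed at the end is the error contributed by a single delegated IAU. The error grows with every additional IAU that stays delegated.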

* 3422614 (Tracking ID: 3297840)

SYMPTOM:
Metadata corruption is found during the file removal process, with the inode block count becoming negative.

DESCRIPTION:
When the user removes or truncates a file that has shared indirect blocks, there can be an instance where the block count is updated to reflect the removal of the shared indirect blocks even though the blocks are not removed from the file. The next iteration of the loop updates the block count again while removing these blocks. This eventually leads to the block count being a negative value after all the blocks are removed from the file. The removal code expects the block count to be zero before updating the rest of the metadata.

RESOLUTION:
The code is modified to update the block count and other tracking metadata in the same transaction as the blocks are removed from the file.

* 3422619 (Tracking ID: 3294074)

SYMPTOM:
The fsetxattr() system call is slower on Veritas File System (VxFS) than on the ext3 file system.

DESCRIPTION:
VxFS implements the fsetxattr() system call in a synchronous way. Hence, it takes some time to flush the data to the disk before returning from the system call, to guarantee file system consistency in case of a file system crash.

RESOLUTION:
The code is modified to allow the transaction to flush the data in a delayed way.

* 3422624 (Tracking ID: 3352883)

SYMPTOM:
During the rename operation, many nfsd threads waiting for a mutex hang with the following stack traces:
vxg_svar_sleep_unlock 
vxg_get_block
vxg_api_initlock  
vx_glm_init_blocklock
vx_cbuf_lookup  
vx_getblk_clust 
vx_getblk_cmn 
vx_getblk
vx_fshdchange
vx_unlinkclones
vx_clone_dispose
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init


vxg_svar_sleep_unlock 
vxg_grant_sleep 
vxg_cmn_lock 
vxg_api_trylock 
vx_glm_trylock 
vx_glmref_trylock 
vx_mayfrzlock_try 
vx_walk_fslist 
vx_log_sync 
vx_workitem_process 
vx_worklist_process 
vx_worklist_thread 
vx_kthread_init

DESCRIPTION:
A race condition is observed between the NFS rename and the additional dentry alias created by the current vx_splice_alias() function.
This race condition causes two different directory dentries to point to the same inode, which results in a mutex deadlock in the lock_rename() function.

RESOLUTION:
The code is modified to change the vx_splice_alias() function to prevent the creation of an additional dentry alias.

* 3422626 (Tracking ID: 3332902)

SYMPTOM:
The system running the fsclustadm(1M) command panics while shutting down. The 
following stack trace is logged along with the panic:

machine_kexec
crash_kexec
oops_end
page_fault [exception RIP: vx_glm_unlock]
vx_cfs_frlpause_leave [vxfs]
vx_cfsaioctl [vxfs]
vxportalkioctl [vxportal]
vfs_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath

DESCRIPTION:
There exists a race-condition between "fsclustadm(1M) cfsdeinit"
and "fsclustadm(1M) frlpause_disable". The "fsclustadm(1M) cfsdeinit" fails
after cleaning the Group Lock Manager (GLM), without downgrading the CFS state.
Under the false CFS state, the "fsclustadm(1M) frlpause_disable" command enters
and accesses the GLM lock, which "fsclustadm(1M) cfsdeinit" frees resulting in a
panic.

Another race condition exists between the code in vx_cfs_deinit() and the code
in fsck. Although fsck holds a reservation, this does not prevent
vx_cfs_deinit() from freeing vx_cvmres_list because there is no check for
vx_cfs_keepcount.

RESOLUTION:
The code is modified to add appropriate checks in the "fsclustadm(1M) 
cfsdeinit" and "fsclustadm(1M) frlpause_disable" to avoid the race-condition.

* 3422629 (Tracking ID: 3335272)

SYMPTOM:
The mkfs (make file system) command dumps core when the log size 
provided is not aligned. The following stack trace is displayed:

(gdb) bt
#0  find_space ()
#1  place_extents ()
#2  fill_fset ()
#3  main ()
(gdb)

DESCRIPTION:
While creating the VxFS file system using the mkfs command, if the
log size provided is not aligned properly, the placement of the RCQ extents
may be miscalculated so that no space is found for them. This leads to
illegal memory access of the AU bitmap and results in a core dump.

RESOLUTION:
The code is modified to place the RCQ extents in the same AU where 
log extents are allocated.

* 3422634 (Tracking ID: 3337806)

SYMPTOM:
On Linux kernels later than 3.0, running the find(1) command may cause the
kernel to panic in the link_path_walk() function with the following stack trace:

do_page_fault
page_fault
link_path_walk
path_lookupat
do_path_lookup
user_path_at_empty
vfs_fstatat
sys_newfstatat
system_call_fastpath

DESCRIPTION:
VxFS overloads a bit of the dentry flag at 0x1000 for internal
usage. Linux did not use this bit until kernel version 3.0 onwards. Therefore,
it is possible that both Linux and VxFS contend for this bit, which panics the
kernel.

RESOLUTION:
The code is modified not to use the 0x1000 bit in the dentry flag.

* 3422636 (Tracking ID: 3340286)

SYMPTOM:
The tunable setting of dalloc_enable gets reset to a default value
after a file system is resized.

DESCRIPTION:
The file system resize operation triggers the file system re-initialization
process. 
During this process, the tunable value of dalloc_enable gets reset to the
default value instead of retaining the old tunable value.

RESOLUTION:
The code is fixed such that the old tunable value of dalloc_enable is retained.

* 3422638 (Tracking ID: 3352059)

SYMPTOM:
Due to a memory leak, high memory usage occurs with vxfsrepld on the target when no jobs are running.

DESCRIPTION:
On the target side, high memory usage may occur even when
there are no jobs running because the memory allocated for some structures is not freed for every job iteration.

RESOLUTION:
The code is modified to resolve the memory leaks.

* 3422649 (Tracking ID: 3394803)

SYMPTOM:
The vxupgrade(1M) command causes VxFS to panic with the following stack trace:
panic_save_regs_switchstack()
panic
bad_kern_reference()
$cold_pfault()
vm_hndlr()
bubbleup()
vx_fs_upgrade()
vx_upgrade()
$cold_vx_aioctl_common()
vx_aioctl()
vx_ioctl()
vno_ioctl()
ioctl()
syscall()

DESCRIPTION:
The panic is caused by dereferencing a NULL device (one
of the devices in the DEVLIST is showing as a NULL device).

RESOLUTION:
The code is modified to skip the NULL devices when the devices in the DEVLIST
are processed.

* 3422657 (Tracking ID: 3412667)

SYMPTOM:
On RHEL 6, the inode update operation may create a deep stack and cause a system panic due to stack overflow. The stack trace is as follows:
dequeue_entity()
dequeue_task_fair()
dequeue_task()
deactivate_task()
thread_return()
io_schedule()
get_request_wait()
blk_queue_bio()
generic_make_request()
submit_bio()
vx_dev_strategy()
vx_bc_bwrite()
vx_bc_do_bawrite()
vx_bc_bawrite()
vx_bwrite()
vx_async_iupdat()
vx_iupdat_local()
vx_iupdat_clustblks()
vx_iupdat_local()
vx_iupdat()
vx_iupdat_tran()
vx_tflush_inode()
vx_fsq_flush()
vx_tranflush()
vx_traninit()
vx_get_alloc()
vx_tran_get_alloc()
vx_alloc_getpage()
vx_do_getpage()
vx_internal_alloc()
vx_write_alloc()
vx_write1()
vx_write_common_slow()
vx_write_common()
vx_vop_write()
vx_writev()
vx_naio_write_v2()
do_sync_readv_writev()
do_readv_writev()
vfs_writev()
nfsd_vfs_write()
nfsd_write()
nfsd3_proc_write()
nfsd_dispatch()
svc_process_common()
svc_process()
nfsd()
kthread()
kernel_thread()

DESCRIPTION:
Some VxFS operations may need an inode update. This may create a very deep stack and cause a system panic due to stack overflow.

RESOLUTION:
The code is modified to add a handoff point in the inode update function. If the stack usage reaches a threshold, it will start a separate thread to do the work to limit stack usage.
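
The handoff idea can be sketched as follows. This is an illustration only: the depth argument is a stand-in for the current stack usage, and the threshold value, the function name run_with_handoff, and the use of a plain thread are assumptions. The actual threshold and mechanism used by VxFS are not described in this document.

```python
import threading

STACK_THRESHOLD = 3  # hypothetical limit standing in for stack usage


def run_with_handoff(work, depth):
    """Run work() inline, or in a separate thread when depth is too high.

    depth is a stand-in for the current stack usage; the real threshold
    and measurement used by VxFS are not described in this document.
    """
    if depth < STACK_THRESHOLD:
        return work()
    result = []
    worker = threading.Thread(target=lambda: result.append(work()))
    worker.start()
    worker.join()  # the caller waits, but the work runs on a fresh stack
    return result[0]


# Below the threshold the work runs in the calling thread; above it,
# the work is handed off to a different thread.
caller = threading.get_ident()
print(run_with_handoff(threading.get_ident, 0) == caller)
print(run_with_handoff(threading.get_ident, 5) == caller)
```

The caller still waits for the result, so the behavior seen by the caller does not change. Only the stack on which the deep call chain runs is different.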

* 3430467 (Tracking ID: 3430461)

SYMPTOM:
Nested unmounts as well as force unmounts fail if the parent file system is disabled, which further inhibits the unmounting of the child file system.

DESCRIPTION:
If a file system is mounted inside another VxFS mount, and the parent file system gets disabled, then it is not possible to cleanly unmount the child even with a force unmount. This issue is observed because a disabled file system does not allow directory lookups on it. On Linux, a file system can be unmounted only by providing the path of the mount point.

RESOLUTION:
The code is modified to allow path lookups for unmounts as an exception. These are read-only operations and hence are safer. This makes it possible for the unmount of the child file system to proceed.

* 3436431 (Tracking ID: 3434811)

SYMPTOM:
In VxFS 6.1, the vxfsconvert(1M) command hangs within the vxfsl3_getext()
function with the following stack trace:

search_type()
bmap_typ()
vxfsl3_typext()
vxfsl3_getext()
ext_convert()
fset_convert()
convert()

DESCRIPTION:
There is a type casting problem for extent size. It may cause a non-zero value to overflow and turn into zero by mistake. This further leads to infinite looping inside the function.

RESOLUTION:
The code is modified to remove the intermediate variable and avoid type casting.
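
A minimal sketch of how a narrowing cast can turn a non-zero size into zero and stall a loop. The 32-bit width, the function names, and the step limit are assumptions made for the illustration. This document only states that a type cast on the extent size can overflow to zero.

```python
def truncate_u32(value):
    """Emulate a cast to an unsigned 32-bit integer (assumed width)."""
    return value & 0xFFFFFFFF


def walk_extents(total, extent_size, max_steps=8):
    """Advance through `total` units in steps of the truncated extent size.

    Returns the number of steps taken, or None when the walk makes no
    progress within max_steps (the loop would otherwise never finish).
    """
    step = truncate_u32(extent_size)
    pos = 0
    for steps in range(1, max_steps + 1):
        pos += step
        if pos >= total:
            return steps
    return None


print(walk_extents(1000, 400))        # the size survives the cast
print(walk_extents(2**33, 2**32))     # the size is truncated to zero
```

Keeping the value in its original wider type, as the resolution describes, avoids the zero step.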

* 3436433 (Tracking ID: 3349651)

SYMPTOM:
Veritas File System (VxFS) modules fail to load on RHEL6.5 and display the following error message:
kernel: vxfs: disagrees about version of symbol putname
kernel: vxfs: disagrees about version of symbol getname

DESCRIPTION:
In RHEL6.5, the kernel interfaces for getname() and putname() functions used by VxFS have changed.

RESOLUTION:
The code is modified to use the latest kernel interface definitions for the getname() and putname() functions.

* 3494534 (Tracking ID: 3402618)

SYMPTOM:
The mmap read performance on VxFS is slow.

DESCRIPTION:
The mmap read performance on VxFS is not good, because the read ahead operation is not triggered while the mmap reads are executed.

RESOLUTION:
An enhancement has been made to the read ahead operation. It helps improve the mmap read performance.

* 3502847 (Tracking ID: 3471245)

SYMPTOM:
MongoDB fails to insert any record because lseek fails to seek to the EOF.

DESCRIPTION:
Fallocate does not update the inode's i_size on Linux, which makes lseek unable to seek to the EOF.

RESOLUTION:
Before returning from the vx_fallocate() function, call the vx_getattr() function to update the Linux inode with the VxFS inode.
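
The behavior that an application such as MongoDB relies on can be checked with a short user-space sketch. The function name is hypothetical, and the sketch uses a temporary file on whatever file system holds the default temporary directory; it does not exercise VxFS unless that directory is on VxFS. It assumes a Linux or other POSIX system where os.posix_fallocate is available.

```python
import os
import tempfile


def eof_after_fallocate(size):
    """Pre-allocate `size` bytes in a new file and return the SEEK_END offset."""
    fd, path = tempfile.mkstemp()
    try:
        os.posix_fallocate(fd, 0, size)
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)
        os.unlink(path)


print(eof_after_fallocate(1 << 20))
```

On a file system that updates the file size after the pre-allocation, the offset returned by lseek equals the pre-allocated size.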

* 3504362 (Tracking ID: 3472551)

SYMPTOM:
The attribute validation (pass 1d) of full fsck takes too much time to complete.

DESCRIPTION:
The current implementation of full fsck Pass 1d (attribute inode validation) is single threaded. This causes slow full fsck performance on large file systems, especially the ones that have a large number of attribute inodes.

RESOLUTION:
Pass 1d is modified to work in parallel using multiple threads, which enables full fsck to process the attribute inode validation faster.

* 3506487 (Tracking ID: 3506485)

SYMPTOM:
The system does not allow write-back caching with VVR.

DESCRIPTION:
If the volume or vset is a part of an RVG (Replicated Volume Group) on which the file system is mounted with the write-back feature, then the mount operation should succeed without enabling the write-back feature to maintain write order fidelity.
Similarly, if the write-back feature is enabled on the file system, then an attempt to add that volume or vset to RVG should fail.

RESOLUTION:
The code is modified to add the required limitation.

* 3512292 (Tracking ID: 3348520)

SYMPTOM:
In a Cluster File System (CFS) cluster that has a multi-volume file system of a small size, execution of the fsadm command causes a system hang if the free space in the file system is low. The following stack traces are displayed:
 
vx_svar_sleep_unlock()
vx_extentalloc_device() 
vx_extentalloc()
vx_reorg_emap()
vx_extmap_reorg()
vx_reorg()
vx_aioctl_full()
vx_aioctl_common()
vx_aioctl()
vx_unlocked_ioctl()
vfs_ioctl()
do_vfs_ioctl()
sys_ioctl()
tracesys()

And 

vxg_svar_sleep_unlock() 
vxg_grant_sleep()
vxg_api_lock()
vx_glm_lock()
vx_cbuf_lock()
vx_getblk_clust()
vx_getblk_cmn()
vx_getblk()
vx_getmap()
vx_getemap()
vx_extfind()
vx_searchau_downlevel() 
vx_searchau_uplevel()
vx_searchau()
vx_extentalloc_device() 
vx_extentalloc()
vx_reorg_emap()
vx_extmap_reorg()
vx_reorg()
vx_aioctl_full()
vx_aioctl_common()
vx_aioctl()
vx_unlocked_ioctl()
vfs_ioctl()
do_vfs_ioctl()
sys_ioctl()
tracesys()

DESCRIPTION:
While performing the fsadm operation, the secondary node in the CFS cluster is unable to allocate space from the EAU (Extent Allocation Unit) delegation given by the primary node. It requests another delegation from the primary node.
While giving such delegations, the primary node does not verify whether the EAU has exclusion zones set on it. It only verifies whether the EAU has enough free space.
On the secondary node, the extent allocation cannot be done from an EAU that has an exclusion zone set, resulting in a loop.

RESOLUTION:
The code is modified such that the primary node does not delegate to the secondary node an EAU that has an exclusion zone set on it.
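
The selection rule described in this resolution can be sketched as follows. The Eau class, its fields, and the pick_delegation function are hypothetical names introduced for the illustration and do not reflect the VxFS data structures.

```python
from dataclasses import dataclass


@dataclass
class Eau:
    """Hypothetical view of an Extent Allocation Unit on the primary node."""
    number: int
    free_blocks: int
    has_exclusion_zone: bool


def pick_delegation(eaus, needed_blocks):
    """Return the first EAU that can satisfy the request, or None.

    An EAU qualifies only if it has enough free space and no exclusion
    zone; checking free space alone is the behavior that led to the loop.
    """
    for eau in eaus:
        if eau.free_blocks >= needed_blocks and not eau.has_exclusion_zone:
            return eau
    return None


eaus = [Eau(0, 500, True), Eau(1, 50, False), Eau(2, 300, False)]
print(pick_delegation(eaus, 100))
```

In this sample, EAU 0 has the most free space but is skipped because of its exclusion zone, so the secondary node is not handed a delegation it cannot allocate from.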

* 3518943 (Tracking ID: 3534779)

SYMPTOM:
Internal stress testing on Cluster File System (CFS) hits a debug assert.

DESCRIPTION:
The assert was hit while refreshing the incore reference count
queue (rcq) values from the disk in response to a loadfs message. The refresh
races with an rcq processing thread that has already advanced the incore
rcq indexes on a primary node in CFS.

RESOLUTION:
The code is modified to avoid selective updates in incore rcq.

* 3519809 (Tracking ID: 3463464)

SYMPTOM:
Internal kernel functionality conformance test hits a kernel panic due to null pointer dereference.

DESCRIPTION:
In the vx_fsadm_query() function, the error handling code path incorrectly sets the nodeid to "null" in the file system structure. As a result of clearing the nodeid, any subsequent access to this field results in the kernel panic.

RESOLUTION:
The code is modified to improve the error handling code path.

* 3522003 (Tracking ID: 3523316)

SYMPTOM:
The writeback cache feature does not work for write size of 2MB.

DESCRIPTION:
In the vx_wb_possible() function, the condition that checks whether the write size is compatible with write-back caching excludes write requests of 2 MB from caching.

RESOLUTION:
The code is modified such that the conditions for checking compatibility of the write size in the vx_wb_possible() function allow write requests of 2 MB to be cached.
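
One common way for a size check to exclude a boundary value is a strict comparison. The sketch below assumes that 2 MB is the upper limit of the check and that the faulty condition used a strict comparison. Both are assumptions made for the illustration; the actual condition in vx_wb_possible() is not shown in this document, and the function names below are hypothetical.

```python
MAX_WB_WRITE = 2 * 1024 * 1024  # 2 MB, the write size named in this entry


def wb_possible_exclusive(size):
    """Assumed faulty form: a strict comparison rejects exactly 2 MB."""
    return 0 < size < MAX_WB_WRITE


def wb_possible_inclusive(size):
    """Assumed corrected form: 2 MB itself is accepted."""
    return 0 < size <= MAX_WB_WRITE


print(wb_possible_exclusive(MAX_WB_WRITE), wb_possible_inclusive(MAX_WB_WRITE))
```

Both forms agree for every size other than exactly 2 MB, which matches the symptom that only writes of that size were affected.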

* 3528770 (Tracking ID: 3449152)

SYMPTOM:
The vxtunefs(1M) command fails to set the thin_friendly_alloc tunable in CFS.

DESCRIPTION:
The thin_friendly_alloc tunable is not supported on CFS. But when the vxtunefs(1M) command is used to set it in CFS, a false success message is displayed.

RESOLUTION:
The code is modified to report an error on an attempt to set the thin_friendly_alloc tunable in CFS.

* 3529852 (Tracking ID: 3463717)

SYMPTOM:
CFS does not support the 'thin_friendly_alloc' tunable, and the vxtunefs(1M) command man page is not updated with this information.

DESCRIPTION:
Since the man page does not explicitly mention that the 'thin_friendly_alloc' tunable is not supported, it is assumed that CFS supports this feature.

RESOLUTION:
The man page for the vxtunefs(1M) command is updated to state that CFS does not support the 'thin_friendly_alloc' tunable.

* 3530038 (Tracking ID: 3417321)

SYMPTOM:
The vxtunefs(1M) man page gives an incorrect

DESCRIPTION:
According to the current design, the tunable "delicache_enable" is
enabled by default both in case of local mount and cluster mount. But, the man
page is not updated accordingly. It still specifies that this tunable is enabled
by default only in case of a local mount. The man page needs to be updated to
correct the

RESOLUTION:
The man page of the vxtunefs(1M) command is updated to
display the correct contents for the "delicache_enable" tunable. Additional
information is provided about the performance benefits, which in case of CFS
are limited as compared to the local mount.
Also, in case of CFS, unlike the other CFS tunable parameters, there is a need
to explicitly turn this tunable on or off on each node.

* 3541125 (Tracking ID: 3541083)

SYMPTOM:
The vxupgrade(1M) command for layout version 10 creates 64-bit quota
files with inappropriate permission configurations.

DESCRIPTION:
Layout version 10 supports 64-bit quota feature. Thus, while
upgrading to version 10, 32-bit external quota files are converted to 64-bit.
During this conversion process, 64-bit files are created without specifying any
permission. Hence, random permissions are assigned to the 64-bit file, which
creates an impression that the conversion process was not successful as expected.

RESOLUTION:
The code is modified such that appropriate permissions are provided
while creating 64-bit quota files.

Patch ID: 6.1.0.200

* 3424575 (Tracking ID: 3349651)

SYMPTOM:
Veritas File System (VxFS) modules fail to load on RHEL6.5 and display the following error message:
kernel: vxfs: disagrees about version of symbol putname
kernel: vxfs: disagrees about version of symbol getname

DESCRIPTION:
In RHEL6.5, the kernel interfaces for getname() and putname() functions used by VxFS have changed.

RESOLUTION:
The code is modified to use the latest kernel interface definitions for the getname() and putname() functions.

Patch ID: 6.1.0.100

* 3418489 (Tracking ID: 3370720)

SYMPTOM:
I/Os pause periodically, which results in performance degradation. No explicit error is seen.

DESCRIPTION:
To avoid the kernel stack overflow, the work which consumes a large amount of stack is not done in the context of the original thread. Instead, such work items are added to a high priority work queue to be processed by a set of worker threads. If all the worker threads are busy, then there is an issue wherein the processing of the newly added work items in the work queue is subjected to an additional delay which in turn results in periodic stalls.

RESOLUTION:
The code is modified such that the high priority work items are processed by a set of dedicated worker threads. These dedicated threads do not have an issue when all the threads are busy and hence do not trigger periodic stalls.


INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
To install the patch, perform the following steps on at least one node in the cluster:
1. Copy the patch fs-rhel5_x86_64-Patch-6.1.1.400.tar.gz to /tmp
2. Untar fs-rhel5_x86_64-Patch-6.1.1.400.tar.gz to /tmp/hf
    # mkdir /tmp/hf
    # cd /tmp/hf
    # gunzip /tmp/fs-rhel5_x86_64-Patch-6.1.1.400.tar.gz
    # tar xf /tmp/fs-rhel5_x86_64-Patch-6.1.1.400.tar
3. Install the hotfix
    # pwd
    /tmp/hf
    # ./installVRTSvxfs611P400 [<host1> <host2>...]

You can also install this patch together with the 6.1.1 maintenance release using Install Bundles:
1. Download this patch and extract it to a directory
2. Change to the Veritas InfoScale 6.1.1 directory and invoke the installmr script
   with -patch_path option where -patch_path should point to the patch directory
    # ./installmr -patch_path [<path to this patch>] [<host1> <host2>...]

Install the patch manually:
--------------------------
# rpm -Uvh VRTSvxfs-6.1.1.400-RHEL5.x86_64.rpm


REMOVING THE PATCH
------------------
# rpm -e rpm_name


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE