File System on HP-UX, patch detail

fs-hpux1131-6.0.5.100 (Obsolete). The latest patch: fs-hpux1131-Patch-6.0.5.500
Basic information
Release type:	Patch
Release date:	2014-09-11
OS update support:	None
Technote:	None
Documentation:	None
Download size:	76.08 MB
Checksum:	3033086185
Applies to one or more of the following products:
Storage Foundation 6.0.1 On HP-UX 11i v3 (11.31)
Storage Foundation Cluster File System 6.0.1 On HP-UX 11i v3 (11.31)
Storage Foundation for Oracle RAC 6.0.1 On HP-UX 11i v3 (11.31)
Storage Foundation HA 6.0.1 On HP-UX 11i v3 (11.31)
Obsolete patches, incompatibilities, superseded patches, or other requirements:
This patch is obsolete. It is superseded by:	Release date
fs-hpux1131-Patch-6.0.5.500	2017-01-24
fs-hpux1131-Patch-6.0.5.400 (obsolete)	2016-03-03
This patch requires:	Release date
sfha-hpux1131-6.0.5	2014-04-15
Fixes the following incidents:
2705336, 2912412, 2923805, 2933290, 2933291, 2933292, 2933294, 2933295, 2933296, 2933301, 2933309, 2933313, 2933337, 2933571, 2933751, 2933822, 2937367, 2947029, 2959557, 2976664, 2978227, 2978234, 2978236, 2982161, 2984589, 2987373, 2999566, 2999582, 3027250, 3056103, 3059000, 3108176, 3131798, 3131801, 3131806, 3131826, 3131889, 3131896, 3131924, 3131955, 3248029, 3248042, 3248046, 3248051, 3248054, 3248089, 3248090, 3248094, 3248096, 3248099, 3284764, 3296988, 3299685, 3306410, 3306442, 3310758, 3321730, 3323912, 3338024, 3338026, 3338030, 3338057, 3338060, 3338063, 3338749, 3338755, 3338762, 3338766, 3338768, 3338774, 3338776, 3338779, 3338780, 3338787, 3338790, 3339230, 3339232, 3339884, 3340029, 3351946, 3351947, 3359278, 3364285, 3364289, 3364302, 3364307, 3364317, 3364333, 3364335, 3364338, 3364349, 3370650, 3372909, 3380905, 3387358, 3396539, 3402484, 3405172, 3409617, 3430687, 3436393, 3469683, 3498976, 3498978, 3498981, 3498983, 3498998, 3499005, 3499008, 3499011, 3499030, 3514824, 3515559, 3517702, 3579957, 3581566, 3584297, 3590573, 3597560
Patch ID:
PVCO_04036
PVKL_04037
Readme file
                          * * * READ ME * * *
                 * * * Veritas File System 6.0.5 * * *
                      * * * Patch 6.0.5.100 * * *
                         Patch Date: 2014-09-05


This document provides the following information:

   * PATCH NAME
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * SUMMARY OF INCIDENTS FIXED BY THE PATCH
   * DETAILS OF INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas File System 6.0.5 Patch 6.0.5.100


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
HP-UX 11i v3 (11.31)


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas Storage Foundation 6.0.1
   * Veritas Storage Foundation Cluster File System HA 6.0.1
   * Veritas Storage Foundation for Oracle RAC 6.0.1
   * Veritas Storage Foundation HA 6.0.1


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: PVKL_04037, PVCO_04036
* 3469683 (3469681) File system is disabled while free space defragmentation is going on.
* 3498976 (3434811) The vxfsconvert(1M) in VxFS 6.1 hangs.
* 3498978 (3424564) fsppadm fails with ENODEV and "file is encrypted or is not a 
database" errors
* 3498981 (3433777) A single CPU machine panics due to the safety-timer check when the inodes are re-tuned.
* 3498983 (3410532) The VxFS file system hangs due to a self-deadlock.
* 3498998 (3466020) File System is corrupted with error message "vx_direrr: vx_dexh_keycheck_1"
* 3499005 (3469644) System panics in the vx_logbuf_clean() function.
* 3499008 (3484336) The fidtovp() system call can panic in the vx_itryhold_locked () function.
* 3499011 (3486726) VFR logs too much data on the target node.
* 3499030 (3484353) The file system may hang with a partitioned directory feature enabled.
* 3514824 (3443430) Fsck allocates too much memory.
* 3515559 (3498048) While the system is making a backup, the "ls -l" command on the same file system may hang.
* 3517702 (3517699) Return code 240 for command fsfreeze(1M) is not documented in man page for fsfreeze.
* 3579957 (3233315) "fsck" utility dumps core, with full scan.
* 3581566 (3560968) The delicache_enable tunable is not persistent in the Cluster File System (CFS) environment.
* 3584297 (3583930) While external quota file is restored or over-written, old quota records are preserved.
* 3590573 (3331010) Command fsck(1M) dumped core with segmentation fault
* 3597560 (3597482) The pwrite(2) function fails with the EOPNOTSUPP error.
Patch ID: PVKL_04003
* 2705336 (2059611) The system panics due to a NULL pointer dereference while
flushing bitmaps to the disk.
* 2933290 (2756779) The code is modified to improve the fix for the read and write performance
concerns on Cluster File System (CFS) when it runs applications that rely on
the POSIX file-record using the fcntl lock.
* 2933301 (2908391) It takes a long time to remove checkpoints from the VxFS file system, when there
are a large number of files present.
* 2947029 (2926684) In rare cases, the system may panic while performing a logged write.
* 2959557 (2834192) You are unable to mount the file system after the full fsck(1M) utility is run.
* 2978234 (2972183) The fsppadm(1M) enforce command takes a long time on the secondary nodes
compared to the primary nodes.
* 2978236 (2977828) The file system is marked bad after an inode table overflow
error.
* 2982161 (2982157) During internal testing, the "f:vx_trancommit:4" debug assert was hit when the available transaction space was less than required.
* 2999566 (2999560) The 'fsvoladm'(1M) command fails to clear the 'metadataok' flag on a volume.
* 3027250 (3031901) The 'vxtunefs(1M)' command accepts the garbage value for the 'max_buf_dat_size' tunable.
* 3056103 (3197901) Prevent duplicate symbols in VxFS libvxfspriv.a and 
vxfspriv.so
* 3059000 (3046983) Invalid CFS node number in ".__fsppadm_fclextract", causes the DST policy 
enforcement failure.
* 3108176 (2667658) The 'fscdsconv endian' conversion operation fails because of a macro overflow.
* 3131798 (2839871) On a system with DELICACHE enabled, several file system operations may
hang.
* 3131801 (2850738) The system may hang in the low memory condition.
* 3131806 (2964018) On a high end machine with about 125 CPUs, the operations using the lstat64(2) 
function, may hang.
* 3131826 (2966277) Systems with high file system activity like read/write/open/lookup may panic 
the system.
* 3131889 (3010444) On a NFS filesystem cksum(1m) fails with the "cksum: read error on <filename>: 
Bad address" error.
* 3131896 (3031869) "vxfsstat -b" does not print correct information on maximum
buffer size.
* 3131924 (3049408) When the system is under the file-cache pressure, the find(1) command takes 
time to operate.
* 3131955 (3099638) The vxfs_ifree_timelag(5) tunable when tuned, displays incorrect minimum value.
* 3248029 (2439261) When the vx_fiostats_tunable value is changed from zero to
non-zero, the system panics.
* 3248042 (3072036) Read operations from secondary node in CFS can sometimes fail with the ENXIO 
error code.
* 3248046 (3092114) The information output displayed by the "df -i" command may be inaccurate for 
cluster mounted file systems.
* 3248051 (3121933) The pwrite(2) function fails with the EOPNOTSUPP error.
* 3248054 (3153919) The fsadm (1M) command may hang when the structural file set re-organization is 
in progress.
* 3248089 (3003679) When running the fsppadm(1M) command and removing a file with the named
stream attributes (nattr) at the same time, the file system does not respond.
* 3248090 (2963763) When the thin_friendly_alloc() and delicache_enable() functionality is enabled,
VxFS may enter a deadlock.
* 3248094 (3192985) Checkpoints quota usage on Cluster File System (CFS) can be negative.
* 3248096 (3214816) With the DELICACHE feature enabled, frequent creation and deletion of the inodes
of a user may result in corruption of the user quota file.
* 3248099 (3189562) Oracle daemons hang in the vx_growfile() kernel function.
* 3284764 (3042485) During internal stress testing, the f:vx_purge_nattr:1 assert fails.
* 3296988 (2977035) A debug assert issue was encountered in vx_dircompact() function while running an internal noise test in the Cluster File System (CFS) environment
* 3299685 (2999493) The file system check validation fails with an error message after a successful full fsck operation during internal testing.
* 3306410 (2495673) Mismatch of concurrent I/O related data in an inode is observed during communication between the nodes in a cluster.
* 3306442 (3312030) The default quota support on Veritas File System (VxFS) version 6.0.4 and later is changed to 32 bit.
* 3310758 (3310755) Internal testing hits a debug assert "vx_rcq_badrecord:9:corruptfs".
* 3321730 (3214328) A mismatch is observed between the states for the Global Lock Manager (GLM) grant level and the Global Lock Manager (GLM) data in a Cluster File System (CFS) inode.
* 3323912 (3259634) A Cluster File System (CFS) with blocks larger than 4GB may
become corrupt.
* 3338024 (3297840) A metadata corruption is found during the file removal process.
* 3338026 (3331419) System panic because of kernel stack overflow.
* 3338030 (3335272) The mkfs (make file system) command dumps core when the log 
size provided is not aligned.
* 3338057 (2970219) Panic in fcache_as_map+0x70 due to null v_vmdata.
* 3338060 (3228646) NFSv4 server panics in unlock path.
* 3338063 (3332902) While shutting down, the system running the fsclustadm(1M)
command panics.
* 3338749 (2444146) The Oracle Disk Manager read returns EINTR while running unspecified Oracle 
jobs.
* 3338755 (3066116) The system panics due to NULL pointer dereference at vx_worklist_process()
function.
* 3338762 (3096834) Intermittent vx_disable messages are displayed in the system log.
* 3338766 (3150368) vx_writesuper() function causes the system to panic in evfsevol_strategy().
* 3338768 (3157624) The fcntl() system call when used for file share reservations(F_SHARE command) 
can cause a memory leak in Cluster File System (CFS).
* 3338774 (3226462) On a cluster mounted file-system with unequal CPUs, a node may panic while 
doing a lookup operation.
* 3338776 (3224101) After you enable the optimization for updating the i_size across the cluster
nodes lazily, the system panics.
* 3338779 (3252983) On a high-end system greater than or equal to 48 CPUs, some file system operations may hang.
* 3338780 (3253210) File system hangs when it reaches the space limitation.
* 3338787 (3261462) File system with size greater than 16TB corrupts with vx_mapbad messages in the system log.
* 3338790 (3233284) FSCK binary hangs while checking Reference Count Table (RCT).
* 3339230 (3308673) A fragmented file system is disabled when delayed allocations
feature is enabled.
* 3339232 (2646933) VxFS takes long time to process the large sequential writes.
* 3339884 (1949445) System is unresponsive when files are created in a large directory.
* 3340029 (3298041) With the delayed allocation feature enabled on a locally 
mounted file system, observable performance degradation might be experienced 
when writing to a file and extending the file size.
* 3351946 (3194635) The internal stress test on a locally mounted file system exited with an error message.
* 3351947 (3164418) Internal stress test on a locally mounted VxFS file system results in data corruption in a no-space-on-device scenario while doing a split on a Zero Fill-On-Demand (ZFOD) extent.
* 3359278 (3364290) The kernel may panic in Veritas File System (VxFS) when it is
internally working on reference count queue (RCQ) record.
* 3364285 (3364282) The fsck(1M) command fails to correct the inode list file.
* 3364289 (3364287) Debug assert may be hit in the vx_real_unshare() function in the cluster environment.
* 3364302 (3364301) Assert failure because of improper handling of inode lock while truncating a reorg inode.
* 3364307 (3364306) Stack overflow seen in extent allocation code path.
* 3364317 (3364312) The fsadm(1M) command is unresponsive while processing the VX_FSADM_REORGLK_MSG message.
* 3364333 (3312897) System can hang when the Cluster File System (CFS) primary node is disabled.
* 3364335 (3331109) The full fsck does not repair the corrupted reference count queue (RCQ) record.
* 3364338 (3331045) Kernel Oops in unlock code of map while referring freed mlink due to a race with iodone routine for delayed writes.
* 3364349 (3359200) Internal test on Veritas File System (VxFS) fsdedup(1M) feature in cluster file system environment results in
a hang.
* 3370650 (2735912) The performance of tier relocation using the fsppadm(1M)
enforce command degrades while migrating a large number of files.
* 3372909 (3274592) Internal noise test on cluster file system is unresponsive while executing the fsadm(1M) command
* 3380905 (3291635) Internal testing found debug assert "vx_freeze_block_threads_all:7c" on locally mounted file systems while processing preambles for transactions.
* 3387358 (3349634) Assert failure if tried to write on file snapped allocated 
HOLE.
* 3396539 (3331093) Issue with MountAgent Process for vxfs. While doing repeated
switchover on HP-UX, MountAgent got stuck.
* 3402484 (3394803) A panic is observed in the VxFS vx_upgrade7() function
while running the vxupgrade(1M) command.
* 3405172 (3436699) An assert failure occurs because of a race condition between clone mount thread and directory removal thread while pushing data on clone.
* 3409617 (3369049) File system may hang with partitioned directory enabled (PD).
* 3430687 (3444775) Internal noise testing on cluster file system results in a kernel panic in function vx_fsadm_query() with an error message.
* 3436393 (3462694) The fsdedupadm(1M) command fails with error code 9 when it
tries to mount checkpoints on a cluster.
Patch ID: PVKL_03971
* 2933290 (2756779) The code is modified to improve the fix for the read and write performance
concerns on Cluster File System (CFS) when it runs applications that rely on
the POSIX file-record using the fcntl lock.
* 2933291 (2806466) A reclaim operation on a file system that is mounted on a
Logical Volume Manager (LVM) may panic the system.
* 2933292 (2895743) Accessing named attributes for some files stored in CFS seems to be slow.
* 2933294 (2750860) Performance of the write operation with small request size
may degrade on a large file system.
* 2933295 (2730759) The sequential read performance is poor because of the read-ahead issues.
* 2933296 (2923105) Removal of the VxFS module from the kernel takes a longer time.
* 2933309 (2858683) Reserve extent attributes changed after vxrestore, for files greater than 
8192bytes.
* 2933313 (2841059) full fsck fails to clear the corruption in attribute inode 15
* 2933337 (2616622) The performance of the mmap() function is slow when the file system block size is 8KB and the page size is 4KB.
* 2933571 (2417858) VxFS quotas do not support 64 bit limits.
* 2933751 (2916691) Customer experiencing hangs when doing dedups
* 2933822 (2624262) Filestore:Dedup:fsdedup.bin hit oops at vx_bc_do_brelse
* 2937367 (2923867) Internal test hits an assert "f:xted_set_msg_pri1:1".
* 2976664 (2906018) The vx_iread errors are displayed after successful log replay and mount of the 
file system.
* 2978227 (2857751) The internal testing hits the assert "f:vx_cbdnlc_enter:1a".
* 2984589 (2977697) A core dump is generated while you are removing the clone.
* 2987373 (2881211) File ACLs not preserved in checkpoints properly if file has hardlink.
* 2999582 (2850730) LM Conformance hit an assert "f:vx_do_getpage:6b, 3" and panics.
Patch ID: PVKL_03964
* 2912412 (2857629) File system corruption can occur requiring a full fsck of the 
system.
* 2923805 (2590918) Delay in freeing unshared extents upon primary switch over.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following Symantec incidents:

Patch ID: PVKL_04037, PVCO_04036

* 3469683 (Tracking ID: 3469681)

SYMPTOM:
Free space defragmentation results in an EBUSY error and the file system is disabled.

DESCRIPTION:
While remounting the file system, the re-initialization returns an EBUSY error if the in-core and on-disk version numbers of an inode do not match. When pushing data blocks to the clone, the inode version of the immediate clone inode is bumped. But if there is another clone in the chain, the ILIST extent of this immediate clone inode is not pushed onto that clone. This is incorrect because the inode has been modified.

RESOLUTION:
The code is modified so that the ILIST extents of the immediate clone inode are pushed onto the next clone in the chain.

* 3498976 (Tracking ID: 3434811)

SYMPTOM:
In VxFS 6.1, the vxfsconvert(1M) command hangs within the vxfsl3_getext()
function with the following stack trace:

search_type()
bmap_typ()
vxfsl3_typext()
vxfsl3_getext()
ext_convert()
fset_convert()
convert()

DESCRIPTION:
There is a type casting problem for extent size. It may cause a non-zero value to overflow and turn into zero by mistake. This further leads to infinite looping inside the function.

RESOLUTION:
The code is modified to remove the intermediate variable and avoid type casting.
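
NOTE: The following Python sketch is illustrative only and is not taken from the vxfsconvert source. It assumes the intermediate variable was an unsigned 32-bit type (the README does not state the actual widths), and the names truncate_u32 and count_extent_steps are hypothetical. It shows how a non-zero extent size can become zero after a narrowing cast, so that a loop advancing by that size makes no progress.

```python
U32_MASK = 0xFFFFFFFF  # assumed width of the narrower intermediate type


def truncate_u32(value: int) -> int:
    """Mimic casting a wider integer to an unsigned 32-bit type."""
    return value & U32_MASK


def count_extent_steps(total_blocks: int, extent_size: int, narrow_cast: bool):
    """Walk a file extent by extent.

    Returns the number of steps taken, or None when the step size has
    collapsed to zero (the point at which a real loop would spin forever).
    """
    offset = 0
    steps = 0
    while offset < total_blocks:
        # With narrow_cast, the size passes through the 32-bit intermediate.
        step = truncate_u32(extent_size) if narrow_cast else extent_size
        if step == 0:
            return None
        offset += step
        steps += 1
    return steps
```

With an extent size of 2**32, the narrowed value is 0 and the walk never advances. Using the full-width value directly, as the resolution describes, lets the walk complete.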

* 3498978 (Tracking ID: 3424564)

SYMPTOM:
fsppadm fails with ENODEV and "file is encrypted or is not a database" 
errors

DESCRIPTION:
The error handler for ENODEV was missing when only the directory inodes 
are processed. The second error occurred because the database got corrupted.

RESOLUTION:
An error handler is added to ignore ENODEV when only the directory 
inodes are processed. For the database corruption, a log message is added to 
capture all the database logs to help determine why the corruption happened.

* 3498981 (Tracking ID: 3433777)

SYMPTOM:
A single CPU machine panics due to the safety-timer check when the inodes are re-tuned.
The following stack trace is observed:
spinunlock()
vx_ilist_chunkclean()
vx_inode_free_list()
vx_retune_ninode()
vx_do_inode_kmcache_callback()
vx_worklist_thread ()
kthread_daemon_startup ( )

DESCRIPTION:
When the inode cache list is traversed, the vxfsd daemon schedules "vx_do_inode_kmcache_callback", which does not free the CPU between the iterations. As a result, the other threads cannot get access to the CPU. This results in the panic.

RESOLUTION:
The code is modified to use the sched_yield() function for every iteration in "vx_inode_free_list" to free the CPU, so that the other threads get a chance to be scheduled.

* 3498983 (Tracking ID: 3410532)

SYMPTOM:
If the Veritas File System (VxFS) file system is mounted with the "tranflush" mount option, it may hang. The following stack-trace is observed: 

swtch_to_thread()
slpq_swtch_core()
real_sleep()
sleep_one()
vx_rwsleep_lock()
vx_ilock()
vx_tflush_inode()
vx_fsq_flush()
vx_tranflush()
$cold_vx_tranidflush()
vx_exh_hashinit()
vx_dexh_create()
vx_dexh_init()
vx_pd_mkdir()
vx_mkdir1_pd()
vx_do_mkdir()
vx_mkdir1()
vx_mkdir()
vns_create()
vn_create()
mkdir()
syscall()

DESCRIPTION:
If the VxFS file system is mounted with the "tranflush" mount option, it may cause the thread to be holding the ILOCK and waiting for the same. This can lead to a self-deadlock situation which causes the file system to hang.

RESOLUTION:
The code is modified to avoid the self-deadlock situation.

* 3498998 (Tracking ID: 3466020)

SYMPTOM:
File System is corrupted with the following error message in the log:

WARNING: msgcnt 28 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren
 WARNING: msgcnt 27 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren
 WARNING: msgcnt 26 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren
 WARNING: msgcnt 25 mesg 096: V-2-96: vx_setfsflags -
 /dev/vx/dsk/a2fdc_cfs01/trace_lv01 file system fullfsck flag set - vx_direr
 WARNING: msgcnt 24 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren

DESCRIPTION:
If the vx_dirbread() function returns an error via the vx_dexh_keycheck1() function, the FULLFSCK flag is set on the file system unconditionally. A corrupted Large Directory Hash (LDH) can lead to the reading of the wrong block, which results in the setting of the FULLFSCK flag. The system doesn't verify whether it is reading the wrong value due to a corrupted LDH, so the FULLFSCK flag is set unnecessarily, even though a corrupted LDH can be fixed online by recreating the hash.

RESOLUTION:
The code is modified such that when a corruption of the LDH is detected, the system removes the Large Directory hash instead of setting FULLFSCK. The Large Directory Hash will then be recreated the next time the directory is modified.
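
NOTE: The following Python sketch is illustrative only and does not reflect the VxFS on-disk format or kernel code. The Directory class and its fields are hypothetical. It models the behavior described above: the hash is only an accelerator over the directory entries, so when a hash lookup points at the wrong slot, the hash is dropped (rather than flagging the file system for a full fsck), the lookup falls back to a scan, and the hash is recreated on the next modification.

```python
class Directory:
    """Toy directory: an entry list plus an optional lookup hash."""

    def __init__(self):
        self.entries = []        # authoritative list of names
        self.hash_index = None   # optional name -> slot accelerator
        self.fullfsck = False    # stays False in this sketch

    def _rebuild_hash(self):
        self.hash_index = {name: i for i, name in enumerate(self.entries)}

    def add(self, name):
        self.entries.append(name)
        # The hash is (re)created whenever the directory is modified.
        self._rebuild_hash()

    def lookup(self, name):
        if self.hash_index is not None:
            slot = self.hash_index.get(name)
            if slot is None:
                return None
            if 0 <= slot < len(self.entries) and self.entries[slot] == name:
                return slot
            # The hash disagrees with the entries: treat it as corrupt
            # and discard it instead of setting the full-fsck flag.
            self.hash_index = None
        # Fall back to a linear scan of the authoritative entries.
        for i, entry in enumerate(self.entries):
            if entry == name:
                return i
        return None
```

The design point is that the entries remain the source of truth, so discarding a damaged accelerator loses no data.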

* 3499005 (Tracking ID: 3469644)

SYMPTOM:
System panics in the vx_logbuf_clean() function while traversing chain of transactions off the intent log buffer. The stack trace is as follows:


vx_logbuf_clean ()
vx_logadd ()
vx_log()
vx_trancommit()
vx_exh_hashinit ()
vx_dexh_create ()
vx_dexh_init ()
vx_pd_rename ()
vx_rename1_pd()
vx_do_rename ()
vx_rename1 ()
vx_rename ()
vx_rename_skey ()

DESCRIPTION:
The system panics as the vx_logbuf_clean() function tries to access an already freed transaction from the transaction chain to flush it to the log.

RESOLUTION:
The code is modified to make sure that transaction gets flushed to the log before it is freed.

* 3499008 (Tracking ID: 3484336)

SYMPTOM:
The fidtovp() system call can panic in the vx_itryhold_locked() function with the following stack trace:

vx_itryhold_locked
vx_iget
vx_common_vget
vx_do_vget
vx_vget_skey
vfs_vget
fidtovp
kernel_add_gate_cstack
nfs3_fhtovp
rfs3_getattr
rfs_dispatch
svc_getreq
threadentry
[kdb_read_mem]

DESCRIPTION:
Some VxFS operations like the vx_vget() function try to get a hold on an in-core inode using the vx_itryhold_locked() function, but it doesn't take the lock on the corresponding directory inode. This may lead to a race condition when this inode is present on the delicache list and is inactivated. This results in a panic when the vx_itryhold_locked() function tries to remove it from a free list. This is a known issue, but the previous fix was not complete; it missed some functions which may also cause the race condition.

RESOLUTION:
The code is modified to take inode list lock inside the vx_inactive_tran(), vx_tranimdone() and vx_tranuninode() functions to avoid race condition.

* 3499011 (Tracking ID: 3486726)

SYMPTOM:
VFR logs too much data on the target node.

DESCRIPTION:
The target node logs debug-level messages even if the debug mode is off. It also doesn't consider the debug mode specified at the time of job creation.

RESOLUTION:
The code is modified to not log the debug level messages on the target node if the specified debug mode is set off.

* 3499030 (Tracking ID: 3484353)

SYMPTOM:
The file system may hang because of a self-deadlock caused by a missing unlock of DIRLOCK. A typical stack trace is as follows:
 
slpq_swtch_core()
real_sleep()
sleep_one()
vx_smp_lock()
vx_dirlock()
vx_do_rename()
vx_rename1()
vx_rename()
vn_rename()
rename()
syscall()

DESCRIPTION:
When a partitioned directory feature (PD) of Veritas File System (VxFS) is enabled, there is a possibility of self-deadlock when there are multiple renaming threads operating on the same target directory.
The issue is due to the fact that there is a missing unlock of DIRLOCK in the vx_int_rename() function.

RESOLUTION:
The code is modified by adding the missing unlock for the directory lock in the vx_int_rename() function.

* 3514824 (Tracking ID: 3443430)

SYMPTOM:
Fsck allocates too much memory.

DESCRIPTION:
Since Storage Foundation 6.0, parallel inode list processing with multiple threads is introduced to help reduce the fsck time. However, the parallel threads have to allocate redundant memory instead of reusing buffers in the buffer cache efficiently when inode list has many holes.

RESOLUTION:
The code is fixed so that each thread maintains its own buffer cache from which it can reuse free memory.
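
NOTE: The following Python sketch is illustrative only and is not taken from the fsck source. The WorkerBufferCache class and scan_chunk function are hypothetical. It shows the general idea of a private buffer cache: a worker that returns each buffer to its own free list can serve later requests from that list instead of allocating new memory.

```python
class WorkerBufferCache:
    """Private buffer cache owned by one worker thread."""

    def __init__(self, buf_size):
        self.buf_size = buf_size
        self.free = []          # buffers available for reuse
        self.allocations = 0    # number of fresh allocations made

    def acquire(self):
        if self.free:
            return self.free.pop()
        self.allocations += 1
        return bytearray(self.buf_size)

    def release(self, buf):
        self.free.append(buf)


def scan_chunk(cache, n_blocks):
    """Process n_blocks blocks one at a time; return fresh allocations made."""
    for _ in range(n_blocks):
        buf = cache.acquire()
        # ... a real checker would read and inspect a block here ...
        cache.release(buf)
    return cache.allocations
```

In this model each worker would own one WorkerBufferCache instance, so no locking is needed on the free list.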

* 3515559 (Tracking ID: 3498048)

SYMPTOM:
While the system is making a backup, the "ls -l" command on the same file system may hang.

DESCRIPTION:
When the delayed allocation (dalloc) feature is turned on, flushing takes a long time and holds the getpage lock for that duration. This lock is needed by writers, which keep the read-write lock held on inodes. The "ls -l" command needs access control lists (ACLs) to display information, but in Veritas File System (VxFS), ACLs are accessed only under the protection of the inode read-write lock, which results in the hang.

RESOLUTION:
The code is modified to turn dalloc off and improve write throttling by restricting the kernel flusher from updating the internal counter for write page flush.

* 3517702 (Tracking ID: 3517699)

SYMPTOM:
Return code 240 for command fsfreeze(1M) is not documented in man page for fsfreeze.

DESCRIPTION:
Return code 240 for command fsfreeze(1M) is not documented in man page for fsfreeze.

RESOLUTION:
The man page for fsfreeze(1M) is modified to document return code 240.

* 3579957 (Tracking ID: 3233315)

SYMPTOM:
"fsck" utility dumps core, while checking the RCT file.

DESCRIPTION:
"fsck" utility dumps core, while checking the RCT file. "bmap_search_typed()"
function is passed with wrong parameter, and results in the core dump with the
following stack trace:

bmap_get_typeparms ()
bmap_search_typed_raw()
bmap_search_typed()
rct_walk()
bmap_check_typed_raw()
rct_check()
main()

RESOLUTION:
Fixed the code to pass the correct parameters to "bmap_search_typed()" function.

* 3581566 (Tracking ID: 3560968)

SYMPTOM:
The delicache_enable tunable is inconsistent in the CFS environment.

DESCRIPTION:
On the secondary nodes, the tunable values are exported from the primary mount, while the delicache_enable tunable value comes from the "tunefstab" file. Therefore the tunable values are not persistent.

RESOLUTION:
The code is fixed to read the "tunefstab" file only for the delicache_enable tunable during mount and set the value accordingly.

* 3584297 (Tracking ID: 3583930)

SYMPTOM:
When the external quota file is overwritten or restored from backup, new settings which were added after the backup still remain.

DESCRIPTION:
The purpose of the quotaon operation is to copy the quota limits from the external to the internal quota file, because the internal quota file is not always updated with the correct limits. To complete the copy operation, the extent of the external file is compared to the extent of the internal file at the corresponding offset.
If the external quota file is overwritten (or restored to its original copy) and the size of the internal file is more than that of the external file, the quotaon operation does not clear the additional (stale) quota records in the internal file. Later, the sync operation (part of quotaon) copies these stale records from the internal to the external file. Hence, both the internal and external files contain stale records.

RESOLUTION:
The code is modified to get rid of the stale records in the internal file at the time of quotaon.
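
NOTE: The following Python sketch is illustrative only. It models each quota file as a simple list of per-ID limit records, which is an assumption; the real quota files are extent-based. The quotaon function and its clear_stale parameter are hypothetical. It shows why records beyond the end of the external file survive the copy-then-sync sequence unless they are explicitly cleared.

```python
def quotaon(external, internal, clear_stale):
    """Model of the copy-then-sync sequence.

    external, internal: lists of limit records indexed by ID.
    clear_stale: when True, drop internal records that have no
    counterpart in the external file (the behavior after the fix).
    """
    # Copy limits from the external file to the internal file
    # at the corresponding offsets.
    for idx, record in enumerate(external):
        if idx < len(internal):
            internal[idx] = record
        else:
            internal.append(record)

    if clear_stale:
        # Remove records past the end of the external file.
        del internal[len(external):]

    # Sync step: the internal records are written back to the external file.
    external[:] = internal
    return external
```

For example, restoring a two-record external file while the internal file still holds three records leaves three records in both files without the cleanup, and two records with it.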

* 3590573 (Tracking ID: 3331010)

SYMPTOM:
Command fsck(1M) dumped core with segmentation fault.
Following stack is observed.

fakebmap()
rcq_apply_op()
rct_process_pending_tasklist()
process_device()
main()

DESCRIPTION:
While working on the device in function process_device(), command fsck tries to
access already freed device related structures available in pending task list
during retry code path.

RESOLUTION:
Code is modified to free up the pending task list before retrying in function
process_device().

* 3597560 (Tracking ID: 3597482)

SYMPTOM:
The pwrite(2) function fails with EOPNOTSUPP when the write range is in two indirect extents.

DESCRIPTION:
When the range of pwrite() falls in two indirect extents, one ZFOD extent belonging to DB2 pre-allocated files created with setext( , VX_GROWFILE, ) ioctl and another DATA extent belonging to adjacent INDIR, write fails with EOPNOTSUPP. 
The reason is that Veritas File System (VxFS) tries to coalesce extents which belong to different indirect address extents as part of this transaction. Such a metadata change consumes more transaction resources than the VxFS transaction engine can support in the current implementation.

RESOLUTION:
The code is modified to retry the write transaction without combining the extents.

Patch ID: PVKL_04003

* 2705336 (Tracking ID: 2059611)

SYMPTOM:
The system panics due to a NULL pointer dereference while flushing the
bitmaps to the disk and the following stack trace is displayed:
...
vx_unlockmap+0x10c
vx_tflush_map+0x51c
vx_fsq_flush+0x504
vx_fsflush_fsq+0x190
vx_workitem_process+0x1c
vx_worklist_process+0x2b0
vx_worklist_thread+0x78

DESCRIPTION:
The vx_unlockmap() function unlocks a map structure of the file
system. If the map is being used, the hold count is incremented. The
vx_unlockmap() function attempts to check whether this is an empty mlink doubly
linked list. The asynchronous vx_mapiodone routine can change the link at random
even though the hold count is zero.

RESOLUTION:
The code is modified to change the evaluation rule inside the
vx_unlockmap() function, so that further evaluation can be skipped over when map
hold count is zero.

* 2933290 (Tracking ID: 2756779)

SYMPTOM:
Write and read performance concerns on Cluster File System (CFS) when running 
applications that rely on POSIX file-record locking (fcntl).

DESCRIPTION:
The usage of fcntl on CFS leads to high messaging traffic across nodes thereby 
reducing the performance of readers and writers.

RESOLUTION:
The code is modified to cache the ranges that are being file-record locked on 
the node. This is tried whenever possible to avoid broadcasting of messages 
across the nodes in the cluster.
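
NOTE: The following Python sketch is illustrative only and is not the CFS locking protocol. The NodeLockCache class is hypothetical, and the sketch does not model invalidation when another node requests a conflicting range. It shows the basic idea of caching locked ranges on a node: a request that falls entirely inside a range the node already holds can be served locally, without a cluster-wide message.

```python
class NodeLockCache:
    """Per-node cache of byte ranges already locked through the cluster."""

    def __init__(self):
        self.ranges = []      # list of (start, end) half-open ranges
        self.broadcasts = 0   # cluster messages this node had to send

    def lock(self, start, end):
        """Lock [start, end). Returns True if a broadcast was needed."""
        for held_start, held_end in self.ranges:
            if held_start <= start and end <= held_end:
                # Fully covered by a cached range: serve locally.
                return False
        # Not covered: ask the cluster, then remember the range.
        self.broadcasts += 1
        self.ranges.append((start, end))
        return True
```

A request that only partially overlaps a cached range still goes to the cluster in this model, which keeps the sketch conservative.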

* 2933301 (Tracking ID: 2908391)

SYMPTOM:
Checkpoint removal takes too long if Veritas File System (VxFS) has a large 
number of files. The cfsumount(1M) command could hang if removal of multiple 
checkpoints is in progress for such a file system.

DESCRIPTION:
When removing a checkpoint, VxFS traverses every inode to determine if 
pull/push is needed for upstream/downstream checkpoint in its chain. This is 
time consuming if the file system has large number of files. This results in 
the slow checkpoint removal.

The "cfsumount -c fsname" command forces the umount operation on a VxFS file 
system if an asynchronous checkpoint removal job is in progress. It detects 
this by checking whether the value of the VxFS stat "vxi_clonerm_jobs" is 
larger than zero. However, the stat does not count the jobs that have been 
entered into the checkpoint removal working queue. As a result, the 
"force umount" operation does not happen even if there are pending checkpoint 
removal jobs, because the value of "vxi_clonerm_jobs" is incorrectly zero.

RESOLUTION:
For slow checkpoint removal issue: 
Code is modified to create multiple threads to work on different Inode 
Allocation Units (IAUs) in parallel, and to reduce the inode push work by 
sorting the checkpoint removal jobs by the creation time in ascending order and 
enlarging the checkpoint push size.

For the cfsumount(1M) command hang issue: 
Code is modified to add the counts of jobs in the working queue in 
the "vxi_clonerm_jobs" stat.
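
NOTE: The following Python sketch is illustrative only and is not taken from the VxFS or cfsumount source. The CheckpointRemoval class, the count_queued parameter, and the should_force_umount function are hypothetical. It models the accounting problem described above: if the job counter reflects only running jobs, a job that is still waiting in the working queue is invisible to the check that decides whether to force the umount.

```python
from collections import deque


class CheckpointRemoval:
    """Toy model of asynchronous checkpoint removal bookkeeping."""

    def __init__(self):
        self.queue = deque()  # jobs waiting in the working queue
        self.running = 0      # jobs currently being processed

    def enqueue(self, job):
        self.queue.append(job)

    def start_next(self):
        job = self.queue.popleft()
        self.running += 1
        return job

    def clonerm_jobs(self, count_queued):
        """Value of the job counter, with or without queued jobs."""
        queued = len(self.queue) if count_queued else 0
        return self.running + queued


def should_force_umount(removal, count_queued):
    # The umount is forced only when the counter is larger than zero.
    return removal.clonerm_jobs(count_queued) > 0
```

With one job queued and none running, the check fails when queued jobs are not counted and succeeds when they are.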

* 2947029 (Tracking ID: 2926684)

SYMPTOM:
On systems with a heavy transaction workload, such as creation and deletion 
of files, the system may panic with the following stack trace:
...
vxfs:vx_traninit+0x10
vxfs:vx_dircreate_tran+0x420
vxfs:vx_pd_create+0x980
vxfs:vx_create1_pd+0x1d0
vxfs:vx_do_create+0x80
vxfs:vx_create1+0xd4
vxfs:vx_create+0x158
...

DESCRIPTION:
In case of a delayed log, a transaction commit can complete before completing 
the log write. The memory for transaction is freed before logging the 
transaction and corrupts the transaction freelist causing the system to panic.

RESOLUTION:
The code is modified such that the transaction is not freed until the log is 
written.

* 2959557 (Tracking ID: 2834192)

SYMPTOM:
The mount operation fails after full fsck(1M) utility is run and displays the 
following error message on the console:
'UX:vxfs mount.vxfs: ERROR: V-3-26881 : Cannot be mounted until it has been 
cleaned by fsck. Please run "fsck -t vxfs -y MNTPNT" before mounting'.

DESCRIPTION:
When a CFS is mounted, VxFS validates the in-core per-node-cut (PNCUT) entries 
against their counterparts on the disk. If this validation fails, the mount 
fails even after a full fsck. In its fourth pass, the full fsck checks the free 
inode/extent maps, merges the dirty PNCUT files in-core, and validates them 
against the corresponding on-disk values. However, if any PNCUT entry is 
corrupted, the fsck(1M) utility simply ignores it. This results in the mount 
failure.

RESOLUTION:
The code is modified to enhance the fsck(1M) utility to handle any delinquent 
PNCUT entries and rebuild them as required.

* 2978234 (Tracking ID: 2972183)

SYMPTOM:
"fsppadm enforce" takes longer to force update the secondary nodes than it 
takes to force update the primary nodes.

DESCRIPTION:
The ilist is force updated on the secondary node. As a result, performance on 
the secondary node is low.

RESOLUTION:
The code is modified to force update the ilist file on secondary nodes only on 
an error condition.

* 2978236 (Tracking ID: 2977828)

SYMPTOM:
The file system is marked bad after an inode table overflow error with
the following error messages:

kernel: vxfs: msgcnt 7911 mesg 014: V-2-14: vx_iget - inode table overflow
kernel: vxfs: msgcnt 7912 mesg 063: V-2-63: vx_fset_markbad -
<devicename>  file system  fileset (index <filesystem index>) marked bad
kernel: V-2-96: vx_setfsflags - <devicename> file system fullfsck
flag set - vx_fset_markbad

DESCRIPTION:
To remove a checkpoint, the system truncates every file that is
consumed by the checkpoint. When the number of files is too large, the
inode cache may become full, leading to an ENFILE error (inode table full). The
ENFILE error then inappropriately sets the full fsck flag on the file system.

RESOLUTION:
The code is modified to convert the ENFILE error to the ENOSPC error
to fix the issue.

* 2982161 (Tracking ID: 2982157)

SYMPTOM:
During internal testing, the 'f:vx_trancommit:4' debug assert was hit when the available transaction space was less than the required space.

DESCRIPTION:
The 'f:vx_trancommit:4' assert is hit when the available transaction space is less than required. During file truncate operations, when VxFS calculates the transaction space, it doesn't consider the transaction space required in case the file has shared extents. As a result, the 'f:vx_trancommit:4' debug assert is hit.

RESOLUTION:
The code is modified to take into account the extra transaction buffer space required when the file being truncated has shared extents.

* 2999566 (Tracking ID: 2999560)

SYMPTOM:
While trying to clear the 'metadataok' flag on a volume of the volume set, the 'fsvoladm'(1M) command gives an error.

DESCRIPTION:
The 'fsvoladm'(1M) command sets and clears the 'dataonly' and 'metadataok' flags on a volume in a vset on which VxFS is mounted. 
The 'fsvoladm'(1M) command fails while clearing the 'metadataok' flag and reports an EINVAL (invalid argument) error for certain volumes. This failure occurs because, while clearing the flag, VxFS reinitializes the reorg structure for some volumes. During re-initialization, VxFS frees the existing FS structures. However, it still refers to the stale device structure, resulting in an EINVAL error.

RESOLUTION:
The code is modified to let the in-core device structure point to the updated and correct data.

* 3027250 (Tracking ID: 3031901)

SYMPTOM:
The 'vxtunefs(1M)' command accepts a garbage value for the 
'max_buf_data_size' tunable.

DESCRIPTION:
When a garbage value is specified for the 'max_buf_data_size' tunable using 'vxtunefs(1M)', the command accepts the value and reports a successful update, but the value is not actually reflected in the system. The error returned from parsing the command-line value of the 'max_buf_data_size' tunable is not handled; hence the garbage value for this tunable is accepted.

RESOLUTION:
The code is modified to handle the error returned from parsing the command line value of the 'max_buf_data_size' tunable.
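
The difference between ignoring and handling a parse error can be sketched in 
Python as follows. This is only an illustration under assumptions: the function 
names and the dictionary of tunables are hypothetical and do not reflect the 
vxtunefs(1M) implementation.

```python
def parse_size(text):
    """Return (value, error); error is None when text is a valid integer."""
    try:
        return int(text, 10), None
    except ValueError as exc:
        return None, exc


def set_tunable_unchecked(tunables, name, text):
    # Models the reported behaviour: the parse error is dropped, nothing is
    # stored, and the caller is still told that the update succeeded.
    value, _ignored_error = parse_size(text)
    if value is not None:
        tunables[name] = value
    return "updated"


def set_tunable_checked(tunables, name, text):
    # Models the fix: a parse error is reported instead of being ignored.
    value, error = parse_size(text)
    if error is not None:
        raise ValueError(f"invalid value for {name}: {text!r}")
    tunables[name] = value
    return "updated"
```

In the unchecked variant a value such as "garbage" produces a success message 
while the stored value stays unchanged, which matches the symptom described 
above.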

* 3056103 (Tracking ID: 3197901)

SYMPTOM:
The fset_get operation fails for the mentioned configuration.

DESCRIPTION:
A duplicate symbol, fs_bmap, exists in the VxFS libraries libvxfspriv.a and 
vxfspriv.so.

RESOLUTION:
The duplicate symbol fs_bmap in libvxfspriv.a and vxfspriv.so has been fixed 
by renaming it to fs_bmap_priv in libvxfspriv.a.

* 3059000 (Tracking ID: 3046983)

SYMPTOM:
There is an invalid CFS node number (<inode number>) 
in ".__fsppadm_fclextract". This causes the Dynamic Storage Tiering (DST) 
policy enforcement to fail.

DESCRIPTION:
DST policy enforcement sometimes depends on the extraction of the File Change 
Log (FCL). When the FCL change log is processed, it reads the FCL records from 
the change log into the buffer. If it finds that the buffer is not big enough 
to hold the records, it performs a rollback and passes out the needed buffer 
size. However, the rollback is incomplete, which results in the problem.

RESOLUTION:
The code is modified to add code that rolls back the content of 
"fh_bp1->fb_addr" and "fh_bp2->fb_addr".

* 3108176 (Tracking ID: 2667658)

SYMPTOM:
An attempt to perform an fscdsconv(1M) endian conversion from the SPARC 
big-endian byte order to the x86 little-endian byte order fails because of a 
macro overflow.

DESCRIPTION:
Using the fscdsconv(1M) command to perform endian conversion from the SPARC 
big-endian (any SPARC architecture machine) byte order to the x86 little-endian 
(any x86 architecture machine) byte order fails. The write operation for the 
recovery file overflows the control-data offset, which is hard coded in a macro 
as 500 MB.

RESOLUTION:
The code is modified to take an estimate of the control-data offset explicitly 
and dynamically while creating and writing the recovery file.

* 3131798 (Tracking ID: 2839871)

SYMPTOM:
On a system with DELICACHE enabled, several file system operations may hang 
with the following stack trace: 

vx_delicache_inactive 
vx_delicache_inactive_wp
vx_workitem_process 
vx_worklist_process
vx_worklist_thread
vx_kthread_init

DESCRIPTION:
The DELICACHE lock is used to synchronize the access to the DELICACHE list and 
it is held only while updating this list. However, in some cases it is held 
longer and is released only after the issued I/O is completed, causing other 
threads to hang.

RESOLUTION:
The code is modified to release the spinlock before issuing a blocking I/O 
request.
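
The locking pattern described in this resolution can be sketched in Python. 
The class below is hypothetical and uses an ordinary mutex in place of a kernel 
spinlock; it only shows the idea of detaching work from a shared list under the 
lock and performing the blocking I/O after the lock is released.

```python
import threading


class DeferredList:
    """Hypothetical stand-in for a lock-protected list of deferred items."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def add(self, item):
        with self._lock:
            self._items.append(item)

    def flush(self, write):
        # Hold the lock only while the list itself is being updated.
        with self._lock:
            pending, self._items = self._items, []
        # The potentially blocking I/O runs with the lock released, so other
        # threads can keep adding items in the meantime.
        for item in pending:
            write(item)
        return len(pending)
```

Because the lock is dropped before `write` is called, a slow I/O request delays 
only the flushing thread, not every thread that needs the list.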

* 3131801 (Tracking ID: 2850738)

SYMPTOM:
The Veritas File System (VxFS) module allocates memory with MEMWAIT in the 
callback() routine during the low memory condition. This causes the system to 
hang with the following
stack trace:
swtch_to_thread(inlined)
slpq_swtch_core+0x520
real_sleep(inlined)
sleep+0x400 
mrg_reserve_swapmem(inlined)
$cold_steal_swap+0x460
$cold_kalloc_nolgpg+0x4b0 
kalloc_internal(inlined)
$cold_kmem_arena_refill+0x650
kmem_arena_varalloc+0x280 
vx_alloc(inlined)
vx_worklist_enqueue+0x40 
vx_buffer_kmcache_callback+0x160
kmem_gc_arena(inlined)
foreach_arena_ingroup+0x840
kmem_garbage_collect_group(inlined)
kmem_garbage_collect+0x390 
kmem_arena_gc+0x240 
kthread_daemon_startup+0x90

DESCRIPTION:
The VxFS kernel memory callback() routine allocates memory with MEMWAIT. As a 
result, the system hangs in low memory condition.

RESOLUTION:
The code is modified to allocate memory without waiting in the VxFS kernel 
memory callback() routine.

* 3131806 (Tracking ID: 2964018)

SYMPTOM:
On a high-end machine with about 125 CPUs, operations that use the lstat64(2) 
function may appear to hang, and the following stack trace is observed:
 
spinlock+0xe0  
rwspin_wrlock+0x30  
specvp+0x510  
vx_lookup+0x8a0  
->  lookuppnvp(inlined)
->  lookuppn(inlined)

DESCRIPTION:
The statvfsdev search calls the devnm() function, which searches the whole 
/dev/ directory for the reverse pathname.

RESOLUTION:
The code is modified such that a new fs_load() function is implemented to make 
use of the incoming file descriptor, if it is already a character device. 
However, the devnm() function is still needed if the incoming file descriptor 
is a block device.

* 3131826 (Tracking ID: 2966277)

SYMPTOM:
Systems with high file-system activity like read/write/open/lookup may panic 
with the following stack trace due to a rare race condition:

 spinlock+0x21  ( )
 ->  vx_rwsleep_unlock()
  vx_ipunlock+0x40()
  vx_inactive_remove+0x530()
  vx_inactive_tran+0x450()
 vx_local_inactive_list+0x30()
  vx_inactive_list+0x420()
 ->  vx_workitem_process()
 ->  vx_worklist_process()
 vx_worklist_thread+0x2f0()
  kthread_daemon_startup+0x90()

DESCRIPTION:
ILOCK is released before doing an IPUNLOCK, which causes a race condition. This 
results in a panic when an inode that has been freed is accessed.

RESOLUTION:
The code is modified so that the ILOCK is used to protect the inode's memory 
from being freed while the memory is being accessed.

* 3131889 (Tracking ID: 3010444)

SYMPTOM:
On a Network File System (NFS) mounted file system, operations that read 
a file via the cksum(1) command may fail with the following error message: 

cksum: read error on <filename>: Bad address

The following error message may also be seen in the syslog:

<date:time> <system_name> vmunix: WARNING: Synchronous Page I/O error

DESCRIPTION:
When the read-vnode operation (VOP_RDWR) is performed, certain requests are 
converted to direct I/O for optimisation. However, the NFS buffers passed 
during the read requests are not user buffers. As a result, an error occurs.

RESOLUTION:
The code is modified to convert the I/O requests to direct I/O only if the 
buffer passed during the I/O is a user buffer.

* 3131896 (Tracking ID: 3031869)

SYMPTOM:
In a multi-CPU environment, the "vxfsstat -b" command does not print the 
correct information on the maximum-size buffer.

DESCRIPTION:
The "vx_bc_bufhwm" parameter represents the maximum amount of memory that can 
be used to cache the VxFS metadata. When the kctune(1M) command is used to tune 
the "vxfs_bc_bufhwm" parameter to a different value, the tunable is not set 
correctly due to the incorrect arithmetic. As a consequence, the "vxfsstat -b" 
command reports the maximum-size buffer to be increased, even though 
the "vxfs_bc_bufhwm" parameter is tuned to a lower value.

RESOLUTION:
The code is modified to correct the arithmetic for tuning the "vx_bc_bufhwm" 
parameter.

* 3131924 (Tracking ID: 3049408)

SYMPTOM:
When the system is under the file-cache pressure, the find(1) command takes 
time to operate.

DESCRIPTION:
The Veritas File System (VxFS) does not grow the metadata-buffer cache under 
system or file-cache memory pressure. When the vx_bcrecycle_timelag factor 
drops to zero, the metadata buffers are reused immediately after they are 
accessed. As a result, a large-directory scan takes many physical I/Os to scan 
the directory. The end result is that VxFS ends up performing excessive 
re-reads for the same data into the metadata-buffer cache. However, the 
file-cache memory pressure is normal. There is no need to shrink the 
metadata-buffer cache just because there is file-cache memory pressure.

RESOLUTION:
The code is modified to unlink the metadata-buffer cache behaviour from the 
file-cache memory pressure.

* 3131955 (Tracking ID: 3099638)

SYMPTOM:
When the vxfs_ifree_timelag(5) tunable is tuned the following error message is 
displayed:
# kctune vxfs_ifree_timelag=400 
ERROR: mesg 095: V-2-95: Setting vxfs_ifree_timelag to 450 since the specified 
value for vxfs_ifree_timelag is less than the recommended minimum value of 1035

DESCRIPTION:
In the vxfs_ifree_timelag(5) tunable man page, the minimum value is set 
to "None". The error message is displayed when the vxfs_ifree_timelag(5) 
tunable is set to a value which is less than 450. In the error message, a 
garbage value is displayed as the recommended minimum value. The error occurs 
because a single argument is passed for the error message that has two format 
specifiers.

RESOLUTION:
The code is modified to set the correct minimum value of the 
vxfs_ifree_timelag(5) tunable, and display the correct error message.
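
The argument mismatch can be illustrated in Python. This is an analogy rather 
than the VxFS code: a C-style formatter given too few arguments may print 
whatever happens to be in memory, whereas Python refuses to format the string. 
The message text is abbreviated and the function names are hypothetical.

```python
MESSAGE = ("Setting vxfs_ifree_timelag to %d since the specified value "
           "is less than the recommended minimum value of %d")


def format_warning(applied_value, minimum_value):
    # Both format specifiers receive an argument.
    return MESSAGE % (applied_value, minimum_value)


def format_warning_missing_argument(applied_value):
    # Only one argument for two specifiers; Python raises TypeError here.
    return MESSAGE % (applied_value,)
```

Calling `format_warning(450, 450)` yields a message in which both numbers are 
meaningful, which is the behaviour the resolution describes.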

* 3248029 (Tracking ID: 2439261)

SYMPTOM:
When the vx_fiostats_tunable is changed from zero to non-zero, the
system panics with the following stack trace:
vx_fiostats_do_update
vx_fiostats_update
vx_read1
vx_rdwr
vno_rw
rwuio
pread

DESCRIPTION:
When vx_fiostats_tunable is changed from zero to non-zero, all the
incore-inode fiostats attributes are set to NULL. When these attributes are
accessed, the system panics due to the NULL pointer dereference.

RESOLUTION:
The code has been modified to check that the file I/O stat attributes are
present before dereferencing the pointers.

* 3248042 (Tracking ID: 3072036)

SYMPTOM:
Reads from secondary node in CFS can sometimes fail with ENXIO (No such device 
or address).

DESCRIPTION:
The incore attribute ilist on secondary node is out of sync with that of the 
primary.

RESOLUTION:
The code is modified such that incore attribute ilist on secondary node is force
updated with data from primary node.

* 3248046 (Tracking ID: 3092114)

SYMPTOM:
The information output by the "df -i" command can often be inaccurate for 
cluster mounted file systems.

DESCRIPTION:
The Cluster File System 5.0 release introduced the concept of delegating 
metadata to nodes in the cluster. This delegation of metadata allows CFS 
secondary nodes to update metadata without having to ask the CFS primary to do 
it. This provides greater node scalability. 
However, the "df -i" information is still collected by the CFS primary 
regardless of which node (primary or secondary) the "df -i" command is executed 
on.

For inodes, the granularity of each delegation is an Inode Allocation Unit 
(IAU); thus IAUs can be delegated to nodes in the cluster.
With a VxFS 1 KB file system block size, each IAU represents 8192 inodes.
With a VxFS 2 KB file system block size, each IAU represents 16384 inodes.
With a VxFS 4 KB file system block size, each IAU represents 32768 inodes.
With a VxFS 8 KB file system block size, each IAU represents 65536 inodes.
Each IAU contains a bitmap that determines whether each inode it represents is 
allocated or free. The IAU also contains a summary count of the number of 
inodes that are currently free in the IAU.
The "df -i" information can be considered as a simple sum of all the IAU 
summary counts.
Using a 1 KB block size, IAU-0 represents inode numbers      0 -  8191
Using a 1 KB block size, IAU-1 represents inode numbers   8192 - 16383
Using a 1 KB block size, IAU-2 represents inode numbers  16384 - 24575
etc.
The inaccurate "df -i" count occurs because the CFS primary has no visibility 
of the current IAU summary information for IAU that are delegated to Secondary 
nodes.
Therefore the number of allocated inodes within an IAU that is currently 
delegated to a CFS secondary node is not known to the CFS primary. As a 
result, the "df -i" count information for the currently delegated IAUs is 
collected from the primary's copy of the IAU summaries. Since the primary's 
copy of the IAU is stale, the "df -i" count is only accurate when no IAUs are 
currently delegated to CFS secondary nodes.
In other words - the IAUs currently delegated to CFS secondary nodes will cause 
the "df -i" count to be inaccurate.
Once an IAU is delegated to a node it can "timeout" after 3 minutes of 
inactivity. However, not all IAU delegations will timeout. One IAU will always 
remain delegated to each node for performance reasons. Also, an IAU whose 
inodes are all allocated (so no free inodes remain in the IAU) would not 
timeout either.
The issue can be best summarized as:
The more IAUs that remain delegated to CFS secondary nodes, the greater the 
inaccuracy of the "df -i" count.

RESOLUTION:
The code is modified to allow the delegations for IAUs whose inodes are all 
allocated (so no free inodes in the IAU) to "timeout" after 3 minutes of 
inactivity.
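
The IAU arithmetic from the description can be checked with a short Python 
sketch. The helper names are hypothetical. The sketch assumes that the number 
of inodes per IAU scales linearly with the block size, which holds for the four 
block sizes listed in the description.

```python
INODES_PER_IAU_AT_1KB = 8192


def inodes_per_iau(block_size_kb):
    """Inodes represented by one IAU for a 1, 2, 4, or 8 KB block size."""
    return INODES_PER_IAU_AT_1KB * block_size_kb


def iau_inode_range(iau_index, block_size_kb):
    """Return the (first, last) inode numbers covered by the given IAU."""
    per_iau = inodes_per_iau(block_size_kb)
    first = iau_index * per_iau
    return first, first + per_iau - 1


def free_inode_count(iau_free_summaries):
    """Model "df -i" as a simple sum of the per-IAU free-inode summaries."""
    return sum(iau_free_summaries)


print(iau_inode_range(2, 1))
```

If a summary in the list passed to `free_inode_count` is a stale copy of a 
delegated IAU, the sum is off by exactly the difference between the stale and 
current values, which is the inaccuracy described above.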

* 3248051 (Tracking ID: 3121933)

SYMPTOM:
The pwrite() function fails with EOPNOTSUPP when the write range is in two 
indirect extents.

DESCRIPTION:
When the range of pwrite() falls in two indirect extents (one ZFOD extent 
belonging to DB2 pre-allocated files created with the setext( , VX_GROWFILE, ) 
ioctl and another DATA extent belonging to an adjacent INDIR), the write fails 
with EOPNOTSUPP. The reason is that VxFS tries to coalesce extents which belong 
to different indirect address extents as part of this transaction. Such a 
metadata change consumes more transaction resources than the VxFS transaction 
engine can support in the current implementation.

RESOLUTION:
The code is modified to retry the transaction without coalescing the extents, 
as the latter is an optimisation and should not cause the write to fail.
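
The retry strategy can be sketched in Python as follows. The exception class 
and the `do_write` callback are hypothetical; the sketch only shows an optional 
optimisation being dropped on the second attempt instead of failing the write.

```python
class TransactionTooLarge(Exception):
    """Hypothetical error: the transaction needs more resources than allowed."""


def write_with_fallback(do_write):
    # First attempt: try the write with extent coalescing enabled.
    try:
        return do_write(coalesce=True)
    except TransactionTooLarge:
        # Coalescing is only an optimisation, so retry the write without it.
        return do_write(coalesce=False)
```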

* 3248054 (Tracking ID: 3153919)

SYMPTOM:
The fsadm(1M) command may hang when the structural file set re-organization is
in progress. The following stack trace is observed:
vx_event_wait
vx_icache_process
vx_switch_ilocks_list
vx_cfs_icache_process
vx_switch_ilocks
vx_fs_reinit
vx_reorg_dostruct
vx_extmap_reorg
vx_struct_reorg 
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
vx_compat_ioctl
compat_sys_ioctl

DESCRIPTION:
During the structural file set re-organization, due to some race condition, the
VX_CFS_IOWN_TRANSIT flag is set on the inode. At the final stage of the
structural file set re-organization, all the inodes are re-initialized. Since
the VX_CFS_IOWN_TRANSIT flag is set improperly, the re-initialization fails to
proceed. This causes the hang.

RESOLUTION:
The code is modified such that VX_CFS_IOWN_TRANSIT flag is cleared.

* 3248089 (Tracking ID: 3003679)

SYMPTOM:
The file system hangs when doing fsppadm and removing a file with named stream 
attributes (nattr) at the same time. The following two typical threads are 
involved: 

T1:
COMMAND: "fsppadm"
schedule at
 vxg_svar_sleep_unlock
vxg_grant_sleep
 vxg_cmn_lock
 vxg_api_lock
 vx_glm_lock
 vx_ihlock
 vx_cfs_iread
 vx_iget
 vx_traverse_tree
vx_dir_lookup
vx_rev_namelookup
vx_aioctl_common
vx_ioctl
vx_compat_ioctl
compat_sys_ioctl
T2:
COMMAND: "vx_worklist_thr"
 schedule
 vxg_svar_sleep_unlock
 vxg_grant_sleep
 vxg_cmn_lock
 vxg_api_lock
 vx_glm_lock
 vx_genglm_lock
 vx_dirlock
 vx_do_remove
 vx_purge_nattr
vx_nattr_dirremove
vx_inactive_tran
vx_cfs_inactive_list
vx_inactive_list
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init
kernel_thread

DESCRIPTION:
The file system hangs due to the deadlock between the threads. T1 initiated by 
fsppadm calls vx_traverse_tree to obtain the path name for a given inode 
number. T2 removes the inode as well as its affiliated nattr inodes.
The reverse name lookup (T1) holds the global dirlock in vx_dir_lookup during 
the lookup process. It traverses the entire path from bottom to top to resolve 
the inode number inversely in vx_traverse_tree. During the lookup, VxFS needs 
to hold the hlock of each inode to read them, and drop it after reading.
The file removal (T2) is processed via vx_inactive_tran, which takes 
the "hlock" of the inode being removed. After that, it removes all its 
named attribute inodes in vx_do_remove, where sometimes the global dirlock is 
needed. Eventually, each thread waits for the lock that is held by the other 
thread, and this results in the deadlock.

RESOLUTION:
The code is modified so that the dirlock is not acquired during reverse name 
lookup.

* 3248090 (Tracking ID: 2963763)

SYMPTOM:
When the thin_friendly_alloc and delicache_enable parameters are enabled, 
Veritas File System (VxFS) may hit a deadlock. The thread involved in the 
deadlock can have the following stack trace:

vx_rwsleep_lock()
vx_tflush_inode()
vx_fsq_flush()
vx_tranflush()
vx_traninit()
vx_remove_tran()
vx_pd_remove()
vx_remove1_pd()
vx_do_remove()
vx_remove1()
vx_remove_vp()
vx_remove()
vfs_unlink()
do_unlinkat

The threads waiting in vx_traninit() for transaction space display the 
following stack trace:

vx_delay2() 
vx_traninit()
vx_idelxwri_done()
vx_idelxwri_flush()
vx_common_inactive_tran()
vx_inactive_tran()
vx_local_inactive_list()
vx_inactive_list+0x530()
vx_worklist_process()
vx_worklist_thread()

DESCRIPTION:
In the extent allocation code paths, VxFS sets the IEXTALLOC flag on the 
inode without taking the ILOCK. When overlapping transactions pick up this 
same inode off the delicache list, the transaction done code paths miss the 
IUNLOCK call.

RESOLUTION:
The code is modified to change the corresponding code paths to set the 
IEXTALLOC flag under proper protection.

* 3248094 (Tracking ID: 3192985)

SYMPTOM:
Checkpoints quota usage on CFS can be negative.
An example is as follows:
Filesystem     hardlimit     softlimit        usage         action_flag
/sofs1         51200         51200     18446744073709490176  << negative

DESCRIPTION:
In CFS, to manage the intent logs and the other extra objects required for 
CFS, a holding object referred to as a per-node-object-location table (PNOLT) 
is created. In CFS, the quota usage is calculated by reading the per-node cut 
(current usage table) files (members of the PNOLT) and summing up the quota 
usage for each clone chain. However, when the quotaoff and quotaon operations 
are run on a CFS checkpoint, the usage shows "0" after these two operations are 
executed. This happens because the quota usage calculation is skipped. 
Subsequently, if a delete operation is performed, the usage becomes negative 
since the blocks allocated for the deleted file are subtracted from zero.

RESOLUTION:
The code is modified such that when the quotaon operation is performed, the 
quota usage calculation is not skipped.
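
The very large usage value in the example is what a small negative number looks 
like when it is stored in an unsigned 64-bit counter. The Python sketch below 
reproduces the arithmetic. The amount of 61440 is inferred from the sample 
value and is not stated in the source; the helper names are hypothetical.

```python
U64_MODULUS = 1 << 64


def subtract_u64(usage, freed):
    """Subtract with unsigned 64-bit wraparound."""
    return (usage - freed) % U64_MODULUS


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed one."""
    return value - U64_MODULUS if value >= (1 << 63) else value


wrapped = subtract_u64(0, 61440)
print(wrapped)               # 18446744073709490176
print(as_signed64(wrapped))  # -61440
```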

* 3248096 (Tracking ID: 3214816)

SYMPTOM:
When you create and delete the inodes of a user frequently with the DELICACHE 
feature enabled, the user quota file becomes corrupt.

DESCRIPTION:
The inode DELICACHE feature causes this issue. This feature optimizes the 
updates on the inode map during the file creation and deletion operations. It 
is enabled by default. You can disable this feature with the vxtunefs(1M) 
command.

When DELICACHE is enabled and the quota is set for Veritas File System (VxFS), 
VxFS updates the quota for the inodes before the inodes are on the DELICACHE 
list and after they are on the inactive list during the removal process. As a 
result, VxFS decrements the current number of user files twice. This causes the 
quota file corruption.

RESOLUTION:
The code is modified to flag the inodes that are moved to the inactive list 
from the DELICACHE list. This flag prevents the quota from being decremented 
again during the removal process.
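
A guard of this kind can be sketched in Python. The class and attribute names 
are hypothetical and do not correspond to VxFS structures; the sketch only 
shows a per-inode flag that makes the quota release idempotent.

```python
class Inode:
    def __init__(self):
        # Hypothetical flag: set once the inode's quota has been released.
        self.quota_released = False


class UserQuota:
    def __init__(self, files):
        self.files = files  # current number of files charged to the user

    def release(self, inode):
        # A second release for the same inode is ignored.
        if inode.quota_released:
            return
        self.files -= 1
        inode.quota_released = True


quota = UserQuota(files=5)
inode = Inode()
quota.release(inode)  # first release, as on entering the DELICACHE list
quota.release(inode)  # second release, as on the inactive list
print(quota.files)    # 4
```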

* 3248099 (Tracking ID: 3189562)

SYMPTOM:
Oracle daemons hang in the vx_growfile() kernel function. You may
see a stack trace similar to the following:
vx_growfile+0004D4 ()        
vx_doreserve+000118 ()
vx_tran_extset+0005DC ()
vx_extset_msg+0006E8 ()
vx_cfs_extset+000040 ()
vx_extset+0002D4 ()
vx_setext+000190 () 
vx_uioctl+0004AC ()
vx_ioctl+0000D0 ()
vx_ioctl_skey+00004C ()
vnop_ioctl+000050 (??, ??, ??, ??, ??, ??)
kernel_add_gate_cstack+000030 () 
vx_vop_ioctl+00001C () 
vx_odm_resize@AF15_6+00015C ()
vx_odm_resize+000030 ()
odm_vx_resize+000040 ()
odm_resize+0000E8 ()
vxodmioctl+00018C ()
hkey_legacy_gate+00004C ()
vnop_ioctl+000050 (??, ??, ??, ??, ??, ??)
vno_ioctl+000178 (??, ??, ??, ??, ??)

DESCRIPTION:
The vx_growfile() kernel function may run into a loop on a highly fragmented
file system, which causes multiple processes to hang. The vx_growfile() routine
is invoked through the setext(1) command or its Application Programming
Interface (API). When the vx_growfile() function requires more extents than the
typed extent buffer can spare, a VX_EBMAPLOCK error may occur. To handle the
error, VxFS cancels the transaction and repeats the same operation again, which
creates the loop.

RESOLUTION:
The code is modified to make VxFS commit the available extents so that the
growfile transaction makes progress, and to repeat the operation until the
transaction is completed.
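
The difference between retrying the whole operation and committing partial 
progress can be sketched in Python. The function and its parameters are 
hypothetical and model the extent buffer as a simple per-transaction capacity.

```python
def grow_file(extents_needed, buffer_capacity):
    """Return the number of transactions needed to commit all extents."""
    if buffer_capacity <= 0:
        raise ValueError("buffer_capacity must be positive")
    committed = 0
    transactions = 0
    while committed < extents_needed:
        # Commit as many extents as fit, instead of cancelling the
        # transaction and retrying the same oversized request.
        batch = min(buffer_capacity, extents_needed - committed)
        committed += batch
        transactions += 1
    return transactions


print(grow_file(10, 4))  # 3
```

Each pass through the loop reduces the remaining work, so the loop terminates, 
unlike a retry that repeats an identical request that cannot fit.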

* 3284764 (Tracking ID: 3042485)

SYMPTOM:
During internal stress testing, the f:vx_purge_nattr:1 assert fails.

DESCRIPTION:
In case of corruption, the file-system check utility is run, and the inodes to be checked or fixed are picked up serially.
However, in some cases the order in which these are processed changes, which causes inconsistent metadata and results in the assert failure.

RESOLUTION:
The code is modified to handle named attribute inodes in an earlier pass during full fsck operation.

* 3296988 (Tracking ID: 2977035)

SYMPTOM:
While running an internal noise test in a Cluster File System (CFS) environment, a debug assert issue was observed in the vx_dircompact() function.

DESCRIPTION:
Compacting directory blocks is avoided if the inode has 'extop' (extended operation) flags set, such as deferred inode removal and pass through truncation. The issue is caused when the inode has an extended pass through truncation operation and is still considered for compacting.

RESOLUTION:
The code is modified to avoid compacting the directory blocks of the inode if it has an extended operation of pass through truncation set.

* 3299685 (Tracking ID: 2999493)

SYMPTOM:
During internal testing, the file system check validation fails after a successful full fsck operation and displays the following error message: 
run_fsck :
First full fsck pass failed, exiting

DESCRIPTION:
Even after a successful full fsck completion, the fsck validation fails due to incorrect entries in a structural file (IFRCT) which maintains the reference count of shared extents. While processing information for indirect extents, the modified data does not get flushed to the disk because the buffer is not marked dirty after its contents are modified.

RESOLUTION:
The code is modified to mark the buffer dirty when its contents are modified.

* 3306410 (Tracking ID: 2495673)

SYMPTOM:
During communication between the nodes in a cluster, the incore inode gets marked 'bad' and an internal test assertion fails.

DESCRIPTION:
In a Cluster File System (CFS) environment, when two nodes communicate for a grant on an inode, some data is also piggybacked to the initiating node. If there is any discrepancy in the data that is piggybacked between these two nodes within the cluster, the incore inode gets marked 'bad'. During communication, the file system gets disabled, causing stale concurrent I/O data to be transferred to the initiating node, which results in a mismatch.

RESOLUTION:
The code is modified such that if the file system gets disabled, it invalidates its concurrent I/O count state from other nodes and does not delegate false information when asked for concurrent I/O count from other nodes.

* 3306442 (Tracking ID: 3312030)

SYMPTOM:
The default quota support on Veritas File System version 6.0.4 and later is changed to 32-bit.

DESCRIPTION:
The quota conversion to 64-bit happens implicitly during the mount operation, which indicates that there is no option available for 32-bit quotas.

RESOLUTION:
The code is modified to change the default quota to 32-bit. The file systems which were created and mounted with quotas in the 6.0.3 release continue with the 64-bit quota support when upgrading to 6.0.4. However, any newly created file system has the default quota set to 32-bit. 
Contact Symantec Customer Support if you need the 64-bit quota support for the newer file systems.

* 3310758 (Tracking ID: 3310755)

SYMPTOM:
When the system processes an indirect extent, if it finds the first record as Zero Fill-On-Demand (ZFOD) extent (or first n records are ZFOD records), then it hits the assert.

DESCRIPTION:
In the case of indirect extents, the reference count mechanism (shared block
count) for files that have shared ZFOD extents does not behave correctly.

RESOLUTION:
The code for the reference count queue (RCQ) handling for the shared indirect ZFOD extents is modified, and the fsck(1M) issues with a snapshot of a file when there are ZFOD extents have been fixed.

* 3321730 (Tracking ID: 3214328)

SYMPTOM:
A mismatch is observed between the states for the Global Lock Manager (GLM) grant level and the Global Lock Manager (GLM) data
in a Cluster File System (CFS) inode.

DESCRIPTION:
When a file system is disabled during some error situation, and if any thread starts its execution before the file system is disabled, then the execution is completed in spite of the file system being disabled in between. The Global Lock Manager (GLM) state of an inode changes without updating other flags like inode->i_cflags, which causes a mismatch between the states.

RESOLUTION:
The code is modified to skip updating the Global Lock Manager (GLM) state when a specific flag is set in inode->i_cflags and also when the file system is disabled.

* 3323912 (Tracking ID: 3259634)

SYMPTOM:
In CFS, each node that mounts the cluster file system has its own intent
log in the file system. A CFS with more than 4,294,967,296 file system blocks
can zero out an incorrect location because of incorrect typecasting. For
example, such a CFS can incorrectly zero out 65536 file system blocks at
the block offset of 1,537,474,560 (file system blocks) with an 8 KB file system
block size and an intent log with the size of 65536 file system blocks. This
issue can only occur if an intent log is located above an offset of
4,294,967,296 file system blocks. This situation can occur when you add a new
node to the cluster and mount an additional CFS secondary for the first time,
which needs to create and zero a new intent log. This situation can also happen
if you resize a file system or intent log and clear an intent log.

The problem occurs only with the following file system size and the FS block
size combinations:

1 KB block size and FS size > 4 TB
2 KB block size and FS size > 8 TB
4 KB block size and FS size > 16 TB
8 KB block size and FS size > 32 TB

For example, when the full fsck flag is set on a file system, the message log
can contain messages of the following type:

2013 Apr 17 14:52:22 sfsys kernel: vxfs: msgcnt 5 mesg 096: V-2-96:
vx_setfsflags - /dev/vx/dsk/sfsdg/vol1 file system fullfsck flag set - vx_ierror

2013 Apr 17 14:52:22 sfsys kernel: vxfs: msgcnt 6 mesg 017: V-2-17:
vx_attr_iget - /dev/vx/dsk/sfsdg/vol1 file system inode 13675215 marked bad 
incore

2013 Jul 17 07:41:22 sfsys kernel: vxfs: msgcnt 47 mesg 096:  V-2-96:
vx_setfsflags - /dev/vx/dsk/sfsdg/vol1 file system fullfsck  flag set - 
vx_ierror 

2013 Jul 17 07:41:22 sfsys kernel: vxfs: msgcnt 48 mesg 017:  V-2-17:
vx_dirbread - /dev/vx/dsk/sfsdg/vol1 file system inode 55010476  marked bad 
incore

DESCRIPTION:
In CFS, each node that mounts the cluster file system has its own
intent log in the file system. When an additional node mounts the file system as
a CFS secondary, the CFS creates an intent log. Note that intent logs are never
removed; they are reused.

When you clear an intent log, Veritas File System (VxFS) passes an incorrect
block number to the log clearing routine, which zeros out an incorrect location.
The incorrect location might point to the file data or file system metadata. Or,
the incorrect location might be part of the file system's available free space.
This is silent corruption. If the file system metadata corrupts, VxFS can detect
the corruption when it subsequently accesses the corrupt metadata and marks the
file system for full fsck.

RESOLUTION:
The code is modified so that VxFS can pass the correct block number
to the log clearing routine.

* 3338024 (Tracking ID: 3297840)

SYMPTOM:
Metadata corruption is found during the file removal process, with the inode block count becoming negative.

DESCRIPTION:
When the user removes or truncates a file that has shared indirect blocks, the block count can be updated to reflect the removal of the shared indirect blocks even though the blocks are not yet removed from the file. The next iteration of the loop updates the block count again while removing these blocks. As a result, the block count becomes negative after all the blocks are removed from the file, whereas the removal code expects the block count to be zero before it updates the rest of the metadata.

RESOLUTION:
The code is modified to update the block count and other tracking metadata in the same transaction as the blocks are removed from the file.
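
The double accounting described above can be illustrated with a toy model. This is a sketch under stated assumptions: the class and function names are hypothetical and do not mirror VxFS internals; it only shows why decrementing the counter separately from the actual removal drives it negative.

```python
# Toy model; all names are hypothetical and do not mirror VxFS internals.
class Inode:
    def __init__(self, blocks):
        self.blocks = list(blocks)        # blocks still attached to the file
        self.block_count = len(blocks)    # metadata counter


def remove_buggy(inode, batch):
    """The counter is decremented in a pass that frees nothing, then again on removal."""
    inode.block_count -= len(batch)       # pass 1: counted, but blocks not removed
    inode.block_count -= len(batch)       # pass 2: counted again while removing
    for block in batch:
        inode.blocks.remove(block)


def remove_fixed(inode, batch):
    """The counter and the block list change together, as in one transaction."""
    for block in batch:
        inode.blocks.remove(block)
    inode.block_count -= len(batch)
```

With three shared blocks, the first variant leaves the counter at -3 while the second leaves it at the expected zero.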

* 3338026 (Tracking ID: 3331419)

SYMPTOM:
The machine panics with the following stack trace:

 #0 [ffff883ff8fdc110] machine_kexec at ffffffff81035c0b
 #1 [ffff883ff8fdc170] crash_kexec at ffffffff810c0dd2
 #2 [ffff883ff8fdc240] oops_end at ffffffff81511680
 #3 [ffff883ff8fdc270] no_context at ffffffff81046bfb
 #4 [ffff883ff8fdc2c0] __bad_area_nosemaphore at ffffffff81046e85
 #5 [ffff883ff8fdc310] bad_area at ffffffff81046fae
 #6 [ffff883ff8fdc340] __do_page_fault at ffffffff81047760
 #7 [ffff883ff8fdc460] do_page_fault at ffffffff815135ce
 #8 [ffff883ff8fdc490] page_fault at ffffffff81510985
    [exception RIP: print_context_stack+173]
    RIP: ffffffff8100f4dd  RSP: ffff883ff8fdc548  RFLAGS: 00010006
    RAX: 00000010ffffffff  RBX: ffff883ff8fdc6d0  RCX: 0000000000002755
    RDX: 0000000000000000  RSI: 0000000000000046  RDI: 0000000000000046
    RBP: ffff883ff8fdc5a8   R8: 000000000002072c   R9: 00000000fffffffb
    R10: 0000000000000001  R11: 000000000000000c  R12: ffff883ff8fdc648
    R13: ffff883ff8fdc000  R14: ffffffff81600460  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff883ff8fdc540] print_context_stack at ffffffff8100f4d1
#10 [ffff883ff8fdc5b0] dump_trace at ffffffff8100e4a0
#11 [ffff883ff8fdc650] show_trace_log_lvl at ffffffff8100f245
#12 [ffff883ff8fdc680] show_trace at ffffffff8100f275
#13 [ffff883ff8fdc690] dump_stack at ffffffff8150d3ca
#14 [ffff883ff8fdc6d0] warn_slowpath_common at ffffffff8106e2e7
#15 [ffff883ff8fdc710] warn_slowpath_null at ffffffff8106e33a
#16 [ffff883ff8fdc720] hrtick_start_fair at ffffffff810575eb
#17 [ffff883ff8fdc750] pick_next_task_fair at ffffffff81064a00
#18 [ffff883ff8fdc7a0] schedule at ffffffff8150d908
#19 [ffff883ff8fdc860] __cond_resched at ffffffff81064d6a
#20 [ffff883ff8fdc880] _cond_resched at ffffffff8150e550
#21 [ffff883ff8fdc890] vx_nalloc_getpage_lnx at ffffffffa041afd5 [vxfs]
#22 [ffff883ff8fdca80] vx_nalloc_getpage at ffffffffa03467a3 [vxfs]
#23 [ffff883ff8fdcbf0] vx_do_getpage at ffffffffa034816b [vxfs]
#24 [ffff883ff8fdcdd0] vx_do_read_ahead at ffffffffa03f705e [vxfs]
#25 [ffff883ff8fdceb0] vx_read_ahead at ffffffffa038ed8a [vxfs]
#26 [ffff883ff8fdcfc0] vx_do_getpage at ffffffffa0347732 [vxfs]
#27 [ffff883ff8fdd1a0] vx_getpage1 at ffffffffa034865d [vxfs]
#28 [ffff883ff8fdd2f0] vx_fault at ffffffffa03d4788 [vxfs]
#29 [ffff883ff8fdd400] __do_fault at ffffffff81143194
#30 [ffff883ff8fdd490] handle_pte_fault at ffffffff81143767
#31 [ffff883ff8fdd570] handle_mm_fault at ffffffff811443fa
#32 [ffff883ff8fdd5e0] __get_user_pages at ffffffff811445fa
#33 [ffff883ff8fdd670] get_user_pages at ffffffff81144999
#34 [ffff883ff8fdd690] vx_dio_physio at ffffffffa041d812 [vxfs]
#35 [ffff883ff8fdd800] vx_dio_rdwri at ffffffffa02ed08e [vxfs]
#36 [ffff883ff8fdda20] vx_write_direct at ffffffffa044f490 [vxfs]
#37 [ffff883ff8fddaf0] vx_write1 at ffffffffa04524bf [vxfs]
#38 [ffff883ff8fddc30] vx_write_common_slow at ffffffffa0453e4b [vxfs]
#39 [ffff883ff8fddd30] vx_write_common at ffffffffa0454ea8 [vxfs]
#40 [ffff883ff8fdde00] vx_write at ffffffffa03dc3ac [vxfs]
#41 [ffff883ff8fddef0] vfs_write at ffffffff81181078
#42 [ffff883ff8fddf30] sys_pwrite64 at ffffffff81181a32
#43 [ffff883ff8fddf80] system_call_fastpath at ffffffff8100b072

DESCRIPTION:
The panic occurs because the kernel scheduler refers to a corrupted thread_info
structure; thread_info was corrupted by a stack overflow. During a direct I/O
write, user-space pages need to be pre-faulted using the __get_user_pages() code
path. This code path is very deep and can consume a lot of stack space.

RESOLUTION:
The code is modified to reduce the kernel stack consumption in this code path by
~400-500 bytes by changing the way pre-faulting is done.

* 3338030 (Tracking ID: 3335272)

SYMPTOM:
The mkfs (make file system) command dumps core when the log size 
provided is not aligned. The following stack trace is displayed:

(gdb) bt
#0  find_space ()
#1  place_extents ()
#2  fill_fset ()
#3  main ()
(gdb)

DESCRIPTION:
While creating the VxFS file system using the mkfs command, if the
log size provided is not aligned properly, the placement of the RCQ extents
can be miscalculated so that no space is found for them. This leads to an
illegal memory access of the AU bitmap and results in a core dump.

RESOLUTION:
The code is modified to place the RCQ extents in the same AU where 
log extents are allocated.

* 3338057 (Tracking ID: 2970219)

SYMPTOM:
When CPUs are added to the system, the system may panic with the following 
stack trace:
fcache_as_map+0x70 ()
vx_fcache_map+0x1d0 ()
vx_write_default+0x340 ()
vx_write1+0xea0 ()
vx_rdwr+0x1130 ()
rfs3_write+0x5b0 ()
common_dispatch+0xc10 ()
rfs_dispatch+0x40 ()
svc_getreq+0x250 ()
svc_run+0x310 ()
svc_do_run+0xd0 ()
nfssys+0x7c0 ()
hpnfs_nfssys+0x60 ()
coerce_scall_args+0x130 ()
syscall+0x590 ()

DESCRIPTION:
The issue occurs because of a race condition between the vnode-map
initialization and deinitialization.

RESOLUTION:
The code is modified to add debug messages that will confirm if a race 
condition exists between the vnode-map initialization and deinitialization. The 
debug messages will help gather information if the problem occurs again.

* 3338060 (Tracking ID: 3228646)

SYMPTOM:
NFSv4 server may panic with the following stack trace, when fcntl() requests 
with F_SETLK are made on CFS:
 
vx_vn_inactive+0xf0 
vn_rele_inactive+0x140 
vfs_free_iflist+0x100 
rfs4_op_release_lockowner+0x4c0 
rfs4_compound+0x430 
common_dispatch+0xc10 
rfs_dispatch+0x40 
svc_getreq+0x250 
svc_run+0x300 
svc_do_run+0xd0 
nfssys+0x7c0 
hpnfs_nfssys+0x60

DESCRIPTION:
In a CFS configuration, if an fcntl() request fails, some NFS-specific fields (l_pid)
are not updated correctly and may point to stale information. This causes the
NFSv4 server to panic.

RESOLUTION:
The code is modified to preserve the l_pid value during a failed fcntl(F_SETLK) 
request.

* 3338063 (Tracking ID: 3332902)

SYMPTOM:
The system running the fsclustadm(1M) command panics while shutting
down. The following stack trace is logged along with the panic:
machine_kexec
crash_kexec
oops_end
page_fault [exception RIP: vx_glm_unlock]
vx_cfs_frlpause_leave [vxfs]
vx_cfsaioctl [vxfs]
vxportalkioctl [vxportal]
vfs_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath

DESCRIPTION:
There exists a race-condition between "fsclustadm(1M) cfsdeinit"
and "fsclustadm(1M) frlpause_disable". The "fsclustadm(1M) cfsdeinit" fails
after cleaning the Group Lock Manager (GLM),  without downgrading the CFS state.
Under the false CFS state, the "fsclustadm(1M) frlpause_disable" command enters
and accesses the GLM lock, which "fsclustadm(1M) cfsdeinit" frees resulting in a
panic.

A second race exists between the code in vx_cfs_deinit() and the code in
fsck. Although fsck holds a reservation, the reservation does not prevent
vx_cfs_deinit() from freeing vx_cvmres_list, because there is no check of
vx_cfs_keepcount.

RESOLUTION:
The code is modified to add appropriate checks in the
"fsclustadm(1M) cfsdeinit" and "fsclustadm(1M) frlpause_disable" to avoid the
race-condition.

* 3338749 (Tracking ID: 2444146)

SYMPTOM:
The Oracle Disk Manager (ODM) read returns EINTR while running unspecified
Oracle jobs. Oracle returns the following error:

ORA-01115: IO error reading block from file 11 (block #  <block number>)
 
ORA-01110: data file 11: '/<database file>'
 
ORA-17500: ODM err:ODM ERROR V-41-4-2-281-4 Interrupted system call
 
Oracle says that ODM ERROR V-41-4-2-281-4 Interrupted system call

DESCRIPTION:
In the ODM API version 1, if Oracle specified that IO is interruptible and the 
thread is interrupted then ODM has to indicate that by setting the 
ODM_IO_POSTED flag in the status word of the request. Oracle retries the IO 
after seeing this flag in the status word. ODM did not handle this condition 
and hence returned the error. In the ODM API version 2, all IO waits are non-
interruptible.

RESOLUTION:
The code is modified to check if the IO is interruptible and set the 
appropriate kernel flag.

* 3338755 (Tracking ID: 3066116)

SYMPTOM:
The system panics due to NULL pointer dereference with the following stack 
trace:

...
bubbleup
vx_worklist_process
vx_worklist_thread
...

DESCRIPTION:
To prevent too many inactive threads from running, two adb
tunables, "vx_inactive_throttling" and "vx_inactive_process_throttling", were
introduced to fix the issue of vxfsd taking a lot of CPU time after deleting some
large directories.
A bug in the code increments a local counter from 0 to 1. This in turn affects 
inactive work item dispatch. As a result, the empty work items are added to the 
local batch of work items. The system panics while processing this empty work 
item.

RESOLUTION:
The code is modified not to increment the counter.

* 3338762 (Tracking ID: 3096834)

SYMPTOM:
Intermittent vx_disable messages are displayed in system log.

DESCRIPTION:
VxFS displays intermittent vx_disable messages. The file system is
not corrupt and the fsck(1M) command does not indicate any problem with the file
system. However, the file system gets disabled.

RESOLUTION:
The code is modified to make the vx_disable message verbose with
stack trace information to facilitate further debugging.

* 3338766 (Tracking ID: 3150368)

SYMPTOM:
A periodic sync operation on an Encrypted Volume and File System (EVFS) 
configuration may cause the system to panic with the following stack trace:

evfsevol_strategy()
io_invoke_devsw()
vx_writesuper()
vx_fsetupdate()
vx_sync1()
vx_sync0()
$cold_vx_do_fsext()
vx_workitem_process()
vx_worklist_process()
vx_walk_fslist_threaded()
vx_walk_fslist()
vx_sync_thread()
vx_worklist_thread()
kthread_daemon_startup()

DESCRIPTION:
In the EVFS environment, EVFS may get a stale or garbage value of b_filevp,
which is not initialized by Veritas File System (VxFS), causing the system to
panic.

RESOLUTION:
The code is modified to initialize the b_filevp.

* 3338768 (Tracking ID: 3157624)

SYMPTOM:
The fcntl() system call when used for file share reservations (F_SHARE command) 
can cause a memory leak in Cluster File System (CFS). The memory leak is 
observed in the "ALLOCB_MBLK_LM" arena.
Stack trace for the leak (as seen in HP-UX vmtrace) is as follows:
 
$cold_kmem_arena_varalloc+0xd0
allocb+0x880
llt:llt_msgalloc+0xa0
gab:gab_mem_allocmsg+0x70
gab:gab_allocmsg+0x20
vx_msgalloc+0x70
vx_recv_shrlock+0x60
vx_recv_rpc+0x100

DESCRIPTION:
In CFS, file share reservation requests are broadcast to all the nodes in the
cluster to check for conflicts. Due to a bug in the code, the system cannot 
free the response messages received. This results in a memory leak for every 
broadcast of the "file share reservation" message.

RESOLUTION:
The code is modified to free the response message received.

* 3338774 (Tracking ID: 3226462)

SYMPTOM:
On a cluster mounted file-system with unequal CPUs, while doing a lookup 
operation, a node may panic with the stack trace:

vx_dnlc_recent_cookie
vx_dnlc_getpathname
audit_get_pathname_from_dnlc
audit_clean_path
$cold_audit_build_full_dir_name
inline change_p_cdir

DESCRIPTION:
The cause of the panic is an out-of-bounds access in the counters[] array whose
size is defined by the vx_max_cpu variable. The value of vx_max_cpu can differ 
between the CFS nodes, if the nodes have different number of processors. 
However, the code assumes this value is the same across the cluster.

When propagating inode cookies across the cluster, the counter[] array is 
allocated based on the vx_max_cpu of the current CFS node. If the cookie is 
populated via vx_cbdnlc_populate_cookie(), having a CPU ID from another CFS 
node exceeding the local vx_max_cpu, the function vx_dnlc_recent_cookie() would 
access locations beyond the counter[] array allocated.

RESOLUTION:
The code is modified to detect the out-of-bound access at
vx_dnlc_recent_cookie() and return the ENOENT error.
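
The kind of bounds check described in the resolution can be sketched as follows. This is a minimal illustration, not the VxFS code: the function name and the tuple return convention are assumptions made for the example, and the list stands in for a per-CPU counter array sized by the local node's CPU count.

```python
import errno


def recent_cookie(counters, cpu_id):
    """Return (error, value); reject a CPU ID that falls outside the local array.

    `counters` stands in for an array sized by the local node's CPU count;
    `cpu_id` may originate from another node that has more CPUs.
    """
    if not 0 <= cpu_id < len(counters):
        return errno.ENOENT, None         # out-of-bounds ID: fail the lookup instead
    return 0, counters[cpu_id]


local_counters = [0] * 4                  # assumed: local node has 4 CPUs
print(recent_cookie(local_counters, 2))   # in range
print(recent_cookie(local_counters, 6))   # ID from a larger node
```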

* 3338776 (Tracking ID: 3224101)

SYMPTOM:
On a file system that is mounted by a cluster, the system panics after you
enable the lazy optimization for updating the i_size across the cluster nodes.
The stack trace may look as follows:
vxg_free()
vxg_cache_free4()
vxg_cache_free()
vxg_free_rreq()
vxg_range_unlock_body()
vxg_api_range_unlock()
vx_get_inodedata()
vx_getattr()
vx_linux_getattr()
vxg_range_unlock_body()
vxg_api_range_unlock()
vx_get_inodedata()
vx_getattr()
vx_linux_getattr()

DESCRIPTION:
On a file system that is mounted by a cluster with the -o cluster option, read
operations or write operations take a range lock to synchronize updates across
the different nodes. The lazy optimization incorrectly enables a node to release
a range lock which is not acquired and panic the node.

RESOLUTION:
The code has been modified to release only those range locks which are acquired.

* 3338779 (Tracking ID: 3252983)

SYMPTOM:
On a high-end system with 48 or more CPUs, some file-system
operations may hang with the following stack trace:
vx_ilock()
vx_tflush_inode()
vx_fsq_flush()
vx_tranflush()
vx_traninit()
vx_tran_iupdat()
vx_idelxwri_done()
vx_idelxwri_flush()
vx_delxwri_flush()
vx_workitem_process()
vx_worklist_process()
vx_worklist_thread()

DESCRIPTION:
The function to get an inode returns an incorrect error value if there are no
free inodes available incore. This error value causes an inode to be allocated
on disk instead of incore. As a result, the same function is called
again, resulting in a continuous loop.

RESOLUTION:
The code is modified to return the correct error code.

* 3338780 (Tracking ID: 3253210)

SYMPTOM:
When the file system reaches the space limitation, it hangs with the following stack trace:
vx_svar_sleep_unlock()
default_wake_function()
wake_up()
vx_event_wait()
vx_extentalloc_handoff()
vx_te_bmap_alloc()
vx_bmap_alloc_typed()
vx_bmap_alloc()
vx_bmap()
vx_exh_allocblk()
vx_exh_splitbucket()
vx_exh_split()
vx_dopreamble()
vx_rename_tran()
vx_pd_rename()

DESCRIPTION:
When large directory hash is enabled through the vx_dexh_sz(5M) tunable, Veritas File System (VxFS) uses the large directory hash for directories.
When you rename a file, a new directory entry is inserted into the hash table, which results in a hash split. The hash split fails the current transaction and retries after some housekeeping jobs complete. These jobs include allocating more space for the hash table. However, VxFS doesn't check the return value of the preamble job. Thus, when VxFS runs out of space, the rename transaction is re-entered indefinitely without knowing whether more space was allocated by the preamble jobs.

RESOLUTION:
The code is modified to enable VxFS to exit looping when ENOSPC is returned from the preamble job.
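
The retry pattern and its fix can be sketched as below. This is an illustration under assumptions: `try_rename` and `run_preamble` are hypothetical callables standing in for the rename transaction and the preamble job, and the iteration cap exists only so the example itself cannot spin forever.

```python
import errno


def rename_with_retry(try_rename, run_preamble, max_spins=1000):
    """Retry the rename after each failed attempt, but stop when the preamble fails.

    try_rename():   returns True when the transaction commits (hypothetical).
    run_preamble(): returns 0 on success or an errno value (hypothetical).
    """
    for _ in range(max_spins):            # guard so the sketch cannot loop forever
        if try_rename():
            return 0
        err = run_preamble()              # for example, allocate more hash-table space
        if err == errno.ENOSPC:
            return err                    # the check the description says was missing
    return errno.EAGAIN
```

Without the ENOSPC check, a full file system would make every pass fail in the same way, which matches the hang described in the symptom.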

* 3338787 (Tracking ID: 3261462)

SYMPTOM:
File system with size greater than 16TB corrupts with vx_mapbad messages in the system log.

DESCRIPTION:
The corruption results from the combination of the following two conditions:
a.	Two or more threads race against each other to allocate around the same offset range. As a result, VxFS returns the buffer locked only in shared mode for all the threads which fail in allocating the extent.
b.	Since the allocated extent is from a region beyond 16TB, threads need to convert the buffer to a different type to accommodate the new extent's start value.
 
The buffer overrun happens because VxFS erroneously tries to unconditionally convert the buffer to the new type even though the buffer might not be able to accommodate the converted data.

RESOLUTION:
When the race condition is detected, VxFS returns proper retry errors to the caller, so that the whole operation is retried from the beginning. Also, the code is modified to ensure that VxFS doesn't try to convert the buffer to the new type when it cannot accommodate the new data. In case this check fails, VxFS performs the proper split logic, so that buffer overrun doesn't happen when the operation is retried.

* 3338790 (Tracking ID: 3233284)

SYMPTOM:
FSCK binary hangs while checking Reference Count Table (RCT) with the following stack trace:
bmap_search_typed_raw()
bmap_check_typed_raw()
rct_check()
process_device()
main()

DESCRIPTION:
The FSCK binary hangs due to the looping in the bmap_search_typed_raw() function. This function searches for extent entry in the indirect buffer for a given offset. In this case, the given offset is less than the start offset of the first extent entry. This unhandled corner case causes the infinite loop.

RESOLUTION:
The code is modified to handle the following cases:
1. Searching in empty indirect block.
2. Searching for an offset, which is less than the start offset of the first entry in the indirect block.
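
The two corner cases listed above can be illustrated with a small search routine. This is a sketch, not the fsck implementation: it assumes extent entries are (start, length) pairs sorted by start offset, and the function name is hypothetical.

```python
from bisect import bisect_right


def find_extent(entries, offset):
    """Return the index of the entry that covers `offset`, or None.

    `entries` is assumed to be a list of (start, length) pairs sorted by start.
    """
    if not entries:                        # case 1: empty indirect block
        return None
    starts = [start for start, _ in entries]
    i = bisect_right(starts, offset) - 1   # last entry whose start <= offset
    if i < 0:                              # case 2: offset precedes the first entry
        return None
    start, length = entries[i]
    return i if offset < start + length else None
```

Both corner cases return immediately instead of continuing to scan, so the search always terminates.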

* 3339230 (Tracking ID: 3308673)

SYMPTOM:
With the delayed allocations feature enabled for a locally mounted file
system that has highly fragmented free space, the file system is
disabled with the following message in the system log:
WARNING: msgcnt 1 mesg 031: V-2-31: vx_disable - /dev/vx/dsk/testdg/testvol file
system disabled

DESCRIPTION:
A VxFS transaction performs multiple extent allocations to fulfill one allocation
request for a file system that has high free space fragmentation. Thus, the
allocation transaction becomes large and fails to commit. After retrying the
transaction for a defined number of times, the file system is disabled with the
above-mentioned error.

RESOLUTION:
The code is modified to commit the part of the transaction that is committable
and retry the remaining part.

* 3339232 (Tracking ID: 2646933)

SYMPTOM:
Compared with the previous versions, VxFS now takes more time to
process the large sequential writes which are in the order of Gigabytes.

DESCRIPTION:
This issue occurs because the FVF_FLUSH_BHND flag is omitted when
VxFS calls the virtual memory interface of the operating system. The performance
degradation is observed only when the delayed allocation feature is used, which
is enabled by default. It doesn't show any slowness if the file system is
mounted with the cluster option.

RESOLUTION:
VxFS now uses the correct flag when flushing a file.

* 3339884 (Tracking ID: 1949445)

SYMPTOM:
The system is unresponsive when files are created in a large directory. The following stack trace is logged:

vxg_grant_sleep()                                             
vxg_cmn_lock()
vxg_api_lock()                                             
vx_glm_lock()
vx_get_ownership()                                                  
vx_exh_coverblk()  
vx_exh_split()                                                 
vx_dexh_setup() 
vx_dexh_create()                                              
vx_dexh_init() 
vx_do_create()

DESCRIPTION:
For large directories, large directory hash (LDH) is enabled to improve lookups. When a system takes ownership of the LDH inode twice in the same thread context (while building the hash for a directory), it becomes unresponsive.

RESOLUTION:
The code is modified to avoid taking ownership again if the thread already has ownership of the LDH inode.

* 3340029 (Tracking ID: 3298041)

SYMPTOM:
While performing "delayed extent allocations" by writing to a file 
sequentially and extending the file's size, or performing a mixture of 
sequential write I/O and random write I/O which extend a file's size, the write 
I/O performance to the file can suddenly degrade significantly.

DESCRIPTION:
The 'dalloc' feature allows VxFS to allocate extents (file system 
blocks) to a file in a delayed fashion when extending a file size. Asynchronous 
writes that extend a file's size create and dirty memory pages. New
extents can therefore be allocated when the dirty pages are flushed to disk
(via background processing) rather than in the same
context as the write I/O. However, in some cases, with delayed allocation
on, the flushing of dirty pages may occur synchronously in the foreground in
the same context as the write I/O. When triggered, the foreground flushing can
significantly slow the write I/O performance.

RESOLUTION:
The code is modified to avoid the foreground flushing of data in 
the same write context.

* 3351946 (Tracking ID: 3194635)

SYMPTOM:
Internal stress test on a locally mounted file system exited with an error message.

DESCRIPTION:
With a file having Zero-Filled on Demand (ZFOD) extents, a write operation in ZFOD extent area may lead to the coalescing of extent of type SHARED or COMPRESSED, or both with new extent of type DATA. The new DATA extent may be coalesced with the adjacent extent, if possible. If this happens without unsharing for shared extent or uncompressing for compressed extent case, data or metadata corruption may occur.

RESOLUTION:
The code is modified such that adjacent shared, compressed or pseudo-compressed extent is not coalesced.

* 3351947 (Tracking ID: 3164418)

SYMPTOM:
Internal stress test on a locally mounted VxFS file system results in data corruption in a no-space-on-device scenario while doing a split on a Zero Fill-On-Demand (ZFOD) extent.

DESCRIPTION:
When the split operation on a Zero Fill-On-Demand (ZFOD) extent fails because of the ENOSPC (no space on device) error, VxFS erroneously processes the original ZFOD extent and returns no error. This may result in data corruption.

RESOLUTION:
The code is modified to return the ZFOD extent to its original state, if the ZFOD split operation fails due to ENOSPC error.

* 3359278 (Tracking ID: 3364290)

SYMPTOM:
The kernel may panic in Veritas File System (VxFS) when it is internally working
on reference count queue(RCQ) record.

DESCRIPTION:
The work item spawned by VxFS in kernel to process RCQ records during RCQ full
situation is getting passed file system pointer as argument. Since no active
level is held, this file system pointer is not guaranteed to be valid by the
time the workitem starts processing. This may result in the panic.

RESOLUTION:
The code is modified to pass externally visible file system structure, as this
structure is guaranteed to be valid since the creator of the work item takes a
reference held on the structure which is released after the workitem exits.

* 3364285 (Tracking ID: 3364282)

SYMPTOM:
The fsck(1M) command fails to correct the inode list file.

DESCRIPTION:
After successfully writing an extent for the inode list file to disk, the fsck(1M) command fails to write the metadata for the inode list file.

RESOLUTION:
The fsck(1M) command is modified to write metadata for the inode list file after successful write operations of an extent for the inode list file.

* 3364289 (Tracking ID: 3364287)

SYMPTOM:
Debug assert may be hit in the vx_real_unshare() function in the
cluster environment.

DESCRIPTION:
The vx_extend_unshare() function wrongly looks at the offset immediately after
the current unshare length boundary. Instead, it should look at the offset that
falls on the last byte of current unshare length. This may result in hitting
debug asserts in the vx_real_unshare() function.

RESOLUTION:
The code is modified for the shared compressed extent. When the
vx_extend_unshare() function tries to extend the unshared region, it doesn't
look up the first byte immediately after the unshared region. Instead, it
looks up the last byte unshared.

* 3364302 (Tracking ID: 3364301)

SYMPTOM:
Assert failure because of improper handling of inode lock while truncating a reorg inode.

DESCRIPTION:
While truncating the reorg extent, unlock may be called on an inode even though
the lock on the inode is not taken. While truncating a reorg inode, VxFS releases the locks held, and before it acquires them
again, it checks whether the inode is a cluster inode. If it is, VxFS tries to take the delegation hold lock. If an
error occurs while taking the delegation hold lock, VxFS enters the error code path. There it checks whether there was a
transaction and whether the transaction had a committable error. If so, it commits the transaction and on success calls unlock
to release locks that were not held.

RESOLUTION:
The code is modified to check whether lock is taken or not before unlocking.

* 3364307 (Tracking ID: 3364306)

SYMPTOM:
Stack overflow seen in extent allocation code path.

DESCRIPTION:
Stack overflow appears in the vx_extprevfind() code path.

RESOLUTION:
The code is modified to hand off the extent allocation to a worker thread when stack consumption reaches 4k.

* 3364317 (Tracking ID: 3364312)

SYMPTOM:
The fsadm(1M) command is unresponsive while processing the VX_FSADM_REORGLK_MSG message. The following stack trace may be seen while processing VX_FSADM_REORGLK_MSG:

vx_tranundo()
vx_do_rct_gc()
vx_rct_setup_gc()
vx_reorg_complete_gc()
vx_reorg_complete()
vx_reorg_clear_rct()
vx_reorg_clear()
vx_reorg_clear()
vx_recv_fsadm_reorglk()
vx_recv_fsadm()
vx_msg_recvreq()
vx_msg_process_thread()
vx_thread_base()

DESCRIPTION:
In the vx_do_rct_gc() function, the flag for in-directory cleanup is set for a shared indirect extent (SHR_IADDR_EXT). If the truncation fails, the vx_do_rct_gc() function does not clear the in-directory cleanup flag. As a result, the caller ends up calling the vx_do_rct_gc() function repeatedly, leading to a never-ending loop.

RESOLUTION:
The code is modified to reset the value of in-directory cleanup flag in case of truncation error inside the vx_do_rct_gc() function.

* 3364333 (Tracking ID: 3312897)

SYMPTOM:
In Cluster File System (CFS), system can hang while trying to perform any administrative operation when the primary node is disabled.

DESCRIPTION:
In CFS, when node 1 tries to do some administrative operation which freezes and thaws the file system (e.g. turning on/off fcl), a deadlock can occur between the thaw and recovery (which started due to CFS primary being disabled) threads. The thread on node 1 trying to thaw is blocked while waiting for node 2 to reply to the loadfs message. The thread processing the loadfs message is waiting to complete the recovery operation. The recovery thread on node 2 is waiting for a lock on an extent map (emap) buffer. This lock is held on node 1, as part of a transaction that was committed during the freeze, which results in a deadlock.

RESOLUTION:
The code is modified to flush any transactions that were committed during a freeze before starting the thawing process.

* 3364335 (Tracking ID: 3331109)

SYMPTOM:
The full fsck does not repair corrupted reference count queue (RCQ) record.

DESCRIPTION:
When the RCQ record is corrupted due to an I/O error or log error, there is no code in full fsck which handles this corruption.
As a result, some further operations related to RCQ might fail.

RESOLUTION:
The code is modified to repair the corrupt RCQ entry during a full fsck.

* 3364338 (Tracking ID: 3331045)

SYMPTOM:
Kernel Oops in the map unlock code while referring to a freed mlink due to a race with the iodone routine for delayed writes.

DESCRIPTION:
After issuing ASYNC I/O of a map buffer, a race is possible between the vx_unlockmap() function and the vx_mapiodone() function. Due to the race, the vx_unlockmap() function refers to an mlink after it is freed.

RESOLUTION:
The code is modified to handle such race condition.

* 3364349 (Tracking ID: 3359200)

SYMPTOM:
Internal test on Veritas File System (VxFS) fsdedup(1M) feature in cluster filesystem environment results in
a hang.

DESCRIPTION:
The thread that processes the fsdedup(1M) request takes the delegation lock on the extent map and then waits to acquire a lock on the cluster-wide reference count queue (RCQ) buffer. Meanwhile, another internal VxFS thread that is working on the RCQ takes the lock on the cluster-wide RCQ buffer and waits to acquire the delegation lock on the extent map, causing a deadlock.

RESOLUTION:
The code is modified to correct the lock hierarchy such that the delegation lock on extent map is taken before taking lock on cluster-wide RCQ buffer.
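
A fixed lock hierarchy of this kind can be sketched with two ordinary mutexes. This is only an illustration: the two `threading.Lock` objects stand in for the extent map delegation lock and the cluster-wide RCQ buffer lock, and the helper names are hypothetical.

```python
import threading

emap_lock = threading.Lock()   # stands in for the delegation lock on the extent map
rcq_lock = threading.Lock()    # stands in for the cluster-wide RCQ buffer lock


def with_both_locks(work):
    """Every caller takes the extent map lock first, then the RCQ buffer lock."""
    with emap_lock:
        with rcq_lock:
            return work()


def run_concurrently(n=8):
    """Run n threads that all need both locks and collect their results."""
    results = []

    def worker(i):
        results.append(with_both_locks(lambda: i))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

Because no thread ever holds the second lock while waiting for the first, the circular wait described above cannot form.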

* 3370650 (Tracking ID: 2735912)

SYMPTOM:
The performance of tier relocation for moving a large number of files is poor 
when the `fsppadm enforce' command is used.  When looking at the fsppadm(1M) 
command in the kernel, the following stack trace is observed:

vx_cfs_inofindau 
vx_findino
vx_ialloc
vx_reorg_ialloc
vx_reorg_isetup
vx_extmap_reorg
vx_reorg
vx_allocpolicy_enforce
vx_aioctl_allocpolicy
vx_aioctl_common
vx_ioctl
vx_compat_ioctl

DESCRIPTION:
For each file located in Tier 1 that is relocated to Tier
2, Veritas File System (VxFS) allocates a new reorg inode and all its extents
in Tier 2. VxFS then swaps the content of these two files and deletes the 
original file. This new inode allocation which involves a lot of processing can 
result in poor performance when a large number of files are moved.

RESOLUTION:
The code is modified to develop a reorg inode pool or cache instead of 
allocating it each time.
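
The idea of reusing reorg inodes from a pool instead of allocating one per file can be sketched as a generic object pool. This is an assumption-laden illustration: the class name, the `allocate` callable, and the capacity value are all hypothetical and say nothing about how VxFS sizes or manages its pool.

```python
class ReorgInodePool:
    """Minimal object pool: reuse released objects before allocating new ones."""

    def __init__(self, allocate, capacity=4):
        self._allocate = allocate      # expensive allocator (hypothetical)
        self._free = []                # released objects available for reuse
        self._capacity = capacity      # upper bound on cached objects

    def get(self):
        # Prefer a cached object; fall back to the expensive allocation.
        return self._free.pop() if self._free else self._allocate()

    def put(self, inode):
        # Keep the object for reuse unless the cache is already full.
        if len(self._free) < self._capacity:
            self._free.append(inode)
```

When many files are relocated one after another, each `get()` after the first `put()` avoids a fresh allocation.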

* 3372909 (Tracking ID: 3274592)

SYMPTOM:
Internal noise test on Cluster File System (CFS) is unresponsive while executing the fsadm(1M) command.

DESCRIPTION:
In CFS, the fsadm(1M) command hangs in the kernel while processing the fsadm-reorganization message on a secondary node. The hang results from a race with the thread that processes the fsadm-query message for mounting the primary fileset on a secondary node, where the thread processing the fsadm-query message wins the race.

RESOLUTION:
The code is modified to synchronize the processing of fsadm-query message and fsadm-reorganization message on the primary node. This synchronization ensures that they are processed in the order in which they were received.

* 3380905 (Tracking ID: 3291635)

SYMPTOM:
Internal testing found the 'vx_freeze_block_threads_all:7c' debug assert on locally mounted file systems while processing preambles for transactions.

DESCRIPTION:
While processing preambles for transactions, if reference count queue (RCQ) is full, VxFS may hamper the processing of RCQ to free some records. This may result in hitting the debug assert.

RESOLUTION:
The code is modified to ignore the Reference count queue (RCQ) full errors when VxFS processes preambles for transactions.

* 3387358 (Tracking ID: 3349634)

SYMPTOM:
An assert failure occurs on an attempt to write to a snapped allocated HOLE.

DESCRIPTION:
If a file has a big allocated HOLE and a snap of the file has
been taken, writing to the file or the snapped file at that HOLE offset can result
in an assert failure because the HOLE extent is marked as shared between the file
and the snap file. Ideally, a write to that HOLE offset in either
file should allocate a new extent. However, while allocating the extent for the new
write, VxFS gets the same SHARED HOLE extent from the inode's bmap cache
instead of a new one. This results in the assert failure.

RESOLUTION:
The code is modified to handle this situation by invalidating the inode's bmap
cache and unsharing the shared allocated HOLE.

* 3396539 (Tracking ID: 3331093)

SYMPTOM:
MountAgent gets stuck during repeated switchover due to the current
VxFS-AMF notification/unregistration design, with the following stack trace:

sleep_spinunlock+0x61 ()
vx_delay2+0x1f0 ()
vx_unreg_callback_funcs_impl+0xd0 ()
disable_vxfs_api+0x190 ()
text+0x280 ()
amf_event_release+0x230 ()
amf_fs_event_lookup_notify_multi+0x2f0 ()
amf_vxfs_mount_opt_change_callback+0x190 ()
vx_aioctl_unsetmntlock+0x390 ()
cold_vx_aioctl_common+0x7c0 ()
vx_aioctl+0x300 ()
vx_admin_ioctl+0x610 ()
vxportal_ioctl+0x690 ()
spec_ioctl+0xf0 () 
vno_ioctl+0x350 ()
ioctl+0x410 ()
syscall+0x5b0 ()

DESCRIPTION:
This issue is related to the VxFS-AMF interface. VxFS provides
notifications to AMF for certain events, such as the file system being disabled
or the mount options changing. While VxFS has called into AMF, the AMF event
handling mechanism can trigger an unregistration of VxFS in the same context,
because the VxFS notification triggered the last event notification registered
with AMF.

Before VxFS calls into AMF, the variable vx_fsamf_busy is set to 1, and it is
reset when the callback returns. The unregistration loops for as long as it
finds vx_fsamf_busy set to 1. Because the unregistration is called from the same
context as the notification callback, vx_fsamf_busy is never set to 0 and the
loop goes on endlessly, causing the command that triggered the notification to
hang.

RESOLUTION:
A delayed unregistration mechanism is employed. The fix addresses
the case where the unregistration request from AMF arrives in the context of a
callback from VxFS to AMF. In such a scenario, the unregistration is marked for a
later time. When all the notifications have returned and a delayed
unregistration is marked, the unregistration routine is explicitly called.
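
The delayed unregistration described above can be illustrated with a minimal
Python sketch. This is not VxFS or AMF code. The class NotifierSketch, its
methods, and the field unreg_pending are hypothetical names chosen for
illustration; only the role of the busy counter is taken from the description
of vx_fsamf_busy.

```python
class NotifierSketch:
    """Hypothetical stand-in for the VxFS side of the VxFS-AMF interface."""

    def __init__(self):
        self.busy = 0                 # plays the role of vx_fsamf_busy
        self.unreg_pending = False    # delayed-unregistration marker (assumed name)
        self.registered = True
        self.callback = None          # set by the consumer (AMF in the source)

    def notify(self, event):
        """Call into the consumer, then run any unregistration deferred meanwhile."""
        if not self.registered or self.callback is None:
            return
        self.busy += 1
        try:
            self.callback(self, event)
        finally:
            self.busy -= 1
        # All notifications have returned: honor a deferred request now.
        if self.busy == 0 and self.unreg_pending:
            self.unreg_pending = False
            self._do_unregister()

    def unregister(self):
        """Unregister now, or mark for later if called from within a callback."""
        if self.busy > 0:
            # Looping here until busy == 0 would never finish, because the
            # caller is the very callback that keeps busy non-zero.
            self.unreg_pending = True
            return
        self._do_unregister()

    def _do_unregister(self):
        self.registered = False
        self.callback = None


def last_event_handler(notifier, event):
    # The consumer decides, inside the callback, to unregister the notifier.
    notifier.unregister()


n = NotifierSketch()
n.callback = last_event_handler
n.notify("mount-option-change")
print(n.registered)   # False: unregistration ran after the callback returned
```

The design choice in this sketch is that unregister() never waits while the
busy counter is non-zero; it only records the request, and notify() performs it
once the counter drops back to zero.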

* 3402484 (Tracking ID: 3394803)

SYMPTOM:
The vxupgrade(1M) command causes VxFS to panic with the following stack trace:
panic_save_regs_switchstack()
panic
bad_kern_reference()
$cold_pfault()
vm_hndlr()
bubbleup()
vx_fs_upgrade()
vx_upgrade()
$cold_vx_aioctl_common()
vx_aioctl()
vx_ioctl()
vno_ioctl()
ioctl()
syscall()

DESCRIPTION:
The panic is caused by dereferencing a NULL device (one of the devices in the
DEVLIST shows as a NULL device).

RESOLUTION:
The code is modified to skip the NULL devices when the devices in the DEVLIST
are processed.
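
As a rough illustration of the fix, the following Python sketch skips missing
entries while walking a device list. The function process_devlist and the
sample device records are hypothetical; None stands in for a NULL device.

```python
def process_devlist(devlist, handler):
    """Apply handler to each device, skipping missing (None) entries."""
    processed = []
    for index, dev in enumerate(devlist):
        if dev is None:
            continue      # using this entry unchecked is the failure mode
        processed.append(handler(index, dev))
    return processed


devices = [{"name": "dev0"}, None, {"name": "dev2"}]
print(process_devlist(devices, lambda i, d: d["name"]))  # ['dev0', 'dev2']
```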

* 3405172 (Tracking ID: 3436699)

SYMPTOM:
An assert failure occurs because of a race condition between the clone mount thread and the directory removal thread while pushing data on the clone.

DESCRIPTION:
There is a race condition between the clone mount thread and the directory removal thread (while pushing modified directory data on the clone). On AIX, vnodes are added into the VFS vnode list (a linked list of vnodes). The first entry in this vnode list must be the root's vnode, which is added during the mount process. While mounting a clone, the mount thread is scheduled out before it adds the root's vnode into this list. During this time, the second thread takes the VFS lock on the same VFS list and tries to enter the directory's vnode into the vnode list. Because no root vnode is present at the start of the list, this directory vnode is assumed to be the root vnode, and the assert fails when this is cross-checked against the VROOT flag.

RESOLUTION:
The code is modified to handle the race condition by attaching the root vnode to the VFS vnode list before setting the VFS pointer in the fileset.

* 3409617 (Tracking ID: 3369049)

SYMPTOM:
VxFS 6.0.3 file system may hang with partitioned directory (PD) enabled on HP-UX 11.31 with the following stack trace:
vx_recsmp_rangelock()
inline vx_irwlock()
inline vx_irwlock()
vx_rwlock4()
vx_int_rename()
vx_do_rename()
vx_rename1()

DESCRIPTION:
When the partitioned directory (PD) feature of VxFS is enabled, a deadlock may occur if multiple renaming threads operate on the same target directory. The issue occurs because the renaming threads can issue blocking calls to acquire locks on the multiple directories involved in the rename operation, and there is no definite order in the locking hierarchy for multiple directory locks.

RESOLUTION:
The code is modified to call the trylock operation instead of making blocking calls on the source directory. If the trylock operation fails, the rename operation is retried.
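
The trylock-and-retry approach can be sketched in Python as follows. This is
an illustration only, not the VxFS implementation. The function
rename_with_trylock, the choice of which lock is taken with a blocking call,
and the randomized back-off are all assumptions made for the example.

```python
import random
import threading
import time


def rename_with_trylock(target_lock, source_lock, do_rename, max_attempts=1000):
    """Take target_lock blocking and source_lock non-blocking; retry on failure.

    Returns the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        with target_lock:
            # Non-blocking attempt: never sleep on the second directory lock
            # while holding the first, so no cycle of waiters can form.
            if source_lock.acquire(blocking=False):
                try:
                    do_rename()
                    return attempt
                finally:
                    source_lock.release()
        # target_lock has been dropped here; back off briefly and start over.
        time.sleep(random.uniform(0, 0.002))
    raise RuntimeError("rename did not succeed within max_attempts")


# Two threads rename in opposite directions between two directories.
dir_a, dir_b = threading.Lock(), threading.Lock()
done = []
t1 = threading.Thread(target=rename_with_trylock,
                      args=(dir_a, dir_b, lambda: done.append("a<-b")))
t2 = threading.Thread(target=rename_with_trylock,
                      args=(dir_b, dir_a, lambda: done.append("b<-a")))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(done))   # ['a<-b', 'b<-a']
```

With two blocking acquisitions in opposite orders, the two threads could wait
on each other forever. In the sketch, a failed trylock releases the lock that
is already held before retrying, so one of the threads can always make
progress.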

* 3430687 (Tracking ID: 3444775)

SYMPTOM:
Internal noise testing on Cluster File System (CFS) results in a kernel panic in the vx_fsadm_query() function with the following error message: "Unable to handle kernel paging request".

DESCRIPTION:
The issue occurs due to simultaneous asynchronous access to or modification of the inode list extent array by two threads. As a result, memory freed by one thread is accessed by the other thread, resulting in the panic.

RESOLUTION:
The code is modified to add relevant locks to synchronize access or modification of inode list extent array.

* 3436393 (Tracking ID: 3462694)

SYMPTOM:
The fsdedupadm(1M) command fails with error code 9 when it tries to
mount checkpoints on a cluster.

DESCRIPTION:
While mounting checkpoints, the fsdedupadm(1M) command fails to
parse the cluster mount option correctly, resulting in the mount failure.

RESOLUTION:
The code is modified to parse cluster mount options correctly in the
fsdedupadm(1M) operation.

Patch ID: PVKL_03971

* 2933290 (Tracking ID: 2756779)

SYMPTOM:
Read and write performance is degraded on Cluster File System (CFS) when
running applications that rely on POSIX file-record locking (fcntl).

DESCRIPTION:
The usage of fcntl on CFS leads to high messaging traffic across nodes thereby 
reducing the performance of readers and writers.

RESOLUTION:
The code is modified to cache, on the node, the ranges that are being
file-record locked. The cache is used whenever possible to avoid broadcasting
messages across the nodes in the cluster.

* 2933291 (Tracking ID: 2806466)

SYMPTOM:
A reclaim operation on a file system that is mounted on an LVM volume, using the
fsadm(1M) command with the -R option, may panic the system with the following
stack trace:
vx_dev_strategy+0xc0() 
vx_dummy_fsvm_strategy+0x30() 
vx_ts_reclaim+0x2c0() 
vx_aioctl_common+0xfd0() 
vx_aioctl+0x2d0() 
vx_ioctl+0x180()

DESCRIPTION:
Thin reclamation is supported only for file systems that are mounted on a VxVM
volume.

RESOLUTION:
The code is modified to return errors without panicking the system if the
underlying volume is LVM.

* 2933292 (Tracking ID: 2895743)

SYMPTOM:
It takes longer than usual for many Windows 7 clients to log off in parallel if
the user profile is stored in Cluster File System (CFS).

DESCRIPTION:
Veritas File System (VxFS) keeps the file creation time and full ACL
information for Samba clients in an extended attribute, which is implemented
via named streams. VxFS reads the named stream for each of the ACL objects.
Reading a named stream is a costly operation, as it results in an open, an
opendir, a lookup, and another open to get the fd. The VxFS function
vx_nattr_open() holds the exclusive rwlock to read an ACL object that is stored
as an extended attribute. This may cause heavy lock contention when many threads
want the same lock. They might get blocked until one of the nattr_open calls
releases it. This takes time because nattr_open is very slow.

RESOLUTION:
The code is modified so that it takes the rwlock in shared mode instead of
exclusive mode.

* 2933294 (Tracking ID: 2750860)

SYMPTOM:
Performance of the write operation with small request size may degrade
on a large Veritas File System (VxFS) file system. Many threads may be found
sleeping with the following stack trace:

vx_sleep_lock
vx_lockmap
vx_getemap
vx_extfind
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_uplevel
vx_searchau+0x600
vx_extentalloc_device
vx_extentalloc
vx_te_bmap_alloc
vx_bmap_alloc_typed
vx_bmap_alloc
vx_write_alloc3
vx_recv_prealloc
vx_recv_rpc
vx_msg_recvreq
vx_msg_process_thread
kthread_daemon_startup

DESCRIPTION:
A VxFS allocation unit (AU) is composed of 32768 disk blocks. An AU is
expanded when it is partially allocated, and non-expanded when it is fully
occupied or completely unused. The extent map for a large file system with a
1 KB block size is organized as a big tree. For example, a 4 TB file system with
a 1 KB file system block size can have up to 128K AUs. To find an appropriate
extent, the VxFS extent allocation algorithm first searches the expanded AUs by
traversing the free extent map tree, to avoid causing free space fragmentation.
If that search fails, it does the same with the non-expanded AUs. When there
are too many requests for small extents (less than 32768 blocks), all the small
free extents are used up, and a large number of AU-size extents (32768 blocks)
are available, the file system can run into this hang. Because no small extents
are available in the expanded AUs, VxFS comes across the larger non-expanded
extents, namely the AU-size extents, which are not what it wants at this stage
(an expanded AU is expected). As a result, each request walks the big extent map
tree for every AU-size extent and finally fails. The requested extent is
eventually obtained during the second attempt, from the non-expanded AUs, but
the unnecessary work consumes a lot of CPU resources.
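
The AU count quoted in the example above follows directly from the AU size.
The short Python calculation below reproduces it, assuming that TB and KB are
binary units (powers of 1024); the function au_count is a hypothetical helper
written for this illustration.

```python
BLOCKS_PER_AU = 32768          # from the description above


def au_count(fs_size_bytes, block_size_bytes):
    """Number of allocation units needed to cover the file system."""
    au_size_bytes = BLOCKS_PER_AU * block_size_bytes
    return -(-fs_size_bytes // au_size_bytes)   # ceiling division


TIB = 1024 ** 4                # assumption: 1 TB is treated as 2**40 bytes
print(au_count(4 * TIB, 1024))   # 131072, that is 128K AUs
print(au_count(4 * TIB, 8192))   # 16384, that is 16K AUs
```

With a 1 KB block size each AU covers 32 MB, which is why the same file system
has eight times as many AUs as it would with an 8 KB block size.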

RESOLUTION:
The code is modified to optimize the free-extent-search algorithm by
skipping certain AU-size extents to reduce the overall search time.

* 2933295 (Tracking ID: 2730759)

SYMPTOM:
The sequential read performance is poor because of the read-ahead issues.

DESCRIPTION:
The read-ahead on sequential reads is performed incorrectly because the wrong
read-advisory and read-ahead pattern offsets are used to detect and perform the
read-ahead. Also, more synchronous reads are performed, which can affect the
performance.

RESOLUTION:
The code is modified and the read-ahead pattern offsets are updated correctly 
to detect and perform the read-ahead at the required offsets. The read-ahead 
detection is also modified to reduce the sync reads.

* 2933296 (Tracking ID: 2923105)

SYMPTOM:
Removing the Veritas File System (VxFS) module using rmmod(8) on a system 
having heavy buffer cache usage may hang.

DESCRIPTION:
When a large number of buffers are allocated from the buffer cache, at the time 
of removing VxFS module, the process of freeing the buffers takes a long time.

RESOLUTION:
The code is modified to use an improved algorithm that does not continue
traversing the free lists once it has found a free chunk. Instead, it breaks
out of the search and frees that buffer.

* 2933309 (Tracking ID: 2858683)

SYMPTOM:
The reserve-extent attributes are changed after the vxrestore(1M) operation,
for files that are greater than 8192 bytes.

DESCRIPTION:
A local variable that contains the number of reserve bytes is reused during
the vxrestore(1M) operation for a further VX_SETEXT ioctl call, for files that
are greater than 8192 bytes. As a result, the attribute information is changed.

RESOLUTION:
The code is modified to preserve the original variable value till the end of 
the function.

* 2933313 (Tracking ID: 2841059)

SYMPTOM:
The file system gets marked for a full fsck operation and the following message 
is displayed in the system log:

V-2-96: vx_setfsflags 
<volume name> file system fullfsck flag set - vx_ierror

vx_setfsflags+0xee/0x120 
vx_ierror+0x64/0x1d0 [vxfs]
vx_iremove+0x14d/0xce0 
vx_attr_iremove+0x11f/0x3e0
vx_fset_pnlct_merge+0x482/0x930
vx_lct_merge_fs+0xd1/0x120
vx_lct_merge_fs+0x0/0x120 
vx_walk_fslist+0x11e/0x1d0
vx_lct_merge+0x24/0x30 
vx_workitem_process+0x18/0x30
vx_worklist_process+0x125/0x290
vx_worklist_thread+0x0/0xc0
vx_worklist_thread+0x6d/0xc0
vx_kthread_init+0x9b/0xb0 

 V-2-17: vx_iremove_2
<volume name>: file system inode 15 marked bad incore

DESCRIPTION:
Due to a race condition, the thread tries to remove an attribute inode that has 
already been removed by another thread. Hence, the file system is marked for a 
full fsck operation and the attribute inode is marked as 'bad ondisk'.

RESOLUTION:
The code is modified to check whether the attribute inode that a thread is
trying to remove has already been removed.

* 2933337 (Tracking ID: 2616622)

SYMPTOM:
The performance of the mmap() function is slow when the file system block size 
is 8 KB and the page size is 4 KB.

DESCRIPTION:
When the file system block size is 8 KB, the page size is 4 KB, and the mmap() 
function is performed on an 8 KB file, the file gets represented in memory as 
two pages (0 and 1). When the memory at offset 0 in the mapping is modified, a 
page fault occurs for page 0 in the file. When that disk 
block is allocated and marked as valid, the page mentioned in the fault request 
is expected to get flushed out to the disk and therefore, it is left 
uninitialized on the disk by default. Only that particular page is cleaned in 
memory and left modified so that it is known that the data in memory is more 
recent than the data on disk. However, the other half of the block (which could 
eventually be mapped to page 1) gets cleared with a synchronous write because 
such a fault may not occur. This synchronous clearing of the other half of 8 KB 
block causes performance degradation.

RESOLUTION:
The code is modified to expand the range of the fault to cover the entire 8 KB 
block. The message from the OS asking for only one page is ignored and two 
pages are given to cover the entire file system block to save the separate 
synchronous clearing of the other half of 8 KB block.
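
The range expansion can be illustrated with a small Python calculation. This
is a sketch of the arithmetic only, not of the VxFS fault handler; the
functions expand_fault_range and pages_in_range are hypothetical helpers.

```python
def expand_fault_range(fault_offset, page_size=4096, block_size=8192):
    """Return (start, length) of the block-aligned range containing the fault."""
    if block_size % page_size != 0:
        raise ValueError("block size must be a multiple of the page size")
    start = (fault_offset // block_size) * block_size
    return start, block_size


def pages_in_range(start, length, page_size=4096):
    """Page indexes covered by the byte range [start, start + length)."""
    return list(range(start // page_size, (start + length) // page_size))


start, length = expand_fault_range(0)
print(start, length, pages_in_range(start, length))   # 0 8192 [0, 1]
start, length = expand_fault_range(4096 * 3)
print(start, length, pages_in_range(start, length))   # 8192 8192 [2, 3]
```

A fault on either page of a block yields the same two-page range, so both
halves of the 8 KB block are handled by one fault instead of one fault plus a
synchronous clear.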

* 2933571 (Tracking ID: 2417858)

SYMPTOM:
When the hard or soft quota limit is specified above 1 TB, the command
fails with an error.

DESCRIPTION:
VxFS stores the quota records that correspond to users in the external and
internal quota files. In the external quota file, the records are stored in
32-bit structures, so block limits can be specified only up to a 32-bit value
(1 TB). This limit is insufficient in many cases.

RESOLUTION:
The code is modified to use 64-bit structures and 64-bit limit macros so that
users can have usage and limits greater than 1 TB.

* 2933751 (Tracking ID: 2916691)

SYMPTOM:
The fsdedup operation enters an infinite loop with the following stack trace:

#5 [ffff88011a24b650] vx_dioread_compare at ffffffffa05416c4 
#6 [ffff88011a24b720] vx_read_compare at ffffffffa05437a2 
#7 [ffff88011a24b760] vx_dedup_extents at ffffffffa03e9e9b 
#11 [ffff88011a24bb90] vx_do_dedup at ffffffffa03f5a41 
#12 [ffff88011a24bc40] vx_aioctl_dedup at ffffffffa03b5163

DESCRIPTION:
vx_dedup_extents() does the following to dedup two files:

1. Compare the data extents of the two files that need to be deduped.
2. Split both files' bmaps to make them share the first file's common data
   extent.
3. Free the duplicate data extent of the second file.

In step 2, during the bmap split, vx_bmap_split() might need to allocate space
for the inode's bmap to add new bmap entries, which adds an emap to this
transaction. (This condition is more likely to be hit if the dedup is run on
two large files that have interleaved duplicate/different data extents, because
the files' bmaps need to be split more often in this case.)

In step 3, vx_extfree1() does not support a multi-AU extent free if there is
already an emap in the same transaction. In this case, it returns
VX_ETRUNCMAX. (See incident e569695 for the history of this limitation.)

VX_ETRUNCMAX is a retriable error, so vx_dedup_extents() undoes everything in
the transaction and retries from the beginning, then hits the same error again.
This results in an infinite loop.

RESOLUTION:
The code is modified so that vx_te_bmap_split() always registers a transaction
preamble for the bmap split operation in dedup, and vx_dedup_extents() performs
the preamble in a separate transaction before it retries the dedup operation.
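
The difference between retrying blindly and retrying after a preamble can be
sketched in Python as follows. This is a simplified model, not VxFS code. The
class DedupSketch, the exception RetriableError, and the flag bmap_has_room are
hypothetical; RetriableError stands in for a retriable error such as
VX_ETRUNCMAX.

```python
class RetriableError(Exception):
    """Stand-in for a retriable error such as VX_ETRUNCMAX in the source."""


class DedupSketch:
    def __init__(self):
        self.bmap_has_room = False   # False: the split must allocate bmap space
        self.attempts = 0

    def run_preamble(self):
        # Separate transaction: allocate the bmap space ahead of time.
        self.bmap_has_room = True

    def dedup_transaction(self):
        self.attempts += 1
        if not self.bmap_has_room:
            # The split would add an emap to this transaction, and the
            # multi-AU free in the same transaction is then refused.
            raise RetriableError("split and multi-AU free in one transaction")
        return "deduped"


def dedup_with_preamble(op, max_retries=5):
    for _ in range(max_retries):
        try:
            return op.dedup_transaction()
        except RetriableError:
            # Without this call the retry would hit the same error every time.
            op.run_preamble()
    raise RuntimeError("dedup did not converge")


op = DedupSketch()
print(dedup_with_preamble(op), op.attempts)   # deduped 2
```

The retry succeeds on the second attempt because the state that caused the
error is changed between attempts; retrying without the preamble would repeat
the first attempt unchanged.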

* 2933822 (Tracking ID: 2624262)

SYMPTOM:
A panic occurs in the vx_bc_do_brelse() function while executing the dedup
functionality, with the following backtrace:
vx_bc_do_brelse()
vx_mixread_compare()
vx_dedup_extents()
enqueue_entity()
__alloc_pages_slowpath()
__get_free_pages()
vx_getpages()
vx_do_dedup()
vx_aioctl_dedup()
vx_aioctl_common()
vx_rwunlock()
vx_aioctl()
vx_ioctl()
vfs_ioctl()
do_vfs_ioctl()
sys_ioctl()

DESCRIPTION:
While executing the vx_mixread_compare() function in the dedup code path, an
error is hit, due to which an allocated data structure remains uninitialized.
The panic occurs due to writing to this uninitialized allocated data structure
in the vx_mixread_compare() function.

RESOLUTION:
The code is modified to free the memory allocated to the data structure when
the function exits due to an error.

* 2937367 (Tracking ID: 2923867)

SYMPTOM:
An assert is hit because VX_RCQ_PROCESS_MSG has a numerically lower priority
than VX_IUPDATE_MSG.

DESCRIPTION:
When the primary is about to send a VX_IUPDATE_MSG message to the owner of an
inode about a change to the inode's non-transactional fields, it compares the
current messaging priority (for VX_RCQ_PROCESS_MSG) with the priority of the
message being sent (VX_IUPDATE_MSG) to avoid a possible deadlock. In this case,
the VX_RCQ_PROCESS_MSG priority was numerically lower than the VX_IUPDATE_MSG
priority, which caused the assert to be hit.

RESOLUTION:
The code is modified to make the VX_RCQ_PROCESS_MSG priority numerically higher
than the VX_IUPDATE_MSG priority, which avoids the assert.

* 2976664 (Tracking ID: 2906018)

SYMPTOM:
In the event of a system crash, the fsck intent log is not replayed and the file
system is marked clean. Subsequently, when the file system is mounted, the
extended operations are not completed.

DESCRIPTION:
Only file systems that contain PNOLTs and are mounted locally (mounted
without using 'mount -o cluster') are potentially exposed to this issue.

fsck silently skips the intent-log replay because each PNOLT has a flag that
identifies whether the intent log is dirty or not. In the event of a system
crash, this flag signifies whether intent-log replay is required. If the
system crashes while the file system is mounted locally, the PNOLTs are not
utilized. The fsck intent-log replay still checks the flags in the PNOLTs,
but these are the wrong flags to check if the file system was locally mounted.
The fsck intent-log replay therefore assumes that the intent logs are clean
(because the PNOLTs are not marked dirty) and skips the replay of the intent
log altogether.

RESOLUTION:
The code is modified such that when PNOLTs exist in the file system, VxFS will 
set the dirty flag in the CFS primary PNOLT while mounting locally. With this 
change, in the event of system crash whilst a file system is locally mounted, 
the subsequent fsck intent-log replay will correctly utilize the PNOLT 
structures and successfully replay the intent log.

* 2978227 (Tracking ID: 2857751)

SYMPTOM:
The internal testing hits the assert "f:vx_cbdnlc_enter:1a" when the upgrade was
in progress.

DESCRIPTION:
The clone/fileset should be mounted if there is an attempt to add an entry in
the dnlc. If the clone/fileset is not mounted and still there is an attempt to
add it to dnlc, then it is not valid.

RESOLUTION:
The code is modified to check whether the fileset is mounted before adding an entry to the dnlc.

* 2984589 (Tracking ID: 2977697)

SYMPTOM:
Deleting checkpoints of file systems that contain character special device
files, such as /dev/null, using fsckptadm may panic the machine with the
following stack trace:
vx_idetach
vx_inode_deinit
vx_idrop
vx_inull_list
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init

DESCRIPTION:
During the checkpoint removal operation, the type of the inodes is
converted to 'pass through inode'. During the conversion, VxFS refers to the
device reference for the special file, which is invalid in the clone context,
leading to a panic.

RESOLUTION:
The code is modified to remove device reference of the special character files
during the clone removal operation thus preventing the panic.

* 2987373 (Tracking ID: 2881211)

SYMPTOM:
File ACLs are not preserved properly in checkpoints if the file has a hard link.
ACLs of files that do not have hard links are preserved correctly.

DESCRIPTION:
This issue is with the attribute inode. When an ACL entry is added and it is in
the immediate area, it is propagated to the clone. However, if an attribute
inode is created, it is not propagated to the checkpoint. The push is missing
in the context of the attribute inode, which causes this issue.

RESOLUTION:
The code is modified to propagate the ACL entries (attribute inode case) to the clone.

* 2999582 (Tracking ID: 2850730)

SYMPTOM:
The VxFS internal LM Conformance tests hit the assert "f:vx_do_getpage:6b, 3"
and the system panics.

DESCRIPTION:
The assertion "vx_do_getpage:6b, 3" was originally added to verify that
shared/compressed extents are unshared/uncompressed properly prior to any
write-side fault in the given range. However, this assumption might fail in the
presence of superpages.

In the absence of this assert, the code proceeds further. The pages in the
range that was unshared/uncompressed during vx_write1() are promoted from
read-only to read-write. The pages in the extended range, which are still
backed by compressed/shared extents, remain marked as read-only. These uneven
page attributes break up the superpage, and processing proceeds as normal.

RESOLUTION:
The code is modified to relax the assert.

Patch ID: PVKL_03964

* 2912412 (Tracking ID: 2857629)

SYMPTOM:
When a new node takes over a primary for the file system, it could 
process stale shared extent records in a per node queue.  The primary will 
detect a bad record and set the full fsck flag.  It will also disable the file 
system to prevent further corruption.

DESCRIPTION:
Every node in the cluster that adds or removes references to shared extents, 
adds the shared extent records to a per node queue.  The primary node in the 
cluster processes the records in the per node queues and maintains reference 
counts in a global shared extent device.  In certain cases the primary node 
might process bad or stale records in the per node queue.  Two situations under 
which bad or stale records could be processed are:
    1. clone creation initiated from a secondary node immediately after primary 
migration to different node.
    2. queue wraparound on any node and take over of primary by new node 
immediately afterwards.
Full fsck might not be able to rectify the file system corruption.

RESOLUTION:
The code is modified to update the per node shared extent queue head and tail
pointers to the correct values on the primary before it starts processing
shared extent records.

* 2923805 (Tracking ID: 2590918)

SYMPTOM:
When a new node in the cluster takes over as primary of the file system,
there might be a significant delay in freeing up unshared extents.  This problem
can occur only when shared extent additions or deletions occur immediately
after the primary switches over to a different node in the cluster.

DESCRIPTION:
When a new node in the cluster takes over as primary for the file 
system, a file system thread in the new primary performs a full scan of the 
shared extent device file to free up any shared extents that have become 
completely unshared.  If heavy shared extent related activity such as additional 
sharing or unsharing of extents were to occur anywhere in the cluster while the 
full scan was being performed, the full scan could get interrupted.  Due to a 
bug, the full scan is marked as completed and further scheduled scans of the 
shared extent device are partial scans.  This causes a substantial delay in 
freeing up some of the unshared extents in the device file.

RESOLUTION:
The code is modified so that if the first full scan of the shared extent device
upon primary takeover gets interrupted, the full scan is not marked as complete.
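
The effect of the fix can be modeled with a short Python sketch. This is an
illustration only; the class SharedExtentScanner, its scan method, and the
interrupted_at parameter are hypothetical and do not correspond to VxFS
interfaces.

```python
class SharedExtentScanner:
    def __init__(self):
        self.full_scan_done = False

    def scan(self, records, interrupted_at=None):
        """Scan records and return the kind of scan that ran."""
        kind = "partial" if self.full_scan_done else "full"
        for i, _ in enumerate(records):
            if interrupted_at is not None and i == interrupted_at:
                # Interrupted: leave full_scan_done untouched so that the next
                # pass is a full scan again (the bug marked it done here).
                return kind + "-interrupted"
        if kind == "full":
            self.full_scan_done = True
        return kind


s = SharedExtentScanner()
print(s.scan(range(10), interrupted_at=3))  # full-interrupted
print(s.scan(range(10)))                    # full
print(s.scan(range(10)))                    # partial
```

Only a full scan that runs to completion switches later scans to partial mode,
so an interrupted first scan is followed by another full scan.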



INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
To install the patch, perform the following steps on at least one node in the cluster:
1. Copy the hotfix fs-hpux1131-6.0.5.100-patches.tar.gz to /tmp
2. Untar fs-hpux1131-6.0.5.100-patches.tar.gz to /tmp/hf
    # mkdir /tmp/hf
    # cd /tmp/hf
    # gunzip /tmp/fs-hpux1131-6.0.5.100-patches.tar.gz
    # tar xf /tmp/fs-hpux1131-6.0.5.100-patches.tar
3. Install the hotfix
    # pwd
    /tmp/hf
    # ./installFS605P1 [<host1> <host2>...]

Install the patch manually:
--------------------------
Installing VxFS 6.0.1 patch:
a) If you install this patch on a CVM cluster, install it on one system at a time so that all the nodes are not brought down simultaneously.
b) VxFS 6.0.1 (GA) must be installed before applying these patches.
c) To verify the VERITAS file system level, enter:
     # swlist -l product | egrep -i 'VRTSvxfs'
  VRTSvxfs              6.0.100.000    VERITAS File System
d) All prerequisite/corequisite patches must be installed. The kernel patch requires a system reboot for both installation and removal.
e) To install the patch, enter the following command:
# swinstall -x autoreboot=true -s <patch_directory> PVKL_04037 PVCO_04036
In case the patch is not registered, register it using the following command:
# swreg -l depot <patch_directory>
where <patch_directory> is the absolute path where the patch resides.


REMOVING THE PATCH
------------------
a) To remove the patch, enter the following command:
# swremove -x autoreboot=true PVKL_04037 PVCO_04036


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE