fs-hpux1131-5.1SP1RP3P11

 Basic information
Release type: P-patch
Release date: 2018-03-19
OS update support: None
Technote: None
Documentation: None
Download size: 68.18 MB
Checksum: 3381911975

 Applies to one or more of the following products:
Storage Foundation 5.1SP1 On HP-UX 11i v3 (11.31)
Storage Foundation Cluster File System 5.1SP1 On HP-UX 11i v3 (11.31)
Storage Foundation for Oracle RAC 5.1SP1 On HP-UX 11i v3 (11.31)
Storage Foundation HA 5.1SP1 On HP-UX 11i v3 (11.31)

 Obsolete patches, incompatibilities, superseded patches, or other requirements:

This patch supersedes the following patches (release date):
fs-hpux1131-5.1SP1RP3P10 (obsolete) 2017-09-20
fs-hpux1131-5.1SP1RP3P9 (obsolete) 2017-07-06
fs-hpux1131-5.1SP1RP3P8 (obsolete) 2017-01-23
fs-hpux1131-5.1SP1RP3P4 (obsolete) 2015-05-22
fs-hpux1131-5.1SP1RP3P2 (obsolete) 2014-08-04

 Fixes the following incidents:
2755784, 2801689, 2857465, 2930507, 2932216, 2937310, 3011828, 3024028, 3024042, 3024049, 3024052, 3024088, 3042340, 3042341, 3042352, 3042357, 3042373, 3042407, 3042427, 3042479, 3042501, 3047980, 3073371, 3131795, 3131824, 3131885, 3131920, 3138653, 3138663, 3138668, 3138675, 3138695, 3141278, 3141428, 3141433, 3141440, 3141445, 3142476, 3159607, 3160205, 3207096, 3226404, 3235517, 3243204, 3248982, 3249151, 3261334, 3261782, 3261849, 3262025, 3396530, 3410567, 3435207, 3471150, 3471152, 3471165, 3484316, 3526848, 3527418, 3537201, 3537431, 3560610, 3561998, 3597563, 3615527, 3615530, 3615532, 3660347, 3669985, 3669994, 3682335, 3706705, 3751305, 3754049, 3755915, 3769384, 3796751, 3800361, 3803799, 3803825, 3803849, 3829948, 3831186, 3832381, 3832587, 3859665, 3860554, 3864335, 3868662, 3873853, 3875839, 3876990, 3877524, 3878643, 3879382, 3879793, 3879796, 3879805, 3879887, 3880073, 3889125, 3896397, 3900972, 3910815, 3916659, 3920069, 3926568, 3934419

 Patch ID:
PHCO_44712
PHKL_44710

Readme file
                          * * * READ ME * * *
              * * * Veritas File System 5.1 SP1 RP3 * * *
                         * * * P-patch 11 * * *
                         Patch Date: 2018-03-15


This document provides the following information:

   * PATCH NAME
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * SUMMARY OF INCIDENTS FIXED BY THE PATCH
   * DETAILS OF INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas File System 5.1 SP1 RP3 P-patch 11


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
HP-UX 11i v3 (11.31)


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas Storage Foundation 5.1 SP1
   * Veritas Storage Foundation Cluster File System 5.1 SP1
   * Veritas Storage Foundation for Oracle RAC 5.1 SP1
   * Veritas Storage Foundation HA 5.1 SP1


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: PHKL_44710, PHCO_44712
* 3926568 (3926565) Kernel tunable vx_ninode(5) has incorrect dependency on another tunable ninode(5).
* 3934419 (3934418) System hang may be observed during massive file removal on a large file system
with a small file system block size.
Patch ID: PHKL_44651
* 3831186 (3784990) CPU contention is observed during file system freeze.
* 3920069 (3915962) vx_do_setacl results in panic as the ACL buffer is invalid.
Patch ID: PHKL_44613
* 3910815 (3910526) fsadm fails with error number 28 during resize operation
* 3916659 (3910248) ODM IO may get hung while performing read/write operations.
Patch ID: PHKL_44567
* 3860554 (3873078) Read performance degraded due to smaller initial read ahead size.
* 3877524 (3869091) Mmap write operation receives a SIGBUS signal due to ENOSPC error when the file
system has less free space than the page size.
* 3896397 (3896396) Performance degradation may be observed in case of fcache invalidation of read
ahead pages.
* 3900972 (3900971) System call sendfile may hang in retrieving pages in VxFS
Patch ID: PHKL_44512
* 3868662 (3868661) The vxfsstat(1M) command displays incorrect file system statistics.
For example:
$ vxfsstat -v <mount_point>
---
vxi_alloc_emap          -1085366410684661760    vxi_alloc_smap          -1
vxi_alloc_expand_retry  -6317191817700346753    vxi_alloc_find_retry    -4612083491882057801
---
* 3873853 (3811849) On cluster file system (CFS), while executing lookup() function in a directory
with Large Directory Hash (LDH), the system panics and displays an error.
* 3875839 (3875837) Memory mapped read performance may get impacted due to code fix for sendfile
performance improvement.
* 3876990 (3873624) In case of memory-mapped I/O, synchronous buffers allocated from the
FC_BUF_DEFAULT arena may not be freed if the error flag is set on them.
* 3878643 (3878641) In case of Cluster File System (CFS), if a file system is disabled during
file/directory creation, the thread that creates the inode may hang.
* 3879382 (3879381) The dynamic minimum value of vxfs_bc_bufhwm on VxFS 5.1SP1 is set to a much
larger value than previous releases when there are more than 16 CPUs on the
system. The larger value is not optimal for HPVM use case which typically wants
to minimize the memory usage on the VSP.
* 3879793 (3879792) When a large number of files or directories are deleted, File System Queue (FSQ)
spinlock contention may be observed.
* 3879796 (3879795) When a large number of files or directories are deleted, File System Queue (FSQ)
spinlock contention may be observed.
* 3879805 (3879804) On Cluster File System (CFS) with a single CPU or a core machine as a node in
the cluster, directory or file creation threads may hang.
* 3879887 (3805124) Using kctune(1M) to change vxfs_bc_bufhwm(5) gives error even though the new
value is larger than the current value.
* 3880073 (3877000) When the number of Inode Allocation Units (IAUs) on the root VxFS file system
of an HP-UX system exceeds 256, the system boot hangs and displays
an error.
* 3889125 (3867995) Veritas File System (VxFS) worker threads hang while flushing the transaction
log buffer to the disk.
Patch ID: PHKL_44439
* 3829948 (1482790) System panics when VxFS DMAPI is used.
* 3832381 (3767366) VxFS read-ahead feature may not work, if the read request is greater than the 
read-ahead request.
* 3832587 (3832584) Files and directories on the VxFS file system are not able to inherit the 
default ACL entries of  other objects resulting in incorrect inherited-
permission for files and directories created under the parent directory.
* 3859665 (2767579) System may hang during reverse-name DNLC (Directory Name Lookup Cache) lookup 
operation on the file system.
* 3864335 (3864333) Cluster node may panic in the vx_dnlc_recent_cookie() function, when another
Cluster File System (CFS) node has higher number of CPUs.
Patch ID: PHKL_44293
* 3796751 (3784126) The mmap pages are invalidated during the file system freeze.
* 3800361 (3602322) System panics while flushing the dirty pages of the inode.
* 3803799 (3751205) The system may panic during the stop operation of the high availability cluster.
* 3803825 (3331093) The MountAgent process gets stuck when repeated switchover is performed, due
to the current VxFS-Asynchronous Monitoring Framework (AMF)
notification/unregistration design.
* 3803849 (3807129) The file system resize operation may result in panic on an almost full file 
system.
Patch ID: PHKL_44268
* 3751305 (2439108) System crashes when the read_preferred_io tunable is set to a non-page aligned 
size.
* 3754049 (3718924) On cluster file system (CFS), the file system I/O hangs for a few seconds and 
all the file pages are invalidated during the hang.
* 3769384 (3673599) The effective user permission is incorrectly displayed in the getacl(1M) 
command output.
Patch ID: PHKL_44215
* 3537201 (3469644) The system panics in the vx_logbuf_clean() function.
* 3597563 (3597482) The pwrite(2) function fails with the EOPNOTSUPP error.
* 3615527 (3604750) The kernel loops during the extent re-org.
* 3615530 (3466020) File system is corrupted with an error message "vx_direrr: vx_dexh_keycheck_1".
* 3615532 (3484336) The fidtovp() system call panics in the vx_itryhold_locked() function.
* 3660347 (3660342) VxFS 5.1SP1 package does not set the MANPATH environment variable correctly.
* 3669985 (3669983) When the file system with disk-layout version (DLV) 4 or 5 is in use, and 
the quota option is enabled, the system panics.
* 3669994 (3669989) On Cluster File System (CFS) and on high-end machines, spinlock contention
may be observed when new files are created in parallel on a nearly full file
system.
* 3682335 (3637636) Cluster File System (CFS) node initialization and protocol upgrade may hang during the rolling upgrade.
* 3706705 (3703176) The tunables to enable or disable VxFS inactive-thread throttling and VxFS
inactive-thread process throttling were not available.
* 3755915 (3755927) Space allocation to a file during write may hang on the file system if the 
remaining intent log space is low.
Patch ID: PHKL_44140
* 3526848 (3526845) The Data Translation Lookaside Buffer (DTLB) panic may occur when the directory
entries are read.
* 3527418 (3520349) When there is a huge number of dirty pages in the memory, and a sparse write is 
performed at a large offset of 4TB or above, on an existing file that is not 
null, the file system hangs.
* 3537431 (3537414) The "mount -v" command does not display the "nomtime" mount option when the 
file system is mounted with the "nomtime" mount option.
* 3560610 (3560187) The kernel may panic when the buffer is freed in the vx_dexh_preadd_space() 
function with the message "Data Key Miss Fault in kernel mode".
* 3561998 (2387439) An internal debug assert is hit when the conformance test is run for the 
partitioned-directory feature.
Patch ID: PHKL_43916
* 2937310 (2696657) Internal noise test on the cluster file system hits a debug 
assert.
* 3261849 (3253210) The file system hangs when it reaches the space limitation.
* 3396530 (3252983) On a high-end system with 48 or more CPUs,
some file system operations may hang.
* 3410567 (3410532) The file system hangs due to self-deadlock.
* 3435207 (3433777) A single CPU machine panics due to the safety-timer check when the inodes are 
re-tuned.
* 3471150 (3150368) vx_writesuper() function causes the system to panic in evfsevol_strategy().
* 3471152 (3153919) The fsadm (1M) command may hang when the structural file set re-organization is 
in progress.
* 3471165 (3332902) While shutting down, the system running the fsclustadm(1M) 
command panics.
* 3484316 (2555201) The internal conformance and stress testing on local and 
cluster mounted file system hits a debug assert.
Patch ID: PHKL_43539
* 2755784 (2730759) The sequential read performance is poor because of the read-ahead issues.
* 2801689 (2695390) Accessing a vnode from cbdnlc cache hits an assert during internal testing.
* 2857465 (2735912) The performance of tier relocation using the fsppadm(1M)
enforce command degrades while migrating a large number of files.
* 2930507 (2215398) Internal stress test in the cluster environment hits the xted_set_msg_pri1:1 
assert.
* 2932216 (2594774) The "vx_msgprint" assert is observed several times in the internal Cluster File 
System (CFS) testing.
* 3011828 (2963763) When the thin_friendly_alloc() and delicache_enable() functionality is enabled,
VxFS may hit a deadlock.
* 3024028 (2899907) On CFS, some file-system operations like vxcompress utility and de-duplication 
fail to respond.
* 3024042 (2923105) Removal of the VxFS module from the kernel takes a longer time.
* 3024049 (2926684) In rare cases the system may panic while performing a logged write.
* 3024052 (2906018) The vx_iread errors are displayed after successful log replay and mount of the 
file system.
* 3024088 (3008451) In a Cluster File System (CFS) environment, shutting down the cluster may panic 
one of the nodes with a null pointer dereference.
* 3131795 (2912089) The system becomes unresponsive while growing a file through
vx_growfile in a fragmented file system.
* 3131824 (2966277) High file-system activity such as read/write/open/lookup may panic
the system.
* 3131885 (3010444) On an NFS file system, cksum(1m) fails with the "cksum: read error on <filename>:
Bad address" error.
* 3131920 (3049408) When the system is under file-cache pressure, the find(1) command takes a long
time to operate.
* 3138653 (2972299) The initial and subsequent reads on a directory with many symbolic links are
very slow.
* 3138663 (2732427) A Cluster mounted file-system may hang and become unresponsive.
* 3138668 (3121933) The pwrite(2) fails with the EOPNOTSUPP error.
* 3138675 (2756779) The read and write performances are slow on Cluster File
System (CFS) when it runs applications that rely on the POSIX file-record using
the fcntl lock.
* 3138695 (3092114) The information output displayed by the "df -i" command may be inaccurate for 
cluster mounted file systems.
* 3141278 (3066116) The system panics due to NULL pointer dereference at vx_worklist_process()
function.
* 3141428 (2972183) The fsppadm(1M) enforce command takes a long time on the secondary nodes
compared to the primary nodes.
* 3141433 (2895743) Accessing named attributes for some files stored in CFS seems to be slow.
* 3141440 (2908391) It takes a longer time to remove checkpoints from the Veritas File System 
(VxFS) file system with a large number of files.
* 3141445 (3003679) When running the fsppadm(1M) command and removing a file with the named stream 
attributes (nattr) at the same time, the file system does not respond.
* 3142476 (3072036) Read operations from secondary node in CFS can sometimes fail with the ENXIO 
error code.
* 3159607 (2779427) The full fsck flag is set after a failed inode read operation.
* 3160205 (3157624) The fcntl() system call when used for file share reservations(F_SHARE command) 
can cause a memory leak in Cluster File System (CFS).
* 3207096 (3192985) Checkpoints quota usage on CFS can be negative.
* 3226404 (3214816) When you create and delete the inodes of a user frequently with the DELICACHE 
feature enabled, the user quota file becomes corrupt.
* 3235517 (3240635) In a CFS environment, when a checkpoint is mounted using the mount(1M) command,
the system may panic.
* 3243204 (3226462) On a cluster mounted file-system with unequal CPUs, a node may panic while 
doing a lookup operation.
* 3248982 (3272896) Internal stress test on the local mount hits a deadlock.
* 3249151 (3270357) The fsck (1m) command fails to clean the corrupt file system during the 
internal 'noise' test.
* 3261334 (3228646) NFSv4 server panics in unlock path.
* 3261782 (3240403) The fidtovp() system call may cause a panic in the vx_itryhold_locked() function.
* 3262025 (3259634) A CFS that has more than 4 GB blocks is corrupted because the blocks containing
some file system metadata get eliminated.
Patch ID: PHKL_43432
* 3042340 (2616622) The performance of the mmap() function is slow when the file system block size is 8KB and the page size is 4KB.
* 3042341 (2555198) sendfile() does not create DMAPI events for HSM on VxFS.
* 3042352 (2806466) "fsadm -R" resulting in panic at LVM layer due to vx_ts.ts_length set to 2GB.
* 3042357 (2750860) Performance issue due to CFS fragmentation in CFS cluster
* 3042373 (2874172) Infinite looping in vx_exh_hashinit()
* 3042407 (3031869) "vxfsstat -b" does not print correct information on maximum
buffer size
* 3042427 (2850738) The system may hang in the low memory condition.
* 3042479 (3042460) Add support for DOX lock to tackle vnode lock contention in VN_HOLD/RELE
* 3042501 (3042497) Atomically increment/decrement active level in HP
* 3047980 (2439261) [VxFS] VxFS 16-byte sequential buffered-write I/O performance is 30% slower than JFS2; fiostats appears to consume the time.
* 3073371 (3073372) Changing default max pdir level to 2 and default threshold size to 32768.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: PHKL_44710, PHCO_44712

* 3926568 (Tracking ID: 3926565)

SYMPTOM:
The kernel tunable vx_ninode(5) has an incorrect dependency on another tunable,
ninode(5). An attempt to reduce vx_ninode to a value below ninode fails with an
error such as the following:
# kctune vx_ninode=65536
ERROR:   mesg 099: V-2-99: The specified value for vx_ninode is less than the 
recommended minimum value of 242048.

DESCRIPTION:
vx_ninode(5) concerns the inode cache size for VxFS. The minimum value allowed
(VX_MINNINODE) is incorrectly set to depend on ninode(5), which refers to the
inode cache size of HFS.

RESOLUTION:
The code is modified to redefine VX_MINNINODE so that it no longer depends on
ninode(5).
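
With this fix, vx_ninode(5) can be tuned independently of ninode(5). A minimal
check, reusing the illustrative value from the symptom above (kctune invoked
with only a tunable name displays its current setting):

# kctune ninode
# kctune vx_ninode=65536

The second command now succeeds even when the specified value is below ninode.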

* 3934419 (Tracking ID: 3934418)

SYMPTOM:
System hang may be observed during massive file removal on a large file system
with a small file system block size.

DESCRIPTION:
Massive file removal generates a huge amount of data to be flushed, which leads
to heavy contention on the FS:vxfs:fsq spinlock while the transactions are
flushed. When these transactions are written to the disk in non-synchronous
mode, completion is confirmed by an interrupt (biodone()). Servicing these
interrupts requires the same lock needed by the threads trying to flush the
transactions, causing contention. Because it is a spinlock, the threads running
on the CPU block the interrupts and hence slow down the biodone() completions.

RESOLUTION:
The code is modified to introduce a new hidden kctune tunable that makes the
flushing thread yield the CPU when its total execution time crosses the limit
set by the tunable. A corresponding counter has also been added to the
vxfsstat(1M) command.
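
The new counter is visible in the verbose statistics, which can be displayed as
in the vxfsstat example shown earlier in this document (the mount point here is
illustrative):

# vxfsstat -v /mnt1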

Patch ID: PHKL_44651

* 3831186 (Tracking ID: 3784990)

SYMPTOM:
On high-end servers (typically with a large memory size and more than 64
processors), CPU starvation or a mini-hang may be experienced, with high CPU
utilization by vxfsd threads, triggered by a file system freeze.

DESCRIPTION:
During a file system freeze, the VxFS background thread (vx_sched_thread())
spawns vxfsd threads to flush the file system data and metadata cached in
memory. The number of vxfsd threads created depends on the amount of cached data
to be flushed and is limited by the kernel tunable max_thread_proc. The
default maximum size of the VxFS inode cache is auto-tuned based on the memory
size. On a high-end server, the inode cache size can exceed 100000, allowing a
large number of cached inodes to be flushed during a file system freeze. At the
same time, if max_thread_proc is set above 1000, VxFS can end up creating and
waking up close to 1000 vxfsd threads within a short time. The result is a
thundering herd of vxfsd threads running in parallel on different processors and
trying to acquire file system level locks for flushing.

RESOLUTION:
To resolve this issue, code changes have been made to restrict the number of
vxfsd threads woken up. A hidden tunable, independent of max_thread_proc,
is also provided to limit the number of vxfsd threads created.

* 3920069 (Tracking ID: 3915962)

SYMPTOM:
When the NFS client attempts to set a default ACL on an NFS-mounted VxFS file
system, the system may panic.
The stack trace may look like:

panic: Fault when executing in kernel mode
Stack Trace:
  Function Name
  bad_kern_reference+0xa0
  $cold_vfault+0x500
  vm_hndlr+0x620
  bubbledown+0x0
  vx_do_setacl+0xe40
  vx_setacl+0x410
  acl3_setacl+0x480
  common_dispatch+0xc10
  acl_dispatch+0x40
  svc_getreq+0x250
  svc_run+0x350
  svc_do_run+0xd0
  nfssys+0x7f0
  hpnfs_nfssys+0x60
  coerce_scall_args+0x130
  syscall+0x580

DESCRIPTION:
The panic occurs when the NFS client passes an invalid ACL specification to
VOP_SETACL which has 4 default ACL entries without the base non-default entries
(i.e. OBJ_USER, OBJ_GROUP, OBJ_OTHER and OBJ_CLASS). When the NFS server passes
the same invalid ACL specification to VxFS, the validation check fails to detect
the missing base ACL entries and results in a system panic in subsequent processing.

RESOLUTION:
To resolve this issue, code changes for ACL validation have been added to check
for base/minimum number of non-default entries in each ACL specification.

Patch ID: PHKL_44613

* 3910815 (Tracking ID: 3910526)

SYMPTOM:
In expanding a full file system through vxresize(1M), the operation fails with
ENOSPC (errno 28) and the following error is printed :

UX:vxfs fsadm: ERROR: V-3-20340: attempt to resize <volume-name> failed with
errno 28

Despite the failure, the file system remains expanded even after vxresize has
shrunk back the volume after getting the ENOSPC error. As a result, the file
system is marked for full-fsck but a full fsck would fail with errors like the
followings :

UX:vxfs fsck: ERROR: V-3-26248: could not read from block offset devid/blknum
.... Device containing meta data may be missing in vset or device too big to be
read on a 32 bit system.
UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate file system check
failure, aborting ...

DESCRIPTION:
If the file system has no space available when the resize operation is
initiated, the intent log extents are used for the metadata setup needed to
continue the resize. If the resize is successful, the superblock is updated
with the expanded size. A new intent log must then be allocated, because the
old one was consumed by the resize. There is a chance that the new intent log
allocation fails with ENOSPC, because the expanded size is not big enough to
return the space that was allocated to the original intent log. The superblock
update has already been made at this stage and is not rolled back even if a
failure is returned.

RESOLUTION:
The code is modified to fail the resize operation if the resize size is less
than the size of the intent log.
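
For reference, a typical grow operation that exercises this code path looks as
follows; the disk group and volume names are illustrative:

# /etc/vx/bin/vxresize -F vxfs -g mydg datavol +2g

With the fix, such a resize fails cleanly when the expanded size cannot
accommodate a replacement intent log, instead of leaving the file system
expanded and marked for a full fsck.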

* 3916659 (Tracking ID: 3910248)

SYMPTOM:
ODM I/O operations hang due to a deadlock between two threads performing I/O to
the same file.
The threads involved in the I/O operations may have stacks as below:

Thread 1 :

inline swtch_to_thread+0x220 ()
_swtch+0x52 ()
_mp_b_sema_sleep+0x320 ()
b_psema_c+0x460 ()
mutex_lock+0x70 ()
pfd_lock+0x50 ()
fault_in_pages+0x2e0 ()
bring_in_pages+0x330 ()
inline vaslockpages+0x220 ()
pas_pin_core+0x80 ()
pas_pin+0x340 ()
vx_memlock+0x70 ()
vx_dio_chain_start+0xf0 ()
vx_dio_iovec+0x380 ()
vx_dio_rdwri+0x280 ()
vx_dio_read+0x200 ()
vx_read1+0x580 ()
vx_rdwr+0x1060 ()
odm_vx_io_retry+0xa0 ()
odm_vx_iocleanup+0x220 ()
odm_io_sync+0xc40 ()
odm_io+0x830 ()
odm_io_stat+0x170 ()
odmioctl+0x190 ()
vno_ioctl+0x350 ()
ioctl+0x410 ()
syscall+0x5a0 ()

Thread 2 :

slpq_swtch_core+0x520 ()
real_sleep+0x400 ()
sleep_spinunlock2+0x4f ()
sleep_spinunlock+0x61 ()
vxg_ilock_wait+0x100 ()
vxg_range_cmn_lock+0x410 ()
vxg_api_range_lock+0xe0 ()
vx_glm_range_lock+0x70 ()
vx_glmrange_rangelock+0x150 ()
inline vx_irwlock2+0x60 ()
vx_irwlock+0xa0 ()
vx_rdwr+0x12a0 ()
odm_vx_io_retry+0x1b0 ()
odm_vx_iocleanup+0x220 ()
odm_io_sync+0xc40 ()
odm_io+0x830 ()
odm_io_stat+0x170 ()
odmioctl+0x190 ()
vno_ioctl+0x350 ()
ioctl+0x410 ()
syscall+0x5a0 ()

DESCRIPTION:
A deadlock is observed between an ODM reader thread and an ODM writer thread
performing I/O on a HOLE extent. The ODM reader thread holds the IRWLOCK in
SHARED mode and waits to lock the pages, while the ODM writer thread has locked
the pages and waits for the IRWLOCK held by the ODM reader thread.

RESOLUTION:
The code is changed so that ODM provides a hint to VxFS specifying that a HOLE
extent is being read by ODM, so VxFS does not require the pages to be locked.
The fix consists of two parts, one from the VxFS patch and the other from the
ODM patch.

Patch ID: PHKL_44567

* 3860554 (Tracking ID: 3873078)

SYMPTOM:
Performance degradation has been observed due to a small initial read-ahead size.

DESCRIPTION:
Newer versions of VxFS have a smaller initial read-ahead size than VxFS 5.0.1,
where the initial read-ahead size depends on the read_perf_io size. If an
application's read requests fall within the larger initial read-ahead range and
VxFS is upgraded to a newer version, that application may observe a performance
regression.

RESOLUTION:
A new kctune private tunable has been introduced to enable large initial read
ahead based on the read request length.

* 3877524 (Tracking ID: 3869091)

SYMPTOM:
Consider a file system having a block size smaller than the system's page size
on HP-UX. The mmap write operation may receive SIGBUS due to ENOSPC errors when
the amount of free space is less than the page size.

DESCRIPTION:
While writing to a memory-mapped file, the mmap write operation allocates file
pages on a write fault. If the end of the file does not align with a page
boundary, the allocation is rounded up to the page size on HP-UX. This results
in VxFS treating the file as having a HOLE between the actual end of the file
and the next page boundary. On subsequent writes to this page, even though the
write happens within the actual file size, VxFS attempts to allocate file
system storage for the HOLE and may fail with ENOSPC if the file system has
less than a page of free space.

RESOLUTION:
The code is modified so that file pages are allocated only up to the file size,
without rounding up to the page size. The SIGBUS issue is avoided because
overwrites no longer require extra file system space allocation. This change
also requires VxFS to invalidate the last file page in the file cache before a
subsequent allocating write or file truncation, to ensure that mmap reads the
last page from the disk again.

* 3896397 (Tracking ID: 3896396)

SYMPTOM:
Performance degradation may be observed in case of fcache invalidation of read
ahead pages.

DESCRIPTION:
If the fcache is too small to hold the cached portion of the file, and cache
invalidation throws out the data fetched in by read-ahead, then read-ahead
pattern detection may break and performance degradation may be observed.

RESOLUTION:
The code is modified to restart the read-ahead detection when fcache
invalidation happens and the read-ahead pattern breaks.

* 3900972 (Tracking ID: 3900971)

SYMPTOM:
The sendfile system call may hang while retrieving pages from a VxFS file
system, with the following stack trace:

wait_for_lock()
spinlock()
slpq_wakeup()
rwlock_unlock_select_new_owner()
rwlock_unlock()
fcache_page_alloc()
vx_page_alloc()
vx_do_getpage()
vx_getpage1()
vx_getpage()
preg_vn_fault()
fcache_as_fault()
sofl_vec_read()
sendfile()
syscall()

DESCRIPTION:
The hang during the sendfile operation is caused by aligning the read-request
range to the block size during VxFS page retrieval. This alignment may create a
conflicting file-cache allocation request, which results in a live-loop
condition.

RESOLUTION:
The code is modified to detect the live-loop condition and retry the page
allocation without modifying the read-request range.

Patch ID: PHKL_44512

* 3868662 (Tracking ID: 3868661)

SYMPTOM:
The vxfsstat(1M) command displays negative values which are incorrect.

DESCRIPTION:
All statistics in the vxfsstat(1M) output are unsigned values, but due to an
incorrect calculation some statistics may be displayed as erroneous negative
values.

RESOLUTION:
The code is modified to calculate the file system statistics correctly.

* 3873853 (Tracking ID: 3811849)

SYMPTOM:
On cluster file system (CFS), due to a size mismatch in the cluster-wide buffers
containing hash bucket for large directory hashing (LDH), the system panics with
the following stack trace:
  
   vx_populate_bpdata()
   vx_getblk_clust()
   vx_getblk()
   vx_exh_getblk()
   vx_exh_get_bucket()
   vx_exh_lookup()
   vx_dexh_lookup()
   vx_dirscan()
   vx_dirlook()
   vx_pd_lookup()
   vx_lookup_pd()
   vx_lookup()
   
On some platforms, instead of panic, LDH corruption is reported. Full fsck
reports some meta-data inconsistencies as displayed in the following sample
messages:

fileset 999 primary-ilist inode 263 has invalid alternate directory index
        (fileset 999 attribute-ilist inode 8193), clear index? (ynq)y

DESCRIPTION:
On a highly fragmented file system with a file system block size of 1K, 2K or
4K, the buckets of an LDH inode, each with a fixed size of 8K, can spread
across multiple small extents. Currently, in-core allocation for an LDH inode
bucket happens in parallel with the on-disk allocation, which results in small
in-core buffer allocations. These small in-core allocations are merged for the
final in-memory representation of the LDH inode's bucket. On two Cluster File
System (CFS) nodes, this may result in the same LDH metadata/bucket being
represented by in-core buffers of different sizes. This may cause a system
panic as LDH inode buckets are passed around the cluster, or on-disk corruption
of the LDH inode's buckets if these buffers are flushed to disk.

RESOLUTION:
The code is modified to separate the on-disk allocation from the in-core buffer
initialization in the LDH code paths, so that an in-core LDH bucket is always
represented by a single 8K buffer.

* 3875839 (Tracking ID: 3875837)

SYMPTOM:
Memory mapped read performance may get impacted due to code fix for sendfile
performance improvement.

DESCRIPTION:
When VxFS read-ahead is initiated through the getpage VOP (Vnode Operation) for
a sendfile() operation, the operating system may read and return larger pages
than requested. As a result, the auditing in the read-ahead code path does not
match the next expected fault offset, which breaks the read-ahead pattern.

RESOLUTION:
The earlier fix has been modified to use the length of the pages actually read,
rather than the requested length, when calculating the expected read-fault
offset for the next read-ahead. This covers scenarios where the pages read are
larger than the pages requested in the getpage VOP.

* 3876990 (Tracking ID: 3873624)

SYMPTOM:
Memory leaks occur in the FC_BUF_DEFAULT arena, showing buffers corresponding
to mmap I/Os that have returned errors.

DESCRIPTION:
In case of memory-mapped I/O, a synchronous buffer with the error flag set is
currently not freed, due to a glitch in the code, leading to a memory leak in
the FC_BUF_DEFAULT arena.

RESOLUTION:
The code is modified to free all synchronous buffers, irrespective of error.

* 3878643 (Tracking ID: 3878641)

SYMPTOM:
In case of cluster file system (CFS), if a file system is disabled during
file/directory creation, the thread that creates the file/directory may hang
with a typical stack trace:
vx_int_create()
vx_do_create()
vx_create1()
vx_create0()
vx_create()
or
vx_do_mkdir()
vx_mkdir1()
vx_mkdir()

DESCRIPTION:
In case of cluster file system (CFS), if a file system is disabled (with the
system log message "vx_disable: <disk> file system disabled") during
file/directory creation, the creation thread gets stuck in a live-loop, because
the disabled state of the file system is not checked before the file/directory
creation.

RESOLUTION:
The code is modified to check whether the file system has been disabled during
inode creation on CFS, and to return an appropriate error if it has.

* 3879382 (Tracking ID: 3879381)

SYMPTOM:
The dynamic minimum value of vxfs_bc_bufhwm on VxFS 5.1SP1 is set to a much
larger value than previous releases when there are more than 16 CPUs on the
system. The larger value is not optimal for HPVM use case which typically wants
to minimize the memory usage on the VSP.

DESCRIPTION:
There needs to be a way to allow HPVM to specify an optimal vxfs_bc_bufhwm value
as in previous VxFS releases.

RESOLUTION:
The code is modified to allow vxfs_bc_bufhwm to be explicitly tuned to a value
that is greater than the dynamic minimum based on the auto-tuning in previous
releases, even though it may be smaller than the dynamic minimum based on the
5.1SP1 calculation.
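
With this fix, an HPVM host can explicitly tune the high water mark below the
5.1SP1 dynamic minimum, for example (the value shown is illustrative):

# kctune vxfs_bc_bufhwm=131072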

* 3879793 (Tracking ID: 3879792)

SYMPTOM:
When a large number of files or directories are deleted, FSQ spinlock contention
may be observed.

DESCRIPTION:
Because of the massive file removal, a large number of removed inodes are
stored in the delicache list, since delicache is enabled by default. If these
inodes are not re-used, they may all be moved to the inactive list after a
certain time for inactive processing. The inactive processing of all these
removed inodes belonging to the same file system triggers high contention on
the file system queue spinlock.

RESOLUTION:
The code is modified so that the FSQ lock is used only to protect the required
part of the movement from the delicache list to an inactive list, which in turn
reduces FSQ lock contention.

* 3879796 (Tracking ID: 3879795)

SYMPTOM:
On a system with a large VxFS inode cache, deleting a large number of files or
directories from a single file system may lead to heavy contention on a
per-file-system spinlock. High disk service time and/or high vxfsd CPU
utilization may be observed, typically around 12 minutes after file deletion, as
a result.

DESCRIPTION:
Because of the massive file removal, a large number of removed inodes are
stored in the delicache list, since delicache is enabled by default. If these
inodes are not re-used, they may all be moved to the inactive list after a
certain time for inactive processing. The inactive processing of all these
removed inodes belonging to the same file system triggers high contention on
the file system queue spinlock.

RESOLUTION:
As a remedy, two new VxFS tunables have been introduced for fine tuning of
delicache sizing and timing. They essentially limit the number of inodes stored
in the delicache list and shorten the duration of an inode's stay in the
delicache list.

* 3879805 (Tracking ID: 3879804)

SYMPTOM:
A thread creating a directory on a full CFS file system retries the search for
a free inode in the preferred allocation unit (AU), because other thread(s)
working on the same AU have kept the AU's delegation map busy. On a system with
only a single CPU running, the delay can persist indefinitely: the other thread
keeping the AU's delegation map busy never gets a chance to run, as the CPU
keeps retrying the search considering only the preferred AU, creating a
live-lock scenario with stack traces as shown below. In such a situation, the
cluster daemon process cmcld can experience long scheduling delays, which may
consequently lead to a Serviceguard TOC. Neither the live-lock scenario nor the
SG TOC, however, is expected to happen on a CFS cluster node that fulfils the
minimum requirement of having at least 2 CPUs. On a Cluster File System with a
single-CPU or single-core machine as a node in the cluster, directory or file
creation threads may hang with the following stack traces:

vx_dircreate_tran()
vx_pd_create()
vx_create1_pd()
vx_do_create()
vx_create1()
vx_create0()
vx_create()

and

vx_mdelelock_try()
vx_mdele_tryhold()
vx_cfs_inofindau()
vx_findino()
vx_ialloc() 
vx_dirmakeinode() 
vx_dircreate()
vx_dircreate_tran()
vx_pd_create()
vx_create1_pd()
vx_do_create()
vx_create1()
vx_create0() 
vx_create()

DESCRIPTION:
A thread is currently holding a lock on an Inode Allocation Unit (IAU) which
has free inodes. When the other thread tries to allocate an inode in that IAU,
it fails because it cannot get access to the same IAU. On a system running with
a single core, the two threads can interlock indefinitely on the CFS secondary
instead of requesting a new IAU from the CFS primary.

RESOLUTION:
The code is modified to yield the CPU on a single-CPU or single-core system,
but only if there are free inodes available in the Inode Allocation Unit.

* 3879887 (Tracking ID: 3805124)

SYMPTOM:
Using kctune(1M) to change vxfs_bc_bufhwm(5) gives the following error even
though the new value is larger than the current value:

ERROR: mesg 110: V-2-110: The specified value for vx_bc_bufhwm is less than the
recommended minimum value of <min>

DESCRIPTION:
At VxFS initialization, the VxFS buffer cache high water mark is by default
auto-tuned to a dynamic minimum value based on the system configuration.
Subsequent attempts to set vxfs_bc_bufhwm(5) via kctune(1M) are not allowed if
the value is smaller than the dynamic minimum. It was possible, however, to
bypass this validation by hardcoding the vxfs_bc_bufhwm value in /stand/system.
The error is returned when the new value specified in the kctune command is
smaller than the dynamic minimum, even though it may be larger than the current
value hardcoded in /stand/system.

RESOLUTION:
The code is modified to re-instate the dynamic minimum value as the effective
value of vxfs_bc_bufhwm if the value specified in /stand/system is smaller.
The following warning message is displayed to indicate that the effective value
has been changed, and to notify users that the tunable value displayed by
kctune may not correspond to the effective value:

WARNING: mesg 139: V-2-139: Setting vxfs_bc_bufhwm to recommended minimum value
of <min>, since the specified value was less than <value>

* 3880073 (Tracking ID: 3877000)

SYMPTOM:
In an unusual scenario, where the number of Inode Allocation Units (IAUs) on
the root VxFS file system of an HP-UX system exceeds 256 (the number of inodes
exceeding 32 million), the system boot hangs with the following stack trace.

vx_event_wait()
vx_olt_iauinit()
vx_olt_iasinit()
vx_loadv23_fsetinit()
vx_loadv23_fset()
vx_fset_reget()
vx_fs_mntdup()
vx_fs_reinit()
vx_doremount()
vx_fset_localremnt()
vx_remount()
vx_mountroot()
vx_evfsop_root_remount()
vfs_extended_vfs_op()
Im_preinitrc()
DoCallist()

DESCRIPTION:
If the number of IAUs is greater than 256, then as an optimization the Veritas
File System (VxFS) creates multiple work items to load the IAUs in parallel.
These work items are serviced by worker threads, and the VxFS mount waits for
them to complete. At the time the root VxFS file system is mounted during the
boot process, no worker threads are available yet. Hence, the VxFS mount
process hangs.

RESOLUTION:
The code is modified so that the mount thread itself helps process the work
items it created.

* 3889125 (Tracking ID: 3867995)

SYMPTOM:
A VxFS file system appears to hang while vxfsd threads are waiting to flush log
buffers, with the following stack traces:

vx_sleep_lock()
vx_logbuf_clean()
vx_map_delayflush()
vx_tflush_map()
vx_fsq_flush()
vx_tranflush_threaded()
vx_workitem_process()
vx_worklist_process()

Or

vx_sleep_lock()
vx_logbuf_clean()
vx_logflush()
vx_async_iupdat()
vx_iupdat_local()
vx_iupdat()
vx_iflush_list()
vx_iflush()
vx_workitem_process()
vx_worklist_process()

DESCRIPTION:
If an extent free operation executes immediately before the completion of the
previous extent free transaction, the extent map is updated instantly and the
freed extent can be picked up for some other inode. But since the previous
extent free transaction is incomplete and, due to a particular sequence of
events, has not hit the disk, the VxFS transaction log encounters an
inconsistent state. Thus, the transaction log buffer flush may hang.

RESOLUTION:
The code is modified to implicitly treat every extent free operation as a
delayed extent free. Extent free operations are therefore no longer executed
instantaneously, and this particular hang is avoided.

Patch ID: PHKL_44439

* 3829948 (Tracking ID: 1482790)

SYMPTOM:
The system may panic when the mknod() operation is performed on the file
system and the VxFS data management API (DMAPI) is used.
The following stack trace is observed:
vx_hsm_createandmkdir()
vx_create()
vns_create()
vn_create()
mknod()
mknod()
syscall()

DESCRIPTION:
The DMAPI feature is always enabled in VxFS for hierarchical storage
management. In the VxFS DMAPI code, the snode for the device is handled as a
VxFS inode during the mknod() operation.
This results in inappropriate memory access and the panic.

RESOLUTION:
The code is modified so that the inode type is checked and further processing
is done only if it is a VxFS inode.

* 3832381 (Tracking ID: 3767366)

SYMPTOM:
Using the sendfile(2) function to transfer a big file takes a long time
compared with the same transfer via the send(2) function. Sequential file
access via the sendfile(2) function generates a lot of synchronous I/Os,
suggesting that VxFS read-ahead is not fully utilized.

DESCRIPTION:
When the page size is greater than 4 KB, VOP_GETPAGE may return more data than
requested, breaking the read-ahead detection.

RESOLUTION:
The code is modified to update the read-ahead parameters, considering the 
possibility that more data can be returned via VOP_GETPAGE, so as to continue 
doing the read-ahead for the sequential read.

* 3832587 (Tracking ID: 3832584)

SYMPTOM:
Files and directories on the VxFS file system are unable to properly inherit
the default USER_OBJ, CLASS_OBJ and OTHER_OBJ ACL entries.

DESCRIPTION:
The condition used to calculate the inheritance-permission mask of the parent
directory is incorrect.
This subsequently results in incorrect inherited permissions for files and
directories created under the parent directory.

RESOLUTION:
The code is modified to correct the condition that calculates the inheritance-
permission mask of the parent directory.

* 3859665 (Tracking ID: 2767579)

SYMPTOM:
The system may hang during DNLC-lookup operation on VxFS file system with the 
following stack trace:
as_ubcopy()
vx_dnlc_pathname_realloc()
vx_dnlc_getpathname()
pstat_pathname_fillin()
pstat_pathname()
pstat()
syscall()

DESCRIPTION:
The system hangs because of an infinite loop that is triggered when an inode
with a negative DNLC entry is encountered during a reverse-name DNLC lookup.

RESOLUTION:
The code is modified to detect the negative DNLC entry and fail the lookup
instead of looping infinitely.

* 3864335 (Tracking ID: 3864333)

SYMPTOM:
The earlier fix for incident 3226462, whereby a CFS node may panic in the
vx_dnlc_recent_cookie() function when another CFS node has a higher number of
CPUs, still relies on VX_MAX_CPU being static at runtime.

DESCRIPTION:
The earlier fix sets the size of the counters[] array to VX_MAX_CPU, which 
performs a runtime check on the number of CPUs. A more robust fix is to size 
the array based on MAX_PROCS instead, which is guaranteed to be static at 
runtime.

RESOLUTION:
The code is enhanced to use MAX_PROCS, in allocating an internal array to 
better safe-guard against out-of-bound array access.

Patch ID: PHKL_44293

* 3796751 (Tracking ID: 3784126)

SYMPTOM:
The application experienced delays, showing high vfault counters, as process
text pages in memory are invalidated and need to be reloaded. For a
Serviceguard cluster, heartbeat communication slows down considerably because
cmcld's TEXT pages need to be paged in again. This results in an SG INIT
failure.

DESCRIPTION:
When a freeze operation is handled, VxFS flushes and invalidates dirty pages of
a file system. Due to a bug in the code, even read-only mmap pages, which are
typically process TEXT pages, get invalidated unnecessarily.
For a freeze on /usr or another file system that hosts program executables,
such page invalidation can cause delays to applications, as TEXT pages need to
be faulted in again.

RESOLUTION:
The code is modified to skip invalidation of read-only mmap pages during the
file system freeze.

* 3800361 (Tracking ID: 3602322)

SYMPTOM:
System may panic while flushing the dirty pages of the inode. The following 
stack traces are observed:

vx_iflush_list()
vx_workitem_process()
vx_worklist_process()
vx_worklist_thread()

and

vx_vn_cache_deinit()
vx_inode_deinit
vx_ilist_chunkclean()
vx_inode_free_list()
vx_ifree_scan_list()
vx_workitem_process()
vx_worklist_process()
vx_worklist_thread()

DESCRIPTION:
The panic may occur due to a synchronization problem between one thread that
flushes the inode and another thread that frees the chunks containing the
inodes on the freelist.

The thread that frees the chunks of inodes on the freelist grabs an inode and
clears/dereferences the inode pointer while deinitializing the inode. This may
result in a bad pointer dereference if the flusher thread is working on the
same inode.

RESOLUTION:
The code is modified to resolve the race condition by taking proper locks on 
the inode and freelist, whenever a pointer in the inode is dereferenced. 

If the inode pointer is already de-initialized to NULL, then the flushing is 
attempted on the next inode.

* 3803799 (Tracking ID: 3751205)

SYMPTOM:
During force unmount of VxFS file system, the system may panic if parallel 
directory entry retrieval is in progress. The following stack trace is observed:

vx_readdir3()
getdents()
syscall()

DESCRIPTION:
During the force unmount operation, all metadata structures for the fileset are
destroyed. Thereafter, any access to the fileset structure can cause a panic
due to a NULL pointer dereference.

RESOLUTION:
The code is modified to return EIO when accessing the fileset structure of a
file system that has been forcefully unmounted.

* 3803825 (Tracking ID: 3331093)

SYMPTOM:
The MountAgent process gets stuck when repeated switchover is performed, due to
the current VxFS-AMF notification/unregistration design. The following stack
trace is observed:

vx_delay2()
vx_unreg_callback_funcs_impl()
disable_vxfs_api ()
text ()
amf_event_release()
amf_fs_event_lookup_notify_multi()
amf_vxfs_mount_opt_change_callback()
vx_aioctl_unsetmntlock()
vx_aioctl_common()
vx_aioctl()
vx_admin_ioctl()
vxportal_ioctl()

DESCRIPTION:
This issue is related to the VxFS-AMF interface. VxFS provides notifications to
AMF for certain events, for example, when VxFS is disabled or the mount options
change. When VxFS calls AMF, the AMF event handling mechanism can trigger an
unregistration of VxFS in the same context, since VxFS's notification triggered
the last event notification registered with AMF.

Before VxFS calls into AMF, the vx_fsamf_busy variable is set to 1, and it is
reset when the callback returns. The unregistration loops if it finds the
vx_fsamf_busy variable set to 1. Since unregistration is called from the same
context as the notification callback, the vx_fsamf_busy variable is never set
to 0, and the loop goes on endlessly, causing the command that triggered the
notification to hang.

RESOLUTION:
The code is modified to employ a delayed unregistration mechanism. The fix
addresses the case of AMF requesting unregistration in the context of a
callback from VxFS to AMF.

In such a scenario, the unregistration is marked for a later time. When all the
notifications return, and if a delayed unregistration is marked, the
unregistration routine is called explicitly.

* 3803849 (Tracking ID: 3807129)

SYMPTOM:
When a file system that is above 2 TB in size is re-sized, a panic may occur 
with the following stack trace if the file system is almost full:
vx_multi_bufinval()
vx_alloc.c() 
vx_dunemap()
vx_demap()
vx_trancommit2()
vx_trancommit ()
vx_trunc_tran2()
vx_trunc_tran()
vx_trunc()
vx_inactive_remove()
vx_inactive_tran()
vx_local_inactive_list()
vx_inactive_list ()
vx_worklist_process()
vx_worklist_thread()
kthread_daemon_startup()

DESCRIPTION:
When resizing a file system that is almost full, VxFS may have to temporarily
steal the last 32 blocks from the intent log to grow the extent allocation
maps. The intent log file's organization can be switched from IORG_EXT4 to
IORG_TYPED during the process. But the truncation code still assumes IORG_EXT4,
causing the truncated blocks to still appear in the intent log's extent map.
Subsequently, the truncated blocks are allocated for growing the extent map,
resulting in corruption as the same blocks appear allocated to both the intent
log and another file structure. The panic occurs when such corruption is
detected.

RESOLUTION:
The code is modified to switch the intent log file to IORG_TYPED at the start
of the resize operation, to ensure that the last 32 blocks are truncated
properly.

Patch ID: PHKL_44268

* 3751305 (Tracking ID: 2439108)

SYMPTOM:
Due to the page alignment issues in the VxFS code, the system panics when the 
read_preferred_io tunable is set to a non-page aligned size. The following 
stack trace is observed:

fcache_buf_create()
vx_fcache_buf_create()
vx_io_setup()
vx_io_ext()
vx_alloc_getpage()
vx_do_getpage()
vx_getpage1()
vx_getpage()
preg_vn_fault()
fcache_as_fault()
vx_fcache_as_fault()
vx_do_read_ahead()
vx_read_ahead()
vx_fcache_read()
vx_read1()
vx_rdwr()

DESCRIPTION:
VxFS ends up consuming an extra page when the preferred read I/O size is not a
multiple of the page size, and runs out of the allocated pages before the
getpage() function call can finish. This results in the panic.

RESOLUTION:
The code is modified to use the read_preferred_io tunable size only after
rounding it to the page size.
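
The preferred read size can also be kept page-aligned when tuning it at run
time with vxtunefs(1M), where this tunable is exposed as read_pref_io; the
mount point and value below are illustrative:

# vxtunefs -o read_pref_io=262144 /mnt1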

* 3754049 (Tracking ID: 3718924)

SYMPTOM:
On CFS, the file system I/O hangs for a few seconds and all the file pages are
invalidated during the hang. This can occur during routine operation. It may
delay running application processes if the text pages of these processes are
invalidated.

DESCRIPTION:
When VxFS intent log-ID is overflowed, a log-ID reset is required. The reset 
triggers a file system freeze which subsequently invalidates all the pages 
unnecessarily.

RESOLUTION:
The code is modified to avoid the unnecessary page invalidation.

* 3769384 (Tracking ID: 3673599)

SYMPTOM:
For VxFS files inheriting some ACL entry from the parent directory having a 
default ACL entry, the initial class permission is not set correctly to align 
with the file mode creation mask and the umask setting at file creation time.

DESCRIPTION:
When a file is created, the ACL inheritance needs to take place before the file
mode creation mask and the umask setting are applied, so that the latter is
honored.

RESOLUTION:
The code is modified to honour the file mode creation mask and the umask 
setting when creating files with ACL inheritance.
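
For illustration, a minimal way to observe the corrected behavior; the mount
point, directory, and umask value are hypothetical, and the parent directory is
assumed to already carry default ACL entries:

$ umask 027
$ touch /mnt1/parent/newfile
$ getacl /mnt1/parent/newfile

The class entry of the new file should now reflect the umask rather than the
unmasked inherited permission.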

Patch ID: PHKL_44215

* 3537201 (Tracking ID: 3469644)

SYMPTOM:
The system panics in the vx_logbuf_clean() function when it traverses the chain
of transactions off the intent-log buffer. The stack trace is as follows:

vx_logbuf_clean ()
vx_logadd ()
vx_log()
vx_trancommit()
vx_exh_hashinit ()
vx_dexh_create ()
vx_dexh_init ()
vx_pd_rename ()
vx_rename1_pd()
vx_do_rename ()
vx_rename1 ()
vx_rename ()
vx_rename_skey ()

DESCRIPTION:
The system panics in the vx_logbuf_clean() function when it tries to access an
already freed transaction from the transaction chain while flushing it to the
log.

RESOLUTION:
The code is modified to ensure that the transaction gets flushed to the log 
before it is freed.

* 3597563 (Tracking ID: 3597482)

SYMPTOM:
The pwrite(2) function fails with EOPNOTSUPP error when the write range is in 
two indirect extents.

DESCRIPTION:
The write range of the pwrite() function falls across two indirect-address
extents: a ZFOD extent belonging to a DB2 pre-allocated file and a DATA extent
belonging to the adjacent INDIR. VxFS tries to coalesce extents that belong to
different indirect-address extents as part of the transaction. This kind of
metadata change consumes a lot of transaction resources, which the VxFS
transaction engine in its current implementation cannot support, so the
operation fails with the EOPNOTSUPP error.

RESOLUTION:
The code is modified to retry the write transaction without combining the 
extents.

* 3615527 (Tracking ID: 3604750)

SYMPTOM:
The kernel loops during the extent re-org with the following stack trace:
vx_bmap_enter()
vx_reorg_enter_zfod()
vx_reorg_emap()
vx_extmap_reorg()
vx_reorg()
vx_aioctl_full()
$cold_vx_aioctl_common()
vx_aioctl()
vx_ioctl()
vno_ioctl()
ioctl()
syscall()

DESCRIPTION:
The extent re-org minimizes the file system fragmentation. When a re-org
request is issued for an inode with a lot of ZFOD extents, the extents of the
original inode are reallocated to the re-org inode. During this, the ZFOD
extents are preserved and entered into the re-org inode in a transaction. If
the allocated extent is big, the transaction that enters the ZFOD extents
becomes too big and returns an error. The same issue occurs even when the
transaction is retried. As a result, the kernel loops during the extent re-org.

RESOLUTION:
The code is modified to enter the Bmap (block map) of the allocated extent and
then perform the ZFOD processing. If a committable error is returned during the
ZFOD enter, the transaction is committed and the ZFOD enter continues.

* 3615530 (Tracking ID: 3466020)

SYMPTOM:
File system is corrupted with the following error message in the log:

WARNING: msgcnt 28 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren
 WARNING: msgcnt 27 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren
 WARNING: msgcnt 26 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren
 WARNING: msgcnt 25 mesg 096: V-2-96: vx_setfsflags -
 /dev/vx/dsk/a2fdc_cfs01/trace_lv01 file system fullfsck flag set - vx_direr
 WARNING: msgcnt 24 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren

DESCRIPTION:
If an error is returned from the vx_dirbread() function via the
vx_dexh_keycheck1() function, the FULLFSCK flag is set on the file system
unconditionally. A corrupted Large Directory Hash (LDH) can lead to an
incorrect block being read, which results in the FULLFSCK flag being set. The
system does not verify whether the incorrect value was read due to a corrupted
LDH. The FULLFSCK flag is thus set unnecessarily, because a corrupted LDH can
be fixed online by recreating the hash.

RESOLUTION:
The code is modified such that when LDH corruption is detected, the system
removes the LDH instead of setting FULLFSCK. The LDH is recreated the next time
the directory is modified.

* 3615532 (Tracking ID: 3484336)

SYMPTOM:
The fidtovp() system call panics in the vx_itryhold_locked() function with the 
following stack trace:

vx_itryhold_locked()
vx_iget()
vx_common_vget()
vx_do_vget()
vx_vget_skey()
vfs_vget()
fidtovp()
kernel_add_gate_cstack()
nfs3_fhtovp()
rfs3_getattr()
rfs_dispatch()
svc_getreq()
threadentry()
[kdb_read_mem] ()

DESCRIPTION:
Some VxFS operations, like the vx_vget() function, try to get a hold on an
in-core inode using the vx_itryhold_locked() function, but do not take the lock
on the corresponding directory inode. This may lead to a race condition when
this inode is present on the delicache list and is inactivated, which results
in a panic when the vx_itryhold_locked() function tries to remove it from the
free list.

RESOLUTION:
The code is modified to take the inode list lock inside the vx_inactive_tran(), 
vx_tranimdone() and vx_tranuninode() functions. This subsequently prevents the 
race condition.

* 3660347 (Tracking ID: 3660342)

SYMPTOM:
VxFS 5.1SP1 package does not set the MANPATH environment variable correctly.

DESCRIPTION:
The VxFS 5.1SP1 install package does not set the man-page search path for VxFS
5.1SP1 correctly in /etc/MANPATH. It inserts the search path for VxFS 5.1SP1
after /usr/share/man/. However, there are older versions of the VxFS man pages
in /usr/share/man/man1m.Z. As a result, the older version of a man page is
displayed first and the VxFS 5.1SP1 pages are not displayed.

RESOLUTION:
The post-install script is modified to place the search path for the VxFS
5.1SP1 man pages before /usr/share/man.
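
After installing the fixed package, the ordering can be verified as follows,
assuming the VxFS man pages are installed under /opt/VRTS/man:

$ cat /etc/MANPATH
$ man 1m vxfsstat

The VxFS 5.1SP1 path should now appear before /usr/share/man, so the 5.1SP1
version of a man page is found first.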

* 3669985 (Tracking ID: 3669983)

SYMPTOM:
When the file system with DLV 4 or 5 is in use, and the quota option is 
enabled, the system panics. The following stack trace is observed:
memset()
vx_qrec2dq64()
vx_qflush()
vx_qsync()
vx_local_doquota()
vx_vfsquota()
quotactl()
syscall()

DESCRIPTION:
The 64-bit quotas feature was introduced in release 5.1 SP1RP2. This feature
increased the maximum soft/hard quota limits for users/groups. But a file
system with DLV <= 5 cannot use this feature, because the DLV 5 quota structure
itself contains 32-bit elements. When a file system with DLV 5 is mounted with
the quota option, accessing the on-disk 32-bit structures through the 64-bit
quota structures in the kernel results in the panic.

RESOLUTION:
The code is modified to disable the file system quota for DLV less than 6.
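
The disk layout version of a file system can be checked before enabling quotas;
the device path below is illustrative:

# fstyp -v /dev/vx/dsk/mydg/datavol | grep version

With this fix, quota is supported only when the reported layout version is 6 or
later.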

* 3669994 (Tracking ID: 3669989)

SYMPTOM:
On Cluster File System (CFS) and on high-end machines, spinlock contention may
be observed when new files are created in parallel on a nearly full file
system.

DESCRIPTION:
When a free Inode Allocation Unit (IAU) is searched for in the list of IAUs to 
allocate an inode, a spinlock is taken to serialize the various threads. 
Because the file system is nearly full, a large number of iterations are 
required to find a free IAU.

RESOLUTION:
The code is modified to optimize the free-IAU search using a hint index. The 
hint index is updated when a free IAU is found, so that a subsequent search 
can jump directly to the free IAU pointed to by the hint index.
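
The idea of the hint index can be shown with a minimal sketch; the names and 
counts are hypothetical, and the real search runs under the IAU-list spinlock:

#include <stddef.h>

#define NIAU 1024

static int iau_free_count[NIAU]; /* free inodes per IAU (hypothetical) */
static int iau_hint;             /* last IAU known to have free inodes */

/* Search for a free IAU starting at the hint instead of index 0. */
int find_free_iau(void)
{
    for (int i = 0; i < NIAU; i++) {
        int idx = (iau_hint + i) % NIAU;
        if (iau_free_count[idx] > 0) {
            iau_hint = idx;   /* the next search jumps straight here */
            return idx;
        }
    }
    return -1;   /* no free IAU in the file system */
}

int main(void)
{
    iau_free_count[900] = 5;
    return find_free_iau() == 900 ? 0 : 1;
}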

* 3682335 (Tracking ID: 3637636)

SYMPTOM:
Cluster File System (CFS) node initialization and protocol upgrade may hang during rolling upgrade with the following stack trace:
vx_svar_sleep_unlock()
vx_event_wait()
vx_async_waitmsg()
vx_msg_broadcast()
vx_msg_send_join_version()
vx_msg_send_join()
vx_msg_gab_register()
vx_cfs_init()
vx_cfs_reg_fsckd()
vx_cfsaioctl()
vxportalunlockedkioctl()
vxportalunlockedioctl()

And

vx_delay()
vx_recv_protocol_upgrade_intent_msg()
vx_recv_protocol_upgrade()
vx_ctl_process_thread()
vx_kthread_init()

DESCRIPTION:
CFS node initialization waits for the protocol upgrade to complete, while the protocol upgrade waits for the flag related to the CFS initialization to be cleared. As a result, a deadlock occurs.

RESOLUTION:
The code is modified so that the protocol upgrade process does not wait to clear the CFS initialization flag.

* 3706705 (Tracking ID: 3703176)

SYMPTOM:
The tunables to enable or disable VxFS inactive-thread throttling and VxFS
inactive-thread process throttling were not available.

DESCRIPTION:
The tunables to enable or disable VxFS inactive-thread throttling and VxFS
inactive-thread process throttling were not available through the kctune(1M) 
interface.

RESOLUTION:
The code is modified so that the tunables to enable or disable VxFS inactive-
thread throttling and VxFS inactive-thread process throttling are available 
through the kctune(1M) interface, with the relevant man page information.

* 3755915 (Tracking ID: 3755927)

SYMPTOM:
Space allocation to a file during write may hang on the file system if the 
remaining intent log space is low.
The following stack trace is observed:

vx_te_bmap_split()
vx_pre_bmapsplit()
vx_dopreamble()
vx_write_alloc3()
vx_tran_write_alloc()
vx_write_alloc2()
vx_external_alloc()
vx_write_alloc()
vx_write1()
vx_rdwr()

DESCRIPTION:
During a write allocation to a new file, the transaction for the allocation 
fails to commit because of low intent-log space on the file system. This 
results in execution of the preamble routine for the block-map split, which 
tries to make 21 typed extent entries in the inode immediate area. 
Because the inode immediate area cannot hold the 21 typed extent entries, 
the preamble routine may end up in a live loop.

RESOLUTION:
The code is modified to add an exit condition for the live loop in the 
preamble routine for the block-map split.
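
A minimal sketch of the exit condition, with hypothetical names and a stub in 
place of the real transaction machinery:

#include <errno.h>

#define NTYPED_SPLIT 21   /* typed extent entries the split wants to place */

/* Hypothetical stub: how many typed extent entries fit in the inode
 * immediate area.  Here it is always too small, as in the live loop. */
int immediate_area_capacity(void) { return 10; }

int bmap_split_preamble(void)
{
    int tries = 0;

    for (;;) {
        if (immediate_area_capacity() >= NTYPED_SPLIT)
            return 0;         /* the entries fit; the split can proceed */
        if (++tries > 1)      /* exit condition: no progress is possible */
            return ENOSPC;
    }
}

int main(void) { return bmap_split_preamble() == ENOSPC ? 0 : 1; }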

Patch ID: PHKL_44140

* 3526848 (Tracking ID: 3526845)

SYMPTOM:
A Data Translation Lookaside Buffer (DTLB) panic may occur when directory
entries are read. The following stack trace is observed:

bcmp()
vx_real_readdir3()
vx_readdir3()
getdents()
syscall()

DESCRIPTION:
When a directory entry is read, the directory name is checked using the 
bcmp() function against the VX_PDMAGIC identifier string. This determines 
whether the directory is a partitioned directory. The thread panics in the 
vx_real_readdir3() function when the length of the directory name is less 
than the length of the VX_PDMAGIC identifier string, because the bcmp() 
function then accesses an unallocated area.

RESOLUTION:
The code is modified to check that the length of the directory-entry name is 
greater than the length of the VX_PDMAGIC string before the bcmp() function 
is called.
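
The guarded comparison looks roughly like this (VX_PDMAGIC's actual value is 
internal to VxFS; the string below is a stand-in):

#include <string.h>

#define PDMAGIC "PDIR_MAGIC"              /* stand-in for VX_PDMAGIC */
#define PDMAGIC_LEN (sizeof(PDMAGIC) - 1)

/* Compare only when the name is long enough, so the comparison never
 * reads past the end of a short directory-entry name. */
int is_partition_dir(const char *name, size_t namelen)
{
    if (namelen < PDMAGIC_LEN)
        return 0;
    return memcmp(name, PDMAGIC, PDMAGIC_LEN) == 0;
}

int main(void) { return is_partition_dir("a", 1); }   /* safe: returns 0 */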

* 3527418 (Tracking ID: 3520349)

SYMPTOM:
When there is a huge number of dirty pages in memory, and a sparse write is 
performed at a large offset of 4 TB or above on an existing file that is not 
null, the file system hangs. The following stack trace is 
observed:

fcache_buf_iowait()
vx_fcache_buf_iowait()
vx_io_wait()
vx_alloc_getpage()
vx_do_getpage()
vx_getpage1()
vx_getpage()
preg_vn_fault()
fcache_as_uiomove_rd()
fcache_as_uiomove()
vx_fcache_as_uiomove()
vx_fcache_read()
vx_read1()
vx_rdwr()
vn_rdwr()

DESCRIPTION:
When a sparse write is performed at an offset of 4 TB or above on a file that 
has the ext4 extent orgtype with some blocks already allocated, the file 
system can hang. 
This is caused by a type-casting bug in the offset calculation in the VxFS 
extent allocation code path. A sparse write should create a 'HOLE' between 
the last allocated offset and the current offset at which the write is 
requested. Due to the type-casting bug, VxFS may allocate the space between 
the last offset and the new offset instead of creating a 'HOLE' in certain 
scenarios. This generates a huge number of dirty pages and fills up the file 
system space incorrectly. The memory pressure due to the huge number of dirty 
pages causes the hang. 
The sparse-write offset at which the problem occurs depends on the file 
system block size. For a file system with a block size of 1 KB, the problem 
can occur at a sparse-write offset of 4 TB.

RESOLUTION:
The code is modified so that the VxFS extent allocation code calculates the 
offset correctly, and does not allocate space for a sparse write. This resolves 
the type casting bug.
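
This class of bug is easy to reproduce in a few lines of C. The following is 
an illustration of the 32-bit truncation with made-up variable names, not the 
actual VxFS code:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t off    = 4ULL << 40;   /* sparse write at the 4 TB offset */
    unsigned bshift = 10;           /* 1 KB file system block size */

    uint32_t bno_bad  = (uint32_t)(off >> bshift);  /* truncated to 0 */
    uint64_t bno_good = off >> bshift;              /* 0x100000000 */

    /* With the truncated block number, the new offset no longer looks
     * beyond the last allocated block, so instead of creating a HOLE
     * the allocator fills in every block in between. */
    printf("bad=%u good=%llu\n", bno_bad, (unsigned long long)bno_good);
    return 0;
}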

* 3537431 (Tracking ID: 3537414)

SYMPTOM:
The "mount -v" command does not display the "nomtime" mount option when the 
file system is mounted with the "nomtime" mount option.

DESCRIPTION:
When the current option string of a file system is created, the "nomtime"
mount option is not appended when the file system is mounted with the "nomtime" 
mount option. As a result, the "nomtime" mount option is not displayed.

RESOLUTION:
The code is modified to append the "nomtime" mount option to the current option 
string, when the file system is mounted with the "nomtime" mount option.

* 3560610 (Tracking ID: 3560187)

SYMPTOM:
The kernel may panic when the buffer is freed in the vx_dexh_preadd_space() 
function with the message "Data Key Miss Fault in kernel mode". The following 
stack trace is observed:
kmem_arena_free()
vx_free()
vx_dexh_preadd_space()
vx_dopreamble()
vx_dircreate_tran()
vx_do_create()
vx_create1()
vx_create0()
vx_create()
vn_open()

DESCRIPTION:
The buffers in the extended-hash structure are allocated, zeroed, and freed 
outside the transaction retry loop. For some error scenarios, the transaction 
is re-executed from the beginning. Because the buffers are zeroed outside of 
the transaction retry loop, during the transaction retry the extended-hash 
structure may contain some stale buffers from the previous try. As a result, 
some stale parts of the structure are freed incorrectly, which results in the 
panic.

RESOLUTION:
The code is modified to zero-out the extended-hash structure within the retry 
loop, so that the stale values are not used during retry.
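
The fix pattern can be sketched as follows (hypothetical structure and names): 
the scratch structure is zeroed at the top of each retry, so a failed attempt 
cannot leave stale buffer pointers behind:

#include <string.h>

struct dexh_scratch {
    void *bufs[8];   /* stand-in for the extended-hash buffers */
};

/* Hypothetical stub: fails once with a "retry" code (1), then succeeds. */
static int attempts;
int try_transaction(struct dexh_scratch *s) { (void)s; return attempts++ ? 0 : 1; }

int dircreate_with_retry(void)
{
    struct dexh_scratch s;
    int err;

    do {
        memset(&s, 0, sizeof(s));   /* zero inside the loop: no stale state */
        err = try_transaction(&s);
    } while (err == 1);             /* 1 is the hypothetical "retry" code */
    return err;
}

int main(void) { return dircreate_with_retry(); }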

* 3561998 (Tracking ID: 2387439)

SYMPTOM:
An internal debug assert is hit when the conformance test is run for the 
partitioned-directory feature.

DESCRIPTION:
The debug assert is hit because the directory-reading operation is called 
without holding the read-write lock on the partition hash directory.

RESOLUTION:
The code is modified to take the read-write lock on the partition hash 
directory before the read operation is performed on the directory.

Patch ID: PHKL_43916

* 2937310 (Tracking ID: 2696657)

SYMPTOM:
Internal noise test on the cluster file system hits a debug assert related to 
file size in inode.

DESCRIPTION:
In VxFS, the inode wsize (size after write) field is not kept in sync with 
the inode nsize (new size) field in case of an error. This triggers the debug 
assert.

RESOLUTION:
The code is modified to bring the inode wsize back in sync with nsize.

* 3261849 (Tracking ID: 3253210)

SYMPTOM:
When the file system reaches its space limit, it hangs with the following 
stack trace:
vx_svar_sleep_unlock()
default_wake_function()
wake_up()
vx_event_wait()
vx_extentalloc_handoff()
vx_te_bmap_alloc()
vx_bmap_alloc_typed()
vx_bmap_alloc()
vx_bmap()
vx_exh_allocblk()
vx_exh_splitbucket()
vx_exh_split()
vx_dopreamble()
vx_rename_tran()
vx_pd_rename()

DESCRIPTION:
When the large directory hash is enabled through the vx_dexh_sz(5M) tunable, 
Veritas File System (VxFS) uses the large directory hash for directories. 
When you rename a file, a new directory entry is inserted into the hash 
table, which can result in a hash split. The hash split fails the current 
transaction, and the transaction is retried after some housekeeping jobs 
complete. These jobs include allocating more space for the hash table. 
However, VxFS does not check the return value of the preamble job. As a 
result, when VxFS runs out of space, the rename transaction is re-entered 
permanently without knowing whether the preamble jobs allocated more space.

RESOLUTION:
The code is modified to enable VxFS to exit looping when the ENOSPC error is 
returned from the preamble job.

* 3396530 (Tracking ID: 3252983)

SYMPTOM:
On a high-end system with 48 or more CPUs, some file-system
operations may hang with the following stack trace:
vx_ilock()
vx_tflush_inode()
vx_fsq_flush()
vx_tranflush()
vx_traninit()
vx_tran_iupdat()
vx_idelxwri_done()
vx_idelxwri_flush()
vx_delxwri_flush()
vx_workitem_process()
vx_worklist_process()
vx_worklist_thread()

DESCRIPTION:
The function that gets an inode returns an incorrect error value when no free 
in-core inodes are available. This error value causes the inode to be 
allocated on disk instead of in core. As a result, the same function is 
called again, resulting in a continuous loop.

RESOLUTION:
The code is modified to return the correct error code.

* 3410567 (Tracking ID: 3410532)

SYMPTOM:
The VxFS file system may hang, if it is mounted with the "tranflush" mount 
option. The following stack-trace is observed: 

swtch_to_thread()
slpq_swtch_core()
real_sleep()
sleep_one()
vx_rwsleep_lock()
vx_ilock()
vx_tflush_inode()
vx_fsq_flush()
vx_tranflush()
$cold_vx_tranidflush()
vx_exh_hashinit()
vx_dexh_create()
vx_dexh_init()
vx_pd_mkdir()
vx_mkdir1_pd()
vx_do_mkdir()
vx_mkdir1()
vx_mkdir()
vns_create()
vn_create()
mkdir()
syscall()

DESCRIPTION:
If the VxFS file system is mounted with the "tranflush" mount option, a 
thread may end up holding the ILOCK while also waiting for it. This can lead 
to a self-deadlock, which causes the file system to hang.

RESOLUTION:
The code is modified to avoid the self-deadlock situation.

* 3435207 (Tracking ID: 3433777)

SYMPTOM:
A single CPU machine panics due to the safety-timer check when the inodes are 
re-tuned.
The following stack trace is observed:
spinunlock()
vx_ilist_chunkclean()
vx_inode_free_list()
vx_retune_ninode()
vx_do_inode_kmcache_callback()
vx_worklist_thread()
kthread_daemon_startup()

DESCRIPTION:
When the inode cache list is traversed, the vxfsd daemon schedules 
"vx_do_inode_kmcache_callback", which does not free the CPU between 
iterations. As a result, other threads cannot get access to the CPU, which 
results in the panic.

RESOLUTION:
The code is modified to use the sched_yield() function for every iteration 
in "vx_inode_free_list" to free the CPU, so that the other threads get a chance 
to be scheduled.
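
The fix pattern looks roughly like this (a sketch with hypothetical names; 
sched_yield() is the POSIX call that lets another runnable thread take the 
CPU):

#include <sched.h>

#define NCHUNKS 64

/* Hypothetical per-chunk cleanup, standing in for the inode-cache walk. */
static void clean_chunk(int i) { (void)i; }

void inode_free_list_clean(void)
{
    for (int i = 0; i < NCHUNKS; i++) {
        clean_chunk(i);
        sched_yield();   /* yield each iteration so one CPU is not monopolized */
    }
}

int main(void) { inode_free_list_clean(); return 0; }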

* 3471150 (Tracking ID: 3150368)

SYMPTOM:
A periodic sync operation on an Encrypted Volume and File System (EVFS) 
configuration may cause the system to panic with the following stack trace:

evfsevol_strategy()
io_invoke_devsw()
vx_writesuper()
vx_fsetupdate()
vx_sync1()
vx_sync0()
$cold_vx_do_fsext()
vx_workitem_process()
vx_worklist_process()
vx_walk_fslist_threaded()
vx_walk_fslist()
vx_sync_thread()
vx_worklist_thread()
kthread_daemon_startup()

DESCRIPTION:
In the EVFS environment, EVFS may read a stale or garbage value of b_filevp, 
which is not initialized by Veritas File System (VxFS), causing the system to 
panic.

RESOLUTION:
The code is modified to initialize the b_filevp.

* 3471152 (Tracking ID: 3153919)

SYMPTOM:
The fsadm(1M) command may hang when the structural file set re-organization is
in progress. The following stack trace is observed:
vx_event_wait
vx_icache_process
vx_switch_ilocks_list
vx_cfs_icache_process
vx_switch_ilocks
vx_fs_reinit
vx_reorg_dostruct
vx_extmap_reorg
vx_struct_reorg 
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
vx_compat_ioctl
compat_sys_ioctl

DESCRIPTION:
During the structural file set re-organization, due to a race condition, the
VX_CFS_IOWN_TRANSIT flag is left set on the inode. At the final stage of the
structural file set re-organization, all the inodes are re-initialized. 
Because the VX_CFS_IOWN_TRANSIT flag is set improperly, the re-initialization
fails to proceed. This causes the hang.

RESOLUTION:
The code is modified such that VX_CFS_IOWN_TRANSIT flag is cleared.

* 3471165 (Tracking ID: 3332902)

SYMPTOM:
The system running the fsclustadm(1M) command panics while shutting down. The 
following stack trace is logged along with the panic:

machine_kexec
crash_kexec
oops_end
page_fault [exception RIP: vx_glm_unlock]
vx_cfs_frlpause_leave [vxfs]
vx_cfsaioctl [vxfs]
vxportalkioctl [vxportal]
vfs_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath

DESCRIPTION:
A race condition exists between "fsclustadm(1M) cfsdeinit"
and "fsclustadm(1M) frlpause_disable". The "fsclustadm(1M) cfsdeinit" command
fails after cleaning the Group Lock Manager (GLM), without downgrading the 
CFS state. Under the false CFS state, the "fsclustadm(1M) frlpause_disable" 
command enters and accesses the GLM lock, which "fsclustadm(1M) cfsdeinit" 
has freed, resulting in a panic.

Another race condition exists between the code in vx_cfs_deinit() and the 
fsck code: although fsck holds a reservation, this does not prevent 
vx_cfs_deinit() from freeing vx_cvmres_list, because there is no 
corresponding check on vx_cfs_keepcount.

RESOLUTION:
The code is modified to add appropriate checks in the "fsclustadm(1M) 
cfsdeinit" and "fsclustadm(1M) frlpause_disable" to avoid the race-condition.

* 3484316 (Tracking ID: 2555201)

SYMPTOM:
The internal conformance and stress testing on local and cluster mounted file-
system hits a debug assert.

DESCRIPTION:
If more than ten conditions are used in a macro and the macro is called from 
an if condition, the HP Itanium-based compiler does some code optimization 
which affects the if-condition logic. This results in various asserts being 
hit during internal testing.

RESOLUTION:
The code is modified such that the macro using more than ten conditions is 
replaced by a function.

Patch ID: PHKL_43539

* 2755784 (Tracking ID: 2730759)

SYMPTOM:
The sequential read performance is poor because of the read-ahead issues.

DESCRIPTION:
Read-ahead on sequential reads is performed incorrectly because wrong read-
advisory and read-ahead pattern offsets are used to detect and perform the 
read-ahead. In addition, more synchronous reads are performed, which can 
affect performance.

RESOLUTION:
The code is modified and the read-ahead pattern offsets are updated correctly 
to detect and perform the read-ahead at the required offsets. The read-ahead 
detection is also modified to reduce the sync reads.

* 2801689 (Tracking ID: 2695390)

SYMPTOM:
A TEDed build hits the "f:vx_cbdnlc_lookup:3" assert during internal test runs.

DESCRIPTION:
If a vnode having a NULL file system pointer is added to the cbdnlc cache and 
that vnode is later returned during a lookup, an assert gets hit during 
validation of the vnode.

RESOLUTION:
The code is modified to identify the place where an invalid vnode (whose file 
system pointer is not set) is added to the cache and to prevent it from being 
added to the cbdnlc cache.

* 2857465 (Tracking ID: 2735912)

SYMPTOM:
The performance of tier relocation for moving a large number of files is poor 
when the `fsppadm enforce' command is used.  When looking at the fsppadm(1M) 
command in the kernel, the following stack trace is observed:

vx_cfs_inofindau 
vx_findino
vx_ialloc
vx_reorg_ialloc
vx_reorg_isetup
vx_extmap_reorg
vx_reorg
vx_allocpolicy_enforce
vx_aioctl_allocpolicy
vx_aioctl_common
vx_ioctl
vx_compat_ioctl

DESCRIPTION:
When each file located in Tier 1 is relocated to Tier 2, Veritas File System 
(VxFS) allocates a new reorg inode and all of its extents in Tier 2. VxFS 
then swaps the contents of the two files and deletes the original file. This 
new inode allocation, which involves a lot of processing, can result in poor 
performance when a large number of files are moved.

RESOLUTION:
The code is modified to use a pool or cache of reorg inodes instead of 
allocating a new one each time.

* 2930507 (Tracking ID: 2215398)

SYMPTOM:
Internal stress test in the cluster environment hits an assert.

DESCRIPTION:
The transactions for the partitioned directories need to be initiated without 
the flush operation. Currently, the transaction flag is not set properly to 
initiate the transactions without the flush operation for the partitioned 
directories, this results in the assert to be hit.

RESOLUTION:
The code is modified to set the transaction flag properly, to initiate the 
transactions without the flush operation for the partitioned directories.

* 2932216 (Tracking ID: 2594774)

SYMPTOM:
The f:vx_msgprint:ndebug assert is observed several times in the internal 
Cluster File System (CFS) testing.

DESCRIPTION:
In case of CFS, the "no space left on device" (ENOSPC) error is observed when 
the File Change Log (FCL) is enabled during the reorganization operation. The 
secondary node requests the primary to delegate allocation units (AUs). The 
primary node may delegate an AU which has an exclusion zone set, which 
returns the ENOSPC error, and another AU must be requested. Currently, the 
retry count for getting an AU after allocation failures is set at 3. This 
retry count can be increased.

RESOLUTION:
Code is modified to increase the number of retries when allocation fails 
because the exclusion zones are set on the delegated AU and when the CFS is 
frozen.

* 3011828 (Tracking ID: 2963763)

SYMPTOM:
When the thin_friendly_alloc and delicache_enable parameters are enabled, 
Veritas File System (VxFS) may hit a deadlock. The thread involved in the 
deadlock can have the following stack trace:

vx_rwsleep_lock()
vx_tflush_inode()
vx_fsq_flush()
vx_tranflush()
vx_traninit()
vx_remove_tran()
vx_pd_remove()
vx_remove1_pd()
vx_do_remove()
vx_remove1()
vx_remove_vp()
vx_remove()
vfs_unlink()
do_unlinkat

The threads waiting in vx_traninit() for transaction space display the 
following stack trace:

vx_delay2() 
vx_traninit()
vx_idelxwri_done()
vx_idelxwri_flush()
vx_common_inactive_tran()
vx_inactive_tran()
vx_local_inactive_list()
vx_inactive_list+0x530()
vx_worklist_process()
vx_worklist_thread()

DESCRIPTION:
In the extent allocation code paths, VxFS sets the IEXTALLOC flag on the 
inode without taking the ILOCK. When overlapping transactions pick up this 
same inode off the delicache list, the transaction-done code paths miss the 
IUNLOCK call.

RESOLUTION:
The code is modified to change the corresponding code paths to set the 
IEXTALLOC flag under proper protection.

* 3024028 (Tracking ID: 2899907)

SYMPTOM:
Some file-system operations on a Cluster File System (CFS) may hang with the 
following stack trace. 
vxg_svar_sleep_unlock
vxg_grant_sleep
vxg_cmn_lock
vxg_api_lock
vx_glm_lock
vx_mdele_hold
vx_extfree1
vx_exttrunc
vx_trunc_ext4
vx_trunc_tran2
vx_trunc_tran
vx_cfs_trunc
vx_trunc
vx_inactive_remove
vx_inactive_tran
vx_cinactive_list
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init
kernel_thread

DESCRIPTION:
In CFS, a node can lock a mdelelock for one extent map while holding a 
mdelelock for a different extent map. This can result in a deadlock between 
different nodes in the cluster.

RESOLUTION:
The code is modified to prevent the deadlock between different nodes in the 
cluster.

* 3024042 (Tracking ID: 2923105)

SYMPTOM:
Removing the Veritas File System (VxFS) module using rmmod(8) on a system 
having heavy buffer cache usage may hang.

DESCRIPTION:
When a large number of buffers are allocated from the buffer cache, at the time 
of removing VxFS module, the process of freeing the buffers takes a long time.

RESOLUTION:
The code is modified to use an improved algorithm which stops traversing the 
free lists once a free chunk has been found. Instead of continuing the 
search, it breaks out and frees that buffer.

* 3024049 (Tracking ID: 2926684)

SYMPTOM:
On systems with a heavy transaction workload, such as creation and deletion 
of files, the system may panic with the following stack trace:
a|..
vxfs:vx_traninit+0x10
vxfs:vx_dircreate_tran+0x420
vxfs:vx_pd_create+0x980
vxfs:vx_create1_pd+0x1d0
vxfs:vx_do_create+0x80
vxfs:vx_create1+0xd4
vxfs:vx_create+0x158
a|..

DESCRIPTION:
In case of a delayed log, a transaction commit can complete before the log 
write completes. The memory for the transaction is freed before the 
transaction is logged, which corrupts the transaction freelist and causes the 
system to panic.

RESOLUTION:
The code is modified such that the transaction is not freed until the log is 
written.

* 3024052 (Tracking ID: 2906018)

SYMPTOM:
In the event of a system crash, the fsck intent log is not replayed and the 
file system is marked clean. Subsequently, on mounting the file system, 
extended operations are not completed.

DESCRIPTION:
Only file systems that contain PNOLTs and are mounted locally (mounted 
without using 'mount -o cluster') are potentially exposed to this issue. 

The reason why fsck silently skips the intent-log replay is that each PNOLT 
has a flag to identify whether the intent log is dirty or not - in the event 
of a system crash this flag signifies whether intent-log replay is required. 
When the file system is mounted locally, the PNOLTs are not utilized; 
however, in the event of a system crash the fsck intent-log replay will still 
check the flags in the PNOLTs, which are the wrong flags to check for a 
locally mounted file system. The fsck intent-log replay therefore assumes 
that the intent logs are clean (because the PNOLTs are not marked dirty) and 
skips the replay of the intent log altogether.

RESOLUTION:
The code is modified such that when PNOLTs exist in the file system, VxFS will 
set the dirty flag in the CFS primary PNOLT while mounting locally. With this 
change, in the event of system crash whilst a file system is locally mounted, 
the subsequent fsck intent-log replay will correctly utilize the PNOLT 
structures and successfully replay the intent log.

* 3024088 (Tracking ID: 3008451)

SYMPTOM:
On a cluster mounted file system, the hastop -all command may panic some of 
the nodes with the following stack trace.

vxfs:vx_is_fs_disabled_impl
amf:is_fs_disabled
amf:amf_ev_fsoff_verify
amf:amf_event_reg
amf:amfioctl
amf:amf_ioctl
specfs:spec_ioctl
genunix:fop_ioctl
genunix:ioctl

DESCRIPTION:
The vx_is_fs_disabled_impl function, which is called during the unmount 
operation (triggered by hastop -all), traverses the vx_fsext_list one by one 
and returns true if the file system is disabled. While traversing this list, 
it also accesses file systems which have the fse_zombie flag set, which 
denotes that the file system is in an unstable state and some pointers may be 
NULL. Accessing those pointers panics the machine with the above stack trace.

RESOLUTION:
The code is modified to skip any fsext with the fse_zombie flag set, since a 
set fse_zombie flag implies the fsext is in an unstable state.

* 3131795 (Tracking ID: 2912089)

SYMPTOM:
On a Cluster mounted File System which is highly fragmented, a grow file 
operation may hang with the following stack traces. 

T1: 
vx_event_wait+0001A8 
vx_async_waitmsg+000174
vx_msg_send+0006B0005BC
vx_cfs_pagealloc+00023C
vx_alloc_getpage+0002DC
vx_do_getpage+001618 
vx_mm_getpage+0000B4
vx_internal_alloc+00029C 
vx_write_alloc+00051C 
vx_write1+0014D4
vx_write_common_slow+000EB0
vx_write_common+000C34
vx_rdwr_attr+0002C4

T2:
vx_glm_lock+000120
vx_genglm_lock+0000B0
vx_iglock3+0004B4
vx_iglock2+0005E4
vx_iglock+00004C
vx_write1+000E70
vx_write_common_slow+000EB0
vx_write_common+000C34
vx_rdwr_attr+0002C4

DESCRIPTION:
While a file is grown, a transaction is performed to allocate extents. CFS 
allows only up to a maximum number of sub-transactions within a transaction. 
When the maximum limit for sub-transactions is reached, CFS retries the 
operation. If the file system is badly fragmented, CFS goes into an infinite 
loop because the maximum sub-transaction limit is crossed on every retry.

RESOLUTION:
The code is modified to specify a maximum retry limit and abort the operation 
with the ENOSPC error after the retry limit is reached.

* 3131824 (Tracking ID: 2966277)

SYMPTOM:
Systems with high file-system activity like read/write/open/lookup may panic 
with the following stack trace due to a rare race condition:
spinlock+0x21 ( )
 ->  vx_rwsleep_unlock()
 vx_ipunlock+0x40()
 vx_inactive_remove+0x530()
 vx_inactive_tran+0x450()
 vx_local_inactive_list+0x30()
 vx_inactive_list+0x420()
 ->  vx_workitem_process()
 ->  vx_worklist_process()
 vx_worklist_thread+0x2f0()
 kthread_daemon_startup+0x90()

DESCRIPTION:
The ILOCK is released before an IPUNLOCK is done, which causes a race 
condition. This results in a panic when an inode that has been freed is 
accessed.

RESOLUTION:
The code is modified so that the ILOCK is used to protect the inodes' memory 
from being set free, while the memory is being accessed.

* 3131885 (Tracking ID: 3010444)

SYMPTOM:
On a Network File System (NFS) mounted file system, operations which read a 
file via the cksum(1m) command may fail with the following error message: 

cksum: read error on <filename>: Bad address

The following error message would also be seen in the syslog:
<date:time> <system_name> vmunix: WARNING: Synchronous Page I/O error

DESCRIPTION:
When the read-vnode operation (VOP_RDWR) is performed, certain requests are 
converted to direct I/O for optimization. However, the NFS buffers passed 
during the read requests are not user buffers. As a result, there is an 
error.

RESOLUTION:
The code is modified to convert the I/O requests to direct I/O only if the 
buffer passed during the I/O is a user buffer.

* 3131920 (Tracking ID: 3049408)

SYMPTOM:
When the system is under file-cache pressure, the find(1) command takes a 
long time to operate.

DESCRIPTION:
Veritas File System (VxFS) does not grow the metadata-buffer cache under 
system or file-cache memory pressure. When the vx_bcrecycle_timelag factor 
drops to zero, the metadata buffers are reused immediately after they are 
accessed. As a result, a large-directory scan takes many physical I/Os to 
scan the directory, and VxFS ends up performing excessive re-reads of the 
same data into the metadata-buffer cache. However, file-cache memory pressure 
is normal, and there is no need to shrink the metadata-buffer cache just 
because there is file-cache memory pressure.

RESOLUTION:
The code is modified to unlink the metadata-buffer cache behaviour from the 
file-cache memory pressure.

* 3138653 (Tracking ID: 2972299)

SYMPTOM:
The open(O_CREAT) operation can take up to 0.5 seconds to complete. A high 
value of the vxi_bc_reuse counter is also seen in vxfsstat data.

DESCRIPTION:
After the directory blocks are cached, they are expected to remain in cache 
until they are evicted. The buffer-cache reuse code uses the "lbolt" value to 
determine the age of a buffer, and all buffers older than a particular 
threshold are reused. Errors are introduced into the buffer-reuse 
calculations because of simple signed-unsigned arithmetic, which causes the 
buffers to be reused every time. Hence, subsequent reads take longer than 
expected.

RESOLUTION:
The code is modified so that the variables which store time are correctly 
declared as signed int.
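
The arithmetic error is easy to demonstrate. This is an illustration of the 
bug class, not the VxFS code: if the tick values are unsigned, a "negative" 
age wraps to a huge positive number and every buffer looks old enough to 
reuse:

#include <stdio.h>

int main(void)
{
    unsigned int now = 5, btime = 10;   /* buffer stamped "after" now */

    unsigned int age_u = now - btime;          /* wraps: 4294967291 */
    int          age_s = (int)now - (int)btime; /* signed: -5, not old */

    printf("unsigned age = %u, signed age = %d\n", age_u, age_s);
    return 0;
}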

* 3138663 (Tracking ID: 2732427)

SYMPTOM:
The system hangs with the following stacks:

T1:
_spin_lock_irqsave
vx_bc_do_brelse
vx_bc_biodone
vx_inode_iodone
vx_end_io_bp
vx_end_io
blk_update_request
blk_update_bidi_request
__blk_end_request_all
vxvm_end_request
volkiodone
volsiodone
vol_subdisksio_done
volkcontext_process
voldiskiodone
getnstimeofday
voldiskiodone_intr
gendmpiodone
blk_update_request
blk_update_bidi_request
blk_end_bidi_request
scsi_end_request
scsi_io_completion
blk_done_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
do_IRQ
--- <IRQ stack> ---
ret_from_intr
    [exception RIP: vxg_api_deinitlock+147]
vx_glm_deinitlock
vx_cbuf_free_list
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init
kernel_thread

T2:
_spin_lock
vx_cbuf_rele
vx_bc_getblk
vx_getblk_bp
vx_getblk_clust
vx_getblk_cmn
find_busiest_group
vx_getblk
vx_iupdat_local
vx_cfs_iupdat
vx_iflush_list
vx_iflush
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init

T3:
_spin_lock_irqsave
vxvm_end_request
volkiodone
volsiodone
vol_subdisksio_done
volkcontext_process
voldiskiodone
getnstimeofday
voldiskiodone_intr
gendmpiodone
blk_update_request
blk_update_bidi_request
blk_end_bidi_request
scsi_end_request
scsi_io_completion
blk_done_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
do_IRQ
--- <IRQ stack> ---
ret_from_intr
    [exception RIP: _spin_lock+9]
vx_cbuf_lookup
vx_getblk_clust
vx_getblk_cmn
find_busiest_group
vx_cfs_iupdat
vx_iflush_list
vx_iflush
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init
kernel_thread

DESCRIPTION:
Three locks constitute the deadlock: a volume manager lock (L1), a buffer 
free list lock (L2), and a cluster buffer list lock (L3). T1, which tries to 
release a buffer on I/O completion, holds the volume manager spinlock (L1) 
and waits for the buffer free list lock (L2). T2, the owner of L2, is chasing 
the cluster buffer lock (L3) to release its affiliated cluster buffer. When 
T3 tries to obtain L3, an unexpected disk interrupt happens, which processes 
an iodone job. As a result, T3 is stuck in the volume manager layer on the 
volume manager lock L1, which closes the deadlock cycle.

RESOLUTION:
The code is modified so that in vx_bc_getblk, the buffer list lock is dropped 
before the cluster buffer list lock is acquired.
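
A minimal sketch of the new ordering, using POSIX mutexes as stand-ins for 
the kernel spinlocks: no thread ever holds L2 while waiting on L3, so the 
cycle described above cannot form:

#include <pthread.h>

pthread_mutex_t buf_list_lock  = PTHREAD_MUTEX_INITIALIZER;  /* L2 */
pthread_mutex_t cbuf_list_lock = PTHREAD_MUTEX_INITIALIZER;  /* L3 */

void getblk_release_cbuf(void)
{
    pthread_mutex_lock(&buf_list_lock);
    /* ... work that needs the buffer free list ... */
    pthread_mutex_unlock(&buf_list_lock);    /* drop L2 first */

    pthread_mutex_lock(&cbuf_list_lock);     /* then take L3 */
    /* ... release the affiliated cluster buffer ... */
    pthread_mutex_unlock(&cbuf_list_lock);
}

int main(void) { getblk_release_cbuf(); return 0; }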

* 3138668 (Tracking ID: 3121933)

SYMPTOM:
The pwrite()  function fails with EOPNOTSUPP when the write range is in two 
indirect extents.

DESCRIPTION:
When the range of a pwrite() call falls in two indirect extents (one ZFOD 
extent belonging to DB2 pre-allocated files created with the 
setext( , VX_GROWFILE, ) ioctl, and another DATA extent belonging to an 
adjacent INDIR), the write fails with EOPNOTSUPP. The reason is that VxFS 
tries to coalesce extents which belong to different indirect address extents 
as part of this transaction; such a metadata change consumes more transaction 
resources than the VxFS transaction engine can support in the current 
implementation.

RESOLUTION:
The code is modified to retry the transaction without coalescing the extents, 
as the coalescing is only an optimization and should not fail the write.

* 3138675 (Tracking ID: 2756779)

SYMPTOM:
Write and read performance concerns on Cluster File System (CFS) when running 
applications that rely on POSIX file-record locking (fcntl).

DESCRIPTION:
The usage of fcntl on CFS leads to high messaging traffic across nodes thereby 
reducing the performance of readers and writers.

RESOLUTION:
The code is modified to cache the ranges that are being file-record locked on 
the node. This is tried whenever possible to avoid broadcasting of messages 
across the nodes in the cluster.

* 3138695 (Tracking ID: 3092114)

SYMPTOM:
The information output by the "df -i" command can often be inaccurate for 
cluster mounted file systems.

DESCRIPTION:
In Cluster File System 5.0 release a concept of delegating metadata to nodes in 
the cluster is introduced. This delegation of metadata allows CFS secondary 
nodes to update metadata without having to ask the CFS primary to do it. This 
provides greater node scalability. 
However, the "df -i" information is still collected by the CFS primary 
regardless of which node (primary or secondary) the "df -i" command is executed 
on.

For inodes the granularity of each delegation is an Inode Allocation Unit 
[IAU], thus IAUs can be delegated to nodes in the cluster.
When using a VxFS 1Kb file system block size each IAU will represent 8192 
inodes.
When using a VxFS 2Kb file system block size each IAU will represent 16384 
inodes.
When using a VxFS 4Kb file system block size each IAU will represent 32768 
inodes.
When using a VxFS 8Kb file system block size each IAU will represent 65536 
inodes.
Each IAU contains a bitmap that determines whether each inode it represents is 
either allocated or free, the IAU also contains a summary count of the number 
of inodes that are currently free in the IAU.
The "df -i" information can be considered a simple sum of all the IAU 
summary counts.
Using a 1Kb block size, IAU-0 represents inode numbers      0 -  8191
Using a 1Kb block size, IAU-1 represents inode numbers   8192 - 16383
Using a 1Kb block size, IAU-2 represents inode numbers  16384 - 24575
etc.
The inaccurate "df -i" count occurs because the CFS primary has no visibility 
of the current IAU summary information for IAUs that are delegated to 
Secondary nodes. Therefore, the number of allocated inodes within an IAU that 
is currently delegated to a CFS Secondary node is not known to the CFS 
Primary. As a result, the "df -i" count information for the currently 
delegated IAUs is collected from the Primary's copy of the IAU summaries. 
Since the Primary's copy of the IAU is stale, the "df -i" count is only 
accurate when no IAUs are currently delegated to CFS secondary nodes.
In other words - the IAUs currently delegated to CFS secondary nodes will cause 
the "df -i" count to be inaccurate.
Once an IAU is delegated to a node, the delegation can "timeout" after 3 
minutes of inactivity. However, not all IAU delegations timeout. One IAU 
always remains delegated to each node for performance reasons. Also, an IAU 
whose inodes are all allocated (so no free inodes remain in the IAU) does not 
timeout either.
The issue can be best summarized as:
The more IAUs that remain delegated to CFS secondary nodes, the greater the 
inaccuracy of the "df -i" count.

RESOLUTION:
The code is modified to allow the delegations for IAUs whose inodes are all 
allocated (so no free inodes remain in the IAU) to "timeout" after 3 minutes 
of inactivity. This means that the "df -i" count will still rarely be exact. 
However, once an IAU has all its inodes allocated, its delegation now times 
out. As files are created, the IAU delegations timeout one by one after 3 
minutes of inactivity, thus allowing the CFS primary to obtain more accurate 
"df -i" count information. As the number of files in the file system grows, 
any remaining "df -i" inaccuracy due to the current CFS secondary IAU 
delegations becomes increasingly irrelevant.

* 3141278 (Tracking ID: 3066116)

SYMPTOM:
The system panics due to NULL pointer dereference with the following stack 
trace:

a|
bubbleup
vx_worklist_process
vx_worklist_thread
a|

DESCRIPTION:
To prevent too many running inactive threads, two adb 
tunables, "vx_inactive_throttling" and "vx_inactive_process_throttling", were 
introduced to fix the issue of vxfsd taking a lot of CPU time after deleting 
some large directories.
A bug in the code increments a local counter from 0 to 1. This in turn 
affects inactive work-item dispatch. As a result, empty work items are added 
to the local batch of work items, and the system panics while processing such 
an empty work item.

RESOLUTION:
The code is modified not to increment the counter.

* 3141428 (Tracking ID: 2972183)

SYMPTOM:
The "fsppadm enforce" command takes longer to force-update the secondary 
nodes than it takes to force-update the primary nodes.

DESCRIPTION:
The ilist is force-updated on the secondary node. As a result, the 
performance on the secondary becomes low.

RESOLUTION:
The code is modified to force-update the ilist file on secondary nodes only on an error condition.

* 3141433 (Tracking ID: 2895743)

SYMPTOM:
It takes longer than usual for many Windows 7 clients to log off in 
parallel if the user profile is stored on a Cluster File System (CFS).

DESCRIPTION:
Veritas File System (VxFS) stores file creation time and full ACL information 
for Samba clients in an extended attribute, which is implemented via named 
streams. VxFS reads the named stream for each of the ACL objects. Reading a 
named stream is a costly operation, as it results in an open, an opendir, a 
lookup, and another open to get the fd. The VxFS function vx_nattr_open() 
holds the exclusive rwlock to read an ACL object that is stored as an 
extended attribute. This may cause heavy lock contention when many threads 
want the same lock; they can be blocked until one of the nattr_open calls 
releases it, which takes time since nattr_open is very slow.

RESOLUTION:
The code is modified to take the rwlock in shared mode instead of exclusive 
mode in the Linux getxattr code path.

* 3141440 (Tracking ID: 2908391)

SYMPTOM:
Checkpoint removal takes too long if Veritas File System (VxFS) has a large 
number of files. The cfsumount(1M) command could hang if removal of multiple 
checkpoints is in progress for such a file system.

DESCRIPTION:
When removing a checkpoint, VxFS traverses every inode to determine whether a 
pull/push is needed for the upstream/downstream checkpoint in its chain. This 
is time consuming if the file system has a large number of files, and results 
in slow checkpoint removal.

The command "cfsumount -c fsname" forces the unmount operation on a VxFS file 
system if there is any asynchronous checkpoint-removal job in progress, by 
checking whether the value of the vxfs stat "vxi_clonerm_jobs" is larger than 
zero. However, the stat does not count the jobs in the checkpoint-removal 
working queue. The "force umount" operation therefore does not happen even if 
there are pending checkpoint-removal jobs, because of the incorrect (zero) 
value of "vxi_clonerm_jobs".

RESOLUTION:
For the slow checkpoint-removal issue: 
The code is modified to create multiple threads to work on different Inode 
Allocation Units (IAUs) in parallel, and to reduce the inode push work by 
sorting the checkpoint-removal jobs by creation time in ascending order and 
enlarging the checkpoint push size.

For the cfsumount(1M) command hang issue: 
The code is modified to include the count of jobs in the working queue in 
the "vxi_clonerm_jobs" stat.

* 3141445 (Tracking ID: 3003679)

SYMPTOM:
The file system hangs when doing fsppadm and removing a file with named stream 
attributes (nattr) at the same time. The following two typical threads are 
involved: 

T1:
COMMAND: "fsppadm"
schedule at
 vxg_svar_sleep_unlock
vxg_grant_sleep
 vxg_cmn_lock
 vxg_api_lock
 vx_glm_lock
 vx_ihlock
 vx_cfs_iread
 vx_iget
 vx_traverse_tree
vx_dir_lookup
vx_rev_namelookup
vx_aioctl_common
vx_ioctl
vx_compat_ioctl
compat_sys_ioctl
T2:
COMMAND: "vx_worklist_thr"
 schedule
 vxg_svar_sleep_unlock
 vxg_grant_sleep
 vxg_cmn_lock
 vxg_api_lock
 vx_glm_lock
 vx_genglm_lock
 vx_dirlock
 vx_do_remove
 vx_purge_nattr
vx_nattr_dirremove
vx_inactive_tran
vx_cfs_inactive_list
vx_inactive_list
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init
kernel_thread

DESCRIPTION:
The file system hangs due to a deadlock between the threads. T1, initiated by 
fsppadm, calls vx_traverse_tree to obtain the path name for a given inode 
number. T2 removes the inode as well as its affiliated nattr inodes.
The reverse name lookup (T1) holds the global dirlock in vx_dir_lookup during 
the lookup process. It traverses the entire path from bottom to top to 
resolve the inode number inversely in vx_traverse_tree. During the lookup, 
VxFS needs to hold the hlock of each inode to read it, and drops it after 
reading.
The file removal (T2) is processed via vx_inactive_tran, which takes 
the "hlock" of the inode being removed. After that, it removes all its 
named attribute inodes in vx_do_remove, where sometimes the global dirlock is 
needed. Eventually, each thread waits for the lock which is held by the other 
thread, and this results in the deadlock.

RESOLUTION:
The code is modified so that the dirlock is not acquired during the reverse 
name lookup.

* 3142476 (Tracking ID: 3072036)

SYMPTOM:
Reads from the secondary node in CFS can sometimes fail with ENXIO (No such 
device or address).

DESCRIPTION:
The incore attribute ilist on the secondary node is out of sync with that of 
the primary.

RESOLUTION:
The code is modified so that the incore attribute ilist on the secondary node 
is kept in sync with that of the primary.

* 3159607 (Tracking ID: 2779427)

SYMPTOM:
On a cluster mounted file system, a read/write/lookup operation may mark the 
file system for a full fsck. The following messages are seen in the system 
log:

vxfs: msgcnt <count> mesg 096: V-2-96: vx_setfsflags - <vol_name>  file system 
fullfsck flag set - vx_cfs_iread

DESCRIPTION:
The in-core ilist (inode list) on the secondary node is not synchronized with 
the primary node.

RESOLUTION:
The code is modified to retry the operation a fixed number of times before 
returning the error.

* 3160205 (Tracking ID: 3157624)

SYMPTOM:
The fcntl() system call when used for file share reservations (F_SHARE command) 
can cause a memory leak in Cluster File System (CFS). The memory leak is 
observed in the "ALLOCB_MBLK_LM" arena.
Stack trace for the leak (as seen in HP-UX vmtrace) is as follows:
 
$cold_kmem_arena_varalloc+0xd0
allocb+0x880
llt:llt_msgalloc+0xa0
gab:gab_mem_allocmsg+0x70
gab:gab_allocmsg+0x20
vx_msgalloc+0x70
vx_recv_shrlock+0x60
vx_recv_rpc+0x100

DESCRIPTION:
In CFS, file share reservation requests are broadcasted to all the nodes in the 
cluster to check for conflicts. Due to a bug in the code, the system cannot 
free the response messages received. This results in a memory leak for every 
broadcast of the "file share reservation" message.

RESOLUTION:
The code is modified to free the response message received.

* 3207096 (Tracking ID: 3192985)

SYMPTOM:
Checkpoint quota usage on CFS can be negative.
An example is as follows:
Filesystem     hardlimit     softlimit        usage         action_flag
/sofs1         51200         51200     18446744073709490176  << negative

DESCRIPTION:
In CFS, to manage the intent logs and the other extra objects required for 
CFS, a holding object referred to as a per-node-object-location table (PNOLT) 
is created. In CFS, the quota usage is calculated by reading the per-node cut 
(current usage table) files (members of the PNOLT) and summing up the quota 
usage for each clone chain. However, when the quotaoff and quotaon operations 
are performed on a CFS checkpoint, the usage shows "0" after these two 
operations are executed, because the quota usage calculation is skipped. 
Subsequently, if a delete operation is performed, the usage becomes negative, 
since the blocks allocated for the deleted file are subtracted from zero.

RESOLUTION:
The code is modified such that when the quotaon operation is performed, the 
quota usage calculation is not skipped.

* 3226404 (Tracking ID: 3214816)

SYMPTOM:
When you create and delete the inodes of a user frequently with the DELICACHE 
feature enabled, the user quota file becomes corrupt.

DESCRIPTION:
The inode DELICACHE feature causes this issue. This feature optimizes the 
updates on the inode map during the file creation and deletion operations. It 
is enabled by default. You can disable this feature with the vxtunefs(1M) 
command.

When DELICACHE is enabled and a quota is set for Veritas File System (VxFS), 
VxFS updates the quota for the inodes both before the inodes are put on the 
DELICACHE list and again when they are on the inactive list during the 
removal process. As a result, VxFS decrements the current number of user 
files twice. This causes the quota file corruption.

RESOLUTION:
The code is modified to flag the inodes moved to the inactive list from the 
DELICACHE list. This flag prevents the quota from being decremented again 
during the removal process.
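
The fix pattern can be sketched as follows (the flag and field names are 
hypothetical): inodes that already passed through the DELICACHE accounting 
are marked, so the inactive-list path skips the second decrement:

#include <stdint.h>

#define I_QDECR 0x1   /* hypothetical: quota already decremented */

struct qinode {
    uint32_t flags;
    long    *user_files;   /* points at the per-user file count */
};

void quota_file_decrement(struct qinode *ip)
{
    if (ip->flags & I_QDECR)
        return;            /* already accounted on the DELICACHE path */
    (*ip->user_files)--;
    ip->flags |= I_QDECR;
}

int main(void)
{
    long files = 10;
    struct qinode ip = { 0, &files };
    quota_file_decrement(&ip);   /* decrements: files == 9 */
    quota_file_decrement(&ip);   /* no-op: still 9, not 8 */
    return files == 9 ? 0 : 1;
}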

* 3235517 (Tracking ID: 3240635)

SYMPTOM:
In a CFS environment, when a checkpoint is mounted using the mount(1M) 
command, the system may panic. The following stack trace is observed:

vx_domount
vx_fill_super 
get_sb_nodev
vx_get_sb_nodev
vx_get_clone_impl
vx_get_clone_sb
do_kern_mount
do_mount
sys_mount

DESCRIPTION:
When a checkpoint is mounted cluster-wide the protocol version is verified. 
However, if the primary fileset (999) is not mounted cluster-wide, some of the 
file system data structures remain uninitialized. This results in a panic.

RESOLUTION:
The code is modified to disable the cluster-wide mount of the checkpoint if the 
primary fileset is mounted locally.

* 3243204 (Tracking ID: 3226462)

SYMPTOM:
On a cluster mounted file-system with unequal CPUs, while doing a lookup 
operation, a node may panic with the stack trace:

vx_dnlc_recent_cookie
vx_dnlc_getpathname
audit_get_pathname_from_dnlc
audit_clean_path
$cold_audit_build_full_dir_name
inline change_p_cdir

DESCRIPTION:
The cause of the panic is an out-of-bounds access in the counters[] array, 
whose size is defined by the vx_max_cpu variable. The value of vx_max_cpu can 
differ between the CFS nodes if the nodes have different numbers of 
processors. However, the code assumes this value is the same across the 
cluster.

When propagating inode cookies across the cluster, the counter[] array is 
allocated based on the vx_max_cpu of the current CFS node. If the cookie is 
populated via vx_cbdnlc_populate_cookie(), having a CPU ID from another CFS 
node exceeding the local vx_max_cpu, the function vx_dnlc_recent_cookie() would 
access locations beyond the counter[] array allocated.

RESOLUTION:
The code is modified to detect the out-of-bounds access in 
vx_dnlc_recent_cookie() and return the ENOENT error.
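
A minimal sketch of the added check, with hypothetical names: a CPU ID 
received from another node is validated against the local array size before 
it is used as an index:

#include <errno.h>

#define VX_MAX_CPU 16           /* size of the local counters[] (hypothetical) */

unsigned counters[VX_MAX_CPU];

int record_recent_cookie(unsigned cpu_from_remote_node)
{
    if (cpu_from_remote_node >= VX_MAX_CPU)
        return ENOENT;          /* remote node has more CPUs than this one */
    counters[cpu_from_remote_node]++;
    return 0;
}

int main(void) { return record_recent_cookie(64) == ENOENT ? 0 : 1; }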

* 3248982 (Tracking ID: 3272896)

SYMPTOM:
Internal stress test on the local mount hits a deadlock.

DESCRIPTION:
When a rename operation is performed, the directory lock of the file system 
is taken to check whether the source directory is being renamed to a sub-
directory of itself (a directory loop). However, when the directory is 
partitioned this check is not required. This unnecessary directory lock 
caused a deadlock when the directory is partitioned.

RESOLUTION:
The code is modified to not take the directory lock for the rename operation 
when the directory is partitioned.

* 3249151 (Tracking ID: 3270357)

SYMPTOM:
The fsck (1m) command fails to clean the corrupt file system during the 
internal 'noise' test. The following error message is displayed:
pass0 - checking structural files
pass1 - checking inode sanity and blocks
fileset 999 primary-ilist inode <inode number> mismatched reorg extent map
fileset 999 primary-ilist inode <inode number> bad blocks found clear? (ynq)n
fileset 999 primary-ilist inode <inode number> does not have corresponding 
matchino
clear? (ynq)n

DESCRIPTION:
The inode which is reorged is from the attribute ilist. When a reorg inode is 
allocated, the 2262161th inode from the delicache is referred to; however, 
this inode is from the primary ilist. There is no check in 'vx_ialloc' that 
forces an attribute-ilist inode's corresponding reorg inode to be allocated 
from the same ilist, but the fsck(1M) code expects the source and the reorg 
inode to be from the same ilist. So when the reorg inode is examined from the 
primary ilist, fsck checks for the corresponding source inode; also, in the 
primary ilist the VX_IEREORG flag is not set. Thereby, an error message is 
displayed.

RESOLUTION:
The code is modified to add a check in 'vx_ialloc' to ensure that the 
reorg inode is allocated from the same ilist.

* 3261334 (Tracking ID: 3228646)

SYMPTOM:
NFSv4 server may panic with the following stack trace, when fcntl() requests 
with F_SETLK are made on CFS:
 
vx_vn_inactive+0xf0 
vn_rele_inactive+0x140 
vfs_free_iflist+0x100 
rfs4_op_release_lockowner+0x4c0 
rfs4_compound+0x430 
common_dispatch+0xc10 
rfs_dispatch+0x40 
svc_getreq+0x250 
svc_run+0x300 
svc_do_run+0xd0 
nfssys+0x7c0 
hpnfs_nfssys+0x60

DESCRIPTION:
In a CFS configuration, if fcntl() fails, some NFS-specific structure fields 
(l_pid) are not updated correctly and may point to stale information. This 
causes the NFSv4 server to panic.

RESOLUTION:
The code is modified to preserve the l_pid value during a failed fcntl(F_SETLK) 
request.

* 3261782 (Tracking ID: 3240403)

SYMPTOM:
The fidtovp() system call can panic in the vx_itryhold_locked() function with 
the following stack trace:

vx_itryhold_locked
vx_iget
vx_common_vget
vx_do_vget
vx_vget_skey
vfs_vget
fidtovp
kernel_add_gate_cstack
nfs3_fhtovp
rfs3_write
rfs_dispatch
svc_getreq
threadentry
[kdb_read_mem]

DESCRIPTION:
Some VxFS operations, such as the vx_vget() function, try to get a hold on an 
in-core inode using the vx_itryhold_locked() function without taking the lock 
on the corresponding directory inode. This might lead to a race condition 
when this inode is present on the delicache list and is inactivated, which 
results in a panic when the vx_itryhold_locked() function tries to remove it 
from a free list.

RESOLUTION:
The code is modified to take the delicache lock before unlocking the ilist 
lock inside the vx_inactive() function when the IDELICACHE flag is set. This 
prevents the race condition.

* 3262025 (Tracking ID: 3259634)

SYMPTOM:
A CFS that has more than 4 billion (2^32) file system blocks is corrupted 
because some file system metadata is zeroed out incorrectly. The blocks which 
get zeroed out may contain any metadata or file data and can be located 
anywhere on the disk. The problem occurs only with the following combinations 
of file system size and FS block size:

1kb block size and FS size > 4TB
2kb block size and FS size > 8TB
4kb block size and FS size > 16TB
8kb block size and FS size > 32TB

DESCRIPTION:
When a CFS is mounted for the first time on the secondary node, a per-node 
intent log is created. When the intent log is created, the blocks newly 
allocated to it are zeroed out. The start offset and the length to be cleared 
are passed to the block-clearing routine. Due to a miscalculation, a wrong 
start offset is passed. This results in the disk content at that offset 
getting zeroed out incorrectly. This content can be file system metadata or 
file data. If it is metadata, the corruption is detected when the metadata is 
accessed, and the file system is marked for a full fsck(1M).

RESOLUTION:
The code is modified so that the correct start offset is passed to the 
block-clearing routine.

Patch ID: PHKL_43432

* 3042340 (Tracking ID: 2616622)

SYMPTOM:
The performance of the mmap() function is slow when the file system block size 
is 8 KB and the page size is 4 KB.

DESCRIPTION:
When the file system block size is 8 KB, the page size is 4 KB, and the 
mmap() function is performed on an 8 KB file, the file is represented in 
memory as two pages (0 and 1). When the memory at offset 0 in the mapping is 
modified, a page fault occurs for page 0 of the file. When the disk block is 
allocated and marked valid, the page mentioned in the fault request is 
expected to get flushed out to the disk, so it is left uninitialized on the 
disk by default. Only that particular page is cleaned in memory and left 
modified, so that it is known that the data in memory is more recent than the 
data on disk. However, the other half of the block (which could eventually be 
mapped to page 1) gets cleared with a synchronous write, because such a fault 
may not occur. This synchronous clearing of the other half of the 8 KB block 
causes the performance degradation.

RESOLUTION:
The code is modified to expand the range of the fault to cover the entire 
8 KB block. The request from the OS for only one page is ignored, and two 
pages are given to cover the entire file system block, which avoids the 
separate synchronous clearing of the other half of the 8 KB block.

* 3042341 (Tracking ID: 2555198)

SYMPTOM:
On HP-UX 11.31, a File Transfer Protocol (FTP) transfer in binary mode uses 
the sendfile() interface, which does not create the DMAPI events for 
Hierarchical Storage Management (HSM).

DESCRIPTION:
The sendfile() interface does not call the Veritas File System (VxFS) read() 
function that creates the DMAPI events. It uses the HP Unified File Cache 
(UFC) interface instead, and the UFC interface is not aware of the HSM 
application. As a result, the DMAPI events are not generated.

RESOLUTION:
The code is modified to set a flag in the vfs structure during the mount time, 
to indicate if the file system is configured under HSM. This flag information 
is used by the UFC interface to generate the DMAPI events.

* 3042352 (Tracking ID: 2806466)

SYMPTOM:
A reclaim operation on a filesystem mounted on a Logical Volume Manager (LVM)
volume using the fsadm(1M) command with the 'R' option may panic the system and 
the following stack trace is displayed:
vx_dev_strategy+0xc0() 
vx_dummy_fsvm_strategy+0x30() 
vx_ts_reclaim+0x2c0() 
vx_aioctl_common+0xfd0() 
vx_aioctl+0x2d0() 
vx_ioctl+0x180()

DESCRIPTION:
Thin reclamation is supported only on the file systems mounted on a Veritas 
Volume Manager (VxVM) volume.

RESOLUTION:
The code is modified to error out gracefully if the underlying volume is LVM.

* 3042357 (Tracking ID: 2750860)

SYMPTOM:
On a large file system (4 TB or greater), the performance of write 
operations with many small request sizes may degrade, and many threads may be 
found sleeping with the following stack trace:
real_sleep
sleep_one
vx_sleep_lock
vx_lockmap
vx_getemap
vx_extfind
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_uplevel
vx_searchau
vx_extentalloc_device
vx_extentalloc
vx_te_bmap_alloc
vx_bmap_alloc_typed
vx_bmap_alloc
vx_write_alloc3
vx_recv_prealloc
vx_recv_rpc
vx_msg_recvreq
vx_msg_process_thread
kthread_daemon_startup

DESCRIPTION:
For a cluster-mounted file system, the free-extent-search algorithm is not 
optimized for a large file system (4 TB or greater), where the number of 
available free Allocation Units (AUs) can be very large.

RESOLUTION:
The code is modified to optimize the free-extent-search algorithm by skipping 
certain AUs. This reduces the overall search time.

* 3042373 (Tracking ID: 2874172)

SYMPTOM:
Network File System (NFS) file creation thread might loop continuously 
with the following stack trace:

vx_getblk_cmn(inlined)
vx_getblk+0x3a0 
vx_exh_allocblk+0x3c0
vx_exh_hashinit+0xa50
vx_dexh_create+0x240
vx_dexh_init+0x8b0 
vx_do_create+0x1e0 
vx_create1+0x1d0 
vx_create0+0x270
vx_create+0x40
rfs3_create+0x420
common_dispatch+0xb40
rfs_dispatch+0x40
svc_getreq+0x250
svc_run+0x310  
svc_do_run+0xd0
nfssys+0x6a0  
hpnfs_nfssys+0x60  
coerce_scall_args+0x130 
syscall+0x590

DESCRIPTION:
The Veritas File System (VxFS) file-creation vnode operation (VOP) 
routine expects the parent vnode to be a directory vnode pointer. However, 
the NFS layer passes a stale file vnode pointer by default. This might cause 
unexpected results, such as a hang during VOP handling.

RESOLUTION:
The code is modified to check for the vnode type of the parent 
vnode pointer at the beginning of the create VOP call and return an error if it 
is not a directory vnode pointer.

* 3042407 (Tracking ID: 3031869)

SYMPTOM:
In a multi-CPU environment, the "vxfsstat -b" command does not print the 
correct information on the maximum-size buffer.

DESCRIPTION:
The "vxfs_bc_bufhwm" tunable represents the maximum amount of memory that can 
be used to cache the VxFS metadata. When the kctune(1M) command is used to 
tune the "vxfs_bc_bufhwm" tunable to a different value, the tunable is not 
set correctly due to incorrect arithmetic. As a consequence, the "vxfsstat 
-b" command reports that the maximum-size buffer has increased, even though 
the "vxfs_bc_bufhwm" tunable is tuned to a lower value.

RESOLUTION:
The code is modified to correct the arithmetic for tuning the "vxfs_bc_bufhwm" 
tunable.

* 3042427 (Tracking ID: 2850738)

SYMPTOM:
The Veritas File System (VxFS) module allocates memory with MEMWAIT in the 
callback() routine during the low memory condition. This causes the system to 
hang with the following
stack trace:
swtch_to_thread(inlined)
slpq_swtch_core+0x520
real_sleep(inlined)
sleep+0x400
mrg_reserve_swapmem(inlined)
$cold_steal_swap+0x460
$cold_kalloc_nolgpg+0x4b0
kalloc_internal(inlined)
$cold_kmem_arena_refill+0x650
kmem_arena_varalloc+0x280
vx_alloc(inlined)
vx_worklist_enqueue+0x40
vx_buffer_kmcache_callback+0x160
kmem_gc_arena(inlined)
foreach_arena_ingroup+0x840
kmem_garbage_collect_group(inlined)
kmem_garbage_collect+0x390
kmem_arena_gc+0x240
kthread_daemon_startup+0x90

DESCRIPTION:
The VxFS kernel-memory callback() routine allocates memory with MEMWAIT. As a
result, the system hangs under low-memory conditions.

RESOLUTION:
The code is modified to allocate memory without waiting in the VxFS kernel 
memory callback() routine.

* 3042479 (Tracking ID: 3042460)

SYMPTOM:
On high-end configurations (single-nPar or standalone systems) with more than
128 processing units, workloads that frequently translate pathnames to vnodes
(for example, through open, stat, or find operations) may show reduced
performance due to vnode spinlock contention.

DESCRIPTION:
An efficient locking technique is not used in the pathname-traversal
component of VxFS.

RESOLUTION:
The code has been modified to use a locking mechanism called "shared write
spinlocks", which provides an efficient means of locking a vnode. To make use
of this new locking, the following set of products, including this patch,
must be installed on the system:
PHKL_43180
PHKL_43178
SyncShwrspl
VfsShwrsplEnh
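
Whether these prerequisites are already installed can be checked with
swlist(1M), for example:

     # swlist PHKL_43180 PHKL_43178 SyncShwrspl VfsShwrsplEnh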

* 3042501 (Tracking ID: 3042497)

SYMPTOM:
On high-end configurations (single-nPar or standalone systems) with more than
128 processing units, contention is seen on the locks that serialize
incrementing and decrementing the active levels, which in turn serialize
file-system activity.

DESCRIPTION:
In the current implementation, the active level is incremented or decremented
under a spin lock, which leads to contention on these locks.

RESOLUTION:
The code has been enhanced to enable incrementing or decrementing the active 
level atomically.

* 3047980 (Tracking ID: 2439261)

SYMPTOM:
When vx_fiostats_tunable is changed from zero to non-zero, the system panics 
with the following stack trace:
panic_save_regs_switchstack+0x110 ()
panic+0x410 ()
bad_kern_reference+0xa0 ()
$cold_pfault+0x5c0 ()
vm_hndlr+0x370 ()
bubbleup+0x880 ()
vx_fiostats_do_update+0x140 ()
vx_fiostats_update+0x170 ()
vx_read1+0x10e0 ()
vx_rdwr+0x790 ()
vno_rw+0xd0 ()
rwuio+0x32f ()
pread+0x121 ()
syscall+0x590 ()
in ?? ()

DESCRIPTION:
When vx_fiostats_tunable is changed from zero to non-zero, all the in-core
inode fiostats attributes are set to NULL. When these attributes are then
accessed, the system panics due to a NULL pointer dereference.

RESOLUTION:
The code has been modified so that when vx_fiostats_tunable is changed from
zero to non-zero, the fiostats attributes of an inode are checked for NULL
before being accessed. This prevents the panic.
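
For reference, assuming vx_fiostats_tunable is exposed through kctune(1M)
like other VxFS kernel tunables on HP-UX, it is changed as sketched below;
the value 1 is an example only.

     # kctune vx_fiostats_tunable=1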

* 3073371 (Tracking ID: 3073372)

SYMPTOM:
Contention is observed in the lookup code path when the maximum number of
partition-directory levels is set to 3 and the default partition-directory
threshold is 32000.

DESCRIPTION:
An enhancement is required to change the default maximum number of the 
partition-directory level to 2 and the default partition-directory threshold 
(directory size beyond which partition directories come into effect) to 32768.

RESOLUTION:
An enhancement is made to change the default maximum number of
partition-directory levels to 2 and the default partition-directory threshold
to 32768. The man pages are updated to reflect these changes.
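
For reference, assuming the threshold is exposed as a vxtunefs(1M) tunable
named pdir_threshold (the exact tunable name may differ by release), the
current and new values can be inspected and set as sketched below; /mnt1 is
an example mount point.

     # vxtunefs /mnt1
     # vxtunefs -o pdir_threshold=32768 /mnt1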



INSTALLING THE PATCH
--------------------
1. Installing the VxFS 5.1 SP1RP3P11 patch:

a) If you install this patch on a CVM cluster, install it one system at a time so that all the nodes are not brought down simultaneously.

b) VxFS 5.1 (GA) must be installed before applying these patches.

c) To verify the VERITAS File System level, enter:

     # swlist -l product | egrep -i 'VRTSvxfs'

  VRTSvxfs     5.1.100.000        VERITAS File System

d) All prerequisite/corequisite patches have to be installed. The kernel patch requires a system reboot for both installation and removal.

e) To install the patch, enter the following command:

# swinstall -x autoreboot=true -s <patch_directory> PHCO_44712 PHKL_44710

In case the patch is not registered, it can be registered using the following command:

# swreg -l depot <patch_directory>

where <patch_directory> is the absolute path where the patch resides.
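
After installation, the presence of the patches can be confirmed with
swlist(1M), for example:

     # swlist PHCO_44712 PHKL_44710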


REMOVING THE PATCH
------------------
Removing VxFS 5.1 SP1RP3P11 patches:

a) To remove the patch, enter the following command:

# swremove -x autoreboot=true PHCO_44712  PHKL_44710


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE