* * * READ ME * * *
                * * * Veritas File System 5.0 MP1 * * *
                         * * * P-patch 9 * * *
                         Patch Date: 2012-05-09


This document provides the following information:

   * PATCH NAME
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas File System 5.0 MP1 P-patch 9


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas File System 5.0 MP1
   * Veritas Storage Foundation for Oracle RAC 5.0 MP1
   * Veritas Storage Foundation Cluster File System 5.0 MP1
   * Veritas Storage Foundation 5.0 MP1
   * Veritas Storage Foundation High Availability 5.0 MP1
   * Veritas Storage Foundation for Oracle 5.0 MP1


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
HP-UX 11i v3 (11.31)


INCIDENTS FIXED BY THE PATCH
----------------------------
This patch fixes the following Symantec incidents:

Patch ID: PHCO_42918, PHKL_42919

* 2245447 (Tracking ID: 1537731)

SYMPTOM:
A panic can occur when a write hits an ENOSPC whilst a re-size is in operation.

DESCRIPTION:
The problem occurs when a write receives an ENOSPC while a re-size is in
progress. The write path drops the active level, does some other task, and then
raises it again. The time gap between dropping and raising the active level
gives a window for the re-size to come in and do its work. The related pointer
is not updated afterwards, and this results in a panic.

RESOLUTION:
The fs pointer is now updated after re-acquiring the active level.
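
The pattern behind this fix can be sketched in a few lines of Python. This is
only an illustration, not the VxFS code: the Mount class, the lock standing in
for the active level, and the function names are all hypothetical.

```python
import threading


class Mount:
    """Hypothetical stand-in for a mounted file system."""

    def __init__(self, size):
        self.lock = threading.Lock()   # plays the role of the active level
        self.fs = {"size": size}       # structure that a re-size replaces


def resize(mount, new_size):
    # A re-size swaps in a new structure; the old one becomes stale.
    with mount.lock:
        mount.fs = {"size": new_size}


def write_after_enospc(mount, do_other_work):
    mount.lock.acquire()
    fs = mount.fs            # pointer taken while the level is held
    mount.lock.release()     # drop the active level
    do_other_work()          # window in which a re-size can run
    mount.lock.acquire()     # raise the active level again
    fs = mount.fs            # the fix: refresh the pointer here
    try:
        return fs["size"]
    finally:
        mount.lock.release()
```

Without the second assignment to fs, the function would keep using the
structure that the re-size has already replaced.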

* 2370062 (Tracking ID: 2370046)

SYMPTOM:
Readahead with read_nstream value <> 1 misses early blocks of data.

DESCRIPTION:
Readahead does not read the portion of the file that is expected to be read
in advance. These blocks are read on demand rather than by actual readahead.
This happens because the parameter that determines the next readahead offset
is wrongly zeroed out.

RESOLUTION:
Changed the code that wrongly zeroed out the parameter determining the next
readahead offset.

* 2604150 (Tracking ID: 2559450)

SYMPTOM:
The fsck_vxfs(1M) command may dump core with a SEGV_ACCERR error.

DESCRIPTION:
The fsck_vxfs(1M) command tries to allocate memory for a structure, but the
allocation fails, resulting in a segmentation fault.

RESOLUTION:
A check for failed memory allocation is added so that the command prints an
error message instead of dumping core.

* 2696070 (Tracking ID: 2696067)

SYMPTOM:
A newly created file that inherits the default ACL from its parent shows wrong
permissions when the getaccess command is issued.

DESCRIPTION:
vx_daccess() does not fabricate any GROUP_OBJ entry (as vx_do_getacl does). If a
newly created file leverages its parent directory's ACL entries

RESOLUTION:
The fix fabricates a GROUP_OBJ entry.

* 2722872 (Tracking ID: 1244756)

SYMPTOM:
A race condition corrupts data structures, causing a NULL pointer dereference.

DESCRIPTION:
There is a race condition between some DNLC functions. One of those functions
already holds a lock and wants another lock, which results in a race condition.
This corrupts the freelist and causes a NULL pointer dereference.

RESOLUTION:
Updated DNLC_LOOKUP to make sure DNLC entry is not free and on freelist 
when moving it to tail, avoiding a race with DNLC_GET or DNLC_ENTER.

* 2723003 (Tracking ID: 2670022)

SYMPTOM:
Duplicate file names can be seen in a directory.

DESCRIPTION:
VxFS maintains an internal directory name lookup cache (DNLC) to improve the
performance of directory lookups. A race condition arises in the DNLC list
manipulation code during lookup or creation of file names having >32 characters
(which further affects other file creations). This leaves the DNLC with a
stale entry for an existing file in the directory. A lookup of such a file
through the DNLC reports the file as non-existent, which allows a duplicate
file name to be created in the directory.

RESOLUTION:
Fixed the race condition by protecting the DNLC lists through proper locks.

* 2723005 (Tracking ID: 2651922)

SYMPTOM:
The "ls -l" command runs slowly on a local VxFS file system, and high CPU
usage is seen on the HP platform.

DESCRIPTION:
This issue occurs when the system is under inode pressure and most of the
inodes on the inode free lists are CFS inodes. On the HP-UX platform, CFS
inodes are currently not allowed to be reused as local inodes, to avoid a GLM
deadlock when a VxFS reconfig is in progress. So if the system needs a VxFS
local inode and the free lists are almost filled with CFS inodes, it takes a
long time to loop through all the inode free lists to find a local inode.

RESOLUTION:
1. Added a global "vxi_icache_cfsinodes" to count CFS inodes in the inode
   cache.
2. Relaxed the condition for converting a cluster inode to a local inode when
   the number of in-core CFS inodes is greater than the threshold
   "vx_clreuse_threshold" and reconfig is not in progress.
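
The relaxed condition can be written as a single predicate. The sketch below
is an illustration only: the two counters are named after the globals mentioned
above, while the function name and the reconfig flag are hypothetical.

```python
def can_reuse_cfs_inode(vxi_icache_cfsinodes, vx_clreuse_threshold,
                        reconfig_in_progress):
    """Return True if a cluster inode may be converted to a local inode.

    Reuse is allowed only when the number of in-core CFS inodes is above
    the threshold and no reconfig is in progress.
    """
    over_threshold = vxi_icache_cfsinodes > vx_clreuse_threshold
    return over_threshold and not reconfig_in_progress
```

Keeping the reconfig check in the predicate preserves the original protection
against the GLM deadlock.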

* 2723127 (Tracking ID: 2680946)

SYMPTOM:
Panic in vx_itryhold+0x40/spinlock() due to a NULL child pointer in a DNLC entry.

Data Access Rights Fault in KERNEL mode 
spinlock+0x40 
vx_itryhold+0x40 
vx_dnlc_lookup+0x1b0 
vx_cbdnlc_lookup+0x130 
vx_fast_lookup+0x120 
vx_lookup+0x3c0 
lookuppnvp+0x2d0 
lookuppn+0x90 
lookupname+0x60 
vn_open+0xa0 
copen+0x170 
open+0x80 
syscall+0x920

DESCRIPTION:
The panic is caused by a NULL pointer dereference. The reason for the NULL
pointer is not known for certain.
One possibility is that an entry with a NULL child pointer is inserted into the
DNLC.
RESOLUTION:
A global variable is added that is incremented whenever a DNLC entry with a
NULL child pointer is inserted. The next time the issue occurs, the value of
the global can be examined to confirm or rule out this possibility.

Patch ID: PHCO_42617, PHKL_42618

* 2289635 (Tracking ID: 1633670)

SYMPTOM:
Panic in vx_inull_list() / vx_inactive() / vx_inode_deinit() after forced unmount
of writable clone and unmount of primary fileset.

DESCRIPTION:
This happens due to a NULL pointer dereference.

RESOLUTION:
Do not iflush clone inodes which have already been force unmounted when running
iflush on force unmount of a primary fileset. Beware of null vfsp pointers in
partially processed force-unmounted inodes. Reference the vx_vfs struct through
the fset instead of the vnode.

* 2351018 (Tracking ID: 2253938)

SYMPTOM:
In a Cluster File System (CFS) environment, the file read
performance gradually degrades up to 10% of the original
read performance and the fsadm(1M) -F vxfs -D -E
<mount point> shows a large number (> 70%) of free blocks in
extents smaller than 64k.
For example,
% Free blocks in extents smaller than 64 blks: 73.04
% Free blocks in extents smaller than  8 blks: 5.33

DESCRIPTION:
In a CFS environment, the disk space is divided into
Allocation Units (AUs). The delegation for these AUs is
cached locally on the nodes.

When an extending write operation is performed on a file,
the file system tries to allocate the requested block from
an AU whose delegation is locally cached, rather than
finding the largest free extent available that matches the
requested size in the other AUs. This leads to a
fragmentation of the free space, thus leading to badly
fragmented files.

RESOLUTION:
The code is modified such that the time for which the
delegation of the AU is cached can be reduced using a
tuneable, thus allowing allocations from other AUs with
larger size free extents. Also, the fsadm(1M) command is
enhanced to de-fragment free space using the -C option.

* 2406458 (Tracking ID: 2283893)

SYMPTOM:
In a Cluster File System (CFS) environment, the file read
performance gradually degrades up to 10% of the original
read performance and the fsadm(1M) -F vxfs -D -E
<mount point> shows a large number (> 70%) of free blocks in
extents smaller than 64k.
For example,
% Free blocks in extents smaller than 64 blks: 73.04
% Free blocks in extents smaller than  8 blks: 5.33

DESCRIPTION:
In a CFS environment, the disk space is divided into
Allocation Units (AUs). The delegation for these AUs is
cached locally on the nodes.

When an extending write operation is performed on a file,
the file system tries to allocate the requested block from
an AU whose delegation is locally cached, rather than
finding the largest free extent available that matches the
requested size in the other AUs. This leads to a
fragmentation of the free space, thus leading to badly
fragmented files.

RESOLUTION:
The code is modified such that the time for which the
delegation of the AU is cached can be reduced using a
tuneable, thus allowing allocations from other AUs with
larger size free extents. Also, the fsadm(1M) command is
enhanced to de-fragment free space using the -C option.

* 2410789 (Tracking ID: 1466351)

SYMPTOM:
Mount hangs in vx_bc_binval_cookie with a stack like the following:
delay
vx_bc_binval_cookie
vx_blkinval_cookie
vx_freeze_flush_cookie
vx_freeze_all
vx_freeze
vx_set_tunefs1
vx_set_tunefs
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
genunix:ioctl
unix:syscall_trap32

DESCRIPTION:
The hanging process is waiting for a buffer to be unlocked. That buffer can
only be released once its associated cloned map writes are flushed, but a
necessary flush is missed.

RESOLUTION:
Add code to synchronize cloned map writes so that all the cloned maps will be
cleared and the buffers associated with them will be released.

* 2551555 (Tracking ID: 2428964)

SYMPTOM:
The value of the kernel tunable max_thread_proc is incremented by 1 after every
software maintenance activity (install, remove, etc.) on the VRTSvxfs package.

DESCRIPTION:
In the postinstall script for the VRTSvxfs package, the value of the kernel
tunable max_thread_proc is wrongly incremented by 1.

RESOLUTION:
The increment of the max_thread_proc tunable is removed from the postinstall
script.

* 2551569 (Tracking ID: 2510903)

SYMPTOM:
Writes to clones loop forever on HP-UX 11.31. Some threads show a typical
stack like the following:

vx_tranundo
vx_logged_cwrite
vx_write_clone
vx_write1
vx_rdwr
vno_rw
inline rwuio
write
syscall

DESCRIPTION:
A small VxFS write can go through the logged write path, which stores the data
in the intent log. Logged writes can boost performance for small writes but
require the write size to be within the logged write limit. However, when data
is written to checkpoints and the write length is greater than the logged write
limit, VxFS cannot proceed with the logged write and retries forever.

RESOLUTION:
The logged write is now skipped if the write size exceeds the logged write
limit.
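
The decision can be pictured as a simple size check made before the logged
write path is entered. The sketch below is only an illustration: the function
name, the path labels, and the limit value used in the usage note are
hypothetical.

```python
def choose_write_path(write_len, logged_write_limit):
    """Pick the write path for a request of write_len bytes."""
    if write_len > logged_write_limit:
        # Too large for the intent log: skip the logged write entirely
        # instead of attempting it and retrying.
        return "regular"
    return "logged"
```

For example, with an assumed limit of 8192 bytes, a 4096-byte write would take
the logged path and a 65536-byte write would take the regular path.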

* 2556096 (Tracking ID: 2515380)

SYMPTOM:
The ff command hangs and later exits with the following error after the
program exceeds its memory limit.

# ff -F vxfs   /dev/vx/dsk/bernddg/testvol 
UX:vxfs ff: ERROR: V-3-24347: program limit of 30701385 exceeded for directory
data block list
UX:vxfs ff: ERROR: V-3-20177: /dev/vx/dsk/bernddg/testvol

DESCRIPTION:
The 'ff' command lists all files on the device of a VxFS file system, and does
directory lookups while doing so. One function saves the block addresses for a
directory by traversing all the directory blocks.
Another function keeps track of the buffer into which directory blocks are read
and the extent up to which they have been read. This function is called with an
offset and returns the offset up to which the directory blocks have been read.
The offset passed to this function has to be the offset within the extent, but
the logical offset, which can be greater than the extent size, was wrongly
passed. As a result, the returned offset wraps to 0. The caller concludes that
nothing has been read, and hence loops.

RESOLUTION:
Remove call to function which maintains buffer offsets for reading data. That
call was incorrect and redundant. We actually call that function correctly from
one of the functions above.

* 2561752 (Tracking ID: 2561739)

SYMPTOM:
When a file is created and its parent has a default ACL entry, that entry is
not taken into account when calculating the class entry of the file. When a
separate dummy entry is added, the default entry from the parent is taken into
account as well.
e.g.
$ getacl .
# file: .
# owner: root
# group: sys
user::rwx
group::rwx
class:rwx
other:rwx
default:user:user_one:r-x

$ touch file1
$ getacl file1
# file: try1
# owner: root
# group: sys
user::rw-
user:user_one:r-x
group::rw-
class:rw- <------
other:rw-

The class entry here should be rwx.

DESCRIPTION:
The default entry of the parent was not taken into account. A newly created
file shares the attribute inode with its parent, and no new attribute inode is
created for it. But when an ACL entry is explicitly set, a separate attribute
inode is created, so the default entry is also copied into the new inode and is
taken into consideration when returning the class entry of the file.

RESOLUTION:
Now before returning the ACL entry buffer we calculate the class entry again and
consider all the entries.
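
The recalculation can be illustrated with the getacl output shown in the
symptom. The sketch below assumes that the class entry is the union of the
group:: entry and every named user or group entry, which matches the expected
"rwx" in that example. The function names and the tuple layout are
hypothetical.

```python
PERM_BITS = {"r": 4, "w": 2, "x": 1}


def parse_perm(text):
    """Turn a string such as 'r-x' into a 3-bit mask."""
    return sum(bit for ch, bit in PERM_BITS.items() if ch in text)


def format_perm(mask):
    """Turn a 3-bit mask back into a string such as 'r-x'."""
    return "".join(ch if mask & bit else "-" for ch, bit in PERM_BITS.items())


def class_entry(acl):
    """Compute the class entry from (tag, qualifier, perm) tuples.

    Assumption: the class is the union of group:: and all named
    user/group entries; user:: and other: do not contribute.
    """
    mask = 0
    for tag, qualifier, perm in acl:
        if tag == "group" or (tag == "user" and qualifier):
            mask |= parse_perm(perm)
    return format_perm(mask)


# Entries of file1 from the example, including the inherited user_one entry.
file1_acl = [
    ("user", "", "rw-"),
    ("user", "user_one", "r-x"),
    ("group", "", "rw-"),
    ("other", "", "rw-"),
]
```

With the inherited user_one entry included, class_entry(file1_acl) yields
"rwx". Leaving that entry out gives "rw-", the value reported before the fix.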

* 2561757 (Tracking ID: 2492304)

SYMPTOM:
"find" command displays duplicate directory entries.

DESCRIPTION:
Whenever the directory entries can fit in the inode's immediate area VxFS
doesn't allocate new directory blocks. As new directory entries get added to the 
directory this immediate area gets filled and  all the directory entries are
then moved to a newly allocated directory block. 

The directory blocks have space reserved at the start of the block to hold the 
block hash information which is used for fast lookup of entries in that block. 

Offset of the directory entry, which was at say x bytes in the inode's immediate
area, when moved to the directory block, will be at (x + y) bytes. "y" is the
size of the block hash. 

During this transition phase from immediate area to directory blocks, a
readdir() can report a directory entry more than once.

RESOLUTION:
Directory entry offsets returned to the "readdir" call are adjusted so that when 
the entries move to a new block, they will be at the same offsets.
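
One way to keep the reported offsets stable is sketched below. It is only an
illustration under two assumptions: that entries in the immediate area are
reported at (x + y) from the start, and that the block hash size "y" is the
hypothetical value HASH_SIZE. The actual adjustment in VxFS may differ.

```python
HASH_SIZE = 32  # hypothetical size "y" of the block hash area, in bytes


def reported_offset(physical_offset, in_immediate_area):
    """Offset handed back to readdir for a directory entry."""
    if in_immediate_area:
        # Entry physically at x is reported as x + y.
        return physical_offset + HASH_SIZE
    # Entries in a directory block already sit at x + y.
    return physical_offset


def move_to_block(immediate_offsets):
    """Physical offsets of the entries after the move to a directory block."""
    return [x + HASH_SIZE for x in immediate_offsets]
```

Because the reported offset of an entry is the same before and after the move,
a readdir that resumes from a saved offset does not see the entry twice.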

* 2616625 (Tracking ID: 2616622)

SYMPTOM:
Slow mmap() performance when the file system block size is 8K and the page
size is 4K.

DESCRIPTION:
When we have an 8k block size file system and 4k pages and mmap say 
a 8K file, the file as represented in memory ends up as two pages (0 and 1). 
When the memory at offset 0 into the mapping is modified, we get a page fault 
for page 0 in the file. However, we haven't had a page fault yet for page 1 and 
can't guarantee that we will in the future. When we allocate that disk block 
and mark it as valid, we trust that the page mentioned in the fault request 
will get flushed out to disk and thereby leave it uninitialized on disk by 
default. We clear just the page in memory and leave it dirty so we know the 
data in memory is more recent than the data on disk. However, the other half of 
the block (which could eventually be mapped to page 1) gets cleared with a 
synchronous write because we don't know if we will ever see a fault. This 
synchronous clearing of the other half of 8K block was causing performance 
degradation.

RESOLUTION:
We now expand the range of the fault to cover whole 8K block. In 
this case we would just ignore that the OS asked for only one page and give it 
two pages anyway to cover the whole file system block to save the separate 
synchronous clearing of the other half of 8K block.
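
The range expansion amounts to rounding the fault range out to file system
block boundaries. The sketch below shows that arithmetic only. The function
name is hypothetical and the 8K default is taken from the case described above.

```python
def expand_fault_range(offset, length, block_size=8192):
    """Round a fault range out to whole file system blocks.

    Returns (start, length) of the expanded range in bytes.
    """
    start = (offset // block_size) * block_size
    # Ceiling division to find the end of the last block touched.
    end = -(-(offset + length) // block_size) * block_size
    return start, end - start
```

A fault on either 4K page of an 8K block expands to the same range, so both
pages are supplied together and no separate synchronous clearing is needed.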

* 2619959 (Tracking ID: 1027438)

SYMPTOM:
The internal VxFS noise test hits the "f:vx_statvfs_pri:2" assert on a cluster
file system.

DESCRIPTION:
While updating summaries for allocation units on a node, the summaries for all
nodes of the cluster need to be synchronized using a broadcast message. The
broadcast message is not sent while recovery of the file system is in progress,
resulting in a wrong free space calculation on the node, which causes the
assert.

RESOLUTION:
Code is changed to send the broadcast message after file system recovery is
complete.


INSTALLING THE PATCH
--------------------
Installing VxFS 5.0 MP1P9 patch:

a) If you install this patch on a CVM cluster, install it on one
   system at a time so that all the nodes are not brought down
   simultaneously.

b) VxFS 5.0 (GA) must be installed before applying these
   patches.

c) To verify the VERITAS file system level, enter:

     # swlist -l product | egrep -i 'VRTSvxfs'

  VRTSvxfs     5.0.31.0        VERITAS File System

Note: VRTSfsman is a corequisite for VRTSvxfs. Hence VRTSfsman also
needs to be installed along with VRTSvxfs.

    # swlist -l product | egrep -i 'VRTS'

  VRTSvxfs      5.0.31.0      Veritas File System
  VRTSfsman     5.0.31.0      Veritas File System Manuals

d) All prerequisite/corequisite patches have to be installed. The kernel patch
   requires a system reboot for both installation and removal.

e) To install the patch, enter the following command:

# swinstall -x autoreboot=true -s <patch_directory>  PHCO_42918 PHKL_42919

In case the patch is not registered, it can be registered
using the following command:

# swreg -l depot <patch_directory>

where <patch_directory> is the absolute path where the patch resides.


REMOVING THE PATCH
------------------
Removing VxFS 5.0 MP1P9 patches:

a) To remove the patch, enter the following command:

# swremove -x autoreboot=true PHCO_42918 PHKL_42919


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
MP1P8
======
2289635 Panic in vx_inull_list after forced unmount of writable clone and unmount of primary fileset.
2410789 AIX-5.0MP3RP1 - LM-Stress test hung.
2551555 Invoke "increase_tunable" without -i option in postinstall
2551569 Customer looking for rca of bdf hang on 5.0.1RP2 on HP 11.31
2556096 ff_vxfs ERROR: V-3-24347: program limit of 30701385 exceeded
2561752 Class perm changed to "rwx" after adding user ACL entry with null perm.
2561757 File entry is displayed twice if find/ls run immediately after creation
2594549 LM noise.fullfsck.LN4 hit an assert f:vx_flushsuper:ndebug.
2616625 QXCR1001162870 on VxFS 11.31/5.0 : MMAP of sparse files on JFS 5.0 is slower on 11.31 than 11.23
2619959 Solaris cfs.noise hit panic "f:vx_statvfs_pri:2" in 6.0A14 build using sol2.9 3 node cluster
2351018 EAU delegation timeouts
2406458 Add functionality of free space defragmentation through fsadm.

MP1P7
=====
2494165 Memory leak in internal buffercache after 497 days (lbolt wrap-over)

MP1P6
======
224538y Panic in vx_inode_mem_deinit -->  mutex_destroy
2245409 1009: HART Vpars:SYM5.1 6/8:non-telnetable hang after 14+ CHO
2245426 System panic in vx_iupdat_clustblks()
2245471 vx_clone_setup can call vx_ierror when clone inodes are free
2275533 write() system call hangs for over 10 seconds on VxFS 3.5 on 11.23
2277469 CFS panic in vx_irtyhold_locked() due to corrupt freelist
2289635 Panic in vx_inull_list after forced unmount of writable clone and unmount of primary fileset.
2348422 Threads stuck in vx_rwsleep_rec_lock_em
2351018 EAU delegation timeouts
2376389 vxrestore man page to add -b option details
2405553 Hit ted assertion "f:fdd_getdev:1" when did verification for e1209803
2405565 11.31/5.0.1:OnlineJFS01.VXFS50-AD-RN configure script errors in IC362 coldinstall
2406455 Panic: "pfd_unlock: bad lock state!"
2406457 Performance issue pointing to read flush behind algorithm
2406458 Add functionality of free space defragmentation through fsadm.

MP1P5
======
2207698 QXCR1000550182 issue, not ported to 4.1/11.23
2207702 vxfsstat does not reflect the change of vx_ninode
2232562 LM noise fullfsck NF1 hits assert f:vx_memlock:1a with 5.1SP1A14 build
2238895 VxFS mmap performance degradation on HP-UX 11.31 and 5.0.1 vx_alloc_getpage() suspected
2245420 fsck fails to repair corrupt directory blocks having duplicate directory entries.
2245423 odmstat with vol_vxtrace 0 shows high WTIME values
2245434 Bloomberg requirements: enable rename() and unlink() calls to succeed on the target file currently under execution
2303153 investigation on ilist HOLE
2340745 One second delays introduced with tranflush mountoption
2348339 QXCR1001094046: Unable to tune vxfs_bc_bufhwm on systems >  227 GB memory
2348363 Request to re-word message generated while performing online deactivation of memory cells.
2348513 Access denied on files inheriting default group ACL from parent directory.
2364911 LM stress aborted due to "run_fsck : Failed to full fsck cleanly".

MP1P4
=====
2223387 QXCR1001081989 - FCL License not provided as part of Online JFS licensing for VxFS 5.0 on 11.31
2233241 CFS noise.replay hung due to busy loop in vx_statvfs
2207725 QXCR1001072570: pread() system call returns EINVAL
2252142 Panic at getblk() when growing a full filesystem with fsadm

MP1P3
=====
2168326 O_SYNC isn't sync enough. also, missing timestamp updates.
2168341 vx_rdwr should handle the EIO returned from vx_tranidflush
2185390 VxFS5.0 quota information gets messed up on large files

MP1P2
=====
2111616 Modification time is not updated when O_SYNC and nodatainlog mount option used.
2129456 vxfsd is taking a lot of CPU time after deleting some directories
2149410 mmap and msync not updating modification/access times on 5.0/11.31
2162976 5.0.1RP1 LM and CFS kernel conformance-> funmount-> testvops hits assert "Fault when executing in kernel mode"

MP1P1
=====
2068826 Hang due to disowned beta semaphore
2070455 getacl shows no permission for user but user can still access file
2079589 reduce contention on vx_worklist_lk using bulk enqueue/dequeue
2080725 dm_punch_hole request does not invalidate pages