* * * READ ME * * *
* * * Veritas File System 5.1 SP1 RP1 * * *
* * * P-patch 1 * * *
Patch Date: 2012-02-07
This document provides the following information:
* PATCH NAME
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH
PATCH NAME
----------
Veritas File System 5.1 SP1 RP1 P-patch 1
PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs
BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* Veritas Storage Foundation for Oracle RAC 5.1 SP1
* Veritas Storage Foundation Cluster File System 5.1 SP1
* Veritas Storage Foundation 5.1 SP1
* Veritas Storage Foundation High Availability 5.1 SP1
OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
HP-UX 11i v3 (11.31)
INCIDENTS FIXED BY THE PATCH
----------------------------
This patch fixes the following Symantec incidents:
Patch ID: PHCO_42719, PHKL_42718
* 2403663 (Tracking ID: 2376382)
SYMPTOM:
The vxrestore man page does not state that vxrestore fails when a dump is taken
with the -b option and a block size greater than the default maximum of 63, and
the -b option is not used when restoring from that dump.
DESCRIPTION:
If vxdump is used with the -b option and a block size greater than the default
maximum of 63, vxrestore fails to restore the dump unless it is also given the
-b option. This is because vxrestore attempts to dynamically determine the tape
block size only up to the default maximum of 63.
This behavior is not mentioned in the vxrestore man page.
RESOLUTION:
Added the following text to the description of the -b option in the vxrestore
man page:
"vxrestore will attempt to dynamically determine the tape block size, up to the
default maximum of 63. So, if the -b option is used when creating a dump, but not
used when restoring the dump, the restore will fail when the tape block size is
specified to be greater than 63."
* 2508171 (Tracking ID: 2246127)
SYMPTOM:
The mount command may take a long time when the IAU file is large.
DESCRIPTION:
At mount time, the IAU file is read one block at a time. Each block is processed
before the next block is read. When a file system contains a very large number of
files, its IAU file becomes large, and reading it one block at a time makes the
mount command slow to complete.
RESOLUTION:
The code is changed to read the IAU file using multiple threads in parallel. In
addition, a complete extent is now read before it is processed.
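
The idea of reading whole extents in parallel and processing them afterwards
can be sketched as follows. This is only an illustration under stated
assumptions: the in-memory "image", the (offset, length) extent tuples, and the
function names are hypothetical and do not reflect the VxFS kernel code.

```python
from concurrent.futures import ThreadPoolExecutor


def read_extent(image, offset, length):
    """Read one whole extent (hypothetical layout: raw bytes at an offset)."""
    return image[offset:offset + length]


def process_extent(data):
    """Stand-in for per-extent processing: count the non-zero bytes."""
    return sum(1 for b in data if b)


def scan_parallel(image, extents, workers=4):
    """Read every extent in parallel, then process each extent as a whole."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(lambda e: read_extent(image, *e), extents))
    return [process_extent(chunk) for chunk in chunks]
```

Results come back in extent order because pool.map preserves the order of its
input, even though the reads themselves may complete in any order.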
* 2521672 (Tracking ID: 2515380)
SYMPTOM:
The ff command hangs and later exits after the program exceeds its memory
limit, with the following error:
# ff -F vxfs /dev/vx/dsk/bernddg/testvol
UX:vxfs ff: ERROR: V-3-24347: program limit of 30701385 exceeded for directory
data block list
UX:vxfs ff: ERROR: V-3-20177: /dev/vx/dsk/bernddg/testvol
DESCRIPTION:
The 'ff' command lists all files on the device of a VxFS file system. As part of
this, it performs directory lookups. One function saves the block addresses for a
directory, and to do so it traverses all of the directory blocks.
A second function keeps track of the buffer into which directory blocks are read
and of the extent up to which they have been read. This function is called with
an offset and returns the offset up to which the directory blocks have been
read. The offset passed to it must be the offset within the extent. However, the
logical offset, which can be greater than the extent size, was being passed
instead. As a result, the returned offset wrapped to 0. The caller then assumed
that nothing had been read, and looped.
RESOLUTION:
Removed the call to the function that maintains buffer offsets for reading data.
That call was incorrect and redundant. The function is already called correctly
from one of the functions above.
* 2521674 (Tracking ID: 2510903)
SYMPTOM:
Writes to clones loop indefinitely on HP-UX 11.31. Some threads show a stack
similar to the following:
vx_tranundo
vx_logged_cwrite
vx_write_clone
vx_write1
vx_rdwr
vno_rw
inline rwuio
write
syscall
DESCRIPTION:
A small VxFS write can be handled as a logged write, which stores the data in
the intent log. Logged writes can improve performance for small writes, but the
write size must be within the logged write limit. However, when data is written
to a checkpoint and the write length is greater than the logged write limit,
VxFS cannot proceed with the logged write and retries forever.
RESOLUTION:
The logged write is now skipped if the write size exceeds the logged write limit.
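
A minimal sketch of the size check follows. The limit value and the names
pick_write_path, "logged" and "non-logged" are assumptions made for
illustration; the actual limit and code paths are internal to VxFS.

```python
LOGGED_WRITE_LIMIT = 8 * 1024  # hypothetical limit in bytes


def pick_write_path(length, limit=LOGGED_WRITE_LIMIT):
    """Use the logged path only for writes that fit within the limit."""
    if length <= limit:
        return "logged"
    # Larger writes skip the logged path instead of retrying it forever.
    return "non-logged"
```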
* 2551564 (Tracking ID: 2428964)
SYMPTOM:
The value of the kernel tunable max_thread_proc is incremented by 1 after every
software maintenance activity (install, remove, etc.) on the VRTSvxfs package.
DESCRIPTION:
The postinstall script for the VRTSvxfs package wrongly increments the value of
the kernel tunable max_thread_proc by 1.
RESOLUTION:
The increment of the max_thread_proc tunable is removed from the postinstall
script.
* 2551576 (Tracking ID: 2526174)
SYMPTOM:
The vxfs_fcl_seektime() function seeks to the first record in the FCL file that
has a timestamp greater than or equal to the specified time. The
vxfs_fcl_seektime() API can incorrectly return the EINVAL error (no records in
the FCL file newer than the specified time) even though records after the
specified time are present in the FCL log.
DESCRIPTION:
EINVAL was returned when partial records existed in the FCL log, because the
search for the FCL record was not continued up to the correct last offset. The
last offset is where the search should stop. It is determined during the binary
search, when a record newer than the specified time is found. Partial FCL
records are skipped during the binary search. When a record newer than the
specified time was found, the length of the skipped partial records was not
included in the last offset calculation. The last offset was therefore
incorrect, and the FCL record search terminated early, before the record at the
specified time was found.
RESOLUTION:
If partial records are found during the binary search, their length is now added
to the last offset to obtain the correct last offset.
* 2561355 (Tracking ID: 2561334)
SYMPTOM:
The system log file may contain the following error message in a multi-threaded
environment with Dynamic Storage Tiers (DST):
UX:vxfs fsppadm: ERROR: V-3-26626: File Change Log IOTEMP and ACCESSTEMP index
creation failure for /vx/fsvm with message Argument list too long
DESCRIPTION:
In DST, while a policy is being enforced, SQL queries are generated and written
to the file .__fsppadm_enforcesql in lost+found. In a multi-threaded environment,
16 threads work in parallel on the ILIST, generate SQL queries, and write them to
the file. If multiple threads write to the file simultaneously, the file may
become corrupted.
RESOLUTION:
flockfile() is now used to take a lock on the .__fsppadm_enforcesql file before
writing to it, instead of adding new locking code.
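
The effect of serializing writers on a shared output file can be sketched as
follows. This is an analogy, not the fsppadm code: Python's threading.Lock
stands in for flockfile(), an in-memory stream stands in for the query file,
and the SQL text is invented for illustration.

```python
import io
import threading


def write_queries(stream, lock, thread_id, count):
    """Write whole query lines while holding the lock."""
    for i in range(count):
        line = f"INSERT INTO ilist VALUES ({thread_id}, {i});\n"
        with lock:
            stream.write(line)


def run(n_threads=16, per_thread=50):
    """Start the writers, wait for them, and return the lines written."""
    stream = io.StringIO()
    lock = threading.Lock()
    threads = [
        threading.Thread(target=write_queries, args=(stream, lock, t, per_thread))
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stream.getvalue().splitlines()
```

Holding the lock for the whole write of a line keeps lines from different
threads from being interleaved.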
* 2564431 (Tracking ID: 2515459)
SYMPTOM:
A local mount hangs in vx_bc_binval_cookie with a stack like the following:
delay
vx_bc_binval_cookie
vx_blkinval_cookie
vx_freeze_flush_cookie
vx_freeze_all
vx_freeze
vx_set_tunefs1
vx_set_tunefs
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
genunix:ioctl
unix:syscall_trap32
DESCRIPTION:
The hung local mount process is waiting for a buffer to be unlocked. That buffer
can only be released once its associated cloned map writes are flushed, but a
necessary flush was being missed.
RESOLUTION:
Add code to synchronize cloned map writes so that all the cloned maps will be
cleared and the buffers associated with them will be released.
* 2567091 (Tracking ID: 2527578)
SYMPTOM:
System crashes due to a NULL pointer dereference with the following stack:
simple_lock+000014 ()
vx_bhash_rele@AF161_63+00001C ()
vx_inode_deinit+0000D4 ()
vx_idrop+0002A4 ()
vx_detach_fset+000CC8 ()
vx_unmount+0001AC ()
vx_unmount_skey+000034 ()
vfs_unmount+000098 ()
kunmount+0000DC ()
uvmount+000208 ()
ovlya_addr_sc_flih_main+000130 ()
DESCRIPTION:
The crash results from accessing an address for which memory has not been
allocated. This address corresponds to a spinlock, so the crash occurs while
locking the spinlock.
RESOLUTION:
Allocate and initialize the spinlock before locking.
* 2574396 (Tracking ID: 2433934)
SYMPTOM:
Performance degradation observed when CFS is used compared to standalone VxFS as
back-end NFS data servers.
DESCRIPTION:
In CFS, if one thread holds the read-write lock on an inode in exclusive mode,
other threads block on the same inode even if they only need shared access,
resulting in performance degradation.
RESOLUTION:
The code is changed to avoid taking the inode read-write lock in exclusive mode
where it is not required.
* 2581351 (Tracking ID: 2588593)
SYMPTOM:
df(1M) shows wrong usage value for volume when large file is deleted.
DESCRIPTION:
The size of all freed extents is maintained in an in-core global variable and in
transaction subroutine specific data structures.
After deletion of a large file, this in-core global variable was not being
updated. When reporting usage data, df(1M) reads the freed space information
from this global variable, which contained stale information.
RESOLUTION:
The code is modified to account for the freed extent data in the global variable
used by df(1M), so that df(1M) reports the correct usage for the volume.
* 2587025 (Tracking ID: 2528819)
SYMPTOM:
AIX can fail to create new worker threads for VxFS. The following message is seen
in the system log-
"WARNING: msgcnt 175 mesg 097: V-2-97: vxfs failed to create new thread"
DESCRIPTION:
AIX fails the thread creation and returns ENOMEM because it cannot find a free
slot in that kproc.
RESOLUTION:
Limit the maximum number of VxFS worker threads.
* 2587030 (Tracking ID: 2561739)
SYMPTOM:
When a file is created and its parent directory has a default ACL entry, that
entry is not taken into account when calculating the class entry of the file.
When a separate dummy entry is added, the default entry from the parent is taken
into account as well.
e.g.
$ getacl .
# file: .
# owner: root
# group: sys
user::rwx
group::rwx
class:rwx
other:rwx
default:user:user_one:r-x
$ touch file1
$ getacl file1
# file: try1
# owner: root
# group: sys
user::rw-
user:user_one:r-x
group::rw-
class:rw- <------
other:rw-
The class entry here should be rwx.
DESCRIPTION:
The default entry of the parent was not being taken into account. A newly
created file shares the attribute inode with its parent, and no new attribute
inode is created for it. When an ACL entry is set explicitly, a separate
attribute inode is created, so the default entry is also copied into the new
inode and is taken into consideration when returning the class entry of the file.
RESOLUTION:
The class entry is now recalculated, considering all of the entries, before the
ACL entry buffer is returned.
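
One way to see why the class entry in the example should be rwx is to recompute
it as the union of the group-class entries. The sketch below assumes that rule
(class = union of the owning group entry and all named user and group entries);
the tuple layout and function names are hypothetical and this is not the VxFS
implementation.

```python
def perm_union(*perms):
    """Union of rwx permission strings such as 'r-x' and 'rw-'."""
    return "".join(
        flag if any(p[i] == flag for p in perms) else "-"
        for i, flag in enumerate("rwx")
    )


def class_entry(acl):
    """Recompute the class entry from every group-class entry in the ACL.

    acl is a list of (tag, qualifier, perms) tuples. The owner entry
    (user with an empty qualifier) and the other entry are excluded.
    """
    members = [
        perms for tag, qualifier, perms in acl
        if tag == "group" or (tag == "user" and qualifier)
    ]
    return perm_union(*members)
```

For the file1 listing above, the inherited user:user_one:r-x entry combined
with group::rw- gives rwx under this rule.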
* 2587033 (Tracking ID: 2492304)
SYMPTOM:
"find" command displays duplicate directory entries.
DESCRIPTION:
Whenever the directory entries can fit in the inode's immediate area VxFS
doesn't allocate new directory blocks. As new directory entries get added to the
directory this immediate area gets filled and all the directory entries are
then moved to a newly allocated directory block.
The directory blocks have space reserved at the start of the block to hold the
block hash information which is used for fast lookup of entries in that block.
Offset of the directory entry, which was at say x bytes in the inode's immediate
area, when moved to the directory block, will be at (x + y) bytes. "y" is the
size of the block hash.
During this transition phase from immediate area to directory blocks, a
readdir() can report a directory entry more than once.
RESOLUTION:
Directory entry offsets returned to the "readdir" call are adjusted so that when
the entries move to a new block, they will be at the same offsets.
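
The offset adjustment can be pictured with a small sketch. It assumes, purely
for illustration, that the value returned to readdir() callers is the entry's
position with the block hash header subtracted; the direction of the adjustment
and the function name are assumptions, not taken from the VxFS source.

```python
def readdir_cookie(physical_offset, in_block, hash_size):
    """Return the offset handed back to readdir() callers.

    Entries in a directory block sit hash_size bytes further in than they
    did in the inode's immediate area, so the header size is subtracted.
    """
    return physical_offset - hash_size if in_block else physical_offset
```

With x = 48 and y = 32, an entry at offset 48 in the immediate area and the
same entry at offset 80 in the directory block both map to 48, so a readdir()
that resumes across the move neither skips nor repeats the entry.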
* 2602982 (Tracking ID: 2599590)
SYMPTOM:
Expansion of a 100% full file system may panic the machine with the following
stack trace.
bad_kern_reference()
$cold_vfault()
vm_hndlr()
bubbledown()
vx_logflush()
vx_log_sync1()
vx_log_sync()
vx_worklist_thread()
kthread_daemon_startup()
DESCRIPTION:
When a 100% full file system is expanded, the intent log of the file system is
truncated and the freed blocks are used during the expansion. Due to a bug, the
block map of the replica intent log inode was not being updated, causing the
block maps of the two inodes to differ. This caused some of the in-core
structures of the intent log to become NULL. The machine panics while
dereferencing one of these structures.
RESOLUTION:
The block map of the replica intent log inode is now updated correctly. A 100%
full file system can now be expanded only if the last extent in the intent log
contains more than 32 blocks; otherwise fsadm fails. To expand such a file
system, delete some files manually and retry the resize.
* 2603015 (Tracking ID: 2565400)
SYMPTOM:
Sequential buffered I/O reads perform poorly.
DESCRIPTION:
Read-ahead does not occur because the file system's read-ahead size is
calculated incorrectly.
RESOLUTION:
Fixed the incorrect typecast.
* 2621650 (Tracking ID: 2534693)
SYMPTOM:
A man page for vx_dexh_sz(5) tunable is not available.
DESCRIPTION:
A man page for the vx_dexh_sz(5) tunable needs to be created.
RESOLUTION:
A man page has been added for the vx_dexh_sz(5) tunable.
* 2631026 (Tracking ID: 2332314)
SYMPTOM:
An internal noise.fullfsck test with ODM enabled hit the assert
fdd_odm_aiodone:3.
DESCRIPTION:
When an I/O failed in the fdd_write_clone_end() function, the error was not set
on the buffer, which caused the assert.
RESOLUTION:
The code is changed to set the error on the buffer when an I/O fails in the
fdd_write_clone_end() function.
* 2631315 (Tracking ID: 2631276)
SYMPTOM:
Lookup fails for a file that is in a partitioned directory and is accessed using
its VxFS namespace extension name.
DESCRIPTION:
If a file is in a partitioned directory and is accessed using its VxFS namespace
extension name, its name is searched for in one of the hidden leaf directories.
That leaf directory usually does not contain an entry for the file, so the
lookup fails.
RESOLUTION:
The code has been modified to call the partitioned directory lookup routine at a
higher level, so that the lookup does not fail even if the file is accessed using
its extended namespace name.
* 2635583 (Tracking ID: 2271797)
SYMPTOM:
Internal noise testing with a locally mounted VxFS file system hit the assert
"f:vx_getblk:1a".
DESCRIPTION:
The assert is hit because an overlay inode is marked with the flag that
indicates a bad copy of the inode on disk.
RESOLUTION:
The code is changed to set the flag that indicates a bad copy of the inode on
disk only if the inode is not an overlay inode.
* 2642027 (Tracking ID: 2350956)
SYMPTOM:
Internal noise test on locally mounted filesystem exited with error message
"bin/testit : Failed to full fsck cleanly, exiting" and in the logs we get the
userspace assert
"bmaptops.c 369: ASSERT(devid == 0 || (start == VX_HOLE && devid ==
VX_DEVID_HOLE)) failed".
DESCRIPTION:
The function bmap_data4_set() gets called while entering bmap allocation
information for typed extents of type VX_TYPE_DATA_4 or VX_TYPE_IADDR_4. The
assert expects that, either devid should be zero or if extent start is a hole,
then devid should be VX_DEVID_HOLE. However, we never have extent descriptors to
represent holes in typed extents. The assertion is incorrect.
RESOLUTION:
The assert is corrected to check that the extent start is not a hole and that
either devid is zero, or the extent start is VX_OVERLAY with devid being
VX_DEVID_HOLE.
* 2654644 (Tracking ID: 2630954)
SYMPTOM:
During internal CFS stress reconfiguration testing, the fsck(1M)
command exits by hitting an assert.
DESCRIPTION:
When the dexh_getblk() function is executed, if the extent size is
greater than 256 Kbytes, the extent is divided into chunks of 256 Kbytes each.
When the extent of the hash inodes are read in the dexh_getblk() function, a
maximum of 256 Kbytes (MAXBUFSZ) is read at a time. Currently, the chunk size is
assigned as 256 Kbytes every time. But there is a bug when the last chunk in the
extent is less than 256 Kbytes because of which the length of the buffer is
assigned incorrectly and we get an aliased buffer in the buffer cache. Instead,
for the last chunk, the remaining size in the extent to be read should be
assigned as the chunk size.
RESOLUTION:
The code is modified so that the buffer size is calculated correctly.
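
The corrected chunk sizing described above can be sketched as follows. The
function name is hypothetical; only the 256 Kbyte MAXBUFSZ value and the rule
that the last chunk uses the remaining size come from the description.

```python
MAXBUFSZ = 256 * 1024  # maximum bytes read at a time, per the description


def chunk_lengths(extent_size, max_chunk=MAXBUFSZ):
    """Split an extent into read lengths; the last one is the remainder."""
    lengths = []
    remaining = extent_size
    while remaining > 0:
        length = min(max_chunk, remaining)
        lengths.append(length)
        remaining -= length
    return lengths
```

For a 600 Kbyte extent this gives two 256 Kbyte chunks followed by one 88 Kbyte
chunk, rather than three chunks of 256 Kbytes.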
* 2669195 (Tracking ID: 2326037)
SYMPTOM:
An internal stress test on a cluster file system with clones failed while
writing to a file, with error ENOENT.
DESCRIPTION:
VxFS attempts to write to a clone that is in the process of being removed.
Because clone removal works asynchronously, the process starts to push changes
from the inode of the primary fset to the inode of the clone fset. By the time
the actual write happens, the inode of the clone fset has been removed, so
ENOENT is returned.
RESOLUTION:
Code is added to re-validate the inode being written.
Patch ID: PHKL_42228
* 2169326 (Tracking ID: 2169324)
SYMPTOM:
On LM, when a clone is mounted for a file system and a quota is assigned to the
clone, the clone is removed if the quota is exceeded. If files from the clone
are being accessed at that time, an assert may be hit in the function
vx_idelxwri_off() through vx_trunc_tran().
DESCRIPTION:
During clone removal, all inodes of the clone(s) being removed are processed.
The assert is hit because the on-disk and in-core sizes differ for a file that
is being modified by the application.
RESOLUTION:
While truncating files, if the VX_IEPTTRUNC op is set, the in-core file size is
set to the on-disk file size.
* 2243061 (Tracking ID: 1296491)
SYMPTOM:
Performing a nested mount on a CFS file system triggers a data page fault if a
forced unmount is also taking place on the CFS file system. The panic stack
trace involves the following kernel routines:
vx_glm_range_unlock
vx_mount
domount
mount
syscall
DESCRIPTION:
When the underlying cluster mounted file system is in the process of unmounting,
the nested mount dereferences a NULL vfs structure pointer, thereby causing a
system panic.
RESOLUTION:
The code has been modified to prevent a forced unmount of the underlying cluster
file system while a nested mount above the file system is in progress. The
ENXIO error is returned to the forced unmount attempt.
* 2243063 (Tracking ID: 1949445)
SYMPTOM:
A hang occurs when files are created in a large directory. The stack of the hung
thread is similar to the following:
vxglm:vxg_grant_sleep+226
vxglm:vxg_cmn_lock+563
vxglm:vxg_api_lock+412
vxfs:vx_glm_lock+29
vxfs:vx_get_ownership+70
vxfs:vx_exh_coverblk+89
vxfs:vx_exh_split+142
vxfs:vx_dexh_setup+1874
vxfs:vx_dexh_create+385
vxfs:vx_dexh_init+832
vxfs:vx_do_create+713
DESCRIPTION:
For large directories, Large Directory Hash (LDH) is enabled to improve lookup
performance. The hang was caused by taking ownership of the LDH inode twice in
the same thread context, that is, while building the hash for the directory.
RESOLUTION:
Ownership is no longer taken again if the thread already owns the LDH inode.
* 2247299 (Tracking ID: 2161379)
SYMPTOM:
In a CFS environment, various file system operations hang with the following
stack traces:
T1:
vx_event_wait+0x40
vx_async_waitmsg+0xc
vx_msg_send+0x19c
vx_iread_msg+0x27c
vx_rwlock_getdata+0x2e4
vx_glm_cbfunc+0x14c
vx_glmlist_thread+0x204
T2:
vx_ilock+0xc
vx_assume_iowner+0x100
vx_hlock_getdata+0x3c
vx_glm_cbfunc+0x104
vx_glmlist_thread+0x204
DESCRIPTION:
The ENOTOWNER error is handled improperly in the iread receive function. The
operation is retried continuously while holding an inode lock, which blocks all
other threads and causes a deadlock.
RESOLUTION:
The code is modified to release the inode lock on an ENOTOWNER error and acquire
it again, thus resolving the deadlock.
There are four callers of vx_msg_get_owner() with ilocked=1:
vx_rwlock_getdata() : Needs fix
vx_glock_getdata() : Needs fix
vx_cfs_doextop_iau(): Does not use the owner for the message loop; no fix needed.
vx_iupdat_msg() : Already has 'unlock/delay/lock' on the ENOTOWNER condition.
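
The 'unlock/delay/lock' pattern can be sketched in user-space terms as follows.
This is an analogy only: a Python lock stands in for the inode lock, the
ENOTOWNER constant is a stand-in for the kernel error code, and
send_with_retry and its parameters are hypothetical.

```python
import threading
import time

ENOTOWNER = "ENOTOWNER"  # stand-in for the kernel error code


def send_with_retry(ilock, send, delay=0.001, max_tries=100):
    """Call send() with ilock held; back off without the lock on ENOTOWNER."""
    for _ in range(max_tries):
        result = send()
        if result != ENOTOWNER:
            return result
        ilock.release()   # let other threads make progress on the inode
        time.sleep(delay)
        ilock.acquire()   # retake the lock before retrying
    raise TimeoutError("owner could not be resolved")
```

The caller is expected to hold ilock on entry and still holds it on return.
Dropping the lock during the delay is what gives the thread that must update
ownership a chance to run.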
* 2257904 (Tracking ID: 2251223)
SYMPTOM:
The 'df -h' command can take 10 seconds to run to completion and yet still
report an inaccurate free block count, shortly after removing a large number
of files.
DESCRIPTION:
When removing files, some file data blocks are released and counted in the
total free block count instantly. However blocks may not always be freed
immediately as VxFS can sometimes delay the releasing of blocks. Therefore
the displayed free block count, at any one time, is the summation of the
free blocks and the 'delayed' free blocks.
Once a file 'remove transaction' is done, its delayed free blocks will be
eliminated and the free block count increased accordingly. However, some
functions which process transactions, for example a metadata update, can also
alter the free block count, but ignore the current delayed free blocks. As a
result, if file 'remove transactions' have not finished updating their free
blocks and their delayed free blocks information, the free space count can
occasionally show greater than the real disk space. Therefore to obtain an
up-to-date and valid free block count for a file system a delay and retry
loop was delaying 1 second before each retry and looping 10 times before
giving up. Thus the 'df -h' command can sometimes take 10 seconds, but
even if the file system waits for 10 seconds there is no guarantee that the
output displayed will be accurate or valid.
RESOLUTION:
The delayed free block count is recalculated accurately when transactions
are created and when metadata is flushed to disk.
* 2275543 (Tracking ID: 1475345)
SYMPTOM:
write() system call hangs for over 10 seconds
DESCRIPTION:
While performing a transaction for a logged write, buffers belonging to the
transaction space were flushed asynchronously, one at a time. This asynchronous
flushing caused intermittent delays in write operations because of the reduced
transaction space.
RESOLUTION:
All dirty buffers on the file are now flushed in one attempt through a
synchronous flush, which frees a large amount of transaction space. This reduces
the delay during the write system call.
* 2289610 (Tracking ID: 2073336)
SYMPTOM:
vxfsstat does not reflect a change to vx_ninode made with kctune.
DESCRIPTION:
vxfs_ninode and vxi_icache_maxino were not synchronized in the callback
function.
RESOLUTION:
Added code to synchronize them in the callback function vx_ninode_callback.
* 2311490 (Tracking ID: 2074806)
SYMPTOM:
A DMAPI program using dm_punch_hole may corrupt data.
DESCRIPTION:
When dm_punch_hole is called on a file with allocated extents immediately after
a previous write, data can be written through stale pages. This causes data to
be written to the wrong location.
RESOLUTION:
dm_punch_hole now invalidates all the pages within the hole it is creating.
* 2329887 (Tracking ID: 2253938)
SYMPTOM:
In a Cluster File System (CFS) environment, file read
performance gradually degrades up to 10% of the original
read performance, and fsadm(1M) -F vxfs -D -E
shows a large number (> 70%) of free blocks in
extents smaller than 64k.
For example,
% Free blocks in extents smaller than 64 blks: 73.04
% Free blocks in extents smaller than 8 blks: 5.33
DESCRIPTION:
In a CFS environment, the disk space is divided into
Allocation Units (AUs). The delegation for these AUs is
cached locally on the nodes.
When an extending write operation is performed on a file,
the file system tries to allocate the requested block from
an AU whose delegation is locally cached, rather than
finding the largest free extent available that matches the
requested size in the other AUs. This leads to a
fragmentation of the free space, thus leading to badly
fragmented files.
RESOLUTION:
The code is modified such that the time for which the
delegation of the AU is cached can be reduced using a
tuneable, thus allowing allocations from other AUs with
larger size free extents. Also, the fsadm(1M) command is
enhanced to de-fragment free space using the -C option.
* 2329893 (Tracking ID: 2316094)
SYMPTOM:
vxfsstat incorrectly reports "vxi_bcache_maxkbyte" greater than "vx_bc_bufhwm"
after reinitialization of the buffer cache globals. Reinitialization can happen
during dynamic reconfiguration operations.
The "vxi_bcache_maxkbyte" counter in vxfsstat shows the maximum memory available
for allocating buffer cache buffers. This maximum depends on the total memory
available for the buffer cache (buffers + buffer headers), that is, the
"vx_bc_bufhwm" global. Therefore vxi_bcache_maxkbyte should never be greater
than vx_bc_bufhwm.
DESCRIPTION:
"vxi_bcache_maxkbyte" is a per-CPU counter, that is, part of the global per-CPU
'vx_info' counter structure. vxfsstat sums all per-CPU counters and reports the
result. During reinitialization of the buffer cache, this counter was not set
to zero properly before the new value was assigned to it. The total of this
per-CPU counter could therefore exceed 'vx_bc_bufhwm'.
RESOLUTION:
During buffer cache reinitialization, "vxi_bcache_maxkbyte" is now correctly
set to zero so that the final sum of this per-CPU counter is correct.
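
A small sketch shows how a per-CPU counter that is summed for reporting can
overshoot when stale slots survive a reinitialization. The list-of-slots model,
the choice of slot 0 for the new value, and the function names are assumptions
for illustration only.

```python
def reported(per_cpu):
    """What a reporting tool would show: the sum of all per-CPU slots."""
    return sum(per_cpu)


def reinit_without_reset(per_cpu, new_total):
    """Assign the new value but leave stale values in the other slots."""
    per_cpu[0] = new_total


def reinit_with_reset(per_cpu, new_total):
    """Zero every slot first, then assign the new value."""
    for i in range(len(per_cpu)):
        per_cpu[i] = 0
    per_cpu[0] = new_total
```

Starting from slots [100, 50, 25] and a new limit of 200, the first variant
reports 275, which exceeds the limit, while the second reports exactly 200.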
* 2340755 (Tracking ID: 2334061)
SYMPTOM:
When a file system is mounted with the tranflush option, operations that require
metadata updates take comparatively longer.
DESCRIPTION:
When a VxFS file system is mounted with the tranflush option, transaction
metadata is flushed to disk, followed by a wait of 100 milliseconds before the
next transaction is flushed. This delay severely affects the operation of
various commands on the VxFS file system.
RESOLUTION:
Because the flushing is synchronous and is performed in a loop, a delay of 100
milliseconds is too long. The delay is reduced from 100 milliseconds to a more
reasonable 2 milliseconds.
* 2340799 (Tracking ID: 2059611)
SYMPTOM:
System panics because of a NULL tranp in vx_unlockmap().
DESCRIPTION:
vx_unlockmap() unlocks a map structure of the file system. If the map is being
handled, the hold count is incremented. vx_unlockmap() attempts to check whether
the mlink doubly linked list is empty, while the asynchronous vx_mapiodone
routine can change the link at an unpredictable time, even when the hold count
is zero.
RESOLUTION:
The evaluation order inside vx_unlockmap() is changed so that further evaluation
is skipped when the map hold count is zero.
* 2340802 (Tracking ID: 2129455)
SYMPTOM:
Many VxFS threads are seen doing inactive processing.
DESCRIPTION:
Two issues could cause many VxFS threads to do inactive processing:
1. One inactive processing thread was spawned per inactive list. On high-end
machines, this could result in many threads doing inactive processing.
2. vx_inactive_started was wrongly incremented in vx_icache_process() instead of
vx_inactive_process(), which could cause many inactive processing threads in a
corner case.
RESOLUTION:
For the first issue, the number of threads doing inactive processing at one
time is changed to max(ncpu/2, 8).
For the second issue, vx_inactive_started is now incremented in
vx_inactive_process().
* 2340813 (Tracking ID: 2183320)
SYMPTOM:
VxFS mmap performance degradation on HP-UX 11.31.
DESCRIPTION:
While filling pages in vx_alloc_getpage, several synchronous I/Os are issued.
These I/Os were performed sequentially, that is, the next I/O was issued only
after the previous one had finished. This caused a performance problem with
large pages on HP-UX.
RESOLUTION:
The problem is fixed by building a single chain of buffers, issuing the I/Os on
all the buffers in parallel, and then waiting for them to complete, instead of
waiting for each I/O to complete before issuing the next.
* 2340817 (Tracking ID: 2192895)
SYMPTOM:
System panics when performing fcl commands, with the following stack:
unix:panicsys
unix:vpanic_common
unix:panic
genunix:vmem_xalloc
genunix:vmem_alloc
unix:segkmem_xalloc
unix:segkmem_alloc_vn
genunix:vmem_xalloc
genunix:vmem_alloc
genunix:kmem_alloc
vxfs:vx_getacl
vxfs:vx_getsecattr
genunix:fop_getsecattr
genunix:cacl
genunix:acl
unix:syscall_trap32
DESCRIPTION:
The ACL count in the inode can be corrupted by a race condition. For example,
setacl can change the ACL count while getacl is processing the same inode,
which can cause an invalid ACL count to be used.
RESOLUTION:
The code is modified to protect the ACL count against this race and avoid
corruption.
* 2340831 (Tracking ID: 2272072)
SYMPTOM:
GAB panics the system because the VCS engine "had" did not respond when the
lbolt wraps around.
DESCRIPTION:
The lbolt wraps around after 498 days of machine uptime. VxFS flushes its
metadata buffers based on their age, and the age calculation takes lbolt into
account.
Because lbolt wrapped, the buffers were not flushed. As a result, a large amount
of metadata I/O stopped, leading to the panic.
RESOLUTION:
The function that handles flushing of dirty buffers now also handles the case
where lbolt has wrapped. If it has, the current lbolt time is assigned to the
last update time of the dirtylist.
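
The wrap handling can be sketched as follows. The function name, the tuple
return value and the tick numbers are hypothetical; the only behavior taken
from the text is that a wrapped tick counter resets the last update time to the
current tick.

```python
def flush_due(now, last_update, max_age):
    """Return (due, last_update) for a dirty list aged by a tick counter."""
    if now < last_update:
        # The tick counter wrapped: restart aging from the current tick.
        last_update = now
    return (now - last_update) >= max_age, last_update
```

Without the reset, now - last_update stays negative after a wrap, so the age
test cannot succeed until the counter climbs past the old timestamp again.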
* 2340834 (Tracking ID: 2302426)
SYMPTOM:
System panics when multiple 'vxassist mirror' commands are running
concurrently, with the following stack trace:
0) panic+0x410
1) unaligned_hndlr+0x190
2) bubbleup+0x880 ( )
+------------- TRAP #1 ----------------------------
| Unaligned Reference Fault in KERNEL mode
| IIP=0xe000000000b03ce0:0
| IFA=0xe0000005aa53c114 <---
| p struct save_state 0x2c561031.0x9fffffff5ffc7400
+------------- TRAP #1 ----------------------------
LVL FUNC ( IN0, IN1, IN2, IN3, IN4, IN5, IN6, IN7 )
3) vx_copy_getemap_structs+0x70
4) vx_send_getemapmsg+0x240
5) vx_cfs_getemap+0x240
6) vx_get_freeexts_ioctl+0x990
7) vxportal_ioctl+0x4d0
8) spec_ioctl+0x100
9) vno_ioctl+0x390
10) ioctl+0x3c0
11) syscall+0x5a0
DESCRIPTION:
The panic is caused by dereferencing an unaligned address in a CFS message
structure.
RESOLUTION:
bcopy is now used to ensure proper alignment of the addresses.
* 2340839 (Tracking ID: 2316793)
SYMPTOM:
Shortly after removing files in a file system, commands like 'df', which use
'statfs()', can take 10 seconds to complete.
DESCRIPTION:
To obtain an up-to-date and valid free block count for a file system, a delay
and retry loop delayed 1 second before each retry and looped 10 times before
giving up. This unnecessarily excessive retrying could cause a 10 second delay
per file system when executing the df command.
RESOLUTION:
The original 10 retries with a 1 second delay each have been reduced to 1 retry
after a 20 millisecond delay when waiting for an updated free block count.
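
The effect of the retry change on worst-case latency can be sketched as
follows. The read_count callback returning a (count, valid) pair and both
function names are hypothetical; only the retry counts and delays come from the
text.

```python
import time


def worst_case_delay(retries, delay):
    """Total time spent sleeping if the count never becomes valid."""
    return retries * delay


def free_block_count(read_count, retries=1, delay=0.020):
    """Read the count, retrying after a short delay while it is not valid."""
    count, valid = read_count()
    for _ in range(retries):
        if valid:
            break
        time.sleep(delay)
        count, valid = read_count()
    return count
```

With the old settings, worst_case_delay(10, 1.0) is 10 seconds per file system.
With the new settings, worst_case_delay(1, 0.020) is 20 milliseconds.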
* 2360819 (Tracking ID: 2337470)
SYMPTOM:
Cluster File System can unexpectedly and prematurely report a 'file system
out of inodes' error when attempting to create a new file. The error message
reported will be similar to the following:
vxfs: msgcnt 1 mesg 011: V-2-11: vx_noinode - /dev/vx/dsk/dg/vol file system out
of inodes
DESCRIPTION:
When allocating new inodes in a cluster file system, vxfs will search for an
available free inode in the 'Inode-Allocation-Units' [IAUs] that are currently
delegated to the local node. If none are available, it will then search the
IAUs that are not currently delegated to any node, or revoke an IAU delegated
to another node. It is also possible for gaps or HOLEs to be created in the IAU
structures as a side effect of the CFS delegation processing. However, when
searching for an available free inode, vxfs simply ignored any HOLEs it found.
If the maximum size of the metadata structures has been reached (2^31), new IAUs
cannot be created, so one of the HOLEs should then be populated and used for
new inode allocation. Because HOLEs were being ignored, vxfs could prematurely
report the "file system out of inodes" error message even though there was
plenty of free space in the vxfs file system to create new inodes.
RESOLUTION:
New inodes will now be allocated from the gaps, or HOLEs, in the IAU structures
(created as a side effect of the CFS delegation processing). The HOLEs will be
populated rather than returning a 'file system out of inodes' error.
* 2360820 (Tracking ID: 2345626)
SYMPTOM:
File access can be denied on regular files that inherit the default group ACL
from the parent directory. When this behavior occurs, commands that attempt to
open an affected regular file fail with a "cannot open" or similar message.
DESCRIPTION:
A file that does not have ACLs explicitly set shares ACLs with its parent
directory. When checking access for such a file, only the default ACL entries of
the parent directory need to be read. The default entries are stored after the
non-default entries in the parent directory's inode.
The default entries were counted correctly, but the aclp pointer also needs to
be advanced to the start of the default entries.
Here is an example in which file "file1", which inherits ACL entries from
directory "dir1", incorrectly denies access to a user in group "grp1":
# umask 007
# getacl /dir1
# file: /dir1
# owner: root
# group: sys
user::rwx
group::rwx
group:grp1:r-x
class:rwx
other:---
default:group:grp1:r-x
# touch /dir1/file1
# getacl /dir1/file1
# file: /dir1/file1
# owner: root
# group: sys
user::rw-
group::rw-
group:grp1:r-x #effective:r--
class:rw-
other:---
# getaccess -u user1 -g grp1 /dir1/file1
--- /dir1/file1
RESOLUTION:
For a file that shares its ACL with the parent directory, the ACL entries are
now read correctly.
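
The pointer problem can be modelled in Python with a plain list, using the
entries of "dir1" from the example above. This is a sketch only: the list
layout and both function names are assumptions made for illustration, and the
"buggy" variant is a guess at the effect of not advancing the pointer.

```python
# ACL of /dir1: non-default entries first, default entries stored after them.
DIR1_ACL = [
    "user::rwx",
    "group::rwx",
    "group:grp1:r-x",
    "class:rwx",
    "other:---",
    "default:group:grp1:r-x",
]

def default_entries_buggy(acl, ndefault):
    # Counts the default entries correctly but reads from the start.
    return acl[:ndefault]

def default_entries_fixed(acl, ndefault):
    # Advance past the non-default entries before reading.
    start = len(acl) - ndefault
    return acl[start:]
```

With one default entry, the first variant returns "user::rwx" while the second
returns "default:group:grp1:r-x", which is the entry that should govern access
to "file1".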
* 2360821 (Tracking ID: 1956458)
SYMPTOM:
When attempting to check checkpoint information with
fsckptadm -C blockinfo ,
the command fails with error 6 (ENXIO), the file system is disabled, and
errors such as the following appear in the message file:
vxfs: msgcnt 4 mesg 012: V-2-12: vx_iget - /dev/vx/dsk/sfsdg/three file system
invalid inode number 4495
vxfs: msgcnt 5 mesg 096: V-2-96: vx_setfsflags - /dev/vx/dsk/sfsdg/three file
system fullfsck flag set - vx_cfs_iread
DESCRIPTION:
VxFS makes use of ilist files in the primary fileset and checkpoints to hold
inode information. A hole in an ilist file indicates that inodes in the hole
do not exist and are not yet allocated in the corresponding fileset or
checkpoint.
fsckptadm checks every inode in the primary fileset and the downstream
checkpoints. If the inode falls into a hole in a prior checkpoint,
i.e. the associated file was not generated at the time of the checkpoint
creation, fsckptadm exits with an error.
RESOLUTION:
Skip inodes in the downstream checkpoints, if these inodes are located in a
hole.
* 2368738 (Tracking ID: 2368737)
SYMPTOM:
If a file which has shared extents has corrupt indirect blocks, then in certain
cases the reference count tracking system can try to interpret this block and
panic the system. Since this is an asynchronous background operation, this
processing will retry repeatedly on every file system mount and hence can result
in a panic every time the file system is mounted.
DESCRIPTION:
The reference count tracking system for shared extents updates reference counts
in a lazy fashion. So in certain cases it has to asynchronously access shared
indirect blocks belonging to a file to account for reference count updates. If
such an indirect block has already been badly corrupted, this tracking
mechanism can panic the system repeatedly on every mount.
RESOLUTION:
The reference count tracking system now validates the indirect extent read from
the disk. If it is not valid, the VX_FULLFSCK flag is set in the superblock,
marking it for full fsck, and the file system is disabled on the current node.
* 2368788 (Tracking ID: 2343158)
SYMPTOM:
When tuning of vx_ninode fails because the new value is less than
(250*vx_nfreelists), the following message is displayed:
"vmunix: ERROR: mesg 112: V-2-112: The new value requires changes to Inode table
which can be made only after a reboot "
DESCRIPTION:
The message after this tuning failure is reported as "ERROR", which can confuse
some customers and raise questions such as 'is the system safe to run after this
error?'.
This tuning failure is not a serious error, so the failure message can be
reported as a WARNING, and the message itself can be modified to mention that
the tuning request has failed.
RESOLUTION:
The failure message is modified so that it is reported as 'WARNING' instead of
'ERROR', and the message clearly states that the tuning operation failed.
* 2371921 (Tracking ID: 2371910)
SYMPTOM:
mkfs fails to create VxFS with disk layout version 4
DESCRIPTION:
In 5.1SP1, the support for VxFS disk layout version (DLV) 4 has been
deprecated. This means, mkfs for DLV 4 will fail. However, DLV 4 can still be
mounted so that file system can be upgraded to the supported DLV using vxupgrade.
RESOLUTION:
On special request from HP, DLV4 has been un-deprecated for 5.1SP1RP2.
Hence, now it will be possible to create VxFS with DLV4 using mkfs in 5.1SP1RP2.
* 2371923 (Tracking ID: 2371909)
SYMPTOM:
In VxFS, delete performance is affected on the Postmark test.
DESCRIPTION:
In VxFS, when the function vx_delxwri_flush() is executed, all inodes are
flushed to disk, which results in extra load while deleting files and affects
delete performance.
RESOLUTION:
The code is changed to mark the inodes that were flushed last and to flush only
the inodes that were not flushed in the previous run.
* 2373565 (Tracking ID: 2283315)
SYMPTOM:
System may panic when "fsadm -e" is run on a file system containing file level
snapshots. The panic stack looks like:
crash_kexec()
__die at()
do_page_fault()
error_exit()
[exception RIP: vx_bmap_lookup+36]
vx_bmap_lookup()
vx_bmap()
vx_reorg_emap()
vx_extmap_reorg()
vx_reorg()
vx_aioctl_full()
vx_aioctl_common()
vx_aioctl()
vx_ioctl()
do_ioctl()
vfs_ioctl()
sys_ioctl()
tracesys
DESCRIPTION:
The panic happened because of a NULL inode pointer passed to vx_bmap_lookup()
function. During reorganizing extents of a file, block map (bmap) lookup
operation is done on a file to get the information about the extents of the
file. If this bmap lookup finds a hole at an offset in a file containing shared
extents, a local variable is not updated that makes the inode pointer NULL
during the next bmap lookup operation.
RESOLUTION:
Initialized the local variable such that inode pointer passed to
vx_bmap_lookup() will be non NULL.
* 2386483 (Tracking ID: 2374887)
SYMPTOM:
Access to a file system can hang when creating a named attribute
due to a read/write lock being held exclusively and indefinitely
causing a thread to loop in vx_tran_nattr_dircreate().
A typical stacktrace of a looping thread:
vx_itryhold_locked
vx_iget
vx_attr_iget
vx_attr_kgeti
vx_attr_getnonimmed
vx_acl_inherit
vx_aclop_creat
vx_attr_creatop
vx_new_attr
vx_attr_inheritbuf
vx_attr_inherit
vx_tran_nattr_dircreate
vx_nattr_copen
vx_nattr_open
vx_setea
vx_linux_setxattr
vfs_setxattr
link_path_walk
sys_setxattr
system_call
DESCRIPTION:
The initial creation of a named attribute for a regular file
or directory will result in the automatic creation of a
'named attribute directory'. Creations are initially attempted
in a single transaction. Should the single transaction fail
due to a read/write lock being held then a retry should split
the task into multiple transactions. An incorrect reset of a
tracking structure meant that all retries were performed using
a single transaction creating an endless retry loop.
RESOLUTION:
The tracking structure is no longer reset within the retry loop.
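
The intended retry behaviour can be sketched in Python as below. This is a
hypothetical model, not the VxFS code: the two callbacks, the use_multi flag
standing in for the tracking structure, and the max_attempts bound are all
assumptions added so that the example terminates.

```python
def create_named_attr_dir(try_single_txn, try_multi_txn, max_attempts=5):
    """Return the attempt number that succeeded, or None.

    `try_single_txn` and `try_multi_txn` are hypothetical callbacks that
    return True on success and False when a lock could not be taken.
    """
    use_multi = False                 # tracking state, set once outside the loop
    for attempt in range(1, max_attempts + 1):
        ok = try_multi_txn() if use_multi else try_single_txn()
        if ok:
            return attempt
        use_multi = True              # later retries split the work
    return None
```

If use_multi were reset to False at the top of each iteration, every retry
would take the single-transaction path again, which corresponds to the endless
loop described above.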
* 2409792 (Tracking ID: 2373239)
SYMPTOM:
Performance issue pointing to the read flush behind algorithm.
DESCRIPTION:
While the system is under memory pressure, the vxfs read flush behind algorithm
may invalidate pages that were read ahead before the application has a chance
to consume them. The invalidated pages must be re-read, which leads to bad
application performance.
A customer used adb to turn this feature off and saw significant improvements.
RESOLUTION:
Keep a gap between the read flush offset and the current read offset; the gap
length is fs_flush_size. Pages in this gap range are not flushed, which gives
the user application a chance to consume them.
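
A minimal Python sketch of the gap calculation follows. Only fs_flush_size is
taken from the text; the function name, the use of plain integer offsets, and
the exclusive end convention are assumptions made for illustration.

```python
def flush_behind_range(flush_offset, read_offset, fs_flush_size):
    """Return (start, end) of the range eligible for flushing, end exclusive.

    Offsets in [read_offset - fs_flush_size, read_offset) form the gap and
    are left alone so the reader can still consume them.
    """
    end = max(flush_offset, read_offset - fs_flush_size)
    return (flush_offset, end)
```

For example, with a flush offset of 0, a read offset of 100 and a gap of 30,
only the range 0 to 70 is eligible; if the flush offset is already inside the
gap, the eligible range is empty.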
* 2412029 (Tracking ID: 2384831)
SYMPTOM:
System panics with the following stack trace. This happens in some cases when
named streams are used in VxFS.
machine_kexec()
crash_kexec()
__die
do_page_fault()
error_exit()
[exception RIP: iput+75]
vx_softcnt_flush()
vx_ireuse_clean()
vx_ilist_chunkclean()
vx_inode_free_list()
vx_ifree_scan_list()
vx_workitem_process()
vx_worklist_process()
vx_worklist_thread()
vx_kthread_init()
kernel_thread()
DESCRIPTION:
VxFS internally creates a directory to keep the named streams pertaining to a
file. In some scenarios, an error code path fails to release the hold on
that directory. As a result, unmounting the file system does not clean the inode
belonging to that directory. Later, when VxFS reuses such an inode, a panic is
seen.
RESOLUTION:
Release the hold on the named streams directory in case of an error.
* 2412173 (Tracking ID: 2383225)
SYMPTOM:
System panics during a user write with the following stack trace and with the
panic string "pfd_unlock: bad lock state!"
(panic+0x128)
(bad_kern_reference+0x64)
(vfault+0x1ec)
($0000009B+0xac)
($thndlr_rtn+0x0)
(vx_dio_rdwri+0xdc)
(vx_write_direct+0x2ec)
(vx_write1+0x13a8)
(vx_rdwr+0xa88)
(vno_rw+0x64)
(rwuio+0x11c)
(aio_rw_child_thread+0x178)
(aio_exec_req_thread+0x258)
(kthread_daemon_startup+0x24)
(kthread_daemon_startup+0x0)
DESCRIPTION:
When a write to a file is handled as direct I/O, user pages are pinned using
the pas_pin() interface provided by the OS before the I/O is issued. The
pas_pin() interface can return an ENOSPC error. The VxFS write code path
misinterpreted the ENOSPC error and retried the write without resetting a
variable in the uio structure. The system panics later while dereferencing that
variable of the uio structure.
RESOLUTION:
Do not retry the write when pas_pin() returns ENOSPC error.
* 2412177 (Tracking ID: 2371710)
SYMPTOM:
User quota file corruption occurs when the DELICACHE feature is enabled: the
current inode usage of a user becomes negative after frequent file creations
and deletions. When quota information is checked with the command
"vxquota -vu username", the number of files is "-1", for example:
# vxquota -vu testuser2
Disk quotas for testuser2 (uid 500):
Filesystem usage quota limit timeleft files quota limit timeleft
/vol01 1127809 8239104 8239104 -1 0 0
DESCRIPTION:
This issue was introduced by the inode DELICACHE feature in 5.1SP1, a
performance enhancement that optimizes the updates done to the inode map during
file creations and deletions. The feature is enabled by default and can be
changed with vxtunefs.
When DELICACHE is enabled and quota is set for vxfs, there is an extra
quota update for the inodes on the inactive list during the removal process.
Since the quota for these inodes has already been updated before they were put
on the delicache list, the current number of user files is eventually
decremented twice.
RESOLUTION:
Add a flag to identify the inodes moved to the inactive list from the delicache
list, so that the flag can be used to prevent updating the quota again during
the removal process.
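
The double decrement and the flag that prevents it can be modelled in Python
as follows. The class, the flag name quota_released and both functions are
hypothetical; they only illustrate the idea of recording that the quota was
already updated.

```python
class Inode:
    def __init__(self, uid):
        self.uid = uid
        self.quota_released = False   # hypothetical flag added by the fix

def move_to_delicache(inode, files_used):
    # Quota is updated when the inode is put on the delicache list.
    files_used[inode.uid] -= 1
    inode.quota_released = True

def remove_inactive(inode, files_used):
    # Skip the update if it was already done on the delicache path.
    if not inode.quota_released:
        files_used[inode.uid] -= 1
```

Without the check in remove_inactive, an inode that passed through the
delicache list would be counted twice and a user with one file would end up
with a usage of -1, as in the vxquota output above.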
* 2412179 (Tracking ID: 2387609)
SYMPTOM:
Quota usage gets set to ZERO when the file system is unmounted and remounted,
even though files owned by users exist. This issue may occur after some file
creations and deletions. Checking the quota usage with the "vxrepquota" command
gives output like the following:
# vxrepquota -uv /vx/sofs1/
/dev/vx/dsk/sfsdg/sofs1 (/vx/sofs1):
                     Block limits                     File limits
User            used    soft    hard timeleft    used  soft  hard timeleft
testuser1 --       0 3670016 4194304                0     0     0
testuser2 --       0 3670016 4194304                0     0     0
testuser3 --       0 3670016 4194304                0     0     0
Additionally the quota usage may not be updated after inode/block usage reaches
ZERO.
DESCRIPTION:
The issue occurs when VxFS merges the external per node quota files with the
internal quota file. The block offset within the external quota file could be
calculated incorrectly in some scenarios. When a hole is found in a per node
quota file, the file offset is modified so that it points to the next non-HOLE
offset, but the block offset, which points to the next available quota record
in a block, was not changed accordingly.
VxFS updates per node quota records only when the global internal quota file
shows either some bytes or inode usage; otherwise it does not copy the usage
from the global quota file to the per node quota file. But when quota usage in
the external quota files has gone down to zero and both bytes and inode usage
in the global file become zero, the per node quota records are not updated and
are left with incorrect usage. VxFS should also check bytes or inodes usage in
the per node quota record, and should skip copying records only when bytes and
inodes usage in both the global quota file and the per node quota file is zero.
RESOLUTION:
Corrected the way to calculate the block offset when any hole is found in per
node quota file.
Added code to also check blocks or inodes usage in per node quota record while
updating user quota usage.
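
The corrected copy condition can be written as a small Python predicate. The
function and parameter names are hypothetical; only the rule itself (skip
copying only when all four usage values are zero) comes from the description
above.

```python
def should_copy_usage(global_bytes, global_inodes, node_bytes, node_inodes):
    """True unless usage is zero in both the global and per node records."""
    return any((global_bytes, global_inodes, node_bytes, node_inodes))
```

Before the fix, only the two global values were consulted, so a per node
record with stale non-zero usage was never refreshed once the global usage
dropped to zero.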
* 2413004 (Tracking ID: 2413036)
SYMPTOM:
In a CFS environment with partitioned directory enabled (disk layout 8), trying
to create a huge number of subdirectories within a directory hangs with the
following stack trace:
vx_pd_lookup+0008A8 ()
vx_init_pdnlink+0003D4 ()
vx_get_pdnlink+0004FC ()
vx_mkdir1_pd+00017C ()
vx_do_mkdir@AF53_26+000080 ()
vx_do_mkdir+00002C ()
vx_mkdir1@AF54_27+00004C ()
vx_mkdir1+00002C ()
vx_mkdir+0000D4 ()
DESCRIPTION:
With partitioned directory support, CFS can create MAXLINK - delta number of
subdirectories within a directory. The value of delta depends on the number of
nodes present in the cluster. The delta value was not checked when detecting
whether the limit for subdirectories had been reached, so the operation was
retried endlessly.
RESOLUTION:
Code is modified to consider delta while checking the limit for links during mkdir.
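
The corrected limit check amounts to the comparison below, shown as a Python
sketch. The function name and the strict "less than" comparison are
assumptions; the actual values of MAXLINK and delta are not given here, so the
numbers in the usage note are purely illustrative.

```python
def mkdir_allowed(nlink, maxlink, delta):
    """True if another subdirectory may be created under the parent."""
    return nlink < maxlink - delta
```

With hypothetical values maxlink=100 and delta=4, a directory with 95 links
can still accept a subdirectory, while one with 96 links cannot and mkdir
should fail rather than retry.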
* 2413010 (Tracking ID: 2413037)
SYMPTOM:
In a VxFS file system with partitioned directory enabled (disk layout 8) and
accessed through NFS, a directory listing shows fewer entries than expected.
DESCRIPTION:
NFS uses a limited size buffer to read directory entries and hence has to
invoke the readdir call multiple times to read all the entries of a big
directory. NFS uses the d_off field of the last directory entry in the buffer
as the offset for the next readdir invocation. The d_off field was not set
properly, causing the directory listing operation to terminate prematurely.
RESOLUTION:
With partitioned directory support, entries of a namespace visible directory are
distributed across multiple hidden subdirectories. Code is modified so that
d_off field reflects the offset of an entry from the beginning of the namespace
visible directory rather than from the beginning of the hidden subdirectory.
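
One way to picture the corrected d_off is as an offset in a single logical
space that spans all hidden subdirectories. The Python sketch below assumes
the hidden subdirectories are laid out one after another in that space; this
layout, the function name and its parameters are assumptions for illustration
and may not match the actual VxFS encoding.

```python
def visible_d_off(hidden_dir_sizes, hidden_index, local_off):
    """Offset of an entry from the start of the namespace visible directory.

    hidden_dir_sizes: sizes of the hidden subdirectories, in listing order.
    hidden_index:     which hidden subdirectory holds the entry.
    local_off:        offset of the entry within that hidden subdirectory.
    """
    return sum(hidden_dir_sizes[:hidden_index]) + local_off
```

Because the result keeps increasing across hidden subdirectories, a client
that resumes readdir from the last d_off it saw continues where it stopped
instead of stopping early.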
* 2413015 (Tracking ID: 2413039)
SYMPTOM:
In a VxFS file system with partitioned directory enabled (disk layout 8) and
accessed through a read-only mount, in some particular scenarios a directory
listing shows fewer entries than expected.
DESCRIPTION:
With partitioned directory support, all the entries of a namespace visible
directory are distributed across multiple hidden subdirectories. The
distribution happens during partitioning, and if the partitioning does not
complete for any reason, such as system crash, then some entries move to
respective hidden directories, but some still remain under the namespace visible
directory. Those entries may not get reported as part of directory listing if
the filesystem is mounted read only.
RESOLUTION:
Readdir code is modified to ensure such entries get reported even if the file
system is mounted read-only.
* 2418819 (Tracking ID: 2283893)
SYMPTOM:
In a Cluster File System (CFS) environment, the file read
performance gradually degrades up to 10% of the original
read performance and the fsadm(1M) -F vxfs -D -E
shows a large number (> 70%) of free blocks in
extents smaller than 64k.
For example,
% Free blocks in extents smaller than 64 blks: 73.04
% Free blocks in extents smaller than 8 blks: 5.33
DESCRIPTION:
In a CFS environment, the disk space is divided into
Allocation Units (AUs). The delegation for these AUs is
cached locally on the nodes.
When an extending write operation is performed on a file,
the file system tries to allocate the requested block from
an AU whose delegation is locally cached, rather than
finding the largest free extent available that matches the
requested size in the other AUs. This leads to a
fragmentation of the free space, thus leading to badly
fragmented files.
RESOLUTION:
The code is modified such that the time for which the
delegation of the AU is cached can be reduced using a
tuneable, thus allowing allocations from other AUs with
larger size free extents. Also, the fsadm(1M) command is
enhanced to de-fragment free space using the -C option.
* 2420060 (Tracking ID: 2403126)
SYMPTOM:
A hang is seen in the cluster when one of the nodes in the cluster leaves or is
rebooted. One of the nodes in the cluster will show the following stack trace.
e_sleep_thread()
vx_event_wait()
vx_async_waitmsg()
vx_msg_send()
vx_send_rbdele_resp()
vx_recv_rbdele+00029C ()
vx_recvdele+000100 ()
vx_msg_recvreq+000158 ()
vx_msg_process_thread+0001AC ()
vx_thread_base+00002C ()
threadentry+000014 (??, ??, ??, ??)
DESCRIPTION:
Whenever a node in the cluster leaves, reconfiguration happens and all the
resources that are held by the leaving nodes are consolidated. This is done on
one node of the cluster called primary node. Each node sends a message to the
primary node about the resources it is currently holding. During this
reconfiguration, in a corner case, VxFS incorrectly calculates a message
length that is larger than what the GAB (Veritas Group Membership and Atomic
Broadcast) layer can handle. As a result the message is lost. The sender
thinks that the message is sent and waits for an acknowledgement, but the
message is actually dropped at the sender and never sent. The master node, which
is waiting for this message, waits forever and the reconfiguration never
completes, leading to a hang.
RESOLUTION:
The message length calculation is done properly now and GAB can handle the messages.
* 2425429 (Tracking ID: 2422574)
SYMPTOM:
On CFS, after turning the quota on, when any node is rebooted and rejoins the
cluster, it fails to mount the filesystem.
DESCRIPTION:
When the file system was mounted after the node rebooted, mntlock was already
set, which prevented the remount of the file system if quota was on.
RESOLUTION:
Code is changed so that the mntlock flag is masked in quota operation as it's
already set on the mount.
* 2426039 (Tracking ID: 2412604)
SYMPTOM:
Once the time limit expires after exceeding the soft-limit of user quota size on
VxFS filesystem, writes are still permissible over that soft-limit.
DESCRIPTION:
If usage had already exceeded the soft-limit when the soft-limit was initially
set up, the timer was not started.
RESOLUTION:
Start the timer during the initial setting of quota limits if current usage has
already crossed the soft quota limits.
* 2427269 (Tracking ID: 2399228)
SYMPTOM:
Occasionally Oracle Archive logs can be created smaller than they should be.
In the reported case the resultant Oracle Archive logs were incorrectly sized
as 512 bytes.
DESCRIPTION:
The fcntl [file control] command F_FREESP [Free storage space] can be
utilised to change the size of a regular file. If the file size is reduced we
call it a "truncate", and space allocated in the truncated area will be
returned to the file system freespace pool. If the file size is increased using
F_FREESP we call it a "truncate-up"; although the file size changes, no space is
allocated in the extended area of the file.
Oracle archive logs utilize the F_FREESP fcntl command to perform a truncate-up
of a new file before a smaller write of 512 bytes [at the start of the
file] is then performed. A timing window was found with F_FREESP which meant
that the 'truncate-up' file size was lost, or rather overwritten, by the subsequent
write of the data, thus causing the file to appear with a size of just 512
bytes.
RESOLUTION:
A timing window has been closed whereby the flush of the allocating [512byte]
write was triggered after the new F_FREESP file size has been updated in the
inode.
* 2478237 (Tracking ID: 2384861)
SYMPTOM:
The following asserts are seen during internal stress and regression runs
f:vx_do_filesnap:1b
f:vx_inactive:2a
f:xted_check_rwdata:31
f:vx_do_unshare:1
DESCRIPTION:
These asserts validate some assumptions in various functions. There were also
some miscellaneous issues seen during internal testing.
RESOLUTION:
The code has been modified to fix the internally reported issues, along with
other miscellaneous changes.
* 2482337 (Tracking ID: 2431674)
SYMPTOM:
panic in vx_common_msgprint() via vx_inactive()
DESCRIPTION:
The problem is that the VX_CMN_ERR() call uses a "llx" format specifier that
vx_common_msgprint() does not understand. It gives up trying to process that
format, but continues without consuming the corresponding parameter.
Everything else in the parameter list is effectively shifted by 8 bytes, and
the system panics when the string argument is processed.
RESOLUTION:
Changed the format to "llu", which vx_common_msgprint() understands.
* 2486597 (Tracking ID: 2486589)
SYMPTOM:
Multiple threads may wait on a mutex owned by a thread that is in function
vx_ireuse_steal() with following stack trace on machine with severe inode pressure.
vx_ireuse_steal()
vx_ireuse()
vx_iget()
DESCRIPTION:
Several threads are waiting to get inodes from VxFS. The current number of
inodes has reached the maximum number of inodes (vxfs_ninode) that can be
created in memory, so no new allocations are possible, which results in the
threads waiting.
RESOLUTION:
Code is modified so that in such a situation, threads return ENOINODE instead
of retrying to get inodes.
* 2494464 (Tracking ID: 2247387)
SYMPTOM:
Internal local mount noise.fullfsck.N4 test hit an assert vx_ino_update:2
With stack trace looking as below
panic: f:vx_ino_update:2
Stack Trace:
IP Function Name
0xe0000000023d5780 ted_call_demon+0xc0
0xe0000000023d6030 ted_assert+0x130
0xe000000000d66f80 vx_ino_update+0x230
0xe000000000d727e0 vx_iupdat_local+0x13b0
0xe000000000d638b0 vx_iupdat+0x230
0xe000000000f20880 vx_tflush_inode+0x210
0xe000000000f1fc80 __vx_fsq_flush___vx_tran.c__4096000_0686__+0xed0
0xe000000000f15160 vx_tranflush+0xe0
0xe000000000d2e600 vx_tranflush_threaded+0xc0
0xe000000000d16000 vx_workitem_process+0x240
0xe000000000d15ca0 vx_worklist_thread+0x7f0
0xe000000001471270 kthread_daemon_startup+0x90
End of Stack Trace
DESCRIPTION:
INOILPUSH flag is not set when inode is getting updated, which caused above
assert. The problem was creation and deletion of clone resets the INOILPUSH flag
and function vx_write1_fast() does not set the flag after updating the inode and
file.
RESOLUTION:
Code is modified so that if the INOILPUSH flag is not set in function
vx_write1_fast(), the flag is set in that function.
* 2496959 (Tracking ID: 2496954)
SYMPTOM:
Using vxtunefs(1M) command, tunable pdir_enable can be set to invalid values.
DESCRIPTION:
During the sanity check, tunable pdir_enable is improperly checked. As a result,
invalid values can be set for tunable pdir_enable.
RESOLUTION:
Code is changed to correct the error in the sanity check of pdir_enable.
* 2508164 (Tracking ID: 2481984)
SYMPTOM:
Access to the file system hangs.
DESCRIPTION:
The function 'vx_setqrec' calls 'vx_dqget'. When 'vx_dqget' returns an error,
'vx_setqrec' tries to unlock the DQ structure using 'VX_DQ_CLUSTER_UNLOCK'. But
in this situation the DQ structure does not hold the lock, hence the hang.
RESOLUTION:
'dq_inval' is now set in 'vx_dqget' if any error occurs in 'vx_dqget'.
Unlocking the DQ structure is skipped in the error code path of 'vx_setqrec'
if 'dq_inval' is set.
* 2529356 (Tracking ID: 2340953)
SYMPTOM:
During internal stress test, f:vx_iget:1a assert is seen.
DESCRIPTION:
While renaming a file, we check whether the target directory is in the path of
the source file to be renamed. While using function vx_iget() to walk up to the
root inode, the inode number of one of the parent directories was 0, hence the
assert.
RESOLUTION:
Code is changed so that during renames, parent directory is first assigned correct
inode number before using vx_iget() function to retrieve root inode.
* 2559601 (Tracking ID: 476179)
SYMPTOM:
The most common symptom known so far involves corrupted IAU header(s), uncovered
by a full-fsck, as shown in the example below.
# fsck -F vxfs -o full /dev/vx/dsk/bigdg/bigvol
fileset 999, invalid magic number in primary IAU 196
log replay in progress
fileset 999, invalid magic number in primary IAU 196
fileset 999, invalid magic number in primary IAU 196
fileset 999, invalid magic number in primary IAU 196
pass0 - checking structural files
fileset 1 primary-ilist inode 64 (Primary IAU)
failed validation clear? (ynq)
...
DESCRIPTION:
The root cause of the issue is that the kernel routine 'vx_write_blk()'
incorrectly truncates a 64-bit block offset into a 32-bit block offset, ignoring
the high order bits of the offset. VxFS invokes this routine in several places to
write metadata, including inode allocation units (IAUs), extended attributes and
file change log. If this routine happens to write to a file system block mapping
to a greater than 4TB offset, the data will be incorrectly written to an offset 4TB
less.
RESOLUTION:
Code changes were made to remove truncation of 64 bit block offset to 32 bit.
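
The arithmetic behind the "4TB less" behaviour can be checked with a short
Python sketch. The 1024-byte block size is an assumption chosen because it
makes 2^32 blocks equal to exactly 4 TB; the function names are hypothetical.

```python
BLOCK_SIZE = 1024          # assumed file system block size in bytes
TB = 2 ** 40

def truncate_to_32_bits(block):
    # Keep only the low order 32 bits, discarding the high order bits.
    return block & 0xFFFFFFFF

def byte_offset(block):
    return block * BLOCK_SIZE
```

Under this assumption, block number 2^32 + 5 is truncated to block 5, so the
write lands exactly 4 TB before the intended location, while any block number
below 2^32 is unaffected.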
* 2559801 (Tracking ID: 2429566)
SYMPTOM:
Memory used for the VxFS internal buffer cache may grow significantly after 497
days of uptime, when LBOLT (a global that gives the current system time) wraps
over.
DESCRIPTION:
The age of a buffer is calculated from the LBOLT value: age = (current LBOLT -
LBOLT when the buffer was added to the list). A buffer is reused when its age
becomes greater than a threshold.
When LBOLT wraps, the current LBOLT becomes a very small value and the age
becomes negative. VxFS treats the buffer as not old and never reuses it. Buffer
cache memory usage increases as buffers are not reused.
RESOLUTION:
Now we check whether LBOLT has wrapped around. If it has, we reassign the
buffer time with the current LBOLT so that the buffer gets reused after some
time.
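
The 497 day figure and the wrap handling can be illustrated in Python. The
sketch assumes a 32-bit tick counter incremented 100 times per second, which
is consistent with the 497 days quoted above but is not stated in this
document; the function names are hypothetical.

```python
HZ = 100                   # assumed ticks per second
WRAP = 2 ** 32             # assumed 32-bit tick counter
SECONDS_PER_DAY = 86400

def days_until_wrap():
    return WRAP / HZ / SECONDS_PER_DAY

def buffer_age(now, stamp):
    # Becomes negative once `now` has wrapped past zero.
    return now - stamp

def refresh_stamp(now, stamp):
    """Reassign the buffer time with the current tick if the counter wrapped."""
    return now if now < stamp else stamp
```

After the stamp is refreshed, the age starts again from zero and eventually
crosses the reuse threshold, so the buffer is no longer kept forever.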
INSTALLING THE PATCH
--------------------
1. Installing the VxFS 5.1 SP1RP1P1 patch:
a)If you install this patch on a CVM cluster, install it one
system at a time so that all the nodes are not brought down
simultaneously.
b)VxFS 5.0(GA) must be installed before applying these
patches.
c)To verify the VERITAS file system level, enter:
# swlist -l product | egrep -i 'VRTSvxfs'
VRTSvxfs 5.1.100.000 VERITAS File System
Note: VRTSfsman is a corequisite for VRTSvxfs. Hence VRTSfsman also
needs to be installed along with VRTSvxfs.
# swlist -l product | egrep -i 'VRTS'
VRTSvxfs 5.1.100.000 Veritas File System
d) All prerequisite/corequisite patches have to be installed. The Kernel patch
requires a system reboot for both installation and removal.
e)To install the patch, enter the following command:
# swinstall -x autoreboot=true -s PHCO_42719 PHKL_42718
In case the patch is not registered, the patch can be registered
using the following command:
# swreg -l depot ,
where is the absolute path where the patch resides.
REMOVING THE PATCH
------------------
2. Removing the VxFS 5.1 SP1RP1P1 patches:
a)To remove the patch, enter the following command:
# swremove -x autoreboot=true PHCO_42719 PHKL_42718
SPECIAL INSTRUCTIONS
--------------------
NONE
OTHERS
------
NONE