File System on Linux, patch detail

To use SORT, JavaScript must be enabled. How to enable JavaScript.
fs-rhel6_x86_64-5.1SP1RP3P1 Obsolete Go to Download Center to download. The latest patch(es) : fs-rhel6_x86_64-5.1SP1RP4P1 sfha-rhel6_x86_64-5.1SP1PR2RP4
Basic information
Release type:	P-patch
Release date:	2012-12-11
OS update support:	None
Technote:	None
Documentation:	None
Popularity:	4163 viewed downloaded
Download size:	6.6 MB
Checksum:	1553730481
Applies to one or more of the following products:
VirtualStore 5.1SP1PR2 On RHEL6 x86-64
Storage Foundation 5.1SP1PR2 On RHEL6 x86-64
Storage Foundation Cluster File System 5.1SP1PR2 On RHEL6 x86-64
Storage Foundation Cluster File System for Oracle RAC 5.1SP1PR2 On RHEL6 x86-64
Storage Foundation for Oracle RAC 5.1SP1PR2 On RHEL6 x86-64
Storage Foundation HA 5.1SP1PR2 On RHEL6 x86-64
Obsolete patches, incompatibilities, superseded patches, or other requirements:
This patch is obsolete. It is superseded by:	Release date
fs-rhel6_x86_64-5.1SP1RP4P1	2014-05-08
sfha-rhel6_x86_64-5.1SP1PR2RP4	2013-08-21
fs-rhel6_x86_64-5.1SP1RP3P1HF1 (obsolete)	2013-06-04
This patch supersedes the following patches:	Release date
fs-rhel6_x86_64-5.1SP1RP2P2 (obsolete)	2012-06-28
This patch requires:	Release date
sfha-rhel6_x86_64-5.1SP1PR2RP3 (obsolete)	2012-10-02
Fixes the following incidents:
2169326, 2243061, 2243063, 2243064, 2247299, 2249658, 2255786, 2257904, 2275543, 2280386, 2280552, 2296277, 2311490, 2320044, 2320049, 2329887, 2329893, 2338010, 2340741, 2340799, 2340817, 2340825, 2340831, 2340834, 2340839, 2341007, 2360817, 2360819, 2360821, 2368738, 2373565, 2386483, 2402643, 2412029, 2412169, 2412177, 2412179, 2412181, 2418819, 2420060, 2425429, 2425439, 2426039, 2427269, 2427281, 2430679, 2478237, 2478325, 2480949, 2482337, 2482344, 2484815, 2486597, 2494464, 2508164, 2521514, 2529356, 2726056, 2810119, 2851320, 2863998, 2900974, 2909496, 2915625, 2925919, 2929057, 3004467
Patch ID:
VRTSvxfs-5.1.133.100-SP1RP3P1_RHEL6
Readme file
                          * * * READ ME * * *
            * * * Veritas File System 5.1 SP1 PR2 RP3 * * *
                         * * * P-patch 1 * * *
                         Patch Date: 2012-11-23


This document provides the following information:

   * PATCH NAME
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas File System 5.1 SP1 PR2 RP3 P-patch 1


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas Storage Foundation for Oracle RAC 5.1 SP1 PR2
   * Veritas Storage Foundation Cluster File System 5.1 SP1 PR2
   * Veritas Storage Foundation 5.1 SP1 PR2
   * Veritas Storage Foundation High Availability 5.1 SP1 PR2
   * Veritas Storage Foundation Cluster File System for Oracle RAC 5.1 SP1 PR2
   * Symantec VirtualStore 5.1 SP1 PR2


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL6 x86-64


INCIDENTS FIXED BY THE PATCH
----------------------------
This patch fixes the following Symantec incidents:

Patch ID: 5.1.133.100

* 3004467 (Tracking ID: 3004466)

SYMPTOM:
Installation of 5.1SP1RP3 fails on RHEL 6.3

DESCRIPTION:
Installation of 5.1SP1RP3 fails on RHEL 6.3

RESOLUTION:
Updated the install script to handle the installation failure.

Patch ID: 5.1.133.000

* 2726056 (Tracking ID: 2709869)

SYMPTOM:
The system panics with a redzone violation while releasing inodes File 
Input/Output (FIO) statistics structure and the following stack trace is 
displayed:

kmem_error()
kmem_cache_free_debug()
kmem_cache_free()
vx_fiostats_alloc()
fdd_common1()
fdd_odm_open()
odm_vx_open()
odm_ident_init()
odm_identify()
odmioctl()
fop_ioctl()
ioctl()

DESCRIPTION:
Different types of statistics are maintained when a file is accessed in Quick 
Input/Output (QIO) and non-QIO mode. Some common statistics are copied when the 
file access mode is changed from QIO to non-QIO or vice versa. While switching 
from QIO mode to non-QIO, the QIO statistics structure is freed and FIO 
statistics structure is allocated to maintain FIO file-level statistics. There 
is a race between the thread freeing the QIO statistics which also allocates 
the FIO statistics and the thread updating the QIO statistics when the file is 
opened in QIO mode. Thus, the FIO statistics gets corrupted as another thread 
writes to it assuming that the QIO statistics is allocated.

RESOLUTION:
The code is modified to protect the allocation/releasing of FIO/QIO statistics 
using the read-write lock/spin lock for file statistics structure.

* 2810119 (Tracking ID: 2597347)

SYMPTOM:
Fsck core dumps with the following stack -

_lwp_kill 
 abort   
 iget_i  
 iget    
 ag_ivalidate 
 ag_validate_repino 
 ag_olt_init
 ag_init 
 process_device 
 main    
 _start

DESCRIPTION:
This happens when one of the device record has been corrupted while the replica
device record is valid. Fsck should be able to reconstruct the corrupted record
with the valid record.

RESOLUTION:
Reconstruct the corrupted device record using the valid replica device record.

* 2851320 (Tracking ID: 2839871)

SYMPTOM:
On a system with DELICACHE enabled, several file system operations may hang 
with the following stack trace: 

vx_delicache_inactive 
vx_delicache_inactive_wp
vx_workitem_process 
vx_worklist_process
vx_worklist_thread
vx_kthread_init

DESCRIPTION:
The DELICACHE lock is used to synchronize the access to the DELICACHE list and 
it is held only while updating this list. However, in some cases it is held 
longer and is released only after the issued I/O is completed, causing other 
threads to  hang.

RESOLUTION:
The code is modified to release the spinlock before issuing a blocking I/O 
request.

* 2863998 (Tracking ID: 2884144)

SYMPTOM:
No Support for RHEL 6.3 present

DESCRIPTION:
Added Support for RHEL 6.3 , removed some deprecated function 
calls.

RESOLUTION:
Modified code to add support for RHEL 6.3

* 2900974 (Tracking ID: 2841059)

SYMPTOM:
The file system gets marked for a full fsck operation and the following message 
is displayed in the system log:

V-2-96: vx_setfsflags 
<volume name> file system fullfsck flag set - vx_ierror

vx_setfsflags+0xee/0x120 
vx_ierror+0x64/0x1d0 [vxfs]
vx_iremove+0x14d/0xce0 
vx_attr_iremove+0x11f/0x3e0
vx_fset_pnlct_merge+0x482/0x930
vx_lct_merge_fs+0xd1/0x120
vx_lct_merge_fs+0x0/0x120 
vx_walk_fslist+0x11e/0x1d0
vx_lct_merge+0x24/0x30 
vx_workitem_process+0x18/0x30
vx_worklist_process+0x125/0x290
vx_worklist_thread+0x0/0xc0
vx_worklist_thread+0x6d/0xc0
vx_kthread_init+0x9b/0xb0 

 V-2-17: vx_iremove_2
<volume name>: file system inode 15 marked bad incore

DESCRIPTION:
Due to a race condition, the thread tries to remove an attribute inode that has 
already been removed by another thread. Hence, the file system is marked for a 
full fsck operation and the attribute inode is marked as 'bad ondisk'.

RESOLUTION:
The code is modified to check if the attribute node that a thread is trying to 
remove has already been removed.

* 2909496 (Tracking ID: 2560244)

SYMPTOM:
Stack overflow occurred during stress test, causing system panic.

DESCRIPTION:
While trying to get the page, due to the memory pressure, a process 
overflowed the stack, causing system panic.

             In 'vx_getpages()', we've handoff code, based on current stack 
consumption. We do hand off  work only when  stack used so far is greater than
'vx_alloc_maxstack'. In this case, stack consumed is less than 
'vx_alloc_maxstack'. So, 'vx_alloc_maxstack' need to be appropriately adjusted 
to properly handoff to avoid stack overflow, ultimately avoiding panic.

RESOLUTION:
Adjusted the 'vx_alloc_maxstack' for proper handoff.

* 2915625 (Tracking ID: 1852788)

SYMPTOM:
During Internal fsync testing of a cluster mounted filesystem , a test 
assert is hit which checks consistencies of the on-disk and in-core AU summary 
information.

DESCRIPTION:
The test case issues a fsync operation and force umounts the 
filesystem. the fsync operation issues a cluster wide freeze and the force 
umount triggers re-config which elects a new primary. During the transfer of 
primary ship some of the incore AU summary maps do not get flushed to the disk 
causing the inconsistency leading to the assert.

RESOLUTION:
The code is modified to ensure that the incore structures are 
flushed to the disk correctly before migrating the primaryship of the node.

* 2925919 (Tracking ID: 2925918)

SYMPTOM:
Checkpoint promotion may lead to deadlock and having stack similar to this :

vx_init_overlay()
vx_reread_inode()
vx_assume_iowner()
vx_hlock_getdata()
vx_glm_cbfunc()
vx_glmlist_thread()
thread_start()

DESCRIPTION:
While doing the checkpoint promotion, ilist on secondary nodes may remain stale 
resulting infinite looping in vx_init_overlay() function.

RESOLUTION:
Incore ilist should be refreshed refreshed on secondary nodes while doing 
checkpoint promotion.

* 2929057 (Tracking ID: 2781444)

SYMPTOM:
Quota record mismatch occurred, causing system panic and asserts 
"f:vx_inactive_remove:1d" and "f:xted_validate_pnqtran:5".

DESCRIPTION:
For normal inodes belonging to 999 or 1000+ fset, their corresponding 
'i_fset' and 'i_qfset' point to the same fset; i.e. i_fset = i_qfset = fset. 
But, in case of structural inodes i.e. belonging to structural fileset (fset 1)
its 'i_fset' will point to the fset to which it belongs (1 fset), whereas its
'i_qfset' will point to the fset for which this inode is the quota inode.

         But, in function 'vx_iswapfs_list()', we modify 'i_qfset' for 
structural and reorg inodes. But an reorg inode(IFEMR) can be allocated for 
regular fsets too. So, the 'i_qfset' for reorg inodes, belonging to regular 
fset, may mismatch with 'i_fset', causing mentioned asserts. So we need to make 
sure that we do not update i_qfset of such inodes. So, in case of 'IFEMR' 
inodes, update its 'i_qfset' only if it is part of a structural fileset.

RESOLUTION:
While updating 'i_qfset', added a check that it is updated only in 
case 
of structural inodes.

Patch ID: 5.1.132.000

* 2169326 (Tracking ID: 2169324)

SYMPTOM:
On LM , When clone is mounted for a file system and some quota is 
assigned to clone. And if quota exceeds then clone is removed and if files from 
clones are being accessed then assert may hit in function vx_idelxwri_off() 
through vx_trunc_tran()

DESCRIPTION:
During clone removable, we go through the all inodes of the 
clone(s) being removed and hit the assert because there is difference between 
on-disk and in-core sizes for the file , which is being modified by the 
application.

RESOLUTION:
While truncating files, if VX_IEPTTRUNC op is set, 
set the in-core file size to on_disk file size.

* 2243061 (Tracking ID: 1296491)

SYMPTOM:
Performing a nested mount on a CFS file system triggers a data page fault if a
forced unmount is also taking place on the CFS file system. The panic stack
trace involves the following kernel routines:

vx_glm_range_unlock
vx_mount
domount 
mount 
syscall

DESCRIPTION:
When the underlying cluster mounted file system is in the process of unmounting,
the nested mount dereferences a NULL vfs structure pointer, thereby causing a
system panic.

RESOLUTION:
The code has been modified to prevent the underlying cluster file system from a
forced unmount when a nested mount above the file system, is in progress. The
ENXIO error will be returned to the forced unmount attempt.

* 2243063 (Tracking ID: 1949445)

SYMPTOM:
Hang when file creates were being performed on large directory. stack of hung
thread is similar to below:

vxglm:vxg_grant_sleep+226                                             
vxglm:vxg_cmn_lock+563
vxglm:vxg_api_lock+412                                             
vxfs:vx_glm_lock+29
vxfs:vx_get_ownership+70                                                  
vxfs:vx_exh_coverblk+89  
vxfs:vx_exh_split+142                                                 
vxfs:vx_dexh_setup+1874 
vxfs:vx_dexh_create+385                                              
vxfs:vx_dexh_init+832 
vxfs:vx_do_create+713

DESCRIPTION:
For large directories, Large Directory Hash(LDH) is enabled to improve lookup on
such large directories. Hang was due to taking ownership of LDH inode twice in
same thread context i.e. while building hash for directory.

RESOLUTION:
Avoid taking ownership again if we already have the ownership of the LDH inode.

* 2243064 (Tracking ID: 2111921)

SYMPTOM:
On linux platforms readv()/writev() performance with DIO/CIO can be more than 
2x slower on vxfs than on raw volumes.

DESCRIPTION:
In the current implementation of DIO  If there are multiple iovecs passed 
during IO [eg. readv()/writev()] , we do a DIO for each iovec in a loop. We 
cannot do coalescing of the given iovecs directly, because there is no 
guarantee that the user addresses are contiguous.

RESOLUTION:
We introduced a concept of Parrallel-DIO with this we can submit iovecs in the 
same extent together. By doing this and the use of the io_submit interface 
directly we are able to make use of linux's scatter gather enahancements. This 
change brings the readv() writev() performance in vxfs to the same as raw.

* 2247299 (Tracking ID: 2161379)

SYMPTOM:
In a CFS enviroment various filesytems operations hang with the following stack
trace
T1:
vx_event_wait+0x40
vx_async_waitmsg+0xc
vx_msg_send+0x19c
vx_iread_msg+0x27c
vx_rwlock_getdata+0x2e4
vx_glm_cbfunc+0x14c
vx_glmlist_thread+0x204

T2:
vx_ilock+0xc
vx_assume_iowner+0x100
vx_hlock_getdata+0x3c
vx_glm_cbfunc+0x104
vx_glmlist_thread+0x204

DESCRIPTION:
Due to improper handling of the ENOTOWNER error in the iread receive function.
We continously retry the operation while holding an Inode Lock blocking 
all other threads and causing a deadlock

RESOLUTION:
The code is modified to release the inode lock on ENOTOWNER error and acquire it
again, thus resolving the deadlock
 
There are totally 4 vx_msg_get_owner() caller with ilocked=1:
    vx_rwlock_getdata() : Need Fix
    vx_glock_getdata()  : Need Fix
    vx_cfs_doextop_iau(): Not using the owner for message loop, no need to fix.
    vx_iupdat_msg()     : Already has 'unlock/delay/lock' on ENOTOWNER 
condition!

* 2249658 (Tracking ID: 2220300)

SYMPTOM:
'vx_sched' hogs CPU resources.

DESCRIPTION:
vx_sched process calls vx_iflush_list() to perform the background flushing
processing. vx_iflush_list() calls vx_logwrite_flush() if the file has had
logged-writes performed upon it. vx_logwrite_flush() performs a old trick that
is ineffective when flushing in chunks. The trick is to flush the file
asynchronously, then flush the file again synchronously. This therefore flushes
the entire file twice, this is double the work when chunk flushing.

RESOLUTION:
vx_logwrite_flush() has been changed to flush the file once rather than twice.
So Removed asynchronous flush in vx_logwrite_flush().

* 2255786 (Tracking ID: 2253617)

SYMPTOM:
Fullfsck fails to run cleanly using "fsck -n".

DESCRIPTION:
In case of duplicate file name entries in one directory, fsck 
compares the directory entry with the previous entries. If the filename
already exists further action is taken according to the user input [Yes/No]. As 
we are using strncmp, it will compare first n characters, if it matches it will 
return success and will consider it as a duplicate file name entry and fails to 
run cleanly using "fsck -n"

RESOLUTION:
Checking the filename size and changing the length in strncmp to 
name_len + 1 solves the issue.

* 2257904 (Tracking ID: 2251223)

SYMPTOM:
The 'df -h' command can take 10 seconds to run to completion and yet still 
report an inaccurate free block count, shortly after removing a large number 
of files.

DESCRIPTION:
When removing files, some file data blocks are released and counted in the 
total free block count instantly. However blocks may not always be freed 
immediately as VxFS can sometimes delay the releasing of blocks. Therefore 
the displayed free block count, at any one time, is the summation of the 
free blocks and the 'delayed' free blocks. 

Once a file 'remove transaction' is done, its delayed free blocks will be 
eliminated and the free block count increased accordingly. However, some 
functions which process transactions, for example a metadata update, can also 
alter the free block count, but ignore the current delayed free blocks. As a 
result, if file 'remove transactions' have not finished updating their free 
blocks and their delayed free blocks information, the free space count can 
occasionally show greater than the real disk space. Therefore to obtain an 
up-to-date and valid free block count for a file system a delay and retry 
loop was delaying 1 second before each retry and looping 10 times before
giving up. Thus the 'df -h' command can sometimes take 10 seconds, but
even if the file system waits for 10 seconds there is no guarantee that the
output displayed will be accurate or valid.

RESOLUTION:
The delayed free block count is recalculated accurately when transactions 
are created and when metadata is flushed to disk.

* 2275543 (Tracking ID: 1475345)

SYMPTOM:
write() system call hangs for over 10 seconds

DESCRIPTION:
While performing a transactions in case of logged write we used to
asynchronously flush one buffer at a time belonging to the transaction space.
Such Asynchronous flushing was causing intermediate delays in write operation
because of reduced transaction space.

RESOLUTION:
Flush  all the dirty buffers on the file in one attempt through synchronous
flush, which will free up a large amount of transaction space. This will reduce
the delay during write system call.

* 2280386 (Tracking ID: 2061177)

SYMPTOM:
'fsadm -de' command erroring with 'bad file number' on filesystem(s) on
5.0MP3RP1.

DESCRIPTION:
<1>first, Our kernel fs doesn't have any problem. There is not corrupt layout in
their system. The metasave got from the customer is the proof (we can't
reproduce this problem and there is not corrupted inode in that metasave).
<2>second, As you know, fsadm is a application which has 2 parts: the
application part and the kernel part. The application part read layout from raw
disk to make strategy and the kernel part is to implement. So, For a buffer
write fs, there should be a problem that can't be avoided that is the sync
problem. In our customer's system, when they do fsadm -de, they also so huge of
write operation (they also have many check points and As you know more check
points more copy on write which means checkpoint will multi the write operation.
That why more checkpoints more problem).

RESOLUTION:
our solution is to add sync operation in fsadm before it read layout
from raw disk to avoid kernel and application un-sync.

* 2280552 (Tracking ID: 2246579)

SYMPTOM:
Filesystem corruptions and system panic when attempting to extend a 100%-full disk
layout version 5(DLV5) VxFS filesystem using fsadm(1M).

DESCRIPTION:
The behavior is caused by filesystem metadata that is relocated to the intent log
area inadvertently being destroyed when the intent log is cleared during the
resize operation.

RESOLUTION:
Refresh the incore intent log extent map by reading the bmap of the intent log
inode before clearing it.

* 2296277 (Tracking ID: 2296107)

SYMPTOM:
The fsppadm command (fsppadm query -a mountpoint ) displays ""Operation not
applicable" " while querying the mount point.

DESCRIPTION:
During fsppadm query process, fsppadm will try to open every file's named data
stream "" in the filesystem. but vxfs inernal file FCL: "changelog" doesn't
support this operation. "ENOSYS" is returned in this case. fsppadm will
translate "ENOSYS" into "Operation not applicable", and print the bogus error
message.

RESOLUTION:
Fix fsppadm's get_file_tags() to ignore the "ENOSYS" error.

* 2311490 (Tracking ID: 2074806)

SYMPTOM:
a dmapi program using dm_punch_hole may result in corrupted data

DESCRIPTION:
When the dm_punch_hole call is made on a file with allocated 
extents is used immediatly after a previous write then data can be written 
through stale pages. This causes data to be written to the wrong location

RESOLUTION:
dm_punch_hole will now invalidate all the pages within the hole its 
creating.

* 2320044 (Tracking ID: 2419989)

SYMPTOM:
ncheck(1M) command with '-i' option does not limit the output to the specified 
inodes.

DESCRIPTION:
Currently, ncheck(1M) command  with '-i' option currently shows free space
information and other inodes that are not in the list provides by '-i' option.

RESOLUTION:
ncheck(1M) command is modified to print only those inodes that are specified by
'-i' option.

* 2320049 (Tracking ID: 2419991)

SYMPTOM:
There is no way to specify an inode that is unique to the file 
system since we reuse inode numbers in multiple filesets.  We therefore would 
need to be able to specify a list of filesets similar to the '-i' option for 
inodes, or add a new '-o' option where you can specify fileset+inode pairs.

DESCRIPTION:
When ncheck command is called with '-i' option in conjunction with
-oblock/device/sector option, it displays inodes having same inode number from 
all
filesets. We don't have any command line option that helps us to specify a 
unique
inode and fileset combination.

RESOLUTION:
Code is modified to add '-f' option in ncheck command using which one could 
specify
the fset number on which one wants to filter the results.  Further, if this 
option
is used with '-i' option, we could uniquely specify the inode-fileset pair/s 
that
that we want to display.

* 2329887 (Tracking ID: 2253938)

SYMPTOM:
In a Cluster File System (CFS) environment , the file read
performances gradually degrade up to 10% of the original
read performance and the fsadm(1M) -F vxfs -D -E
<mount point> shows a large number (> 70%) of free blocks in
extents smaller than 64k.
For example,
% Free blocks in extents smaller than 64 blks: 73.04
% Free blocks in extents smaller than  8 blks: 5.33

DESCRIPTION:
In a CFS environment, the disk space is divided into
Allocation Units (AUs).The delegation for these AUs is
cached locally on the nodes.

When an extending write operation is performed on a file,
the file system tries to allocate the requested block from
an AU whose delegation is locally cached, rather than
finding the largest free extent available that matches the
requested size in the other AUs. This leads to a
fragmentation of the free space, thus leading to badly
fragmented files.

RESOLUTION:
The code is modified such that the time for which the
delegation of the AU is cached can be reduced using a
tuneable, thus allowing allocations from other AUs with
larger size free extents. Also, the fsadm(1M) command is
enhanced to de-fragment free space using the -C option.

* 2329893 (Tracking ID: 2316094)

SYMPTOM:
vxfsstat incorrectly reports "vxi_bcache_maxkbyte" greater than "vx_bc_bufhwm"
after reinitialization of buffer cache globals. reinitialization can happen in
case of dynamic reconfig operations. 

vxfsstat's "vxi_bcache_maxkbyte" counter shows maximum memory available for
buffer cache buffers allocation. Maximum memory available for buffer allocation
depends on total memory available for Buffer cache(buffers + buffer headers)
i.e. "vx_bc_bufhwm" global. Therefore vxi_bcache_maxkbyte should never greater
than vx_bc_bufhwm.

DESCRIPTION:
"vxi_bcache_maxkbyte" is per-CPU counter i.e. part of global per-CPU 'vx_info'
counter structure. vxfsstat does sum of all per-cpu counters and reports result
of sum. During re-intitialation of buffer cache, this counter was not set to
zero properly before new value is assigned to it. Therefore total sum of this
per-CPU counter can be more than 'vx_bc_bufhwm'.

RESOLUTION:
During buffer cache re-initialization, "vxi_bcache_maxkbyte" is now correctly
set to zero such that final sum of this per-CPU counter is correct.

* 2338010 (Tracking ID: 2337737)

SYMPTOM:
For Linux version 2.6.27 and onwards, a write() may never complete due to
forever looping of the write kernel thread, thus consuming most of the CPU
leading to system hang like situation.
A kernel stack of such looping write thread looks like -
<>
bad_to_user
vx_uiomove
vx_write_default
vx_write1
vx_rwsleep_unlock
vx_do_putpage
vx_write_common_slow
handle_mm_fault
d_instantiate
do_page_fault
vx_write_common
vx_prefault_uio_readable
vx_write
vfs_write
sys_write
system_call_fastpath
<>

DESCRIPTION:
Some pages created during a write() may be partially initialized and are
therefore destroyed. However due to a bug, the variable representing number of
bytes copied is not updated correctly to reflect this destroying of page. Thus
subsequent page-fault for the destroyed page occur at incorrect offset leading 
to indefinite looping of write kernel thread.

RESOLUTION:
Update correctly the number of bytes copied after destroying partially
initialized pages.

* 2340741 (Tracking ID: 2282201)

SYMPTOM:
On a VxFS filesystem, vxdump(1m) operation running in parallel with
other filesystem Operations like create, delete etc. can fail with signal
SIGSEGV generating a core file.

DESCRIPTION:
vxdump caches the inodes to be dumped in a bit map before starting the dump of a
directory, however this value can change if there are creates and deletes
happening in the background leading to inconsistent bit map eventually
generating a core file.

RESOLUTION:
The code is updated to refresh the inode bit map before actually starting the
dump operation thus avoiding the core file generation.

* 2340799 (Tracking ID: 2059611)

SYMPTOM:
system panics because NULL tranp in vx_unlockmap().

DESCRIPTION:
vx_unlockmap is to unlock a map structure of file system. If the map is being
handled, we incremented the hold count. vx_unlockmap() attempts to check whether
this is an empty mlink doubly linked list while we have an async vx_mapiodone
routine which can change the link at unpredictable timing even though the hold
count is zero.

RESOLUTION:
evaluation order is changed inside vx_unlockmap(), such that
further evaluation can be skipped over when map hold count is zero.

* 2340817 (Tracking ID: 2192895)

SYMPTOM:
System panics when performing fcl commands at

unix:panicsys
unix:vpanic_common
unix:panic
genunix:vmem_xalloc
genunix:vmem_alloc
unix:segkmem_xalloc
unix:segkmem_alloc_vn
genunix:vmem_xalloc
genunix:vmem_alloc
genunix:kmem_alloc
vxfs:vx_getacl
vxfs:vx_getsecattr
genunix:fop_getsecattr
genunix:cacl
genunix:acl
unix:syscall_trap32

DESCRIPTION:
The acl count in inode can be corrupted due to race condition. For
example, setacl can change the acl count when getacl is processing the same
inode, which could cause a invalid use of acl count.

RESOLUTION:
Code is modified to add the protection for the vulnerable acl count to avoid
corruption.

* 2340825 (Tracking ID: 2290800)

SYMPTOM:
When using fsdb to look at the map of ILIST file ("mapall" command), fsdb can
wrongly report a large hole at the end of ILIST file.

DESCRIPTION:
while reading bmap of ILIST file, if hole at the end of indirect extents is
found, fsdb may incorrectly end up marking the hole as the last extent in the
bmap, causing the mapall command to show a large hole till the end of file.

RESOLUTION:
Code has been modified to read ILIST file's bmap correctly when holes at the end
of indirect extents found, instead of marking that hole as the last extent of 
file.

* 2340831 (Tracking ID: 2272072)

SYMPTOM:
GAB panics the box because VCS engine "had" did not respond, the lbolt
wraps around.

DESCRIPTION:
The lbolt wraps around after 498 days machine uptime. In VxFS, we
flush VxFS meta data buffers based on their age. The age calculation happens
taking lbolt in account.

Due to lbolt wrapping the buffers were not flushed. So, a lot of metadata IO's
stopped and hence, the panic.

RESOLUTION:
In the function for handling flushing of dirty buffers, also handle 
the condition if lbolt has wrapped. If it has then assign current lbolt time
to the last update time of dirtylist.

* 2340834 (Tracking ID: 2302426)

SYMPTOM:
System panics when multiple 'vxassist mirror' commands are running 
concurrently with following stack strace:

0)  panic+0x410 
1)  unaligned_hndlr+0x190 
2)  bubbleup+0x880  ( ) 
+-------------  TRAP #1  ---------------------------- 
|  Unaligned Reference Fault in KERNEL mode 
|    IIP=0xe000000000b03ce0:0 
|    IFA=0xe0000005aa53c114    <--- 
|  p struct save_state 0x2c561031.0x9fffffff5ffc7400 
+-------------  TRAP #1  ---------------------------- 
LVL  FUNC  ( IN0, IN1, IN2, IN3, IN4, IN5, IN6, IN7 ) 
3)  vx_copy_getemap_structs+0x70 
4)  vx_send_getemapmsg+0x240 
5)  vx_cfs_getemap+0x240 
6)  vx_get_freeexts_ioctl+0x990
7)  vxportal_ioctl+0x4d0 
8)  spec_ioctl+0x100 
9)  vno_ioctl+0x390 
10) ioctl+0x3c0
11) syscall+0x5a0

DESCRIPTION:
Panic is caused because of de-referencing an unaligned address in CFS message
structure.

RESOLUTION:
Used bcopy to ensure proper alignment of the addresses.

* 2340839 (Tracking ID: 2316793)

SYMPTOM:
Shortly after removing files in a file system commands like 'df', which use
'statfs()', can take 10 seconds to complete.

DESCRIPTION:
To obtain an up-to-date and valid free block count in a file system a delay and
retry loop was delaying 1 second before each retry and looping 10 times before
giving up. This unnecessarily excessive retying could cause a 10 second delay
per file system when executing the df command.

RESOLUTION:
The original 10 retries with a 1 second delay each, have been reduced to 1 retry
after a 20 millisecond delay, when waiting for an updated free block count.

* 2341007 (Tracking ID: 2300682)

SYMPTOM:
When a file is newly created, issuing "fsppadm query -a /mount_point" could show
incorrect IOTemp information.

DESCRIPTION:
fsppadm query outputs incorrect data when the file re-uses the inode number
which belonged to a removed file, but the database still contains this obsolete
record for the removed one. 
 
fsppadm utility takes use of a database to save inodes' historical data. It
compares the nearest and the farthest records for an inode to compute IOTemp in
a time window. And it picks the generation of inode in the farthest record to
check the inode existence. However, if the farthest is missing, zero as the
generation is used mistakenly.

RESOLUTION:
If the nearest record for a given inode exists in database, we extract the
generation entry instead of that from the farthest one.

* 2360817 (Tracking ID: 2332460)

SYMPTOM:
Executing the VxFS 'vxedquota -p user1 user2' command to copy quota information
of one user to other users takes a long time to run to completion.

DESCRIPTION:
VxFS maintains quota information in two files - external quota files and
internal quota files. Whilst copying quota information of one user to another,
the required quota information is read from both the external and internal
files. However the quota information should only need to be read from the
external file in a situation where the read from the internal file has failed.
Reading from both files is therefore causing an unnecessary delay in the command
execution time.

RESOLUTION:
The unnecessary duplication of reading both the external and internal Quota
files to retrieve the same information has been removed.

* 2360819 (Tracking ID: 2337470)

SYMPTOM:
Cluster File System can unexpectedly and prematurely report a 'file system 
out of inodes' error when attempting to create a new file. The error message 
reported will be similar to the following:

vxfs: msgcnt 1 mesg 011: V-2-11: vx_noinode - /dev/vx/dsk/dg/vol file system out
of inodes

DESCRIPTION:
When allocating new inodes in a cluster file system, vxfs will search for an 
available free inode in the 'Inode-Allocation-Units' [IAUs] that are currently
delegated to the local node. If none are available, it will then search the 
IAUs that are not currently delegated to any node, or revoke an IAU delegated 
to another node. It is also possible for gaps or HOLEs to be created in the IAU 
structures as a side effect of the CFS delegation processing. However when 
searching for an available free inode vxfs simply ignores any HOLEs it may find,
if the maximum size of the metadata structures has been reached (2^31) new IAUs
cannot be created, thus one of the HOLEs should then be populated and used for 
new inode allocation. The problem occurred as HOLEs were being ignored, 
consequently vxfs can prematurely report the "file system out of inodes" error 
message even though there is plenty of free space in the vxfs file system to
create new inodes.

RESOLUTION:
New inodes will now be allocated from the gaps, or HOLEs, in the IAU structures 
(created as a side effect of the CFS delegation processing). The HOLEs will be 
populated rather than returning a 'file system out of inodes' error.

* 2360821 (Tracking ID: 1956458)

SYMPTOM:
When attempting to check information of checkpoints by 
fsckptadm -C blockinfo <pathname> <ckpt-name> <mountpoint>,
the command failed with error 6 (ENXIO), the file system is disabled and some
errors come out in message file:
vxfs: msgcnt 4 mesg 012: V-2-12: vx_iget - /dev/vx/dsk/sfsdg/three file system
invalid inode number 4495
vxfs: msgcnt 5 mesg 096: V-2-96: vx_setfsflags - /dev/vx/dsk/sfsdg/three file
system fullfsck flag set - vx_cfs_iread

DESCRIPTION:
VxFS takes use of ilist files in primary fileset and checkpoints to accommodate
inode information. A hole in a ilist file indicates that inodes in the hole
don't exist and are not allocated yet in the corresponding fileset or 
checkpoint. 

fsckptadm will check every inode in the primary fileset and the downstream
checkpoints. If the inode falls into a hole in a prior checkpoint,
i.e. the associated file was not generated at the time of the checkpoint
creation, fsckptadm exits with error.

RESOLUTION:
Skip inodes in the downstream checkpoints, if these inodes are located in a 
hole.

* 2368738 (Tracking ID: 2368737)

SYMPTOM:
If a file which has shared extents has corrupt indirect blocks, then in certain 
cases the reference count tracking system can try to interpret this block and 
panic the system. Since this is a asynchronous background operation, this 
processing will retry repeatedly on every file system mount and hence can result 
in panic every time the file system is mounted.

DESCRIPTION:
Reference count tracking system for shared extents updates reference count in a 
lazy fashion. So in certain cases it asynchronously has to access shared 
indirect blocks belonging to a file to account for reference count updates. But 
due if this indirect block has been corrupted badly "a priori", then this 
tracking mechanism can panic the system repeatedly on every mount.

RESOLUTION:
The reference count tracking system validates the read indirect extent from the 
disk and in case it is not found valid sets VX_FULLFSCK flag in the superblock 
marking it for full fsck and disables the file system on the current node.

* 2373565 (Tracking ID: 2283315)

SYMPTOM:
System may panic when "fsadm -e" is run on a file system containing file level 
snapshots. The panic stack looks like:

crash_kexec()
__die at() 
do_page_fault()
error_exit()
[exception RIP: vx_bmap_lookup+36]
vx_bmap_lookup()
vx_bmap()
vx_reorg_emap()
vx_extmap_reorg()
vx_reorg()
vx_aioctl_full()
vx_aioctl_common()
vx_aioctl()
vx_ioctl()
do_ioctl()
vfs_ioctl()
sys_ioctl()
tracesys

DESCRIPTION:
The panic happened because of a NULL inode pointer passed to vx_bmap_lookup() 
function. During reorganizing extents of a file, block map (bmap) lookup
operation is done on a file to get the information about the extents of the 
file. If this bmap lookup finds a hole at an offset in a file containing shared 
extents, a local variable is not updated that makes the inode pointer NULL
during the next bmap lookup operation.

RESOLUTION:
Initialized the local variable such that inode pointer passed to 
vx_bmap_lookup() will be non NULL.

* 2386483 (Tracking ID: 2374887)

SYMPTOM:
Access to a file system can hang when creating a named attribute 
due to a read/write lock being held exclusively and indefinitely 
causing a thread to loop in vx_tran_nattr_dircreate() 

A typical stacktrace of a looping thread:

vx_itryhold_locked
vx_iget
vx_attr_iget
vx_attr_kgeti
vx_attr_getnonimmed
vx_acl_inherit
vx_aclop_creat
vx_attr_creatop
vx_new_attr
vx_attr_inheritbuf
vx_attr_inherit
vx_tran_nattr_dircreate
vx_nattr_copen
vx_nattr_open
vx_setea
vx_linux_setxattr
vfs_setxattr
link_path_walk
sys_setxattr
system_call

DESCRIPTION:
The initial creation of a named attribute for a regular file 
or directory will result in the automatic creation of a 
'named attribute directory'. Creations are initially attempted 
in a single transaction. Should the single transaction fail 
due to a read/write lock being held then a retry should split
the task into multiple transactions. An incorrect reset of a 
tracking structure meant that all retries were performed using 
a single transaction creating an endless retry loop.

RESOLUTION:
The tracking structure is no longer reset within the retry loop.

* 2402643 (Tracking ID: 2399178)

SYMPTOM:
Full fsck does large directory index validation during pass2c. However, if
the number of large directories are more then this pass takes a lot of time. There
is huge scope to improve the full fsck performance during this pass.

DESCRIPTION:
Pass2c consists of following basic operations:-
[1] Read the entries in the large directory
[2] Cross check hash values of those entries with the hash directory inode
contents residing on the attribute ilist.

This means this is another heavy IO intensive pass.

RESOLUTION:
1.Use directory block read-ahead during Step [1].
2.Wherever possible, access the file contents extent-wise rather than in fs
block size (while reading entries in the directory) or in hash block size (8k,
during dexh_getblk)

Using above mentioned enhancements, the buffer cache can be utilized in better way.

* 2412029 (Tracking ID: 2384831)

SYMPTOM:
System panics with the following stack trace. This happens in some cases when 
names streams are used in VxFS.

machine_kexec()
crash_kexec()
__die
do_page_fault()
error_exit()
[exception RIP: iput+75]
vx_softcnt_flush()
vx_ireuse_clean()
vx_ilist_chunkclean()
vx_inode_free_list()
vx_ifree_scan_list()
vx_workitem_process()
vx_worklist_process()
vx_worklist_thread()
vx_kthread_init()
kernel_thread()

DESCRIPTION:
VxFS internally creates a directory to keep the named streams pertaining to a 
file. In some scenarios, an error code path is missing to release the hold on 
that directory. Due to this unmount of the file system will not clean the inode 
belonging to that directory. Later when VxFS reuses such a inode panic is seen.

RESOLUTION:
Release the hold on the named streams directory in case of an error.

* 2412169 (Tracking ID: 2371903)

SYMPTOM:
There is an extra empty line in "/proc/devices" when activating file system
checkpoints, like # tail -6 /proc/devices
199 VxVM
201 VxDMP
252 vxclonefs-0 
                   <<< There is a space here.  
253 device-mapper
254 mdp

DESCRIPTION:
vxfs improperly adds a new line character i.e. "\n", when composing device name
of clone. As a result, when the function "register_blkdev" is called with the
device name to register block device driver, an additional blank line is showed
in "/proc/devices".

RESOLUTION:
Remove the new line when creating the clone device.

* 2412177 (Tracking ID: 2371710)

SYMPTOM:
User quota file corruption occurs when DELICACHE feature is enabled, the current
usage of inodes of a user becomes negative after frequent file creations and
deletions. Checking quota info using command "vxquota -vu username", the number
of files is "-1" like:

    # vxquota -vu testuser2
   Disk quotas for testuser2 (uid 500):
   Filesystem     usage  quota  limit     timeleft  files  quota  limit     
timeleft
   /vol01       1127809 8239104 8239104                  -1      0      0

DESCRIPTION:
This issue is introduced by the inode DELICACHE feature in 5.1SP1, it is a
performance enhancement to optimize the updates done to inode map during file
creations and deletions. The feature is enabled by default, and can be changed
by vxtunefs. 

When DELICACHE is enabled and quota is set for vxfs, there will be an extra
quota update for the inodes on inactive list during removing process. Since
these inodes' quota has been updated already before put on delicache list, the
current number of user files gets decremented twice eventually.

RESOLUTION:
Add a flag to identify the inodes moved to inactive list from delicache list, so
that the flag can be used to prevent updating the quota again during removing
process.

* 2412179 (Tracking ID: 2387609)

SYMPTOM:
Quota usage gets set to ZERO when umount/mount the file system though files
owned by users exist. This issue may occur after some file creations and
deletions. Checking the quota usage using "vxrepquota" command and the output
would be like following:

# vxrepquota -uv /vx/sofs1/
/dev/vx/dsk/sfsdg/sofs1 (/vx/sofs1):
                     Block limits                      File limits
User          used   soft   hard    timeleft    used       soft    hard    timeleft
testuser1 --      0 3670016 4194304                   0      0      0
testuser2 --      0 3670016 4194304                   0      0      0
testuser3 --      0 3670016 4194304                   0      0      0

Additionally the quota usage may not be updated after inode/block usage reaches
ZERO.

DESCRIPTION:
The issue occurs when VxFS merges external per node quota files with internal
quota file. The block offset within external quota file could be calculated
wrongly in some scenario. When any hole found in per node quota file, the file
offset such that it points the next non-HOLE offset will be modified, but we
miss to change the block offset accordingly which points to the next available
quota record in a block.

VxFS updates per node quota records only when global internal quota file shows
either of some bytes or inode usage, otherwise it doesn't copy the usage from
global quota file to per node quota file. But for the case where quota usage in
external quota files has gone down to zero and both bytes and inode usage in
global file becomes zero, per node quota records would be not updated and left
with incorrect usage. It should also check bytes or inodes usage in per node
quota record. It should skip coping records only when bytes and inodes usage in
both global quota file and per node quota file is zero.

RESOLUTION:
Corrected the way to calculate the block offset when any hole is found in per
node quota file.
 
Added code to also check blocks or inodes usage in per node quota record while
updating user quota usage.

* 2412181 (Tracking ID: 2372093)

SYMPTOM:
New fsadm command options, to defragment a given percentage 
of the available freespace in a file system, have been introduced
as part of an initiative to help improve Cluster File System [CFS]
performance - the new additional command usage is as follows:

     fsadm -C -U <percentage-of-freespace> <mount_point>

We have since found that this new freespace defragmentation 
operation can sometimes hang (whilst it also continues to consume 
some cpu) in specific circumstances when executed on a Cluster 
mounted File System [CFS]

DESCRIPTION:
The hang can occur when file system metadata is being relocated.
In our example case the hang occurs whilst relocating inodes whose 
corresponding files are being actively updated via a different
node (from which the fsadm command is being executed) in the cluster.
During the relocation an error code path is taken due to an 
unexpected mismatch between temporary replica metadata, the code 
path then results in a deadlock, or hang.

RESOLUTION:
As there is no overriding need to relocate structural metadata for 
the purposes of defragmenting the available freespace, we have
chosen to simply leave all structural metadata where it is when 
performing this operation thus avoiding its relocation. The changes
required for this solution are therefore very low risk.

* 2418819 (Tracking ID: 2283893)

SYMPTOM:
In a Cluster File System (CFS) environment , the file read
performances gradually degrade up to 10% of the original
read performance and the fsadm(1M) -F vxfs -D -E
<mount point> shows a large number (> 70%) of free blocks in
extents smaller than 64k.
For example,
% Free blocks in extents smaller than 64 blks: 73.04
% Free blocks in extents smaller than  8 blks: 5.33

DESCRIPTION:
In a CFS environment, the disk space is divided into
Allocation Units (AUs).The delegation for these AUs is
cached locally on the nodes.

When an extending write operation is performed on a file,
the file system tries to allocate the requested block from
an AU whose delegation is locally cached, rather than
finding the largest free extent available that matches the
requested size in the other AUs. This leads to a
fragmentation of the free space, thus leading to badly
fragmented files.

RESOLUTION:
The code is modified such that the time for which the
delegation of the AU is cached can be reduced using a
tuneable, thus allowing allocations from other AUs with
larger size free extents. Also, the fsadm(1M) command is
enhanced to de-fragment free space using the -C option.

* 2420060 (Tracking ID: 2403126)

SYMPTOM:
Hang is seen in the cluster when one of the nodes in the cluster leaves or 
rebooted. One of the nodes in the cluster will contain the following stack trace.

e_sleep_thread()
vx_event_wait()
vx_async_waitmsg()
vx_msg_send()
vx_send_rbdele_resp()
vx_recv_rbdele+00029C ()
vx_recvdele+000100 ()
vx_msg_recvreq+000158 ()
vx_msg_process_thread+0001AC ()
vx_thread_base+00002C ()
threadentry+000014 (??, ??, ??, ??)

DESCRIPTION:
Whenever a node in the cluster leaves, reconfiguration happens and all the 
resources that are held by the leaving nodes are consolidated. This is done on
one node of the cluster called primary node. Each node sends a message to the 
primary node about the resources it is currently holding. During this 
reconfiguration, in a corner case, VxFS is incorrectly calculating the message
length which is larger than what GAB(Veritas Group Membership and Atomic
Broadcast) layer can handle. As a result the message is getting lost. The sender
thinks that the message is sent and waits for acknowledgement. The message is
actually dropped at sender and never sent. The master node which is waiting for
this message will wait forever and the reconfiguration never completes leading
to hang.

RESOLUTION:
The message length calculation is done properly now and GAB can handle the messages.

* 2425429 (Tracking ID: 2422574)

SYMPTOM:
On CFS, after turning the quota on, when any node is rebooted and rejoins the
cluster, it fails to mount the filesystem.

DESCRIPTION:
At the time of mounting the filesystem after rebooting the node, mntlock was
already set, which didn't allow the remount of filesystem, if quota is on.

RESOLUTION:
Code is changed so that the mntlock flag is masked in quota operation as it's
already set on the mount.

* 2425439 (Tracking ID: 2242630)

SYMPTOM:
Earlier distributions of Linux had a maximum size of memory that could be
allocated via vmalloc().  This throttled the maximum size of VxFS's hash tables,
and so limited the size of the inode and buffer caches.  RHEL5/6 and  SLES10/11 do
not have this limit.

DESCRIPTION:
Limitations in the Linux kernel used to limit the inode and buffer cache for VxFS.

RESOLUTION:
Code is changed to accommodate the change of limits in Linux kernel, hence
modified limits on inode and buffer cache for VxFS.

* 2426039 (Tracking ID: 2412604)

SYMPTOM:
Once the time limit expires after exceeding the soft-limit of user quota size on
VxFS filesystem, writes are still permissible over that soft-limit.

DESCRIPTION:
After exceeding the soft-limit, in the initial setup of the soft-limit the timer
didn't use to start.

RESOLUTION:
Start the timer during the initial setting of quota limits if current usage has
already crossed the soft quota limits.

* 2427269 (Tracking ID: 2399228)

SYMPTOM:
Occasionally Oracle Archive logs can be created smaller than they should be, 
in the reported case the resultant Oracle Archive logs were incorrectly sized
as 512 bytes.

DESCRIPTION:
The fcntl  [file control] command F_FREESP [Free storage space] can be 
utilised to change the size of a regular file. If the file size is reduced we 
call it a "truncate", and space allocated in the truncated area will be 
returned to the file system freespace pool. If the file size is increased using 
F_FREESP we call it a "truncate-up", although the file size changes no space is 
allocated in the extended area of the file.

Oracle archive logs utilize the F_FREESP fcntl command to perform a truncate-up
of a new file before a smaller write of 512 bytes [at the the start of the 
file] is then performed. A timing window was found with F_FREESP which meant 
that 'truncate-up' file size was lost, or rather overwritten, by the subsequent 
write of the data, thus causing the file to appear with a size of just 512 
bytes.

RESOLUTION:
A timing window has been closed whereby the flush of the allocating [512byte]
write was triggered after the new F_FREESP file size has been updated in the 
inode.

* 2427281 (Tracking ID: 2413172)

SYMPTOM:
vxfs_fcl_seektime() API seeks to the first record in the File change log(FCL)
file after specified time. This API can incorrectly return EINVAL(FCL record not
found)error while reading first block of FCL file.

DESCRIPTION:
To seek to the first record after the given time, first a binary search is 
performed to get the largest block offset where fcl record time is less than the
given time. Then a linear search from this offset is performed to find the first
record which has time value greater than specified time. 

FCL records are read in buffers. There can be scenarios where FCL records read
in one buffer are less than buffer size, e.g. reading first block of FCL file.
In such scenarios, buffer read can continue even when all data in current buffer
has been read. This is due to wrong check which decides if all records in one
buffer has been read.
Thus reading buffer beyond boundary was causing search to terminate without
finding record for given time and hence EINVAL error was returned.

Actually, VxFS should detect that it is partially filled buffer and the search
should continue reading the next buffer.

RESOLUTION:
Check which decides if all records in buffer have been read is corrected such
that buffer is read within its boundaries.

* 2430679 (Tracking ID: 1892045)

SYMPTOM:
A multitude of slab-1024 memory is consumed, which can be checked from
/proc/slabinfo. For example,
# cat /proc/slabinfo | grep "size-1024 "
size-1024          51676  51700   1024    4    1 : tunables   54   27    8 :
slabdata  12925  12925      0
where 51700 slabs are allocated with 1024 bytes unit size.

DESCRIPTION:
VxFS creates some fake inodes for various background and bookkeeping tasks for
which the kernel wants VxFS to have an inode that doesn't strictly have to be a
real file. The fake inode number is computed from NR_CPUS. If the kernel has a
large NR_CPUS, the slab-1024 will become significant accordingly.

RESOLUTION:
Allocate the fake inodes based on the kernel's routine num_possible_cpus() that
would help to reduce the slab-1024 number to thousands.

* 2478237 (Tracking ID: 2384861)

SYMPTOM:
The following asserts are seen during internal stress and regression runs 
f:vx_do_filesnap:1b
f:vx_inactive:2a
f:xted_check_rwdata:31
f:vx_do_unshare:1

DESCRIPTION:
These asserts validate some assumption in various function also there were some 
miscellaneous issues which were seen during internal testing.

RESOLUTION:
The code has been modified to fix the internal reported issues which other 
miscellaneous changes.

* 2478325 (Tracking ID: 2251015)

SYMPTOM:
Command fsck(1M) will take longer time to complete.

DESCRIPTION:
Command fsck(1M), in extreme case like 2TB file system with a 1KB block size, 130+
checkpoints, and 100-250 million inodes per file set takes 15+ hours
to complete intent log replay because it has to read a few GB of IAU headers and
summaries one synchronous block at a time.

RESOLUTION:
Changing the fsck code, to do read ahead on the IAU file reduces the fsck
log-replay time.

* 2480949 (Tracking ID: 2480935)

SYMPTOM:
System log file may contain following error message on multi-threaded environment
with Dynamic Storage Tiers(DST).
<snip>
UX:vxfs fsppadm: ERROR: V-3-26626: File Change Log IOTEMP and ACCESSTEMP index
creation failure for /vx/fsvm with message Argument list too long
</snip>

DESCRIPTION:
In DST, while enforcing policy, SQL queries are generated and written to file
.__fsppadm_enforcesql present in lost+found. In multi threaded environment, 16
threads works in parallel on ILIST and geenrate SQL queries and write it to
file. This may lead to corruption of file, if multiple threads write to file
simultaneously.

RESOLUTION:
Mutex is used to serialize writing of threads on SQL file.

* 2482337 (Tracking ID: 2431674)

SYMPTOM:
panic in vx_common_msgprint() via vx_inactive()

DESCRIPTION:
The problem is that the call VX_CMN_ERR( ) , uses a "llx" format 
character which vx_common_msgprint() doesn't understand.  It gives up trying to 
process that format, but continues on without consuming the corresponding 
parameter.  
Everything else in the parameter list is effectively shifted by 8 bytes, and 
when we 
get to processing the string argument, it's game over.

RESOLUTION:
Changed the format to "llu", which vx_common_msgprint() understands.

* 2482344 (Tracking ID: 2424240)

SYMPTOM:
In the case of deduplication, when FS block size is bigger and there is a partial
block match, we end up sharing the block anyway, resulting in corruption of files.

DESCRIPTION:
This is because to handle slivers we are rounding up the matched length to block
boundary.

RESOLUTION:
Code is changed so to round-down the length that we intend to dedup to align with
fs block size.

* 2484815 (Tracking ID: 2440584)

SYMPTOM:
System panic when force unmounting the file system. The backtrace looks like

machine_kexec
crash_kexec
__die
do_page_fault
error_exit
vx_sync
vx_sync_fs
sync_filesystems
do_sync
sys_sync
system_call

or

machine_kexec
crash_kexec
__die
do_page_fault
error_exit
sync_filesystems
do_sync
sys_sync
system_call

DESCRIPTION:
When we do force unmount, VxFS will free some memory structures which can still
be referenced by the code path of sync. Thus, there is a window that allows the
race between sync command and force unmount as shown in the first backtrace.

For the second panic, prior to the completion of force unmount, sync_filesystems
could invoke sync_fs (one member of super_operations) callback of VxFS which
however could be already set NULL by force unmount.

RESOLUTION:
vx_sync function will not do real sync if detecting a force unmount is in
process. Add dummy functions instead of NULL pointers to vxfs super_operations
during force unmount.

* 2486597 (Tracking ID: 2486589)

SYMPTOM:
Multiple threads may wait on a mutex owned by a thread that is in function
vx_ireuse_steal() with following stack trace on machine with severe inode pressure.
<snip>
vx_ireuse_steal()
vx_ireuse()
vx_iget()
</snip>

DESCRIPTION:
Several thread are waiting to get inodes from VxFS. The current number of inodes
reached max number of inodes (vxfs_ninode) that can be created in memory. So no
new allocations can be possible, which results in thread wait.

RESOLUTION:
Code is modified so that in such situation, threads return ENOINODE instead of
retrying to get inodes.

* 2494464 (Tracking ID: 2247387)

SYMPTOM:
Internal local mount noise.fullfsck.N4 test hit an assert vx_ino_update:2
With stack trace looking as below

<snip>
panic: f:vx_ino_update:2
Stack Trace:
  IP                  Function Name
  0xe0000000023d5780  ted_call_demon+0xc0
  0xe0000000023d6030  ted_assert+0x130
  0xe000000000d66f80  vx_ino_update+0x230
  0xe000000000d727e0  vx_iupdat_local+0x13b0
  0xe000000000d638b0  vx_iupdat+0x230
  0xe000000000f20880  vx_tflush_inode+0x210
  0xe000000000f1fc80  __vx_fsq_flush___vx_tran.c__4096000_0686__+0xed0
  0xe000000000f15160  vx_tranflush+0xe0
  0xe000000000d2e600  vx_tranflush_threaded+0xc0
  0xe000000000d16000  vx_workitem_process+0x240
  0xe000000000d15ca0  vx_worklist_thread+0x7f0
  0xe000000001471270  kthread_daemon_startup+0x90
End of Stack Trace
</snip>

DESCRIPTION:
INOILPUSH flag is not set when inode is getting updated, which caused above
assert. The problem was creation and deletion of clone resets the INOILPUSH flag
and function  vx_write1_fast() does not set the flag after updating the inode and
file.

RESOLUTION:
Code is modified so that if INOILPUSH flag is not set while function
vx_write1_fast(), then the flag is set in the function.

* 2508164 (Tracking ID: 2481984)

SYMPTOM:
Access to the file system got hang.

DESCRIPTION:
In function 'vx_setqrec', it will call 'vx_dqget'. when 'vx_dqget' return
errors, it will try to unlock DQ structure using 'VX_DQ_CLUSTER_UNLOCK'. But, in
this situation, DQ structure doesn't hold the lock. hence, this hang happens.

RESOLUTION:
'dq_inval' would be set in 'vx_dqget' in case of any error happens in
'vx_dqget'. Skip unlocking DQ structure in the error code path of 'vx_setqrec',
if 'dq_inval' is set.

* 2521514 (Tracking ID: 2177591)

SYMPTOM:
System panic in vx_softcnt_flush() with stack as below:

 #4 [ffff8104981a7d08] generic_drop_inode()
 #5 [ffff8104981a7d28] vx_softcnt_flush()
 #6 [ffff8104981a7d58] vx_ireuse_clean()
 #7 [ffff8104981a7d88] vx_ilist_chunkclean()
 #8 [ffff8104981a7df8] vx_inode_free_list()
 #9 [ffff8104981a7e38] vx_ifree_scan_list()
#10 [ffff8104981a7e48] vx_workitem_process()
#11 [ffff8104981a7e58] vx_worklist_process()
#12 [ffff8104981a7ed8] vx_worklist_thread()
#13 [ffff8104981a7ee8] vx_kthread_init()
#14 [ffff8104981a7f48] kernel_thread()

DESCRIPTION:
panic occured in iput()(vx_softcnt_flush()) which was called to drop the
softcount hold(i_count)
held by VxFS. Panic thread is cleaning an inode on freelist. Inode being cleaned
belongs to FS which is already unmounted. FS superblock structure is freed after
vxfs unmount returns
irrespective of unmount is successfull or failed as linux always expects umount
to succeed.

Unmount of this FS was not clean i.e. detach of this FS was failing with EBUSY
error. Detach of FS
can fail with EBUSY error if there are any busy inodes i.e. having pending
operations.
During an unmount operation, all the inodes belonging to that file system are
cleaned.
But as unmount of FS was not successful, inode processing of FS didn't
completed. When background thread
picked inodes of such unmounted FS, panic happened while accessing superblock
structure which was freed.

RESOLUTION:
Check for busy inodes while unmounting comes before processing all inodes in
inode cache. So this can leave inodes on freelist without their cleanup during
unmount.
Now after some failed attempt to detach fset, calling detach fset with force
option. This can be more aggressive in tearing down fileset and therefore should
help to
clear fest's inodes.

* 2529356 (Tracking ID: 2340953)

SYMPTOM:
During internal stress test, f:vx_iget:1a assert is seen.

DESCRIPTION:
While renaming certain file, we check if the target directory is in the path of
the source file to be renamed. while using function vx_iget() to reach till root
inode, one of parent directory incode number was 0 and hence the assert.

RESOLUTION:
Code is changed so that during renames, parent directory is first assigned correct
inode number before using vx_iget() function to retrieve root inode.


INSTALLING THE PATCH
--------------------
(RHEL6_x86_64)
# rpm -Uvh VRTSvxfs-5.1.133.100-SP1RP3P1_RHEL6.x86_64.rpm


REMOVING THE PATCH
------------------
rpm -e rpm_name


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE