* * * READ ME * * *
* * * Veritas File System 5.0.1 RP3 * * *
* * * P-patch 3 * * *

Patch Date: 2011-09-27

This document provides the following information:

* PATCH NAME
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH

PATCH NAME
----------
Veritas File System 5.0.1 RP3 P-patch 3

PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs

BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* Veritas File System 5.0.1

OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
HP-UX 11i v3 (11.31)

INCIDENTS FIXED BY THE PATCH
----------------------------
This patch fixes the following Symantec incidents:

Patch ID: PHKL_42311

* 2066168 (Tracking ID: 2060219)

SYMPTOM:
System panics with the following stack:

unix:panicsys+0x48(0x109d900, 0x2a100e9b548, 0x18473d0, 0x1, , , 0x4480001602, , , , , , , , 0x109d900, 0x2a100e9b548)
unix:vpanic_common+0x78(0x109d900, 0x2a100e9b548, 0x0, 0x30001e5a000, 0xa00, 0x0)
unix:panic+0x1c(0x109d900, 0x181dc40, 0x600578983f8, 0x3000334c100, 0x2a100e9bca0, 0x0)
unix:mutex_panic(0x181dc40, 0x600578983f8) - frame recycled
unix:mutex_destroy+0x138(0x600578983f8)
vxfs:vx_inode_mem_deinit+0x130(0x60057898000, 0x1)
vxfs:vx_ilist_chunkdeinit+0x140(0x60057898000)
vxfs:vx_inode_free_list+0x164(0x6003638be80, 0x258, 0x0, 0x32)
vxfs:vx_ifree_scan_list+0xd0(0x6003d)
vxfs:vx_workitem_process+0x10(0x6005ffb5d20, , , 0x2a100e9b998, 0x2a100e9b998)
vxfs:vx_worklist_process+0x344(0x1, 0x0, 0x0)
vxfs:vx_worklist_thread+0x94(0x7, 0x0)
unix:thread_start+0x4()

DESCRIPTION:
The panic is caused by a race between the inode memory de-initialization code and the inactive-list processing code. A thread can put an inode on the free list, release the ilist lock, and be about to release the inode's dri_lock, when a free-list processing thread picks up the same inode and tries to destroy the dri_lock while it is still held. This results in the panic.

RESOLUTION:
Make sure the dri_lock is released before the inode is de-initialized, so that the race can no longer occur.
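
NOTE: The following minimal sketch illustrates the corrected ordering described above. It uses pthreads as a stand-in for the kernel mutex primitives; all names (fake_inode, put_on_free_list, release_inode, inode_deinit) are assumptions for illustration, not the actual VxFS symbols.

    /* Illustrative sketch only -- not the VxFS source. */
    #include <pthread.h>

    struct fake_inode {
            pthread_mutex_t dri_lock;   /* per-inode lock */
    };

    static void put_on_free_list(struct fake_inode *ip) { (void)ip; /* stub */ }

    /*
     * The buggy ordering published the inode on the free list while
     * dri_lock was still held, so a scanner thread could reach the
     * mutex destruction below before the owner had unlocked.
     */
    void release_inode(struct fake_inode *ip)
    {
            pthread_mutex_unlock(&ip->dri_lock);  /* 1. release the lock ...     */
            put_on_free_list(ip);                 /* 2. ... then publish the inode */
    }

    void inode_deinit(struct fake_inode *ip)
    {
            /* Destroying a mutex another thread still holds is undefined
             * behavior (the kernel variant panics); with the ordering
             * above, this can no longer happen. */
            pthread_mutex_destroy(&ip->dri_lock);
    }
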
* 2347346 (Tracking ID: 2290800)

SYMPTOM:
When fsdb is used to look at the map of the ILIST file (the "mapall" command), fsdb can wrongly report a large hole at the end of the ILIST file.

DESCRIPTION:
While reading the bmap of the ILIST file, if a hole is found at the end of the indirect extents, fsdb may incorrectly mark the hole as the last extent in the bmap, causing the mapall command to show a large hole extending to the end of the file.

RESOLUTION:
The code has been modified to read the ILIST file's bmap correctly when holes are found at the end of indirect extents, instead of marking such a hole as the last extent of the file.

* 2351019 (Tracking ID: 2253938)

SYMPTOM:
In a Cluster File System (CFS) environment, file read performance can gradually degrade to 10% of the original read performance, and "fsadm -F vxfs -D -E" shows a large number (> 70%) of free blocks in extents smaller than 64k. For example:

% Free blocks in extents smaller than 64 blks: 73.04
% Free blocks in extents smaller than  8 blks:  5.33

DESCRIPTION:
In a CFS environment, the disk space is divided into Allocation Units (AUs). The delegation for these AUs is cached locally on the nodes. When an extending write operation is performed on a file, the file system tries to allocate the requested block from an AU whose delegation is locally cached, rather than finding the largest free extent matching the requested size in the other AUs. This fragments the free space and, in turn, leads to badly fragmented files.

RESOLUTION:
The code is modified so that the time for which the delegation of an AU is cached can be reduced using a tunable, thus allowing allocations from other AUs with larger free extents. Also, the fsadm(1M) command is enhanced to de-fragment free space using the -C option.

* 2405558 (Tracking ID: 1269443)

SYMPTOM:
System panics after fdd:fdd_getdev() is unable to find the device.

DESCRIPTION:
System panics with the following stack:

LVL FUNC ( IN0, IN1, IN2, IN3, IN4, IN5, IN6, IN7 )
 4) 0x_00000000 ( )
 5) newsp+0x130 ( 0x0, 0x0 )
 6) specvp+0x90 ( 0x0, 0x0 )
 7) fdd:fdd_getdev+0x80 ( 0x9fffffff7f547750, 0xe00000031c25e180, 0xe0000002756f7718 )
 8) fdd:fdd_common1+0x290 ( 0x9fffffff7f547750, 0xe00000031c25e180, 0xe00000039d174040, 0x4 )
 9) fdd:fdd_odm_open+0x1b0 ( 0x9fffffff7f5477b0, 0x10000003, 0xe00000039d174040 )
10) odm:odm_vx_open+0xd0 ( 0x9fffffff7f5477b0, 0x3, 0xe00000039d174040 )
11) odm:odm_ident_init+0x100 ( 0xe0000002756781c0, 0x0, 0x1ce388, 0x100, 0x3, 0x0, 0xe00000039f04cf00, 0x9fffffff7f547830 )
12) odm:odm_identify+0x640 ( 0xe0000002c05a2080, 0x0 )
13) odm:odmioctl+0x150 ( 0x5, 0xe0000002756788e8, 0xe0000002c05a2080, 0xe00000027af6f040, 0xe000000275678f74 )
14) vno_ioctl+0x190 ( 0xe00000039e9a73c0, 0xe0000003698c5480, 0xe0000002c05a2080 )
15) ioctl+0x200 ( )
16) syscall+0x4e0 ( 0x36, 0x9fffffff7f547c00 )

The panic is caused by a race between the v_rdev and i_sflag updates in fdd_common1().

RESOLUTION:
Move the v_rdev and i_sflag updates into one place so they are performed together under VX_ILOCK protection.

* 2411980 (Tracking ID: 2383225)

SYMPTOM:
System panics during a user write with the panic string "pfd_unlock: bad lock state!" and the following stack trace:

(panic+0x128)
(bad_kern_reference+0x64)
(vfault+0x1ec)
($0000009B+0xac)
($thndlr_rtn+0x0)
(vx_dio_rdwri+0xdc)
(vx_write_direct+0x2ec)
(vx_write1+0x13a8)
(vx_rdwr+0xa88)
(vno_rw+0x64)
(rwuio+0x11c)
(aio_rw_child_thread+0x178)
(aio_exec_req_thread+0x258)
(kthread_daemon_startup+0x24)
(kthread_daemon_startup+0x0)

DESCRIPTION:
When a write to a file is handled as direct I/O, the user pages are pinned, before the I/O is issued, using the pas_pin() interface provided by the OS. The pas_pin() interface can return an ENOSPC error. The VxFS write code path misinterpreted the ENOSPC error and retried the write without resetting a variable in the uio structure; the system later panics while dereferencing that variable of the uio structure.

RESOLUTION:
Do not retry the write when pas_pin() returns an ENOSPC error.
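
NOTE: The following minimal sketch shows the corrected handling described above; the names (fake_uio, pin_user_pages, issue_direct_io, user_write_direct) are stand-ins for illustration, not the actual VxFS or HP-UX interfaces.

    /* Illustrative sketch only -- stand-in names, not the VxFS source. */
    #include <errno.h>
    #include <stddef.h>

    struct fake_uio {
            void   *uio_iov;     /* set up once per request */
            size_t  uio_resid;   /* bytes remaining */
    };

    static int pin_user_pages(struct fake_uio *uiop)  { (void)uiop; return 0; /* stub; may return ENOSPC */ }
    static int issue_direct_io(struct fake_uio *uiop) { (void)uiop; return 0; /* stub */ }

    int user_write_direct(struct fake_uio *uiop)
    {
            int error = pin_user_pages(uiop);

            /*
             * Buggy behavior: on ENOSPC the write was retried without
             * re-initializing the uio state, so a later pass dereferenced
             * a stale field and panicked.  Corrected behavior: treat the
             * pin failure as final and return the error to the caller.
             */
            if (error != 0)
                    return error;        /* includes ENOSPC: do not retry */

            return issue_direct_io(uiop);
    }
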
* 2414975 (Tracking ID: 2283893)

SYMPTOM:
In a Cluster File System (CFS) environment, file read performance can gradually degrade to 10% of the original read performance, and "fsadm -F vxfs -D -E" shows a large number (> 70%) of free blocks in extents smaller than 64k. For example:

% Free blocks in extents smaller than 64 blks: 73.04
% Free blocks in extents smaller than  8 blks:  5.33

DESCRIPTION:
In a CFS environment, the disk space is divided into Allocation Units (AUs). The delegation for these AUs is cached locally on the nodes. When an extending write operation is performed on a file, the file system tries to allocate the requested block from an AU whose delegation is locally cached, rather than finding the largest free extent matching the requested size in the other AUs. This fragments the free space and, in turn, leads to badly fragmented files.

RESOLUTION:
The code is modified so that the time for which the delegation of an AU is cached can be reduced using a tunable, thus allowing allocations from other AUs with larger free extents. Also, the fsadm(1M) command is enhanced to de-fragment free space using the -C option.

* 2422560 (Tracking ID: 2373239)

SYMPTOM:
Performance issue pointing to the read flush-behind algorithm.

DESCRIPTION:
While the system is under memory pressure, the VxFS read flush-behind algorithm may invalidate pages that were read ahead before the application has had a chance to consume them. The invalidated pages must be re-read, which leads to poor application performance. A customer used adb to turn this feature off and saw a very good improvement.

RESOLUTION:
Keep a gap, of length fs_flush_size, between the read flush offset and the current read offset. Pages in this gap range are not flushed, which gives the user application a chance to consume them.

* 2424797 (Tracking ID: 2251223)

SYMPTOM:
The 'df -h' command can take 10 seconds to run to completion and yet still report an inaccurate free block count, shortly after a large number of files are removed.

DESCRIPTION:
When files are removed, some file data blocks are released and counted in the total free block count instantly. However, blocks are not always freed immediately, as VxFS can sometimes delay the release of blocks. The displayed free block count at any one time is therefore the sum of the free blocks and the 'delayed' free blocks. Once a file 'remove transaction' is done, its delayed free blocks are eliminated and the free block count is increased accordingly. However, some functions that process transactions, for example a metadata update, can also alter the free block count while ignoring the current delayed free blocks. As a result, if file 'remove transactions' have not finished updating their free block and delayed free block counts, the free space count can occasionally show greater than the real disk space. To obtain an up-to-date and valid free block count for a file system, a delay-and-retry loop waited 1 second before each retry and looped 10 times before giving up. Thus the 'df -h' command could sometimes take 10 seconds, and even after waiting for 10 seconds there was no guarantee that the displayed output would be accurate or valid.

RESOLUTION:
The delayed free block count is recalculated accurately when transactions are created and when metadata is flushed to disk.

* 2429464 (Tracking ID: 2429566)

SYMPTOM:
The memory used for the VxFS internal buffer cache may grow significantly after 497 days of uptime, when LBOLT (a global that gives the current system time in ticks) wraps around.

DESCRIPTION:
The age of a buffer is calculated from the LBOLT value: age = (current LBOLT - LBOLT when the buffer was added to the list). A buffer is reused when its age becomes greater than a threshold. When LBOLT wraps, the current LBOLT becomes a very small value and the computed age becomes negative, so VxFS never considers the buffer old and never reuses it. Buffer cache memory usage then increases because buffers are not reused.

RESOLUTION:
The code now checks whether LBOLT has wrapped around. If it has, the buffer timestamp is reassigned to the current LBOLT, so that the buffer is reused after some time.
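
NOTE: The following is a minimal sketch of the wraparound check described above; the names (fake_buf, b_lbolt, buf_reusable, AGE_THRESHOLD) are assumptions for illustration, not the actual VxFS symbols.

    /* Illustrative sketch only -- names are assumptions, not the VxFS source. */
    #include <stdint.h>
    #include <stdbool.h>

    #define AGE_THRESHOLD 1000ULL       /* ticks a buffer must sit unused */

    struct fake_buf {
            uint64_t b_lbolt;           /* tick count when buffer was listed */
    };

    /* Decide whether a buffer is old enough to be reused. */
    bool buf_reusable(struct fake_buf *bp, uint64_t lbolt_now)
    {
            /*
             * If LBOLT has wrapped, lbolt_now is smaller than the stored
             * timestamp and the naive subtraction would go negative (or
             * hugely positive in unsigned arithmetic), so the buffer would
             * never look old.  Restamp it instead; it then becomes
             * reusable again after AGE_THRESHOLD more ticks.
             */
            if (lbolt_now < bp->b_lbolt) {
                    bp->b_lbolt = lbolt_now;    /* wrap detected: restamp */
                    return false;
            }
            return (lbolt_now - bp->b_lbolt) > AGE_THRESHOLD;
    }
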
* 2433431 (Tracking ID: 2372093)

SYMPTOM:
New fsadm command options, to defragment a given percentage of the available free space in a file system, were introduced as part of an initiative to help improve Cluster File System (CFS) performance. The new additional command usage is as follows:

fsadm -C -U

We have since found that this new free space defragmentation operation can sometimes hang (while it also continues to consume some CPU) in specific circumstances when executed on a cluster-mounted file system (CFS).

DESCRIPTION:
The hang can occur when file system metadata is being relocated. In our example case, the hang occurs while relocating inodes whose corresponding files are being actively updated via a different node in the cluster from the one on which the fsadm command is being executed. During the relocation, an error code path is taken due to an unexpected mismatch between temporary replica metadata; that code path then results in a deadlock, or hang.

RESOLUTION:
As there is no overriding need to relocate structural metadata for the purposes of defragmenting the available free space, all structural metadata is now simply left in place when performing this operation, thus avoiding its relocation. The changes required for this solution are therefore very low risk.

* 2521701 (Tracking ID: 2521695)

SYMPTOM:
The HP-UX debug kernel panics with the spin_deadlock_failure panic string while auxiliary swap space is being enabled. The stack trace is as shown below:

spinlock+0x50
vx_inactive+0x140
vx_vn_inactive+0x30
vn_rele_inactive+0x1e0
vx_dnlc_getpathname+0x12b0

DESCRIPTION:
The debug kernel panics because it detects wrong lock ordering when taking spinlocks.

RESOLUTION:
The code is changed to correct the lock ordering.

* 2521790 (Tracking ID: 2510903)

SYMPTOM:
Writing to clones loops permanently on HP-UX 11.31; some threads show the following typical stack:

vx_tranundo
vx_logged_cwrite
vx_write_clone
vx_write1
vx_rdwr
vno_rw
inline rwuio
write
syscall

DESCRIPTION:
A small VxFS write can be handled as a logged write, which stores the data in the intent log. A logged write can boost performance for small writes, but requires the write size to be within the logged-write limit. However, when data is written to checkpoints and the write length is greater than the logged-write limit, VxFS cannot proceed with the logged write and retries forever.

RESOLUTION:
The logged write is skipped if the write size exceeds the limit.

* 2521915 (Tracking ID: 2515380)

SYMPTOM:
The ff command hangs and later exits, after the program exceeds its memory limit, with the following errors:

# ff -F vxfs /dev/vx/dsk/bernddg/testvol
UX:vxfs ff: ERROR: V-3-24347: program limit of 30701385 exceeded for directory data block list
UX:vxfs ff: ERROR: V-3-20177: /dev/vx/dsk/bernddg/testvol

DESCRIPTION:
The ff command lists all files on a device containing a VxFS file system, performing a directory lookup for each directory. One function saves the block addresses for a directory by traversing all of its directory blocks. Another function keeps track of the buffer in which directory blocks are read and of the extent up to which directory blocks have been read; it is called with an offset and returns the offset up to which the directory blocks have been read. The offset passed to this function must be the offset within the extent, but the logical offset, which can be greater than the extent size, was wrongly passed instead. As a result, the returned offset wraps to 0, the caller concludes that nothing has been read, and hence the loop.

RESOLUTION:
The call to the function that maintains buffer offsets for reading data is removed from this path. That call was incorrect and redundant; the same function is already called correctly from one of the calling functions above it.
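
NOTE: The following minimal sketch demonstrates the offset confusion behind this hang; the names (advance_in_extent, DIRBLK_SZ, and the variables in main) are assumptions for illustration, not the actual ff or VxFS code.

    /* Illustrative sketch only -- names are assumptions, not the VxFS source. */
    #include <stdint.h>
    #include <stdio.h>

    #define DIRBLK_SZ 1024ULL

    /*
     * Stand-in for the helper that tracks how far into ONE extent the
     * directory blocks have been read.  Its argument must be an offset
     * RELATIVE TO THE EXTENT; it returns the next extent-relative offset,
     * wrapping to 0 once the extent is exhausted.
     */
    static uint64_t advance_in_extent(uint64_t off_in_extent, uint64_t extent_size)
    {
            uint64_t next = off_in_extent + DIRBLK_SZ;
            return (next >= extent_size) ? 0 : next;
    }

    int main(void)
    {
            uint64_t extent_start = 8192;           /* logical offset of the extent */
            uint64_t extent_size  = 4096;
            uint64_t logical_off  = extent_start;

            /* BUG: passing the logical offset (8192 >= 4096) wraps the
             * result to 0 immediately, so the caller believes nothing was
             * read and retries the same blocks forever. */
            printf("buggy:   %llu\n", (unsigned long long)
                   advance_in_extent(logical_off, extent_size));

            /* Correct usage: pass the extent-relative offset. */
            printf("correct: %llu\n", (unsigned long long)
                   advance_in_extent(logical_off - extent_start, extent_size));
            return 0;
    }
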
INSTALLING THE PATCH
--------------------
To install the VxFS 5.0.1-11.31 patch:

a) To install this patch on a CVM cluster, install it one system at a time so that all the nodes are not brought down simultaneously.

b) The VxFS 11.31 package with revision 5.0.31.5 must be installed before applying the patch.

c) To verify the Veritas File System level, execute:

# swlist -l product | egrep -i 'VRTSvxfs'
VRTSvxfs 5.0.31.5 VERITAS File System

Note: VRTSfsman is a corequisite for VRTSvxfs, so VRTSfsman also needs to be installed with VRTSvxfs.

# swlist -l product | egrep -i 'VRTS'
VRTSvxfs  5.0.31.5 Veritas File System
VRTSfsman 5.0.31.5 Veritas File System Manuals

d) All prerequisite/corequisite patches have to be installed. The kernel patch requires a system reboot for both installation and removal.

e) To install the patch, execute the following command:

# swinstall -x autoreboot=true -s <patch_directory> PHKL_42311 PHCO_42312

If the patch is not registered, you can register it using the following command:

# swreg -l depot <patch_directory>

The <patch_directory> is the absolute path where the patch resides.

REMOVING THE PATCH
------------------
To remove the VxFS 5.0.1-11.31 patch:

a) Execute the following command:

# swremove -x autoreboot=true PHKL_42311 PHCO_42312

SPECIAL INSTRUCTIONS
--------------------
NO SPECIAL INSTRUCTIONS

OTHERS
------
NONE