                          * * * READ ME * * *
                * * * Veritas File System 5.0 MP2 * * *
                      * * * Rolling Patch 7 * * *
                         Patch Date: 2012-08-13


This document provides the following information:

   * PATCH NAME
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas File System 5.0 MP2 Rolling Patch 7


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs
VRTSfsman
VRTSvxfs


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas File System 5.0 MP2
   * Veritas Storage Foundation for Oracle RAC 5.0 MP2
   * Veritas Storage Foundation Cluster File System 5.0 MP2
   * Veritas Storage Foundation 5.0 MP2
   * Veritas Storage Foundation High Availability 5.0 MP2
   * Veritas Storage Foundation for Oracle 5.0 MP2


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
HP-UX 11i v2 (11.23)


INCIDENTS FIXED BY THE PATCH
----------------------------
This patch fixes the following Symantec incidents:

Patch ID: PHKL_43107, PHCO_43108, PHCO_43113

* 2114163 (Tracking ID: 2091103)

SYMPTOM:
CFS hangs, with a thread that appears to loop while allocating space for a file.

DESCRIPTION:
In the vx_searchau() function, if a VX_ERETRY error is received, the function
keeps retrying indefinitely, which results in a hang.

RESOLUTION:
The code is changed to limit the number of retries to 3.

* 2551549 (Tracking ID: 2428964)

SYMPTOM:
The value of the kernel tunable max_thread_proc is incremented by 1 after every
software maintenance activity (install, remove, and so on) on the VRTSvxfs
package.

DESCRIPTION:
The postinstall script for the VRTSvxfs package wrongly increments the value of
the kernel tunable max_thread_proc by 1.

RESOLUTION:
The increment operation on the max_thread_proc tunable is removed from the
postinstall script.
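
The bounded-retry fix described for incident 2114163 can be illustrated with a
short sketch. This is not the VxFS code: the callable try_allocate, the return
convention, and the choice to count retries separately from the first attempt
are all assumptions made for illustration. Only the VX_ERETRY name and the
limit of 3 come from the incident text.

```python
VX_ERETRY = "VX_ERETRY"   # stand-in for the kernel's retry error code
MAX_RETRIES = 3           # retry limit named in the resolution


def search_with_retry_limit(try_allocate, max_retries=MAX_RETRIES):
    """Call try_allocate() until it succeeds or the retry budget is spent.

    try_allocate is a hypothetical callable that returns VX_ERETRY when the
    caller should try again, or any other value on completion.
    Returns a (result, retries_used) tuple.
    """
    retries = 0
    while True:
        result = try_allocate()
        if result != VX_ERETRY:
            return result, retries       # finished without needing more retries
        if retries == max_retries:
            return VX_ERETRY, retries    # give up instead of looping forever
        retries += 1
```

With an allocator that always asks for a retry, the loop ends after the first
attempt plus three retries rather than spinning indefinitely.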

* 2564150 (Tracking ID: 2383225)

SYMPTOM:
During internal testing of write operations using direct I/O, the system panics
with the panic string "pfd_unlock: bad lock state!" and the following stack is
displayed:

vx_dio_rdwri+0xdc
vx_write_direct+0x2ec
vx_write1+0x13a8
vx_rdwr+0xa88
vno_rw+0x64
rwuio+0x11c
aio_rw_child_thread+0x178
aio_exec_req_thread+0x258

DESCRIPTION:
The routine used to lock the user buffer while performing a direct I/O does not
handle the ENOSPC error correctly and passes an incorrect return value. This
leads to a retry of the I/O with an invalid user I/O structure, resulting in
the panic.

RESOLUTION:
The code is modified so that the routine handles ENOSPC errors correctly.

* 2587020 (Tracking ID: 2251223)

SYMPTOM:
The df(1M) command with the -h option takes 10 seconds to execute and reports
an inaccurate free block count shortly after a large number of files are
removed.

DESCRIPTION:
When files are removed, some of the file data blocks are released and counted
in the total free block count instantly. However, not all the blocks may be
freed immediately, because Veritas File System (VxFS) can sometimes delay the
release of blocks. Therefore, the displayed free block count, at any given
time, is the total of the free blocks and the 'delayed' free blocks.

Once a 'file remove' transaction is done, its 'delayed' free blocks are
eliminated and the free block count increases accordingly. However, some
functions that process certain transactions, for example a metadata update, can
also alter the free block count but ignore the current 'delayed' free blocks.
As a result, if the 'file remove' transactions have not finished updating their
free blocks and their 'delayed' free blocks information, the free space count
can occasionally show more than the real disk space. Therefore, to obtain an
up-to-date and valid free block count for a file system, a delay-and-retry loop
waits 1 second before each retry and retries up to 10 times before giving up.
Thus, the df(1M) command with the -h option sometimes takes 10 seconds to
execute. Even after the file system waits for 10 seconds, there is no guarantee
that the output displayed will be accurate or valid.

RESOLUTION:
The code is modified so that the delayed free block count is recalculated
accurately when transactions are created and metadata is flushed to the disk.

* 2587026 (Tracking ID: 2561739)

SYMPTOM:
When a file is created and its parent directory has a default ACL entry, that
entry is not taken into account when calculating the class entry of the file.
When a separate dummy entry is added, the default entry from the parent is
taken into account as well. For example:

$ getacl .
# file: .
# owner: root
# group: sys
user::rwx
group::rwx
class:rwx
other:rwx
default:user:user_one:r-x

$ touch file1
$ getacl file1
# file: try1
# owner: root
# group: sys
user::rw-
user:user_one:r-x
group::rw-
class:rw-  <------
other:rw-

The class entry here should be rwx.

DESCRIPTION:
The default entry of the parent was not taken into account. A newly created
file shares the attribute inode of its parent; no new attribute inode is
created for it. When an ACL entry is set explicitly, a separate attribute inode
is created, so the default entry is also copied into the new inode and is
considered when returning the class entry of the file.

RESOLUTION:
The class entry is now recalculated, considering all the entries, before the
ACL entry buffer is returned.

* 2587032 (Tracking ID: 2492304)

SYMPTOM:
The "find" command displays duplicate directory entries.

DESCRIPTION:
As long as the directory entries fit in the inode's immediate area, VxFS does
not allocate new directory blocks. As new directory entries are added to the
directory, this immediate area fills up and all the directory entries are then
moved to a newly allocated directory block. Directory blocks have space
reserved at the start of the block to hold the block hash information, which is
used for fast lookup of entries in that block.
The offset of a directory entry that was at, say, x bytes in the inode's
immediate area will be at (x + y) bytes after it moves to the directory block,
where "y" is the size of the block hash. During this transition from the
immediate area to directory blocks, a readdir() call can report a directory
entry more than once.

RESOLUTION:
Directory entry offsets returned to the "readdir" call are adjusted so that
when the entries move to a new block, they will be at the same offsets.

* 2604147 (Tracking ID: 2559450)

SYMPTOM:
The fsck_vxfs(1M) command may dump core with a SEGV_ACCERR error.

DESCRIPTION:
The fsck_vxfs(1M) command tries to allocate memory for a structure. When the
allocation fails, the command dereferences the result anyway, resulting in a
segmentation fault.

RESOLUTION:
A check for failed memory allocation is added to the code so that an error
message is printed instead of a core dump.

* 2800272 (Tracking ID: 2376382)

SYMPTOM:
The vxrestore man page does not specify that vxrestore fails if the -b option
is used when taking the dump with a block size greater than the default maximum
of 63, and the -b option is not used when restoring from the dump.

DESCRIPTION:
If vxdump is used to take the dump with the -b option and a block size greater
than the default maximum of 63, vxrestore fails to restore this dump when it is
used without the -b option. This is because vxrestore attempts to dynamically
determine the tape block size, up to the default maximum of 63. This is not
mentioned in the vxrestore man page.

RESOLUTION:
Added "vxrestore will attempt to dynamically determine the tape block size, up
to the default maximum of 63. So, if -b option is used when creating a dump,
but not used when restoring the dump, the restore will fail when the tape block
size is specified to be greater than 63." to the -b option of the vxrestore man
page.

* 2800276 (Tracking ID: 2566875)

SYMPTOM:
A write(2) operation exceeding the quota limit fails with an EDQUOT error
("Disc quota exceeded") before the user quota limit is reached.

DESCRIPTION:
When a write request exceeds a quota limit, the EDQUOT error should be handled
so that Veritas File System (VxFS) can allocate space up to the hard quota
limit and proceed with a partial write. However, VxFS does not handle this
error, and an error is returned without performing a partial write.

RESOLUTION:
The code is modified to handle the EDQUOT error from the extent allocation
routine.

* 2800290 (Tracking ID: 2696067)

SYMPTOM:
When a getaccess() command is issued on a file that inherits the default Access
Control List (ACL) entries from its parent, it shows incorrect group object
permissions.

DESCRIPTION:
If a newly created file leverages the ACL entries of its parent directory, the
vx_daccess() function does not fabricate a GROUP_OBJ entry, unlike the
vx_do_getacl() function.

RESOLUTION:
The code is modified to fabricate a GROUP_OBJ entry.

* 2800329 (Tracking ID: 2680946)

SYMPTOM:
The ls(1M), find(1M), or other lookup operations can trigger a panic with the
following stack trace:

spinlock+0x40
vx_itryhold+0x40
vx_dnlc_lookup+0x1b0
vx_cbdnlc_lookup+0x130
vx_fast_lookup+0x120
vx_lookup+0x3c0
lookuppnvp+0x2d0
lookuppn+0x90
lookupname+0x60
vn_open+0xa0
copen+0x170
open+0x80
syscall+0x920

DESCRIPTION:
The panic is triggered by a NULL pointer dereference while inserting an entry
that has a NULL child pointer into the Directory Name Lookup Cache (DNLC).

RESOLUTION:
The code is modified to include a global variable that is incremented when a
DNLC entry with a NULL child pointer is inserted. Preventive code is added to
avoid the panic, and such occurrences are tracked using the global variable.

* 2800330 (Tracking ID: 1092933)

SYMPTOM:
The system may panic in the vx_fsync_chains() function when it tries to sleep
in the interrupt context.
The following stack trace is displayed:

vx_event_wait
vx_delay2
vx_fsync_chains
vx_disable
vx_dataioerr
vx_pageio_done

DESCRIPTION:
While handling an external interrupt, the vx_pageio_done() function calls the
vx_fsync_chains() function, which may sleep during execution. The
vx_fsync_chains() function is required in the case of input/output (I/O)
errors. It is called in a couple of places when the I/O strategy fails, but the
error variable is overwritten improperly.

RESOLUTION:
The code is modified to save I/O errors so that the vx_fsync_chains() function
can be called, and the call to vx_fsync_chains() is removed from vx_disable().

* 2834283 (Tracking ID: 2740939)

SYMPTOM:
Unmounting a file system can cause a Transfer of Control (TOC) in an HP-UX
11.23 Serviceguard environment with many threads. The following stack trace is
displayed:

spinunlock()
vx_worklist_process()
vx_worklist_thread()
kthread_daemon_startup()

DESCRIPTION:
The TOC is caused by heavy contention on a Veritas File System (VxFS)
spin-lock. During the unmount of a VxFS file system, worker threads are created
to scan the inode cache and remove the inodes related to the mount point. The
number of worker threads is calculated from the number of hash-lists in the
inode cache, and the number of hash-lists is calculated from the system memory
rather than from the tuned vxfs_ninode value. If the system memory is large and
the tuned vxfs_ninode value is small, the inode cache has far more hash-lists
than it needs, which can result in spin-lock contention during the unmount.

RESOLUTION:
The code is modified to calculate the number of hash-lists based on the tuned
vxfs_ninode value.
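
The readdir() offset problem described for incident 2587032 can be shown with a
small model. This is only a sketch of the idea, not the VxFS implementation:
the hash size of 32 bytes, the entry offsets, and the function names are all
hypothetical, and adding the block hash size to immediate-area offsets is one
assumed way of keeping reported offsets stable across the move.

```python
BLOCK_HASH_SIZE = 32  # hypothetical value for "y", the block hash size in bytes


def reported_offset(physical_offset, in_immediate_area, adjusted=True):
    """Offset returned to a readdir caller for one directory entry."""
    if adjusted and in_immediate_area:
        # Report the offset the entry will have once it moves to a block.
        return physical_offset + BLOCK_HASH_SIZE
    return physical_offset


def entries_from(entries, cookie, in_immediate_area, adjusted=True):
    """Names of entries at or after the caller's resume cookie."""
    return [name for name, offset in entries
            if reported_offset(offset, in_immediate_area, adjusted) >= cookie]


# The same three entries before and after the move to a directory block.
immediate = [("a", 0), ("b", 16), ("c", 32)]
in_block = [(name, offset + BLOCK_HASH_SIZE) for name, offset in immediate]

# A reader has consumed "a" and "b" and holds the offset of "c" as its cookie.
old_cookie = reported_offset(32, True, adjusted=False)  # raw offset x
new_cookie = reported_offset(32, True, adjusted=True)   # adjusted offset x + y

# The entries move to a block, then the reader resumes from its cookie.
duplicated = entries_from(in_block, old_cookie, False, adjusted=False)
stable = entries_from(in_block, new_cookie, False, adjusted=True)
```

In this model the unadjusted cookie makes the reader see "a" and "b" a second
time after the move, while the adjusted cookie resumes at "c" only.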

* 2834289 (Tracking ID: 2830513)

SYMPTOM:
The Cluster File System (CFS) hangs while performing file removal operations
and the following stack trace is displayed:

vxglm::vxg_grant_sleep+0x110 ()
vxglm::vxg_cmn_lock+0x5a0 ()
vxglm::vxg_api_lock+0x310 ()
vx_glm_lock+0x70 ()
vx_mdelelock+0x70 ()
vx_mdele_hold+0xe0 ()
vx_extfree+0x700 ()

DESCRIPTION:
The CFS hangs due to a missing unlock call in the file removal operations.

RESOLUTION:
The code is modified to add the missing unlock call to the file removal
operations.

Patch ID: PHCO_42730, PHKL_42729

* 2244365 (Tracking ID: 1143552)

SYMPTOM:
During re-tune operations on vx_ninode, memory hangs and the system stops
responding.

DESCRIPTION:
The dynamic reconfiguration code added several locks to serialize re-tuning
operations on a different platform. This was not suitable for the HP-UX
platform. One of these dynamic reconfiguration locks was acquired in ways that
caused performance degradation on large systems.

RESOLUTION:
A re-tune lock has now been added to correct the memory hang.

* 2405530 (Tracking ID: 2274267)

SYMPTOM:
The vxfs_bc_bufhwm tunable cannot be tuned on systems with more than 227 GB of
memory. When vxfs_bc_bufhwm is tuned, the following error is displayed:

root@zu8003hp:/#kctune vxfs_bc_bufhwm=65536
ERROR: mesg 113: V-2-113: The specified value for vx_bc_bufhwm is greater than 90% of the total kernel memory 267444224 Kbytes.

DESCRIPTION:
The error is caused by an integer overflow in the calculation of the upper
bound for the tunable value.

RESOLUTION:
The integer overflow in the calculation of the upper bound of the tunable is
fixed.

* 2494597 (Tracking ID: 2429566)

SYMPTOM:
Memory used for the VxFS internal buffer cache may grow significantly after 497
days of uptime, when LBOLT (a global that gives the current system time) wraps
around.

DESCRIPTION:
The age of a buffer is calculated from the LBOLT value:
age = (current LBOLT - LBOLT when the buffer was added to the list).
A buffer is reused when its age becomes greater than a threshold.
When LBOLT wraps, the current LBOLT becomes a very small value and the age
becomes negative. VxFS then treats the buffer as not old and never reuses it.
Buffer cache memory usage increases because buffers are not reused.

RESOLUTION:
The code now checks whether LBOLT has wrapped around. If it has, the buffer
time is reassigned to the current LBOLT so that the buffer is reused after some
time.

* 2556095 (Tracking ID: 2515380)

SYMPTOM:
The ff command hangs and later exits, after the program exceeds its memory
limit, with the following error:

# ff -F vxfs /dev/vx/dsk/bernddg/testvol
UX:vxfs ff: ERROR: V-3-24347: program limit of 30701385 exceeded for directory data block list
UX:vxfs ff: ERROR: V-3-20177: /dev/vx/dsk/bernddg/testvol

DESCRIPTION:
The 'ff' command lists all files on the device of a VxFS file system. To do so
it performs directory lookups. One function saves the block addresses for a
directory by traversing all the directory blocks. Another function keeps track
of the buffer into which directory blocks are read and the extent up to which
they have been read. This function is called with an offset and returns the
offset up to which the directory blocks have been read. The offset passed to
this function must be the offset within the extent, but the logical offset,
which can be greater than the extent size, was passed instead. As a result, the
returned offset wraps to 0. The caller concludes that nothing has been read,
and so it loops.

RESOLUTION:
The call to the function that maintains buffer offsets for reading data is
removed. That call was incorrect and redundant; the function is already called
correctly from one of the functions above.

* 2558844 (Tracking ID: 2532934)

SYMPTOM:
The performance of ftp transfers that use the 'VOP_BREAD()' vnode operation for
reads can be slow when max_buf_data_size is set to 64K.

DESCRIPTION:
Reads done using 'VOP_BREAD()' did not perform read-ahead, even for sequential
I/O, when max_buf_data_size was set to 64K. Therefore, performance degradation
was seen.
Read-ahead was not performed because the wrong read length was passed to the
read-ahead detection function. The function detects sequential access by
storing the last read offset plus the amount of read requested in the inode. It
then compares the start offset of the next read with this value; for a
sequential read the two should match. In this case, the correct read length was
not supplied to the read-ahead detection function: max_buf_data_size should
have been passed as the length, but VX_MAXBSIZE (8192) was passed instead.
Therefore, with a max_buf_data_size of 65536 the detection function did not
detect a valid read-ahead, and no read-ahead was performed. This in turn can
cause cache misses, which can affect performance.

RESOLUTION:
The correct read length is now passed to the read-ahead detection function so
that read-ahead on sequential I/O is triggered correctly.

* 2607352 (Tracking ID: 2334061)

SYMPTOM:
When a file system is mounted with the tranflush option, operations requiring a
metadata update take comparatively more time.

DESCRIPTION:
When a VxFS file system is mounted with the tranflush option, transaction
metadata is flushed to the disk, followed by a wait of 100 milliseconds before
the next transaction is flushed. This delay severely affects the operation of
various commands on the VxFS file system.

RESOLUTION:
Because the flushing is synchronous and is performed in a loop, a delay of 100
milliseconds is too long. The delay is reduced from 100 milliseconds to a more
reasonable 2 milliseconds.

* 2647802 (Tracking ID: 1067468)

SYMPTOM:
Enhancement to reduce the memory used by the fsck(1M) command during a full
file system check.

DESCRIPTION:
1. Currently, in the fsck code path, a flat bitmap is allocated per device.
   This can be optimized by allocating a compressed bitmap instead of a flat
   bitmap.
2. Currently, in the fsck code path, 3 arrays are used to track an inode's
   dotdot linkage. This can be optimized by eliminating 2 of these 3 redundant
   arrays.
3.
   Currently, in the fsck code path, memory is allocated for inode tables based
   on the value of fsh_ninode. In MTS the ilist file can become sparse,
   increasing the value of fsh_ninode. This memory allocation can be optimized
   by allocating the memory based on the actual number of inodes present in the
   ilist file.

RESOLUTION:
1. Changed the code to allocate a compressed bitmap instead of a flat bitmap.
2. Changed the code to eliminate 2 of the 3 redundant arrays used to track an
   inode's dotdot linkage.
3. Changed the code to allocate the memory for inode tables based on the actual
   number of inodes present in the ilist file.

* 2654474 (Tracking ID: 2651922)

SYMPTOM:
The "ls -l" command on a local VxFS file system runs slowly and high CPU usage
is seen on the HP platform.

DESCRIPTION:
This issue occurs when the system is under inode pressure and most of the
inodes on the inode free lists are CFS inodes. On the HP-UX platform, CFS
inodes are currently not allowed to be reused as local inodes, to avoid a GLM
deadlock when a VxFS reconfiguration is in process. So if the system needs a
VxFS local inode and the free lists are almost filled with CFS inodes, it can
spend considerable time looping through all the inode free lists to find a
local inode.

RESOLUTION:
1. Added a global "vxi_icache_cfsinodes" to count CFS inodes in the inode
   cache.
2. Relaxed the condition for converting a cluster inode to a local inode when
   the number of in-core CFS inodes is greater than the threshold
   "vx_clreuse_threshold" and a reconfiguration is not in progress.

* 2669100 (Tracking ID: 848619)

SYMPTOM:
High CPU utilization occurs when several processes are performing direct I/Os
on a VxFS file.

DESCRIPTION:
The high CPU utilization occurs because of high contention for a global
spinlock that protects the linked list containing the pages that are
participating in the direct I/Os.

RESOLUTION:
The hash list function used by the VxFS code has been modified to ensure that
the lock contention is reduced.
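
The LBOLT wraparound described for incident 2494597 can be worked through with
a short sketch. It is an illustration under stated assumptions, not the VxFS
buffer cache code: the 32-bit counter width and the threshold of 1000 ticks are
assumed (a 32-bit counter at 100 ticks per second would wrap after roughly 497
days), and the function names are hypothetical.

```python
LBOLT_MODULUS = 2 ** 32   # assumption: LBOLT is a 32-bit tick counter
AGE_THRESHOLD = 1000      # hypothetical reuse threshold, in ticks


def buffer_age(now, stamp):
    """Age as described in the incident: current LBOLT minus the buffer stamp."""
    return now - stamp


def refresh_stamp(now, stamp):
    """If LBOLT has wrapped (now < stamp), restamp with the current LBOLT."""
    return now if now < stamp else stamp


def is_reusable(now, stamp):
    """A buffer is reused once its age exceeds the threshold."""
    return buffer_age(now, stamp) > AGE_THRESHOLD


stamp = LBOLT_MODULUS - 10              # buffer stamped just before the wrap
now = (stamp + 50) % LBOLT_MODULUS      # LBOLT shortly after wrapping

# Without the fix the age stays negative long after the wrap.
stuck = is_reusable(now + 10 * AGE_THRESHOLD, stamp)

# With the fix the stamp is reset, so the buffer ages normally again.
stamp = refresh_stamp(now, stamp)
recovered = is_reusable(now + AGE_THRESHOLD + 1, stamp)
```

Restamping does not reclaim the buffer immediately; it restarts the aging so
that the buffer crosses the threshold again after some time, which matches the
wording of the resolution.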

* 2684626 (Tracking ID: 1180759)

SYMPTOM:
The system panics on a deadlock caused by improper lock handling.

DESCRIPTION:
This issue could be due to vx_lockctl not allowing the release of locks if the
file system has been disabled. As a result, a process could exit without
releasing its lock, and another process trying to get the lock would hit a
deadlock, causing the panic.

RESOLUTION:
All the file locks of the vnode are now unlocked when the file system is
disabled, to satisfy an assertion in HP-UX.


INSTALLING THE PATCH
--------------------
To install the VxFS 5.0-MP2RP7 patch:

a) To install this patch on a CVM cluster, install it one system at a time so
   that all the nodes are not brought down simultaneously.

b) VxFS 5.0 (GA) must be installed before applying these patches.

c) To verify the VERITAS file system level, execute:

     # swlist -l product | egrep -i 'VRTSvxfs'
     VRTSvxfs      5.0.01.04      VERITAS File System

   Note: VRTSfsman is a corequisite for VRTSvxfs, so VRTSfsman must also be
   installed with VRTSvxfs.

     # swlist -l product | egrep -i 'VRTS'
     VRTSvxfs      5.0.01.04      Veritas File System
     VRTSfsman     5.0.01.02      Veritas File System Manuals

d) All prerequisite/corequisite patches must be installed. The kernel patch
   requires a system reboot for both installation and removal.

e) To install the patch, execute the following command:

     # swinstall -x autoreboot=true -s PHCO_43108 PHCO_43113 PHKL_43107

   If the patch is not registered, you can register it using the following
   command:

     # swreg -l depot

   The depot location is the absolute path where the patch resides.


REMOVING THE PATCH
------------------
To remove the VxFS 5.0-MP2RP7 patch:

a) Execute the following command:

     # swremove -x autoreboot=true PHCO_43108 PHCO_43113 PHKL_43107


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE