* * * READ ME * * *
* * * Veritas File System 5.0 MP1 * * *
* * * P-patch 10 * * *
Patch Date: 2012-09-12


This document provides the following information:

   * PATCH NAME
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas File System 5.0 MP1 P-patch 10


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas File System 5.0 MP1
   * Veritas Storage Foundation for Oracle RAC 5.0 MP1
   * Veritas Storage Foundation Cluster File System 5.0 MP1
   * Veritas Storage Foundation 5.0 MP1
   * Veritas Storage Foundation High Availability 5.0 MP1
   * Veritas Storage Foundation for Oracle 5.0 MP1


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
HP-UX 11i v3 (11.31)


INCIDENTS FIXED BY THE PATCH
----------------------------
This patch fixes the following Symantec incidents:

Patch ID: PHKL_43128

* 2245427 (Tracking ID: 2178147)

SYMPTOM:
If a socket file is removed, the file system is marked for a full fsck(1M)
operation. The following error message is displayed in the system log:

vmunix: vxfs: WARNING: msgcnt 1 mesg 087: V-2-87: vx_dotdot_manipulate: - / file system 2437 inode 541 dotdot inode error

DESCRIPTION:
During the socket file creation, the attribute inode for the parent directory
is not updated. Hence, the error occurs when the socket file is removed.

RESOLUTION:
The code is modified to update the socket file linkage during creation, thus
avoiding the error message.

* 2800277 (Tracking ID: 2566875)

SYMPTOM:
The write(2) operation exceeding the quota limit fails with an EDQUOT error
("Disc quota exceeded") before the user quota limit is reached.

DESCRIPTION:
When a write request exceeds a quota limit, the EDQUOT error should be handled
so that Veritas File System (VxFS) can allocate space up to the hard quota
limit to proceed with a partial write. However, VxFS does not handle this
error, and an error is returned without performing a partial write.

RESOLUTION:
The code is modified to handle the EDQUOT error from the extent allocation
routine.

* 2800335 (Tracking ID: 1092933)

SYMPTOM:
The system may panic in the vx_fsync_chains() function when it tries to sleep
in the interrupt context. The following stack trace is displayed:

vx_event_wait
vx_delay2
vx_fsync_chains
vx_disable
vx_dataioerr
vx_pageio_done

DESCRIPTION:
While handling the external interrupt, the vx_pageio_done() function calls the
vx_fsync_chains() function, which may sleep during execution. The
vx_fsync_chains() function is required in the case of input/output (I/O)
errors. It is called in a couple of places when the I/O strategy fails, but
the error variable is improperly overwritten.

RESOLUTION:
The code is modified to save I/O errors so that the vx_fsync_chains() function
can be called, and the call to vx_fsync_chains() is removed from vx_disable().

* 2846939 (Tracking ID: 2768505)

SYMPTOM:
While reading the Dynamic Name Lookup Cache (DNLC) entries using the
pstat_getmpathname(2) function, the system might panic and display the
following stack trace:

vx_dnlc_appened_dnlcnodes
vx_dnlc_getentries
pstat_mpathname
$cold_pstat
syscall

DESCRIPTION:
While reading all the DNLC entries, the VxFS pointer is derived from the VNODE
pointer. The VNODE pointer can be reused, and de-referencing the VxFS pointer
might panic the system.

RESOLUTION:
The code is modified to pass the VxFS pointer directly to the function instead
of deriving it from the VNODE pointer.
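
The partial-write behaviour described for incident 2800277 above can be illustrated with a small sketch. This is not the VxFS implementation: the function names, the use of byte counts rather than file system blocks, and the single hard limit are all assumptions made for illustration. It only shows the intended rule that a write which would cross the hard quota limit is shortened to the remaining space, and that EDQUOT is returned only when no space remains.

```python
import errno
import os


def writable_bytes(requested: int, used: int, hard_limit: int) -> int:
    """Return how much of a write request fits under the hard quota limit.

    All three arguments are hypothetical byte counts; real quotas are
    tracked by the file system in its own units.
    """
    if min(requested, used, hard_limit) < 0:
        raise ValueError("sizes must be non-negative")
    remaining = max(hard_limit - used, 0)
    return min(requested, remaining)


def quota_write(requested: int, used: int, hard_limit: int) -> int:
    """Grant a full or partial write; fail with EDQUOT only if nothing fits."""
    granted = writable_bytes(requested, used, hard_limit)
    if requested > 0 and granted == 0:
        raise OSError(errno.EDQUOT, os.strerror(errno.EDQUOT))
    return granted
```

Under these assumptions, quota_write(100, 950, 1000) returns 50 (a partial write up to the hard limit) instead of failing outright, which is the behaviour the incident description says was expected.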

* 2857613 (Tracking ID: 2830513)

SYMPTOM:
The Cluster File System (CFS) hangs while performing file removal operations
and the following stack trace is displayed:

vxglm::vxg_grant_sleep+0x110 ()
vxglm::vxg_cmn_lock+0x5a0 ()
vxglm::vxg_api_lock+0x310 ()
vx_glm_lock+0x70 ()
vx_mdelelock+0x70 ()
vx_mdele_hold+0xe0 ()
vx_extfree+0x700 ()

DESCRIPTION:
The CFS hangs due to a missing unlock call for the file removal operations.

RESOLUTION:
The code is modified to unlock the file removal operations.

* 2857619 (Tracking ID: 2845175)

SYMPTOM:
When the Access Control List (ACL) feature is enabled, the system may panic
with a "Data Key Miss Fault in KERNEL mode" error message in the
vx_do_getacl() function and the following stack trace is displayed:

vx_do_getacl+0x840 ()
vx_getacl+0x70 ()
acl+0x480 ()

DESCRIPTION:
In the vx_do_getacl() function, a local variable is accessed without being
initialized, which leads to a panic.

RESOLUTION:
The code is modified to initialize the local variable to NULL before using it.

* 2876845 (Tracking ID: 2874172)

SYMPTOM:
A Network File System (NFS) file creation thread might loop continuously with
the following stack trace:

vx_getblk_cmn(inlined)
vx_getblk+0x3a0
vx_exh_allocblk+0x3c0
vx_exh_hashinit+0xa50
vx_dexh_create+0x240
vx_dexh_init+0x8b0
vx_do_create+0x1e0
vx_create1+0x1d0
vx_create0+0x270
vx_create+0x40
rfs3_create+0x420
common_dispatch+0xb40
rfs_dispatch+0x40
svc_getreq+0x250
svc_run+0x310
svc_do_run+0xd0
nfssys+0x6a0
hpnfs_nfssys+0x60
coerce_scall_args+0x130
syscall+0x590

DESCRIPTION:
The Veritas File System (VxFS) file creation vnode operation (VOP) routine
expects the parent vnode to be a directory vnode pointer. But, the NFS layer
passes a stale file vnode pointer by default. This might cause unexpected
results such as a hang during VOP handling.

RESOLUTION:
The code is modified to check the vnode type of the parent vnode pointer at
the beginning of the create VOP call and return an error if it is not a
directory vnode pointer.

Patch ID: PHCO_42918, PHKL_42919

* 2245447 (Tracking ID: 1537731)

SYMPTOM:
A panic can occur when a write hits an ENOSPC whilst a re-size is in
operation.

DESCRIPTION:
The problem occurs when a write gets an ENOSPC error while a re-size is in
progress. We drop the active level, do some other task, and then raise it
again. The time gap between dropping and raising the active level gives a
window for the re-size to come in and do its work. We do not update the
related pointer afterwards, and this results in a panic.

RESOLUTION:
Updated the fs pointer after re-acquiring the active level.

* 2370062 (Tracking ID: 2370046)

SYMPTOM:
Readahead with a read_nstream value other than 1 misses early blocks of data.

DESCRIPTION:
Readahead is not reading the portion of the file that is expected to be read
in advance. These blocks are read on demand rather than by actual readahead.
This happens because the parameter determining the next readahead offset is
wrongly zeroed out.

RESOLUTION:
Changed the code that wrongly zeroed out the parameter determining the next
readahead offset.

* 2604150 (Tracking ID: 2559450)

SYMPTOM:
The fsck_vxfs(1M) command may core-dump with a SEGV_ACCERR error.

DESCRIPTION:
The fsck_vxfs(1M) command tries to allocate memory for a structure, but the
memory allocation fails, resulting in a segmentation fault.

RESOLUTION:
A check is added in the code for failed memory allocation to print an error
message instead of a core dump.

* 2696070 (Tracking ID: 2696067)

SYMPTOM:
A newly created file that inherits the default ACL from its parent shows wrong
permissions when the getaccess command is issued.

DESCRIPTION:
vx_daccess() does not fabricate any GROUP_OBJ entry (as vx_do_getacl does).
If a newly created file leverages its parent directory's ACL entries

RESOLUTION:
To address this, the fix is to fabricate a GROUP_OBJ entry.

* 2722872 (Tracking ID: 1244756)

SYMPTOM:
A race condition corrupts data structures, causing a NULL pointer dereference.

DESCRIPTION:
There is a race condition between some DNLC functions. One of those functions
is already holding a lock and wants another lock, which results in a race
condition. This causes freelist corruption and a NULL pointer dereference.

RESOLUTION:
Updated DNLC_LOOKUP to make sure the DNLC entry is not free and on the
freelist when moving it to the tail, avoiding a race with DNLC_GET or
DNLC_ENTER.

* 2723003 (Tracking ID: 2670022)

SYMPTOM:
Duplicate file names can be seen in a directory.

DESCRIPTION:
VxFS maintains an internal directory name lookup cache (DNLC) to improve the
performance of directory lookups. A race condition arises in the DNLC list
manipulation code during lookup or creation of file names longer than 32
characters (which further affects other file creations). This causes the DNLC
to have a stale entry for an existing file in the directory. A lookup of such
a file through the DNLC reports the file as non-existent, which allows another
file with a duplicate name to be created in the directory.

RESOLUTION:
Fixed the race condition by protecting the DNLC lists through proper locks.

* 2723005 (Tracking ID: 2651922)

SYMPTOM:
The "ls -l" command on a local VxFS file system runs slowly and high CPU usage
is seen on the HP platform.

DESCRIPTION:
This issue occurs when the system is under inode pressure and most of the
inodes on the inode free list are CFS inodes. On the HP-UX platform, CFS
inodes are currently not allowed to be reused as local inodes, to avoid a GLM
deadlock issue when a VxFS reconfig is in process. So if the system needs a
VxFS local inode, it has to take time to loop through all the inode free lists
to find a local inode if the free lists are almost filled up with CFS inodes.

RESOLUTION:
1. Added a global "vxi_icache_cfsinodes" to count CFS inodes in the inode
   cache.
2. Relaxed the condition for converting a cluster inode to a local inode when
   the number of in-core CFS inodes is greater than the threshold
   "vx_clreuse_threshold" and reconfig is not in progress.

* 2723127 (Tracking ID: 2680946)

SYMPTOM:
Panic in vx_itryhold+0x40/spinlock() due to a NULL child pointer of a DNLC
entry.

Data Access Rights Fault in KERNEL mode
spinlock+0x40
vx_itryhold+0x40
vx_dnlc_lookup+0x1b0
vx_cbdnlc_lookup+0x130
vx_fast_lookup+0x120
vx_lookup+0x3c0
lookuppnvp+0x2d0
lookuppn+0x90
lookupname+0x60
vn_open+0xa0
copen+0x170
open+0x80
syscall+0x920

DESCRIPTION:
The panic is caused by a NULL pointer dereference. The reason for the NULL
pointer is not known for sure. One possible cause of the NULL child pointer is
the insertion of an entry into the DNLC.

RESOLUTION:
A global variable is now incremented whenever a DNLC entry with a NULL child
pointer is inserted. The next time the issue occurs, the value of this global
can be examined to confirm or rule out this possibility.

Patch ID: PHCO_42617, PHKL_42618

* 2289635 (Tracking ID: 1633670)

SYMPTOM:
Panic in vx_inull_list() / vx_inactive() / vx_inode_deinit() after a forced
unmount of a writable clone and unmount of the primary fileset.

DESCRIPTION:
This happens due to a NULL pointer dereference.

RESOLUTION:
Do not iflush clone inodes which have already been force unmounted when
running iflush on force unmount of a primary fileset. Beware of null vfsp
pointers in partially processed force-unmounted inodes. Reference the vx_vfs
struct through the fset instead of the vnode.

* 2351018 (Tracking ID: 2253938)

SYMPTOM:
In a Cluster File System (CFS) environment, the file read performance
gradually degrades up to 10% of the original read performance and the
fsadm(1M) -F vxfs -D -E command shows a large number (> 70%) of free blocks in
extents smaller than 64k.
For example,

% Free blocks in extents smaller than 64 blks: 73.04
% Free blocks in extents smaller than 8 blks: 5.33

DESCRIPTION:
In a CFS environment, the disk space is divided into Allocation Units (AUs).
The delegation for these AUs is cached locally on the nodes. When an extending
write operation is performed on a file, the file system tries to allocate the
requested block from an AU whose delegation is locally cached, rather than
finding the largest free extent available that matches the requested size in
the other AUs. This leads to a fragmentation of the free space, thus leading
to badly fragmented files.

RESOLUTION:
The code is modified such that the time for which the delegation of the AU is
cached can be reduced using a tuneable, thus allowing allocations from other
AUs with larger size free extents. Also, the fsadm(1M) command is enhanced to
de-fragment free space using the -C option.

* 2406458 (Tracking ID: 2283893)

SYMPTOM:
In a Cluster File System (CFS) environment, the file read performance
gradually degrades up to 10% of the original read performance and the
fsadm(1M) -F vxfs -D -E command shows a large number (> 70%) of free blocks in
extents smaller than 64k. For example,

% Free blocks in extents smaller than 64 blks: 73.04
% Free blocks in extents smaller than 8 blks: 5.33

DESCRIPTION:
In a CFS environment, the disk space is divided into Allocation Units (AUs).
The delegation for these AUs is cached locally on the nodes. When an extending
write operation is performed on a file, the file system tries to allocate the
requested block from an AU whose delegation is locally cached, rather than
finding the largest free extent available that matches the requested size in
the other AUs. This leads to a fragmentation of the free space, thus leading
to badly fragmented files.

RESOLUTION:
The code is modified such that the time for which the delegation of the AU is
cached can be reduced using a tuneable, thus allowing allocations from other
AUs with larger size free extents. Also, the fsadm(1M) command is enhanced to
de-fragment free space using the -C option.

* 2410789 (Tracking ID: 1466351)

SYMPTOM:
Mount hangs in vx_bc_binval_cookie with a stack like the following:

delay
vx_bc_binval_cookie
vx_blkinval_cookie
vx_freeze_flush_cookie
vx_freeze_all
vx_freeze
vx_set_tunefs1
vx_set_tunefs
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
genunix:ioctl
unix:syscall_trap32

DESCRIPTION:
The hanging process is waiting for a buffer to be unlocked. That buffer can
only be released if its associated cloned map writes get flushed, but a
necessary flush is missed.

RESOLUTION:
Added code to synchronize cloned map writes so that all the cloned maps are
cleared and the buffers associated with them are released.

* 2551555 (Tracking ID: 2428964)

SYMPTOM:
The value of the kernel tunable max_thread_proc gets incremented by 1 after
every software maintenance related activity (install, remove, etc.) of the
VRTSvxfs package.

DESCRIPTION:
In the postinstall script for the VRTSvxfs package, the value of the kernel
tunable max_thread_proc is wrongly incremented by 1.

RESOLUTION:
The increment operation on the max_thread_proc tunable is removed from the
postinstall script.

* 2551569 (Tracking ID: 2510903)

SYMPTOM:
Writing to clones loops permanently on HP-UX 11.31. Some threads show a
typical stack like the following:

vx_tranundo
vx_logged_cwrite
vx_write_clone
vx_write1
vx_rdwr
vno_rw
inline rwuio
write
syscall

DESCRIPTION:
A VxFS write of small size can go through the logged write path, which stores
the data in the intent log. The logged write can boost performance for small
writes but requires the write size to be within the logged write limit.
However, when data is written to checkpoints and the write length is greater
than the logged write limit, VxFS cannot proceed with the logged write and
retries forever.

RESOLUTION:
The logged write is skipped if the write size exceeds the logged write limit.

* 2556096 (Tracking ID: 2515380)

SYMPTOM:
The ff command hangs and later exits after the program exceeds the memory
limit, with the following error:

# ff -F vxfs /dev/vx/dsk/bernddg/testvol
UX:vxfs ff: ERROR: V-3-24347: program limit of 30701385 exceeded for directory data block list
UX:vxfs ff: ERROR: V-3-20177: /dev/vx/dsk/bernddg/testvol

DESCRIPTION:
The 'ff' command lists all files on the device of a VxFS file system. In the
'ff' command we do a directory lookup. In one function we save the block
addresses for a directory, and for that we traverse all the directory blocks.
Another function keeps track of the buffer into which we read directory blocks
and the extent up to which we have read them. This function is called with an
offset and returns the offset up to which we have read the directory blocks.
The offset passed to this function has to be the offset within the extent.
But we were wrongly passing the logical offset, which can be greater than the
extent size. As an effect, the offset returned gets wrapped to 0. The caller
thinks that we have not read anything, and hence the loop.

RESOLUTION:
Removed the call to the function which maintains buffer offsets for reading
data. That call was incorrect and redundant. We actually call that function
correctly from one of the functions above.

* 2561752 (Tracking ID: 2561739)

SYMPTOM:
When a file is created and the parent has a default ACL entry, that entry is
not taken into account when calculating the class entry of the file. When a
separate dummy entry is added, the default entry from the parent is taken into
account as well. For example:

$ getacl .
# file: .
# owner: root
# group: sys
user::rwx
group::rwx
class:rwx
other:rwx
default:user:user_one:r-x
$ touch file1
$ getacl file1
# file: try1
# owner: root
# group: sys
user::rw-
user:user_one:r-x
group::rw-
class:rw- <------
other:rw-

The class entry here should be rwx.

DESCRIPTION:
We were not taking into account the default entry of the parent. We share the
attribute inode with the parent and do not create a new attribute inode for
the newly created file. But when an ACL entry is explicitly made, we create a
separate attribute inode, so the default entry also gets copied into the new
inode and is taken into consideration while returning the class entry of the
file.

RESOLUTION:
Before returning the ACL entry buffer, we now calculate the class entry again
and consider all the entries.

* 2561757 (Tracking ID: 2492304)

SYMPTOM:
The "find" command displays duplicate directory entries.

DESCRIPTION:
Whenever the directory entries can fit in the inode's immediate area, VxFS
doesn't allocate new directory blocks. As new directory entries get added to
the directory, this immediate area gets filled and all the directory entries
are then moved to a newly allocated directory block. The directory blocks have
space reserved at the start of the block to hold the block hash information,
which is used for fast lookup of entries in that block. A directory entry that
was at an offset of, say, x bytes in the inode's immediate area will be at
(x + y) bytes when moved to the directory block, where "y" is the size of the
block hash. During this transition phase from immediate area to directory
blocks, a readdir() can report a directory entry more than once.

RESOLUTION:
Directory entry offsets returned to the "readdir" call are adjusted so that
when the entries move to a new block, they will be at the same offsets.

* 2616625 (Tracking ID: 2616622)

SYMPTOM:
Slow mmap() performance when the file system block size is 8K and the page
size is 4K.

DESCRIPTION:
When we have an 8K block size file system and 4K pages and mmap, say, an 8K
file, the file as represented in memory ends up as two pages (0 and 1). When
the memory at offset 0 into the mapping is modified, we get a page fault for
page 0 in the file. However, we haven't had a page fault yet for page 1 and
can't guarantee that we will in the future.
When we allocate that disk block and mark it as valid, we trust that the page
mentioned in the fault request will get flushed out to disk and thereby leave
it uninitialized on disk by default. We clear just the page in memory and
leave it dirty so we know the data in memory is more recent than the data on
disk. However, the other half of the block (which could eventually be mapped
to page 1) gets cleared with a synchronous write because we don't know if we
will ever see a fault. This synchronous clearing of the other half of the 8K
block was causing performance degradation.

RESOLUTION:
We now expand the range of the fault to cover the whole 8K block. In this case
we ignore that the OS asked for only one page and give it two pages anyway to
cover the whole file system block, which saves the separate synchronous
clearing of the other half of the 8K block.

* 2619959 (Tracking ID: 1027438)

SYMPTOM:
Internal noise for VxFS hits the "f:vx_statvfs_pri:2" assert on a cluster file
system.

DESCRIPTION:
While updating summaries for allocation units on a node, the summaries for all
nodes of the cluster need to be synchronized using a broadcast message. The
broadcast message is not sent while recovery for the file system is in
progress, resulting in a wrong free space calculation on the node, which
causes the assert.

RESOLUTION:
The code is changed to send the broadcast message after file system recovery
is complete.


INSTALLING THE PATCH
--------------------
Installing VxFS 5.0 MP1P10 patch:

a) If you install this patch on a CVM cluster, install it one system at a time
   so that all the nodes are not brought down simultaneously.

b) VxFS 5.0 (GA) must be installed before applying these patches.

c) To verify the VERITAS file system level, enter:

   # swlist -l product | egrep -i 'VRTSvxfs'
   VRTSvxfs 5.0.31.0 VERITAS File System

   Note: VRTSfsman is a corequisite for VRTSvxfs. Hence VRTSfsman also needs
   to be installed along with VRTSvxfs.

   # swlist -l product | egrep -i 'VRTS'
   VRTSvxfs 5.0.31.0 Veritas File System
   VRTSfsman 5.0.31.0 Veritas File System Manuals

d) All prerequisite/corequisite patches have to be installed. The kernel patch
   requires a system reboot for both installation and removal.

e) To install the patch, enter the following command:

   # swinstall -x autoreboot=true -s PHCO_42918 PHKL_43128

   In case the patch is not registered, the patch can be registered using the
   following command:

   # swreg -l depot , where is the absolute path where the patch resides.


REMOVING THE PATCH
------------------
Removing VxFS 5.0 MP1P10 patches:

a) To remove the patch, enter the following command:

   # swremove -x autoreboot=true PHCO_42918 PHKL_43128


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE
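
The pre-installation check in step c) of INSTALLING THE PATCH can be scripted. The following is a minimal sketch, not a Symantec-supplied tool: it parses text in the layout of the "swlist -l product" output shown above and reports which of VRTSvxfs and VRTSfsman are missing or at a different revision. The function names are hypothetical, and treating 5.0.31.0 as the exact revision to match is an assumption based only on the sample output in this document.

```python
# Revisions taken from the sample swlist output in this README (assumption:
# an exact match is what the pre-installation check should require).
REQUIRED = {"VRTSvxfs": "5.0.31.0", "VRTSfsman": "5.0.31.0"}


def parse_swlist(text: str) -> dict:
    """Map product tag to revision from 'swlist -l product' style output."""
    products = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and '#' comment lines; the first two columns are
        # the product tag and its revision.
        if len(fields) >= 2 and not fields[0].startswith("#"):
            products[fields[0]] = fields[1]
    return products


def missing_prerequisites(text: str, required: dict = REQUIRED) -> list:
    """Return the sorted product tags that are absent or at another revision."""
    installed = parse_swlist(text)
    return sorted(
        name for name, revision in required.items()
        if installed.get(name) != revision
    )
```

The text passed in would be the captured output of the swlist command from step c); an empty list means both packages were found at the expected revision.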