* * * READ ME * * *
* * * Veritas File System 5.1 SP1 RP3 * * *
* * * P-patch 12 * * *
Patch Date: 2018-08-23


This document provides the following information:

   * PATCH NAME
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * SUMMARY OF INCIDENTS FIXED BY THE PATCH
   * DETAILS OF INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas File System 5.1 SP1 RP3 P-patch 12


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
HP-UX 11i v3 (11.31)


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas Storage Foundation 5.1 SP1
   * Veritas Storage Foundation Cluster File System 5.1 SP1
   * Veritas Storage Foundation for Oracle RAC 5.1 SP1
   * Veritas Storage Foundation HA 5.1 SP1


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: PHKL_44729

* 3935893 (3935892) When frequent non-extending writes are made to a file on a local VxFS file system, the mtime of the file is not advanced even though the file content has been updated.

Patch ID: PHKL_44710

* 3926568 (3926565) The definition of VX_MINNINODE on HP-UX has been changed.
* 3934419 (3934418) A system hang may be observed during massive file removal on a large file system with a smaller file system block size.

Patch ID: PHKL_44651

* 3831186 (3784990) CPU contention is observed during a file system freeze.
* 3920069 (3915962) vx_do_setacl results in a panic because the ACL buffer is invalid.

Patch ID: PHKL_44613

* 3910815 (3910526) fsadm fails with error number 28 during a resize operation.
* 3916659 (3910248) ODM I/O may hang while performing read/write operations.

Patch ID: PHKL_44567

* 3860554 (3873078) Read performance is degraded due to a smaller initial read-ahead size.
* 3877524 (3869091) An mmap write operation receives a SIGBUS signal due to an ENOSPC error when the file system has less free space than the page size.
* 3896397 (3896396) Performance degradation may be observed when fcache invalidation throws out read-ahead pages.
* 3900972 (3900971) The sendfile system call may hang while retrieving pages in VxFS.

Patch ID: PHKL_44512

* 3868662 (3868661) The vxfsstat(1M) command displays incorrect file system statistics. For example:
  $vxfsstat -v
  ---
  vxi_alloc_emap          -1085366410684661760
  vxi_alloc_smap          -1
  vxi_alloc_expand_retry  -6317191817700346753
  vxi_alloc_find_retry    -4612083491882057801
  ---
* 3873853 (3811849) On a cluster file system (CFS), while executing the lookup() function in a directory with a Large Directory Hash (LDH), the system panics and displays an error.
* 3875839 (3875837) Memory-mapped read performance may be impacted by the code fix for the sendfile performance improvement.
* 3876990 (3873624) In case of memory-mapped I/O, synchronous buffers allocated from the FC_BUF_DEFAULT_arena may not be freed if the error flag is set on them.
* 3878643 (3878641) On a Cluster File System (CFS), if a file system is disabled during file/directory creation, the thread that creates the inode may hang.
* 3879382 (3879381) The dynamic minimum value of vxfs_bc_bufhwm on VxFS 5.1SP1 is set to a much larger value than in previous releases when there are more than 16 CPUs on the system. The larger value is not optimal for the HPVM use case, which typically wants to minimize memory usage on the VSP.
* 3879793 (3879792) When a large number of files or directories are deleted, File System Queue (FSQ) spinlock contention may be observed.
* 3879796 (3879795) When a large number of files or directories are deleted, File System Queue (FSQ) spinlock contention may be observed.
* 3879805 (3879804) On a Cluster File System (CFS) with a single-CPU or single-core machine as a node in the cluster, directory or file creation threads may hang.
* 3879887 (3805124) Using kctune(1M) to change vxfs_bc_bufhwm(5) gives an error even though the new value is larger than the current value.
* 3880073 (3877000) When the number of Inode Allocation Units (IAUs) on the root VxFS file system of an HP-UX system exceeds 256, the system boot hangs and displays an error.
* 3889125 (3867995) Veritas File System (VxFS) worker threads hang while flushing the transaction log buffer to the disk.

Patch ID: PHKL_44439

* 3829948 (1482790) The system panics when VxFS DMAPI is used.
* 3832381 (3767366) The VxFS read-ahead feature may not work if the read request is greater than the read-ahead request.
* 3832587 (3832584) Files and directories on the VxFS file system are not able to inherit the default ACL entries of other objects, resulting in incorrect inherited permissions for files and directories created under the parent directory.
* 3859665 (2767579) The system may hang during a reverse-name DNLC (Directory Name Lookup Cache) lookup operation on the file system.
* 3864335 (3864333) A cluster node may panic in the vx_dnlc_recent_cookie() function when another Cluster File System (CFS) node has a higher number of CPUs.

Patch ID: PHKL_44293

* 3796751 (3784126) The mmap pages are invalidated during the file system freeze.
* 3800361 (3602322) The system panics while flushing the dirty pages of the inode.
* 3803799 (3751205) The system may panic during the stop operation of the high availability cluster.
* 3803825 (3331093) The MountAgent process gets stuck when repeated switchovers are performed, due to the current VxFS-Asynchronous Monitoring Framework (AMF) notification/unregistration design.
* 3803849 (3807129) The file system resize operation may result in a panic on an almost-full file system.

Patch ID: PHKL_44268

* 3751305 (2439108) The system crashes when the read_preferred_io tunable is set to a non-page-aligned size.
* 3754049 (3718924) On a cluster file system (CFS), the file system I/O hangs for a few seconds and all the file pages are invalidated during the hang.
* 3769384 (3673599) The effective user permission is incorrectly displayed in the getacl(1M) command output.

Patch ID: PHKL_44215

* 3537201 (3469644) The system panics in the vx_logbuf_clean() function.
* 3597563 (3597482) The pwrite(2) function fails with the EOPNOTSUPP error.
* 3615527 (3604750) The kernel loops during the extent re-org.
* 3615530 (3466020) The file system is corrupted with the error message "vx_direrr: vx_dexh_keycheck_1".
* 3615532 (3484336) The fidtovp() system call panics in the vx_itryhold_locked() function.
* 3660347 (3660342) The VxFS 5.1SP1 package does not set the MANPATH environment variable correctly.
* 3669985 (3669983) When a file system with disk-layout version (DLV) 4 or 5 is in use and the quota option is enabled, the system panics.
* 3669994 (3669989) On a Cluster File System (CFS) on high-end machines, spinlock contention may be observed when new files are created in parallel on a nearly full file system.
* 3682335 (3637636) Cluster File System (CFS) node initialization and protocol upgrade may hang during a rolling upgrade.
* 3706705 (3703176) The tunables to enable or disable VxFS inactive-thread throttling and VxFS inactive-thread process throttling were not available.
* 3755915 (3755927) Space allocation to a file during a write may hang on the file system if the remaining intent-log space is low.

Patch ID: PHKL_44140

* 3526848 (3526845) A Data Translation Lookaside Buffer (DTLB) panic may occur when directory entries are read.
* 3527418 (3520349) When there is a huge number of dirty pages in memory, and a sparse write is performed at a large offset of 4 TB or above on an existing file that is not null, the file system hangs.
* 3537431 (3537414) The "mount -v" command does not display the "nomtime" mount option when the file system is mounted with the "nomtime" mount option.
* 3560610 (3560187) The kernel may panic when the buffer is freed in the vx_dexh_preadd_space() function, with the message "Data Key Miss Fault in kernel mode".
* 3561998 (2387439) An internal debug assert is hit when the conformance test is run for the partitioned-directory feature.

Patch ID: PHKL_43916

* 2937310 (2696657) An internal noise test on the cluster file system hits a debug assert.
* 3261849 (3253210) The file system hangs when it reaches the space limitation.
* 3396530 (3252983) On a high-end system with 48 or more CPUs, some file system operations may hang.
* 3410567 (3410532) The file system hangs due to a self-deadlock.
* 3435207 (3433777) A single-CPU machine panics due to the safety-timer check when the inodes are re-tuned.
* 3471150 (3150368) The vx_writesuper() function causes the system to panic in evfsevol_strategy().
* 3471152 (3153919) The fsadm(1M) command may hang when the structural file set re-organization is in progress.
* 3471165 (3332902) While shutting down, the system running the fsclustadm(1M) command panics.
* 3484316 (2555201) The internal conformance and stress testing on local and cluster-mounted file systems hits a debug assert.

Patch ID: PHKL_43539

* 2755784 (2730759) The sequential read performance is poor because of read-ahead issues.
* 2801689 (2695390) Accessing a vnode from the cbdnlc cache hits an assert during internal testing.
* 2857465 (2735912) The performance of tier relocation using the fsppadm(1M) enforce command degrades while migrating a large number of files.
* 2930507 (2215398) An internal stress test in the cluster environment hits the xted_set_msg_pri1:1 assert.
* 2932216 (2594774) The "vx_msgprint" assert is observed several times in internal Cluster File System (CFS) testing.
* 3011828 (2963763) When the thin_friendly_alloc() and delicache_enable() functionality is enabled, VxFS may hit a deadlock.
* 3024028 (2899907) On CFS, some file system operations like the vxcompress utility and de-duplication fail to respond.
* 3024042 (2923105) Removal of the VxFS module from the kernel takes a long time.
* 3024049 (2926684) In rare cases, the system may panic while performing a logged write.
* 3024052 (2906018) The vx_iread errors are displayed after a successful log replay and mount of the file system.
* 3024088 (3008451) In a Cluster File System (CFS) environment, shutting down the cluster may panic one of the nodes with a null pointer dereference.
* 3131795 (2912089) The system becomes unresponsive while growing a file through vx_growfile in a fragmented file system.
* 3131824 (2966277) Systems with high file-system activity like read/write/open/lookup may panic.
* 3131885 (3010444) On an NFS file system, cksum(1m) fails with the "cksum: read error on : Bad address" error.
* 3131920 (3049408) When the system is under file-cache pressure, the find(1) command takes a long time to operate.
* 3138653 (2972299) The initial and subsequent reads on a directory with many symbolic links are very slow.
* 3138663 (2732427) A cluster-mounted file system may hang and become unresponsive.
* 3138668 (3121933) The pwrite(2) function fails with the EOPNOTSUPP error.
* 3138675 (2756779) The read and write performance is slow on a Cluster File System (CFS) when it runs applications that rely on POSIX file-record locking using fcntl.
* 3138695 (3092114) The information displayed by the "df -i" command may be inaccurate for cluster-mounted file systems.
* 3141278 (3066116) The system panics due to a NULL pointer dereference in the vx_worklist_process() function.
* 3141428 (2972183) The fsppadm(1M) enforce command takes a long time on the secondary nodes compared to the primary nodes.
* 3141433 (2895743) Accessing named attributes for some files stored in CFS seems to be slow.
* 3141440 (2908391) It takes a long time to remove checkpoints from a Veritas File System (VxFS) file system with a large number of files.
* 3141445 (3003679) When running the fsppadm(1M) command and removing a file with named stream attributes (nattr) at the same time, the file system does not respond.
* 3142476 (3072036) Read operations from a secondary node in CFS can sometimes fail with the ENXIO error code.
* 3159607 (2779427) The full fsck flag is set after a failed inode read operation.
* 3160205 (3157624) The fcntl() system call, when used for file share reservations (the F_SHARE command), can cause a memory leak in Cluster File System (CFS).
* 3207096 (3192985) Checkpoint quota usage on CFS can be negative.
* 3226404 (3214816) When you create and delete inodes of a user frequently with the DELICACHE feature enabled, the user quota file becomes corrupt.
* 3235517 (3240635) In a CFS environment, when a checkpoint is mounted using the mount(1M) command, the system may panic.
* 3243204 (3226462) On a cluster-mounted file system with unequal CPUs, a node may panic while doing a lookup operation.
* 3248982 (3272896) An internal stress test on a local mount hits a deadlock.
* 3249151 (3270357) The fsck(1m) command fails to clean the corrupt file system during the internal 'noise' test.
* 3261334 (3228646) The NFSv4 server panics in the unlock path.
* 3261782 (3240403) The fidtovp() system call may cause a panic in the vx_itryhold_locked() function.
* 3262025 (3259634) A CFS that has more than 4 GB blocks is corrupted because the blocks containing some file system metadata get eliminated.

Patch ID: PHKL_43432

* 3042340 (2616622) The performance of the mmap() function is slow when the file system block size is 8 KB and the page size is 4 KB.
* 3042341 (2555198) sendfile() does not create DMAPI events for HSM on VxFS.
* 3042352 (2806466) "fsadm -R" results in a panic at the LVM layer due to vx_ts.ts_length being set to 2 GB.
* 3042357 (2750860) Performance issue due to CFS fragmentation in a CFS cluster.
* 3042373 (2874172) Infinite looping in vx_exh_hashinit().
* 3042407 (3031869) "vxfsstat -b" does not print correct information on the maximum buffer size.
* 3042427 (2850738) The system may hang in a low-memory condition.
* 3042479 (3042460) Add support for the DOX lock to tackle vnode lock contention in VN_HOLD/RELE.
* 3042501 (3042497) Atomically increment/decrement the active level on HP-UX.
* 3047980 (2439261) VxFS 16-byte sequential-write buffered I/O performance is 30% slower than JFS2; fiostats appears to be taking time.
* 3073371 (3073372) The default maximum pdir level is changed to 2 and the default threshold size to 32768.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: PHKL_44729

* 3935893 (Tracking ID: 3935892)

SYMPTOM:
In case of NFS over VxFS, when frequent non-extending writes are made to a file on a local VxFS file system, the modification timestamp (i_mtime) of the file is not advanced even though the file content has been updated, and an NFS reader might read stale data.

DESCRIPTION:
The NFS client performs reading in the following manner. It first sends a getattr() call to the NFS server. In serving the client request, the NFS server makes a VOP_GETATTR call to VxFS to retrieve the file's attributes and sends them back to the client. The NFS client compares the mtime of the file, and if it has changed, it follows with a read() call to get the data from the NFS server. If the mtime remains unchanged, the client returns the data kept in its local cache to the next read request.

RESOLUTION:
Code changes have been made to increment the mtime on a locally mounted file system when the mtime equals the system time but the inode has been modified.

Patch ID: PHKL_44710

* 3926568 (Tracking ID: 3926565)

SYMPTOM:
On HP-UX, VX_MINNINODE has a dependency on the outdated kernel tunable 'ninode'.

DESCRIPTION:
On HP-UX, VX_MINNINODE has a dependency on the outdated kernel tunable 'ninode'.

RESOLUTION:
The definition of VX_MINNINODE has been changed.

* 3934419 (Tracking ID: 3934418)

SYMPTOM:
A system hang may be observed during massive file removal on a large file system with a smaller file system block size.

DESCRIPTION:
Massive file removal generates a huge amount of data to be flushed, which leads to heavy contention on the FS:vxfs:fsq spinlock while flushing the transactions. When these transactions are written to disk in non-synchronous mode, completion is confirmed by an interrupt (biodone()). Serving these interrupts requires the same lock that is needed by the threads trying to flush the transactions, causing contention. Since it is a spinlock, the threads spinning on the CPU block the interrupts and thereby slow down the biodone() completions.

RESOLUTION:
Code changes have been made to introduce a new hidden kctune tunable to yield the CPU when the total execution time of the thread crosses the limit set by the tunable. A corresponding counter has also been added to the vxfsstat command.

Patch ID: PHKL_44651

* 3831186 (Tracking ID: 3784990)

SYMPTOM:
On high-end servers (typically with a large memory size and more than 64 processors), CPU starvation or a mini-hang with high CPU utilization by vxfsd threads could be experienced, triggered by a file system freeze.

DESCRIPTION:
During the file system freeze, the VxFS background thread (vx_sched_thread()) spawns vxfsd threads to flush the file system data and metadata cached in memory. The number of vxfsd threads created depends on the amount of cached data to be flushed and is limited based on the kernel tunable max_thread_proc. The default maximum size of the VxFS inode cache is auto-tuned based on the memory size.
On a high-end server, the inode cache size can exceed 100000, allowing a large number of cached inodes to be flushed during a file system freeze. At the same time, if max_thread_proc is set above 1000, VxFS can end up creating and waking up close to 1000 vxfsd threads within a short time. This results in a thundering herd of vxfsd threads running in parallel on different processors and trying to acquire some file-system-level locks for flushing.

RESOLUTION:
To resolve this issue, code changes have been made to restrict the number of vxfsd threads to be woken up. A hidden tunable, independent of max_thread_proc, is also provided to limit the number of vxfsd threads to be created.

* 3920069 (Tracking ID: 3915962)

SYMPTOM:
When the NFS client attempts to set a default ACL on an NFS-mounted VxFS file system, the system panics. The stack trace may look like:

panic: Fault when executing in kernel mode

Stack Trace:
Function Name
bad_kern_reference+0xa0
$cold_vfault+0x500
vm_hndlr+0x620
bubbledown+0x0
vx_do_setacl+0xe40
vx_setacl+0x410
acl3_setacl+0x480
common_dispatch+0xc10
acl_dispatch+0x40
svc_getreq+0x250
svc_run+0x350
svc_do_run+0xd0
nfssys+0x7f0
hpnfs_nfssys+0x60
coerce_scall_args+0x130
syscall+0x580

DESCRIPTION:
The panic occurs when the NFS client passes to VOP_SETACL an invalid ACL specification which has 4 default ACL entries without the base non-default entries (i.e. OBJ_USER, OBJ_GROUP, OBJ_OTHER and OBJ_CLASS). When the NFS server passes the same invalid ACL specification to VxFS, the validation check fails to detect the missing base ACL entries, resulting in a system panic in subsequent processing.

RESOLUTION:
To resolve this issue, code changes for ACL validation have been added to check for the base/minimum number of non-default entries in each ACL specification.

Patch ID: PHKL_44613

* 3910815 (Tracking ID: 3910526)

SYMPTOM:
When expanding a full file system through vxresize(1M), the operation fails with ENOSPC (errno 28) and the following error is printed:

UX:vxfs fsadm: ERROR: V-3-20340: attempt to resize failed with errno 28

Despite the failure, the file system remains expanded even after vxresize has shrunk the volume back after getting the ENOSPC error. As a result, the file system is marked for full fsck, but a full fsck fails with errors like the following:

UX:vxfs fsck: ERROR: V-3-26248: could not read from block offset devid/blknum .... Device containing meta data may be missing in vset or device too big to be read on a 32 bit system.
UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate
file system check failure, aborting ...

DESCRIPTION:
If there is no space available in the file system when the resize operation is initiated, the intent-log extents are used for metadata setup to continue the resize operation. If the resize is successful, the superblock is updated with the expanded size. A new intent log is then allocated because the old one has been used for the resize. There is a chance that the new intent-log allocation fails with ENOSPC, because the expanded size is not big enough to return the space allocated to the original intent log. The superblock update has already been made at this stage and is not rolled back even when a failure is returned.

RESOLUTION:
The code has been modified to fail the resize operation if the resize size is less than the size of the intent log.

* 3916659 (Tracking ID: 3910248)

SYMPTOM:
ODM I/O operations hang due to a deadlock between two threads performing I/O to the same file.
The threads involved in the I/O operations may have stacks as below:

Thread 1:
inline swtch_to_thread+0x220 ()
_swtch+0x52 ()
_mp_b_sema_sleep+0x320 ()
b_psema_c+0x460 ()
mutex_lock+0x70 ()
pfd_lock+0x50 ()
fault_in_pages+0x2e0 ()
bring_in_pages+0x330 ()
inline vaslockpages+0x220 ()
pas_pin_core+0x80 ()
pas_pin+0x340 ()
vx_memlock+0x70 ()
vx_dio_chain_start+0xf0 ()
vx_dio_iovec+0x380 ()
vx_dio_rdwri+0x280 ()
vx_dio_read+0x200 ()
vx_read1+0x580 ()
vx_rdwr+0x1060 ()
odm_vx_io_retry+0xa0 ()
odm_vx_iocleanup+0x220 ()
odm_io_sync+0xc40 ()
odm_io+0x830 ()
odm_io_stat+0x170 ()
odmioctl+0x190 ()
vno_ioctl+0x350 ()
ioctl+0x410 ()
syscall+0x5a0 ()

Thread 2:
slpq_swtch_core+0x520 ()
real_sleep+0x400 ()
sleep_spinunlock2+0x4f ()
sleep_spinunlock+0x61 ()
vxg_ilock_wait+0x100 ()
vxg_range_cmn_lock+0x410 ()
vxg_api_range_lock+0xe0 ()
vx_glm_range_lock+0x70 ()
vx_glmrange_rangelock+0x150 ()
inline vx_irwlock2+0x60 ()
vx_irwlock+0xa0 ()
vx_rdwr+0x12a0 ()
odm_vx_io_retry+0x1b0 ()
odm_vx_iocleanup+0x220 ()
odm_io_sync+0xc40 ()
odm_io+0x830 ()
odm_io_stat+0x170 ()
odmioctl+0x190 ()
vno_ioctl+0x350 ()
ioctl+0x410 ()
syscall+0x5a0 ()

DESCRIPTION:
A deadlock is observed between an ODM reader thread and an ODM writer thread performing I/O on a HOLE extent. The ODM reader thread holds the IRWLOCK in SHARED mode and waits to lock the pages, while the ODM writer thread locks the pages and waits for the IRWLOCK held by the ODM reader thread.

RESOLUTION:
The code is changed so that ODM provides a hint to VxFS specifying that a HOLE extent is being read by ODM, so that VxFS does not require the pages to be locked. The fix consists of two parts, one in the VxFS patch and the other in the ODM patch.

Patch ID: PHKL_44567

* 3860554 (Tracking ID: 3873078)

SYMPTOM:
Performance degradation has been observed due to a small initial read-ahead size.

DESCRIPTION:
Newer versions of VxFS have a smaller initial read-ahead size than VxFS 5.0.1, where the initial read-ahead size depends on the read_perf_io size. If an application's read request falls in this large initial read-ahead range and VxFS is upgraded to a newer version, a performance regression may be observed for that particular application.

RESOLUTION:
A new private kctune tunable has been introduced to enable a large initial read-ahead based on the read request length.

* 3877524 (Tracking ID: 3869091)

SYMPTOM:
Consider a file system having a block size smaller than the system's page size on HP-UX. An mmap write operation may receive SIGBUS due to ENOSPC errors when the amount of free space is less than the page size.

DESCRIPTION:
While writing to a memory-mapped file, the mmap write operation allocates file pages with a write fault. If the end of the file does not align with the page boundary, the allocation is rounded up to the page size on HP-UX. This results in VxFS treating the file as having a HOLE between the actual end of the file and the next page boundary. In subsequent writes to this page, even though the write happens within the actual file size, VxFS attempts to allocate file system storage for the HOLE and may fail with ENOSPC if the file system happens to have less free space than a page.

RESOLUTION:
The code is modified accordingly. Allocation of the file pages is only done up to the file size, without rounding up to the page size. The SIGBUS issue is avoided because overwriting writes will not require extra file system space allocation.
However, this change also requires VxFS to invalidate the last file page in the file cache before subsequent allocating writes or file truncation, to ensure that mmap reads the last page from the disk again.

* 3896397 (Tracking ID: 3896396)

SYMPTOM:
Performance degradation may be observed when fcache invalidation throws out read-ahead pages.

DESCRIPTION:
If the fcache is too small to hold the cached portion of the file, and cache invalidation throws out the data fetched in by read-ahead, read-ahead pattern detection may break and performance degradation may be observed.

RESOLUTION:
The code has been modified to restart read-ahead detection when fcache invalidation happens and the read-ahead pattern breaks.

* 3900972 (Tracking ID: 3900971)

SYMPTOM:
The sendfile system call may hang while retrieving pages from a VxFS file system, with the following stack trace:

wait_for_lock()
spinlock()
slpq_wakeup()
rwlock_unlock_select_new_owner()
rwlock_unlock()
fcache_page_alloc()
vx_page_alloc()
vx_do_getpage()
vx_getpage1()
vx_getpage()
preg_vn_fault()
fcache_as_fault()
sofl_vec_read()
sendfile()
syscall()

DESCRIPTION:
The hang during the sendfile operation is caused by aligning the range of the read request to the block size in VxFS page retrieval. This range alignment may create conflicting file-cache allocation requests, which results in a live-loop condition.

RESOLUTION:
The code is modified to detect the live-loop condition and retry the page allocation without modifying the range of the read request.

Patch ID: PHKL_44512

* 3868662 (Tracking ID: 3868661)

SYMPTOM:
The vxfsstat(1M) command displays negative values, which are incorrect.

DESCRIPTION:
All statistics in the vxfsstat(1M) command are unsigned values, but due to an incorrect calculation, some statistics displayed by vxfsstat(1M) may have wrong negative values.

RESOLUTION:
The code is modified to calculate the file system statistics correctly.

* 3873853 (Tracking ID: 3811849)

SYMPTOM:
On a cluster file system (CFS), due to a size mismatch in the cluster-wide buffers containing the hash buckets for large directory hashing (LDH), the system panics with the following stack trace:

vx_populate_bpdata()
vx_getblk_clust()
vx_getblk()
vx_exh_getblk()
vx_exh_get_bucket()
vx_exh_lookup()
vx_dexh_lookup()
vx_dirscan()
vx_dirlook()
vx_pd_lookup()
vx_lookup_pd()
vx_lookup()

On some platforms, instead of a panic, LDH corruption is reported. Full fsck reports some metadata inconsistencies, as displayed in the following sample messages:

fileset 999 primary-ilist inode 263 has invalid alternate directory index (fileset 999 attribute-ilist inode 8193), clear index? (ynq)y

DESCRIPTION:
On a highly fragmented file system with a file system block size of 1K, 2K or 4K, the bucket(s) of an LDH inode, which has a fixed size of 8K, can spread across multiple small extents. Currently, in-core allocation for a bucket of an LDH inode happens in parallel with on-disk allocation, which results in small in-core buffer allocations. These small in-core allocations are merged for the final in-memory representation of an LDH inode's bucket. On two Cluster File System (CFS) nodes, this may result in the same LDH metadata/bucket being represented as in-core buffers of different sizes. This may result in a system panic as LDH inode buckets are passed around the cluster, or in on-disk corruption of the LDH inode's buckets if these buffers are flushed to disk.
RESOLUTION:
The code is modified to separate the on-disk allocation and the in-core buffer initialization in the LDH code paths, so that an in-core LDH bucket is always represented by a single 8K buffer.

* 3875839 (Tracking ID: 3875837)

SYMPTOM:
Memory-mapped read performance may be impacted by the code fix for the sendfile performance improvement.

DESCRIPTION:
When VxFS read-ahead is initiated through the getpage VOP (Vnode Operation) for a sendfile() operation, larger pages may be read and returned by the operating system. As a result, the accounting in the read-ahead code path does not match the next expected fault offset, and that breaks the read-ahead pattern.

RESOLUTION:
The earlier fix has been modified to use the length of the actual pages read, rather than the requested length, when calculating the expected read-fault offset for the next read-ahead. This applies to scenarios where the pages read are larger than the pages requested in the getpage VOP.

* 3876990 (Tracking ID: 3873624)

SYMPTOM:
Memory leaks occur in the FC_BUF_DEFAULT_arena, showing buffers corresponding to mmap I/Os that have returned errors.

DESCRIPTION:
In case of memory-mapped I/O, a synchronous buffer having the error flag set is currently not freed, due to a glitch in the code, leading to a memory leak in the FC_BUF_DEFAULT_arena.

RESOLUTION:
The code is modified to free all synchronous buffers, irrespective of errors.

* 3878643 (Tracking ID: 3878641)

SYMPTOM:
On a cluster file system (CFS), if a file system is disabled during file/directory creation, the thread that creates the file/directory may hang with a typical stack trace:

vx_int_create()
vx_do_create()
vx_create1()
vx_create0()
vx_create()

or

vx_do_mkdir()
vx_mkdir1()
vx_mkdir()

DESCRIPTION:
On a cluster file system (CFS), if a file system is disabled (with the system log message "vx_disable: file system disabled") during file/directory creation, the creation thread gets stuck in a live-loop. This is because the disabled condition of the file system is not verified before the file/directory creation.

RESOLUTION:
The code is modified to verify whether the file system is disabled during inode creation on CFS and, if so, to return an appropriate error.

* 3879382 (Tracking ID: 3879381)

SYMPTOM:
The dynamic minimum value of vxfs_bc_bufhwm on VxFS 5.1SP1 is set to a much larger value than in previous releases when there are more than 16 CPUs on the system. The larger value is not optimal for the HPVM use case, which typically wants to minimize memory usage on the VSP.

DESCRIPTION:
There needs to be a way to allow HPVM to specify an optimal vxfs_bc_bufhwm value, as in previous VxFS releases.

RESOLUTION:
The code is modified to allow vxfs_bc_bufhwm to be explicitly tuned to a value that is greater than the dynamic minimum based on the auto-tuning in previous releases, even though it may be smaller than the dynamic minimum based on the 5.1SP1 calculation.

* 3879793 (Tracking ID: 3879792)

SYMPTOM:
When a large number of files or directories are deleted, FSQ spinlock contention may be observed.

DESCRIPTION:
Because of the massive file removal, a large number of removed inodes are stored in the delicache list, as delicache is enabled by default. If these inodes are not re-used, they may be moved to the inactive list altogether after a certain time for inactive processing. The inactive processing of all these removed inodes belonging to the same file system triggers high contention on the file system queue spinlock.
RESOLUTION:
The code is modified such that the FSQ lock is used only to protect the required part of the movement operation from the delicache list to an inactive list, which in turn reduces FSQ lock contention.

* 3879796 (Tracking ID: 3879795)

SYMPTOM:
On a system with a large VxFS inode cache, deleting a large number of files or directories from a single file system may lead to heavy contention on a per-file-system spinlock. High disk service time and/or high vxfsd CPU utilization may be observed as a result, typically around 12 minutes after the file deletion.

DESCRIPTION:
Because of the massive file removal, a large number of removed inodes are stored in the delicache list, as delicache is enabled by default. If these inodes are not re-used, they may be moved to the inactive list altogether after a certain time for inactive processing. The inactive processing of all these removed inodes belonging to the same file system triggers high contention on the file system queue spinlock.

RESOLUTION:
Two new VxFS tunables have been introduced for fine tuning of delicache sizing and timing as a remedy; they essentially limit the number of inodes stored in the delicache list and shorten the duration of an inode's stay in the delicache list.

* 3879805 (Tracking ID: 3879804)

SYMPTOM:
A thread creating a directory on a full CFS file system retries the search for a free inode in the preferred allocation unit (AU), because other thread(s) working on the same AU have kept the AU's delegation map busy. On a system with only a single CPU running, the delay can persist indefinitely, because the other thread keeping the AU's delegation map busy never has a chance to run while the CPU keeps retrying the search considering only the preferred AU, creating a live-lock scenario with stack traces as shown below. In such a situation, the cluster daemon process cmcld can experience long scheduling delays, which may consequently lead to a Serviceguard TOC. The live-lock scenario or the SG TOC, however, is not expected to happen on a CFS cluster node which fulfils the minimum requirement of having at least 2 CPUs.

On a Cluster File System with a single-CPU or single-core machine as a node in the cluster, directory or file creation threads may hang with the following stack traces:

vx_dircreate_tran()
vx_pd_create()
vx_create1_pd()
vx_do_create()
vx_create1()
vx_create0()
vx_create()

and

vx_mdelelock_try()
vx_mdele_tryhold()
vx_cfs_inofindau()
vx_findino()
vx_ialloc()
vx_dirmakeinode()
vx_dircreate()
vx_dircreate_tran()
vx_pd_create()
vx_create1_pd()
vx_do_create()
vx_create1()
vx_create0()
vx_create()

DESCRIPTION:
A thread is currently holding a lock on an Inode Allocation Unit (IAU) which has free inodes. When another thread tries to allocate an inode in that IAU, it fails because it cannot get access to the same IAU. On a system running with a single core, the two threads can interlock indefinitely on the CFS secondary instead of requesting a new IAU from the CFS primary.

RESOLUTION:
The code is modified to yield the CPU on a single-CPU or single-core system, but only if there are free inodes available in the Inode Allocation Unit.
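To make the retry-with-yield idea in this resolution concrete, the following is a minimal user-space sketch in C. All of the names (au_try_lock, au_alloc_inode, au_has_free_inodes, online_cpus) are hypothetical placeholders, not VxFS symbols; the actual fix lives in the kernel inode allocator.

    /* Hypothetical sketch (not VxFS source): retry the preferred-AU
     * search, but yield the processor on a single-CPU node so that the
     * thread holding the AU delegation can run and release it. */
    #include <sched.h>                      /* sched_yield() */

    extern int  au_try_lock(int au);        /* non-blocking delegation lock */
    extern void au_unlock(int au);
    extern long au_alloc_inode(int au);     /* returns inode number or -1 */
    extern int  au_has_free_inodes(int au);
    extern int  online_cpus(void);

    long alloc_inode_from_preferred_au(int au)
    {
        for (;;) {
            if (au_try_lock(au)) {
                long ino = au_alloc_inode(au);
                au_unlock(au);
                if (ino >= 0)
                    return ino;             /* got a free inode */
            }
            /* On a 1-CPU node the delegation holder cannot run unless
             * this thread gives up the processor; yielding breaks the
             * live-lock. As in the fix, yield only while the AU still
             * has free inodes worth retrying for. */
            if (online_cpus() == 1 && au_has_free_inodes(au))
                sched_yield();
        }
    }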
* 3879887 (Tracking ID: 3805124)

SYMPTOM:
Using kctune(1M) to change vxfs_bc_bufhwm(5) gives the following error even though the new value is larger than the current value:

ERROR: mesg 110: V-2-110: The specified value for vx_bc_bufhwm is less than the recommended minimum value of

DESCRIPTION:
At VxFS initialization, the VxFS buffer cache high water mark is by default auto-tuned to a dynamic minimum value based on the system configuration. Subsequent attempts to set vxfs_bc_bufhwm(5) via kctune(1M) are not allowed if the value is smaller than the dynamic minimum. It was possible, however, to bypass this validation by hardcoding the vxfs_bc_bufhwm value in /stand/system. The error is returned when the new value specified in the kctune command is smaller than the dynamic minimum, even though it may be larger than the current value as hardcoded in /stand/system.

RESOLUTION:
The code is modified to re-instate the dynamic minimum value as the effective value of vxfs_bc_bufhwm if the value specified in /stand/system is smaller. The following warning message is displayed to indicate that the effective value has been changed, and to notify users that the tunable value displayed by kctune may not correspond to the effective value:

WARNING: mesg 139: V-2-139: Setting vxfs_bc_bufhwm to recommended minimum value of , since the specified value was less than

* 3880073 (Tracking ID: 3877000)

SYMPTOM:
In an unusual scenario where the number of Inode Allocation Units (IAUs) on the root VxFS file system of an HP-UX system exceeds 256 (the number of inodes exceeding 32 million), the system boot hangs with the following stack trace:

vx_event_wait()
vx_olt_iauinit()
vx_olt_iasinit()
vx_loadv23_fsetinit()
vx_loadv23_fset()
vx_fset_reget()
vx_fs_mntdup()
vx_fs_reinit()
vx_doremount()
vx_fset_localremnt()
vx_remount()
vx_mountroot()
vx_evfsop_root_remount()
vfs_extended_vfs_op()
Im_preinitrc()
DoCallist()

DESCRIPTION:
If the number of IAUs is greater than 256, then, as an optimization, the Veritas File System (VxFS) creates multiple work items to load the IAUs in parallel. These work items are serviced by the worker threads, and the VxFS mount waits for them to complete. At the time the root VxFS file system is mounted during the boot process, no worker threads are available yet. Hence, the VxFS mount process hangs.

RESOLUTION:
The code is modified such that the mount thread itself helps process the work items it created.

* 3889125 (Tracking ID: 3867995)

SYMPTOM:
A VxFS file system appears to hang while vxfsd threads wait to flush log buffers, with the following stack traces:

vx_sleep_lock()
vx_logbuf_clean()
vx_map_delayflush()
vx_tflush_map()
vx_fsq_flush()
vx_tranflush_threaded()
vx_workitem_process()
vx_worklist_process()

or

vx_sleep_lock()
vx_logbuf_clean()
vx_logflush()
vx_async_iupdat()
vx_iupdat_local()
vx_iupdat()
vx_iflush_list()
vx_iflush()
vx_workitem_process()
vx_worklist_process()

DESCRIPTION:
If an extent free operation is executed instantaneously, before the completion of the previous extent free transaction, the extent map is updated instantly and the freed extent can be picked up for some other inode. But since the previous extent free transaction is incomplete and has not hit the disk, due to a particular sequence of events the VxFS transaction log encounters an inconsistent state. Thus, the transaction log buffer flush may hang.

RESOLUTION:
The code is modified to implicitly treat every extent free operation as a delayed extent free.
This means an extent free operation is no longer executed instantaneously, and the particular hang should be avoided.

Patch ID: PHKL_44439

* 3829948 (Tracking ID: 1482790)

SYMPTOM:
The system may panic when the mknod() operation is performed on the file system while the VxFS Data Management API (DMAPI) is used. The following stack trace is observed:

vx_hsm_createandmkdir()
vx_create()
vns_create()
vn_create()
mknod()
mknod()
syscall()

DESCRIPTION:
The DMAPI feature is always enabled in VxFS for hierarchical storage management. As part of the VxFS DMAPI code, the snode for the device is handled as a VxFS inode for the mknod() operation. This results in inappropriate memory access and a panic.

RESOLUTION:
The code is modified such that the inode type is checked, and further processing is done only if it is a VxFS inode.

* 3832381 (Tracking ID: 3767366)

SYMPTOM:
Using the sendfile(2) function to transfer a big file takes long compared with the same transfer via the send(2) function. Sequential file access via the sendfile(2) function generates a lot of synchronous I/Os, suggesting that VxFS read-ahead is not fully utilized.

DESCRIPTION:
When the page size is greater than 4 KB, VOP_GETPAGE may return more data than requested, breaking the read-ahead detection.

RESOLUTION:
The code is modified to update the read-ahead parameters considering the possibility that more data can be returned via VOP_GETPAGE, so as to continue doing read-ahead for sequential reads.

* 3832587 (Tracking ID: 3832584)

SYMPTOM:
Files and directories on the VxFS file system are not able to properly inherit the default USER_OBJ, CLASS_OBJ and OTHER_OBJ ACL entries.

DESCRIPTION:
The condition used to calculate the inheritance-permission mask of the parent directory is incorrect. This subsequently results in incorrect inherited permissions for files and directories created under the parent directory.

RESOLUTION:
The code is modified to correct the condition that calculates the inheritance-permission mask of the parent directory.

* 3859665 (Tracking ID: 2767579)

SYMPTOM:
The system may hang during a DNLC lookup operation on a VxFS file system, with the following stack trace:

as_ubcopy()
vx_dnlc_pathname_realloc()
vx_dnlc_getpathname()
pstat_pathname_fillin()
pstat_pathname()
pstat()
syscall()

DESCRIPTION:
The system hangs because of an infinite loop that gets triggered when an inode with a negative DNLC entry is encountered during a reverse-name DNLC lookup.

RESOLUTION:
The code is modified to detect the negative DNLC entry and fail the lookup instead of looping infinitely.

* 3864335 (Tracking ID: 3864333)

SYMPTOM:
The earlier fix for incident 3226462, whereby a CFS node may panic in the vx_dnlc_recent_cookie() function when another CFS node has a higher number of CPUs, still relies on VX_MAX_CPU being static at runtime.

DESCRIPTION:
The earlier fix sets the size of the counters[] array to VX_MAX_CPU, which performs a runtime check on the number of CPUs. A more robust fix is to size the array based on MAX_PROCS instead, which is guaranteed to be static at runtime.

RESOLUTION:
The code is enhanced to use MAX_PROCS in allocating an internal array, to better safeguard against out-of-bound array access.

Patch ID: PHKL_44293

* 3796751 (Tracking ID: 3784126)

SYMPTOM:
The application experienced delays, showing high vfault counters, as process text pages in memory are invalidated and need to be reloaded.
For a Serviceguard cluster, the heartbeat communication slows down considerably because cmcld's TEXT pages need to be paged in again. This results in an SG INIT failure.

DESCRIPTION:
When a freeze operation is handled, VxFS flushes and invalidates the dirty pages of a file system. Due to a bug in the code, even read-only mmap pages, which are typically process TEXT pages, get invalidated unnecessarily. Consider a freeze on /usr or some other file system that hosts program executables: such page invalidation can cause delays to applications, as TEXT pages need to be faulted in again.

RESOLUTION:
The code is modified to skip read-only mmap page invalidation during the file system freeze.

* 3800361 (Tracking ID: 3602322)

SYMPTOM:
The system may panic while flushing the dirty pages of the inode. The following stack traces are observed:

vx_iflush_list()
vx_workitem_process()
vx_worklist_process()
vx_worklist_thread()

and

vx_vn_cache_deinit()
vx_inode_deinit()
vx_ilist_chunkclean()
vx_inode_free_list()
vx_ifree_scan_list()
vx_workitem_process()
vx_worklist_process()
vx_worklist_thread()

DESCRIPTION:
The panic may occur due to a synchronization problem between one thread that flushes the inode and another thread that frees the chunks containing the inodes on the freelist. The thread that frees the chunks of inodes on the freelist grabs an inode and clears/dereferences the inode pointer while deinitializing the inode. This may result in a bad pointer dereference if the flusher thread is working on the same inode.

RESOLUTION:
The code is modified to resolve the race condition by taking proper locks on the inode and the freelist whenever a pointer in the inode is dereferenced. If the inode pointer is already de-initialized to NULL, the flushing is attempted on the next inode.

* 3803799 (Tracking ID: 3751205)

SYMPTOM:
During a forced unmount of a VxFS file system, the system may panic if parallel directory-entry retrieval is in progress. The following stack trace is observed:

vx_readdir3()
getdents()
syscall()

DESCRIPTION:
During the forced unmount operation, all metadata structures of the fileset are destroyed. Thereby, any access to the fileset structure can cause a panic due to a NULL pointer dereference.

RESOLUTION:
The code is modified to return EIO when accessing the fileset structure of a forcefully unmounted file system.

* 3803825 (Tracking ID: 3331093)

SYMPTOM:
The MountAgent process gets stuck when repeated switchovers are performed, due to the current VxFS-AMF notification/unregistration design. The following stack trace is observed:

vx_delay2()
vx_unreg_callback_funcs_impl()
disable_vxfs_api()
text()
amf_event_release()
amf_fs_event_lookup_notify_multi()
amf_vxfs_mount_opt_change_callback()
vx_aioctl_unsetmntlock()
vx_aioctl_common()
vx_aioctl()
vx_admin_ioctl()
vxportal_ioctl()

DESCRIPTION:
This issue is related to the VxFS-AMF interface. VxFS provides notifications to AMF for certain events, for example, when VxFS is disabled or the mount options change. When VxFS calls AMF, the AMF event-handling mechanism can trigger an unregistration of VxFS in the same context, since VxFS's notification triggered the last event notification registered with AMF. Before VxFS calls into AMF, a vx_fsamf_busy variable is set to 1; it is reset when the callback returns. The unregistration loops if it finds the vx_fsamf_busy variable set to 1.
Since the unregistration is called from the same context as the notification callback, the vx_fsamf_busy variable is never set to 0, and the loop goes on endlessly, causing the command that triggered the notification to hang.

RESOLUTION:
The code is modified to employ a delayed unregistration mechanism. The fix addresses the case of getting the unregistration from AMF in the context of a callback from VxFS to AMF. In such a scenario, the unregistration is marked for a later time. When all the notifications return, and if a delayed unregistration is marked, the unregistration routine is called explicitly.

* 3803849 (Tracking ID: 3807129)

SYMPTOM:
When a file system that is above 2 TB in size is re-sized, a panic may occur with the following stack trace if the file system is almost full:

vx_multi_bufinval()
vx_alloc.c()
vx_dunemap()
vx_demap()
vx_trancommit2()
vx_trancommit()
vx_trunc_tran2()
vx_trunc_tran()
vx_trunc()
vx_inactive_remove()
vx_inactive_tran()
vx_local_inactive_list()
vx_inactive_list()
vx_worklist_process()
vx_worklist_thread()
kthread_daemon_startup()

DESCRIPTION:
When resizing a file system that is almost full, VxFS may have to temporarily steal the last 32 blocks of the intent log to grow the extent allocation maps. The intent log file's organization can be switched from IORG_EXT4 to IORG_TYPED during the process, but the truncation code still assumes IORG_EXT4, causing the truncated blocks to still appear in the intent log's extent map. Subsequently, the truncated blocks are allocated for growing the extent map, resulting in corruption, as the same blocks appear allocated to both the intent log and another file structure. The panic occurs when such corruption is detected.

RESOLUTION:
The code is modified to switch the intent log file to IORG_TYPED at the start of the resize operation, to ensure that the last 32 blocks are truncated properly.

Patch ID: PHKL_44268

* 3751305 (Tracking ID: 2439108)

SYMPTOM:
Due to page alignment issues in the VxFS code, the system panics when the read_preferred_io tunable is set to a non-page-aligned size. The following stack trace is observed:

fcache_buf_create()
vx_fcache_buf_create()
vx_io_setup()
vx_io_ext()
vx_alloc_getpage()
vx_do_getpage()
vx_getpage1()
vx_getpage()
preg_vn_fault()
fcache_as_fault()
vx_fcache_as_fault()
vx_do_read_ahead()
vx_read_ahead()
vx_fcache_read()
vx_read1()
vx_rdwr()

DESCRIPTION:
VxFS ends up consuming an extra page when the preferred read I/O size is not a multiple of the page size, and runs out of the allocated pages before the getpage() call can finish. This results in the panic.

RESOLUTION:
The code is modified to use the read_preferred_io tunable size only after rounding it to the page size (a sketch of this rounding follows the next incident below).

* 3754049 (Tracking ID: 3718924)

SYMPTOM:
On CFS, the file system I/O hangs for a few seconds and all the file pages are invalidated during the hang. This can occur as a routine operation and may delay the running application processes if their text pages are invalidated.

DESCRIPTION:
When the VxFS intent log-ID overflows, a log-ID reset is required. The reset triggers a file system freeze, which subsequently invalidates all the pages unnecessarily.

RESOLUTION:
The code is modified to avoid the unnecessary page invalidation.
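As an illustration of the rounding applied for incident 3751305 above, the following minimal C function shows one way to round a byte count down to a whole number of pages. It is a sketch under the assumption that the page size is a power of two; the function name is illustrative, not a VxFS internal.

    #include <stdint.h>

    /* Hypothetical sketch (not VxFS source): round a preferred I/O
     * size down to a page-size multiple so that page-based accounting
     * in the getpage path never comes up one page short. Because
     * pagesize is a power of two, masking off the low bits is
     * equivalent to rounding down. */
    static uint64_t round_down_to_pages(uint64_t bytes, uint64_t pagesize)
    {
        return bytes & ~(pagesize - 1);
    }

    /* Example: round_down_to_pages(10000, 4096) == 8192. */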
* 3769384 (Tracking ID: 3673599)

SYMPTOM:
For VxFS files inheriting an ACL entry from a parent directory that has a default ACL entry, the initial class permission is not set correctly to align with the file-mode creation mask and the umask setting at file creation time.

DESCRIPTION:
When a file is created, the ACL inheritance needs to take place before the file-mode creation mask and the umask setting are applied, so that the latter are honored.

RESOLUTION:
The code is modified to honour the file-mode creation mask and the umask setting when creating files with ACL inheritance.

Patch ID: PHKL_44215

* 3537201 (Tracking ID: 3469644)

SYMPTOM:
The system panics in the vx_logbuf_clean() function when it traverses the chain of transactions off the intent-log buffer. The stack trace is as follows:

vx_logbuf_clean()
vx_logadd()
vx_log()
vx_trancommit()
vx_exh_hashinit()
vx_dexh_create()
vx_dexh_init()
vx_pd_rename()
vx_rename1_pd()
vx_do_rename()
vx_rename1()
vx_rename()
vx_rename_skey()

DESCRIPTION:
The system panics in the vx_logbuf_clean() function when it tries to access an already freed transaction from the transaction chain in order to flush it to the log.

RESOLUTION:
The code is modified to ensure that the transaction is flushed to the log before it is freed.

* 3597563 (Tracking ID: 3597482)

SYMPTOM:
The pwrite(2) function fails with the EOPNOTSUPP error when the write range spans two indirect extents.

DESCRIPTION:
The failure occurs when the range of the pwrite() function falls between two indirect extents: a ZFOD extent that belongs to a DB2 pre-allocated file, and a DATA extent that belongs to the adjacent INDIR. VxFS tries to coalesce extents that belong to different indirect-address extents as part of the transaction. This kind of metadata change consumes a lot of transaction resources, which the VxFS transaction engine is unable to support in the current implementation, so it fails with an error.

RESOLUTION:
The code is modified to retry the write transaction without combining the extents.

* 3615527 (Tracking ID: 3604750)

SYMPTOM:
The kernel loops during the extent re-org with the following stack trace:

vx_bmap_enter()
vx_reorg_enter_zfod()
vx_reorg_emap()
vx_extmap_reorg()
vx_reorg()
vx_aioctl_full()
$cold_vx_aioctl_common()
vx_aioctl()
vx_ioctl()
vno_ioctl()
ioctl()
syscall()

DESCRIPTION:
The extent re-org minimizes file system fragmentation. When a re-org request is issued for an inode with a lot of ZFOD extents, it reallocates the extents of the original inode to the re-org inode. During this, the ZFOD extents are preserved and entered into the re-org inode in a transaction. If the allocated extent is big, the transaction that enters the ZFOD extents becomes too big and returns an error. Even when the transaction is retried, the same issue occurs. As a result, the kernel loops during the extent re-org.

RESOLUTION:
The code is modified to enter the bmap (block map) of the allocated extent and then perform the ZFOD processing. If a committable error occurs during the ZFOD enter, the transaction is committed and the ZFOD enter continues (see the sketch below).
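A minimal sketch of the commit-and-continue pattern described in this resolution, written as user-space C. The names (begin_tran, commit_tran, enter_zfod_extent, ETRANFULL) are hypothetical stand-ins, not VxFS symbols; the real logic lives in the re-org transaction code.

    /* Hypothetical sketch (not VxFS source): when entering many ZFOD
     * extents overflows the transaction, commit what has accumulated
     * and keep entering the rest in a fresh transaction, instead of
     * retrying the whole oversized transaction forever. */
    #include <stddef.h>

    #define ETRANFULL 1   /* stand-in for a "committable" transaction error */

    extern void *begin_tran(void);
    extern void  commit_tran(void *tran);
    extern int   enter_zfod_extent(void *tran, size_t idx);

    int enter_zfod_extents(size_t nextents)
    {
        void *tran = begin_tran();

        for (size_t i = 0; i < nextents; ) {
            int err = enter_zfod_extent(tran, i);
            if (err == ETRANFULL) {
                /* Committable error: flush the full transaction and
                 * retry this extent in a new one. */
                commit_tran(tran);
                tran = begin_tran();
                continue;
            }
            if (err)
                return err;          /* hard error: give up */
            i++;                     /* extent entered; move on */
        }
        commit_tran(tran);
        return 0;
    }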
* 3615530 (Tracking ID: 3466020)

SYMPTOM:
The file system is corrupted, with the following error messages in the log:

WARNING: msgcnt 28 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile file system dir inode 3277090 dev/block 0/0 diren
WARNING: msgcnt 27 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile file system dir inode 3277090 dev/block 0/0 diren
WARNING: msgcnt 26 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile file system dir inode 3277090 dev/block 0/0 diren
WARNING: msgcnt 25 mesg 096: V-2-96: vx_setfsflags - /dev/vx/dsk/a2fdc_cfs01/trace_lv01 file system fullfsck flag set - vx_direr
WARNING: msgcnt 24 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile file system dir inode 3277090 dev/block 0/0 diren

DESCRIPTION:
When an error is returned from the vx_dirbread() function via the vx_dexh_keycheck1() function, the FULLFSCK flag is set on the file system unconditionally. A corrupted Large Directory Hash (LDH) can lead to the incorrect block being read, which results in the FULLFSCK flag being set. The system does not verify whether it read an incorrect value due to a corrupted LDH. Consequently, the FULLFSCK flag is set unnecessarily, because a corrupted LDH can be fixed online by recreating the hash.

RESOLUTION:
The code is modified such that when LDH corruption is detected, the system removes the LDH instead of setting FULLFSCK. The LDH is recreated the next time the directory is modified.

* 3615532 (Tracking ID: 3484336)

SYMPTOM:
The fidtovp() system call panics in the vx_itryhold_locked() function with the following stack trace:

vx_itryhold_locked()
vx_iget()
vx_common_vget()
vx_do_vget()
vx_vget_skey()
vfs_vget()
fidtovp()
kernel_add_gate_cstack()
nfs3_fhtovp()
rfs3_getattr()
rfs_dispatch()
svc_getreq()
threadentry()
[kdb_read_mem]()

DESCRIPTION:
Some VxFS operations, like the vx_vget() function, try to get a hold on an in-core inode using the vx_itryhold_locked() function without taking the lock on the corresponding directory inode. This may lead to a race condition when this inode is present on the delicache list and is inactivated, resulting in a panic when the vx_itryhold_locked() function tries to remove it from the free list.

RESOLUTION:
The code is modified to take the inode list lock inside the vx_inactive_tran(), vx_tranimdone() and vx_tranuninode() functions, which prevents the race condition.

* 3660347 (Tracking ID: 3660342)

SYMPTOM:
The VxFS 5.1SP1 package does not set the MANPATH environment variable correctly.

DESCRIPTION:
The VxFS 5.1SP1 install package does not set the man-page search path for VxFS 5.1SP1 correctly in /etc/MANPATH. It inserts the search path for VxFS 5.1SP1 after /usr/share/man/. However, there are old versions of the VxFS man pages in /usr/share/man/man1m.Z. As a result, the older version of a man page is displayed first, and the VxFS 5.1SP1 pages are not displayed.

RESOLUTION:
The post-install script is modified to place the search path for the VxFS 5.1SP1 man pages before /usr/share/man.

* 3669985 (Tracking ID: 3669983)

SYMPTOM:
When a file system with DLV 4 or 5 is in use and the quota option is enabled, the system panics. The following stack trace is observed:

memset()
vx_qrec2dq64()
vx_qflush()
vx_qsync()
vx_local_doquota()
vx_vfsquota()
quotactl()
syscall()

DESCRIPTION:
The 64-bit quotas feature was introduced in release 5.1 SP1RP2. This feature increased the maximum soft/hard quota limits for users/groups.
But a file system with DLV <= 5 cannot use this feature, because the DLV 5 quota structure itself contains 32-bit elements. When a file system with DLV 5 is mounted with the quota option, accessing the on-disk 32-bit structures through the 64-bit quota structures in the kernel results in the panic.

RESOLUTION:
The code is modified to disable the file system quota for DLV less than 6.

* 3669994 (Tracking ID: 3669989)

SYMPTOM:
On Cluster File System (CFS) on high-end machines, spinlock contention may be observed when new files are created in parallel on a nearly full file system.

DESCRIPTION:
When the list of IAUs is searched for a free Inode Allocation Unit (IAU) from which to allocate the inode, a spinlock is taken to serialize the various threads. Because the file system is nearly full, a large number of iterations are required to find a free IAU.

RESOLUTION:
The code is modified to optimize the free-IAU search using a hint index. The hint index is updated when a free IAU is found, and a subsequent search can jump directly to the free IAU it points to (see the sketch at the end of this patch section).

* 3682335 (Tracking ID: 3637636)

SYMPTOM:
Cluster File System (CFS) node initialization and protocol upgrade may hang during a rolling upgrade, with the following stack traces:

vx_svar_sleep_unlock()
vx_event_wait()
vx_async_waitmsg()
vx_msg_broadcast()
vx_msg_send_join_version()
vx_msg_send_join()
vx_msg_gab_register()
vx_cfs_init()
vx_cfs_reg_fsckd()
vx_cfsaioctl()
vxportalunlockedkioctl()
vxportalunlockedioctl()

and

vx_delay()
vx_recv_protocol_upgrade_intent_msg()
vx_recv_protocol_upgrade()
vx_ctl_process_thread()
vx_kthread_init()

DESCRIPTION:
CFS node initialization waits for the protocol upgrade to complete, while the protocol upgrade waits for the flag related to the CFS initialization to be cleared. As a result, a deadlock occurs.

RESOLUTION:
The code is modified so that the protocol upgrade process does not wait for the CFS initialization flag to be cleared.

* 3706705 (Tracking ID: 3703176)

SYMPTOM:
The tunables to enable or disable VxFS inactive-thread throttling and VxFS inactive-thread process throttling were not available.

DESCRIPTION:
The tunables to enable or disable VxFS inactive-thread throttling and VxFS inactive-thread process throttling were not available through the kctune(1M) interface.

RESOLUTION:
The code is modified so that the tunables to enable or disable VxFS inactive-thread throttling and VxFS inactive-thread process throttling are available through the kctune(1M) interface, with the relevant man page information.

* 3755915 (Tracking ID: 3755927)

SYMPTOM:
Space allocation to a file during a write may hang on the file system if the remaining intent-log space is low. The following stack trace is observed:

vx_te_bmap_split()
vx_pre_bmapsplit()
vx_dopreamble()
vx_write_alloc3()
vx_tran_write_alloc()
vx_write_alloc2()
vx_external_alloc()
vx_write_alloc()
vx_write1()
vx_rdwr()

DESCRIPTION:
During allocation of a write to a new file, the transaction for the allocation fails to commit due to low intent-log space on the file system. This results in execution of the preamble routine for the block-map split, which tries to make 21 typed extent entries in the inode immediate area. As the inode immediate area cannot hold the 21 typed extent entries, the preamble routine may end up in a live loop.

RESOLUTION:
The code is modified to add an exit condition for the live loop in the preamble routine for the block-map split.
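To illustrate the hint-index optimization described for incident 3669994 above, here is a minimal user-space sketch in C. The array layout and names (iau_free_count, iau_hint, find_free_iau) are hypothetical; in VxFS both the scan and the hint update would be performed under the spinlock that serializes the allocating threads.

    #include <stddef.h>

    /* Hypothetical sketch (not VxFS source): remember the index of the
     * last IAU found to have free inodes, and start subsequent searches
     * there instead of scanning from index 0 every time. On a nearly
     * full file system this skips the long run of exhausted IAUs at the
     * front of the list. */
    static size_t iau_hint;

    size_t find_free_iau(const unsigned *iau_free_count, size_t niau)
    {
        for (size_t n = 0; n < niau; n++) {
            size_t i = (iau_hint + n) % niau;   /* wrap past the end */
            if (iau_free_count[i] > 0) {
                iau_hint = i;                   /* remember for next search */
                return i;
            }
        }
        return niau;                            /* no IAU with free inodes */
    }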
Patch ID: PHKL_44140

* 3526848 (Tracking ID: 3526845)

SYMPTOM:
The Data Translation Lookaside Buffer (DTLB) panic may occur when directory entries are read. The following stack trace is observed:
bcmp() vx_real_readdir3() vx_readdir3() getdents() syscall()

DESCRIPTION:
When a directory entry is read, the directory name is checked against the VX_PDMAGIC identifier string using the bcmp() function, to determine whether the directory is a partitioned directory. The thread panics in the vx_real_readdir3() function when the length of the directory name is less than the length of the VX_PDMAGIC identifier string, because the bcmp() function then accesses an unallocated area.

RESOLUTION:
The code is modified to check that the length of the directory-entry name is greater than the length of the VX_PDMAGIC string before the bcmp() function is called.

* 3527418 (Tracking ID: 3520349)

SYMPTOM:
When there is a huge number of dirty pages in memory and a sparse write is performed at a large offset of 4 TB or above on an existing, non-empty file, the file system hangs with the following stack trace:
fcache_buf_iowait() vx_fcache_buf_iowait() vx_io_wait() vx_alloc_getpage() vx_do_getpage() vx_getpage1() vx_getpage() preg_vn_fault() fcache_as_uiomove_rd() fcache_as_uiomove() vx_fcache_as_uiomove() vx_fcache_read() vx_read1() vx_rdwr() vn_rdwr()

DESCRIPTION:
When a sparse write is performed at an offset of 4 TB or above on a file that has the ext4 extent orgtype, with some blocks already allocated, the file system can hang. This is caused by a type-casting bug in the offset calculation in the VxFS extent allocation code path. A sparse write should create a 'HOLE' between the last allocated offset and the current offset at which the write is requested. Because of the type-casting bug, VxFS may instead allocate the space between the last offset and the new offset in certain scenarios. This generates a huge number of dirty pages and fills up the file system space incorrectly, and the memory pressure from the dirty pages causes the hang. The sparse write offset at which the problem occurs depends on the file system block size: for a file system with a 1 KB block size, the problem can occur at a sparse write offset of 4 TB.

RESOLUTION:
The code is modified so that the VxFS extent allocation code calculates the offset correctly and does not allocate space for a sparse write. This resolves the type-casting bug.
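The class of bug described in incident 3520349 is easy to reproduce in miniature: a byte offset computed from a block number in 32-bit arithmetic wraps before it is widened to 64 bits. The exact VxFS expression is not public, so the names below are invented; this is only an illustration of the arithmetic.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t blkno = 4294967295u;   /* block just below 4 TB with 1 KB blocks */
        uint32_t bsize = 1024;

        /* Buggy: blkno * bsize is evaluated in 32 bits and wraps before
         * the implicit widening to 64 bits. */
        uint64_t bad = blkno * bsize;

        /* Correct: widen one operand first so the multiply is 64-bit. */
        uint64_t good = (uint64_t)blkno * bsize;

        printf("bad=%llu good=%llu\n",
               (unsigned long long)bad, (unsigned long long)good);
        return 0;
    }

Run as written, the buggy result is about 4 GB short of wrapping (4294966272) while the correct result is the expected offset near 4 TB (4398046510080).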
* 3537431 (Tracking ID: 3537414)

SYMPTOM:
The "mount -v" command does not display the "nomtime" mount option when the file system is mounted with that option.

DESCRIPTION:
When the current option string of a file system is built, the "nomtime" mount option is not appended, even when the file system is mounted with the "nomtime" mount option. As a result, the option is not displayed.

RESOLUTION:
The code is modified to append the "nomtime" mount option to the current option string when the file system is mounted with it.

* 3560610 (Tracking ID: 3560187)

SYMPTOM:
The kernel may panic when a buffer is freed in the vx_dexh_preadd_space() function, with the message "Data Key Miss Fault in kernel mode" and the following stack trace:
kmem_arena_free() vx_free() vx_dexh_preadd_space() vx_dopreamble() vx_dircreate_tran() vx_do_create() vx_create1() vx_create0() vx_create() vn_open()

DESCRIPTION:
The buffers in the extended-hash structure are allocated, zeroed, and freed outside the transaction retry loop. In some error scenarios the transaction is re-executed from the beginning. Because the buffers are zeroed outside the retry loop, the extended-hash structure may carry stale buffer pointers from the previous try during a retry, and some stale parts of the structure are then freed incorrectly, which results in the panic.

RESOLUTION:
The code is modified to zero out the extended-hash structure within the retry loop, so that stale values are not used during a retry.

* 3561998 (Tracking ID: 2387439)

SYMPTOM:
An internal debug assert is hit when the conformance test is run for the partitioned-directory feature.

DESCRIPTION:
The debug assert is hit because the directory-read operation is called without holding the read-write lock on the partition hash directory.

RESOLUTION:
The code is modified to take the read-write lock on the partition hash directory before the read operation is performed on the directory.

Patch ID: PHKL_43916

* 2937310 (Tracking ID: 2696657)

SYMPTOM:
An internal noise test on the cluster file system hits a debug assert related to the file size in the inode.

DESCRIPTION:
In VxFS, the inode wsize (size after write) field is not kept in sync with the inode nsize (new size) field in case of an error, which triggers the debug assert.

RESOLUTION:
The code is modified to bring the inode wsize back in sync with nsize.

* 3261849 (Tracking ID: 3253210)

SYMPTOM:
When the file system reaches its space limitation, it hangs with the following stack trace:
vx_svar_sleep_unlock() default_wake_function() wake_up() vx_event_wait() vx_extentalloc_handoff() vx_te_bmap_alloc() vx_bmap_alloc_typed() vx_bmap_alloc() vx_bmap() vx_exh_allocblk() vx_exh_splitbucket() vx_exh_split() vx_dopreamble() vx_rename_tran() vx_pd_rename()

DESCRIPTION:
When large directory hash is enabled through the vx_dexh_sz(5M) tunable, Veritas File System (VxFS) uses the large directory hash for directories. When you rename a file, a new directory entry is inserted into the hash table, which can result in a hash split. The hash split fails the current transaction, and the transaction is retried after some housekeeping jobs complete; these jobs include allocating more space for the hash table. However, VxFS does not check the return value of the preamble job. As a result, when VxFS runs out of space, the rename transaction is re-entered permanently, without knowing whether the preamble jobs allocated more space.

RESOLUTION:
The code is modified to enable VxFS to exit the loop when the ENOSPC error is returned from the preamble job.
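Incident 3253210 here, like incident 3755927 earlier, is a live loop in transaction-preamble retry logic, fixed by propagating and honoring an error from the preamble. A schematic of the corrected loop shape, with invented names (run_preamble, commit_txn) standing in for the real VxFS routines:

    #include <errno.h>

    extern int run_preamble(void);   /* housekeeping, e.g. grow the hash table */
    extern int commit_txn(void);     /* returns 0, or -EAGAIN to request a retry */

    int do_tran(void)
    {
        int err;

        for (;;) {
            err = commit_txn();
            if (err != -EAGAIN)
                return err;          /* success, or a hard failure */

            /* Before the fix, the preamble's return value was ignored, so
             * an out-of-space file system retried forever. Honoring ENOSPC
             * turns the live loop into a clean failure. */
            err = run_preamble();
            if (err == -ENOSPC)
                return err;
        }
    }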
* 3396530 (Tracking ID: 3252983)

SYMPTOM:
On a high-end system with 48 or more CPUs, some file system operations may hang with the following stack trace:
vx_ilock() vx_tflush_inode() vx_fsq_flush() vx_tranflush() vx_traninit() vx_tran_iupdat() vx_idelxwri_done() vx_idelxwri_flush() vx_delxwri_flush() vx_workitem_process() vx_worklist_process() vx_worklist_thread()

DESCRIPTION:
The function that gets an inode returns an incorrect error value when no free in-core inodes are available. This error value causes an inode to be allocated on disk instead of in core. As a result, the same function is called again, resulting in a continuous loop.

RESOLUTION:
The code is modified to return the correct error code.

* 3410567 (Tracking ID: 3410532)

SYMPTOM:
The VxFS file system may hang if it is mounted with the "tranflush" mount option. The following stack trace is observed:
swtch_to_thread() slpq_swtch_core() real_sleep() sleep_one() vx_rwsleep_lock() vx_ilock() vx_tflush_inode() vx_fsq_flush() vx_tranflush() $cold_vx_tranidflush() vx_exh_hashinit() vx_dexh_create() vx_dexh_init() vx_pd_mkdir() vx_mkdir1_pd() vx_do_mkdir() vx_mkdir1() vx_mkdir() vns_create() vn_create() mkdir() syscall()

DESCRIPTION:
If the VxFS file system is mounted with the "tranflush" mount option, a thread may end up holding the ILOCK and waiting for the same lock. This self-deadlock causes the file system to hang.

RESOLUTION:
The code is modified to avoid the self-deadlock situation.

* 3435207 (Tracking ID: 3433777)

SYMPTOM:
A single-CPU machine panics due to the safety-timer check when the inodes are re-tuned. The following stack trace is observed:
spinunlock() vx_ilist_chunkclean() vx_inode_free_list() vx_retune_ninode() vx_do_inode_kmcache_callback() vx_worklist_thread() kthread_daemon_startup()

DESCRIPTION:
When the inode cache list is traversed, the vxfsd daemon schedules vx_do_inode_kmcache_callback(), which does not yield the CPU between iterations. Other threads therefore cannot get access to the CPU, and the safety timer triggers a panic.

RESOLUTION:
The code is modified to use the sched_yield() function on every iteration in vx_inode_free_list() to yield the CPU, so that other threads get a chance to be scheduled.
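The shape of the fix in incident 3433777 is cooperative yielding inside a long, non-blocking traversal, which would otherwise starve a single-CPU machine. A user-space C sketch of the same shape follows; the real fix yields inside the kernel daemon rather than through the POSIX call shown here, and the list type is invented.

    #include <sched.h>

    struct node { struct node *next; };

    extern void clean_one(struct node *n);

    /* Walk a long list, yielding the CPU on every iteration so that a
     * single-CPU system is not starved by this housekeeping pass. */
    void free_list_walk(struct node *head)
    {
        struct node *n;

        for (n = head; n != NULL; n = n->next) {
            clean_one(n);
            sched_yield();   /* give other runnable threads a chance */
        }
    }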
* 3471150 (Tracking ID: 3150368)

SYMPTOM:
A periodic sync operation on an Encrypted Volume and File System (EVFS) configuration may cause the system to panic with the following stack trace:
evfsevol_strategy() io_invoke_devsw() vx_writesuper() vx_fsetupdate() vx_sync1() vx_sync0() $cold_vx_do_fsext() vx_workitem_process() vx_worklist_process() vx_walk_fslist_threaded() vx_walk_fslist() vx_sync_thread() vx_worklist_thread() kthread_daemon_startup()

DESCRIPTION:
In the EVFS environment, EVFS may see a stale or garbage value in the b_filevp field, which Veritas File System (VxFS) does not initialize, causing the system to panic.

RESOLUTION:
The code is modified to initialize b_filevp.

* 3471152 (Tracking ID: 3153919)

SYMPTOM:
The fsadm(1M) command may hang when a structural file set reorganization is in progress. The following stack trace is observed:
vx_event_wait vx_icache_process vx_switch_ilocks_list vx_cfs_icache_process vx_switch_ilocks vx_fs_reinit vx_reorg_dostruct vx_extmap_reorg vx_struct_reorg vx_aioctl_full vx_aioctl_common vx_aioctl vx_ioctl vx_compat_ioctl compat_sys_ioctl

DESCRIPTION:
During the structural file set reorganization, a race condition leaves the VX_CFS_IOWN_TRANSIT flag set on an inode. At the final stage of the reorganization, all the inodes are re-initialized. Because the VX_CFS_IOWN_TRANSIT flag is set improperly, the re-initialization cannot proceed, which causes the hang.

RESOLUTION:
The code is modified so that the VX_CFS_IOWN_TRANSIT flag is cleared.

* 3471165 (Tracking ID: 3332902)

SYMPTOM:
The system running the fsclustadm(1M) command panics while shutting down. The following stack trace is logged along with the panic:
machine_kexec crash_kexec oops_end page_fault [exception RIP: vx_glm_unlock] vx_cfs_frlpause_leave [vxfs] vx_cfsaioctl [vxfs] vxportalkioctl [vxportal] vfs_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath

DESCRIPTION:
A race condition exists between "fsclustadm(1M) cfsdeinit" and "fsclustadm(1M) frlpause_disable". The "fsclustadm(1M) cfsdeinit" operation fails after cleaning up the Group Lock Manager (GLM) without downgrading the CFS state. Under this false CFS state, the "fsclustadm(1M) frlpause_disable" command proceeds and accesses the GLM lock that "fsclustadm(1M) cfsdeinit" has freed, resulting in a panic. Another race condition exists between the code in vx_cfs_deinit() and the code in fsck: even though fsck holds a reservation, this does not prevent vx_cfs_deinit() from freeing vx_cvmres_list, because vx_cfs_keepcount is not checked.

RESOLUTION:
The code is modified to add appropriate checks in "fsclustadm(1M) cfsdeinit" and "fsclustadm(1M) frlpause_disable" to avoid the race conditions.

* 3484316 (Tracking ID: 2555201)

SYMPTOM:
Internal conformance and stress testing on locally and cluster mounted file systems hits a debug assert.

DESCRIPTION:
If more than ten conditions are used in a macro and the macro is called from an if condition, the HP Itanium compiler performs a code optimization that affects the logic of the if condition. This caused various asserts to be hit during internal testing.

RESOLUTION:
The code is modified such that the macro using more than ten conditions is replaced by a function.
Patch ID: PHKL_43539

* 2755784 (Tracking ID: 2730759)

SYMPTOM:
The sequential read performance is poor because of read-ahead issues.

DESCRIPTION:
Read-ahead on sequential reads is performed incorrectly because wrong read-advisory and read-ahead pattern offsets are used to detect and perform the read-ahead. In addition, more synchronous reads are performed than necessary, which affects performance.

RESOLUTION:
The code is modified so that the read-ahead pattern offsets are updated correctly to detect and perform read-ahead at the required offsets. The read-ahead detection is also modified to reduce the synchronous reads.

* 2801689 (Tracking ID: 2695390)

SYMPTOM:
A TEDed (debug) build hits the "f:vx_cbdnlc_lookup:3" assert during internal test runs.

DESCRIPTION:
If a vnode with a NULL file system pointer is added to the cbdnlc cache and is later returned by a lookup, an assert is hit during validation of the vnode.

RESOLUTION:
The code is modified to identify the place where an invalid vnode (one whose file system pointer is not set) is added to the cache, and to prevent it from being added to the cbdnlc cache.

* 2857465 (Tracking ID: 2735912)

SYMPTOM:
The performance of tier relocation for moving a large number of files is poor when the `fsppadm enforce' command is used. When looking at the fsppadm(1M) command in the kernel, the following stack trace is observed:
vx_cfs_inofindau vx_findino vx_ialloc vx_reorg_ialloc vx_reorg_isetup vx_extmap_reorg vx_reorg vx_allocpolicy_enforce vx_aioctl_allocpolicy vx_aioctl_common vx_ioctl vx_compat_ioctl

DESCRIPTION:
When each file in Tier 1 is relocated to Tier 2, Veritas File System (VxFS) allocates a new reorg inode and all its extents in Tier 2, swaps the contents of the two files, and deletes the original file. This per-file inode allocation involves a lot of processing and can result in poor performance when a large number of files are moved.

RESOLUTION:
The code is modified to maintain a pool (cache) of reorg inodes instead of allocating one each time.

* 2930507 (Tracking ID: 2215398)

SYMPTOM:
An internal stress test in the cluster environment hits an assert.

DESCRIPTION:
The transactions for partitioned directories need to be initiated without the flush operation. Currently, the transaction flag is not set properly to initiate the transactions without the flush operation for partitioned directories, which results in the assert being hit.

RESOLUTION:
The code is modified to set the transaction flag properly, to initiate the transactions without the flush operation for partitioned directories.

* 2932216 (Tracking ID: 2594774)

SYMPTOM:
The f:vx_msgprint:ndebug assert is observed several times in internal Cluster File System (CFS) testing.

DESCRIPTION:
In CFS, the "no space left on device" (ENOSPC) error is observed when the File Change Log (FCL) is enabled during the reorganization operation. The secondary node requests the primary to delegate allocation units (AUs). The primary node may delegate an AU that has an exclusion zone set, which returns the ENOSPC error, and a retry is required to get another AU. Currently, the retry count for getting an AU after an allocation failure is set to 3; this retry count can be increased.

RESOLUTION:
The code is modified to increase the number of retries when allocation fails because exclusion zones are set on the delegated AU, or when the CFS is frozen.

* 3011828 (Tracking ID: 2963763)

SYMPTOM:
When the thin_friendly_alloc and delicache_enable parameters are enabled, Veritas File System (VxFS) may hit a deadlock. The thread involved in the deadlock can have the following stack trace:
vx_rwsleep_lock() vx_tflush_inode() vx_fsq_flush() vx_tranflush() vx_traninit() vx_remove_tran() vx_pd_remove() vx_remove1_pd() vx_do_remove() vx_remove1() vx_remove_vp() vx_remove() vfs_unlink() do_unlinkat
The threads waiting in vx_traninit() for transaction space display the following stack trace:
vx_delay2() vx_traninit() vx_idelxwri_done() vx_idelxwri_flush() vx_common_inactive_tran() vx_inactive_tran() vx_local_inactive_list() vx_inactive_list+0x530() vx_worklist_process() vx_worklist_thread()

DESCRIPTION:
In the extent allocation code paths, VxFS sets the IEXTALLOC flag on the inode without taking the ILOCK. When overlapping transactions pick the same inode off the delicache list, the transaction-done code paths can miss the IUNLOCK call.

RESOLUTION:
The code is modified to change the corresponding code paths to set the IEXTALLOC flag under proper protection.

* 3024028 (Tracking ID: 2899907)

SYMPTOM:
Some file system operations on a Cluster File System (CFS) may hang with the following stack trace:
vxg_svar_sleep_unlock vxg_grant_sleep vxg_cmn_lock vxg_api_lock vx_glm_lock vx_mdele_hold vx_extfree1 vx_exttrunc vx_trunc_ext4 vx_trunc_tran2 vx_trunc_tran vx_cfs_trunc vx_trunc vx_inactive_remove vx_inactive_tran vx_cinactive_list vx_workitem_process vx_worklist_process vx_worklist_thread vx_kthread_init kernel_thread

DESCRIPTION:
In CFS, a node can take the mdelelock for one extent map while already holding the mdelelock for a different extent map. This can result in a deadlock between different nodes in the cluster.

RESOLUTION:
The code is modified to prevent the deadlock between different nodes in the cluster.
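Incident 2899907 is the classic ABBA deadlock: two holders each take one lock of the same class and then wait for the other's. The standard cure for this family of bugs is a global acquisition order on same-class locks. A generic C sketch of that discipline, ordering by a lock ID; the names are invented and this is not the actual VxFS change, which the entry above does not detail.

    struct maplock {
        unsigned id;       /* stable identifier used for ordering */
        /* ... lock state ... */
    };

    extern void lock_one(struct maplock *l);

    /* Always acquire two same-class locks in a fixed global order (here,
     * ascending id) so two holders can never wait on each other. */
    void lock_pair(struct maplock *a, struct maplock *b)
    {
        if (a->id < b->id) {
            lock_one(a);
            lock_one(b);
        } else {
            lock_one(b);
            lock_one(a);
        }
    }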
* 3024042 (Tracking ID: 2923105)

SYMPTOM:
Removing the Veritas File System (VxFS) module using rmmod(8) on a system with heavy buffer cache usage may hang.

DESCRIPTION:
When a large number of buffers have been allocated from the buffer cache, freeing them while the VxFS module is being removed takes a long time.

RESOLUTION:
The code is modified to use an improved algorithm that stops traversing the free lists once a free chunk is found: the search breaks out and frees that buffer instead of continuing the traversal.

* 3024049 (Tracking ID: 2926684)

SYMPTOM:
On systems with a heavy transaction workload, such as the creation and deletion of files, the system may panic with the following stack trace:
... vxfs:vx_traninit+0x10 vxfs:vx_dircreate_tran+0x420 vxfs:vx_pd_create+0x980 vxfs:vx_create1_pd+0x1d0 vxfs:vx_do_create+0x80 vxfs:vx_create1+0xd4 vxfs:vx_create+0x158 ...

DESCRIPTION:
In the case of a delayed log, a transaction commit can complete before the log write completes. The memory for the transaction is freed before the transaction is logged, which corrupts the transaction free list and causes the system to panic.

RESOLUTION:
The code is modified such that the transaction is not freed until the log is written.

* 3024052 (Tracking ID: 2906018)

SYMPTOM:
In the event of a system crash, the fsck intent log is not replayed and the file system is marked clean. Subsequently, extended operations pending on the file system are not completed on mount.

DESCRIPTION:
Only file systems that contain Per-Node Object Location Tables (PNOLTs) and are mounted locally (without using 'mount -o cluster') are potentially exposed to this issue. fsck silently skips the intent-log replay because each PNOLT has a flag that identifies whether its intent log is dirty; after a system crash, this flag signifies whether intent-log replay is required. However, when the file system was mounted locally, the PNOLTs are not utilized, so these are the wrong flags to check, yet the fsck intent-log replay still checks them. It therefore assumes that the intent logs are clean (because the PNOLTs are not marked dirty) and skips the replay of the intent log altogether.

RESOLUTION:
The code is modified such that when PNOLTs exist in the file system, VxFS sets the dirty flag in the CFS primary PNOLT while mounting locally. With this change, after a system crash while the file system is locally mounted, the subsequent fsck intent-log replay correctly utilizes the PNOLT structures and successfully replays the intent log.

* 3024088 (Tracking ID: 3008451)

SYMPTOM:
On a cluster mounted file system, the "hastop -all" command may panic some of the nodes with the following stack trace:
vxfs:vx_is_fs_disabled_impl amf:is_fs_disabled amf:amf_ev_fsoff_verify amf:amf_event_reg amf:amfioctl amf:amf_ioctl specfs:spec_ioctl genunix:fop_ioctl genunix:ioctl

DESCRIPTION:
The vx_is_fs_disabled_impl() function, which is called during the unmount operation (triggered by "hastop -all"), traverses the vx_fsext_list entries one by one and returns true if the file system is disabled. While traversing this list, it also accesses file systems that have the fse_zombie flag set, which denotes that the file system is in an unstable state and that some pointers may be NULL. Accessing such a NULL pointer panics the machine with the above stack trace.
RESOLUTION:
The code is modified to skip fsext entries that have the fse_zombie flag set, because that flag implies the fsext is in an unstable state.

* 3131795 (Tracking ID: 2912089)

SYMPTOM:
On a cluster mounted file system that is highly fragmented, a file-grow operation may hang with the following stack traces:
T1: vx_event_wait+0001A8 vx_async_waitmsg+000174 vx_msg_send+0006B0005BC vx_cfs_pagealloc+00023C vx_alloc_getpage+0002DC vx_do_getpage+001618 vx_mm_getpage+0000B4 vx_internal_alloc+00029C vx_write_alloc+00051C vx_write1+0014D4 vx_write_common_slow+000EB0 vx_write_common+000C34 vx_rdwr_attr+0002C4
T2: vx_glm_lock+000120 vx_genglm_lock+0000B0 vx_iglock3+0004B4 vx_iglock2+0005E4 vx_iglock+00004C vx_write1+000E70 vx_write_common_slow+000EB0 vx_write_common+000C34 vx_rdwr_attr+0002C4

DESCRIPTION:
While a file is grown, a transaction is performed to allocate extents. CFS allows only up to a maximum number of subtransactions within a transaction. When the maximum subtransaction limit is reached, CFS retries the operation. If the file system is badly fragmented, CFS goes into an infinite loop because the subtransaction limit is crossed on every retry.

RESOLUTION:
The code is modified to specify a maximum retry limit and abort the operation with the ENOSPC error after the retry limit is reached.

* 3131824 (Tracking ID: 2966277)

SYMPTOM:
Systems with high file system activity, such as read/write/open/lookup, may panic with the following stack trace due to a rare race condition:
spinlock+0x21() -> vx_rwsleep_unlock() vx_ipunlock+0x40() vx_inactive_remove+0x530() vx_inactive_tran+0x450() vx_local_inactive_list+0x30() vx_inactive_list+0x420() -> vx_workitem_process() -> vx_worklist_process() vx_worklist_thread+0x2f0() kthread_daemon_startup+0x90()

DESCRIPTION:
The ILOCK is released before the IPUNLOCK is done, which causes a race condition and results in a panic when an inode that has already been freed is accessed.

RESOLUTION:
The code is modified so that the ILOCK is used to protect the inode's memory from being freed while the memory is being accessed.

* 3131885 (Tracking ID: 3010444)

SYMPTOM:
On a Network File System (NFS) mounted file system, operations that read a file via the cksum(1m) command may fail with the following error message:
cksum: read error on : Bad address
The following error message is also seen in the syslog:
vmunix: WARNING: Synchronous Page I/O error

DESCRIPTION:
When the read vnode operation (VOP_RDWR) is performed, certain requests are converted to direct I/O for optimization. However, the NFS buffers passed in during the read requests are not user buffers. As a result, the I/O fails.

RESOLUTION:
The code is modified to convert the I/O requests to direct I/O only if the buffer passed during the I/O is a user buffer.

* 3131920 (Tracking ID: 3049408)

SYMPTOM:
When the system is under file-cache pressure, the find(1) command takes a long time to operate.

DESCRIPTION:
Veritas File System (VxFS) does not grow the metadata-buffer cache under system or file-cache memory pressure. When the vx_bcrecycle_timelag factor drops to zero, metadata buffers are reused immediately after they are accessed. As a result, a large-directory scan takes many physical I/Os, and VxFS ends up performing excessive re-reads of the same data into the metadata-buffer cache. File-cache memory pressure is normal, however, and there is no need to shrink the metadata-buffer cache just because file-cache memory pressure exists.
RESOLUTION:
The code is modified to decouple the metadata-buffer cache behaviour from file-cache memory pressure.

* 3138653 (Tracking ID: 2972299)

SYMPTOM:
An open(O_CREAT) operation can take up to 0.5 seconds to complete. A high value of the vxi_bc_reuse counter is also seen in the vxfsstat data.

DESCRIPTION:
After directory blocks are cached, they are expected to remain in the cache until they are evicted. The buffer-cache reuse code uses the "lbolt" value to determine the age of a buffer, and all buffers older than a particular threshold are reused. Mixed signed-unsigned arithmetic introduces errors into the buffer-age calculation, causing the buffers to be reused every time. Hence, subsequent reads take longer than expected.

RESOLUTION:
The code is modified so that the variables that store time are correctly declared as signed int.
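Incident 2972299 is the standard signed/unsigned mixing trap: if the tick arithmetic is done in an unsigned type, a difference that should be negative instead wraps to a huge positive value, so every buffer looks ancient. A small self-contained C illustration with invented variable names:

    #include <stdio.h>

    int main(void)
    {
        unsigned int lbolt = 100;      /* current tick count */
        unsigned int buf_time = 150;   /* buffer stamped "in the future" */
        unsigned int threshold = 30;

        /* Buggy: the unsigned subtraction wraps to ~4 billion, so the
         * buffer always appears older than the threshold and is reused. */
        if (lbolt - buf_time > threshold)
            printf("unsigned age check: reuse (wrong)\n");

        /* Correct: evaluate the age in a signed type so a negative
         * difference stays negative. */
        if ((int)(lbolt - buf_time) > (int)threshold)
            printf("signed age check: reuse\n");
        else
            printf("signed age check: keep (right)\n");

        return 0;
    }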
* 3138663 (Tracking ID: 2732427)

SYMPTOM:
The system hangs with the following stacks:
T1: _spin_lock_irqsave vx_bc_do_brelse vx_bc_biodone vx_inode_iodone vx_end_io_bp vx_end_io blk_update_request blk_update_bidi_request __blk_end_request_all vxvm_end_request volkiodone volsiodone vol_subdisksio_done volkcontext_process voldiskiodone getnstimeofday voldiskiodone_intr gendmpiodone blk_update_request blk_update_bidi_request blk_end_bidi_request scsi_end_request scsi_io_completion blk_done_softirq __do_softirq call_softirq do_softirq irq_exit do_IRQ --- --- ret_from_intr [exception RIP: vxg_api_deinitlock+147] vx_glm_deinitlock vx_cbuf_free_list vx_workitem_process vx_worklist_process vx_worklist_thread vx_kthread_init kernel_thread
T2: _spin_lock vx_cbuf_rele vx_bc_getblk vx_getblk_bp vx_getblk_clust vx_getblk_cmn find_busiest_group vx_getblk vx_iupdat_local vx_cfs_iupdat vx_iflush_list vx_iflush vx_workitem_process vx_worklist_process vx_worklist_thread vx_kthread_init
T3: _spin_lock_irqsave vxvm_end_request volkiodone volsiodone vol_subdisksio_done volkcontext_process voldiskiodone getnstimeofday voldiskiodone_intr gendmpiodone blk_update_request blk_update_bidi_request blk_end_bidi_request scsi_end_request scsi_io_completion blk_done_softirq __do_softirq call_softirq do_softirq irq_exit do_IRQ --- --- ret_from_intr [exception RIP: _spin_lock+9] vx_cbuf_lookup vx_getblk_clust vx_getblk_cmn find_busiest_group vx_cfs_iupdat vx_iflush_list vx_iflush vx_workitem_process vx_worklist_process vx_worklist_thread vx_kthread_init kernel_thread

DESCRIPTION:
Three locks constitute the deadlock: a volume manager lock (L1), a buffer list lock (L2), and a cluster buffer list lock (L3). T1, which tries to release a buffer on I/O completion, holds the volume manager spinlock (L1) and waits for the buffer free list lock (L2). T2 is the owner of L2 and is chasing the cluster buffer lock (L3) to release its affiliated cluster buffer. When T3 tries to obtain L3, an unexpected disk interrupt arrives and processes an iodone job; as a result, T3 is stuck in the volume manager layer on lock L1, which completes the deadlock.

RESOLUTION:
The code is modified so that in vx_bc_getblk() the buffer list lock is dropped before the cluster buffer list lock is acquired.

* 3138668 (Tracking ID: 3121933)

SYMPTOM:
The pwrite() function fails with EOPNOTSUPP when the write range lies in two indirect extents.

DESCRIPTION:
When the range of a pwrite() call falls in two indirect extents (one ZFOD extent belonging to DB2 pre-allocated files created with the setext( , VX_GROWFILE, ) ioctl, and an adjacent DATA extent belonging to a different INDIR), the write fails with EOPNOTSUPP. The reason is that VxFS tries to coalesce extents that belong to different indirect address extents as part of the transaction; such a metadata change consumes more transaction resources than the VxFS transaction engine can support in the current implementation.

RESOLUTION:
The code is modified to retry the transaction without coalescing the extents, as the coalescing is an optimization and should not fail the write.

* 3138675 (Tracking ID: 2756779)

SYMPTOM:
Write and read performance concerns on Cluster File System (CFS) when running applications that rely on POSIX file-record locking (fcntl).

DESCRIPTION:
The usage of fcntl on CFS leads to high messaging traffic across nodes, thereby reducing the performance of readers and writers.

RESOLUTION:
The code is modified to cache the ranges that are file-record locked on the node whenever possible, to avoid broadcasting messages across the nodes in the cluster.

* 3138695 (Tracking ID: 3092114)

SYMPTOM:
The information output by the "df -i" command can often be inaccurate for cluster mounted file systems.

DESCRIPTION:
The Cluster File System 5.0 release introduced the concept of delegating metadata to nodes in the cluster. This delegation allows CFS secondary nodes to update metadata without having to ask the CFS primary to do it, which provides greater node scalability. However, the "df -i" information is still collected by the CFS primary, regardless of which node (primary or secondary) the "df -i" command is executed on. For inodes, the granularity of each delegation is an Inode Allocation Unit (IAU), so IAUs can be delegated to nodes in the cluster. The number of inodes each IAU represents depends on the VxFS file system block size:
1 KB block size: 8192 inodes per IAU
2 KB block size: 16384 inodes per IAU
4 KB block size: 32768 inodes per IAU
8 KB block size: 65536 inodes per IAU
Each IAU contains a bitmap that records whether each inode it represents is allocated or free, as well as a summary count of the inodes currently free in the IAU. The "df -i" information can be considered a simple sum of all the IAU summary counts. With a 1 KB block size, IAU-0 represents inode numbers 0 - 8191, IAU-1 represents 8192 - 16383, IAU-2 represents 16384 - 24575, and so on. The inaccurate "df -i" count occurs because the CFS primary has no visibility of the current IAU summary information for IAUs that are delegated to secondary nodes. The number of allocated inodes within an IAU currently delegated to a CFS secondary node is therefore not known to the CFS primary. As a result, the "df -i" information for the currently delegated IAUs is collected from the primary's copy of the IAU summaries.
Because the primary's copy of a delegated IAU is stale, the "df -i" count is only accurate when no IAUs are currently delegated to CFS secondary nodes; in other words, the IAUs currently delegated to secondary nodes are what make the "df -i" count inaccurate. Once an IAU is delegated to a node, the delegation can time out after 3 minutes of inactivity, but not all IAU delegations time out: one IAU always remains delegated to each node for performance reasons, and an IAU whose inodes are all allocated (no free inodes remain in the IAU) does not time out either. The issue can be summarized as follows: the more IAUs remain delegated to CFS secondary nodes, the greater the inaccuracy of the "df -i" count.

RESOLUTION:
Allow the delegations for IAUs whose inodes are all allocated (no free inodes remain in the IAU) to time out after 3 minutes of inactivity. This means that the "df -i" count will still rarely be exact. However, once an Inode Allocation Unit (IAU) has all its inodes allocated, its delegation now times out: as files are created, the IAU delegations time out one by one after 3 minutes of inactivity, allowing the CFS primary to obtain more accurate "df -i" count information. As the number of files in the file system grows, any remaining "df -i" inaccuracy due to the current CFS secondary IAU delegations becomes increasingly irrelevant.
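The mechanics behind the inaccuracy in incident 3092114 can be seen in miniature: the free-inode total is a sum of per-IAU summary counters, and summing a stale copy for delegated IAUs gives the wrong total. The following is only a toy C model with invented structures; it is not VxFS's on-disk format.

    #include <stdio.h>

    #define NIAU 4

    struct iau_summary {
        int delegated;       /* 1 if a secondary node currently owns this IAU */
        int nfree_primary;   /* the primary's (possibly stale) copy */
        int nfree_actual;    /* the owning node's live count */
    };

    /* The primary computes "df -i" by summing its own copies; for
     * delegated IAUs those copies are stale, hence the inaccuracy. */
    int df_i_as_seen_by_primary(const struct iau_summary *t)
    {
        int i, total = 0;
        for (i = 0; i < NIAU; i++)
            total += t[i].nfree_primary;
        return total;
    }

    int df_i_true(const struct iau_summary *t)
    {
        int i, total = 0;
        for (i = 0; i < NIAU; i++)
            total += t[i].delegated ? t[i].nfree_actual : t[i].nfree_primary;
        return total;
    }

    int main(void)
    {
        struct iau_summary t[NIAU] = {
            {0, 8192, 8192}, {1, 8192, 100}, {1, 4000, 0}, {0, 0, 0},
        };
        printf("primary view: %d, true count: %d\n",
               df_i_as_seen_by_primary(t), df_i_true(t));
        return 0;
    }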
The command "cfsumount -c fsname" forces the umounts operation on a VxFS file system if there is any asynchronous checkpoint removal job in progress by checking if the value of vxfs stat "vxi_clonerm_jobs" is larger than zero. However, the stat does not count in the jobs in the checkpoint removal working queue and the jobs are entered into the working queue. The "force umount" operation does not happen even if there are pending checkpoint removal jobs because of the incorrect value of "vxi_clonerm_jobs" (zero). RESOLUTION: For slow checkpoint removal issue: Code is modified to create multiple threads to work on different Inode Allocation Units (IAUs) in parallel and to reduce the inode push work by sorting the checkpoint removal jobs by the creation time in ascending order and enlarged the checkpoint push size. For the cfsumount(1M) command hang issue: Code is modified to add the counts of jobs in the working queue in the "vxi_clonerm_jobs" stat. * 3141445 (Tracking ID: 3003679) SYMPTOM: The file system hangs when doing fsppadm and removing a file with named stream attributes (nattr) at the same time. The following two typical threads are involved: T1: COMMAND: "fsppadm" schedule at vxg_svar_sleep_unlock vxg_grant_sleep vxg_cmn_lock vxg_api_lock vx_glm_lock vx_ihlock vx_cfs_iread vx_iget vx_traverse_tree vx_dir_lookup vx_rev_namelookup vx_aioctl_common vx_ioctl vx_compat_ioctl compat_sys_ioctl T2: COMMAND: "vx_worklist_thr" schedule vxg_svar_sleep_unlock vxg_grant_sleep vxg_cmn_lock vxg_api_lock vx_glm_lock vx_genglm_lock vx_dirlock vx_do_remove vx_purge_nattr vx_nattr_dirremove vx_inactive_tran vx_cfs_inactive_list vx_inactive_list vx_workitem_process vx_worklist_process vx_worklist_thread vx_kthread_init kernel_thread DESCRIPTION: The file system hangs due to the deadlock between the threads. T1 initiated by fsppadm calls vx_traverse_tree to obtain the path name for a given inode number. T2 removes the inode as well as its affiliated nattr inodes. The reverse name lookup (T1) holds the global dirlock in vx_dir_lookup during the lookup process. It traverses the entire path from bottom to top to resolve the inode number inversely in vx_traverse_tree. During the lookup, VxFS needs to hold the hlock of each inode to read them, and drop it after reading. The file removal (T2) is processed via vx_inactive_tran which will take the "hlock" of the inode being removed. After that, it will remove all its named attribute inodes invx_do_remove, where sometimes the global dirlock is needed. Eventually, each thread waits for the lock, which is held by the other thread and this result in the deadlock. RESOLUTION: The code is modified so that the dirlock is not acquired during reserve name lookup. * 3142476 (Tracking ID: 3072036) SYMPTOM: Reads from secondary node in CFS can sometimes fail with ENXIO (No such device or address). DESCRIPTION: The incore attribute ilist on secondary node is out of sync with that of the primary. RESOLUTION: The code is modified so that the incore attribute ilist on secondary node is not out of sync with that of the primary. * 3159607 (Tracking ID: 2779427) SYMPTOM: On a Cluster Mounted File-system read/write/lookup operation may mark the file- system for a full fsck. The following messages are seen in the system log: vxfs: msgcnt mesg 096: V-2-96: vx_setfsflags - file system fullfsck flag set - vx_cfs_iread DESCRIPTION: The in-core ilist (inode list) on secondary node is not synchronized with the primary node. 
* 3159607 (Tracking ID: 2779427)

SYMPTOM:
On a cluster mounted file system, a read/write/lookup operation may mark the file system for a full fsck. The following messages are seen in the system log:
vxfs: msgcnt mesg 096: V-2-96: vx_setfsflags - file system fullfsck flag set - vx_cfs_iread

DESCRIPTION:
The in-core ilist (inode list) on the secondary node is not synchronized with the primary node.

RESOLUTION:
The code is modified to retry the operation a fixed number of times before returning the error.

* 3160205 (Tracking ID: 3157624)

SYMPTOM:
The fcntl() system call, when used for file share reservations (the F_SHARE command), can cause a memory leak in Cluster File System (CFS). The memory leak is observed in the "ALLOCB_MBLK_LM" arena. The stack trace for the leak (as seen in HP-UX vmtrace) is as follows:
$cold_kmem_arena_varalloc+0xd0 allocb+0x880 llt:llt_msgalloc+0xa0 gab:gab_mem_allocmsg+0x70 gab:gab_allocmsg+0x20 vx_msgalloc+0x70 vx_recv_shrlock+0x60 vx_recv_rpc+0x100

DESCRIPTION:
In CFS, file share reservation requests are broadcast to all the nodes in the cluster to check for conflicts. Due to a bug in the code, the system cannot free the response messages received. This results in a memory leak for every broadcast of the "file share reservation" message.

RESOLUTION:
The code is modified to free the received response messages.

* 3207096 (Tracking ID: 3192985)

SYMPTOM:
Checkpoint quota usage on CFS can be negative. An example is as follows:
Filesystem   hardlimit   softlimit   usage                  action_flag
/sofs1       51200       51200       18446744073709490176   << negative

DESCRIPTION:
In CFS, to manage the intent logs and the other extra objects required for CFS, a holding object referred to as a per-node object location table (PNOLT) is created. The quota usage is calculated by reading the per-node cut (current usage table) files (members of the PNOLT) and summing up the quota usage for each clone chain. However, when the quotaoff and quotaon operations are fired on a CFS checkpoint, the usage shows "0" after these two operations are executed, because the quota usage calculation is skipped. Subsequently, if a delete operation is performed, the usage becomes negative, since the blocks allocated for the deleted file are subtracted from zero.

RESOLUTION:
The code is modified such that when the quotaon operation is performed, the quota usage calculation is not skipped.

* 3226404 (Tracking ID: 3214816)

SYMPTOM:
When you create and delete the inodes of a user frequently with the DELICACHE feature enabled, the user quota file becomes corrupt.

DESCRIPTION:
The inode DELICACHE feature causes this issue. This feature optimizes the updates on the inode map during file creation and deletion operations. It is enabled by default, and you can disable it with the vxtunefs(1M) command. When DELICACHE is enabled and a quota is set for Veritas File System (VxFS), VxFS updates the quota for an inode both before the inode goes on the DELICACHE list and again after it is on the inactive list during the removal process. As a result, VxFS decrements the current number of user files twice, which corrupts the quota file.

RESOLUTION:
The code is modified to flag the inodes moved to the inactive list from the DELICACHE list. This flag prevents the quota from being decremented again during the removal process.
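Incident 3214816 is a double-accounting bug: the same inode's quota charge is released on two different paths. The fix pattern is to record on the inode that the charge has already been released. A compact C sketch with invented names (the I_QUOTA_RELEASED flag and helpers are illustrative, not VxFS symbols):

    struct qinode {
        unsigned flags;
        int      uid;
    };

    #define I_QUOTA_RELEASED 0x1   /* set once the file count is returned */

    extern void quota_dec_nfiles(int uid);

    /* Called from both the delicache path and the inactive-list path;
     * the flag guarantees the user's file count is decremented once. */
    void quota_release(struct qinode *ip)
    {
        if (ip->flags & I_QUOTA_RELEASED)
            return;                 /* already accounted on the other path */
        ip->flags |= I_QUOTA_RELEASED;
        quota_dec_nfiles(ip->uid);
    }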
* 3235517 (Tracking ID: 3240635)

SYMPTOM:
In a CFS environment, when a checkpoint is mounted using the mount(1M) command, the system may panic with the following stack trace:
vx_domount vx_fill_super get_sb_nodev vx_get_sb_nodev vx_get_clone_impl vx_get_clone_sb do_kern_mount do_mount sys_mount

DESCRIPTION:
When a checkpoint is mounted cluster-wide, the protocol version is verified. However, if the primary fileset (999) is not mounted cluster-wide, some of the file system data structures remain uninitialized, which results in the panic.

RESOLUTION:
The code is modified to disable the cluster-wide mount of the checkpoint if the primary fileset is mounted locally.

* 3243204 (Tracking ID: 3226462)

SYMPTOM:
On a cluster mounted file system whose nodes have unequal CPU counts, a lookup operation may panic a node with the following stack trace:
vx_dnlc_recent_cookie vx_dnlc_getpathname audit_get_pathname_from_dnlc audit_clean_path $cold_audit_build_full_dir_name inline change_p_cdir

DESCRIPTION:
The panic is caused by an out-of-bounds access in the counters[] array, whose size is defined by the vx_max_cpu variable. The value of vx_max_cpu can differ between CFS nodes if the nodes have different numbers of processors, but the code assumes the value is the same across the cluster. When inode cookies are propagated across the cluster, the counters[] array is allocated based on the vx_max_cpu of the current CFS node. If a cookie populated via vx_cbdnlc_populate_cookie() carries a CPU ID from another CFS node that exceeds the local vx_max_cpu, the vx_dnlc_recent_cookie() function accesses locations beyond the allocated counters[] array.

RESOLUTION:
The code is modified to detect the out-of-bounds access in vx_dnlc_recent_cookie() and return the ENOENT error.
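Incident 3226462 reduces to indexing a per-CPU array with an identifier that another node generated against a larger bound, and the defensive fix is a range check at the point of use. A minimal C sketch with invented names; the ENOENT return mirrors the resolution described above.

    #include <errno.h>

    extern int vx_max_cpu_local;   /* size this node allocated counters[] with */

    /* The cookie carries the CPU ID of whichever cluster node created it. */
    int recent_cookie_lookup(int cookie_cpu, const int *counters)
    {
        /* A remote node with more CPUs can hand us an ID >= our array
         * size; without this check the access runs off counters[]. */
        if (cookie_cpu < 0 || cookie_cpu >= vx_max_cpu_local)
            return -ENOENT;

        return counters[cookie_cpu];
    }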
* 3248982 (Tracking ID: 3272896)

SYMPTOM:
An internal stress test on a local mount hits a deadlock.

DESCRIPTION:
When a rename operation is performed, the file system directory lock is taken to check whether the source directory is being renamed into a subdirectory of itself (a directory loop). However, when the directory is partitioned, this check is not required, and the unnecessary directory lock caused a deadlock.

RESOLUTION:
The code is modified so that the directory lock is not taken for the rename operation when the directory is partitioned.

* 3249151 (Tracking ID: 3270357)

SYMPTOM:
The fsck(1m) command fails to clean the corrupt file system during the internal 'noise' test. The following error messages are displayed:
pass0 - checking structural files
pass1 - checking inode sanity and blocks
fileset 999 primary-ilist inode mismatched reorg extent map
fileset 999 primary-ilist inode bad blocks found clear? (ynq)n
fileset 999 primary-ilist inode does not have corresponding matchino clear? (ynq)n

DESCRIPTION:
The inode being reorged is from the attribute ilist. When a reorg inode is allocated, the 2262161st inode from the delicache is used; however, that inode is from the primary ilist. There is no check in vx_ialloc() that forces an attribute-ilist inode's corresponding reorg inode to be allocated from the same ilist, but the fsck(1M) code expects the source and reorg inodes to be from the same ilist: when it examines the reorg inode from the primary ilist, it checks the corresponding source inode there, and in the primary ilist VX_IEREORG is not set. As a result, the error messages are displayed.

RESOLUTION:
The code is modified to add a check in vx_ialloc() to ensure that the reorg inode is allocated from the same ilist.

* 3261334 (Tracking ID: 3228646)

SYMPTOM:
An NFSv4 server may panic with the following stack trace when fcntl() requests with F_SETLK are made on CFS:
vx_vn_inactive+0xf0 vn_rele_inactive+0x140 vfs_free_iflist+0x100 rfs4_op_release_lockowner+0x4c0 rfs4_compound+0x430 common_dispatch+0xc10 rfs_dispatch+0x40 svc_getreq+0x250 svc_run+0x300 svc_do_run+0xd0 nfssys+0x7c0 hpnfs_nfssys+0x60

DESCRIPTION:
In a CFS configuration, if an fcntl() request fails, some NFS-specific structures (l_pid) are not updated correctly and may point to stale information. This causes the NFSv4 server to panic.

RESOLUTION:
The code is modified to preserve the l_pid value during a failed fcntl(F_SETLK) request.

* 3261782 (Tracking ID: 3240403)

SYMPTOM:
The fidtovp() system call can panic in the vx_itryhold_locked() function with the following stack trace:
vx_itryhold_locked vx_iget vx_common_vget vx_do_vget vx_vget_skey vfs_vget fidtovp kernel_add_gate_cstack nfs3_fhtovp rfs3_write rfs_dispatch svc_getreq threadentry [kdb_read_mem]

DESCRIPTION:
Some VxFS operations, such as the vx_vget() function, try to get a hold on an in-core inode using the vx_itryhold_locked() function without taking the lock on the corresponding directory inode. This can lead to a race condition when the inode is present on the delicache list and is being inactivated, which results in a panic when the vx_itryhold_locked() function tries to remove the inode from a free list.

RESOLUTION:
The code is modified to take the delicache lock before unlocking the ilist lock inside the vx_inactive() function when the IDELICACHE flag is set. This prevents the race condition.

* 3262025 (Tracking ID: 3259634)

SYMPTOM:
A CFS with more than 2^32 (about 4 billion) file system blocks is corrupted because some file system metadata is zeroed out incorrectly. The blocks that get zeroed out may contain any metadata or file data and can be located anywhere on the disk. The problem occurs only with the following file system size and block size combinations:
1 KB block size and file system size > 4 TB
2 KB block size and file system size > 8 TB
4 KB block size and file system size > 16 TB
8 KB block size and file system size > 32 TB

DESCRIPTION:
When a CFS is mounted for the first time on a secondary node, a per-node intent log is created, and the blocks newly allocated to it are zeroed out. The start offset and the length to be cleared are passed to the block-clearing routine. Due to a miscalculation, a wrong start offset is passed, and the disk content at that offset is zeroed out incorrectly. The content can be file system metadata or file data. If it is metadata, the corruption is detected when the metadata is accessed, and the file system is marked for a full fsck(1M).

RESOLUTION:
The code is modified so that the correct start offset is passed to the block-clearing routine.
Patch ID: PHKL_43432

* 3042340 (Tracking ID: 2616622)

SYMPTOM:
The performance of the mmap() function is slow when the file system block size is 8 KB and the page size is 4 KB.

DESCRIPTION:
When the file system block size is 8 KB, the page size is 4 KB, and the mmap() function is performed on an 8 KB file, the file is represented in memory as two pages (0 and 1). When the memory at offset 0 in the mapping is modified, a page fault occurs for page 0 of the file. When that disk block is allocated and marked valid, the page mentioned in the fault request is expected to be flushed out to disk later, so it is left uninitialized on disk by default. Only that particular page is cleaned in memory and left modified, so that it is known that the data in memory is more recent than the data on disk. However, the other half of the block (which could eventually be mapped to page 1) is cleared with a synchronous write, because a fault for it may never occur. This synchronous clearing of the other half of the 8 KB block causes the performance degradation.

RESOLUTION:
The code is modified to expand the range of the fault to cover the entire 8 KB block. The OS request for only one page is overridden and two pages are given, covering the entire file system block and saving the separate synchronous clearing of the other half of the 8 KB block.

* 3042341 (Tracking ID: 2555198)

SYMPTOM:
On HP-UX 11.31, FTP transfers in binary mode use the sendfile() interface, which does not create the DMAPI events for Hierarchical Storage Management (HSM).

DESCRIPTION:
The sendfile() interface does not call the Veritas File System (VxFS) read() function that creates the DMAPI events. It uses the HP Unified File Cache (UFC) interface, which is not aware of the HSM application. As a result, the DMAPI events are not generated.

RESOLUTION:
The code is modified to set a flag in the vfs structure at mount time, to indicate whether the file system is configured under HSM. This flag information is used by the UFC interface to generate the DMAPI events.

* 3042352 (Tracking ID: 2806466)

SYMPTOM:
A reclaim operation on a file system mounted on a Logical Volume Manager (LVM) volume, using the fsadm(1M) command with the 'R' option, may panic the system with the following stack trace:
vx_dev_strategy+0xc0() vx_dummy_fsvm_strategy+0x30() vx_ts_reclaim+0x2c0() vx_aioctl_common+0xfd0() vx_aioctl+0x2d0() vx_ioctl+0x180()

DESCRIPTION:
Thin reclamation is supported only on file systems mounted on a Veritas Volume Manager (VxVM) volume.

RESOLUTION:
The code is modified to error out gracefully if the underlying volume is an LVM volume.

* 3042357 (Tracking ID: 2750860)

SYMPTOM:
On a large file system (4 TB or greater), the performance of write(2) operations with many small request sizes may degrade, and many threads may be found sleeping with the following stack trace:
real_sleep sleep_one vx_sleep_lock vx_lockmap vx_getemap vx_extfind vx_searchau_downlevel vx_searchau_downlevel vx_searchau_downlevel vx_searchau_downlevel vx_searchau_uplevel vx_searchau vx_extentalloc_device vx_extentalloc vx_te_bmap_alloc vx_bmap_alloc_typed vx_bmap_alloc vx_write_alloc3 vx_recv_prealloc vx_recv_rpc vx_msg_recvreq vx_msg_process_thread kthread_daemon_startup

DESCRIPTION:
For a cluster-mounted file system, the free-extent-search algorithm is not optimized for a large file system (4 TB or greater) where the number of free Allocation Units (AUs) can be very large.

RESOLUTION:
The code is modified to optimize the free-extent-search algorithm by skipping certain AUs. This reduces the overall search time.
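The skipping in incident 2750860 follows from keeping a cheap per-AU summary that rules an AU out before the expensive descent into its extent maps. A schematic in C; the summary field and helper are invented, and real VxFS keeps considerably richer per-AU state.

    struct au {
        int max_free_order;   /* log2 of the largest free extent in this AU */
    };

    extern int search_au_maps(struct au *a, int order);  /* expensive descent */

    /* Find an AU that can satisfy an extent of 2^order blocks, skipping
     * any AU whose summary already proves it cannot. */
    int extent_search(struct au *aus, int naus, int order)
    {
        int i;

        for (i = 0; i < naus; i++) {
            if (aus[i].max_free_order < order)
                continue;                    /* skip: cannot hold the request */
            if (search_au_maps(&aus[i], order) == 0)
                return i;                    /* allocated from this AU */
        }
        return -1;                           /* no AU can satisfy the request */
    }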
* 3042373 (Tracking ID: 2874172)

SYMPTOM:
A Network File System (NFS) file creation thread might loop continuously with the following stack trace:
vx_getblk_cmn (inlined) vx_getblk+0x3a0 vx_exh_allocblk+0x3c0 vx_exh_hashinit+0xa50 vx_dexh_create+0x240 vx_dexh_init+0x8b0 vx_do_create+0x1e0 vx_create1+0x1d0 vx_create0+0x270 vx_create+0x40 rfs3_create+0x420 common_dispatch+0xb40 rfs_dispatch+0x40 svc_getreq+0x250 svc_run+0x310 svc_do_run+0xd0 nfssys+0x6a0 hpnfs_nfssys+0x60 coerce_scall_args+0x130 syscall+0x590

DESCRIPTION:
The Veritas File System (VxFS) file creation vnode operation (VOP) routine expects the parent vnode to be a directory vnode pointer. However, the NFS layer can pass a stale file vnode pointer, which may cause unexpected results, such as a hang, during VOP handling.

RESOLUTION:
The code is modified to check the vnode type of the parent vnode pointer at the beginning of the create VOP call, and to return an error if it is not a directory vnode pointer.

* 3042407 (Tracking ID: 3031869)

SYMPTOM:
In a multi-CPU environment, the "vxfsstat -b" command does not print the correct information about the maximum-size buffer.

DESCRIPTION:
The "vx_bc_bufhwm" parameter represents the maximum amount of memory that can be used to cache VxFS metadata. When the kctune(1M) command is used to tune the "vxfs_bc_bufhwm" parameter to a different value, the tunable is not set correctly because of incorrect arithmetic. As a consequence, the "vxfsstat -b" command reports that the maximum-size buffer has increased, even though the "vxfs_bc_bufhwm" parameter is tuned to a lower value.

RESOLUTION:
The code is modified to correct the arithmetic for tuning the "vx_bc_bufhwm" parameter.

* 3042427 (Tracking ID: 2850738)

SYMPTOM:
The Veritas File System (VxFS) module allocates memory with MEMWAIT in its callback routine during a low-memory condition. This causes the system to hang with the following stack trace:
swtch_to_thread (inlined) slpq_swtch_core+0x520 real_sleep (inlined) sleep+0x400 mrg_reserve_swapmem (inlined) $cold_steal_swap+0x460 $cold_kalloc_nolgpg+0x4b0 kalloc_internal (inlined) $cold_kmem_arena_refill+0x650 kmem_arena_varalloc+0x280 vx_alloc (inlined) vx_worklist_enqueue+0x40 vx_buffer_kmcache_callback+0x160 kmem_gc_arena (inlined) foreach_arena_ingroup+0x840 kmem_garbage_collect_group (inlined) kmem_garbage_collect+0x390 kmem_arena_gc+0x240 kthread_daemon_startup+0x90

DESCRIPTION:
The VxFS kernel memory callback routine allocates memory with MEMWAIT. As a result, the system hangs in a low-memory condition.

RESOLUTION:
The code is modified to allocate memory without waiting in the VxFS kernel memory callback routine.
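Incident 2850738 illustrates the general rule that a memory-reclaim callback must not perform a sleeping allocation, since it may itself be what the allocator is waiting on. A sketch of the corrected shape, with invented flags (ALLOC_WAIT/ALLOC_NOWAIT) standing in for the platform's real allocation modes:

    #include <stddef.h>

    #define ALLOC_WAIT    0   /* may sleep until memory is available */
    #define ALLOC_NOWAIT  1   /* fail immediately instead of sleeping */

    extern void *mem_alloc(size_t sz, int mode);

    /* Called by the kernel when memory is scarce. Sleeping here
     * (ALLOC_WAIT) can deadlock: the garbage collector would wait on
     * the memory that only this callback can help free. */
    void reclaim_callback(void)
    {
        void *work = mem_alloc(64, ALLOC_NOWAIT);

        if (work == NULL)
            return;          /* low memory: skip this pass rather than hang */

        /* ... queue the reclaim work item ... */
    }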
* 3042479 (Tracking ID: 3042460)

SYMPTOM:
On high-end configurations (single-nPar or standalone systems) with more than 128 processing units, workloads that frequently translate pathnames to vnodes (open, stat, find, and so on) may show reduced performance due to vnode spinlock contention.

DESCRIPTION:
An efficient locking technique is not used in the pathname-traversal component of VxFS.

RESOLUTION:
The code has been modified to use a locking mechanism called "shared write spinlocks", which provides an efficient means to lock a vnode. To make use of this new locking, the following set of products, including this patch, has to be installed on the system:
PHKL_43180
PHKL_43178
SyncShwrspl
VfsShwrsplEnh

* 3042501 (Tracking ID: 3042497)

SYMPTOM:
On high-end configurations (single-nPar or standalone systems) with more than 128 processing units, contention is seen on the locks that serialize the increment and decrement of the active levels used to serialize file system activity.

DESCRIPTION:
In the current implementation, the active level is incremented or decremented under a spinlock, leading to contention on the locks.

RESOLUTION:
The code has been enhanced to increment or decrement the active level atomically.

* 3047980 (Tracking ID: 2439261)

SYMPTOM:
When vx_fiostats_tunable is changed from zero to non-zero, the system panics with the following stack trace:
panic_save_regs_switchstack+0x110() panic+0x410() bad_kern_reference+0xa0() $cold_pfault+0x5c0() vm_hndlr+0x370() bubbleup+0x880() vx_fiostats_do_update+0x140() vx_fiostats_update+0x170() vx_read1+0x10e0() vx_rdwr+0x790() vno_rw+0xd0() rwuio+0x32f() pread+0x121() syscall+0x590() in ??()

DESCRIPTION:
When vx_fiostats_tunable is changed from zero to non-zero, all the in-core inode fiostats attributes are set to NULL. When these attributes are accessed, the system panics due to the NULL pointer dereference.

RESOLUTION:
The code has been modified such that when vx_fiostats_tunable is changed from zero to non-zero, the code verifies whether the fiostats attributes of an inode are NULL before accessing them. This prevents the panic.

* 3073371 (Tracking ID: 3073372)

SYMPTOM:
Contention is observed in the lookup code path when the maximum partition-directory level is set to 3 and the default partition-directory threshold is 32000.

DESCRIPTION:
An enhancement is required to change the default maximum partition-directory level to 2 and the default partition-directory threshold (the directory size beyond which partition directories take effect) to 32768.

RESOLUTION:
An enhancement is made to change the default maximum partition-directory level to 2 and the default partition-directory threshold to 32768. The man pages are updated to reflect these changes.

INSTALLING THE PATCH
--------------------
1. Installing the VxFS 5.1 SP1RP3P12 patch:

a) If you install this patch on a CVM cluster, install it one system at a time so that all the nodes are not brought down simultaneously.

b) VxFS 5.1 (GA) must be installed before applying these patches.

c) To verify the Veritas File System level, enter:

   # swlist -l product | egrep -i 'VRTSvxfs'
   VRTSvxfs   5.1.100.000   VERITAS File System

d) All prerequisite/corequisite patches have to be installed. The kernel patch requires a system reboot for both installation and removal.

e) To install the patch, enter the following command:

   # swinstall -x autoreboot=true -s <patch_directory> PHCO_44712 PHKL_44729

In case the patch is not registered, the patch can be registered using the following command:

   # swreg -l depot <patch_directory>

where <patch_directory> is the absolute path where the patch resides.

REMOVING THE PATCH
------------------
Removing the VxFS 5.1 SP1RP3P12 patches:

a) To remove the patch, enter the following command:

   # swremove -x autoreboot=true PHCO_44712 PHKL_44729

SPECIAL INSTRUCTIONS
--------------------
NONE

OTHERS
------
NONE