README VERSION       : 1.1
README CREATION DATE : 2013-01-31
PATCH-ID             : 148481-02
PATCH NAME           : VRTSvxfs 6.0.300.000
BASE PACKAGE NAME    : VRTSvxfs
BASE PACKAGE VERSION : 6.0.100.000
SUPERSEDED PATCHES   : 148481-01
REQUIRED PATCHES     : NONE
INCOMPATIBLE PATCHES : NONE
SUPPORTED PADV       : sol10_sparc (P-PLATFORM, A-ARCHITECTURE, D-DISTRIBUTION, V-VERSION)
PATCH CATEGORY       : CORE, CORRUPTION, HANG, MEMORYLEAK, PANIC, PERFORMANCE
PATCH CRITICALITY    : CRITICAL
HAS KERNEL COMPONENT : YES
ID                   : NONE
REBOOT REQUIRED      : YES
REQUIRE APPLICATION DOWNTIME : YES

PATCH INSTALLATION INSTRUCTIONS:
--------------------------------
Please refer to the Release Notes for installation instructions.

PATCH UNINSTALLATION INSTRUCTIONS:
----------------------------------
Please refer to the Release Notes for uninstallation instructions.

SPECIAL INSTRUCTIONS:
---------------------
NONE

SUMMARY OF FIXED ISSUES:
-----------------------------------------
PATCH ID: 148481-02

2928921 (2843635) Internal testing encounters some failures.
2933290 (2756779) Write and read performance concerns on CFS when running applications that rely on POSIX file-record locking (fcntl).
2933291 (2806466) "fsadm -R" results in a panic at the LVM layer because vx_ts.ts_length is set to 2GB.
2933292 (2895743) Accessing named attributes for some files is slow.
2933294 (2750860) Performance issue due to fragmentation in a CFS cluster.
2933296 (2923105) The upgrade to VRTSvxfs 5.0MP4HFaf hangs in the VxFS preinstall scripts.
2933309 (2858683) Reserve extent attributes are changed after vxrestore, only for files greater than 8192 bytes.
2933313 (2841059) A full fsck fails to clear the corruption in attribute inode 15.
2933325 (2905820) A process hangs during file system access.
2933326 (2827751) When ODM is used with non-VxVM devices, high kernel memory allocation is seen.
2933751 (2916691) Customer experiences hangs when running dedup operations.
2933822 (2624262) FileStore dedup: fsdedup.bin hits an oops at vx_bc_do_brelse.
2937367 (2923867) Internal test hits the assert "f:xted_set_msg_pri1:1".
2976664 (2906018) vx_iread errors occur after a successful log replay and mount of the file system.
2978227 (2857751) Internal testing hits the assert "f:vx_cbdnlc_enter:1a".
2984589 (2977697) vx_idetach generates a kernel core dump while FileStore replication is running.
2987373 (2881211) File ACLs are not preserved properly in checkpoints if the file has a hardlink.
3007184 (3018869) On Solaris 11 update 1, the fsadm command shows that the mount point is not a VxFS file system.
3021281 (3013950) The internal assert "f:vx_info_init:2" is hit during Solaris 11 update 1 validation.

SUMMARY OF KNOWN ISSUES:
-----------------------------------------
3057670 (3060829) Internal tests hit the assert "f:vx_putpage1:1a".

KNOWN ISSUES:
--------------
* INCIDENT NO::3057670 TRACKING ID::3060829

SYMPTOM::
The assert failure will not cause any outage, but may lead to performance issues on systems with large memory and swap file configurations. This is due to a known defect in Solaris 11 update 1; refer to the bug ID below for more details:
Bug ID: 15813035 SUNBT7194962 pageout no longer pushes pages asynchronously

WORKAROUND::
None.

FIXED INCIDENTS:
----------------
PATCH ID: 148481-02

* INCIDENT NO:2928921 TRACKING ID:2843635

SYMPTOM:
During VxFS internal testing, there are some failures during the reorg operation of structural files.

DESCRIPTION:
While the reorg is in progress, a certain ioctl overwrites the error value that is to be returned, which results in an incorrect error value and test failures.

RESOLUTION:
The code is modified so that the error value is not overwritten.

* INCIDENT NO:2933290 TRACKING ID:2756779

SYMPTOM:
Write and read performance concerns on CFS when running applications that rely on POSIX file-record locking (fcntl).
DESCRIPTION:
Use of fcntl on CFS leads to high messaging traffic across nodes, thereby reducing the performance of readers and writers.

RESOLUTION:
Ranges that are file-record locked are now cached on the node whenever possible, to avoid broadcasting messages across the nodes in the cluster.

* INCIDENT NO:2933291 TRACKING ID:2806466

SYMPTOM:
A reclaim operation on a file system mounted on a Logical Volume Manager (LVM) volume, using the fsadm(1M) command with the '-R' option, may panic the system, and the following stack trace is displayed:
vx_dev_strategy+0xc0()
vx_dummy_fsvm_strategy+0x30()
vx_ts_reclaim+0x2c0()
vx_aioctl_common+0xfd0()
vx_aioctl+0x2d0()
vx_ioctl+0x180()

DESCRIPTION:
Thin reclamation is supported only on file systems mounted on a Veritas Volume Manager (VxVM) volume.

RESOLUTION:
The code is modified to error out gracefully if the underlying volume is an LVM volume.

* INCIDENT NO:2933292 TRACKING ID:2895743

SYMPTOM:
It takes too long for 50 Windows 7 clients to log off in parallel when the user profiles are stored on CFS.

DESCRIPTION:
VxFS keeps file creation times and full ACLs for Samba clients in an extended attribute, which is implemented via named streams. VxFS reads the named stream every time, for each ACL object. Reading a named stream is a costly operation, as it results in an open, an opendir, a lookup, and another open to get the file descriptor. The VxFS function vx_nattr_open() holds the rwlock in exclusive mode to read an ACL object stored as an extended attribute. This can cause heavy lock contention: when many threads want the same lock, they are blocked until one of the vx_nattr_open() callers releases it, which takes time because vx_nattr_open() is slow.

RESOLUTION:
The rwlock is now taken in shared mode instead of exclusive mode in the Linux getxattr code path.
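The shared-versus-exclusive distinction behind this fix can be illustrated with a minimal reader-writer lock sketch. This is not VxFS code; the RWLock class and the acl table are hypothetical, written only to show why moving the read path to shared mode removes the contention: any number of readers may hold the lock concurrently, so getxattr-style lookups no longer serialize behind one another.

```python
import threading

class RWLock:
    """Minimal reader-writer lock: readers share, a writer is exclusive."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        # Readers wait only for an active writer, never for each other.
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        # A writer waits for all readers and any other writer to leave.
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

# Hypothetical ACL store, analogous to the named-stream attribute data.
acl = {"profile.dat": "user::rw-"}
lock = RWLock()

def read_acl(name):
    """Read path (getxattr-like): shared mode, readers run concurrently."""
    lock.acquire_shared()
    try:
        return acl[name]
    finally:
        lock.release_shared()

def set_acl(name, value):
    """Update path: exclusive mode, blocks all readers and writers."""
    lock.acquire_exclusive()
    try:
        acl[name] = value
    finally:
        lock.release_exclusive()
```

With the read path in shared mode, the slow named-stream read still happens, but fifty clients reading ACLs in parallel no longer queue behind a single exclusive holder; only an actual ACL update takes the lock exclusively.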
* INCIDENT NO:2933294 TRACKING ID:2750860

SYMPTOM:
On a large file system (4TB or greater), the performance of the write(1) operation with many small request sizes may degrade, and many threads may be found sleeping with the following stack trace:
real_sleep
sleep_one
vx_sleep_lock
vx_lockmap
vx_getemap
vx_extfind
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_uplevel
vx_searchau
vx_extentalloc_device
vx_extentalloc
vx_te_bmap_alloc
vx_bmap_alloc_typed
vx_bmap_alloc
vx_write_alloc3
vx_recv_prealloc
vx_recv_rpc
vx_msg_recvreq
vx_msg_process_thread
kthread_daemon_startup

DESCRIPTION:
For a cluster-mounted file system, the free-extent-search algorithm is not optimized for a large file system (4TB or greater), or for instances where the number of free Allocation Units (AUs) available can be very large.

RESOLUTION:
The code is modified to optimize the free-extent-search algorithm by skipping certain AUs. This reduces the overall search time.

* INCIDENT NO:2933296 TRACKING ID:2923105

SYMPTOM:
Removing the vxfs module from the kernel takes a long time.

DESCRIPTION:
When a large number of buffers have been allocated from the buffer cache, freeing those buffers at module-removal time takes a long time. The algorithm of this process can be improved.

RESOLUTION:
The algorithm is modified so that it no longer keeps traversing the freelists after it has found the free chunk; it breaks out of the search and frees that buffer.

* INCIDENT NO:2933309 TRACKING ID:2858683

SYMPTOM:
The reserve-extent attributes are changed after the vxrestore(1M) operation, for files that are greater than 8192 bytes.

DESCRIPTION:
A local variable is used to hold the number of reserve bytes that is reused during the vxrestore(1M) operation, for a further VX_SETEXT ioctl call for files that are greater than 8K. As a result, the attribute information is changed.
RESOLUTION:
The code is modified to preserve the original variable value until the end of the function.

* INCIDENT NO:2933313 TRACKING ID:2841059

SYMPTOM:
The file system gets marked for a full fsck operation and the following messages are displayed in the system log:
V-2-96: vx_setfsflags - file system fullfsck flag set - vx_ierror
vx_setfsflags+0xee/0x120
vx_ierror+0x64/0x1d0 [vxfs]
vx_iremove+0x14d/0xce0
vx_attr_iremove+0x11f/0x3e0
vx_fset_pnlct_merge+0x482/0x930
vx_lct_merge_fs+0xd1/0x120
vx_lct_merge_fs+0x0/0x120
vx_walk_fslist+0x11e/0x1d0
vx_lct_merge+0x24/0x30
vx_workitem_process+0x18/0x30
vx_worklist_process+0x125/0x290
vx_worklist_thread+0x0/0xc0
vx_worklist_thread+0x6d/0xc0
vx_kthread_init+0x9b/0xb0
V-2-17: vx_iremove_2 - file system inode 15 marked bad incore

DESCRIPTION:
Due to a race condition, a thread tries to remove an attribute inode that has already been removed by another thread. Hence, the file system is marked for a full fsck operation and the attribute inode is marked as 'bad ondisk'.

RESOLUTION:
The code is modified to check whether the attribute inode that a thread is trying to remove has already been removed.

* INCIDENT NO:2933325 TRACKING ID:2905820

SYMPTOM:
If a file is being read via an NFSv4 client, removing the same file on the NFSv4 server may hang if the file system is VxFS. The stack of the hanging thread on the server may look similar to this:
_resume_from_idle+0xf8
resume_return()
swtch()
cv_timedwait_hires()
cv_timedwait()
rfs4_dbe_twait()
deleg_vnevent()
vhead_vnevent()
fop_vnevent()
vnevent_remove()
vx_pd_remove()
vx_remove1_pd()
vx_do_remove()
vx_remove1()
vx_remove_vp()
vx_remove()
fop_remove()
vn_removeat()
vn_remove()
unlink()
_syscall32_save()

DESCRIPTION:
The deleting thread holds the irwlock in exclusive (EXCL) mode and waits for the delegation from the client, while the client holds the delegation and keeps waiting for the irwlock in shared (SH) mode, causing a deadlock.
RESOLUTION:
The NFSv4 FEM monitor is now informed about the deletion of the file before the irwlock is taken in exclusive mode, to avoid the deadlock.

* INCIDENT NO:2933326 TRACKING ID:2827751

SYMPTOM:
When ODM is used with non-VxVM devices, high kernel memory allocation is seen with the following stack:
kmem_slab_alloc+0xac
kmem_cache_alloc+0x2dc
bp_mapin_common+0xdc
vdc_strategy+0x3c
vx_dio_physio+0x654
vx_dio_rdwri+0x4a0
fdd_write_end+0x504
fdd_rw+0x6ac
fdd_odm_rw+0x278
odm_vx_aio+0xb8
odm_vx_io+0x1c
odm_io_issue+0xf8
odm_io_start+0x1e4
odm_io_req+0xb3c
odm_request_io+0xdc

DESCRIPTION:
With a non-VxVM device, ODM should deallocate the kernel memory for self-owned buffers by calling bp_mapout() appropriately.

RESOLUTION:
ODM now calls bp_mapout() appropriately to deallocate the kernel memory for self-owned buffers on non-VxVM devices.

* INCIDENT NO:2933751 TRACKING ID:2916691

SYMPTOM:
fsdedup enters an infinite loop with the following stack:
#5 [ffff88011a24b650] vx_dioread_compare at ffffffffa05416c4
#6 [ffff88011a24b720] vx_read_compare at ffffffffa05437a2
#7 [ffff88011a24b760] vx_dedup_extents at ffffffffa03e9e9b
#11 [ffff88011a24bb90] vx_do_dedup at ffffffffa03f5a41
#12 [ffff88011a24bc40] vx_aioctl_dedup at ffffffffa03b5163

DESCRIPTION:
vx_dedup_extents() does the following to dedup two files:
1. Compare the data extents of the two files that need to be deduped.
2. Split both files' bmaps to make them share the first file's common data extents.
3. Free the duplicate data extents of the second file.
In step 2, during the bmap split, vx_bmap_split() might need to allocate space for the inode's bmap to add new bmap entries, which adds an emap to this transaction. (This condition is more likely to be hit if the dedup is being run on two large files that have interleaved duplicate/distinct data extents, because the files' bmaps need to be split more in that case.)
In step 3, vx_extfree1() does not support a multi-AU extent free if there is already an emap in the same transaction; in this case, it returns VX_ETRUNCMAX.
(Please see incident e569695 for the history of this limitation.)
VX_ETRUNCMAX is a retriable error, so vx_dedup_extents() undoes everything in the transaction and retries from the beginning, then hits the same error again, producing the infinite loop.

RESOLUTION:
vx_te_bmap_split() now always registers a transaction preamble for the bmap split operation in dedup, and vx_dedup_extents() performs the preamble in a separate transaction before it retries the dedup operation.

* INCIDENT NO:2933822 TRACKING ID:2624262

SYMPTOM:
A panic is hit in the vx_bc_do_brelse() function while executing dedup functionality, with the following backtrace:
vx_bc_do_brelse()
vx_mixread_compare()
vx_dedup_extents()
enqueue_entity()
__alloc_pages_slowpath()
__get_free_pages()
vx_getpages()
vx_do_dedup()
vx_aioctl_dedup()
vx_aioctl_common()
vx_rwunlock()
vx_aioctl()
vx_ioctl()
vfs_ioctl()
do_vfs_ioctl()
sys_ioctl()

DESCRIPTION:
While executing the function vx_mixread_compare() in the dedup code path, an error is hit, due to which an allocated data structure remains uninitialized. The panic occurs when this uninitialized data structure is written to in vx_mixread_compare().

RESOLUTION:
The code is changed to free the memory allocated to the data structure on the error path.

* INCIDENT NO:2937367 TRACKING ID:2923867

SYMPTOM:
An assert is hit because VX_RCQ_PROCESS_MSG has a numerically lower priority than VX_IUPDATE_MSG.

DESCRIPTION:
When the primary is about to send a VX_IUPDATE_MSG message to the owner of an inode, about a change to one of the inode's non-transactional fields, it compares the current messaging priority (that of VX_RCQ_PROCESS_MSG) with the priority of the message being sent (VX_IUPDATE_MSG) to avoid a possible deadlock. Here the VX_RCQ_PROCESS_MSG priority was numerically lower than that of VX_IUPDATE_MSG, so the assert was hit.

RESOLUTION:
The VX_RCQ_PROCESS_MSG priority is now numerically higher than that of VX_IUPDATE_MSG, avoiding the assert.
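The priority rule behind this fix can be sketched as follows. The numeric values and the can_send() helper are hypothetical (the real VxFS priorities are internal); the sketch only shows the ordering invariant: a thread processing one message may send another message only if the outgoing message ranks lower, so VX_RCQ_PROCESS_MSG must rank numerically higher than VX_IUPDATE_MSG for the inode-update send to be legal.

```python
# Hypothetical message priorities; the real VxFS values differ.
# Before the fix, VX_RCQ_PROCESS_MSG ranked below VX_IUPDATE_MSG,
# which made the send below illegal and tripped the assert.
VX_IUPDATE_MSG_PRI = 10
VX_RCQ_PROCESS_MSG_PRI = 20  # after the fix: higher than VX_IUPDATE_MSG_PRI

def can_send(current_pri, send_pri):
    """A thread may only send a message of strictly lower priority than
    the one it is currently processing; otherwise the new message could
    queue behind the sender's own class of messages and deadlock."""
    return send_pri < current_pri
```

With the corrected ordering, a thread processing a VX_RCQ_PROCESS_MSG is permitted to send a VX_IUPDATE_MSG, while the reverse direction remains forbidden, preserving the deadlock-avoidance check.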
* INCIDENT NO:2976664 TRACKING ID:2906018

SYMPTOM:
VxFS can detect corrupt metadata content and set the fullfsck flag in the superblock. Leading up to this, the following events would have occurred: a system crash, followed by an fsck intent-log replay that fails to replay the intent log; fsck does not report any error, and instead simply marks the superblock as clean. Therefore, after the system crash the intent log is not replayed and the file system is marked clean, so when the file system is subsequently mounted, extended operations are not completed either. These two failures can then easily result in VxFS reporting corrupt metadata content while the file system is mounted after the crash: corruption errors are reported whenever VxFS attempts to access the inconsistent metadata.

DESCRIPTION:
At mkfs time, one intent log is created in the file system; however, every cluster file system (CFS) mount, be it a primary or secondary mount, has its own intent log within the file system. An additional intent log is therefore created when cluster mounting a CFS secondary for the first time; the CFS primary mount uses the intent log created at mkfs time. To manage the intent logs, and the other extra objects required for CFS, a holding object called a PNOLT (per-node object location table) is also created. Once created, the PNOLTs (and the corresponding intent logs) are never deleted; once a file system has been cluster mounted, it will forevermore contain PNOLTs. Only when a file system that contains PNOLTs is mounted locally (mounted without using 'mount -o cluster') is it potentially exposed to this issue.
In the event of a system crash, a locally mounted file system logically requires an fsck intent-log replay. However, if the file system contains PNOLTs, the intent-log replay is incorrectly skipped: fsck reports that the replay was successful and marks the superblock clean, even though no transactions have actually been replayed (the intent-log replay is silently skipped). The reason fsck silently skips the replay is that each PNOLT has a flag identifying whether its intent log is dirty; after a system crash, this flag signifies whether an intent-log replay is required. When the crash happens while the file system is mounted locally, the PNOLTs are not utilized, but the fsck intent-log replay still checks the flags in the PNOLTs, and these are the wrong flags to check for a locally mounted file system. The fsck intent-log replay therefore believes the intent logs are clean (because the PNOLTs are not marked dirty) and skips the replay of the intent log altogether.

RESOLUTION:
If PNOLTs exist in the file system, VxFS now sets the dirty flag in the CFS primary PNOLT when mounting locally. With this change, in the event of a system crash while the file system was locally mounted, the subsequent fsck intent-log replay correctly utilizes the PNOLT structures and successfully replays the intent log.

* INCIDENT NO:2978227 TRACKING ID:2857751

SYMPTOM:
Internal testing hits the assert "f:vx_cbdnlc_enter:1a" while an upgrade is in progress.

DESCRIPTION:
A clone/fileset should be mounted before an entry for it is added to the DNLC. An attempt to add an entry for an unmounted clone/fileset is not valid.

RESOLUTION:
A check is added to verify that the fileset is mounted before an entry is added to the DNLC.

* INCIDENT NO:2984589 TRACKING ID:2977697

SYMPTOM:
A core dump is generated while removing a clone.
DESCRIPTION:
Suppose a file system contains a character/device special file, and a clone of the file system is subsequently created. While removing the clone, a core dump may occur. Clone removal converts all inodes to 'IFPTI'. The inode of a character special file has 'i_cdev' set and its type is 'IFCHR'. While deleting such an inode with 'vx_idetach()' and converting it to 'IFPTI', a NULL pointer dereference occurs in 'clear_inode', which causes the core dump. To avoid this, 'clear_inode' is called during unmount instead, which does not cause the NULL pointer dereference.

RESOLUTION:
The inode is cleared during unmount processing to prevent the NULL pointer dereference.

* INCIDENT NO:2987373 TRACKING ID:2881211

SYMPTOM:
File ACLs are not preserved properly in checkpoints if the file has a hardlink. File ACLs without hardlinks work fine.

DESCRIPTION:
This issue involves the attribute inode. When an ACL entry is added, if it is in the immediate area, it is propagated to the clone. But when an attribute inode is created, it is not propagated to the checkpoint. A push is missing in the context of the attribute inode, which causes this issue.

RESOLUTION:
The code is modified to propagate ACL entries (in the attribute-inode case) to the clone.

* INCIDENT NO:3007184 TRACKING ID:3018869

SYMPTOM:
The fsadm command shows that the mount point is not a VxFS file system.

DESCRIPTION:
Solaris 11 update 1 has some changes in the function fstatvfs() (VFS layer) which break VxFS's previous assumptions. The statvfs.f_basetype field gets populated with a garbage value instead of "vxfs". So, when fsadm checks the file system type, the check fails and the error is reported.

RESOLUTION:
The code is changed to fetch the correct value for the fstype using OS-provided APIs, so that the statvfs.f_basetype field contains the valid value, i.e. "vxfs".

* INCIDENT NO:3021281 TRACKING ID:3013950

SYMPTOM:
In VxFS internal testing, the assert "f:vx_info_init:2" is hit after installing the stack and rebooting the machine.
DESCRIPTION:
The _ncpu value in the Solaris kernel was increased beyond 640: Solaris 10 update 11 has a limit of 1024, and Solaris 11 update 1 has 3072 as the _ncpu value. In the current VxFS code, the macro VX_CMD_MAX_CPU has the value 640. There is a check for [VX_CMD_MAX_CPU >= VX_MAX_CPU], which fails on Solaris 11 update 1, hence the panic.

RESOLUTION:
The VX_CMD_MAX_CPU limit is changed to 4096.

INCIDENTS FROM OLD PATCHES:
---------------------------
NONE