* * * READ ME * * *
* * * Veritas File System 6.0.3 * * *
* * * Public Hot Fix 3 * * *
Patch Date: 2013-12-09


This document provides the following information:

   * PATCH NAME
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * SUMMARY OF INCIDENTS FIXED BY THE PATCH
   * DETAILS OF INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas File System 6.0.3 Public Hot Fix 3


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL6 x86-64


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSodm
VRTSvxfs


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas Storage Foundation for Oracle RAC 6.0.1
   * Veritas Storage Foundation Cluster File System 6.0.1
   * Veritas Storage Foundation 6.0.1
   * Veritas Storage Foundation High Availability 6.0.1
   * Symantec VirtualStore 6.0.1


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 6.0.300.300

* 3384781 (3384775) Installing patch 6.0.3.200 on RHEL 6.4 or earlier
  RHEL 6.* versions fails with "ERROR: No appropriate modules found."

Patch ID: 6.0.300.200

* 3349650 (3349649) ODM modules fail to load on RHEL6.5.
* 3349652 (3349651) VxFS modules fail to load on RHEL6.5.
* 3356841 (2059611) Panic in vx_unlockmap() due to NULL ml_tranp.
* 3356845 (3331419) System panic because of kernel stack overflow.
* 3356892 (3259634) A Cluster File System having more than 4G blocks gets
  corrupted.
* 3356895 (3253210) File system hang when the space limit is reached.
* 3356909 (3335272) mkfs dumps core when the logsize given is not aligned.
* 3357264 (3350804) System panic on a RHEL6 setup due to kernel stack
  overflow corruption.
* 3357278 (3340286) Persistent tunable setting of "dalloc_enable" gets reset
  to the default after a file system resize.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following Symantec incidents:

Patch ID: 6.0.300.300

* 3384781 (Tracking ID: 3384775)

SYMPTOM:
Installing patch 6.0.3.200 on RHEL 6.4 or earlier RHEL 6.* versions fails
with "ERROR: No appropriate modules found."

# /etc/init.d/vxfs start
ERROR: No appropriate modules found.
Error in loading module "vxfs". See documentation.
Failed to create /dev/vxportal
ERROR: Module fdd does not exist in /proc/modules
ERROR: Module vxportal does not exist in /proc/modules
ERROR: Module vxfs does not exist in /proc/modules

DESCRIPTION:
The VRTSvxfs and VRTSodm rpms ship four different sets of modules, one each
for RHEL 6.1/6.2, RHEL 6.3, RHEL 6.4, and RHEL 6.5. However, patch 6.0.3.200
contains only the RHEL 6.5 modules, so installation on earlier RHEL 6.*
versions fails.

RESOLUTION:
A superseding patch, 6.0.3.300, will be released that includes the modules
for all RHEL 6.* versions; it will be available for download on SORT.

Patch ID: 6.0.300.200

* 3349650 (Tracking ID: 3349649)

SYMPTOM:
ODM modules fail to load on RHEL6.5, and the following error messages are
reported in the system log:

kernel: vxodm: disagrees about version of symbol putname
kernel: vxodm: disagrees about version of symbol getname

DESCRIPTION:
In RHEL6.5, the kernel interfaces for getname and putname used by ODM have
changed.

RESOLUTION:
The code is modified to use the latest definitions of the getname and
putname kernel interfaces.

* 3349652 (Tracking ID: 3349651)

SYMPTOM:
VxFS modules fail to load on RHEL6.5, and the following error messages are
reported in the system log:

kernel: vxfs: disagrees about version of symbol putname
kernel: vxfs: disagrees about version of symbol getname

DESCRIPTION:
In RHEL6.5, the kernel interfaces for getname and putname used by VxFS have
changed.

RESOLUTION:
The code is modified to use the latest definitions of the getname and
putname kernel interfaces.

* 3356841 (Tracking ID: 2059611)

SYMPTOM:
The system panics because of a NULL tranp in vx_unlockmap().

DESCRIPTION:
vx_unlockmap() unlocks a map structure of the file system. While the map is
being handled, its hold count is incremented. vx_unlockmap() checks whether
the mlink doubly linked list is empty, but the asynchronous vx_mapiodone()
routine can change the list at an unpredictable time once the hold count is
zero.

RESOLUTION:
The evaluation order inside vx_unlockmap() is changed so that the remaining
checks are skipped when the map hold count is zero.
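The following minimal C sketch illustrates the kind of short-circuit
reordering this resolution describes. It is illustrative only; the structure
and field names (map_lock, hold, head) are hypothetical and do not come from
the VxFS source.

    #include <stddef.h>

    /*
     * Hypothetical sketch of the evaluation-order fix (not the actual
     * VxFS source). Before the fix, the empty-list check could examine
     * the mlink list even when the hold count was zero, racing with an
     * asynchronous map-I/O-done routine that unlinks entries.
     */
    struct mlink;               /* pending map I/O record (opaque here) */

    struct map_lock {
        int           hold;     /* non-zero while the map is in use     */
        struct mlink *head;     /* doubly linked list of pending I/Os   */
    };

    static void unlock_map(struct map_lock *mp)
    {
        /*
         * Fixed order: test the hold count first. && evaluates left to
         * right and short-circuits, so the list is never examined once
         * the hold count is zero and the async routine may be rewriting
         * the links.
         */
        if (mp->hold != 0 && mp->head != NULL) {
            /* ... drain the pending-I/O list safely ... */
        }
        /* ... drop the map lock ... */
    }

    int main(void)
    {
        struct map_lock m = { 0, NULL };   /* hold count already zero */
        unlock_map(&m);                    /* list check is skipped   */
        return 0;
    }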
* 3356845 (Tracking ID: 3331419)

SYMPTOM:
The machine panics with the following stack trace:

 #0 [ffff883ff8fdc110] machine_kexec at ffffffff81035c0b
 #1 [ffff883ff8fdc170] crash_kexec at ffffffff810c0dd2
 #2 [ffff883ff8fdc240] oops_end at ffffffff81511680
 #3 [ffff883ff8fdc270] no_context at ffffffff81046bfb
 #4 [ffff883ff8fdc2c0] __bad_area_nosemaphore at ffffffff81046e85
 #5 [ffff883ff8fdc310] bad_area at ffffffff81046fae
 #6 [ffff883ff8fdc340] __do_page_fault at ffffffff81047760
 #7 [ffff883ff8fdc460] do_page_fault at ffffffff815135ce
 #8 [ffff883ff8fdc490] page_fault at ffffffff81510985
    [exception RIP: print_context_stack+173]
    RIP: ffffffff8100f4dd  RSP: ffff883ff8fdc548  RFLAGS: 00010006
    RAX: 00000010ffffffff  RBX: ffff883ff8fdc6d0  RCX: 0000000000002755
    RDX: 0000000000000000  RSI: 0000000000000046  RDI: 0000000000000046
    RBP: ffff883ff8fdc5a8   R8: 000000000002072c   R9: 00000000fffffffb
    R10: 0000000000000001  R11: 000000000000000c  R12: ffff883ff8fdc648
    R13: ffff883ff8fdc000  R14: ffffffff81600460  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff883ff8fdc540] print_context_stack at ffffffff8100f4d1
#10 [ffff883ff8fdc5b0] dump_trace at ffffffff8100e4a0
#11 [ffff883ff8fdc650] show_trace_log_lvl at ffffffff8100f245
#12 [ffff883ff8fdc680] show_trace at ffffffff8100f275
#13 [ffff883ff8fdc690] dump_stack at ffffffff8150d3ca
#14 [ffff883ff8fdc6d0] warn_slowpath_common at ffffffff8106e2e7
#15 [ffff883ff8fdc710] warn_slowpath_null at ffffffff8106e33a
#16 [ffff883ff8fdc720] hrtick_start_fair at ffffffff810575eb
#17 [ffff883ff8fdc750] pick_next_task_fair at ffffffff81064a00
#18 [ffff883ff8fdc7a0] schedule at ffffffff8150d908
#19 [ffff883ff8fdc860] __cond_resched at ffffffff81064d6a
#20 [ffff883ff8fdc880] _cond_resched at ffffffff8150e550
#21 [ffff883ff8fdc890] vx_nalloc_getpage_lnx at ffffffffa041afd5 [vxfs]
#22 [ffff883ff8fdca80] vx_nalloc_getpage at ffffffffa03467a3 [vxfs]
#23 [ffff883ff8fdcbf0] vx_do_getpage at ffffffffa034816b [vxfs]
#24 [ffff883ff8fdcdd0] vx_do_read_ahead at ffffffffa03f705e [vxfs]
#25 [ffff883ff8fdceb0] vx_read_ahead at ffffffffa038ed8a [vxfs]
#26 [ffff883ff8fdcfc0] vx_do_getpage at ffffffffa0347732 [vxfs]
#27 [ffff883ff8fdd1a0] vx_getpage1 at ffffffffa034865d [vxfs]
#28 [ffff883ff8fdd2f0] vx_fault at ffffffffa03d4788 [vxfs]
#29 [ffff883ff8fdd400] __do_fault at ffffffff81143194
#30 [ffff883ff8fdd490] handle_pte_fault at ffffffff81143767
#31 [ffff883ff8fdd570] handle_mm_fault at ffffffff811443fa
#32 [ffff883ff8fdd5e0] __get_user_pages at ffffffff811445fa
#33 [ffff883ff8fdd670] get_user_pages at ffffffff81144999
#34 [ffff883ff8fdd690] vx_dio_physio at ffffffffa041d812 [vxfs]
#35 [ffff883ff8fdd800] vx_dio_rdwri at ffffffffa02ed08e [vxfs]
#36 [ffff883ff8fdda20] vx_write_direct at ffffffffa044f490 [vxfs]
#37 [ffff883ff8fddaf0] vx_write1 at ffffffffa04524bf [vxfs]
#38 [ffff883ff8fddc30] vx_write_common_slow at ffffffffa0453e4b [vxfs]
#39 [ffff883ff8fddd30] vx_write_common at ffffffffa0454ea8 [vxfs]
#40 [ffff883ff8fdde00] vx_write at ffffffffa03dc3ac [vxfs]
#41 [ffff883ff8fddef0] vfs_write at ffffffff81181078
#42 [ffff883ff8fddf30] sys_pwrite64 at ffffffff81181a32
#43 [ffff883ff8fddf80] system_call_fastpath at ffffffff8100b072

DESCRIPTION:
The panic occurs because the kernel refers to a thread_info structure, from
the scheduler, that has been corrupted by a stack overflow. During a direct
I/O write, user-space pages must be pre-faulted through the
__get_user_pages() code path. This code path is very deep and can consume a
lot of stack space.

RESOLUTION:
The kernel stack consumption in this code path is reduced by approximately
400-500 bytes through various changes in the way pre-faulting is done.

* 3356892 (Tracking ID: 3259634)

SYMPTOM:
A CFS that has more than 4G blocks is corrupted because some file system
metadata is zeroed out incorrectly. The blocks that get zeroed out may
contain any metadata or file data and can be located anywhere on the disk.
The problem occurs only with the following combinations of file system size
and FS block size:

   1kb block size and FS size > 4TB
   2kb block size and FS size > 8TB
   4kb block size and FS size > 16TB
   8kb block size and FS size > 32TB

DESCRIPTION:
When a CFS is mounted for the first time on a secondary node, a per-node
intent log is created. When the intent log is created, the blocks newly
allocated to it are zeroed out. The start offset and the length to be
cleared are passed to the block-clearing routine. Due to a miscalculation, a
wrong start offset is passed. This results in the disk content at that
offset getting zeroed out incorrectly. This content can be file system
metadata or file data. If it is metadata, the corruption is detected when
the metadata is accessed and the file system is marked for a full fsck(1M).

RESOLUTION:
The code is modified so that the correct start offset is passed to the
block-clearing routine.
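The affected size combinations all correspond to the file system exceeding
2^32 blocks, which is consistent with a block-to-byte-offset conversion
wrapping in 32-bit arithmetic. The C sketch below illustrates that failure
mode only under that assumption; the names and values are hypothetical, not
the VxFS code.

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t blkno = 0x100000123ULL;  /* block number above 2^32;    */
                                          /* with 1 KB blocks this needs */
                                          /* a file system > 4 TB        */
        uint32_t bsize = 1024;            /* 1 KB file system block size */

        /* Buggy variant: the conversion wraps in 32-bit arithmetic, so  */
        /* the resulting start offset points at an unrelated disk region */
        /* which would then be zeroed out.                               */
        uint32_t bad  = (uint32_t)blkno * bsize;

        /* Fixed variant: the whole conversion stays in 64-bit math.     */
        uint64_t good = blkno * (uint64_t)bsize;

        printf("wrapped offset: 0x%" PRIx32 "\n", bad);
        printf("correct offset: 0x%" PRIx64 "\n", good);
        return 0;
    }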
* 3356895 (Tracking ID: 3253210)

SYMPTOM:
The file system hangs with the following stack trace:

vx_svar_sleep_unlock
default_wake_function
__wake_up
vx_event_wait
vx_extentalloc_handoff
vx_te_bmap_alloc
vx_bmap_alloc_typed
vx_bmap_alloc
vx_bmap
vx_exh_allocblk
vx_exh_splitbucket
vx_exh_split
vx_dopreamble
vx_rename_tran
vx_pd_rename

DESCRIPTION:
VxFS uses a large directory hash for large directories when the hash is
enabled through the vx_dexh_sz tunable. When a file is renamed, a new
directory entry is inserted into that hash table, which can result in a hash
split. A hash split fails the current transaction, and the transaction is
retried after some housekeeping work is done; this work includes allocating
more space for the hash table. However, VxFS does not check the return value
of this preamble job. When the file system runs out of space, the rename
transaction is therefore retried forever without knowing whether the
preamble job was able to allocate more space.

RESOLUTION:
The code is changed to exit the retry loop when the preamble job returns
ENOSPC.
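The following minimal C sketch shows the general shape of the fixed retry
loop described above. The helper names (try_rename_tran, grow_dir_hash) are
hypothetical stand-ins for VxFS internals, stubbed so the sketch compiles
and runs on its own.

    #include <errno.h>
    #include <stdio.h>

    /* Hypothetical stand-ins for the real VxFS routines (stubs). */
    static int try_rename_tran(void) { return -1; }     /* always retries */
    static int grow_dir_hash(void)   { return ENOSPC; } /* FS is full     */

    /*
     * Sketch of the fixed retry loop: the preamble's return value is
     * now checked, so a full file system (ENOSPC) ends the loop instead
     * of re-entering the transaction forever.
     */
    static int rename_transaction(void)
    {
        int err;

        for (;;) {
            if (try_rename_tran() == 0)
                return 0;           /* transaction committed          */

            err = grow_dir_hash();  /* housekeeping (preamble) step   */
            if (err == ENOSPC)
                return err;         /* fix: stop retrying when full   */
        }
    }

    int main(void)
    {
        if (rename_transaction() == ENOSPC)
            fprintf(stderr, "rename failed: no space left on device\n");
        return 0;
    }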
* 3356909 (Tracking ID: 3335272)

SYMPTOM:
mkfs dumps core when the logsize given is not aligned. The stack may look
like the following:

(gdb) bt
#0  find_space ()
#1  place_extents ()
#2  fill_fset ()
#3  main ()
(gdb)

DESCRIPTION:
While creating a VxFS file system using mkfs, if the given logsize is not
properly aligned, the calculations for placing the RCQ extents can go wrong
and find no place for them. This can lead to an illegal memory access of the
AU bitmap and result in a core dump.

RESOLUTION:
The RCQ extents are now placed in the same AU where the log extents are
allocated, if there is enough space left.

* 3357264 (Tracking ID: 3350804)

SYMPTOM:
On RHEL6 setups, VxFS can sometimes report a system panic with the error
"Thread overran stack, or stack corrupted".

DESCRIPTION:
On RHEL6, memory allocations made from an already deep stack can consume
significant additional stack space, especially when the system is under
memory pressure and the allocation takes the page allocator route. This
breaks the earlier assumptions in the stack depth calculations.

RESOLUTION:
On RHEL6, a check for the available stack size is added before doing
low-stack allocations, and the various stack depth calculations have been
re-tuned for each distribution separately to minimize performance penalties.

* 3357278 (Tracking ID: 3340286)

SYMPTOM:
The persistent tunable setting of "dalloc_enable" gets reset to the default
after a file system resize.

DESCRIPTION:
A file system resize operation triggers a reinitialization of the file
system. During reinitialization, the tunable was reset to its default value
instead of the old value being retained.

RESOLUTION:
The code is fixed to retain the old "dalloc_enable" tunable value across a
resize.


INSTALLING THE PATCH
--------------------
For VRTSvxfs:

RHEL6
# rpm -Uvh VRTSvxfs-6.0.300.300-RHEL6.x86_64.rpm

For VRTSodm:

Before installing the patch, the SF/SFHA/SFCFS/SFRAC stack has to be at the
6.0.3 level. Please refer to the appnote at
http://www.symantec.com/docs/TECH210620 for detailed instructions and
dependencies before installing this patch.

RHEL6
# rpm -Uvh VRTSodm-6.0.300.300-RHEL6.x86_64.rpm


REMOVING THE PATCH
------------------
# rpm -e rpm_name


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE