* * * READ ME * * *
* * * Veritas File System 7.3.1 * * *
* * * Patch 100 * * *

Patch Date: 2018-03-01

This document provides the following information:

* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH
* KNOWN ISSUES

PATCH NAME
----------
Veritas File System 7.3.1 Patch 100

OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL6 x86-64

PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs

BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* Veritas InfoScale Foundation 7.3.1
* Veritas InfoScale Storage 7.3.1
* Veritas InfoScale Enterprise 7.3.1

SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 7.3.1.100

* 3933810 (3830300) Degraded CPU performance during backup of Oracle archive logs on CFS compared to a local file system.
* 3933819 (3879310) The file system may get corrupted after a failed vxupgrade.
* 3933820 (3894712) ACL permissions are not inherited correctly on a cluster file system.
* 3933824 (3908785) System panic caused by a null page address in the writeback structure when the kswapd process runs.
* 3933828 (3921152) Performance drop caused by vx_dalloc_flush().
* 3933834 (3931761) A cluster-wide hang may be observed under high workload.
* 3933843 (3926972) A recovery event can result in a cluster-wide hang.
* 3933844 (3922259) Forced umount hangs in vx_idrop.
* 3933912 (3922986) Deadlock involving the buffer cache iodone routine in CFS.
* 3934841 (3930267) Deadlock between fsq flush threads and writer threads.
* 3936286 (3936285) The fscdsconv command may fail the conversion for disk layout version (DLV) 12 and above.
* 3937536 (3940516) The file resize thread loops infinitely for a file resize operation crossing the 32-bit boundary.
* 3938258 (3938256) Checking file size through SEEK_HOLE returns an incorrect offset/size when delayed allocation is enabled on the file.
* 3939406 (3941034) A VxFS worker thread may continuously spin on a CPU.
* 3940266 (3940235) A hang might be observed if the file system gets disabled while ENOSPC handling is being done by inactive processing.
* 3940368 (3940268) The file system might get disabled if the size of a directory surpasses the vx_dexh_sz value.
* 3940652 (3940651) The vxupgrade command might fail while upgrading Disk Layout Version (DLV) 10 to any higher DLV version.
* 3940830 (3937042) Data corruption seen when issuing writev with a mixture of named page and anonymous page buffers.

DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: 7.3.1.100

* 3933810 (Tracking ID: 3830300)

SYMPTOM:
Heavy CPU usage while Oracle archive processes are running on a clustered file system.

DESCRIPTION:
The poor read performance in this case was caused by fragmentation, which mainly happens when multiple archivers run on the same node. The allocation pattern of the Oracle archiver processes is:
1. Write the header with O_SYNC.
2. ftruncate the file up to its final size (a few GBs, typically).
3. Issue lio_listio with 1 MB iocbs.
All allocations done in this manner are internal allocations, i.e. allocations below the file size instead of allocations past the file size. Internal allocations are done at most 8 pages at a time, so when multiple processes do this, they all receive these 8 pages alternately and the file system becomes very fragmented.

RESOLUTION:
Added a tunable which allocates ZFOD extents when ftruncate tries to increase the size of the file, instead of creating a hole. This eliminates the allocations internal to the file size and thus the fragmentation. Fixed the earlier implementation of the same fix, which ran into locking issues.
Also fixed the performance issue seen while writing from the secondary node.

* 3933819 (Tracking ID: 3879310)

SYMPTOM:
The file system may get corrupted after the file system is frozen during vxupgrade. The full fsck gives the following errors:
UX:vxfs fsck: ERROR: V-3-20451: No valid device inodes found
UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate

DESCRIPTION:
vxupgrade requires the file system to be frozen during part of its operation. Corruption can be detected while the freeze is in progress and the full fsck flag can be set on the file system; however, this does not stop vxupgrade from proceeding. At a later stage of vxupgrade, after structures related to the new disk layout have been updated on disk, VxFS frees up and zeroes out some of the old metadata inodes. If any error occurs after this point (because the full fsck flag is set), the file system needs to revert completely to the previous version as of the time of the full fsck. Since the metadata corresponding to the previous version has already been cleared, the full fsck cannot proceed and gives the error.

RESOLUTION:
The code is modified to check for the full fsck flag after freezing the file system during vxupgrade. Also, the file system is disabled if an error occurs after writing the new metadata on disk. This forces the newly written metadata to be loaded into memory on the next mount.

* 3933820 (Tracking ID: 3894712)

SYMPTOM:
ACL permissions are not inherited correctly on a cluster file system.

DESCRIPTION:
The ACL counts stored on a directory inode get reset every time the directory inode's ownership is switched between nodes. When ownership of the directory inode comes back to the node which previously gave it up, ACL permissions were not inherited correctly for newly created files.

RESOLUTION:
Modified the source so that the ACLs are inherited correctly.
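The inheritance behavior this fix restores can be sanity-checked with a short sketch. This is an illustration only: the directory is a temporary stand-in for a CFS directory (on a cluster you would use a VxFS/CFS mount point), and the root user is used purely so the sketch runs without creating accounts.

```shell
# Sketch: default (inheritable) ACL entries on a directory should
# propagate to newly created files. Paths here are examples; on the
# affected systems this would be run on a cluster file system mount.
dir=$(mktemp -d)                  # stand-in for a CFS directory
setfacl -d -m u:root:rwx "$dir"   # default ACL entry to be inherited
touch "$dir/newfile"
getfacl "$dir/newfile" | grep "user:root"   # inherited entry should appear
rm -rf "$dir"
```

On the unfixed cluster file system, the inherited `user:` entry could be missing after directory inode ownership moved between nodes and back.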
* 3933824 (Tracking ID: 3908785)

SYMPTOM:
System panic observed because of a null page address in the writeback structure when the kswapd process runs.

DESCRIPTION:
The secfs2/encryptfs layers used the write VOP as a hook when kswapd is triggered to free pages. Ideally kswapd should call the writepage() routine, where the writeback structures are filled in correctly. When the write VOP is called because of the hook in secfs2/encryptfs, the writeback structures are cleared, resulting in a null page address.

RESOLUTION:
Code changes have been made to call the VxFS kswapd routine only if a valid page address is present.

* 3933828 (Tracking ID: 3921152)

SYMPTOM:
Performance drop. Core dump shows threads doing vx_dalloc_flush().

DESCRIPTION:
An implicit typecast error in vx_dalloc_flush() can cause this performance issue.

RESOLUTION:
The code is modified to do an explicit typecast.

* 3933834 (Tracking ID: 3931761)

SYMPTOM:
A cluster-wide hang may be observed in a race scenario if a freeze gets initiated while there are multiple pending lazy isize update workitems in the worklist.

DESCRIPTION:
If the lazy_isize_enable tunable is ON and "ls -l" is executed frequently from a non-writing node of the cluster, a huge number of workitems accumulate for the worker threads to process. If a workitem holding active level 1 is enqueued after these workitems and a cluster-wide freeze gets initiated, a deadlock results: the worker threads are exhausted processing the lazy isize update workitems, and the workitem enqueued behind them never gets a chance to be processed.

RESOLUTION:
Code changes have been made to handle this race condition.

* 3933843 (Tracking ID: 3926972)

SYMPTOM:
Once a node reboots or leaves the cluster, the whole cluster can hang.

DESCRIPTION:
This is a three-way deadlock, in which a glock grant could block the recovery while trying to cache the grant against an inode.
When it tries for the ilock, if that lock is held by an hlock revoke which is waiting to get a GLM lock (in this case the cbuf lock), it will not be able to get it because a recovery is in progress. The recovery cannot proceed because the glock grant thread blocked it. Hence the whole cluster hangs.

RESOLUTION:
The fix is to avoid taking the ilock in GLM context if it is not available.

* 3933844 (Tracking ID: 3922259)

SYMPTOM:
A forced umount hangs with a stack like this:
- vx_delay
- vx_idrop
- vx_quotaoff_umount2
- vx_detach_fset
- vx_force_umount
- vx_aioctl_common
- vx_aioctl
- vx_admin_ioctl
- vxportalunlockedkioctl
- vxportalunlockedioctl
- do_vfs_ioctl
- SyS_ioctl
- system_call_fastpath

DESCRIPTION:
An opened external quota file was preventing the forced umount from continuing.

RESOLUTION:
Code has been changed so that an opened external quota file is processed properly during the forced umount.

* 3933912 (Tracking ID: 3922986)

SYMPTOM:
System panic because the Linux NMI watchdog detected a LOCKUP in CFS.

DESCRIPTION:
The VxFS buffer cache iodone routine interrupted the inode flush thread, which was trying to acquire the CFS buffer hash lock while releasing the CFS buffer. The iodone routine was in turn blocked by other threads on acquiring the free list lock, and those threads were contending for the CFS buffer hash lock with the inode flush thread. On Linux, the spinlock is a FIFO ticket lock, so once the inode flush thread had taken a ticket on the spinlock, the other threads could not acquire the lock. This caused a deadlock.

RESOLUTION:
Code changes have been made to acquire the CFS buffer hash lock with IRQs disabled.

* 3934841 (Tracking ID: 3930267)

SYMPTOM:
Deadlock between fsq flush threads and writer threads.

DESCRIPTION:
On Linux, under certain circumstances (i.e. to account dirty pages), a writer thread takes a lock on an inode and starts flushing dirty pages, which requires the page lock.
If an fsq flush thread then starts flushing a transaction on the same inode, it needs the inode lock held by the writer thread. The page lock is held by another writer thread, which is waiting for transaction space that can only be freed by the fsq flush thread. This leads to a deadlock between these three threads.

RESOLUTION:
Code is modified to add a new flag which skips the dirty page accounting.

* 3936286 (Tracking ID: 3936285)

SYMPTOM:
The fscdsconv command may fail the conversion for disk layout version 12 and above. After exporting the file system for use on the specified target, it fails to mount on that target with the error below:

# /opt/VRTS/bin/mount
UX:vxfs mount: ERROR: V-3-20012: not a valid vxfs file system
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version

When importing the file system on the target for use on the same system, it asks for a full fsck during mount. After the full fsck, the file system mounts successfully, but fsck gives the messages below:

# /opt/VRTS/bin/fsck -y -o full /dev/vx/rdsk/mydg/myvol
log replay in progress
intent log does not contain valid log entries
pass0 - checking structural files
fileset 1 primary-ilist inode 34 (SuperBlock) failed validation clear? (ynq)y
pass1 - checking inode sanity and blocks
rebuild structural files? (ynq)y
pass0 - checking structural files
pass1 - checking inode sanity and blocks
pass2 - checking directory linkage
pass3 - checking reference counts
pass4 - checking resource maps
corrupted CUT entries, clear? (ynq)y
au 0 emap incorrect - fix? (ynq)y
OK to clear log? (ynq)y
flush fileset headers? (ynq)y
set state to CLEAN? (ynq)y

DESCRIPTION:
While checking the file system version in fscdsconv, the check for DLV 12 and above was missing, which triggered this issue.

RESOLUTION:
Code changes have been made to handle file system version 12 and above in the fscdsconv command.
* 3937536 (Tracking ID: 3940516)

SYMPTOM:
The file resize thread loops infinitely when resizing a file to a size greater than 4 TB.

DESCRIPTION:
Because of a vx_u32_t typecast in the vx_odm_resize() function, the resize thread gets stuck inside an infinite loop.

RESOLUTION:
Removed the vx_u32_t typecast in vx_odm_resize() to handle such scenarios.

* 3938258 (Tracking ID: 3938256)

SYMPTOM:
Checking file size through SEEK_HOLE returns an incorrect offset/size when delayed allocation is enabled on the file.

DESCRIPTION:
From recent versions of RHEL7 onwards, the grep command uses the SEEK_HOLE feature to check the current file size and then reads data depending on that size. In VxFS, when dalloc is enabled, the extent is allocated to the file later, but the file size is incremented as soon as the write completes. When checking the file size via SEEK_HOLE, VxFS did not fully consider the dalloc case and returned a stale size based on the extents already allocated to the file instead of the actual file size, which resulted in reading less data than expected.

RESOLUTION:
Code is modified so that VxFS now returns the correct size when dalloc is enabled on a file and SEEK_HOLE is called on that file.

* 3939406 (Tracking ID: 3941034)

SYMPTOM:
During a forced umount, a VxFS worker thread may continuously spin on a CPU.

DESCRIPTION:
During a forced umount, a VxFS worker thread needs a semaphore to drop the super block reference, but that semaphore is held by the vxumount thread, which is itself waiting for an event to happen. This causes a softlockup panic on the system, because the VxFS worker thread continuously spins on a CPU trying to grab the semaphore.

RESOLUTION:
Code changes have been made to fix this issue.

* 3940266 (Tracking ID: 3940235)

SYMPTOM:
A hang might be observed if the file system gets disabled while ENOSPC handling is being done by inactive processing.
The stack trace might look like:
cv_wait+0x3c()
delay_common+0x70()
vx_extfree1+0xc08()
vx_extfree+0x228()
vx_te_trunc_data+0x125c()
vx_te_trunc+0x878()
vx_trunc_typed+0x230()
vx_trunc_tran2+0x104c()
vx_trunc_tran+0x22c()
vx_trunc+0xcf0()
vx_inactive_remove+0x4ec()
vx_inactive_tran+0x13a4()
vx_local_inactive_list+0x14()
vx_inactive_list+0x6e4()
vx_workitem_process+0x24()
vx_worklist_process+0x1ec()
vx_worklist_thread+0x144()
thread_start+4()

DESCRIPTION:
In the smapchange function, it is possible in case of races that the SMAP records the old state as VX_EAU_FREE or VX_EAU_ALLOCATED but the corresponding EMAP is not updated. This happens if the concerned flag gets reset to 0 by some other thread in between. This leads to an fm_dirtycnt leak, which causes a hang some time afterwards.

RESOLUTION:
Code changes have been made to fix the issue by using a local variable instead of the global dflag variable directly, which can get reset to 0.

* 3940368 (Tracking ID: 3940268)

SYMPTOM:
A file system with disk layout version 13 might get disabled if the size of a directory surpasses the vx_dexh_sz value.

DESCRIPTION:
When the LDH (large directory hash) directory fills up and its buckets are full, the size of the hash directory is extended. For this, a reorg inode is created and the extent map of the LDH attribute inode is copied into the reorg inode using the extent map reorg function. That function checks whether the extent reorg structure was passed for the same inode; if not, the extent copy does not proceed. The extent reorg structure is set up accordingly, but while setting up the fileset index, the inode's i_fsetindex is used. From disk layout version 13 onwards the attribute inode is overlaid, and because of these changes i_fsetindex is no longer set in the attribute inode and remains 0. Hence the check in the extent map reorg function fails, resulting in the file system being disabled.

RESOLUTION:
Code has been modified to pass the correct fileset.
* 3940652 (Tracking ID: 3940651)

SYMPTOM:
During a Disk Layout Version (DLV) upgrade, the vxupgrade command might observe a hang.

DESCRIPTION:
vxupgrade does a lookup on the history inode to identify the mkfs version. In the case of CFS, the lookup requires the RWLOCK or GLOCK on the inode.

RESOLUTION:
Code changes have been made to take the RWLOCK and GLOCK on the inode.

* 3940830 (Tracking ID: 3937042)

SYMPTOM:
Data corruption seen when issuing writev with a mixture of named page and anonymous page buffers.

DESCRIPTION:
During writes, VxFS prefaults all of the user buffers into the kernel and decides the write length depending on this prefault length. In the case of mixed page buffers, VxFS issues the prefault separately for each page type, i.e. for the named page and the anonymous page. This reduces the length to be written and triggers the page create optimization. Since VxFS erroneously enabled the page create optimization, data corruption was seen on disk.

RESOLUTION:
Code is modified so that VxFS does not enable the page create optimization when a short prefault is seen.

INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Please note that the installation of this P-Patch will cause downtime.

To install the patch, perform the following steps on at least one node in the cluster:
1. Copy the patch fs-rhel6_x86_64-Patch-7.3.1.100.tar.gz to /tmp
2. Untar fs-rhel6_x86_64-Patch-7.3.1.100.tar.gz to /tmp/hf
   # mkdir /tmp/hf
   # cd /tmp/hf
   # gunzip /tmp/fs-rhel6_x86_64-Patch-7.3.1.100.tar.gz
   # tar xf /tmp/fs-rhel6_x86_64-Patch-7.3.1.100.tar
3. Install the hotfix (note that the installation of this P-Patch will cause downtime):
   # pwd
   /tmp/hf
   # ./installVRTSvxfs731P100 [ ...]

You can also install this patch together with the 7.3.1 maintenance release using Install Bundles:
1. Download this patch and extract it to a directory
2.
Change to the Veritas InfoScale 7.3.1 directory and invoke the installer script with the -patch_path option, where -patch_path points to the patch directory
   # ./installer -patch_path [] [ ...]

Install the patch manually:
--------------------------
# rpm -Uvh VRTSvxfs-7.3.1.100-RHEL6.x86_64.rpm

REMOVING THE PATCH
------------------
# rpm -e rpm_name

KNOWN ISSUES
------------
* Tracking ID: 3933818

SYMPTOM:
Oracle database start failure, with a trace log like this:
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 304 (block # 722821)
ORA-01110: data file 304:
ORA-17500: ODM err:ODM ERROR V-41-4-2-231-28 No space left on device

WORKAROUND:
None

SPECIAL INSTRUCTIONS
--------------------
NONE

OTHERS
------
NONE