* * * READ ME * * *
* * * Veritas File System 7.2 * * *
* * * Patch 200 * * *

Patch Date: 2017-06-07


This document provides the following information:

* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH
* KNOWN ISSUES


PATCH NAME
----------
Veritas File System 7.2 Patch 200


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
Solaris 11 SPARC


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* Veritas InfoScale Foundation 7.2
* Veritas InfoScale Storage 7.2
* Veritas InfoScale Enterprise 7.2


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 7.2.0.200
* 3909937 (3908954) Some writes could be missed, causing data loss.
* 3909940 (3894712) ACL permissions are not inherited correctly on a cluster file system.
* 3909947 (3898565) Solaris no longer supports F_SOFTLOCK.
* 3910080 (3870832) Panic due to a race between a force umount and an NFS lock manager vnode get operation.
* 3910083 (3707662) A race between reorg processing and the fsadm timer thread (alarm expiry) leads to a panic in vx_reorg_emap.
* 3910085 (3779916) vxfsconvert fails to upgrade the layout version for a VxFS file system with a large number of inodes.
* 3910086 (3830300) Degraded CPU performance during backup of Oracle archive logs on CFS versus a local file system.
* 3910088 (3855726) Panic in vx_prot_unregister_all().
* 3910090 (3790721) High CPU usage caused by vx_send_bcastgetemapmsg_remaus.
* 3910092 (3751049) The umountall operation fails on Solaris.
* 3910093 (1428611) 'vxcompress' can spew many GLM block lock messages over the LLT network.
* 3910094 (3879310) The file system may get corrupted after a failed vxupgrade.
* 3910096 (3757609) High CPU usage because of contention over ODM_IO_LOCK.
* 3910097 (3817734) A direct command to run fsck with the -y|Y option was mentioned in the message displayed to the user when a file system mount fails.
* 3910098 (3861271) Missing inode clear operation when a Linux inode is being de-initialized on SLES11.
* 3910101 (3846521) "cp -p" fails if the modification time in nanoseconds has 10 digits.
* 3910103 (3817734) A direct command to run fsck with the -y|Y option was mentioned in the message displayed to the user when a file system mount fails.
* 3910105 (3907902) System panic observed due to a race between a dalloc-off thread and a getattr thread.
* 3911290 (3910526) fsadm fails with error number 28 during a resize operation.
* 3911407 (3917013) The fsck command throws an error message.
* 3911718 (3905576) CFS hang during a cluster-wide freeze.
* 3911719 (3909583) Disable partitioning of a directory if the directory size is greater than the upper threshold value.
* 3911732 (3896670) Intermittent CFS hang-like situation with many CFS pglock grant messages pending on the LLT layer.
* 3912604 (3914488) On a local mount file system, the file system can be marked for FULLFSCK while mounting.
* 3912988 (3912315) EMAP corruption while freeing up an extent.
* 3912989 (3912322) The VxFS tunable max_seqio_extent_size cannot be tuned to any value less than 32768.
* 3912990 (3912407) CFS hang on VxFS 7.2 while a thread on a CFS node waits for EAU delegation.
* 3913004 (3911048) LDH corruption and file system hang.
* 3914871 (3915125) File system kernel threads deadlock while allocating/freeing blocks.
* 3916912 (3916914) On disk layout version 11, the file system may run into an ENOSPC condition.
* 3917961 (3919130) The CFS conformance attr test shows some failures.
* 3918551 (3919135) vxtunefs man page changes for the max_seqio_extent_size tunable.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: 7.2.0.200

* 3909937 (Tracking ID: 3908954)

SYMPTOM:
While performing vectored writes using writev(), where two iovec writes land at different offsets within the same 4K page-aligned range of a file, it is possible to find null data at the beginning of the 4K range when reading the data back.

DESCRIPTION:
While multiple processes are performing vectored writes to a file using writev(), the following situation can occur. Suppose there are two iovecs: the first is 448 bytes and the second is 30000 bytes. The write of the first 448-byte iovec completes, but the second iovec finds that its source page is no longer in memory. As the page cannot be faulted in during uiomove, the write has to undo both iovecs. It then faults the page back in and retries the second iovec only. However, because the undo operation also undid the first iovec, the first 448 bytes of the page are populated with nulls. When reading the file back, it appears that no data was written for the first iovec; hence, nulls are found in the file.

RESOLUTION:
The code has been changed to correctly unwind multiple iovecs in scenarios where some data has been written from one iovec and some from another.
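The following minimal sketch shows the I/O shape described above: a single writev() call whose two iovecs (448 and 30000 bytes, mirroring the description) cover parts of the same 4K-aligned range. The page-fault timing that triggered the bug cannot be forced from user space, so this illustrates the write pattern rather than being a deterministic reproducer.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void)
    {
        char a[448], b[30000];
        memset(a, 'A', sizeof(a));           /* first iovec: 448 bytes    */
        memset(b, 'B', sizeof(b));           /* second iovec: 30000 bytes */

        int fd = open("testfile", O_CREAT | O_RDWR | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        struct iovec iov[2] = {
            { .iov_base = a, .iov_len = sizeof(a) },
            { .iov_base = b, .iov_len = sizeof(b) },
        };

        /* Both iovecs go out in one call; offsets 0-447 and 448-30447
         * overlap the same initial 4K page-aligned range of the file. */
        if (writev(fd, iov, 2) < 0) { perror("writev"); return 1; }

        /* With the bug, re-reading offsets 0-447 could return nulls. */
        char check[448];
        if (pread(fd, check, sizeof(check), 0) == sizeof(check))
            printf("first byte read back: 0x%02x\n", (unsigned char)check[0]);
        close(fd);
        return 0;
    }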
* 3909940 (Tracking ID: 3894712)

SYMPTOM:
ACL permissions are not inherited correctly on a cluster file system.

DESCRIPTION:
The ACL counts stored on a directory inode get reset every time ownership of the directory inode is switched between nodes. When ownership of the directory inode comes back to a node that previously abdicated it, ACL permissions were not inherited correctly for newly created files.

RESOLUTION:
Modified the source so that ACLs are inherited correctly.

* 3909947 (Tracking ID: 3898565)

SYMPTOM:
The system panicked with this stack:
- panicsys
- panic_common
- panic
- segshared_fault
- as_fault
- vx_memlock
- vx_dio_zero
- vx_dio_rdwri
- vx_dio_read
- vx_read_common_inline
- vx_read_common_noinline
- vx_read1
- vx_read
- fop_read

DESCRIPTION:
Solaris no longer supports F_SOFTLOCK. vx_memlock() uses F_SOFTLOCK to fault in the page.

RESOLUTION:
Changed the VxFS code to avoid using F_SOFTLOCK.

* 3910080 (Tracking ID: 3870832)

SYMPTOM:
System panic due to a race between a force umount and an NFS lock manager thread trying to get a vnode, with a stack like this:
vx_active_common_flush
vx_do_vget
vx_vget
fsop_vget
lm_nfs3_fhtovp
lm_get_vnode
lm_unlock
lm_nlm4_dispatch
svc_getreq
svc_run
svc_do_run
nfssys

DESCRIPTION:
When an NFS-mounted file system is unshared and force unmounted while a file on it is still locked from an NFS client, a panic can occur. In NFSv3 the unshare does not clear the existing locks or clear/kill the lock manager threads, so when the force umount wins the race, it frees the vx_fsext and vx_vfs structures. Later, when the lock manager threads try to get a vnode of this force-unmounted file system, the system panics on the freed vx_fsext structure.

RESOLUTION:
The code is modified to mark the Solaris vfs with the VFS_UNMOUNTED flag during a force umount. This flag is later checked in the vx_vget function when the lock manager thread comes to get a vnode; if the flag is set, an error is returned.

* 3910083 (Tracking ID: 3707662)

SYMPTOM:
A race between reorg processing and the fsadm timer thread (alarm expiry) leads to a panic in vx_reorg_emap with the following stack:
vx_iunlock
vx_reorg_iunlock_rct_reorg
vx_reorg_emap
vx_extmap_reorg
vx_reorg
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
fop_ioctl
ioctl

DESCRIPTION:
When the timer expires (fsadm with the -t option), vx_do_close() calls vx_reorg_clear() on the local mount, which performs cleanup on the reorg rct inode. Another thread currently active in vx_reorg_emap() then panics due to a null pointer dereference.

RESOLUTION:
When fop_close is called in alarm handler context, the cleanup is deferred until the kernel thread performing the reorg completes its operation.

* 3910085 (Tracking ID: 3779916)

SYMPTOM:
vxfsconvert fails to upgrade the layout version for a VxFS file system with a large number of inodes. The error message shows an inode discrepancy.

DESCRIPTION:
vxfsconvert walks through the ilist and converts inodes. It stores chunks of inodes in a buffer and processes them as a batch. The inode number parameter for this inode buffer is of type unsigned integer. The offset of a particular inode in the ilist is calculated by multiplying the inode number by the size of the inode structure. For large inode numbers, the product inode_number * inode_size can overflow the unsigned integer limit, giving a wrong offset within the ilist file. vxfsconvert therefore reads the wrong inode and eventually fails.

RESOLUTION:
The inode number parameter is defined as unsigned long to avoid the overflow.
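A minimal illustration of the overflow class described above (variable names are hypothetical, not the vxfsconvert source): with 32-bit unsigned arithmetic, a large inode number times a 256-byte inode size wraps around and yields a bogus ilist offset, while widening the operand before the multiply, as the fix does, keeps the full product.

    #include <stdio.h>

    int main(void)
    {
        unsigned int inum  = 70000000;   /* large inode number (hypothetical) */
        unsigned int isize = 256;        /* on-disk inode size in bytes       */

        /* 32-bit product wraps: 70000000 * 256 = 17920000000, which is
         * reduced mod 2^32 to 740130816, i.e. a wrong ilist offset. */
        unsigned int bad_off = inum * isize;

        /* Widening before the multiply keeps the full product
         * (unsigned long is 64-bit in the LP64 Solaris SPARC model). */
        unsigned long good_off = (unsigned long)inum * isize;

        printf("wrapped offset: %u\n", bad_off);     /* 740130816   */
        printf("correct offset: %lu\n", good_off);   /* 17920000000 */
        return 0;
    }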
* 3910086 (Tracking ID: 3830300)

SYMPTOM:
Heavy CPU usage while Oracle archive processes are running on a clustered file system.

DESCRIPTION:
The cause of the poor read performance in this case was fragmentation, which mainly happens when multiple archivers run on the same node. The allocation pattern of the Oracle archiver processes is:
1. Write the header with O_SYNC.
2. ftruncate-up the file to its final size (a few GBs, typically).
3. Do lio_listio with 1MB iocbs.
The problem occurs because all allocations done in this manner go through internal allocations, i.e. allocations below the file size instead of allocations past the file size. Internal allocations are done at most 8 pages at a time, so when multiple processes do this, they each get these 8 pages alternately and the file system becomes very fragmented.

RESOLUTION:
Added a tunable which allocates ZFOD extents when ftruncate tries to increase the size of the file, instead of creating a hole. This eliminates the allocations internal to the file size and thus the fragmentation. Also fixed the earlier implementation of the same fix, which ran into locking issues, and fixed a performance issue while writing from a secondary node.
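The sketch below reproduces the three-step write pattern just described: a synchronous header write, a truncate-up, then a large asynchronous write below the new EOF. File name and sizes are illustrative, not taken from Oracle; the point is that step 3 writes into the hole created by step 2, which is what forced the small internal allocations.

    /* Build as a 64-bit binary (or with large-file support) so the 2GB
     * offset fits off_t; link with -lrt where aio lives in librt. */
    #include <aio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("archlog.arc", O_CREAT | O_WRONLY | O_SYNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* 1. Write the header synchronously (O_SYNC). */
        char hdr[512];
        memset(hdr, 0, sizeof(hdr));
        write(fd, hdr, sizeof(hdr));

        /* 2. ftruncate-up to the final size: creates a hole (or, with
         *    the new tunable, ZFOD extents) past the current EOF. */
        ftruncate(fd, 2LL * 1024 * 1024 * 1024);

        /* 3. Issue a 1MB async write below the new EOF; the real
         *    archiver queues many such iocbs via lio_listio. */
        static char buf[1024 * 1024];
        struct aiocb cb;
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes     = fd;
        cb.aio_buf        = buf;
        cb.aio_nbytes     = sizeof(buf);
        cb.aio_offset     = 512;
        cb.aio_lio_opcode = LIO_WRITE;

        struct aiocb *list[1] = { &cb };
        if (lio_listio(LIO_WAIT, list, 1, NULL) != 0)
            perror("lio_listio");
        close(fd);
        return 0;
    }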
* 3910088 (Tracking ID: 3855726)

SYMPTOM:
A panic happens in vx_prot_unregister_all(). The stack looks like this:
- vx_prot_unregister_all
- vxportalclose
- __fput
- fput
- filp_close
- sys_close
- system_call_fastpath

DESCRIPTION:
The panic is caused by a NULL fileset pointer, due to the fileset being referenced before it is loaded; in addition, there is a race on the fileset identity array.

RESOLUTION:
Skip the fileset if it is not loaded yet, and add the identity array lock to prevent the possible race.

* 3910090 (Tracking ID: 3790721)

SYMPTOM:
High CPU usage by VxFS threads. The backtrace of such threads usually looks like this:
schedule
schedule_timeout
__down
down
vx_send_bcastgetemapmsg_remaus
vx_send_bcastgetemapmsg
vx_recv_getemapmsg
vx_recvdele
vx_msg_recvreq
vx_msg_process_thread
vx_kthread_init
kernel_thread

DESCRIPTION:
The locking mechanism in vx_send_bcastgetemapmsg_process() is inefficient: every time it is called, it performs a series of down-up operations on a certain semaphore. This can result in a huge CPU cost when multiple threads contend on this semaphore.

RESOLUTION:
Optimized the locking mechanism in vx_send_bcastgetemapmsg_process() so that it does the down-up operation on the semaphore only once.

* 3910092 (Tracking ID: 3751049)

SYMPTOM:
The umountall operation fails on Solaris with the error "V-3-20358: cannot open mnttab".

DESCRIPTION:
On Solaris, fopen() normally returns an EMFILE error in a 32-bit application if it attempts to associate a stream with a file descriptor whose value is greater than 255. When umountall is used to unmount more than 256 file systems, the command forks child processes and opens more than 256 file descriptors at the same time. This crosses the 256 file descriptor limit and causes the operation to fail.

RESOLUTION:
Use the "F" mode in the fopen call to avoid the 256 file descriptor limitation.
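The snippet below demonstrates the limitation and the fix. Per Solaris fopen(3C), a 32-bit process normally cannot attach a stream to a descriptor above 255, while the "F" mode flag lifts that restriction; this is a standalone illustration, not the umountall source, and assumes the process descriptor limit has been raised above 300 (e.g. with ulimit -n).

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Burn through the low descriptors so the next open() lands
         * above 255, as happens when umountall handles 256+ mounts. */
        for (int i = 0; i < 300; i++)
            open("/etc/mnttab", O_RDONLY);

        /* In a 32-bit process this fopen() fails with EMFILE ... */
        FILE *fp = fopen("/etc/mnttab", "r");
        if (fp == NULL)
            perror("fopen(\"r\")");

        /* ... while the Solaris-specific "F" flag allows fd > 255. */
        fp = fopen("/etc/mnttab", "rF");
        if (fp == NULL)
            perror("fopen(\"rF\")");
        else
            puts("fopen(\"rF\") succeeded");
        return 0;
    }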
* 3910093 (Tracking ID: 1428611)

SYMPTOM:
The 'vxcompress' command can cause many GLM block lock messages to be sent over the network. This can be observed in the "proxy send" section of 'glmstat -m' output, as shown in the example below:

bash-3.2# glmstat -m
      message     all      rw       g      pg       h     buf     oth    loop
master send:
        GRANT     194       0       0       0       2       0     192      98
       REVOKE     192       0       0       0       0       0     192      96
     subtotal     386       0       0       0       2       0     384     194
master recv:
         LOCK     193       0       0       0       2       0     191      98
      RELEASE     192       0       0       0       0       0     192      96
     subtotal     385       0       0       0       2       0     383     194
master total      771       0       0       0       4       0     767     388
proxy send:
         LOCK      98       0       0       0       2       0      96      98
      RELEASE      96       0       0       0       0       0      96      96
   BLOCK_LOCK    2560       0       0       0       0    2560       0       0
BLOCK_RELEASE    2560       0       0       0       0    2560       0       0
     subtotal    5314       0       0       0       2    5120     192     194

DESCRIPTION:
'vxcompress' creates placeholder inodes (called IFEMR inodes) to hold the compressed data of files. After the compression finishes, the IFEMR inode exchanges its bmap with the original file and is later handed to inactive processing. Inactive processing truncates the IFEMR extents (the original extents of the regular file, which is now compressed) by sending cluster-wide buffer invalidation requests. These invalidations need GLM block locks. Regular file data does not need to be invalidated across the cluster, making these GLM block lock requests unnecessary.

RESOLUTION:
The pertinent code has been modified to skip the invalidation for IFEMR inodes created during compression.

* 3910094 (Tracking ID: 3879310)

SYMPTOM:
The file system may get corrupted after a file system freeze during vxupgrade. A subsequent full fsck gives the following errors:
UX:vxfs fsck: ERROR: V-3-20451: No valid device inodes found
UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate

DESCRIPTION:
vxupgrade requires the file system to be frozen during part of its operation. It may happen that corruption is detected while the freeze is in progress and the full fsck flag is set on the file system; however, this does not stop vxupgrade from proceeding. At a later stage of vxupgrade, after the structures related to the new disk layout have been updated on disk, VxFS frees up and zeroes out some of the old metadata inodes. If any error occurs after this point (because of the full fsck flag being set), the file system needs to go back completely to the previous version as of the time of the full fsck. Since the metadata corresponding to the previous version has already been cleared, the full fsck cannot proceed and gives the above errors.

RESOLUTION:
The code is modified to check for the full fsck flag after freezing the file system during vxupgrade, and to disable the file system if an error occurs after the new metadata has been written to disk. This forces the newly written metadata to be loaded into memory on the next mount.

* 3910096 (Tracking ID: 3757609)

SYMPTOM:
High CPU usage because of contention over ODM_IO_LOCK.

DESCRIPTION:
While performing ODM I/O, ODM_IO_LOCK is taken to update some of the ODM counters. This leads to contention when multiple iodones try to update these counters at the same time, which results in high CPU usage.

RESOLUTION:
Code modified to remove the lock contention.

* 3910097 (Tracking ID: 3817734)

SYMPTOM:
If a file system with the full fsck flag set is mounted, a message containing the direct command to clean the file system with full fsck is printed to the user.

DESCRIPTION:
When a file system with the full fsck flag set is mounted, the mount fails and a message is printed asking the user to clean the file system with full fsck. This message contains the direct command to run, which, if run without first collecting a file system metasave, results in evidence being lost. Also, since fsck removes the file system inconsistencies, it may lead to undesired data loss.

RESOLUTION:
A more generic error message is given instead of the direct command.

* 3910098 (Tracking ID: 3861271)

SYMPTOM:
Due to the missing inode clear action, a page can be left in a strange state. Also, the inode is not fully quiescent, which leads to races in the inode code. Sometimes this can cause a panic from iput_final().

DESCRIPTION:
An inode clear operation was missing when a Linux inode is de-initialized on SLES11.

RESOLUTION:
Added the inode clear operation on SLES11.

* 3910101 (Tracking ID: 3846521)

SYMPTOM:
cp -p fails with EINVAL for files whose modification time has a 10-digit tv_nsec value. EINVAL is returned when the value in the tv_nsec field is outside the range 0 to 999,999,999. VxFS maintains the timestamp update in microseconds, but when copying in user space the microseconds are converted to nanoseconds; in this case the microsecond value had crossed its own upper boundary of 999,999.

DESCRIPTION:
In a cluster, it is possible that time differs across nodes. When updating mtime, VxFS checks whether the inode is a cluster inode, and if another node's mtime is newer than the current node's time, it increments tv_usec instead of changing mtime to an older time value. The tv_usec counter can overflow here, which results in a 10-digit mtime.tv_nsec.

RESOLUTION:
The code is modified to reset the usec counter for mtime/atime/ctime when the upper boundary (999,999) is reached.
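The example below shows why a 10-digit tv_nsec is rejected when the copied timestamp is applied: futimens(2)/utimensat(2) require tv_nsec to be in the range 0 to 999,999,999 (or UTIME_NOW/UTIME_OMIT) and fail with EINVAL otherwise. This is a standalone demonstration of the interface contract, not the cp source.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("testfile", O_CREAT | O_RDWR, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* 1000000000 has 10 digits: one past the legal maximum of
         * 999999999 ns, as produced by the usec overflow above. */
        struct timespec ts[2] = {
            { .tv_sec = 0, .tv_nsec = 1000000000 },   /* atime */
            { .tv_sec = 0, .tv_nsec = 1000000000 },   /* mtime */
        };

        if (futimens(fd, ts) != 0)
            perror("futimens");   /* fails with EINVAL */
        close(fd);
        return 0;
    }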
* 3910103 (Tracking ID: 3817734)

SYMPTOM:
If a file system with the full fsck flag set is mounted, a message containing the direct command to clean the file system with full fsck is printed to the user.

DESCRIPTION:
When a file system with the full fsck flag set is mounted, the mount fails and a message is printed asking the user to clean the file system with full fsck. This message contains the direct command to run, which, if run without first collecting a file system metasave, results in evidence being lost. Also, since fsck removes the file system inconsistencies, it may lead to undesired data loss.

RESOLUTION:
A more generic error message is given instead of the direct command.

* 3910105 (Tracking ID: 3907902)

SYMPTOM:
System panic observed due to a race between a dalloc-off thread and a getattr thread.

DESCRIPTION:
With the 7.2 release of VxFS, dalloc states are stored in a new structure. When getting the attributes of a file, dalloc blocks are calculated and stored in this new structure. If a dalloc-off thread races with a getattr thread, the getattr thread can dereference a NULL dalloc structure.

RESOLUTION:
Code changes have been made to take the appropriate dalloc lock while calculating dalloc blocks in the getattr function, avoiding the race.

* 3911290 (Tracking ID: 3910526)

SYMPTOM:
When expanding a full file system through vxresize(1M), the operation fails with ENOSPC (errno 28) and the following error is printed:
UX:vxfs fsadm: ERROR: V-3-20340: attempt to resize failed with errno 28
Despite the failure, the file system remains expanded even after vxresize has shrunk the volume back on getting the ENOSPC error. As a result, the file system is marked for full fsck, but a full fsck fails with errors like the following:
UX:vxfs fsck: ERROR: V-3-26248: could not read from block offset devid/blknum ....
Device containing meta data may be missing in vset or device too big to be read on a 32 bit system.
UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate
file system check failure, aborting ...

DESCRIPTION:
If no space is available in the file system when a resize operation is initiated, the intent log extents are used for the metadata setup so that the resize can continue. If the resize succeeds, the superblock is updated with the expanded size, and a new intent log must then be allocated because the old one was consumed by the resize. There is a chance that the new intent log allocation fails with ENOSPC because the expanded size is not big enough to return the space allocated to the original intent log. The superblock update has already been made at this stage and is not rolled back even when a failure is returned.

RESOLUTION:
The code has been modified to fail the resize operation if the resize size is less than the size of the intent log.

* 3911407 (Tracking ID: 3917013)

SYMPTOM:
fsck may fail with the following error message:
# fsck -t vxfs -o full -n /dev/vx/dsk/testdg/vol1
UX:vxfs fsck.vxfs: ERROR: V-3-28484: File system block size is not aligned with supported sector size.

DESCRIPTION:
If the superblock of the file system is corrupted along with its block size field, so that the block size is no longer aligned with a supported sector size, fsck may fail while recovering the file system.

RESOLUTION:
Code changes have been made to add a new flag to skip the sector and block size alignment check.

* 3911718 (Tracking ID: 3905576)

SYMPTOM:
Cluster file system hangs. On one node, all worker threads are blocked due to a file system freeze, and another thread is blocked with a stack like this:
- __schedule
- schedule
- vx_svar_sleep_unlock
- vx_event_wait
- vx_olt_iauinit
- vx_olt_iasinit
- vx_loadv23_fsetinit
- vx_loadv23_fset
- vx_fset_reget
- vx_fs_reinit
- vx_recv_cwfa
- vx_msg_process_thread
- vx_kthread_init

DESCRIPTION:
The frozen CFS will not thaw because the mentioned thread is waiting for a work item to be processed in vx_olt_iauinit(). Since all the worker threads are blocked, there is no free thread to process this work item.

RESOLUTION:
Changed the code in vx_olt_iauinit() so that the work item is processed even when all worker threads are blocked.

* 3911719 (Tracking ID: 3909583)

SYMPTOM:
Partitioning of a directory needs to be disabled when the directory size is greater than an upper threshold value.

DESCRIPTION:
If partitioned directories (PD) are enabled during mount, the mount may take a long time to complete because it tries to partition all the directories, and so it looks hung. To avoid such hangs, a new upper threshold value for PD is added that disables partitioning of a directory whose size is above that value.

RESOLUTION:
The code is modified to disable partitioning of a directory if the directory size is greater than the upper threshold value.

* 3911732 (Tracking ID: 3896670)

SYMPTOM:
Intermittent CFS hang-like situation with many CFS pglock grant messages pending on the LLT layer.

DESCRIPTION:
To optimize CFS locking, VxFS may send greedy pglock grant messages to speed up upcoming write operations. In certain scenarios created by a particular read and write pattern across nodes, one node can send these greedy messages far faster than they can be answered. This can build up a large backlog of messages at the CFS layer, delaying the responses of other messages and causing slowness in CFS operations.

RESOLUTION:
The fix is to send the next greedy message only after the response to the previous one has been received, so that at any given time only one pglock greedy message is in flight.

* 3912604 (Tracking ID: 3914488)

SYMPTOM:
A local mount may fail with the following error message, and as a result FULLFSCK gets set on the file system:
vx_lctbad - file system link count table bad

DESCRIPTION:
During extop processing while locally mounting a file system (primary fileset), the LCT may get marked bad and FULLFSCK may get set on the file system. This is a corner case where an earlier unmount operation on the file system was not clean.

RESOLUTION:
Code changes have been made to merge the LCT while mounting the primary fileset.

* 3912988 (Tracking ID: 3912315)

SYMPTOM:
EMAP corruption while freeing up an extent:
Feb 4 15:10:45 localhost kernel: vxfs: msgcnt 2 mesg 056: V-2-56: vx_mapbad - vx_smap_stateupd - file system extent allocation unit state bitmap number 0 marked bad
Feb 4 15:10:45 localhost kernel: Call Trace:
vx_setfsflags+0x103/0x140 [vxfs]
vx_mapbad+0x74/0x2d0 [vxfs]
vx_smap_stateupd+0x113/0x130 [vxfs]
vx_extmapupd+0x552/0x580 [vxfs]
vx_alloc+0x3d6/0xd10 [vxfs]
vx_extsub+0x0/0x5f0 [vxfs]
vx_semapclone+0xe1/0x190 [vxfs]
vx_clonemap+0x14d/0x230 [vxfs]
vx_unlockmap+0x299/0x330 [vxfs]
vx_smap_dirtymap+0xea/0x120 [vxfs]
vx_do_extfree+0x2b8/0x2e0 [vxfs]
vx_extfree1+0x22e/0x7c0 [vxfs]
vx_extfree+0x9f/0xd0 [vxfs]
vx_exttrunc+0x10d/0x2a0 [vxfs]
vx_trunc_ext4+0x65f/0x7a0 [vxfs]
vx_validate_ext4+0xcc/0x1a0 [vxfs]
vx_trunc_tran2+0xb7f/0x1450 [vxfs]
vx_trunc_tran+0x18f/0x1e0 [vxfs]
vx_trunc+0x66a/0x890 [vxfs]
vx_iflush_list+0xaee/0xba0 [vxfs]
vx_iflush+0x67/0x80 [vxfs]
vx_workitem_process+0x24/0x50 [vxfs]

DESCRIPTION:
The issue occurs due to wrongly validating the SMAP and changing the AU state from free to allocated.

RESOLUTION:
Skip the validation and change the EAU state only if it is an SMAP update.

* 3912989 (Tracking ID: 3912322)

SYMPTOM:
The VxFS tunable max_seqio_extent_size cannot be tuned to any value less than 32768.

DESCRIPTION:
In VxFS 7.1 the default for max_seqio_extent_size was changed from 2048 to 32768. Due to a bug, no value less than 32768 could be set for this tunable.

RESOLUTION:
The fix allows the tunable to be set to any value >= 2048.
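A small sketch of one plausible shape of the validation bug described above (names and structure are hypothetical, not VxFS source): when the default was raised to 32768, a lower-bound check made against the default rather than the true minimum of 2048 would reject every smaller value.

    #include <errno.h>

    #define MAX_SEQIO_EXT_SIZE_MIN 2048    /* intended minimum       */
    #define MAX_SEQIO_EXT_SIZE_DEF 32768   /* default since VxFS 7.1 */

    /* Hypothetical setter illustrating the fix. */
    static int set_max_seqio_extent_size(unsigned int val, unsigned int *tunable)
    {
        /* Buggy form: if (val < MAX_SEQIO_EXT_SIZE_DEF) return EINVAL;
         * which rejects every value below 32768. */
        if (val < MAX_SEQIO_EXT_SIZE_MIN)   /* fixed: compare to the minimum */
            return EINVAL;
        *tunable = val;
        return 0;
    }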
* 3912990 (Tracking ID: 3912407)

SYMPTOM:
CFS hang on VxFS 7.2 while a thread on a CFS node waits for EAU delegation:
__schedule at ffffffff8163b46d
schedule at ffffffff8163bb09
vx_svar_sleep_unlock at ffffffffa0a99eb5 [vxfs]
vx_event_wait at ffffffffa0a9a68f [vxfs]
vx_async_waitmsg at ffffffffa09aa680 [vxfs]
vx_msg_send at ffffffffa09aa829 [vxfs]
vx_get_dele at ffffffffa09d5be1 [vxfs]
vx_extentalloc_device at ffffffffa0923bb0 [vxfs]
vx_extentalloc at ffffffffa0925272 [vxfs]
vx_bmap_ext4 at ffffffffa0953755 [vxfs]
vx_bmap_alloc_ext4 at ffffffffa0953e14 [vxfs]
vx_bmap_alloc at ffffffffa0950a2a [vxfs]
vx_write_alloc3 at ffffffffa09c281e [vxfs]
vx_tran_write_alloc at ffffffffa09c3321 [vxfs]
vx_cfs_prealloc at ffffffffa09b4220 [vxfs]
vx_write_alloc2 at ffffffffa09c25ad [vxfs]
vx_write_alloc at ffffffffa0b708fb [vxfs]
vx_write1 at ffffffffa0b712ff [vxfs]
vx_write_common_slow at ffffffffa0b7268c [vxfs]
vx_write_common at ffffffffa0b73857 [vxfs]
vx_write at ffffffffa0af04a6 [vxfs]
vfs_write at ffffffff811ded3d
sys_write at ffffffff811df7df
system_call_fastpath at ffffffff81646b09

A corresponding delegation receiver thread can be seen looping on the CFS primary:
PID: 18958  TASK: ffff88006c776780  CPU: 0  COMMAND: "vx_msg_thread"
__schedule at ffffffff8163a26d
mutex_lock at ffffffff81638b42
vx_emap_lookup at ffffffffa0ecf0eb [vxfs]
vx_extchkmaps at ffffffffa0e9c7b4 [vxfs]
vx_searchau_downlevel at ffffffffa0ea0923 [vxfs]
vx_searchau at ffffffffa0ea0e22 [vxfs]
vx_dele_get_freespace at ffffffffa0f53b6d [vxfs]
vx_getedele_size at ffffffffa0f54c4b [vxfs]
vx_pri_getdele at ffffffffa0f54edc [vxfs]
vx_recv_getdele at ffffffffa0f5690d [vxfs]
vx_recvdele at ffffffffa0f59800 [vxfs]
vx_msg_process_thread at ffffffffa0f2aead [vxfs]
vx_kthread_init at ffffffffa105cba4 [vxfs]
kthread at ffffffff810a5aef
ret_from_fork at ffffffff81645858

DESCRIPTION:
In the VxFS 7.2 release, a performance optimization was made in the way the allocation state map is updated: prior to 7.2 the updates were done synchronously, and in 7.2 they were made transactional. The hang happens when an EAU needs to be converted back to the FREE state after all allocations from it have been freed. If the corresponding EAU delegation times out before this process completes, it can leave an inconsistent state. Because of this inconsistency, when a node later tries to get delegation of this EAU, the primary may loop forever, leaving the secondary waiting indefinitely for the AU delegation.

RESOLUTION:
A code fix was made to prevent the delegation from timing out until the free processing is complete.

* 3913004 (Tracking ID: 3911048)

SYMPTOM:
An LDH bucket validation failure message is logged and the system hangs.

DESCRIPTION:
When modifying a large directory, VxFS needs to find a new bucket in the LDH for the directory; once a bucket is full, it is split to obtain more buckets. When a bucket has been split the maximum number of times, an overflow bucket is allocated. Under some conditions, the lookup for an available bucket within an overflow bucket may get an incorrect result and overwrite an existing bucket entry, corrupting the LDH file. A second problem is that when the bucket invalidation fails, the bucket buffer is released without checking whether the buffer is already part of a previous transaction; this can cause the transaction flush thread to hang and finally stall the whole file system.

RESOLUTION:
Corrected the LDH bucket entry update code to avoid the corruption, and released the bucket buffer without throwing it out of memory so as not to block the transaction flush.
* 3914871 (Tracking ID: 3915125)

SYMPTOM:
A file system freeze was stuck because of a deadlock among kernel threads while writing to a file.

DESCRIPTION:
While writing to any file, the file system allocates space on disk. It needs to search for free blocks and hence scans through various per-allocation-unit (AU) metadata, such as the state map (SMAP) and the extent bitmap (EMAP). The problem was that the global lock (GLM) was not held while making changes to a particular allocation unit; because of this, the allocator thread looped continuously.

RESOLUTION:
Return an error from the allocating function when the state map metadata has been modified but the appropriate GLM lock could not be taken.

* 3916912 (Tracking ID: 3916914)

SYMPTOM:
On disk layout version 11, the file system may run into an ENOSPC condition even though it still has space available.

DESCRIPTION:
A file system with DLV 11 and log version 12 can hit an ENOSPC condition although it has space, because bitmaps are marked allocated instead of being marked free.

RESOLUTION:
Code changes have been made to perform the log record conversion correctly.

* 3917961 (Tracking ID: 3919130)

SYMPTOM:
The CFS conformance attr test shows multiple failures for attr_hint and attr_hint_fsck.

DESCRIPTION:
On Solaris, the user land is 32-bit and the kernel is 64-bit. The transition from user space to kernel space reads attribute information from vx_nxattr_info_32. Further processing needs some flags information, but vx_nxattr_info_32 has no field for flags.

RESOLUTION:
Changed the code to add an nxi_flags field to the vx_nxattr_info_32 structure.
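The sketch below illustrates the 32-bit/64-bit structure mismatch just described. The layouts are hypothetical stand-ins modeled on the description (only the field name nxi_flags comes from the fix); the point is that if the 32-bit user-land variant lacks a field the 64-bit kernel needs, that information is simply absent after the copyin/translation step, so a matching field must be added.

    #include <stdint.h>

    /* Hypothetical 32-bit user-land layout: pointers travel as
     * 32-bit values and every field must be explicitly sized. */
    struct nxattr_info_32 {
        uint32_t nxi_name;    /* user address of the attribute name */
        uint32_t nxi_nlen;    /* name length                        */
        uint32_t nxi_flags;   /* field added by the fix; previously
                               * missing, so flags were lost in the
                               * 32-to-64-bit translation            */
    };

    /* Hypothetical 64-bit kernel-side layout. */
    struct nxattr_info {
        uint64_t nxi_name;
        uint64_t nxi_nlen;
        uint64_t nxi_flags;
    };

    /* Translation done at the user/kernel boundary: without the
     * nxi_flags member in the 32-bit struct there is nothing to
     * copy from, and the kernel sees zeroed flags. */
    static void nxattr_info_32_to_64(const struct nxattr_info_32 *in,
                                     struct nxattr_info *out)
    {
        out->nxi_name  = in->nxi_name;
        out->nxi_nlen  = in->nxi_nlen;
        out->nxi_flags = in->nxi_flags;
    }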
* 3918551 (Tracking ID: 3919135)

SYMPTOM:
The vxtunefs man page shows an incorrect default value and minimum value for the 'max_seqio_extent_size' tunable.

DESCRIPTION:
The default value of the 'max_seqio_extent_size' tunable is 32768 blocks, but the man page shows the incorrect value of 2097152 blocks (1GB). Also, if the tunable is set to a value less than 2048 blocks, it is reset to the minimum value (2048 blocks), but according to the man page the reset value is the default value, not the minimum value.

RESOLUTION:
Changed the vxtunefs man page to correct the default value of the max_seqio_extent_size tunable.


INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Please note that the installation of this P-Patch will cause downtime.

To install the patch, perform the following steps on at least one node in the cluster:
1. Copy the patch fs-sol11_sparc-Patch-7.2.0.200.tar.gz to /tmp
2. Untar fs-sol11_sparc-Patch-7.2.0.200.tar.gz to /tmp/hf
    # mkdir /tmp/hf
    # cd /tmp/hf
    # gunzip /tmp/fs-sol11_sparc-Patch-7.2.0.200.tar.gz
    # tar xf /tmp/fs-sol11_sparc-Patch-7.2.0.200.tar
3. Install the hotfix (note again that the installation of this P-Patch will cause downtime.)
    # pwd
    /tmp/hf
    # ./installVRTSvxfs720P200 [<host1> <host2>...]

You can also install this patch together with the 7.2 base release using Install Bundles:
1. Download this patch and extract it to a directory.
2. Change to the Veritas InfoScale 7.2 directory and invoke the installer script with the -patch_path option, where -patch_path points to the patch directory:
    # ./installer -patch_path [<patch path>] [<host1> <host2>...]

Install the patch manually:
--------------------------
1. pkg uninstall VRTSvxfs
2. pkg unset-publisher Symantec
3. pkg unset-publisher Veritas
4. pkg set-publisher -g <patch location> Veritas
5. pkg install --accept -g <patch location> VRTSvxfs


REMOVING THE PATCH
------------------
1. pkg uninstall VRTSvxfs


KNOWN ISSUES
------------
* Tracking ID: 3910079

SYMPTOM: Sequential reads slowed after scaling the conditional variables on which worker threads in VxFS sleep.

WORKAROUND: None.

* Tracking ID: 3910100

SYMPTOM: Oracle database start failure, with a trace log like this:
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 304 (block # 722821)
ORA-01110: data file 304:
ORA-17500: ODM err:ODM ERROR V-41-4-2-231-28 No space left on device

WORKAROUND: None.

* Tracking ID: 3916786

SYMPTOM: Kernel modules are not unloaded and upgraded to the newer version on Solaris when patches are installed on top of the base packages.

WORKAROUND: Reboot the system to refresh the modules.

* Tracking ID: 3918076

SYMPTOM: There may be some unused space on a local file system due to incorrect accounting of space reservation. This issue is seen on local file systems where the delayed allocation (dalloc) feature is enabled.

WORKAROUND:
1. Unmount the file system.
2. Mount the file system.
3. Disable the dalloc feature for the file system (using the vxtunefs command).


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE