* * * READ ME * * *
* * * Symantec File System 6.1.1 * * *
* * * Patch 6.1.1.200 * * *
Patch Date: 2014-12-18


This document provides the following information:

* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH
* KNOWN ISSUES


PATCH NAME
----------
Symantec File System 6.1.1 Patch 6.1.1.200


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL6 x86-64


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxfs


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* Symantec File System 6.1
* Symantec Storage Foundation 6.1
* Symantec Storage Foundation Cluster File System HA 6.1
* Symantec Storage Foundation for Oracle RAC 6.1
* Symantec Storage Foundation HA 6.1


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 6.1.1.200

* 3660421 (3660422) On RHEL 6.6, the umount(8) system call hangs if an application is watching for inode events using the inotify(7) APIs.

Patch ID: 6.1.1.100

* 3520113 (3451284) Internal testing hits the "vx_sum_upd_efree1" assert.
* 3521945 (3530435) Panic in an internal test with SSD cache enabled.
* 3529243 (3616722) System panics because of a race between the writeback cache offline thread and the writeback data flush thread.
* 3536233 (3457803) File System gets disabled intermittently with a metadata IO error.
* 3583963 (3583930) When the external quota file is restored or over-written, the old quota records are preserved.
* 3617774 (3475194) The Veritas File System (VxFS) fscdsconv(1M) command fails with metadata overflow.
* 3617776 (3473390) Multiple stack overflows with Veritas File System (VxFS) on RHEL6 lead to panics or system crashes.
* 3617781 (3557009) After the fallocate() function reserves allocation space, the file size is wrong.
* 3617788 (3604071) High CPU usage by the vxfs thread process.
* 3617790 (3574404) Stack overflow during a rename operation.
* 3617793 (3564076) The MongoDB noSQL database creation fails with an ENOTSUP error.
* 3617877 (3615850) A write system call hangs with an invalid buffer length.
* 3620279 (3558087) The ls -l command hangs while the system takes a backup.
* 3620284 (3596378) Copying a large number of small files is slower on VxFS than on ext4.
* 3620288 (3469644) The system panics in the vx_logbuf_clean() function.
* 3621420 (3621423) VxVM caching shouldn't be disabled while mounting a file system when the VxFS cache area is not present.
* 3628867 (3595896) While creating an Oracle RAC 12.1.0.2 database, the node panics.
* 3636210 (3633067) While converting from an ext3 file system to VxFS using vxfsconvert, many inodes are missing.
* 3644006 (3451686) During internal stress testing on cluster file system (CFS), a debug assert is hit due to an invalid cache generation count on the incore inode.
* 3645825 (3622326) The file system is marked with the fullfsck flag because an inode is marked bad during checkpoint promote.

Patch ID: 6.1.1.000

* 3370758 (3370754) Internal test with SmartIO write-back SSD cache hit debug asserts.
* 3383149 (3383147) A C operator precedence error may occur while turning off delayed allocation.
* 3422580 (1949445) System is unresponsive when files are created in a large directory.
* 3422584 (2059611) The system panics due to a NULL pointer dereference while flushing bitmaps to the disk.
* 3422586 (2439261) When the vx_fiostats_tunable value is changed from zero to non-zero, the system panics.
* 3422604 (3092114) The information displayed by the "df -i" command may be inaccurate for cluster-mounted file systems.
* 3422614 (3297840) A metadata corruption is found during the file removal process.
* 3422619 (3294074) The fsetxattr() system call is slower on Veritas File System (VxFS) than on the ext3 file system.
* 3422624 (3352883) During a rename operation, lots of nfsd threads hang.
* 3422626 (3332902) While shutting down, the system running the fsclustadm(1M) command panics.
* 3422629 (3335272) The mkfs (make file system) command dumps core when the log size provided is not aligned.
* 3422634 (3337806) The find(1) command may panic systems running Linux kernels with versions greater than 3.0.
* 3422636 (3340286) After a file system is resized, the tunable setting of dalloc_enable gets reset to a default value.
* 3422638 (3352059) High memory usage occurs when VxFS uses Veritas File Replicator (VFR) on the target even when no jobs are running.
* 3422649 (3394803) A panic is observed in the VxFS routine vx_upgrade7() while running the vxupgrade(1M) command.
* 3422657 (3412667) The RHEL 6 system panics with a stack overflow.
* 3430467 (3430461) Nested unmounts fail if the parent file system is disabled.
* 3436431 (3434811) The vxfsconvert(1M) command in VxFS 6.1 hangs.
* 3436433 (3349651) Veritas File System (VxFS) modules fail to load on RHEL 6.5 and display an error message.
* 3494534 (3402618) The mmap read performance on VxFS is slow.
* 3502847 (3471245) MongoDB fails to insert any record.
* 3504362 (3472551) The attribute validation (pass 1d) of full fsck takes too much time to complete.
* 3506487 (3506485) The system does not allow write-back caching with Symantec Volume Replicator (VVR).
* 3512292 (3348520) In a Cluster File System (CFS) cluster with a multi-volume file system of a smaller size, execution of the fsadm command causes a system hang if the free space in the file system is low.
* 3518943 (3534779) Internal stress testing on Cluster File System (CFS) hits a debug assert.
* 3519809 (3463464) Internal kernel functionality conformance test hits a kernel panic due to a null pointer dereference.
* 3522003 (3523316) The writeback cache feature does not work for a write size of 2MB.
* 3528770 (3449152) Failed to set the 'thin_friendly_alloc' tunable in case of cluster file system (CFS).
* 3529852 (3463717) The vxtunefs(1M) command man page is not updated with the information that Cluster File System (CFS) does not support the 'thin_friendly_alloc' tunable.
* 3530038 (3417321) The vxtunefs(1M) tunable man page gives an incorrect
* 3541125 (3541083) The vxupgrade(1M) command for layout version 10 creates 64-bit quota files with inappropriate permission configurations.

Patch ID: 6.1.0.200

* 3424575 (3349651) Veritas File System (VxFS) modules fail to load on RHEL 6.5 and display an error message.

Patch ID: 6.1.0.100

* 3418489 (3370720) Performance degradation is seen with the SmartIO feature enabled.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following Symantec incidents:

Patch ID: 6.1.1.200

* 3660421 (Tracking ID: 3660422)

SYMPTOM:
On RHEL 6.6, the umount(8) system call hangs if an application is watching for inode events using the inotify(7) APIs.

DESCRIPTION:
On RHEL 6.6, additional counters were added in the super block to track inotify watches. These new counters were not implemented in VxFS, so during umount the operation hangs until the counter in the superblock drops to zero, which never happens because the counters are not handled in VxFS.

RESOLUTION:
The code is modified to handle the additional counters added in RHEL 6.6.

Patch ID: 6.1.1.100

* 3520113 (Tracking ID: 3451284)

SYMPTOM:
While allocating an extent during a write operation, if the summary and bitmap data for the file system allocation unit are mismatched, the assert hits.

DESCRIPTION:
If an extent was allocated using the SMAP on a deleted inode, part of the AU space is moved from the deleted inode to the new inode. At this point the SMAP state is set to VX_EAU_ALLOCATED and the EMAP is not initialized.
When more space is needed for the new inode, it tries to allocate from the same AU using the EMAP and can hit the "f:vx_sum_upd_efree1:2a" assert, as the EMAP is not initialized.

RESOLUTION:
The code has been modified to expand the AU while moving partial AU space from one inode to another.

* 3521945 (Tracking ID: 3530435)

SYMPTOM:
Panic in an internal test with SSD cache enabled.

DESCRIPTION:
The record end of the writeback log record was wrongly modified while adding a skip list node in the punch-hole case: when the expunge flag is set, insertion of the new node is skipped.

RESOLUTION:
The code is modified to skip modification of the writeback log record when the expunge flag is set and the left end of the record is smaller than or equal to the end offset of the next punch-hole request.

* 3529243 (Tracking ID: 3616722)

SYMPTOM:
A race between the writeback cache offline thread and the writeback data flush thread causes a null pointer dereference, resulting in a system panic.

DESCRIPTION:
While disabling writeback, the writeback cache information is deinitialized in each inode, which results in the removal of the writeback bmap lock pointer. If, during this time frame, writeback flushing is still going on in another thread that holds the writeback bmap lock, then removing the writeback bmap lock hits a null pointer dereference, since the lock was already removed by the previous thread.

RESOLUTION:
The code is modified to handle such race conditions.

* 3536233 (Tracking ID: 3457803)

SYMPTOM:
The File System gets disabled with the following message in the system log:
WARNING: V-2-37: vx_metaioerr - vx_iasync_wait - /dev/vx/dsk/testdg/test file system meta data write error in dev/block

DESCRIPTION:
The inode's incore information becomes inconsistent because one of its fields is modified without locking protection.

RESOLUTION:
The code is modified to protect the inode's field by taking the appropriate lock.
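The failure mode above is the classic unprotected read-modify-write race. The following is a minimal Python sketch, illustrative only and not VxFS code (the Inode class and field names are hypothetical), showing why the fix holds a lock around every modification of the shared field:

```python
import threading

# Illustrative only (not VxFS internals): several threads update a shared
# inode field. With the lock held around each read-modify-write, no update
# is lost and the final value is deterministic.
class Inode:
    def __init__(self):
        self.lock = threading.Lock()   # protects 'field'
        self.field = 0

def update_locked(inode, n):
    for _ in range(n):
        with inode.lock:               # protected read-modify-write
            inode.field += 1

inode = Inode()
threads = [threading.Thread(target=update_locked, args=(inode, 10000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(inode.field)                     # 40000: no lost updates
```

Without the lock, concurrent increments can interleave and lose updates, which is the kind of inconsistency that disabled the file system here.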
* 3583963 (Tracking ID: 3583930)

SYMPTOM:
When the external quota file is over-written or restored from backup, new settings that were added after the backup still remain.

DESCRIPTION:
The internal quota file is not always updated with correct limits, so the quotaon operation copies the quota limits from the external to the internal quota file. To complete the copy operation, the extent of the external file is compared with the extent of the internal file at the corresponding offset. If the external quota file is overwritten (or restored to its original copy) and the size of the internal file is larger than that of the external file, the quotaon operation does not clear the additional (stale) quota records in the internal file. Later, the sync operation (part of quotaon) copies these stale records from the internal to the external file. Hence, both internal and external files contain stale records.

RESOLUTION:
The code has been modified to remove the stale records in the internal file at the time of quotaon.

* 3617774 (Tracking ID: 3475194)

SYMPTOM:
The Veritas File System (VxFS) fscdsconv(1M) command fails with the following error message:
...
UX:vxfs fscdsconv: INFO: V-3-26130: There are no files violating the CDS limits for this target.
UX:vxfs fscdsconv: INFO: V-3-26047: Byteswapping in progress ...
UX:vxfs fscdsconv: ERROR: V-3-25656: Overflow detected
UX:vxfs fscdsconv: ERROR: V-3-24418: fscdsconv: error processing primary inode list for fset 999
UX:vxfs fscdsconv: ERROR: V-3-24430: fscdsconv: failed to copy metadata
UX:vxfs fscdsconv: ERROR: V-3-24426: fscdsconv: Failed to migrate.

DESCRIPTION:
The fscdsconv(1M) command takes a filename argument that is used as a recovery file, to restore the original file system in case of failure while the file system conversion is in progress. This file has two parts: a control part and a data part. The control part is used to store information about all the metadata, such as inodes and extents.
In this instance, the length of the control part is underestimated for some file systems where there are few inodes but the average number of extents per file is very large (this can be seen in the fsadm -E report).

RESOLUTION:
The code is modified to make the recovery file sparse: the data part starts after a 1TB offset, so the control part can do allocating writes into the hole from the beginning of the file.

* 3617776 (Tracking ID: 3473390)

SYMPTOM:
In memory pressure scenarios, you see panics or system crashes due to stack overflows.

DESCRIPTION:
Specifically on RHEL6, the memory allocation routines consume much more stack than on other distributions like SLES, or even RHEL5. Due to this, multiple overflows are reported for the RHEL6 platform. Most of these overflows occur when VxFS tries to allocate memory under memory pressure.

RESOLUTION:
The code is modified to fix multiple overflows by adding handoff code paths, adjusting handoff limits, removing on-stack structures and reducing the number of function frames on the stack wherever possible.

* 3617781 (Tracking ID: 3557009)

SYMPTOM:
Run the fallocate command with the -l option to specify the length of the reserved allocation. The file size is not the expected value, but a multiple of the file system block size. For example, if the block size is 8K:
# fallocate -l 8860 testfile1
# ls -l
total 16
drwxr-xr-x. 2 root root    96 Jul  1 11:40 lost+found/
-rw-r--r--. 1 root root 16384 Jul  1 11:41 testfile1
The file size should be 8860, but it is 16384 (which is 2*8192).

DESCRIPTION:
The vx_fallocate() function on Veritas File System (VxFS) creates a larger file than specified because it allocates the extent in blocks. So the reserved file size is a multiple of the block size, instead of what the fallocate command specifies.

RESOLUTION:
The code is modified so that the vx_fallocate() function on VxFS sets the reserved file size to the specified size, instead of a multiple of the block size.
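The corrected behavior can be checked programmatically. This is a small sketch using Python's os.posix_fallocate on a throwaway temp file (not the VxFS vx_fallocate() kernel routine itself): after preallocating 8860 bytes into an empty file, the reported size should be exactly 8860, not a block-size multiple.

```python
import os
import tempfile

# Sketch of the expected (fixed) behavior: the reserved file size equals the
# requested length. On an unpatched VxFS with 8K blocks, 8860 would instead
# round up to 16384. Uses a temp file, not a real VxFS mount.
fd, path = tempfile.mkstemp()
try:
    os.posix_fallocate(fd, 0, 8860)   # reserve 8860 bytes from offset 0
    size = os.fstat(fd).st_size
    print(size)                        # 8860 on a correctly behaving fs
finally:
    os.close(fd)
    os.unlink(path)
```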
* 3617788 (Tracking ID: 3604071)

SYMPTOM:
With the thin reclaim feature turned on, you can observe high CPU usage on the vxfs thread process. The backtrace of such threads usually looks like this:
- vx_dalist_getau
- vx_recv_bcastgetemapmsg
- vx_recvdele
- vx_msg_recvreq
- vx_msg_process_thread
- vx_kthread_init

DESCRIPTION:
In the routine that gets the broadcast information of a node holding maps of the Allocation Units (AUs) for which the node holds delegations, the locking mechanism is inefficient. Every time this routine is called, it performs a series of down-up operations on a certain semaphore. This can result in a huge CPU cost when many threads call the routine in parallel.

RESOLUTION:
The code is modified to optimize the locking mechanism in this routine, so that it performs the down-up operation on the semaphore only once.

* 3617790 (Tracking ID: 3574404)

SYMPTOM:
The system panics because of a stack overflow during a rename operation.
The following stack trace can be seen during the panic:
machine_kexec
crash_kexec
oops_end
no_context
__bad_area_nosemaphore
bad_area_nosemaphore
__do_page_fault
do_page_fault
page_fault
task_tick_fair
scheduler_tick
update_process_times
tick_sched_timer
__run_hrtimer
hrtimer_interrupt
local_apic_timer_interrupt
smp_apic_timer_interrupt
apic_timer_interrupt
--- <IRQ stack> ---
apic_timer_interrupt
mempool_free_slab
mempool_free
vx_pgbh_free
vx_pgbh_detach
vx_releasepage
try_to_release_page
shrink_page_list.clone.3
shrink_inactive_list
shrink_mem_cgroup_zone
shrink_zone
zone_reclaim
get_page_from_freelist
__alloc_pages_nodemask
alloc_pages_current
__get_free_pages
vx_getpages
vx_alloc
vx_bc_getfreebufs
vx_bc_getblk
vx_getblk_bp
vx_getblk_cmn
vx_getblk
vx_getmap
vx_getemap
vx_extfind
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_uplevel
vx_searchau
vx_extentalloc_device
vx_extentalloc
vx_bmap_ext4
vx_bmap_alloc_ext4
vx_bmap_alloc
vx_write_alloc3
vx_tran_write_alloc
vx_idalloc_off1
vx_idalloc_off
vx_int_rename
vx_do_rename
vx_rename1
vx_rename
vfs_rename
sys_renameat
sys_rename
system_call_fastpath

DESCRIPTION:
The stack is overrun by 88 bytes in the rename code path. The thread_info structure is disrupted with VxFS page buffer head addresses.

RESOLUTION:
Local structures in vx_write_alloc3 and vx_int_rename are now allocated dynamically. This saves 256 bytes of stack and gives enough room.

* 3617793 (Tracking ID: 3564076)

SYMPTOM:
The MongoDB noSQL database creation fails with an ENOTSUP error. MongoDB uses posix_fallocate to create a file first. When it writes at an offset that is not aligned with the file system block boundary, an ENOTSUP error comes up.

DESCRIPTION:
On a file system with an 8K block size and a 4K page size, the application creates a file using posix_fallocate, and then writes at an offset that is not aligned with the file system block boundary.
In this case, the pre-allocated extent is split at the unaligned offset into two parts for the write. However, the alignment requirement of the split fails the operation.

RESOLUTION:
The extent is now split down to the block boundary.

* 3617877 (Tracking ID: 3615850)

SYMPTOM:
The write system call writes up to count bytes from the pointed buffer to the file referred to by the file descriptor:
ssize_t write(int fd, const void *buf, size_t count);
When the count parameter is invalid, it can sometimes cause write() to hang on a VxFS file system. For example, with a 10000-byte buffer but count mistakenly set to 30000, you may encounter this problem.

DESCRIPTION:
On recent Linux kernels, a page fault cannot be taken while holding a page locked, so as to avoid a deadlock. This means uiomove can copy less than requested, and any partially populated pages created in the routine that establishes a virtual mapping for the page are destroyed. This can cause an infinite loop in the write code path when the given user buffer is not aligned with a page boundary and the length given to write() causes an EFAULT: uiomove() does a partial copy, segmap_release destroys the partially populated pages and unwinds the uio, and the operation is then repeated.

RESOLUTION:
The code is modified to move the pre-faulting to the buffered IO write loops. The system either shortens the length of the copy if all of the requested pages cannot be faulted, or fails with EFAULT if no pages are pre-faulted. This prevents the infinite loop.

* 3620279 (Tracking ID: 3558087)

SYMPTOM:
Run simultaneous dd threads on a mount point and start the ls -l command on the same mount point. The system then hangs.

DESCRIPTION:
When the delayed allocation (dalloc) feature is turned on, the flushing process takes much time. The process keeps the glock held, and needs writers to keep the irwlock held. The ls -l command runs stat internally and keeps waiting on the irwlock to read ACLs.
RESOLUTION:
dalloc is redesigned to keep the glock unlocked while flushing.

* 3620284 (Tracking ID: 3596378)

SYMPTOM:
Copying a large number of small files is slower on Veritas File System (VxFS) than on ext4.

DESCRIPTION:
VxFS implements the fsetxattr() system call in a synchronized way. Hence, before returning from the system call, VxFS takes some time to flush the data to the disk. In this way, VxFS guarantees file system consistency in case of a file system crash. However, this implementation has a side effect: it serializes the whole processing, which takes more time.

RESOLUTION:
The code is modified to change the transaction to flush the data in a delayed way.

* 3620288 (Tracking ID: 3469644)

SYMPTOM:
The system panics in the vx_logbuf_clean() function when it traverses the chain of transactions off the intent log buffer. The stack trace is as follows:
vx_logbuf_clean()
vx_logadd()
vx_log()
vx_trancommit()
vx_exh_hashinit()
vx_dexh_create()
vx_dexh_init()
vx_pd_rename()
vx_rename1_pd()
vx_do_rename()
vx_rename1()
vx_rename()
vx_rename_skey()

DESCRIPTION:
The system panics as the vx_logbuf_clean() function tries to access an already freed transaction from the transaction chain to flush it to the log.

RESOLUTION:
The code has been modified to make sure that the transaction is flushed to the log before it is freed.

* 3621420 (Tracking ID: 3621423)

SYMPTOM:
Veritas Volume Manager (VxVM) caching is disabled or stopped after mounting a file system in a situation where the Veritas File System (VxFS) cache area is not present.

DESCRIPTION:
When the VxFS cache area is not present and the VxVM cache area is present and in the ENABLED state, if you mount a file system on any of the volumes, VxVM caching gets stopped for that volume, which is not the expected behavior.

RESOLUTION:
The code is modified not to disable VxVM caching for any mounted file system if the VxFS cache area is not present.
* 3628867 (Tracking ID: 3595896)

SYMPTOM:
While creating an Oracle RAC 12.1.0.2 database, the node panics with the following stack:
aio_complete()
vx_naio_do_work()
vx_naio_worker()
vx_kthread_init()

DESCRIPTION:
For a zero-size request (with a correctly aligned buffer), Veritas File System (VxFS) wrongly queues the work internally and returns -EIOCBQUEUED. The kernel calls aio_complete() for this zero-size request. However, while VxFS is performing the queued work internally, the aio_complete() function gets called again. The double call of the aio_complete() function results in the panic.

RESOLUTION:
The code is modified so that zero-size requests do not queue elements inside the VxFS work queue.

* 3636210 (Tracking ID: 3633067)

SYMPTOM:
While converting from an ext3 file system to VxFS using vxfsconvert, it is observed that many inodes are missing.

DESCRIPTION:
When vxfsconvert(1M) is run on an ext3 file system, it misses an entire block group of inodes. This happens because of an incorrect calculation of the block group number of a given inode in a border case. The inode that is the last inode of a given block group is calculated to have the correct inode offset, but is calculated to be in the next block group. This causes the entire next block group to be skipped when the code attempts to find the next consecutive inode.

RESOLUTION:
The code is modified to correct the calculation of the block group number.

* 3644006 (Tracking ID: 3451686)

SYMPTOM:
During internal stress testing on cluster file system (CFS), a debug assert is hit due to an invalid cache generation count on the incore inode.

DESCRIPTION:
The reset of the cache generation count in the incore inode used in Disk Layout Version (DLV) 10 was missed during inode reuse, causing the debug assert.

RESOLUTION:
The code is modified to reset the cache generation count in the incore inode during inode reuse.
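The vxfsconvert border case described above is an off-by-one in mapping an inode number to its block group. A minimal Python sketch (illustrative arithmetic only, not the actual vxfsconvert code; the constant is the 1-based ext3 numbering with a hypothetical inodes-per-group value) shows how the last inode of a group lands in the wrong group if the 1-based numbering is not accounted for:

```python
# Illustrative only: ext3 inode numbers start at 1, so the last inode of a
# block group must map to that group, not the next one.
INODES_PER_GROUP = 8192   # hypothetical value for the sketch

def group_of(ino):
    # Correct: subtract 1 to account for 1-based inode numbers.
    return (ino - 1) // INODES_PER_GROUP

def group_of_buggy(ino):
    # Border-case bug: the last inode of each group falls into the next group,
    # so the scan skips an entire block group of inodes.
    return ino // INODES_PER_GROUP

last_in_group0 = INODES_PER_GROUP         # inode 8192 is the last of group 0
print(group_of(last_in_group0))           # 0
print(group_of_buggy(last_in_group0))     # 1 (wrong group)
print(group_of(last_in_group0 + 1))       # 1 (inode 8193 starts group 1)
```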
* 3645825 (Tracking ID: 3622326)

SYMPTOM:
The file system is marked with the fullfsck flag because an inode is marked bad during checkpoint promote.

DESCRIPTION:
VxFS incorrectly skipped pushing data to the clone inode, due to which the inode is marked bad during checkpoint promote, which in turn results in the file system being marked with the fullfsck flag.

RESOLUTION:
The code is modified to push the proper data to the clone inode.

Patch ID: 6.1.1.000

* 3370758 (Tracking ID: 3370754)

SYMPTOM:
An internal test with the SmartIO write-back SSD cache hit debug asserts.

DESCRIPTION:
The debug asserts are hit due to race conditions in various code segments of the write-back SSD cache feature.

RESOLUTION:
The code is modified to fix the race conditions.

* 3383149 (Tracking ID: 3383147)

SYMPTOM:
A C operator precedence error may occur while turning "off" delayed allocation.

DESCRIPTION:
Due to the C operator precedence issue, VxFS evaluates a condition wrongly.

RESOLUTION:
The code is modified to evaluate the condition correctly.

* 3422580 (Tracking ID: 1949445)

SYMPTOM:
The system is unresponsive when files are created in a large directory. The following stack is logged:
vxg_grant_sleep()
vxg_cmn_lock()
vxg_api_lock()
vx_glm_lock()
vx_get_ownership()
vx_exh_coverblk()
vx_exh_split()
vx_dexh_setup()
vx_dexh_create()
vx_dexh_init()
vx_do_create()

DESCRIPTION:
For large directories, the large directory hash (LDH) is enabled to improve lookups. When the system takes ownership of the LDH inode twice in the same thread context (while building the hash for a directory), it becomes unresponsive.

RESOLUTION:
The code is modified to avoid taking ownership again if we already have ownership of the LDH inode.
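The self-deadlock above comes from a thread sleeping on an ownership it already holds. A minimal Python sketch of the fix's idea (hypothetical class and method names, not VxFS internals): record the owning thread and skip re-acquisition when the caller already owns the resource.

```python
import threading

# Illustrative only: an ownership wrapper that refuses to block when the
# calling thread already holds the ownership, mirroring the LDH inode fix.
class Ownership:
    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def acquire(self):
        me = threading.get_ident()
        if self._owner == me:       # already own it: do not sleep again
            return False            # caller must not release what it did not take
        self._lock.acquire()
        self._owner = me
        return True

    def release(self):
        self._owner = None
        self._lock.release()

own = Ownership()
took = own.acquire()     # first take in this thread succeeds
again = own.acquire()    # second take in the same thread is skipped, no hang
print(took, again)       # True False
own.release()
```

Without the owner check, the second acquire() would block forever on the lock the same thread already holds, which is the unresponsiveness described in the incident.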
* 3422584 (Tracking ID: 2059611)

SYMPTOM:
The system panics due to a NULL pointer dereference while flushing the bitmaps to the disk, and the following stack trace is displayed:
vx_unlockmap+0x10c
vx_tflush_map+0x51c
vx_fsq_flush+0x504
vx_fsflush_fsq+0x190
vx_workitem_process+0x1c
vx_worklist_process+0x2b0
vx_worklist_thread+0x78

DESCRIPTION:
The vx_unlockmap() function unlocks a map structure of the file system. If the map is being used, the hold count is incremented. The vx_unlockmap() function attempts to check whether the mlink doubly linked list is empty. The asynchronous vx_mapiodone routine can change the link at random even though the hold count is zero.

RESOLUTION:
The code is modified to change the evaluation rule inside the vx_unlockmap() function, so that further evaluation is skipped when the map hold count is zero.

* 3422586 (Tracking ID: 2439261)

SYMPTOM:
When the vx_fiostats_tunable is changed from zero to non-zero, the system panics with the following stack trace:
vx_fiostats_do_update
vx_fiostats_update
vx_read1
vx_rdwr
vno_rw
rwuio
pread

DESCRIPTION:
When vx_fiostats_tunable is changed from zero to non-zero, all the incore-inode fiostats attributes are set to NULL. When these attributes are accessed, the system panics due to the NULL pointer dereference.

RESOLUTION:
The code has been modified to check that the file I/O stat attributes are present before dereferencing the pointers.

* 3422604 (Tracking ID: 3092114)

SYMPTOM:
The information output by the "df -i" command can often be inaccurate for cluster-mounted file systems.

DESCRIPTION:
In the Cluster File System 5.0 release, the concept of delegating metadata to nodes in the cluster was introduced. This delegation of metadata allows CFS secondary nodes to update metadata without having to ask the CFS primary to do it. This provides greater node scalability.
However, the "df -i" information is still collected by the CFS primary regardless of which node (primary or secondary) the "df -i" command is executed on.

For inodes, the granularity of each delegation is an Inode Allocation Unit [IAU]; thus IAUs can be delegated to nodes in the cluster.
When using a VxFS 1Kb file system block size, each IAU represents 8192 inodes.
When using a VxFS 2Kb file system block size, each IAU represents 16384 inodes.
When using a VxFS 4Kb file system block size, each IAU represents 32768 inodes.
When using a VxFS 8Kb file system block size, each IAU represents 65536 inodes.

Each IAU contains a bitmap that determines whether each inode it represents is allocated or free; the IAU also contains a summary count of the number of inodes that are currently free in the IAU. The "df -i" information can be considered a simple sum of all the IAU summary counts.
Using a 1Kb block size, IAU-0 represents inode numbers 0 - 8191.
Using a 1Kb block size, IAU-1 represents inode numbers 8192 - 16383.
Using a 1Kb block size, IAU-2 represents inode numbers 16384 - 24575, etc.

The inaccurate "df -i" count occurs because the CFS primary has no visibility of the current IAU summary information for IAUs that are delegated to secondary nodes. Therefore the number of allocated inodes within an IAU that is currently delegated to a CFS secondary node is not known to the CFS primary. As a result, the "df -i" count information for the currently delegated IAUs is collected from the primary's copy of the IAU summaries. Since the primary's copy of the IAU is stale, the "df -i" count is only accurate when no IAUs are currently delegated to CFS secondary nodes. In other words, the IAUs currently delegated to CFS secondary nodes cause the "df -i" count to be inaccurate.

Once an IAU is delegated to a node, it can "timeout" after 3 minutes of inactivity. However, not all IAU delegations will timeout.
One IAU will always remain delegated to each node for performance reasons. Also, an IAU whose inodes are all allocated (so no free inodes remain in the IAU) will not timeout either. The issue can be best summarized as: the more IAUs that remain delegated to CFS secondary nodes, the greater the inaccuracy of the "df -i" count.

RESOLUTION:
The delegations for IAUs whose inodes are all allocated (so no free inodes remain in the IAU) are now allowed to "timeout" after 3 minutes of inactivity.

* 3422614 (Tracking ID: 3297840)

SYMPTOM:
A metadata corruption is found during the file removal process, with the inode block count becoming negative.

DESCRIPTION:
When the user removes or truncates a file having shared indirect blocks, there can be an instance where the block count is updated to reflect the removal of the shared indirect blocks even though the blocks are not removed from the file. The next iteration of the loop updates the block count again while removing these blocks. This eventually leads to the block count being a negative value after all the blocks are removed from the file. The removal code expects the block count to be zero before updating the rest of the metadata.

RESOLUTION:
The code is modified to update the block count and other tracking metadata in the same transaction in which the blocks are removed from the file.

* 3422619 (Tracking ID: 3294074)

SYMPTOM:
The fsetxattr() system call is slower on Veritas File System (VxFS) than on the ext3 file system.

DESCRIPTION:
VxFS implements the fsetxattr() system call in a synchronous way. Hence, it takes some time to flush the data to the disk before returning from the system call, to guarantee file system consistency in case of a file system crash.

RESOLUTION:
The code is modified to allow the transaction to flush the data in a delayed way.
* 3422624 (Tracking ID: 3352883)

SYMPTOM:
During a rename operation, lots of nfsd threads waiting for a mutex operation hang with the following stack traces:
vxg_svar_sleep_unlock
vxg_get_block
vxg_api_initlock
vx_glm_init_blocklock
vx_cbuf_lookup
vx_getblk_clust
vx_getblk_cmn
vx_getblk
vx_fshdchange
vx_unlinkclones
vx_clone_dispose
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init

vxg_svar_sleep_unlock
vxg_grant_sleep
vxg_cmn_lock
vxg_api_trylock
vx_glm_trylock
vx_glmref_trylock
vx_mayfrzlock_try
vx_walk_fslist
vx_log_sync
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init

DESCRIPTION:
A race condition is observed between the NFS rename and the additional dentry alias created by the current vx_splice_alias() function. This race condition causes two different directory dentries to point to the same inode, which results in a mutex deadlock in the lock_rename() function.

RESOLUTION:
The code is modified to change the vx_splice_alias() function to prevent the creation of the additional dentry alias.

* 3422626 (Tracking ID: 3332902)

SYMPTOM:
The system running the fsclustadm(1M) command panics while shutting down. The following stack trace is logged along with the panic:
machine_kexec
crash_kexec
oops_end
page_fault
[exception RIP: vx_glm_unlock]
vx_cfs_frlpause_leave [vxfs]
vx_cfsaioctl [vxfs]
vxportalkioctl [vxportal]
vfs_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath

DESCRIPTION:
There exists a race condition between "fsclustadm(1M) cfsdeinit" and "fsclustadm(1M) frlpause_disable". The "fsclustadm(1M) cfsdeinit" fails after cleaning the Group Lock Manager (GLM), without downgrading the CFS state. Under the false CFS state, the "fsclustadm(1M) frlpause_disable" command enters and accesses the GLM lock, which "fsclustadm(1M) cfsdeinit" has freed, resulting in a panic.
Another race condition exists between the code in vx_cfs_deinit() and the code in fsck. It leads to a situation in which, although fsck holds a reservation, this cannot prevent vx_cfs_deinit() from freeing vx_cvmres_list, because there is no check of vx_cfs_keepcount.

RESOLUTION:
The code is modified to add appropriate checks in the "fsclustadm(1M) cfsdeinit" and "fsclustadm(1M) frlpause_disable" paths to avoid the race condition.

* 3422629 (Tracking ID: 3335272)

SYMPTOM:
The mkfs (make file system) command dumps core when the log size provided is not aligned. The following stack trace is displayed:
(gdb) bt
#0 find_space ()
#1 place_extents ()
#2 fill_fset ()
#3 main ()
(gdb)

DESCRIPTION:
While creating a VxFS file system using the mkfs command, if the log size provided is not aligned properly, the placement of the RCQ extents may be miscalculated, leaving no place for them. This leads to an illegal memory access of the AU bitmap and results in a core dump.

RESOLUTION:
The code is modified to place the RCQ extents in the same AU where the log extents are allocated.

* 3422634 (Tracking ID: 3337806)

SYMPTOM:
On Linux kernels with versions greater than 3.0, the find(1) command may panic the kernel in the link_path_walk() function with the following stack trace:
do_page_fault
page_fault
link_path_walk
path_lookupat
do_path_lookup
user_path_at_empty
vfs_fstatat
sys_newfstatat
system_call_fastpath

DESCRIPTION:
VxFS overloads a bit of the dentry flag at 0x1000 for internal usage. Linux did not use this bit before kernel version 3.0. From 3.0 onwards it is possible that both Linux and VxFS contend for this bit, which panics the kernel.

RESOLUTION:
The code is modified not to use the 0x1000 bit in the dentry flag.

* 3422636 (Tracking ID: 3340286)

SYMPTOM:
The tunable setting of dalloc_enable gets reset to a default value after a file system is resized.

DESCRIPTION:
The file system resize operation triggers the file system re-initialization process.
During this process, the tunable value of dalloc_enable is reset to the default value instead of retaining the old value.

RESOLUTION:
The code is fixed so that the old tunable value of dalloc_enable is retained.

* 3422638 (Tracking ID: 3352059)

SYMPTOM:
Due to a memory leak, vxfsrepld shows high memory usage on the target even when no jobs are running.

DESCRIPTION:
On the target side, high memory usage may occur even when no jobs are running, because the memory allocated for some structures is not freed on every job iteration.

RESOLUTION:
The code is modified to resolve the memory leaks.

* 3422649 (Tracking ID: 3394803)

SYMPTOM:
The vxupgrade(1M) command causes VxFS to panic with the following stack trace:

panic_save_regs_switchstack()
panic
bad_kern_reference()
$cold_pfault()
vm_hndlr()
bubbleup()
vx_fs_upgrade()
vx_upgrade()
$cold_vx_aioctl_common()
vx_aioctl()
vx_ioctl()
vno_ioctl()
ioctl()
syscall()

DESCRIPTION:
The panic is caused by dereferencing a NULL device pointer (one of the devices in the DEVLIST is a NULL device).

RESOLUTION:
The code is modified to skip NULL devices when the devices in the DEVLIST are processed.

* 3422657 (Tracking ID: 3412667)

SYMPTOM:
On RHEL 6, the inode update operation may create a deep stack and cause a system panic due to stack overflow.
Below is the stack trace:

dequeue_entity()
dequeue_task_fair()
dequeue_task()
deactivate_task()
thread_return()
io_schedule()
get_request_wait()
blk_queue_bio()
generic_make_request()
submit_bio()
vx_dev_strategy()
vx_bc_bwrite()
vx_bc_do_bawrite()
vx_bc_bawrite()
vx_bwrite()
vx_async_iupdat()
vx_iupdat_local()
vx_iupdat_clustblks()
vx_iupdat_local()
vx_iupdat()
vx_iupdat_tran()
vx_tflush_inode()
vx_fsq_flush()
vx_tranflush()
vx_traninit()
vx_get_alloc()
vx_tran_get_alloc()
vx_alloc_getpage()
vx_do_getpage()
vx_internal_alloc()
vx_write_alloc()
vx_write1()
vx_write_common_slow()
vx_write_common()
vx_vop_write()
vx_writev()
vx_naio_write_v2()
do_sync_readv_writev()
do_readv_writev()
vfs_writev()
nfsd_vfs_write()
nfsd_write()
nfsd3_proc_write()
nfsd_dispatch()
svc_process_common()
svc_process()
nfsd()
kthread()
kernel_thread()

DESCRIPTION:
Some VxFS operations require an inode update. This may create a very deep stack and cause a system panic due to stack overflow.

RESOLUTION:
The code is modified to add a handoff point in the inode update function. If stack usage reaches a threshold, a separate thread is started to do the work, limiting stack usage.

* 3430467 (Tracking ID: 3430461)

SYMPTOM:
Nested unmounts as well as force unmounts fail if the parent file system is disabled, which further inhibits unmounting of the child file system.

DESCRIPTION:
If a file system is mounted inside another VxFS mount and the parent file system becomes disabled, it is not possible to sanely unmount the child, even with a force unmount. This is because a disabled file system does not allow directory lookup on it, and on Linux a file system can be unmounted only by providing the path of its mount point.

RESOLUTION:
The code is modified to allow an exceptional path lookup for unmounts. These are read-only operations and hence safer. This makes it possible for the unmount of the child file system to proceed.
* 3436431 (Tracking ID: 3434811)

SYMPTOM:
In VxFS 6.1, the vxfsconvert(1M) command hangs in the vxfsl3_getext() function with the following stack trace:

search_type()
bmap_typ()
vxfsl3_typext()
vxfsl3_getext()
ext_convert()
fset_convert()
convert()

DESCRIPTION:
There is a type-casting problem with the extent size. A non-zero value may overflow and mistakenly become zero, which leads to infinite looping inside the function.

RESOLUTION:
The code is modified to remove the intermediate variable and avoid the type cast.

* 3436433 (Tracking ID: 3349651)

SYMPTOM:
Veritas File System (VxFS) modules fail to load on RHEL6.5 and display the following error messages:

kernel: vxfs: disagrees about version of symbol putname
kernel: vxfs: disagrees about version of symbol getname

DESCRIPTION:
In RHEL6.5, the kernel interfaces for the getname() and putname() functions used by VxFS have changed.

RESOLUTION:
The code is modified to use the latest kernel interface definitions for the getname() and putname() functions.

* 3494534 (Tracking ID: 3402618)

SYMPTOM:
The mmap read performance on VxFS is slow.

DESCRIPTION:
The mmap read performance on VxFS is poor because the read-ahead operation is not triggered for mmap reads.

RESOLUTION:
The read-ahead operation is enhanced to improve the mmap read performance.

* 3502847 (Tracking ID: 3471245)

SYMPTOM:
MongoDB fails to insert any record because lseek fails to seek to the EOF.

DESCRIPTION:
On Linux, fallocate does not update the inode's i_size, which prevents lseek from seeking to the EOF.

RESOLUTION:
Before returning from the vx_fallocate() function, the vx_getattr() function is called to update the Linux inode with the VxFS inode.

* 3504362 (Tracking ID: 3472551)

SYMPTOM:
The attribute validation (pass 1d) of full fsck takes too long to complete.

DESCRIPTION:
The current implementation of full fsck pass 1d (attribute inode validation) is single threaded.
This causes slow full fsck performance on large file systems, especially those with a large number of attribute inodes.

RESOLUTION:
Pass 1d is modified to work in parallel using multiple threads, which enables full fsck to process the attribute inode validation faster.

* 3506487 (Tracking ID: 3506485)

SYMPTOM:
The system does not allow write-back caching with VVR.

DESCRIPTION:
If the volume or vset is part of an RVG (Replicated Volume Group) and the file system on it is mounted with the write-back feature, the mount operation should succeed without enabling the write-back feature, in order to maintain write-order fidelity. Similarly, if the write-back feature is enabled on the file system, an attempt to add that volume or vset to an RVG should fail.

RESOLUTION:
The code is modified to add the required limitation.

* 3512292 (Tracking ID: 3348520)

SYMPTOM:
In a Cluster File System (CFS) cluster with a small multi-volume file system, execution of the fsadm command causes a system hang if the free space in the file system is low. The following stack traces are displayed:

vx_svar_sleep_unlock()
vx_extentalloc_device()
vx_extentalloc()
vx_reorg_emap()
vx_extmap_reorg()
vx_reorg()
vx_aioctl_full()
vx_aioctl_common()
vx_aioctl()
vx_unlocked_ioctl()
vfs_ioctl()
do_vfs_ioctl()
sys_ioctl()
tracesys()

And:

vxg_svar_sleep_unlock()
vxg_grant_sleep()
vxg_api_lock()
vx_glm_lock()
vx_cbuf_lock()
vx_getblk_clust()
vx_getblk_cmn()
vx_getblk()
vx_getmap()
vx_getemap()
vx_extfind()
vx_searchau_downlevel()
vx_searchau_uplevel()
vx_searchau()
vx_extentalloc_device()
vx_extentalloc()
vx_reorg_emap()
vx_extmap_reorg()
vx_reorg()
vx_aioctl_full()
vx_aioctl_common()
vx_aioctl()
vx_unlocked_ioctl()
vfs_ioctl()
do_vfs_ioctl()
sys_ioctl()
tracesys()

DESCRIPTION:
While performing the fsadm operation, the secondary node in the CFS cluster is unable to allocate space from the EAU (Extent Allocation Unit) delegation given by the primary node.
It requests another delegation from the primary node. While granting such delegations, the primary node does not verify whether the EAU has exclusion zones set on it; it only verifies that the EAU has enough free space. Because the secondary node cannot allocate extents from an EAU that has an exclusion zone set, the request loops.

RESOLUTION:
The code is modified so that the primary node does not delegate an EAU that has an exclusion zone set to the secondary node.

* 3518943 (Tracking ID: 3534779)

SYMPTOM:
Internal stress testing on Cluster File System (CFS) hits a debug assert.

DESCRIPTION:
The assert is hit while refreshing the incore reference count queue (rcq) values from disk in response to a loadfs message. This races with an rcq processing thread that has already advanced the incore rcq indexes on a primary node in CFS.

RESOLUTION:
The code is modified to avoid selective updates of the incore rcq.

* 3519809 (Tracking ID: 3463464)

SYMPTOM:
An internal kernel functionality conformance test hits a kernel panic due to a null pointer dereference.

DESCRIPTION:
In the vx_fsadm_query() function, an error handling code path incorrectly sets the nodeid to "null" in the file system structure. As a result of clearing the nodeid, any subsequent access to this field causes a kernel panic.

RESOLUTION:
The code is modified to improve the error handling code path.

* 3522003 (Tracking ID: 3523316)

SYMPTOM:
The writeback cache feature does not work for a write size of 2MB.

DESCRIPTION:
In the vx_wb_possible() function, the check of write-size compatibility with write-back caching excludes 2MB write requests from caching.

RESOLUTION:
The code is modified so that the write-size compatibility check in the vx_wb_possible() function allows 2MB write requests to be cached.

* 3528770 (Tracking ID: 3449152)

SYMPTOM:
The vxtunefs(1M) command fails to set the thin_friendly_alloc tunable in CFS.
DESCRIPTION:
The thin_friendly_alloc tunable is not supported on CFS. But when the vxtunefs(1M) command is used to set it on CFS, a false success message is displayed.

RESOLUTION:
The code is modified to report an error when an attempt is made to set the thin_friendly_alloc tunable in CFS.

* 3529852 (Tracking ID: 3463717)

SYMPTOM:
CFS does not support the 'thin_friendly_alloc' tunable, and the vxtunefs(1M) command man page is not updated with this information.

DESCRIPTION:
Since the man page does not explicitly state that the 'thin_friendly_alloc' tunable is not supported on CFS, users may assume that CFS supports this feature.

RESOLUTION:
The vxtunefs(1M) man page is updated to state that CFS does not support the 'thin_friendly_alloc' tunable.

* 3530038 (Tracking ID: 3417321)

SYMPTOM:
The vxtunefs(1M) man page gives an incorrect description.

DESCRIPTION:
According to the current design, the 'delicache_enable' tunable is enabled by default for both local mounts and cluster mounts. However, the man page is not updated accordingly; it still states that this tunable is enabled by default only in the case of a local mount. The man page needs to be updated to correct the description.

RESOLUTION:
The vxtunefs(1M) man page is updated to display the correct contents for the 'delicache_enable' tunable. Additional information is provided noting that the performance benefits in the CFS case are limited as compared to a local mount. Also, in the CFS case, unlike the other CFS tunable parameters, this tunable must be explicitly turned on or off on each node.

* 3541125 (Tracking ID: 3541083)

SYMPTOM:
The vxupgrade(1M) command for layout version 10 creates 64-bit quota files with inappropriate permissions.

DESCRIPTION:
Layout version 10 supports the 64-bit quota feature. Thus, while upgrading to version 10, 32-bit external quota files are converted to 64-bit.
During this conversion process, the 64-bit files are created without specifying any permissions. Hence, random permissions are assigned to the 64-bit files, which creates the impression that the conversion did not complete successfully.

RESOLUTION:
The code is modified so that appropriate permissions are set when the 64-bit quota files are created.

Patch ID: 6.1.0.200

* 3424575 (Tracking ID: 3349651)

SYMPTOM:
Veritas File System (VxFS) modules fail to load on RHEL6.5 and display the following error messages:

kernel: vxfs: disagrees about version of symbol putname
kernel: vxfs: disagrees about version of symbol getname

DESCRIPTION:
In RHEL6.5, the kernel interfaces for the getname() and putname() functions used by VxFS have changed.

RESOLUTION:
The code is modified to use the latest kernel interface definitions for the getname() and putname() functions.

Patch ID: 6.1.0.100

* 3418489 (Tracking ID: 3370720)

SYMPTOM:
I/Os pause periodically, resulting in performance degradation. No explicit error is seen.

DESCRIPTION:
To avoid kernel stack overflow, work that consumes a large amount of stack is not done in the context of the original thread. Instead, such work items are added to a high priority work queue to be processed by a set of worker threads. If all the worker threads are busy, processing of newly added work items in the queue is subject to an additional delay, which results in periodic stalls.

RESOLUTION:
The code is modified so that high priority work items are processed by a set of dedicated worker threads. These dedicated threads are not affected when the general worker threads are busy, and hence do not trigger the periodic stalls.

INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
To install the patch, perform the following steps on at least one node in the cluster:

1. Copy the hot-fix fs-rhel6_x86_64-6.1.1.200-rpms.tar.gz to /tmp

2. Untar fs-rhel6_x86_64-6.1.1.200-rpms.tar.gz to /tmp/hf

   # mkdir /tmp/hf
   # cd /tmp/hf
   # gunzip /tmp/fs-rhel6_x86_64-6.1.1.200-rpms.tar.gz
   # tar xf /tmp/fs-rhel6_x86_64-6.1.1.200-rpms.tar

3. Install the hotfix

   # pwd
   /tmp/hf
   # ./installVRTSvxfs611P200 [ ...]

You can also use the 6.1.1MR installmr script to install this patch together with 6.1.1:

   # cd <6.1.1MR path>
   # ./installmr -hotfix_path /tmp/hf [ ...]

Or, you can use the 6.1.1MR installmr script to install this patch together with the 6.1.1 and 6.1GA bits:

   # cd <6.1.1MR path>
   # ./installmr -hotfix_path /tmp/hf -base_path [<61GA path>] [ ...]

where -base_path should point to the 6.1GA image directory.

Install the patch manually:
--------------------------
# rpm -Uvh VRTSvxfs-6.1.1.200-RHEL6.x86_64.rpm

REMOVING THE PATCH
------------------
# rpm -e rpm_name

KNOWN ISSUES
------------
* Tracking ID: 3690067

SYMPTOM:
The 'delayed allocation' (i.e., 'dalloc') feature in the VxFS 6.1.1.100 p-patch can cause data loss or stale data. The dalloc feature is enabled by default for locally mounted file systems and is not supported for cluster-mounted file systems. Dalloc with sequential extending buffered writes can possibly cause data loss or stale data. This issue is seen only with the 6.1.1.100 p-patch.

WORKAROUND:
Disable the 'delayed allocation' ('dalloc') feature on the VxFS file systems. The following commands disable dalloc:

1) For a file system which is already mounted:

   # vxtunefs -s -o dalloc_enable=0 $MOUNT_POINT

2) To make the value persistent across system reboots, add an entry to /etc/vx/tunefstab:

   /dev/vx/dsk/$DISKGROUP/$VOLUME dalloc_enable=0

SPECIAL INSTRUCTIONS
--------------------
NONE

OTHERS
------
NONE