* * * READ ME * * *
* * * InfoScale 7.4 * * *
* * * Patch 1200 * * *

Patch Date: 2018-12-17

This document provides the following information:

* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH

PATCH NAME
----------
InfoScale 7.4 Patch 1200

OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL7 x86-64

PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSamf
VRTSaslapm
VRTSdbac
VRTSgab
VRTSllt
VRTSodm
VRTSveki
VRTSvxfen
VRTSvxfs
VRTSvxvm

BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* InfoScale Availability 7.4
* InfoScale Enterprise 7.4
* InfoScale Foundation 7.4
* InfoScale Storage 7.4

SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: VRTSvxvm-7.4.0.1400

* 3949320 (3947022) VVR: vxconfigd hang during /scripts/configuratio/assoc_datavol.tc#6
* 3958838 (3953681) A data corruption issue is seen when more than one plex of a volume is detached.
* 3958884 (3954787) Data corruption may occur in a GCO along with FSS environment on the RHEL 7.5 operating system.
* 3958887 (3953711) Panic observed while switching the logowner to a slave node while I/Os are in progress.
* 3958976 (3955101) Panic observed in a GCO environment (cluster-to-cluster replication) during replication.
* 3959204 (3949954) Dumpstack messages are printed when the vxio module is loaded for the first time and blk_register_queue is called.
* 3959433 (3956134) A system panic might occur when I/O is in progress in a VVR (Veritas Volume Replicator) environment.
* 3967098 (3966239) I/O hang observed while copying data in cloud environments.
* 3967099 (3965715) vxconfigd may core dump when VIOM (Veritas InfoScale Operations Manager) is enabled.
* 3949322 (3944259) The vradmin verifydata and vradmin ibc commands fail on private diskgroups with a "Lost connection" error.
* 3950578 (3953241) Messages in syslog are seen with message string "0000" for the VxVM module.
* 3950760 (3946217) In a scenario where encryption over wire is enabled and secondary logging is disabled, vxconfigd hangs and replication does not progress.
* 3950799 (3950384) In a scenario where volume encryption at rest is enabled, data corruption may occur if the file system size exceeds 1TB.
* 3951488 (3950759) The application I/Os hang if volume-level I/O shipping is enabled and the volume layout is mirror-concat or mirror-stripe.
* 3967123 (3967122) Retpoline support for the ASLAPM rpm on the RHEL 7.6 retpoline kernel.

Patch ID: VRTSveki-7.4.0.1100

* 3967356 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.

Patch ID: VRTSdbac-7.4.0.1300

* 3967347 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.

Patch ID: VRTSamf-7.4.0.1300

* 3967346 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.

Patch ID: VRTSvxfen-7.4.0.1300

* 3967345 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.

Patch ID: VRTSgab-7.4.0.1300

* 3967344 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.

Patch ID: VRTSllt-7.4.0.1400

* 3967343 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.
* 3948761 (3948507) If RDMA dependencies are not fulfilled by the setup, the LLT init/systemctl script should load the non-RDMA module.

Patch ID: VRTSodm-7.4.0.1400

* 3958867 (3958865) The ODM module failed to load on RHEL 7.6.
* 3953235 (3953233) After installing the 7.4.0.1100 (on AIX and Solaris) and 7.4.0.1200 (on Linux) patch, the Oracle Disk Management (ODM) module fails to load.

Patch ID: VRTSvxfs-7.4.0.1400

* 3958854 (3958853) The VxFS module failed to load on RHEL 7.6.
* 3959065 (3957285) The job promote operation from the replication target node fails.
* 3959302 (3959299) Improve file creation time on systems with SELinux enabled.
* 3959306 (3959305) Fix a bug in the security attribute initialization of files with named attributes.
* 3959996 (3938256) When checking file size through seek_hole, an incorrect offset/size is returned when delayed allocation is enabled on the file.
* 3964083 (3947421) The DLV upgrade operation fails while upgrading a filesystem from DLV 9 to DLV 10.
* 3966524 (3966287) During multiple mounts of shared CFS mount points, more than 100MB is consumed per mount.
* 3966896 (3957092) System panic with spin_lock_irqsave through splunkd.
* 3966920 (3925281) Hexdump the in-core inode data and piggyback data when inode revalidation fails.
* 3966973 (3952837) VxFS mount fails during startup because its dependency, autofs.service, is not up.
* 3967002 (3955766) CFS hung while allocating extents.
* 3967004 (3958688) System panic when VxFS is force unmounted.
* 3967006 (3934175) A 4-node FSS CFS experienced an I/O hang on all nodes.
* 3967030 (3947433) While adding a volume (part of a vset) to an already mounted filesystem, fsvoladm displays an error.
* 3967032 (3947648) Mistuning of vxfs_ninode and vx_bc_bufhwm to a very small value.
* 3967089 (3908785) System panic observed because of a null page address in the writeback structure in case of the kswapd process.
* 3949500 (3949308) In a scenario where FEL caching is enabled, application I/O on a file does not proceed when file system utilization is 100%.
* 3949506 (3949501) When SmartIO FEL-based writeback caching is enabled, the "sfcache offline" command hangs during inode deinit.
* 3949507 (3949502) When SmartIO FEL-based writeback caching is enabled, a memory leak of a few bytes may happen during node reconfiguration.
* 3949508 (3949503) When FEL is enabled in a CFS environment, data corruption may occur after node recovery.
* 3949509 (3949504) When SmartIO FEL-based writeback caching is enabled, a memory leak of a few bytes may happen during node reconfiguration.
* 3949510 (3949505) When SmartIO FEL-based writeback caching is enabled, I/O operations on a file in the filesystem can result in a panic after cluster reconfiguration.
* 3950740 (3953165) Messages in syslog are seen with message string "0000" for the VxFS module.
* 3952340 (3953148) Due to the space reservation mechanism, an extent delegation deficit may be seen on a node, although it may not be using any feature involving the space reservation mechanism, such as CFS delayed allocation/FEL.

DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: VRTSvxvm-7.4.0.1400

* 3949320 (Tracking ID: 3947022)

SYMPTOM:
vxconfigd hangs.

DESCRIPTION:
There was a window between NIOs being added to rp_port->pt_waitq and the rlink being disconnected in which NIOs were left in pt_waitq, and hence their parent (the ack SIO) was never marked done. The ack SIO held an I/O count, which led to the vxconfigd hang.

RESOLUTION:
Do not add an NIO to rp_port->pt_waitq if rp_port->pt_closing is set; instead, call done on the NIO with the error ENC_CLOSING. Before deleting the port, call done on the NIOs remaining in pt_waitq with the error ENC_CLOSING.
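The resolution above follows a common close/drain pattern: once a close is in flight, new work is rejected up front, and work already parked in the queue is completed with an error before the queue is torn down. The following is a minimal, generic C sketch of that pattern only; port_t, nio_t, and the error value are invented stand-ins for the VxVM internals (rp_port, the NIOs, ENC_CLOSING), not actual product code:

    #include <pthread.h>
    #include <stdio.h>

    #define ENC_CLOSING 5   /* stand-in error code; the real value is VxVM-internal */
    #define MAXQ 16

    typedef struct nio { int id; } nio_t;

    typedef struct port {
        pthread_mutex_t lock;
        int closing;          /* set once the port starts shutting down */
        nio_t *waitq[MAXQ];   /* stand-in for the port's wait queue */
        int nwait;
    } port_t;

    /* Completion callback: in the real code this would release the
     * parent ack SIO's I/O count so vxconfigd is not left waiting. */
    static void nio_done(nio_t *nio, int error)
    {
        printf("nio %d done, error=%d\n", nio->id, error);
    }

    /* Enqueue, refusing new work if the port is already closing. */
    static void port_enqueue(port_t *p, nio_t *nio)
    {
        pthread_mutex_lock(&p->lock);
        if (p->closing || p->nwait == MAXQ) {
            pthread_mutex_unlock(&p->lock);
            nio_done(nio, ENC_CLOSING);   /* complete immediately instead of parking */
            return;
        }
        p->waitq[p->nwait++] = nio;
        pthread_mutex_unlock(&p->lock);
    }

    /* Close: mark the port closing first, then drain everything still parked. */
    static void port_close(port_t *p)
    {
        pthread_mutex_lock(&p->lock);
        p->closing = 1;
        while (p->nwait > 0)
            nio_done(p->waitq[--p->nwait], ENC_CLOSING);
        pthread_mutex_unlock(&p->lock);
    }

    int main(void)
    {
        port_t p = { .lock = PTHREAD_MUTEX_INITIALIZER };
        nio_t a = { 1 }, b = { 2 };

        port_enqueue(&p, &a);   /* parked */
        port_close(&p);         /* drains 'a' with ENC_CLOSING */
        port_enqueue(&p, &b);   /* rejected up front with ENC_CLOSING */
        return 0;
    }

The key ordering is that the closing flag is checked and set under the same lock that protects the queue, so no item can slip into the queue after the drain has started.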
* 3958838 (Tracking ID: 3953681)

SYMPTOM:
A data corruption issue is seen when more than one plex of a volume is detached.

DESCRIPTION:
When a plex of a volume gets detached, the DETACH map gets enabled in the DCO (Data Change Object). The incoming I/Os are tracked in the DRL (Dirty Region Log) and then asynchronously copied to the DETACH map for tracking. If one more plex gets detached, some of the new incoming regions may be missed in the DETACH map of the previously detached plex. This leads to corruption when the disk comes back and the plex is resynchronized using the corrupted DETACH map.

RESOLUTION:
Code changes have been made to correctly track the I/Os in the DETACH map of the previously detached plex and avoid corruption.

* 3958884 (Tracking ID: 3954787)

SYMPTOM:
In an RHEL 7.5 FSS environment with GCO configured, having NVMe devices and an InfiniBand network, data corruption might occur when sending I/O from the master to the slave node.

DESCRIPTION:
In the recent RHEL 7.5 release, Linux stopped allowing I/O on an underlying NVMe device when there are gaps between BIO vectors. In the case of VVR, an SRL header of 3 blocks is added to the BIO. When the BIO is sent through LLT to the other node, the LLT limitation of 32 fragments can leave the BIO vectors unaligned. When this unaligned BIO is sent to the underlying NVMe device, the last 3 blocks of the BIO are skipped and not written to the disk on the slave node. This results in incomplete data on the slave node, which leads to data corruption.

RESOLUTION:
Code changes have been made to handle this case and send the BIO aligned to the underlying NVMe device.

* 3958887 (Tracking ID: 3953711)

SYMPTOM:
The system might panic while switching the logowner to a slave node while I/Os are in progress, with the following stack:

vol_rv_service_message_free()
vol_rv_replica_reconfigure()
sched_clock_cpu()
vol_rv_error_handle()
vol_rv_errorhandler_callback()
vol_klog_start()
voliod_iohandle()
voliod_loop()
voliod_kiohandle()
kthread()
insert_kthread_work()
ret_from_fork_nospec_begin()
insert_kthread_work()
vol_rv_service_message_free()

DESCRIPTION:
While processing a transaction, the I/O count on the RV object is released to let the transaction proceed, and in that case the RV object in the SIO is set to NULL. However, while freeing the message, the object is dereferenced without considering that it can be NULL. This can lead to a panic due to a NULL pointer dereference in the code.

RESOLUTION:
Code changes have been made to handle a NULL value of the RV object.

* 3958976 (Tracking ID: 3955101)

SYMPTOM:
The server might panic in a GCO environment with the following stack:

nmcom_server_main_tcp()
ttwu_do_wakeup()
ttwu_do_activate.constprop.90()
try_to_wake_up()
update_curr()
update_curr()
account_entity_dequeue()
__schedule()
nmcom_server_proc_tcp()
kthread()
kthread_create_on_node()
ret_from_fork()
kthread_create_on_node()

DESCRIPTION:
Recent changes in the code handle dynamic port changes, that is, deletion and addition of ports can now happen dynamically. It might happen that while a port is being accessed, it is deleted in the background by another thread. This leads to a panic, since the port being accessed has already been deleted.

RESOLUTION:
Code changes have been made to handle this situation and check whether the port is available before accessing it.

* 3959204 (Tracking ID: 3949954)

SYMPTOM:
Dumpstack messages are printed when the vxio module is loaded for the first time and blk_register_queue is called.
DESCRIPTION:
In RHEL 7.5, a new check was added to the kernel code in blk_register_queue: if QUEUE_FLAG_REGISTERED was already set on the queue, a dumpstack warning message was printed. In VxVM, the flag was already set because it had been copied from the device queue that was earlier registered by the OS.

RESOLUTION:
Changes have been made in the VxVM code to avoid copying QUEUE_FLAG_REGISTERED, which fixes the dumpstack warnings.

* 3959433 (Tracking ID: 3956134)

SYMPTOM:
A system panic might occur when I/O is in progress in a VVR (Veritas Volume Replicator) environment, with the following stack:

page_fault()
voliomem_grab_special()
volrv_seclog_wsio_start()
voliod_iohandle()
voliod_loop()
kthread()
ret_from_fork()

DESCRIPTION:
In a memory-crunch scenario, the memory reservation for an SIO (staged I/O) in a VVR configuration might fail in some cases. If this happens, the SIO is retried later when memory becomes available again, but in doing so, some fields of the SIO are passed NULL values, which leads to a panic in the VVR code.

RESOLUTION:
Code changes have been made to pass proper values to the I/O when it is retried in a VVR environment.

* 3967098 (Tracking ID: 3966239)

SYMPTOM:
An I/O hang is observed while copying data to VxVM (Veritas Volume Manager) volumes in cloud environments.

DESCRIPTION:
In a cloud environment with a mirrored volume that has a DCO (Data Change Object) attached, I/Os issued on the volume have to be processed through the DRL (Dirty Region Log), which is used for faster recovery on reboot. During this DRL processing, there is a condition in the code that can result in I/Os not being driven through the DRL, leading to a hang. Further I/Os keep queuing up, waiting for the I/Os on the DRL to complete, which sustains the hang.

RESOLUTION:
Code changes have been made to resolve the condition that caused I/Os not to be driven through the DRL.

* 3967099 (Tracking ID: 3965715)

SYMPTOM:
vxconfigd may core dump when VIOM (Veritas InfoScale Operations Manager) is enabled. The following is the stack trace:

#0 0x00007ffff7309d22 in ____strtoll_l_internal () from /lib64/libc.so.6
#1 0x000000000059c61f in ddl_set_vom_discovered_attr ()
#2 0x00000000005a4230 in ddl_find_devices_in_system ()
#3 0x0000000000535231 in find_devices_in_system ()
#4 0x0000000000535530 in mode_set ()
#5 0x0000000000477a73 in setup_mode ()
#6 0x0000000000479485 in main ()

DESCRIPTION:
The vxconfigd daemon reads JSON data generated by VIOM to dynamically update some of the VxVM disk (LUN) attributes. While accessing this data, it wrongly parsed the LUN size attribute as a string, which returned NULL instead of the LUN size. Dereferencing this NULL value caused the vxconfigd daemon to core dump with a segmentation fault.

RESOLUTION:
Appropriate changes have been made to handle the LUN size attribute correctly. (A generic sketch of this NULL-dereference hazard follows the next incident.)

* 3949322 (Tracking ID: 3944259)

SYMPTOM:
The vradmin verifydata and vradmin ibc commands fail on private diskgroups with a "Lost connection" error.

DESCRIPTION:
This issue occurs because of a deadlock between the IBC mechanism and the ongoing I/Os on the secondary RVG. The IBC mechanism expects I/O to be transferred to the secondary in sequential order; however, to improve performance, I/Os are now written in parallel. The mismatch in IBC behavior causes a deadlock, and vradmin verifydata and vradmin ibc fail with a timeout error.

RESOLUTION:
As a part of this fix, the IBC behavior is improved such that it now accounts for parallel, and possibly out-of-sequence, I/O writes to the secondary.
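To illustrate the hazard behind incident 3965715 above: a field fetched from parsed JSON can come back NULL when it is looked up with the wrong type, and passing that NULL straight to strtoll() crashes inside libc, which matches frame #0 of the stack trace. A minimal, hypothetical C sketch follows; get_json_string() is an invented stand-in for whatever accessor the real code uses, not a VxVM or VIOM API:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical accessor: returns the attribute as a string, or NULL if the
     * stored value is not a string (e.g., it was written as a JSON number). */
    static const char *get_json_string(const char *attr)
    {
        (void)attr;
        return NULL;   /* simulate the failing lookup for the LUN size */
    }

    int main(void)
    {
        const char *val = get_json_string("lun_size");   /* illustrative name */

        /* Unsafe: strtoll(NULL, ...) dereferences NULL inside libc:
         *     long long size = strtoll(val, NULL, 10);
         * Safe: check for NULL (or re-fetch with the correct type) first. */
        long long size = val ? strtoll(val, NULL, 10) : -1;
        if (size < 0)
            fprintf(stderr, "lun_size attribute missing or not a string\n");
        else
            printf("lun_size = %lld\n", size);
        return 0;
    }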
* 3950578 (Tracking ID: 3953241)

SYMPTOM:
Generic messages or warnings may appear in syslog with the string "vxvm:0000:" instead of a uniquely numbered message ID for the VxVM module.

DESCRIPTION:
A few syslog messages introduced in the InfoScale 7.4 release were not given unique message numbers to identify the places in the product where they originate. Instead, they were marked with the common message identification number "0000".

RESOLUTION:
This patch fixes the syslog messages generated by the VxVM module that contain "0000" as the message string and provides them with unique numbering.

* 3950760 (Tracking ID: 3946217)

SYMPTOM:
In a scenario where encryption over wire is enabled and secondary logging is disabled, vxconfigd hangs and replication does not progress.

DESCRIPTION:
In a scenario where encryption over wire is enabled and secondary logging is disabled, the application I/Os are encrypted in sequence but are not written to the secondary in the same order. The out-of-sequence and in-sequence I/Os get stuck in a loop, waiting for each other to complete. As a result, I/Os are left incomplete and eventually hang, vxconfigd hangs, and replication does not progress.

RESOLUTION:
As a part of this fix, the I/O encryption and write sequence is improved such that all I/Os are first encrypted and then written sequentially to the secondary.

* 3950799 (Tracking ID: 3950384)

SYMPTOM:
In a scenario where volume data encryption at rest is enabled, data corruption may occur if the file system size exceeds 1TB and the data is located in a file extent whose extent size is bigger than 256KB.

DESCRIPTION:
In a scenario where data encryption at rest is enabled, data corruption may occur when both of the following conditions are met:
- The file system size is over 1TB.
- The data is located in a file extent whose extent size is bigger than 256KB.
This issue occurs due to a bug that causes an integer overflow for the offset.

RESOLUTION:
As a part of this fix, appropriate code changes have been made to improve the data encryption behavior such that the data corruption does not occur.

* 3951488 (Tracking ID: 3950759)

SYMPTOM:
The application I/Os hang if volume-level I/O shipping is enabled and the volume layout is mirror-concat or mirror-stripe.

DESCRIPTION:
In a scenario where an application I/O is issued over a volume that has volume-level I/O shipping enabled, the I/O is shipped to all target nodes. Typically, on the target nodes, the I/O must be sent only to the local disk. However, in the case of mirror-concat or mirror-stripe volumes, I/Os are sent to remote disks as well, which at times leads to an I/O hang.

RESOLUTION:
As a part of this fix, an I/O that has been shipped to a target node is restricted to locally connected disks, and remote disks are skipped.

* 3967123 (Tracking ID: 3967122)

SYMPTOM:
Retpoline support for ASLAPM on RHEL 7.6 kernels.

DESCRIPTION:
RHEL 7.6 is a new release and it has a retpoline kernel. The APM module needs to be recompiled with retpoline-aware GCC to support the retpoline kernel.

RESOLUTION:
The APM module is now compiled with retpoline-aware GCC.

Patch ID: VRTSveki-7.4.0.1100

* 3967356 (Tracking ID: 3967265)

SYMPTOM:
RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported.

DESCRIPTION:
Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server kernel modules need to be recompiled with RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION:
Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x is now introduced.

Patch ID: VRTSdbac-7.4.0.1300

* 3967347 (Tracking ID: 3967265)

SYMPTOM:
RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported.

DESCRIPTION:
Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server kernel modules need to be recompiled with RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x is now introduced.

Patch ID: VRTSamf-7.4.0.1300

* 3967346 (Tracking ID: 3967265)

SYMPTOM:
RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported.

DESCRIPTION:
Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server kernel modules need to be recompiled with RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x is now introduced.

Patch ID: VRTSvxfen-7.4.0.1300

* 3967345 (Tracking ID: 3967265)

SYMPTOM:
RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported.

DESCRIPTION:
Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server kernel modules need to be recompiled with RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x is now introduced.

Patch ID: VRTSgab-7.4.0.1300

* 3967344 (Tracking ID: 3967265)

SYMPTOM:
RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported.

DESCRIPTION:
Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server kernel modules need to be recompiled with RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x is now introduced.

Patch ID: VRTSllt-7.4.0.1400

* 3967343 (Tracking ID: 3967265)

SYMPTOM:
RHEL 7.x RETPOLINE kernels and RHEL 7.6 are not supported.

DESCRIPTION:
Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server kernel modules need to be recompiled with RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 7.6 and RETPOLINE kernels on RHEL 7.x is now introduced.

* 3948761 (Tracking ID: 3948507)

SYMPTOM:
LLT loads the RDMA module during its configuration, even if the RDMA dependencies are not fulfilled by the setup.

DESCRIPTION:
LLT loads the RDMA module during its configuration, even if the RDMA dependencies are not fulfilled by the setup. Moreover, the user is unable to manually unload the IB modules. This issue occurs because the LLT_RDMA module holds a use count on the ib_core module even though LLT is not configured to work over RDMA.

RESOLUTION:
LLT now loads the non-RDMA module if the RDMA dependency check fails during configuration.

Patch ID: VRTSodm-7.4.0.1400

* 3958867 (Tracking ID: 3958865)

SYMPTOM:
The ODM module failed to load on RHEL 7.6.

DESCRIPTION:
RHEL 7.6 is a new release, and the ODM module failed to load on it.

RESOLUTION:
Added ODM support for RHEL 7.6.

* 3953235 (Tracking ID: 3953233)

SYMPTOM:
After installing the 7.4.0.1100 (on AIX and Solaris) or 7.4.0.1200 (on Linux) patch, the Oracle Disk Management (ODM) module fails to load.
DESCRIPTION:
As part of the 7.4.0.1100 (on AIX and Solaris) and 7.4.0.1200 (on Linux) patch, the VxFS version was updated to 7.4.0.1200. Because of this VxFS version update, the ODM module needs to be repackaged due to an internal dependency on the VxFS version.

RESOLUTION:
As part of this fix, the ODM module has been repackaged to support the updated VxFS version.

Patch ID: VRTSvxfs-7.4.0.1400

* 3958854 (Tracking ID: 3958853)

SYMPTOM:
The VxFS module failed to load on RHEL 7.6.

DESCRIPTION:
RHEL 7.6 is a new release, and the VxFS module failed to load on it.

RESOLUTION:
Added VxFS support for RHEL 7.6.

* 3959065 (Tracking ID: 3957285)

SYMPTOM:
The job promote operation executed on the replication target node fails with an error message like the following:

# /opt/VRTS/bin/vfradmin job promote myjob1 /mnt2
UX:vxfs vfradmin: INFO: V-3-28111: Current replication direction: :/mnt1 -> :/mnt2
UX:vxfs vfradmin: INFO: V-3-28112: If you continue this command, replication direction will change to: :/mnt2 -> :/mnt1
UX:vxfs vfradmin: QUESTION: V-3-28101: Do you want to continue? [ynq]y
UX:vxfs vfradmin: INFO: V-3-28090: Performing final sync for job myjob1 before promoting...
UX:vxfs vfradmin: INFO: V-3-28099: Job promotion failed. If you continue, replication will be stopped and the filesystem will be made available on this host for use. To resume replication when returns, use the vfradmin job recover command.
UX:vxfs vfradmin: INFO: V-3-28100: Continuing may result in data loss.
UX:vxfs vfradmin: QUESTION: V-3-28101: Do you want to continue? [ynq]y
UX:vxfs vfradmin: INFO: V-3-28227: Unable to unprotect filesystem.

DESCRIPTION:
A job promote from the target node sends a promote message to the source node. After this message is processed on the source side, the 'seqno' file is updated. The 'seqno' file is created on the target side and is not present on the source side; hence, the 'seqno' file update returns an error and the promote fails.

RESOLUTION:
The 'seqno' file write is not required as part of the promote message. The SKIP_SEQNO_UPDATE flag is now passed in the promote message so that the 'seqno' file write is skipped on the source side during promote processing.
Note: The job should be stopped on the source node before doing a promote from the target node.

* 3959302 (Tracking ID: 3959299)

SYMPTOM:
When a large number of files are created at once on a system with SELinux enabled, file creation may take longer than on a system with SELinux disabled.

DESCRIPTION:
On an SELinux-enabled system, SELinux security labels need to be stored as extended attributes during file creation. This requires allocation of an attribute inode and its data extent. The contents of the extent are read synchronously into the buffer. If this is a newly allocated extent, its contents are garbage anyway, and they get overwritten with the attribute data containing the SELinux security labels. Thus, for newly allocated attribute extents, the read operation is redundant.

RESOLUTION:
As a fix, for a newly allocated attribute extent, reading the data from that extent is skipped. However, if the allocated extent gets merged with a previously allocated extent, the extent returned by the allocator could be a combined extent. In such cases, a read of the entire extent is still performed to ensure that previously written data is correctly loaded in-core.

* 3959306 (Tracking ID: 3959305)

SYMPTOM:
When a large number of files with named attributes are being created, written to, and deleted in a loop, along with other operations on an SELinux-enabled system, some files may end up without security attributes.
This may lead to access being denied to such files later.

DESCRIPTION:
On an SELinux-enabled system, security initialization happens during file creation, and the security attributes are stored. However, when there are parallel create/write/delete operations on multiple files, or on the same files multiple times, and those files have named attributes, a race condition can cause security attribute initialization to be skipped for some files. Since these files do not have security attributes set, the SELinux security module will later prevent access to them, and operations on these files will fail with an access denied error.

RESOLUTION:
In a file creation context, while writing named attributes, the file's security initialization is now also attempted by explicitly calling the security initialization routine. This is an additional provision (in addition to the security initialization in the default file create code) to ensure that security initialization always happens in the named attribute write codepath, notwithstanding race conditions.

* 3959996 (Tracking ID: 3938256)

SYMPTOM:
When checking file size through seek_hole, an incorrect offset/size is returned when delayed allocation is enabled on the file.

DESCRIPTION:
From recent RHEL 7 versions onwards, the grep command uses the seek_hole feature to check the current file size and then reads data depending on that size. In VxFS, when dalloc is enabled, the extent is allocated to the file later, but the file size is incremented as soon as the write completes. When checking the file size through seek_hole, VxFS did not fully consider the dalloc case and returned a stale size based on the extents allocated to the file, instead of the actual file size. This resulted in less data being read than expected.

RESOLUTION:
The code is modified such that VxFS now returns the correct size when dalloc is enabled on a file and seek_hole is called on it. (A usage sketch of the SEEK_HOLE interface follows incident 3967002 below.)

* 3964083 (Tracking ID: 3947421)

SYMPTOM:
The DLV upgrade operation fails while upgrading a filesystem from DLV 9 to DLV 10, with the following error message:
ERROR: V-3-22567: cannot upgrade /dev/vx/rdsk/metadg/metavol - Invalid argument

DESCRIPTION:
This occurs if the filesystem was created with DLV 5 or lower and was later upgraded successfully from 5 to 6, 6 to 7, 7 to 8, and 8 to 9. The newly written code tries to find the "mkfs" record in the history log, but there was no concept of logging the mkfs operation in the history log for DLV 5 or lower, so the upgrade operation fails while upgrading from DLV 9 to 10.

RESOLUTION:
Code changes have been made to complete the upgrade operation even when no mkfs record is found in the history log.

* 3966524 (Tracking ID: 3966287)

SYMPTOM:
During multiple mounts of shared CFS mount points, more than 100MB is consumed per mount.

DESCRIPTION:
During GLM initialization, memory usage for the message queues is scaled. The original design provisions memory based on the maximum inode count, which does not scale well for multiple CFS mount points: memory rises linearly with the number of mount points while the maximum inode count stays constant. Moreover, the current memory usage per inode is about 1/4N, where N is the maximum number of inodes, which translates to roughly 130MB for a standard Linux mount and looks like an aggressive strategy. An additional parameter is therefore provided to scale this usage per local scope (shared mount point), which makes it easy to reduce the memory footprint.
RESOLUTION:
A kernel module parameter is introduced to scale the memory usage. To use it, the parameter needs to be included in the module configuration. For example, to reduce the usage from 100MB per filesystem to 100KB, provide a scope factor of 1000 (a recommended value when there are more than 100 CFS mounts):

options vxfs vxfs_max_scopes_possible=1000

* 3966896 (Tracking ID: 3957092)

SYMPTOM:
System panic with spin_lock_irqsave through splunkd in the rddirahead path.

DESCRIPTION:
Based on the current analysis, a spinlock appears to be getting re-initialized in the rddirahead path, which causes this deadlock.

RESOLUTION:
Code changes have been made accordingly to avoid this situation.

* 3966920 (Tracking ID: 3925281)

SYMPTOM:
Hexdump the in-core inode data and piggyback data when inode revalidation fails.

DESCRIPTION:
While assuming inode ownership, if inode revalidation failed with piggyback data, the piggyback and in-core inode data were not hexdumped, so the current state of the inode was lost.

RESOLUTION:
An inode revalidation failure message has been added, and the code is modified to print a hexdump of the in-core inode data and piggyback data when revalidation of the inode fails.

* 3966973 (Tracking ID: 3952837)

SYMPTOM:
If a VxFS filesystem is to be mounted during boot-up in a systemd environment and its dependency autofs.service is not up, the service trying to mount VxFS enters a failed state.

DESCRIPTION:
vxfs.service has a dependency on autofs.service. If autofs starts after vxfs.service and another service tries to mount a VxFS filesystem, the mount fails and the service enters a failed state.

RESOLUTION:
The dependency of vxfs.service on autofs.service and systemd-remountfs.service has been removed to solve the issue.

* 3967002 (Tracking ID: 3955766)

SYMPTOM:
CFS hung while allocating extents, with a thread like the following looping forever doing extent allocation:

#0 [ffff883fe490fb30] schedule at ffffffff81552d9a
#1 [ffff883fe490fc18] schedule_timeout at ffffffff81553db2
#2 [ffff883fe490fcc8] vx_delay at ffffffffa054e4ee [vxfs]
#3 [ffff883fe490fcd8] vx_searchau at ffffffffa036efc6 [vxfs]
#4 [ffff883fe490fdf8] vx_extentalloc_device at ffffffffa036f945 [vxfs]
#5 [ffff883fe490fea8] vx_extentalloc_device_proxy at ffffffffa054c68f [vxfs]
#6 [ffff883fe490fec8] vx_worklist_process_high_pri_locked at ffffffffa054b0ef [vxfs]
#7 [ffff883fe490fee8] vx_worklist_dedithread at ffffffffa0551b9e [vxfs]
#8 [ffff883fe490ff28] vx_kthread_init at ffffffffa055105d [vxfs]
#9 [ffff883fe490ff48] kernel_thread at ffffffff8155f7d0

DESCRIPTION:
In the current code of emtran_process_commit(), it is possible for the EAU summary to be updated without delegation of the corresponding EAU, because the VX_AU_SMAPFREE flag is cleared before the EAU summary is updated, which can lead to a hang. In addition, some improper error handling in the case of a bad map can also cause hang situations.

RESOLUTION:
To avoid the potential hang, the code is modified to clear the VX_AU_SMAPFREE flag after updating the EAU summary, and the error handling in emtran_commit/undo is improved.
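For reference, the seek_hole feature mentioned in incident 3959996 above is exposed on Linux through the standard lseek() SEEK_HOLE interface. The following minimal C sketch shows how a reader such as grep probes for the first hole, and thus the apparent extent of data, in a file; the fallback path name is illustrative only:

    #define _GNU_SOURCE     /* for SEEK_HOLE on glibc */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "/tmp/testfile";  /* illustrative */
        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Offset of the first hole at or after offset 0. For a fully
         * allocated file this is the file size (the implicit hole at EOF).
         * A filesystem that reports a stale value here, as in the dalloc
         * bug above, makes readers stop short of the real data. */
        off_t hole = lseek(fd, 0, SEEK_HOLE);
        if (hole < 0)
            perror("lseek(SEEK_HOLE)");
        else
            printf("first hole at offset %lld\n", (long long)hole);

        close(fd);
        return 0;
    }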
* 3967004 (Tracking ID: 3958688)

SYMPTOM:
System panic when VxFS is force unmounted; the panic stack trace can look like the following:

#8 [ffff88622a497c10] do_page_fault at ffffffff81691fc5
#9 [ffff88622a497c40] page_fault at ffffffff8168e288
[exception RIP: vx_nfsd_encode_fh_v2+89]
RIP: ffffffffa0c505a9 RSP: ffff88622a497cf8 RFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff883e5c731558 RCX: 0000000000000000
RDX: 0000000000000010 RSI: 0000000000000000 RDI: ffff883e5c731558
RBP: ffff88622a497d48 R8: 0000000000000010 R9: 000000000000fffe
R10: 0000000000000000 R11: 000000000000000f R12: ffff88622a497d6c
R13: 00000000000203d6 R14: ffff88622a497d78 R15: ffff885ffd60ec00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff88622a497d50] exportfs_encode_inode_fh at ffffffff81285cb0
#11 [ffff88622a497d60] show_mark_fhandle at ffffffff81243ed4
#12 [ffff88622a497de0] inotify_fdinfo at ffffffff8124411d
#13 [ffff88622a497e18] inotify_show_fdinfo at ffffffff812441b0
#14 [ffff88622a497e50] seq_show at ffffffff81273ec7
#15 [ffff88622a497e90] seq_read at ffffffff8122253a
#16 [ffff88622a497f00] vfs_read at ffffffff811fe0ee
#17 [ffff88622a497f38] sys_read at ffffffff811fecbf
#18 [ffff88622a497f80] system_call_fastpath at ffffffff816967c9

DESCRIPTION:
There is no error handling in the nfsd_encode code path for the situation where the file system gets disabled or unmounted, which can lead to a panic.

RESOLUTION:
Error handling has been added in vx_nfsd_encode_fh_v2() to avoid the panic in case the file system gets unmounted or disabled.

* 3967006 (Tracking ID: 3934175)

SYMPTOM:
A 4-node FSS CFS experienced an I/O hang on all nodes.

DESCRIPTION:
When I/O requests are handed off to a worker thread because of low stack space, they are processed in LIFO order. This is not expected; they should be processed in FIFO order.

RESOLUTION:
The code is modified to pick up the older work items from the tail of the queue.

* 3967030 (Tracking ID: 3947433)

SYMPTOM:
While adding a volume (part of a vset) to an already mounted filesystem, fsvoladm displays the following error:
UX:vxfs fsvoladm: ERROR: V-3-28487: Could not find the volume in vset

DESCRIPTION:
The code that finds the volume in the vset requires the file descriptor of the character special device, but in the affected code path, the file descriptor being passed is that of the block device.

RESOLUTION:
Code changes have been made to pass the file descriptor of the character special device.

* 3967032 (Tracking ID: 3947648)

SYMPTOM:
Due to wrong auto-tuning of vxfs_ninode and the inode cache, a hang could be observed under heavy memory pressure.

DESCRIPTION:
If kernel heap memory is very large (particularly observed on Solaris T7 servers), an overflow can occur because of a smaller-sized data type.

RESOLUTION:
The code is changed to handle the overflow.

* 3967089 (Tracking ID: 3908785)

SYMPTOM:
System panic observed because of a null page address in the writeback structure in the case of the kswapd process.

DESCRIPTION:
The secfs2/encryptfs layers used the write VOP as a hook when kswapd is triggered to free a page. Ideally, kswapd should call the writepage() routine, where the writeback structures are correctly filled. When the write VOP is called because of the secfs2/encryptfs hook, the writeback structures are cleared, resulting in a null page address.

RESOLUTION:
Code changes have been made to call the VxFS kswapd routine only if a valid page address is present.

* 3949500 (Tracking ID: 3949308)

SYMPTOM:
In a scenario where FEL caching is enabled, application I/O on a file does not proceed when file system utilization is 100%.
DESCRIPTION:
When file system capacity is 100% utilized, application I/O on a file does not proceed. This issue occurs because the ENOSPC error handling code path tries to take the inode ownership lock, which is already held by the current thread. As a result, any operation on that file hangs.

RESOLUTION:
This fix releases the inode ownership lock and reclaims it after the ENOSPC error handling is complete.

* 3949506 (Tracking ID: 3949501)

SYMPTOM:
The sfcache offline command hangs.

DESCRIPTION:
During inode inactivation, an inode with the FEL dirty flag set gets included in the cluster-wide inactive list instead of the local inactive list. This issue occurs due to an internal error. As a result, the FEL dirty flag does not get cleared and the "sfcache offline" command hangs.

RESOLUTION:
This fix now includes inodes with the FEL dirty flag in a local inactive list.

* 3949507 (Tracking ID: 3949502)

SYMPTOM:
When SmartIO FEL-based writeback caching is enabled, a memory leak of a few bytes may happen during node reconfiguration.

DESCRIPTION:
FEL recovery is initiated during node reconfiguration, during which a hold is taken on some internal data structures. Due to this extra hold, the data structures are not freed, which leads to a small memory leak.

RESOLUTION:
This fix ensures that the hold on the data structures is handled correctly.

* 3949508 (Tracking ID: 3949503)

SYMPTOM:
When FEL is enabled in a CFS environment, stale data may be seen on a file after a node crash.

DESCRIPTION:
While revoking the RWlock after node recovery, the file system ensures that the FEL writes are flushed to the disks and a write completion record is written in the FEL log. In scenarios where a node crashes and the write completion record is not updated, FEL writes get replayed during FEL recovery. This may overwrite writes that happened after the revoke on some other cluster node, resulting in data corruption.

RESOLUTION:
This fix writes the completion record after the FEL write flush.

* 3949509 (Tracking ID: 3949504)

SYMPTOM:
When SmartIO FEL-based writeback caching is enabled, a memory leak of a few bytes may happen during node reconfiguration.

DESCRIPTION:
FEL recovery is initiated during node reconfiguration, during which a hold is taken on some internal data structures. Due to this extra hold, the data structures are not freed, which leads to a small memory leak.

RESOLUTION:
This fix ensures that the hold on the data structures is handled correctly.

* 3949510 (Tracking ID: 3949505)

SYMPTOM:
When SmartIO FEL-based writeback caching is enabled, I/O operations on a file in the filesystem can result in a panic after cluster reconfiguration.

DESCRIPTION:
In an FEL environment, the file system may get mounted after a node recovery while the FEL device remains offline. In such a scenario, the FEL-related data structures remain inaccessible, and the node panics during an application I/O that attempts to access them.

RESOLUTION:
This fix checks whether the FEL device recovery has completed before accessing the FEL-related data structures.

* 3950740 (Tracking ID: 3953165)

SYMPTOM:
Generic messages or warnings may appear in syslog with the string "vxfs:0000:" instead of a uniquely numbered message ID for the VxFS module.

DESCRIPTION:
A few syslog messages introduced in the InfoScale 7.4 release were not given unique message numbers to identify the places in the product where they originate. Instead, they were marked with the common message identification number "0000".
RESOLUTION:
This patch fixes the syslog messages generated by the VxFS module that contain "0000" as the message string and provides them with unique numbering.

* 3952340 (Tracking ID: 3953148)

SYMPTOM:
Extra extent delegation message exchanges may be noticed between the delegation master and delegation clients.

DESCRIPTION:
If a node is not using delayed allocation or the SmartIO-based writeback cache (Front End Log), and an extent allocation unit is revoked while a non-dalloc/non-FEL extent allocation is in progress, the node may show a delegation deficit. This is not correct, since the node is not using delayed allocation or FEL.

RESOLUTION:
The fix is to ignore the delegation deficit if the deficit is on account of non-delayed-allocation/non-FEL allocation.

INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Please note that the installation of this P-Patch will cause downtime.

To install the patch, perform the following steps on at least one node in the cluster:

1. Copy the patch infoscale-rhel7_x86_64-Patch-7.4.0.1200.tar.gz to /tmp.

2. Untar infoscale-rhel7_x86_64-Patch-7.4.0.1200.tar.gz to /tmp/hf:
   # mkdir /tmp/hf
   # cd /tmp/hf
   # gunzip /tmp/infoscale-rhel7_x86_64-Patch-7.4.0.1200.tar.gz
   # tar xf /tmp/infoscale-rhel7_x86_64-Patch-7.4.0.1200.tar

3. Install the hotfix (note again that the installation of this P-Patch will cause downtime):
   # pwd
   /tmp/hf
   # ./installVRTSinfoscale740P1200 [ ...]

You can also install this patch together with the 7.4 base release using Install Bundles:

1. Download this patch and extract it to a directory.

2. Change to the Veritas InfoScale 7.4 directory and invoke the installer script with the -patch_path option, where -patch_path should point to the patch directory:
   # ./installer -patch_path [] [ ...]

Install the patch manually:
--------------------------
Manual installation is not supported.

REMOVING THE PATCH
------------------
Manual uninstallation is not supported.

SPECIAL INSTRUCTIONS
--------------------
NONE

OTHERS
------
NONE