* * * READ ME * * *
* * * InfoScale 7.4 * * *
* * * Patch 1300 * * *
Patch Date: 2019-03-22


This document provides the following information:

* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH
* KNOWN ISSUES


PATCH NAME
----------
InfoScale 7.4 Patch 1300


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL7 x86-64


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSveki
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* InfoScale Availability 7.4
* InfoScale Enterprise 7.4
* InfoScale Foundation 7.4
* InfoScale Storage 7.4


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: VRTSveki-7.4.0.1200
* 3972598 (3955519) VRTSvxvm upgrade fails because the VRTSveki upgrade fails when the yum upgrade command is used.
Patch ID: VRTSveki-7.4.0.1100
* 3967356 (3967265) Support for RHEL 7.6 and RHEL 7.x RETPOLINE kernels.
Patch ID: VRTSvxvm-7.4.0.1500
* 3970687 (3971046) Replication does not switch between synchronous and asynchronous mode automatically based on the network conditions.
Patch ID: VRTSvxvm-7.4.0.1400
* 3949320 (3947022) VVR: vxconfigd hang during /scripts/configuratio/assoc_datavol.tc#6
* 3958838 (3953681) Data corruption is seen when more than one plex of a volume is detached.
* 3958884 (3954787) Data corruption may occur in a GCO along with FSS environment on the RHEL 7.5 operating system.
* 3958887 (3953711) Panic observed while switching the logowner to a slave node while I/Os are in progress.
* 3958976 (3955101) Panic observed in a GCO environment (cluster-to-cluster replication) during replication.
* 3959204 (3949954) Dumpstack messages are printed when the vxio module is loaded for the first time and blk_register_queue is called.
* 3959433 (3956134) System panic might occur when I/O is in progress in a VVR (Veritas Volume Replicator) environment.
* 3967098 (3966239) I/O hang observed while copying data in cloud environments.
* 3967099 (3965715) vxconfigd may core dump when VIOM (Veritas InfoScale Operations Manager) is enabled.
Patch ID: 7.4.0.1200
* 3949322 (3944259) The vradmin verifydata and vradmin ibc commands fail on private disk groups with a "Lost connection" error.
* 3950578 (3953241) Messages are seen in syslog with the message string "0000" for the VxVM module.
* 3950760 (3946217) In a scenario where encryption over wire is enabled and secondary logging is disabled, vxconfigd hangs and replication does not progress.
* 3950799 (3950384) In a scenario where volume encryption at rest is enabled, data corruption may occur if the file system size exceeds 1TB.
* 3951488 (3950759) The application I/Os hang if volume-level I/O shipping is enabled and the volume layout is mirror-concat or mirror-stripe.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: VRTSveki-7.4.0.1200

* 3972598 (Tracking ID: 3955519)

SYMPTOM:
VRTSvxvm upgrade fails because the VRTSveki upgrade fails when the yum upgrade command is used.

DESCRIPTION:
When yum is used to upgrade the packages, dependent packages are updated first, so VRTSveki is upgraded before VRTSvxvm. The VRTSveki upgrade fails because the veki module cannot be unloaded while the other VxVM/VxFS modules are still loaded. Since the upgrade fails, the link under the /lib/modules/*/veritas/veki directory is not created, and the VRTSvxvm upgrade then fails because the vxdmp module cannot be unloaded.

RESOLUTION:
Code changes have been made to create the link in the directory even if the module unload does not succeed.

Patch ID: VRTSveki-7.4.0.1100

* 3967356 (Tracking ID: 3967265)

SYMPTOM:
RHEL 7.6 and RHEL 7.x RETPOLINE kernels are not supported.

DESCRIPTION:
Red Hat has released RHEL 7.6, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL 7.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 7.6 and for RETPOLINE kernels on RHEL 7.x is now introduced.

Patch ID: VRTSvxvm-7.4.0.1500

* 3970687 (Tracking ID: 3971046)

SYMPTOM:
Replication does not switch between synchronous and asynchronous mode automatically based on the network conditions.

DESCRIPTION:
Network conditions may impact the replication performance. However, the current VVR replication does not switch between synchronous and asynchronous mode automatically based on the network conditions.

RESOLUTION:
This patch provides the adaptive synchronous mode for VVR, which is an enhancement to the existing synchronous override mode. In the adaptive synchronous mode, the replication mode switches from synchronous to asynchronous based on the cross-site network latency. Thus, replication happens in the synchronous mode when the network conditions are good, and it automatically switches to the asynchronous mode when the cross-site network latency increases. You can also set alerts that notify you when the system undergoes network deterioration. For more details, see https://www.veritas.com/bin/support/docRepoServlet?bookId=136858821-137189101-1&requestType=pdf
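The sketch below illustrates the general idea of such an adaptive switch. It is a conceptual example only, not VVR source code; all type names, field names, and thresholds are hypothetical, and the real implementation is driven by the configured iotimeout and VVR internals.

/* Conceptual illustration only -- not VVR source code.
 * All type names, field names, and thresholds are hypothetical. */

enum repl_mode { REPL_SYNC, REPL_ASYNC };

struct repl_state {
    enum repl_mode mode;          /* current replication mode                */
    unsigned int   high_ms;       /* switch to asynchronous above this value */
    unsigned int   low_ms;        /* switch back to synchronous below this   */
    unsigned int   breach_count;  /* consecutive samples above high_ms       */
    unsigned int   breach_limit;  /* sustained breaches needed to switch     */
};

/* Called periodically with the latest measured cross-site latency. */
void repl_adapt_mode(struct repl_state *st, unsigned int latency_ms)
{
    if (st->mode == REPL_SYNC) {
        /* Tolerate short spikes; switch only on sustained deterioration. */
        if (latency_ms > st->high_ms) {
            if (++st->breach_count >= st->breach_limit)
                st->mode = REPL_ASYNC;
        } else {
            st->breach_count = 0;
        }
    } else if (latency_ms < st->low_ms) {
        /* The network has recovered; resume synchronous replication. */
        st->mode = REPL_SYNC;
        st->breach_count = 0;
    }
}

Using separate high and low thresholds together with a breach count is one common way to avoid flapping between modes on short latency spikes; the actual policy and tunables used by VVR may differ.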
Patch ID: VRTSvxvm-7.4.0.1400

* 3949320 (Tracking ID: 3947022)

SYMPTOM:
vxconfigd hangs.

DESCRIPTION:
There was a window between NIOs being added to rp_port->pt_waitq and the rlink being disconnected, during which NIOs were left in pt_waitq and their parent (the ack SIO) was therefore never completed. The ack SIO held an I/O count, which led to the vxconfigd hang.

RESOLUTION:
Do not add an NIO to rp_port->pt_waitq if rp_port->pt_closing is set; instead, call done on the NIO with the error ENC_CLOSING. Before deleting the port, call done with the error ENC_CLOSING on the NIOs already in pt_waitq.

* 3958838 (Tracking ID: 3953681)

SYMPTOM:
Data corruption is seen when more than one plex of a volume is detached.

DESCRIPTION:
When a plex of a volume gets detached, the DETACH map gets enabled in the DCO (Data Change Object). The incoming I/Os are tracked in the DRL (Dirty Region Log) and then asynchronously copied to the DETACH map for tracking. If one more plex gets detached, some of the new incoming regions might be missed in the DETACH map of the previously detached plex. This leads to corruption when the disk comes back and the plex resync happens using the corrupted DETACH map.

RESOLUTION:
Code changes have been made to correctly track the I/Os in the DETACH map of the previously detached plex and avoid the corruption.

* 3958884 (Tracking ID: 3954787)

SYMPTOM:
In a RHEL 7.5 FSS environment with GCO configured, with NVMe devices and an InfiniBand network, data corruption might occur when I/O is sent from the master to the slave node.

DESCRIPTION:
In the recent RHEL 7.5 release, Linux stopped allowing I/O on an underlying NVMe device when there are gaps between the BIO vectors. In the case of VVR, an SRL header of 3 blocks is added to the BIO. When the BIO is sent through LLT to the other node, the LLT limitation of 32 fragments can lead to misaligned BIO vectors. When such an unaligned BIO is sent to the underlying NVMe device, the last 3 blocks of the BIO are skipped and not written to the disk on the slave node. This results in incomplete data on the slave node, which leads to data corruption.

RESOLUTION:
Code changes have been made to handle this case and send the BIO aligned to the underlying NVMe device.
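For background on what a "gap" between BIO vectors means, the sketch below shows the kind of contiguity check such devices imply: a segment that does not end on the device boundary, followed by one that does not start on it, creates a gap the device cannot accept in a single request. This is a conceptual illustration with hypothetical types and names; it is neither VxVM nor Linux kernel source.

/* Conceptual illustration only -- hypothetical types, not kernel or VxVM code. */
#include <stdbool.h>
#include <stddef.h>

#define BOUNDARY      4096u            /* example virtual boundary, in bytes */
#define BOUNDARY_MASK (BOUNDARY - 1)

struct segment {
    unsigned int offset;               /* offset within its page/buffer */
    unsigned int len;                  /* length of the segment in bytes */
};

/* Returns true if any two adjacent segments leave a gap relative to the
 * boundary, i.e. the request is not virtually contiguous. */
bool segments_have_gap(const struct segment *seg, size_t nsegs)
{
    for (size_t i = 1; i < nsegs; i++) {
        /* The previous segment must end exactly on the boundary ... */
        if (((seg[i - 1].offset + seg[i - 1].len) & BOUNDARY_MASK) != 0)
            return true;
        /* ... and the next segment must start on it. */
        if ((seg[i].offset & BOUNDARY_MASK) != 0)
            return true;
    }
    return false;
}

The fix described above amounts to ensuring that the BIO handed to the NVMe device never contains such gaps.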
* 3958887 (Tracking ID: 3953711)

SYMPTOM:
The system might encounter a panic while switching the logowner to a slave node while I/Os are in progress, with the following stack:

vol_rv_service_message_free()
vol_rv_replica_reconfigure()
sched_clock_cpu()
vol_rv_error_handle()
vol_rv_errorhandler_callback()
vol_klog_start()
voliod_iohandle()
voliod_loop()
voliod_kiohandle()
kthread()
insert_kthread_work()
ret_from_fork_nospec_begin()
insert_kthread_work()
vol_rv_service_message_free()

DESCRIPTION:
While processing a transaction, the I/O count on the RV object is left in place to let the transaction proceed, and in that case the RV object in the SIO is set to NULL. However, while freeing the message, the object is dereferenced without considering that it can be NULL. This can lead to a panic due to a NULL pointer dereference in the code.

RESOLUTION:
Code changes have been made to handle a NULL value of the RV object.

* 3958976 (Tracking ID: 3955101)

SYMPTOM:
The server might panic in a GCO environment with the following stack:

nmcom_server_main_tcp()
ttwu_do_wakeup()
ttwu_do_activate.constprop.90()
try_to_wake_up()
update_curr()
update_curr()
account_entity_dequeue()
__schedule()
nmcom_server_proc_tcp()
kthread()
kthread_create_on_node()
ret_from_fork()
kthread_create_on_node()

DESCRIPTION:
Recent code changes handle dynamic port changes, that is, ports can now be deleted and added dynamically. It might happen that while a port is being accessed, it has already been deleted in the background by another thread. This leads to a panic because the port being accessed no longer exists.

RESOLUTION:
Code changes have been made to handle this situation and to check whether the port is available before accessing it.
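The sketch below illustrates the general pattern of validating a dynamically managed port under a lock on every use, rather than caching a pointer that another thread may have already freed. All names are hypothetical; this is not VxVM source code.

/* Conceptual illustration only -- hypothetical names, not VxVM source code. */
#include <pthread.h>
#include <stddef.h>

struct port {
    int          id;
    struct port *next;
};

static struct port    *port_list;   /* ports may be added and deleted at runtime */
static pthread_mutex_t port_lock = PTHREAD_MUTEX_INITIALIZER;

/* Look the port up under the lock on every use instead of caching a pointer
 * that another thread may have already freed. */
int port_send(int port_id, const void *msg, size_t len)
{
    int rc = -1;

    (void)msg;
    (void)len;

    pthread_mutex_lock(&port_lock);
    for (struct port *p = port_list; p != NULL; p = p->next) {
        if (p->id == port_id) {
            /* The port still exists; it is safe to use while the lock is held. */
            rc = 0;   /* ... perform the actual send here ... */
            break;
        }
    }
    pthread_mutex_unlock(&port_lock);

    return rc;        /* -1 means the port was deleted concurrently */
}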
* 3959204 (Tracking ID: 3949954)

SYMPTOM:
Dumpstack messages are printed when the vxio module is loaded for the first time and blk_register_queue is called.

DESCRIPTION:
In RHEL 7.5, a new check was added to the kernel code in blk_register_queue: if QUEUE_FLAG_REGISTERED is already set on the queue, a dumpstack warning message is printed. In VxVM, the flag was already set because it was copied from the device queue that had been registered earlier by the OS.

RESOLUTION:
Changes have been made in the VxVM code to avoid copying QUEUE_FLAG_REGISTERED, which fixes the dumpstack warnings.

* 3959433 (Tracking ID: 3956134)

SYMPTOM:
A system panic might occur when I/O is in progress in a VVR (Veritas Volume Replicator) environment, with the following stack:

page_fault()
voliomem_grab_special()
volrv_seclog_wsio_start()
voliod_iohandle()
voliod_loop()
kthread()
ret_from_fork()

DESCRIPTION:
In a memory crunch scenario, the memory reservation for an SIO (staged I/O) in a VVR configuration might fail in some cases. The SIO is then retried later, when memory becomes available again, but while doing so some of the SIO fields are passed NULL values, which leads to a panic in the VVR code.

RESOLUTION:
Code changes have been made to pass proper values to the I/O when it is retried in a VVR environment.

* 3967098 (Tracking ID: 3966239)

SYMPTOM:
I/O hang observed while copying data to VxVM (Veritas Volume Manager) volumes in cloud environments.

DESCRIPTION:
In a cloud environment with a mirrored volume that has a DCO (Data Change Object) attached, I/Os issued on the volume have to be processed through the DRL (Dirty Region Log), which is used for faster recovery on reboot. While I/O processing is happening on the DRL, a condition in the code can prevent I/Os from being driven through the DRL, resulting in a hang. Further I/Os keep queuing up, waiting for the I/Os on the DRL to complete, which leads to a hang situation.

RESOLUTION:
Code changes have been made to resolve the condition that prevents I/Os from being driven through the DRL.

* 3967099 (Tracking ID: 3965715)

SYMPTOM:
vxconfigd may core dump when VIOM (Veritas InfoScale Operations Manager) is enabled. The following is the stack trace:

#0 0x00007ffff7309d22 in ____strtoll_l_internal () from /lib64/libc.so.6
#1 0x000000000059c61f in ddl_set_vom_discovered_attr ()
#2 0x00000000005a4230 in ddl_find_devices_in_system ()
#3 0x0000000000535231 in find_devices_in_system ()
#4 0x0000000000535530 in mode_set ()
#5 0x0000000000477a73 in setup_mode ()
#6 0x0000000000479485 in main ()

DESCRIPTION:
The vxconfigd daemon reads JSON data generated by VIOM to dynamically update some of the VxVM disk (LUN) attributes. While accessing this data, the LUN size attribute was wrongly parsed as a string, which returns NULL instead of the LUN size. Accessing this NULL value causes the vxconfigd daemon to core dump with a segmentation fault.

RESOLUTION:
Appropriate changes have been made to handle the LUN size attribute correctly.
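The sketch below shows the defensive-parsing pattern implied by this fix: never hand a possibly missing attribute straight to a numeric conversion routine. It is a conceptual example; the function name and error handling are hypothetical, and this is not vxconfigd source code.

/* Conceptual illustration only -- hypothetical names, not vxconfigd source code. */
#include <errno.h>
#include <stdlib.h>

/* Parse a LUN size attribute defensively. 'raw' is the attribute text as read
 * from the discovery data; it may be NULL if the attribute is missing or was
 * produced with an unexpected type. Returns -1 on any parse failure. */
long long parse_lun_size(const char *raw)
{
    char *end = NULL;
    long long size;

    if (raw == NULL)
        return -1;            /* never hand a NULL pointer to strtoll() */

    errno = 0;
    size = strtoll(raw, &end, 10);
    if (errno != 0 || end == raw || size < 0)
        return -1;            /* not a valid, non-negative number */

    return size;
}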
Patch ID: 7.4.0.1200

* 3949322 (Tracking ID: 3944259)

SYMPTOM:
The vradmin verifydata and vradmin ibc commands fail on private disk groups with a "Lost connection" error.

DESCRIPTION:
This issue occurs because of a deadlock between the IBC mechanism and the ongoing I/Os on the secondary RVG. The IBC mechanism expects I/Os to be transferred to the secondary in sequential order; however, to improve performance, I/Os are now written in parallel. The mismatch in the IBC behavior causes a deadlock, and the vradmin verifydata and vradmin ibc commands fail due to a timeout error.

RESOLUTION:
As a part of this fix, the IBC behavior is improved such that it now considers parallel and possibly out-of-sequence I/O writes to the secondary.

* 3950578 (Tracking ID: 3953241)

SYMPTOM:
Customers may see generic messages or warnings in syslog with the string "vxvm:0000: " instead of a uniquely numbered message ID for the VxVM module.

DESCRIPTION:
A few syslog messages introduced in the InfoScale 7.4 release were not given unique message numbers that identify the places in the product where they originate. Instead, they were marked with the common message identification number "0000".

RESOLUTION:
This patch fixes the syslog messages generated by the VxVM module that contain "0000" as the message string and provides them with unique numbers.

* 3950760 (Tracking ID: 3946217)

SYMPTOM:
In a scenario where encryption over wire is enabled and secondary logging is disabled, vxconfigd hangs and replication does not progress.

DESCRIPTION:
In a scenario where encryption over wire is enabled and secondary logging is disabled, the application I/Os are encrypted in sequence, but are not written to the secondary in the same order. The out-of-sequence and in-sequence I/Os are stuck in a loop, waiting for each other to complete. Due to this, I/Os are left incomplete and eventually hang. As a result, vxconfigd hangs and the replication does not progress.

RESOLUTION:
As a part of this fix, the I/O encryption and write sequence is improved such that all I/Os are first encrypted and then written sequentially to the secondary.

* 3950799 (Tracking ID: 3950384)

SYMPTOM:
In a scenario where volume data encryption at rest is enabled, data corruption may occur if the file system size exceeds 1TB and the data is located in a file extent larger than 256KB.

DESCRIPTION:
In a scenario where data encryption at rest is enabled, data corruption may occur when both of the following conditions are met:
- The file system size exceeds 1TB
- The data is located in a file extent larger than 256KB
This issue occurs due to a bug that causes an integer overflow for the offset.

RESOLUTION:
As a part of this fix, appropriate code changes have been made to improve the data encryption behavior such that the data corruption does not occur.
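The sketch below demonstrates the class of bug described above: when a byte offset is computed in 32-bit arithmetic, it wraps around, so data destined for a large offset can be written to the wrong location. The block size, block number, and threshold here are illustrative only; this is not the actual VxVM code path.

/* Conceptual illustration of the overflow class of bug -- not VxVM source code. */
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 512u   /* bytes per block (illustrative) */

int main(void)
{
    /* A block number that addresses data beyond the 1TB mark. */
    uint64_t blkno = 2200000000ULL;   /* 2,200,000,000 blocks * 512 bytes > 1TB */

    /* Buggy pattern: the multiplication is performed in 32-bit arithmetic,
     * so the result wraps and the data lands at the wrong byte offset. */
    uint32_t bad_offset = (uint32_t)blkno * BLOCK_SIZE;

    /* Correct pattern: widen to 64 bits before multiplying. */
    uint64_t good_offset = blkno * (uint64_t)BLOCK_SIZE;

    printf("32-bit offset: %u (wrapped, incorrect)\n", bad_offset);
    printf("64-bit offset: %llu\n", (unsigned long long)good_offset);
    return 0;
}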
* 3951488 (Tracking ID: 3950759)

SYMPTOM:
The application I/Os hang if volume-level I/O shipping is enabled and the volume layout is mirror-concat or mirror-stripe.

DESCRIPTION:
In a scenario where an application I/O is issued over a volume that has volume-level I/O shipping enabled, the I/O is shipped to all target nodes. Typically, on the target nodes, the I/O must be sent only to the local disk. However, in the case of mirror-concat or mirror-stripe volumes, I/Os are sent to remote disks as well. This at times leads to an I/O hang.

RESOLUTION:
As a part of this fix, an I/O that is shipped to a target node is restricted to only locally connected disks, and remote disks are skipped.


INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Please note that the installation of this P-Patch will cause downtime.

To install the patch, perform the following steps on at least one node in the cluster:
1. Copy the patch infoscale-rhel7_x86_64-Patch-7.4.0.1300.tar.gz to /tmp
2. Untar infoscale-rhel7_x86_64-Patch-7.4.0.1300.tar.gz to /tmp/hf
    # mkdir /tmp/hf
    # cd /tmp/hf
    # gunzip /tmp/infoscale-rhel7_x86_64-Patch-7.4.0.1300.tar.gz
    # tar xf /tmp/infoscale-rhel7_x86_64-Patch-7.4.0.1300.tar
3. Install the hotfix (note that the installation of this P-Patch will cause downtime).
    # pwd
    /tmp/hf
    # ./installVRTSinfoscale740P1300 [ ...]

You can also install this patch together with the 7.4 base release using Install Bundles:
1. Download this patch and extract it to a directory.
2. Change to the Veritas InfoScale 7.4 directory and invoke the installer script with the -patch_path option, where -patch_path should point to the patch directory.
    # ./installer -patch_path [] [ ...]

Install the patch manually:
--------------------------
Manual installation is not recommended.


REMOVING THE PATCH
------------------
Manual uninstallation is not recommended.


KNOWN ISSUES
------------
* Tracking ID: 3973781

SYMPTOM:
In a CVR scenario where the adaptive synchronous mode is enabled and an application is running (I/Os are active) on the primary slave node, replication starts in synchronous mode. If the network latency consistently crosses the desired iotimeout value, the replication mode changes to asynchronous mode. After that, if the replication network disconnects or replication is stopped, the primary master node may panic with the following stack trace:

vol_ru_allocate
vol_rv_write1_start
voliod_iohandle
voliod_loop
kthread+0xd1/0xe0
ret_from_fork_nospec_begin

This issue is not seen if replication continues in the adaptive synchronous mode.

WORKAROUND:
None.


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE