* * * READ ME * * *
* * * Veritas Volume Manager 7.3.1 * * *
* * * Patch 100 * * *
Patch Date: 2018-03-13

This document provides the following information:

   * PATCH NAME
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * SUMMARY OF INCIDENTS FIXED BY THE PATCH
   * DETAILS OF INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH

PATCH NAME
----------
Veritas Volume Manager 7.3.1 Patch 100

OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL7 x86-64

PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxvm

BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas InfoScale Foundation 7.3.1
   * Veritas InfoScale Storage 7.3.1
   * Veritas InfoScale Enterprise 7.3.1

SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 7.3.1.100
* 3932464 (3926976) Frequent loss of VxVM functionality due to vxconfigd unable to validate license.
* 3933874 (3852146) Shared DiskGroup (DG) fails to import when "-c" and "-o noreonline" options are specified together.
* 3933875 (3872585) System panics with storage key exception.
* 3933876 (3894657) VxVM commands may hang when using space optimized snapshot.
* 3933877 (3914789) System may panic when reclaiming on secondary in VVR environment.
* 3933878 (3918408) Data corruption when volume grow is attempted on thin reclaimable disks whose space is just freed.
* 3933880 (3864063) Application I/O hangs because of a race between the Master Pause SIO (Staging I/O) and the Error Handler SIO.
* 3933882 (3865721) Vxconfigd may hang while pausing the replication in CVR (cluster Veritas Volume Replicator) environment.
* 3933883 (3867236) Application IO hang happens because of a race between Master Pause SIO (Staging IO) and RVWRITE1 SIO.
* 3933884 (3868154) When DMP Native Support is set to ON, dmpnode with multiple VGs cannot be listed properly in the 'vxdmpadm native ls' command.
* 3933889 (3879234) dd read on the Veritas Volume Manager (VxVM) character device fails with Input/Output error while accessing end of device.
* 3933890 (3879324) VxVM DR tool fails to handle busy device problem while LUNs are removed from OS.
* 3933897 (3907618) vxdisk resize leads to data corruption on filesystem.
* 3933898 (3908987) False vxrelocd messages being generated by joining CVM slave.
* 3933900 (3915523) Local disk from other node belonging to private DG (diskgroup) is exported to the node when a private DG is imported on current node.
* 3933904 (3921668) vxrecover command with -m option fails when executed on the slave nodes.
* 3933907 (3873123) If the disk with CDS EFI label is used as remote disk on the cluster node, restarting the vxconfigd daemon on that particular node causes vxconfigd to go into disabled state.
* 3933910 (3910228) Registration of GAB (Global Atomic Broadcast) port u fails on slave nodes after multiple new devices are added to the system.
* 3933911 (3925377) Not all disks could be discovered by DMP after first startup.
* 3937540 (3906534) After enabling DMP (Dynamic Multipathing) Native support, enable /boot to be mounted on DMP device.
* 3937541 (3911930) Provide a way to clear the PGR_FLAG_NOTSUPPORTED on the device instead of using exclude/include commands.
* 3937542 (3917636) Filesystems from /etc/fstab file are not mounted automatically on boot through systemd on RHEL7 and SLES12.
* 3937549 (3934910) DRL map leaks during snapshot creation/removal cycle with dg reimport.
* 3937550 (3935232) Replication and IO hang during master takeover because of racing between log owner change and master switch.
* 3937808 (3931936) VxVM (Veritas Volume Manager) command hang on master node after restarting slave node.
* 3937811 (3935974) When a client process shuts down abruptly or resets the connection during communication with the vxrsyncd daemon, it may terminate the vxrsyncd daemon.
* 3940039 (3897047) Filesystems are not mounted automatically on boot through systemd on RHEL7 and SLES12.
* 3940143 (3941037) VxVM (Veritas Volume Manager) creates some required files under /tmp and /var/tmp directories. These directories could be modified by non-root users, which can affect the functioning of Veritas Volume Manager.

DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: 7.3.1.100

* 3932464 (Tracking ID: 3926976)

SYMPTOM:
An excessive number of connections are left in the open state, causing a
file descriptor (FD) leak and eventually license errors.

DESCRIPTION:
vxconfigd reports license errors because it fails to open the license files.
The failure is due to FD exhaustion, caused by an excessive number of FIFO
connections left in the open state. Clients (vx commands) use FIFO connections
to communicate with vxconfigd, and these connections should normally be closed
when the client exits. One such client, "vxdclid", is a daemon that connects
frequently and leaves the connection in the open state, causing the FD leak.
This issue is applicable to the Solaris platform only.

RESOLUTION:
A library call that left the connection in the open state on exit has been
fixed.

* 3933874 (Tracking ID: 3852146)

SYMPTOM:
In a CVM cluster, when importing a shared diskgroup specifying both -c and
-o noreonline options, the following error may be returned:
VxVM vxdg ERROR V-5-1-10978 Disk group : import failed: Disk for disk group not found.

DESCRIPTION:
The -c option updates the disk ID and disk group ID on the private region of
the disks in the disk group being imported. This updated information is not
yet seen by the slave because the disks have not been re-onlined (given that
the noreonline option is specified).
As a result, the slave cannot identify the disk(s) based on the updated
information sent from the master, causing the import to fail with the error
"Disk for disk group not found".

RESOLUTION:
The code is modified to handle the "-c" and "-o noreonline" options when they
are specified together.

* 3933875 (Tracking ID: 3872585)

SYMPTOM:
System running with VxFS and VxVM panics with storage key exception with the
following stack:
simple_lock
dispatch
flih_util
touchrc
pin_seg_range
pin_com
pinx_plock
plock_pinvec
plock
mfspurr_sc_flih01

DESCRIPTION:
The system could panic with a storage key exception while running the xntpd
process from a VxFS file system. The xntpd binary page faulted and did an IO,
after which the storage key exception was detected by the OS because it could
not locate its keyset. Code review found that in a few error cases in VxVM,
the storage keys may not be restored after they are replaced.

RESOLUTION:
The storage keys are now restored even in the error cases in the vxio and dmp
layers.

* 3933876 (Tracking ID: 3894657)

SYMPTOM:
VxVM commands may hang when using space optimized snapshot.

DESCRIPTION:
VxVM commands may hang if a volume with DRL (Dirty Region Logging) enabled has
a space optimized snapshot whose mirrored cache object volume also has DRL
enabled. If the IO load on the volume is high, it can lead to a memory crunch,
because memory stabilization is done when DRL is enabled. The IOs in the queue
may wait for memory to become free. In the meantime, other VxVM commands that
need to change the configuration of the volumes may hang because the IOs are
not able to proceed.

RESOLUTION:
Memory stabilization is not required for VxVM generated internal IOs for the
cache object volume. Code changes have been done to eliminate memory
stabilization for cache object IOs.

* 3933877 (Tracking ID: 3914789)

SYMPTOM:
System may panic when reclaiming on secondary in VVR (Veritas Volume
Replicator) environment. The panic is due to accessing an invalid address, and
the error message is similar to "data access MMU miss".
DESCRIPTION:
VxVM maintains a linked list to keep memory segment information. The linked
list is traversed when its content is accessed at a certain offset. Due to a
code defect, when the offset is equal to the segment chunk size, the end of
that segment is returned instead of the start of the next segment. This can
result in silent memory corruption because memory outside the segment boundary
is accessed. The system can panic when the out-of-boundary address is not
allocated yet.

RESOLUTION:
Code changes have been made to fix the out-of-boundary access.

* 3933878 (Tracking ID: 3918408)

SYMPTOM:
Data corruption when volume grow is attempted on thin reclaimable disks whose
space is just freed.

DESCRIPTION:
When space in the volume is freed by deleting some data or subdisks, the
corresponding subdisks are marked for reclamation. If reclaim is not issued
manually, it might take some time for the periodic reclaim task to start. In
the meantime, if the same disks are used for growing another volume, the
reclaim task can go ahead and overwrite the data written on the new volume.
This race condition between the reclaim and volume grow operations causes data
corruption.

RESOLUTION:
Code changes are done to handle the race condition between the reclaim and
volume grow operations. Reclaim is also skipped for disks that have already
become part of a new volume.

* 3933880 (Tracking ID: 3864063)

SYMPTOM:
Application I/O hangs after the Master Pause command is issued.

DESCRIPTION:
Some flags (VOL_RIFLAG_DISCONNECTING or VOL_RIFLAG_REQUEST_PENDING) in the VVR
(Veritas Volume Replicator) kernel are not cleared because of a race between
the Master Pause SIO and the Error Handler SIO. This causes the RU
(Replication Update) SIO to fail to proceed, which leads to I/O hang.

RESOLUTION:
The code is modified to handle the race condition.

* 3933882 (Tracking ID: 3865721)

SYMPTOM:
Vxconfigd hangs while processing a transaction when replication is paused in
a Clustered VVR environment.
DESCRIPTION:
In a Clustered VVR (CVM VVR) environment, while pausing replication that is in
DCM (Data Change Map) mode, the master pause SIO (staging IO) cannot finish
serialization because there are metadata shipping SIOs in the throttle queue
with the activesio count added. Meanwhile, because the master pause SIO's
SERIALIZE flag is set, the DCM flush SIO cannot be started to flush the
throttle queue. Each side waits on the other, which results in a deadlock.
Since the master pause routine needs to sync up with the transaction routine,
vxconfigd hangs in the transaction.

RESOLUTION:
Code changes were made to flush the metadata shipping throttle queue if the
master pause SIO cannot finish serialization.

* 3933883 (Tracking ID: 3867236)

SYMPTOM:
Application IO hang happens after issuing the Master Pause command.

DESCRIPTION:
The flag VOL_RIFLAG_REQUEST_PENDING in the VVR (Veritas Volume Replicator)
kernel is not cleared because of a race between the Master Pause SIO and the
RVWRITE1 SIO. As a result, the RU (Replication Update) SIO fails to proceed,
which causes the IO hang.

RESOLUTION:
Code changes have been made to handle the race condition.

* 3933884 (Tracking ID: 3868154)

SYMPTOM:
When DMP Native Support is set to ON, and if a dmpnode has multiple VGs,
'vxdmpadm native ls' shows incorrect VG entries for dmpnodes.

DESCRIPTION:
When DMP Native Support is set to ON, multiple VGs can be created on a disk,
because Linux supports creating a VG on a whole disk as well as on a partition
of a disk. This possibility was not handled in the code, so the output of
'vxdmpadm native ls' was incorrect.
RESOLUTION:
The code now handles the case of multiple VGs on a single disk.

* 3933889 (Tracking ID: 3879234)

SYMPTOM:
dd read on the Veritas Volume Manager (VxVM) character device fails with
Input/Output error while accessing end of device like below:

[root@dn pmansukh_debug]# dd if=/dev/vx/rdsk/hfdg/vol1 of=/dev/null bs=65K
dd: reading `/dev/vx/rdsk/hfdg/vol1': Input/output error
15801+0 records in
15801+0 records out
1051714560 bytes (1.1 GB) copied, 3.96065 s, 266 MB/s

DESCRIPTION:
The issue occurs because of changes in the Linux API generic_file_aio_read.
After these changes, generic_file_aio_read does not properly handle end of
device reads/writes. The Linux code has been changed to use blkdev_aio_read,
which is a GPL symbol and hence cannot be used.

RESOLUTION:
Made changes in the code to handle end of device reads/writes properly.

* 3933890 (Tracking ID: 3879324)

SYMPTOM:
VxVM (Veritas Volume Manager) DR (Dynamic Reconfiguration) tool fails to handle
busy device problem while LUNs are removed from OS.

DESCRIPTION:
OS devices may still be busy after they are removed from the OS. This causes
the 'luxadm -e offline ' operation to fail and leaves stale entries in the
'vxdisk list' output like:

emc0_65535   auto            -            -            error
emc0_65536   auto            -            -            error

RESOLUTION:
Code changes have been done to address the busy devices issue.

* 3933897 (Tracking ID: 3907618)

SYMPTOM:
vxdisk resize leads to data corruption on filesystem with MSDOS labelled disk
having VxVM sliced format.

DESCRIPTION:
vxdisk resize changes the geometry on the device if required. While vxdisk
resize is in progress, absolute offsets, i.e. offsets measured from the start
of the device, are used. For an MSDOS labelled disk, the full disk is mapped
to slice 4, not slice 0. Thus when IO is scheduled on the device, an extra 32
sectors get added to the IO offset, which is not required since the IO already
starts from the start of the device. This leads to data corruption because the
IO on the device is shifted by 32 sectors.
RESOLUTION:
Code changes have been made to not add 32 sectors to the IO when vxdisk resize
is in progress, to avoid corruption.

* 3933898 (Tracking ID: 3908987)

SYMPTOM:
The following unnecessary message is printed to inform the customer that hot
relocation will be performed on the master node:
VxVM vxrelocd INFO V-5-2-6551 hot-relocation operation for shared disk group will be performed on master node.

DESCRIPTION:
The message is intended to be printed only when there are failed disks.
Because the related code is not placed in the right position, the message is
printed even if there are no failed disks.

RESOLUTION:
Code changes have been made to fix the issue.

* 3933900 (Tracking ID: 3915523)

SYMPTOM:
Local disk from other node belonging to private DG is exported to the node
when a private DG is imported on current node.

DESCRIPTION:
When a DG is imported, all the disks belonging to the DG are automatically
exported to the current node to make sure that the DG gets imported. This is
done so that local disks behave the same way as SAN disks. Because all disks
in the DG are exported, disks that belong to a different private DG with the
same DG name on another node get exported to the current node as well. This
leads to the wrong disk being selected while the DG is imported.

RESOLUTION:
Instead of the DG name, the DGID (diskgroup ID) is used to decide whether a
disk needs to be exported or not.

* 3933904 (Tracking ID: 3921668)

SYMPTOM:
Running the vxrecover command with -m option fails when run on the slave node
with message "The command can be executed only on the master."

DESCRIPTION:
The issue occurs because the vxrecover -g -m command on shared disk groups is
currently not shipped from the CVM (Cluster Volume Manager) slave node to the
master node using the command shipping framework.

RESOLUTION:
Implemented code change to ship the vxrecover -m command to the master node
when it is triggered from the slave node.
* 3933907 (Tracking ID: 3873123)

SYMPTOM:
When the remote disk on a node is an EFI disk, vold enable fails. The
following message is logged, and vxconfigd eventually goes into the disabled
state:
Kernel and on-disk configurations don't match; transactions are disabled.

DESCRIPTION:
This is because one of the cases of EFI remote disk is not properly handled in
the disk recovery part when vxconfigd is enabled.

RESOLUTION:
Code changes have been done to set the EFI flag on darec in the recovery code.

* 3933910 (Tracking ID: 3910228)

SYMPTOM:
Registration of GAB (Global Atomic Broadcast) port u fails on slave nodes after
multiple new devices are added to the system.

DESCRIPTION:
Vxconfigd sends a command to GAB for port u registration and waits for a
response from GAB. If vxconfigd is interrupted during this timeframe by any
module other than GAB, it is not able to receive the signal from GAB that
indicates successful registration. Since the signal is not received, vxconfigd
believes the registration did not succeed and treats it as a failure.

RESOLUTION:
The signals that vxconfigd can receive are now masked before it waits for the
signal from GAB for registration of GAB port u.

* 3933911 (Tracking ID: 3925377)

SYMPTOM:
Not all disks could be discovered by Dynamic Multi-Pathing (DMP) after first
startup.

DESCRIPTION:
DMP is started too early in the boot process if iSCSI and raw have not been
installed. At that point the FC devices are not yet recognized by the OS,
hence DMP misses the FC devices.

RESOLUTION:
The code is modified to make sure DMP gets started after OS disk discovery.

* 3937540 (Tracking ID: 3906534)

SYMPTOM:
After enabling DMP (Dynamic Multipathing) Native support, enable /boot to be
mounted on DMP device.

DESCRIPTION:
Currently /boot is mounted on top of the OS (Operating System) device. When
DMP Native support is enabled, only VGs (Volume Groups) are migrated from the
OS device to the DMP device. This is the reason /boot is not migrated to the
DMP device.
As a result, if the OS device path is not available, the system becomes
unbootable because /boot is not available. Thus it becomes necessary to mount
/boot on the DMP device to provide multipathing and resiliency.

RESOLUTION:
Code changes have been done to migrate /boot on top of the DMP device when DMP
Native support is enabled.
Note - The code changes are currently implemented for RHEL-6 only. For other
Linux platforms, /boot will still not be mounted on the DMP device.

* 3937541 (Tracking ID: 3911930)

SYMPTOM:
Valid PGR operations sometimes fail on a dmpnode.

DESCRIPTION:
As part of the PGR operations, if the inquiry command finds that PGR is not
supported on the dmpnode, the flag PGR_FLAG_NOTSUPPORTED is set on the dmpnode.
Further PGR operations check this flag and issue PGR commands only if this
flag is NOT set. This flag remains set even if the hardware is changed so as
to support PGR.

RESOLUTION:
A new command (namely enablepr) is provided in the vxdmppr utility to clear
this flag on the specified dmpnode.

* 3937542 (Tracking ID: 3917636)

SYMPTOM:
Filesystems from /etc/fstab file are not mounted automatically on boot through
systemd on RHEL7 and SLES12.

DESCRIPTION:
During bootup, when systemd tries to mount using the devices mentioned in the
/etc/fstab file, the device is not accessible, which leads to the failure of
the mount operation. Because device discovery happens through the udev
infrastructure, the udev rules for those devices need to be run when volumes
are created so that the devices get registered with systemd. In this case, the
udev rules are executed even before the devices in the "/dev/vx/dsk" directory
are created. Since the devices are not yet created, they are not registered
with systemd, which leads to the failure of the mount operation.

RESOLUTION:
Run "udevadm trigger" to execute all the udev rules once all volumes are
created so that devices are registered.
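
The resolution for incident 3937542 can be illustrated with a small shell sketch. It is not the code shipped in the patch. The function name retrigger_udev and the DRY_RUN variable are hypothetical, the "udevadm settle" call is an addition that the incident text does not mention, and the default directory is the /dev/vx/dsk path named in the description.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): re-run the udev rules once
# the VxVM volume device nodes exist, so the devices get registered.
# Set DRY_RUN=1 to print the commands instead of executing them.
retrigger_udev() {
    vol_dir="${1:-/dev/vx/dsk}"   # directory named in the incident text
    if [ ! -d "$vol_dir" ]; then
        echo "skip: $vol_dir not present"
        return 1
    fi
    # "udevadm settle" is an assumption added here: it waits for the udev
    # event queue to drain after the trigger.
    for cmd in "udevadm trigger" "udevadm settle"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would run: $cmd"
        else
            $cmd || return 2
        fi
    done
}
```

Calling the function with DRY_RUN=1 shows what would be executed without touching the udev queue, which may be useful when checking boot-time ordering by hand.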
* 3937549 (Tracking ID: 3934910)

SYMPTOM:
IO errors on the data volume or file system happen after some cycles of
snapshot creation/removal with dg reimport.

DESCRIPTION:
When the snapshot of the data volume is removed and the dg is reimported, the
DRL map stays active instead of being inactivated. When a new snapshot is
created, DRL is re-enabled and a new DRL map is allocated on the first write
to the data volume. The original active DRL map is no longer used and is
leaked. After several such cycles, the extents of the DCO volume are exhausted
by the active but unused DRL maps. No more DRL maps can then be allocated, and
IOs on the data volume fail or cannot be issued.

RESOLUTION:
Code changes are done to inactivate the DRL map if DRL is disabled during the
volume start, so that the map can be safely reused later.

* 3937550 (Tracking ID: 3935232)

SYMPTOM:
Replication and IO hang may happen on new master node during master takeover.

DESCRIPTION:
If a log owner change starts while a master switch is in progress, the log
owner change SIO sets the flag VOLSIO_FLAG_RVC_ACTIVE. The RVG (Replicated
Volume Group) recovery initiated by the master switch clears the flag
VOLSIO_FLAG_RVC_ACTIVE after the RVG recovery is done. When the log owner
change completes, resetting the flag VOLOBJ_TFLAG_VVR_QUIESCE is skipped
because VOLSIO_FLAG_RVC_ACTIVE has already been cleared. The presence of the
flag VOLOBJ_TFLAG_VVR_QUIESCE keeps replication and application IO on the RVG
in the pending state indefinitely.

RESOLUTION:
Code changes have been done to make the log owner change wait until the master
switch has completed.

* 3937808 (Tracking ID: 3931936)

SYMPTOM:
In an FSS (Flexible Storage Sharing) environment, after a slave node is
restarted, VxVM commands on the master node hang. As a result, failed disks on
the slave node cannot rejoin the disk group.

DESCRIPTION:
When lost remote disks on the slave node come back, the operations to online
these disks and add them to the disk group are performed on the master node.
Disk online includes operations on both the master and slave nodes. On the
slave node these disks should be offlined and then reonlined, but due to a
code defect the reonline is missed, which leaves the disks in the reonlining
state. The subsequent operation of adding the disk to the disk group needs to
issue private region IOs on the disk. These IOs are shipped to the slave node
to complete. Because the disks are in the reonlining state, a busy error is
returned and the remote IOs keep retrying, hence the VxVM command hangs on the
master node.

RESOLUTION:
Code changes have been made to fix the issue.

* 3937811 (Tracking ID: 3935974)

SYMPTOM:
While communicating with a client process, the vxrsyncd daemon terminates. It
gets restarted after some time, or a reboot may be required to start it.

DESCRIPTION:
When the client process shuts down abruptly and the vxrsyncd daemon attempts
to write on the client socket, a SIGPIPE signal is generated. The default
action for this signal is to terminate the process. Hence vxrsyncd gets
terminated.

RESOLUTION:
The SIGPIPE signal is now handled in order to prevent the termination of
vxrsyncd.

* 3940039 (Tracking ID: 3897047)

SYMPTOM:
Filesystems are not mounted automatically on boot through systemd on RHEL7 and
SLES12.

DESCRIPTION:
When the systemd service tries to start all the FS in /etc/fstab, the Veritas
Volume Manager (VxVM) volumes are not started since vxconfigd is still not up.
The VxVM volumes are started a little later in the boot process. Since the
volumes are not available, the FS are not mounted automatically at boot.

RESOLUTION:
Registered the VxVM volumes with the UDEV daemon of Linux so that the FS are
mounted when the VxVM volumes are started and discovered by udev.

* 3940143 (Tracking ID: 3941037)

SYMPTOM:
VxVM (Veritas Volume Manager) creates some required files under /tmp and
/var/tmp directories.

DESCRIPTION:
VxVM (Veritas Volume Manager) creates some .lock files under /etc/vx directory.
The non-root users have access to these .lock files, and they may accidentally
modify, move or delete those files. Such actions may interfere with the normal
functioning of the Veritas Volume Manager.

RESOLUTION:
This fix addresses the issue by masking the write permission for non-root
users on these .lock files.

INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Note that the installation of this P-Patch will cause downtime.

To install the patch perform the following steps on at least one node in the
cluster:
1. Copy the patch vm-rhel7_x86_64-Patch-7.3.1.100.tar.gz to /tmp
2. Untar vm-rhel7_x86_64-Patch-7.3.1.100.tar.gz to /tmp/hf
    # mkdir /tmp/hf
    # cd /tmp/hf
    # gunzip /tmp/vm-rhel7_x86_64-Patch-7.3.1.100.tar.gz
    # tar xf /tmp/vm-rhel7_x86_64-Patch-7.3.1.100.tar
3. Install the hotfix (note that the installation of this P-Patch will cause
   downtime)
    # pwd
    /tmp/hf
    # ./installVRTSvxvm731P100 [ ...]

You can also install this patch together with 7.3.1 maintenance release using
Install Bundles
1. Download this patch and extract it to a directory
2. Change to the Veritas InfoScale 7.3.1 directory and invoke the installer
   script with -patch_path option where -patch_path should point to the patch
   directory
    # ./installer -patch_path [] [ ...]

Install the patch manually:
--------------------------
1. Before the upgrade:
    (a) Stop I/Os to all the VxVM volumes.
    (b) Unmount any filesystems with VxVM volumes.
    (c) Stop applications using any VxVM volumes.
2. Check whether root support or DMP native support is enabled or not:
    # vxdmpadm gettune dmp_native_support
   If the current value is "on", DMP native support is enabled on this machine.
   If disabled: go to step 4.
   If enabled: go to step 3.
3. If DMP native support is enabled:
    a. It is essential to disable DMP native support.
       Run the following command to disable DMP native support
        # vxdmpadm settune dmp_native_support=off
    b. Reboot the system
        # reboot
4. Select the appropriate RPMs for your system, and upgrade to the new patch.
    # rpm -Uvh VRTSvxvm-7.3.1.100-RHEL7.x86_64.rpm
5. Run vxinstall to get VxVM configured
    # vxinstall
6. If DMP Native Support was enabled before patch upgrade, enable it back.
    a. Run the following command to enable DMP native support
        # vxdmpadm settune dmp_native_support=on
    b. Reboot the system
        # reboot

REMOVING THE PATCH
------------------
# rpm -e rpm-name

SPECIAL INSTRUCTIONS
--------------------
NONE

OTHERS
------
NONE
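
The branch in step 2 of the manual installation (go to step 3 or step 4 depending on dmp_native_support) can be sketched in shell as follows. This is only an illustration. The function names native_support_state and next_step are hypothetical, and the parser assumes that the output of "vxdmpadm gettune dmp_native_support" is a table whose data row starts with the tunable name followed by the current value; verify that layout on your system before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read `vxdmpadm gettune dmp_native_support` output on
# stdin and print the current value. ASSUMPTION: the data row begins with
# the tunable name and its second field is the current value.
native_support_state() {
    awk '$1 == "dmp_native_support" { print $2; found = 1 }
         END { if (!found) exit 1 }'
}

# Hypothetical helper: print which manual-install step follows step 2.
next_step() {
    case "$1" in
        on)  echo "step 3: disable DMP native support and reboot" ;;
        off) echo "step 4: upgrade the VRTSvxvm RPM" ;;
        *)   echo "unknown state: $1"; return 1 ;;
    esac
}
```

A possible invocation on a host with VxVM installed:

    # state=$(vxdmpadm gettune dmp_native_support | native_support_state) && next_step "$state"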