README VERSION        : 1.1
README CREATION DATE  : 2018-04-03
PATCH-ID              : 7.3.1.001
PATCH NAME            : VA-7.3.1.001
REQUIRED PATCHES      : NONE
INCOMPATIBLE PATCHES  : NONE
SUPPORTED PADV        : rhel7.3_x86_64, rhel7.4_x86_64, OL7.3_x86_64, OL7.4_x86_64
                        (P-PLATFORM, A-ARCHITECTURE, D-DISTRIBUTION, V-VERSION)
PATCH CRITICALITY     : Optional
HAS KERNEL COMPONENT  : YES
ID                    : NONE

PATCH INSTALLATION INSTRUCTIONS:
-----------------------------------------
For detailed installation instructions, please refer to:
https://origin-www.veritas.com/content/support/en_US/doc/130196629-130196633-1

For detailed instructions on upgrading Veritas Access, please refer to
"Chapter 10: Upgrading Veritas Access using a rolling upgrade".

SPECIAL INSTRUCTIONS:
-----------------------------------------
1. Extract the tarball.

2. The rolling upgrade can be started using the command below:

   # ./installaccess -rolling_upgrade

3. The upgrade to this patch is supported from the VA-7.3.1 release only.

4. Make sure that the upgrade is performed on one node at a time, even though
   the installer may try to upgrade multiple nodes at a time. For example:

   # ./installaccess -rolling_upgrade

   Veritas Access 7.3.1.001 Rolling Upgrade Program

   Copyright (c) 2018 Veritas Technologies LLC. All rights reserved. Veritas
   and the Veritas Logo are trademarks or registered trademarks of Veritas
   Technologies LLC or its affiliates in the U.S. and other countries. Other
   names may be trademarks of their respective owners.

   The Licensed Software and Documentation are deemed to be "commercial
   computer software" and "commercial computer software documentation" as
   defined in FAR Sections 12.212 and DFARS Section 227.7202.

   Logs are being written to /var/tmp/installaccess-201804030721cXT while
   installaccess is in progress.

   Enter the system name of the cluster on which you would like to perform
   rolling upgrade [q,?] (fss7310_01)

   Checking communication on fss7310_01 ............................. Done
   Checking rolling upgrade prerequisites on fss7310_01 ............. Done

   Veritas Access 7.3.1.001 Rolling Upgrade Program

   Cluster information verification:
       Cluster Name: fss7310
       Cluster ID Number: 61886
       Systems: fss7310_01 fss7310_02 fss7310_03 fss7310_04

   Would you like to perform rolling upgrade on the cluster? [y, n, q] (y)

   Rolling upgrade phase 1 upgrades all VRTS product packages except
   non-kernel packages.
   Rolling upgrade phase 2 upgrades all non-kernel packages including:
   VRTSvcs VRTScavf VRTSvcsag VRTSvcsea VRTSvbs VRTSnas

   Checking communication on fss7310_02 ............................. Done
   Checking rolling upgrade prerequisites on fss7310_02 ............. Done
   Checking communication on fss7310_03 ............................. Done
   Checking rolling upgrade prerequisites on fss7310_03 ............. Done
   Checking communication on fss7310_04 ............................. Done
   Checking rolling upgrade prerequisites on fss7310_04 ............. Done
   Checking the product compatibility of the nodes in the cluster ... Done

   Rolling upgrade phase 1 is performed on the system(s) fss7310_03.
   It is recommended to perform rolling upgrade phase 1 on the remaining
   system(s) fss7310_01 fss7310_02 fss7310_04.

   Would you like to perform rolling upgrade phase 1 on the recommended
   system(s)? [y, n, q] (y) n

   Do you want to quit without phase 1 performed on all systems? [y, n, q] (n) n

   Enter the system names separated by spaces on which you want to perform
   rolling upgrade: [q,?] fss7310_02

5. If file systems are online during the upgrade, make sure that recovery has
   finished before starting the upgrade of the next node. To check the
   recovery progress, use the command below:

   # vxtask list

   Recovery is triggered approximately 3-5 minutes after a node has joined the
   cluster.

6. Before starting the upgrade, make sure that none of the services are in the
   FAILED, FAULTED, or W_ONLINE state (see the example below this list).

7. A fresh installation can also be performed using this patch. Please refer to
   https://origin-www.veritas.com/content/support/en_US/doc/130196629-130196633-1
   for more details.
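As an example of the check described in step 6, the service group and resource
states can be reviewed from any node before each phase of the upgrade. The
command below assumes the standard VCS tooling shipped with Veritas Access is
available under /opt/VRTSvcs/bin and is shown only as an illustration:

   # /opt/VRTSvcs/bin/hastatus -sum

Confirm that no service group or resource is reported as FAULTED or W_ONLINE
before upgrading the next node.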
SUMMARY OF FIXED ISSUES:
-----------------------------------------
Patch ID: 7.3.1.001

IA-9843   "storage fs create" taking a long time to create file systems.
IA-9839   "storage fs list" taking a long time to list all the file systems.
IA-9838   vxprint/vxdisk commands running slowly.
IA-11243  "storage fs checkmirror" taking longer in large environments.
IA-10216  GUI discovery taking longer, affecting other system operations.
IA-10973  CLISH commands hang when a private NIC fails.
IA-10942  Linux network OS tunables not persistent across reboots.
IA-11338  File system creation failing if the number of volume objects being
          created is very high.
IA-9840   "Cluster reboot all" leaving the FSS cluster in an inconsistent state.
IA-11405  Some plexes in the volumes may remain in the IOFAIL state after a reboot.
IA-10946  NIC failure event was not recorded in event monitoring.
IA-11237  Inconsistent event monitoring for NODE offline/online events.
IA-10375  Unable to online the IP address on a newly added node if any file
          system has a quota set on it.
IA-11058  Recursive empty directories created in /shared/knfsv4 after a node
          reboots multiple times.
IA-11034  Striped-mirrored volumes are created with a DCO by default.
IA-11051  User not able to set WORM retention.
IA-11072  Volume recoveries started after cluster stop operations.
IA-11307  User not able to destroy a file system in an isolated pool.
IA-10379  sosreport is not collected in evidences.
IA-11402  Display events related to disk/plex in CLISH, similar to the GUI.
IA-9847   "vxddladm addjbod" was leading to random devices having udid_mismatch.
IA-11502  Fix corruption issue for erasure coded volumes after cluster restart.

DETAILS OF INCIDENTS FIXED BY THE PATCH
-----------------------------------------
Patch ID: 7.3.1.001

* TRACKING ID: IA-9843
ONE_LINE_ABSTRACT: "storage fs create" taking a long time to create file systems.
SYMPTOM: "storage fs create" takes a long time to create file systems.
DESCRIPTION: The time taken by the "storage fs create" command was increasing
as the number of file systems grew, because of redundant code.
RESOLUTION: Optimized the "storage fs create" operation to reduce the time
taken to create a file system.

* TRACKING ID: IA-9839
ONE_LINE_ABSTRACT: "storage fs list" taking a long time to list all the file systems.
SYMPTOM: "storage fs list" takes a long time to list the file systems.
DESCRIPTION: A lot of redundant code was invoking many back-end commands to
fetch the data, which caused "storage fs list" to take a long time.
RESOLUTION: Optimized the code to make the "storage fs list" command run faster.

* TRACKING ID: IA-9838
ONE_LINE_ABSTRACT: vxprint/vxdisk commands running slowly.
SYMPTOM: vxprint/vxdisk commands run slowly.
DESCRIPTION: These internal commands were taking longer to run because they
were fetching unnecessary records.
RESOLUTION: Optimized the commands to fetch only the required records.

* TRACKING ID: IA-11243
SYMPTOM: "storage fs checkmirror" takes longer in large environments.
DESCRIPTION: A lot of redundant code was invoking many back-end commands to
fetch the data, which caused "storage fs checkmirror" to take a long time.
RESOLUTION: Optimized the code to run "storage fs checkmirror" faster.

* TRACKING ID: IA-10216
ONE_LINE_ABSTRACT: GUI discovery taking longer, affecting other system operations.
SYMPTOM: GUI discovery takes longer, affecting other system operations.
DESCRIPTION: GUI operations were running for a very long time, causing other
CLISH commands to run slowly.
RESOLUTION: Improved GUI discovery performance and optimized the code to
reduce the time taken by GUI operations.

* TRACKING ID: IA-10973
ONE_LINE_ABSTRACT: CLISH commands hang when a private NIC fails.
SYMPTOM: CLISH commands hang when a private NIC fails.
DESCRIPTION: CLISH commands checked the connectivity of the nodes using the
status of the nodes in the cluster. If the private NIC on which the IP address
is plumbed is down, communication between the nodes is lost and the commands
hang.
RESOLUTION: Changed the logic to check the status of the private NIC rather
than the node status.

* TRACKING ID: IA-10942
ONE_LINE_ABSTRACT: Linux network OS tunables not persistent across reboots.
SYMPTOM: Linux network OS tunables are not persistent across reboots.
DESCRIPTION: The network tunables on the cluster need to be changed for better
performance, but those changes were not persistent across reboots: the Access
init scripts were resetting the tunables to their default values.
RESOLUTION: Modified the Access init scripts to set the tunables to the values
recommended for better performance in an FSS environment.
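After a reboot, a tunable can be spot-checked from the OS shell to confirm that
the expected value is in effect. The tunable name below is only an
illustration; the actual set of tunables managed by the Access init scripts is
not listed in this README:

   # sysctl net.core.rmem_max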
* TRACKING ID: IA-11338
ONE_LINE_ABSTRACT: File system creation failing if the number of volume objects
being created is very high.
SYMPTOM: File system creation fails with a "memory allocation" error.
DESCRIPTION: If the number of volume objects being created in the Access
environment is very high, an internal memory limit may be reached, leading to a
memory allocation failure for the file system being created.
RESOLUTION: Modified the internal memory limit so that file system creation
does not fail with a memory allocation failure.

* TRACKING ID: IA-9840
ONE_LINE_ABSTRACT: "Cluster reboot all" leaving the FSS cluster in an inconsistent state.
SYMPTOM: "Cluster reboot all" leaves the cluster in an inconsistent state.
DESCRIPTION: When all the nodes in the environment were rebooted, issues with
the startup scripts and service group dependencies prevented the cluster
services from coming online in the proper order, leaving the cluster in an
inconsistent state. Because of this issue, many of the services would be left
in the W_ONLINE, FAILED, or FAULTED state after "cluster reboot all".
RESOLUTION: Fixed the service group dependencies and startup scripts so that
the cluster services come online in the proper order.

* TRACKING ID: IA-11405
ONE_LINE_ABSTRACT: Some plexes in the volumes may remain in the IOFAIL state after a reboot.
SYMPTOM: Some plexes in the volumes may remain in the IOFAIL state after a reboot.
DESCRIPTION: When one of the nodes in the cluster is rebooted in an FSS
environment, the plexes need to be synced up when the node comes back up.
Because of a bug in recovery, some of the plexes would remain in the IOFAIL
state.
RESOLUTION: Fixed the issue by correctly triggering recovery for the failed plexes.

* TRACKING ID: IA-10946
ONE_LINE_ABSTRACT: NIC failure event was not recorded in event monitoring.
SYMPTOM: A NIC failure event was not recorded in event monitoring.
DESCRIPTION: The NIC failure event was not displayed in the GUI or in CLISH.
RESOLUTION: Code changes were made to display the NIC failure event in both
the GUI and CLISH.

* TRACKING ID: IA-11237
ONE_LINE_ABSTRACT: Inconsistent event monitoring for NODE offline/online events.
SYMPTOM: NODE offline/online events are not displayed in CLISH but are shown in the GUI.
DESCRIPTION: Event reporting for these events was missing from the CLISH event
monitoring framework.
RESOLUTION: Modified the code to make the event monitoring framework consistent
across the GUI and CLISH.

* TRACKING ID: IA-10375
ONE_LINE_ABSTRACT: Unable to online the IP address on a newly added node if any
file system has a quota set on it.
SYMPTOM: Unable to online the IP address on a newly added node if any file
system has a quota set on it.
DESCRIPTION: The IP address did not come online on the newly added node if any
file system had a user or group quota set before the node was added.
RESOLUTION: Code changes were made to update the VCS configuration file while
adding the new node.

* TRACKING ID: IA-11329
ONE_LINE_ABSTRACT: Add node failing if the existing node has VLAN and bond configured.
SYMPTOM: Add node may fail if the existing cluster has bond and VLAN configured.
DESCRIPTION: During the add node operation, networking was not being configured
correctly, because of which add node could fail or networking might not be
configured correctly on the newly added node.
RESOLUTION: Fixed the add node operation to perform the network configuration
correctly.

* TRACKING ID: IA-11058
ONE_LINE_ABSTRACT: Recursive empty directories created in /shared/knfsv4 after
a node reboots multiple times.
SYMPTOM: Recursive empty directories are created in /shared/knfsv4 after a node
reboots multiple times.
DESCRIPTION: Directories were being force-copied without checking whether the
destination existed. If the destination directory already exists when
force-copying directories (cp -rf src_dir dest_dir), the entire src_dir is
copied inside dest_dir, so dest_dir then contains its original contents as well
as src_dir and all of its subdirectories. This results in a nested
subdirectory structure.
RESOLUTION: Modified the code to check whether the destination directory exists
before copying.
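The shell snippet below is only an illustration of the copy semantics described
above, not the actual product code. When the destination directory already
exists, "cp -rf src_dir dest_dir" nests src_dir inside dest_dir, so the copy is
guarded by a check on the destination:

   # Illustration only: skip the force copy if the destination already exists,
   # so an existing directory tree is not nested into itself.
   if [ ! -d "$dest_dir" ]; then
       cp -rf "$src_dir" "$dest_dir"
   fi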
* TRACKING ID: IA-11034
ONE_LINE_ABSTRACT: Striped-mirrored volumes are created with a DCO by default.
SYMPTOM: Striped-mirrored volumes are created with a DCO by default.
DESCRIPTION: When creating file systems with mirrored configurations, the
volumes were created with a DCO and the detach map activated.
RESOLUTION: Volumes are now created with logtype=none so that no DCO is created.

* TRACKING ID: IA-11051
ONE_LINE_ABSTRACT: User not able to set WORM retention.
SYMPTOM: User is not able to set WORM retention.
DESCRIPTION: Setting WORM retention for a particular directory via CLISH fails
with a stack trace.
RESOLUTION: Fixed an undefined variable so that WORM retention can be set.

* TRACKING ID: IA-11072
ONE_LINE_ABSTRACT: Volume recoveries started after cluster stop operations.
SYMPTOM: Volume recoveries started after cluster stop operations.
DESCRIPTION: In cases where kernel packages have to be updated, a clean way is
needed to bring the cluster to a stop and perform the maintenance activity.
RESOLUTION: Added a CLISH command that can stop either the entire cluster or
just a node. The command is "cluster stop nodename|all".

* TRACKING ID: IA-11307
ONE_LINE_ABSTRACT: User not able to destroy a file system in an isolated pool.
SYMPTOM: User is not able to destroy a file system in an isolated pool.
DESCRIPTION: An unknown error is displayed when trying to destroy a file system
in an isolated pool.
RESOLUTION: Fixed the code so that a file system in an isolated pool can be
destroyed.

* TRACKING ID: IA-10379
ONE_LINE_ABSTRACT: sosreport is not collected in evidences.
SYMPTOM: sosreport is not collected in evidences.
DESCRIPTION: When collecting debuginfo, the sosreport was not getting collected
as part of the evidences.
RESOLUTION: Fixed the code to collect the sosreport.

* TRACKING ID: IA-11402
ONE_LINE_ABSTRACT: Display events related to disk/plex in CLISH, similar to the GUI.
SYMPTOM: The events for disk/plex were seen only in the GUI and not in CLISH.
DESCRIPTION: CLISH did not report events related to disk offline/online and
plex failure. Similar events could be seen in the GUI, which was inconsistent
behaviour.
RESOLUTION: Code changes were made to display events related to disk/plex in
CLISH, similar to the GUI.

* TRACKING ID: IA-9847
ONE_LINE_ABSTRACT: "vxddladm addjbod" was leading to random devices having udid_mismatch.
SYMPTOM: After executing the "vxddladm addjbod" command, random devices had a
false udid_mismatch.
DESCRIPTION: Because of the new changes added to support the "localdisks=yes"
option, a garbage value was getting added to the UDID of the device. This led
to the disk having inconsistent on-disk and ASL UDIDs, resulting in the
udid_mismatch flag.
RESOLUTION: Code changes have been made to avoid the addition of the garbage
value to the UDID.

* TRACKING ID: IA-11502
ONE_LINE_ABSTRACT: Fix corruption issue for erasure coded volumes after cluster restart.
SYMPTOM: Data on an erasure coded volume may get corrupted after restarting the
cluster.
DESCRIPTION: After a cluster restart, invalid log entries might get replayed
during the log replay operation, resulting in data corruption.
RESOLUTION: Code changes were made to avoid flushing invalid log entries during
log replay, preventing data corruption.

KNOWN ISSUES
-----------------------------------------

* TRACKING ID: IA-11385
SYMPTOM: Rolling upgrade may fail if the CIFS server is in the online state.
WORK-AROUND: Please stop the CIFS server before starting the upgrade (see the
example at the end of this section).

* TRACKING ID: IA-11427
SYMPTOM: The GUI does not display any data after the upgrade operation and
reports the error "License is not installed".
WORK-AROUND: To resolve this issue, execute the following command from the node
where the ManagementConsole service group is online:

   /opt/VRTSnas/pysnas/bin/isaconfig
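For IA-11385, the CIFS server can be stopped from the Veritas Access
command-line interface before starting the upgrade. The command below is shown
only as an illustration; verify the exact syntax against the administrator's
guide for your release:

   CIFS> server stop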