* * * READ ME * * *
* * * Veritas CloudPoint 2.1.2.7555 * * *
* * * Hot Fix 3 * * *

Patch Date: 2019-03-29

This document provides the following information:

* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* BASE PRODUCT VERSION FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLING THE PATCH
* KNOWN ISSUES
* NOTE

PATCH NAME
----------
Veritas CloudPoint 2.1.2.7555 Hot Fix 3

OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
Ubuntu 16.04 x86-64
RHEL 7.5

BASE PRODUCT VERSION FOR THE PATCH
-----------------------------------
* Veritas CloudPoint 2.1.2

SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch 2.1.2.7555
STESC-2632 (C3PM-13305) : Policy jobs are hanging.
STESC-2755 (C3PM-13442 & C3PM-13441) : flexsnap-coordinator keeps restarting.
STESC-2758 (STESC-2856) : AD login to CloudPoint fails with "Failed to Verify if user exists in AD. Please Check the AD Configuration parameters".
STESC-2824 (C3PM-13306) : Snapshots exist longer than the retention period.
STESC-2855 : Permissions not working as expected.
C3PM-13435 : Added code to check for errors after completion of an operation.
STESC-2802 : AWS snapshots: increase the timeout to ~15 minutes.
STESC-2601 : Corrected the typo "exceptions" to "exception".

Patch 2.1.2.7546
C3PM-13040 (C3PM-13041) : Allow installation when rootfs is mounted in non-shared mode.

Patch 2.1.2.7545
C3PM-12968 (C3PM-12970) : CloudPoint with proxy support.

DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following JIRA incidents:

* STESC-2632 (Tracking ID: C3PM-13305)

SYMPTOM:
Policy jobs are hanging.

DESCRIPTION:
The customer intermittently sees hung "Policy Job" jobs; a root-cause analysis was requested.

RESOLUTION:
1. The GCP plugin had a bug in the delete-snapshot flow that caused policy execution to enter an infinite loop while deleting snapshots that met the retention criteria. The GCP plugin's delete-snapshot code has been fixed to resolve this.
2. Use a new HTTP connection for every thread that makes GCP service calls. The third-party GCP library is not thread safe with respect to HTTP connections, so to ensure the GCP library code does not break at any point, an HTTP connection is created for every thread that talks to GCP (see the sketch below).
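The sketch below illustrates the per-thread connection pattern, following the documented thread-safety guidance for the google-api-python-client library. It is an illustration only, not CloudPoint plugin code; the snapshot names and the use of the Compute Engine API are assumptions.

    import threading
    import httplib2
    import google.auth
    import google_auth_httplib2
    from googleapiclient import discovery

    credentials, project = google.auth.default()
    service = discovery.build("compute", "v1", credentials=credentials)

    def delete_snapshot(name):
        # httplib2.Http is not thread safe, so each thread builds its own
        # authorized connection instead of sharing the service's default one.
        http = google_auth_httplib2.AuthorizedHttp(credentials, http=httplib2.Http())
        service.snapshots().delete(project=project, snapshot=name).execute(http=http)

    # "snap-1" and "snap-2" are placeholder snapshot names.
    threads = [threading.Thread(target=delete_snapshot, args=(n,))
               for n in ("snap-1", "snap-2")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()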
* STESC-2755 (Tracking ID: C3PM-13442 & C3PM-13441)

SYMPTOM:
flexsnap-coordinator keeps restarting.

DESCRIPTION:
flexsnap-coordinator failed to clean up very old workflow tasks because MongoDB exceeded its 32 MB in-memory sort limit while sorting the roughly 18,000 records in the 'flows' collection:

"OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit"

This caused an automatic restart of the coordinator, which is the coordinator's resilient behavior on failure.

RESOLUTION:
Add an index on the sort field of the 'flows' collection so that MongoDB no longer has to load the whole collection into memory for sorting. Also, clean up the workflow logs every day instead of keeping them until the coordinator restarts.
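For illustration, a single-field index on the sort key lets MongoDB walk the index instead of sorting documents in memory. The sketch below uses pymongo; the database name "flexsnap", the field name "created_at", and the connection details are assumptions, since the actual schema is internal to CloudPoint.

    from pymongo import ASCENDING, MongoClient

    client = MongoClient("localhost", 27017)
    flows = client["flexsnap"]["flows"]  # hypothetical database name

    # Index the field the cleanup query sorts on ("created_at" is a
    # placeholder) so sorts no longer hit the 32 MB in-memory limit.
    flows.create_index([("created_at", ASCENDING)])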
* STESC-2758 (Tracking ID: STESC-2856)

SYMPTOM:
AD login to CloudPoint fails with "Failed to Verify if user exists in AD. Please Check the AD Configuration parameters".

DESCRIPTION:
Users are unable to log in to CloudPoint using AD; the login fails with "Failed to Verify if user exists in AD. Please Check the AD Configuration parameters". The issue occurs when the email domain configured in the customer's AD environment differs from the domain in the user's email address. For example, the AD email domain is corp.westworlds.com while the user's email is abcd@west.com. The existing CloudPoint design used the user's email address as-is (with @west.com) to validate the password entered by the user against AD, and AD rejected it with a 'credentials are incorrect' error.

RESOLUTION:
The authorization service has been modified to authenticate the user with the user name plus the AD email domain (e.g. abcd@corp.westworlds.com).

* STESC-2824 (Tracking ID: C3PM-13306)

SYMPTOM:
Snapshots exist longer than the retention period.

DESCRIPTION:
Snapshots exist longer than their retention period, which can result in overuse of capacity and additional cost to the customer. Under the existing CloudPoint design, any change to an existing policy is treated as a new policy with a new groupId. The changed policy therefore does not apply its retention criteria to snapshots created by the old policy, because the groupIds differ even though it is the same policy.

RESOLUTION:
The policy service, which created a new groupId for any change to an existing policy and thereby caused this issue, has been modified so that it no longer creates a new ID when an existing policy is modified.

* STESC-2855 (Tracking ID: STESC-2855)

SYMPTOM:
Permissions not working as expected.

DESCRIPTION:
Even after being assigned full/admin privileges via a role, a user still sees the "Need Access?" icon in several places and is unable to see the LDAP/email settings.

RESOLUTION:
The authorization service was not updating a user's role properly. Ensure that a user's role information is updated correctly in MongoDB when that user's permissions change.

* C3PM-13435 (Tracking ID: STESC-13435)

SYMPTOM:
On failure of GCP operations, the error in the logs and the error in the UI do not match.

DESCRIPTION:
Whenever a GCP operation fails, the error the user sees in the logs (of the agent or coordinator) differs from the one shown for the corresponding job on the Job Logs UI page.

RESOLUTION:
Code changes have been made to ensure that the correct logs are propagated to the UI/API.

* STESC-2802 (Tracking ID: STESC-2802)

SYMPTOM:
AWS snapshots fail with "Waiter ImageAvailable failed: Max attempts exceeded".

DESCRIPTION:
AWS snapshots have a default timeout of 10 minutes. This value needs to be increased.

RESOLUTION:
Increased the default timeout to ~15 minutes.
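For reference, the boto3 'image_available' waiter behind this error polls every 15 seconds for at most 40 attempts by default (10 minutes), so one way to reach ~15 minutes is to raise MaxAttempts to 60. This is a minimal sketch, not the actual CloudPoint change; the AMI ID and region are placeholders.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    waiter = ec2.get_waiter("image_available")

    # Default WaiterConfig is Delay=15, MaxAttempts=40 (10 minutes);
    # MaxAttempts=60 waits up to ~15 minutes before giving up.
    waiter.wait(
        ImageIds=["ami-0123456789abcdef0"],
        WaiterConfig={"Delay": 15, "MaxAttempts": 60},
    )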
* STESC-2601 (Tracking ID: STESC-2601)

SYMPTOM:
PureStorage asset discovery fails with the stack trace below:

Unable to Fetch Snapsource: None

DESCRIPTION:
During PureStorage asset discovery, fetching assets through the PureStorage API fails, and the logging call in the error path used the typo "exceptions" instead of "exception":

Unable to Fetch Snapsource: None
Traceback (most recent call last):
  File "src/purestg/purestg.py", line 102, in __init__
KeyError: None
Jan 25 13:45:44 9df855057503 flexsnap-agent-agent.4a2c30d637864ea689604b4c666b1726[1] Thread-2 flexsnap.agent: ERROR - 'Logger' object has no attribute 'exceptions'
Traceback (most recent call last):
  File "/opt/VRTScloudpoint/lib/flexsnap/agent.py", line 562, in list_assets
    p_assets = p.assets()
  File "src/purestg/purestg.py", line 435, in assets
AttributeError: 'Logger' object has no attribute 'exceptions'

RESOLUTION:
Corrected the typo and updated the logging messages throughout the PureStorage plugin.

* C3PM-13040 (Tracking ID: C3PM-13041)

SYMPTOM:
When using Amazon Linux AMI 2018.03 as the CloudPoint host, the deployment fails with "Configuration Failed. Error Details: Failed to start flexsnap-vic." This occurs because the shared-subtree propagation of the rootfs (/) is set to private:

# findmnt / -o TARGET,PROPAGATION
TARGET PROPAGATION
/      private

This appears to be a Docker limitation when trying to mount a volume with the shared Docker option.

DESCRIPTION:
CloudPoint deployment fails if mount propagation is not set to shared on the host. In that case, the vic and onhostagent containers fail to start during deployment due to insufficient mount privileges.

RESOLUTION:
Allow CloudPoint installation with a specific parameter (SKIP_SHARED_MNT=Yes; see "INSTALLING THE PATCH" below) if the user wants to continue with the indexing and classification features disabled.

* C3PM-12968 (Tracking ID: C3PM-12970)

SYMPTOM:
Plugin configuration in CloudPoint within a proxy environment fails with the stack trace below:

HTTPSConnectionPool(host='sts.amazonaws.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, 'Connection to sts.amazonaws.com timed out. (connect timeout=60)'))

DESCRIPTION:
CloudPoint is installed on an EC2 instance in a private network and communicates with the outside world through a proxy server. The agent containers are unaware of the proxy settings, so communication with the cloud services fails.

RESOLUTION:
CloudPoint containers now start with the proxy environment variables set. When a new plugin is added, the new agent container is spawned with these environment variables set and can communicate with the cloud plugins through the proxy. The user needs to pass all three of the following variables, with their respective values, in the CloudPoint installation command:

VX_HTTP_PROXY
VX_HTTPS_PROXY
VX_NO_PROXY

INSTALLING THE PATCH
--------------------
I. Before patching:

1. Contact Veritas Technical Support for this Hot Fix.

2. Untar the Hot Fix file to a CloudPoint host.

3. Run the following commands to load the patch:

# docker load -i
# docker run --rm -it -v /cloudpoint:/cloudpoint -v /var/run/docker.sock:/var/run/docker.sock veritas/flexsnap-cloudpoint:2.1.2.7555 load

4. Ensure that there are no protection policy snapshots or other operations in progress. Estimated patching time is 15 minutes to an hour.

5. Log out from the CloudPoint UI.

6. Run the following command as root to stop CloudPoint:

# docker run --rm -it -v /cloudpoint:/cloudpoint -v /var/run/docker.sock:/var/run/docker.sock veritas/flexsnap-cloudpoint:2.1.2.7546 stop

II. Patching:

Run the following commands as root.

1. Check the rootfs mount propagation on the CloudPoint host:

# findmnt -o TARGET,PROPAGATION -n /

2. If the command above reports a mount propagation other than 'shared', run the following command to install:

# docker run --rm -it -e SKIP_SHARED_MNT=Yes -v /cloudpoint:/cloudpoint -v /var/run/docker.sock:/var/run/docker.sock veritas/flexsnap-cloudpoint:2.1.2.7555 install

Otherwise, run:

# docker run --rm -it -v /cloudpoint:/cloudpoint -v /var/run/docker.sock:/var/run/docker.sock veritas/flexsnap-cloudpoint:2.1.2.7555 install

3. To set up CloudPoint in a proxy environment, run the following command as root instead (supplying values for the three proxy variables):

# docker run -it --rm -v /cloudpoint:/cloudpoint -e VX_HTTP_PROXY= -e VX_HTTPS_PROXY= -e VX_NO_PROXY= -v /var/run/docker.sock:/var/run/docker.sock veritas/flexsnap-cloudpoint:2.1.2.7555 install

III. After patching:

1. Refresh your web browser and log in to the CloudPoint UI.

2. Verify the CloudPoint version: click Settings and select About. The following information should appear:

Current Version: 2.1.2.7555

3. Verify the CloudPoint data.

KNOWN ISSUES
------------
N/A

NOTE
----
1. Roll back to the previous version if needed.

   a. Log out from the CloudPoint UI.
   b. Run the following commands as root:

# docker run --rm -it -v /cloudpoint:/cloudpoint -v /var/run/docker.sock:/var/run/docker.sock veritas/flexsnap-cloudpoint:2.1.2.7555 stop
# docker run --rm -it -v /cloudpoint:/cloudpoint -v /var/run/docker.sock:/var/run/docker.sock veritas/flexsnap-cloudpoint:2.1.2.7555 uninstall
# docker run --rm -it -v /cloudpoint:/cloudpoint -v /var/run/docker.sock:/var/run/docker.sock veritas/flexsnap-cloudpoint:2.1.2.7546 install

2. The previous version(s) of the Docker container images are not removed. You can remove them to reclaim disk space on your CloudPoint instance (see the sketch below).
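For example, stale flexsnap-cloudpoint image tags could be removed with the Docker SDK for Python. This is a minimal sketch, assuming the image naming shown in the commands above and keeping only the 2.1.2.7555 build; CloudPoint also ships other flexsnap-* service images, so adjust the repository name to match your host.

    import docker

    client = docker.from_env()

    # Remove every veritas/flexsnap-cloudpoint tag except the current build.
    for image in client.images.list(name="veritas/flexsnap-cloudpoint"):
        for tag in image.tags:
            if not tag.endswith(":2.1.2.7555"):
                client.images.remove(tag)

Note that the rollback in item 1 needs the 2.1.2.7546 image, so remove old tags only once you no longer need to roll back.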