README VERSION               : 1.1
README CREATION DATE         : 2012-09-20
PATCH-ID                     : PHCO_43069
PATCH NAME                   : VRTSvxfen 5.1 SP1RP2
BASE PACKAGE NAME            : Veritas I/O Fencing by Symantec
BASE PACKAGE VERSION         : VRTSvxfen 5.1SP1
SUPERSEDED PATCHES           : NONE
REQUIRED PATCHES             : NONE
INCOMPATIBLE PATCHES         : NONE
SUPPORTED PADV               : hpux1131
(P-PLATFORM , A-ARCHITECTURE , D-DISTRIBUTION , V-VERSION)
PATCH CATEGORY               : OTHER
PATCH CRITICALITY            : OPTIONAL
HAS KERNEL COMPONENT         : NO
ID                           : NONE
REBOOT REQUIRED              : NO

PATCH INSTALLATION INSTRUCTIONS:
--------------------------------
Please refer to the release notes for installation instructions.

PATCH UNINSTALLATION INSTRUCTIONS:
----------------------------------
Please refer to the release notes for uninstallation instructions.

SPECIAL INSTRUCTIONS:
---------------------
NONE

SUMMARY OF FIXED ISSUES:
-----------------------------------------
2708626 (2710892) Node is unable to join the fencing cluster after reboot, due to a snapshot mismatch.
2814826 (2812400) If the agile disk naming scheme is used, vxfen fails to start.
2815240 (2531558) Graceful shutdown of a node should not trigger a race condition on peer nodes.
2855757 (2855755) VxFEN might fail to start, or online coordination point replacement (OCPR) might fail, if a CP server is used as a coordination point for the first time and is not reachable at that time.

SUMMARY OF KNOWN ISSUES:
-----------------------------------------

KNOWN ISSUES :
--------------

FIXED INCIDENTS:
----------------

PATCH ID:PHCO_43069

* INCIDENT NO:2708626  TRACKING ID:2710892

SYMPTOM: One or more nodes may not be able to join an already running fencing cluster.

DESCRIPTION: One or more nodes may not be able to join an already running fencing cluster. This is because, at startup, fencing verifies whether the joining node sees exactly the same coordination points, in the same sequence, as the nodes already in the cluster.
If the locale of the joining node differs from the locale of the nodes already in the cluster, the sequence of coordination points in the coordination point snapshot can differ. This leads to a snapshot mismatch error, and fencing does not start.

RESOLUTION: To make sure that any node can join an already running cluster irrespective of its locale, Symantec has added checks that ensure that the sequence of coordination points in the snapshot does not change across locales.

* INCIDENT NO:2814826  TRACKING ID:2812400

SYMPTOM: vxfen fails to start if the agile disk naming scheme is used.

DESCRIPTION: vxfen fails to start if the agile disk naming scheme is used on HP-UX. This is because the disk access name displayed in the first column of the 'vxdisk list' command output was used to find the name of the coordinator disk. The disk access name in this output was prepended with the vxdmp device path ('/dev/vx/rdmp') and was used as the coordinator disk. This disk access name differs from the actual name of the disk in the '/dev/vx/rdmp/' location, and so the SCSI ioctl system call fails.

RESOLUTION: This issue is fixed. The 'devicetag' field of the disk from the 'vxdisk list ' command output is now used. This is the name of the device without any slice information, and it is also the same name that is present in the '/dev/vx/rdmp' location.

* INCIDENT NO:2815240  TRACKING ID:2531558

SYMPTOM: Graceful shutdown of a node triggers an I/O fencing race condition on peer nodes.

DESCRIPTION: In earlier releases, a gracefully leaving node clears its I/O fencing keys from the coordination points. However, the remaining sub-cluster races against the gracefully leaving node to remove its registrations from the data disks. During this operation, if the sub-cluster loses access to the coordination points, the entire cluster may panic if the racer loses the race for coordination points.

RESOLUTION: In this release, this behavior has changed.
When a node leaves gracefully, the CVM or other clients on that node are stopped before the VxFEN module is unconfigured. Hence, the data disks are already clear of its keys. The remaining sub-cluster tries to clear the gracefully leaving node's keys from the coordination points but does not panic if it is unable to clear the keys.

* INCIDENT NO:2855757  TRACKING ID:2855755

SYMPTOM: vxfen might fail to start, or online coordination point migration might fail, if a CP server is used as a coordination point for the first time for a node.

DESCRIPTION: The UID database file is used to cache the CP server information locally on a node of a cluster. If the UID database file is absent from the system, some operations on that file fail. This happens when the file is being created on the node for the first time.

RESOLUTION: In this release, this behavior has changed. The absence of the file is now handled by creating the file before it is accessed for the first time.

INCIDENTS FROM OLD PATCHES:
---------------------------

Patch Id::PHCO_42254

* Incident no::2175599  Tracking ID ::2203068

Symptom::In a 64-node configuration, when the Veritas fencing module (VxFEN) starts in the customized mode, it starts on only 33 nodes. On the nodes where VxFEN fails to start, the following error appears:
V-11-2-1043 Detected a preexisting split brain. Unable to join cluster.

Description::VxFEN's user mode process, vxfend, uses a limited-size buffer to store the snapshot of cluster membership. This buffer can only accommodate the snapshot of up to 33 nodes.

Resolution::Symantec has updated the vxfend code to use a larger buffer size so that it can accommodate the cluster membership snapshot of all the nodes.

* Incident no::2276622  Tracking ID ::2252385

Symptom::The VxFEN (fencing) component of Veritas Cluster Server (VCS) fails to start using coordinator disks from certain disk arrays.
Even if you configure multiple coordinator disks, the component displays the following error message:
"V-11-2-1003 At least three coordinator disks must be defined"
The vxfentsthdw utility does not report any problem with the coordinator disks. However, if you run the SCSI-extended inquiry command "vxfenadm -i " on the disks, it reports the same serial number for all the coordinator disks. In contrast, the VxVM component's utilities report different and unique serial numbers for the same coordinator disks.

Description::You may face this problem with some disk arrays that are running more recent firmware. On such disk arrays, the fencing component does not retrieve the correct serial number of a disk via the SCSI-extended inquiry command. The SCSI-extended inquiry from the fencing component is based on an old SPC-3 specification. According to the old specification, vendor-specific information on page 0x83 of a disk may contain multiple but globally unique serial numbers associated with the addressed logical unit. The fencing component picks one serial number out of all the available serial numbers for a disk.
On disk arrays with new firmware, the SCSI information on page 0x83 of a disk may contain several serial numbers associated with different entities. These entities include:
o The addressed logical unit
o The SCSI target device that contains the addressed logical unit
o The target port that receives the request
The current issue occurs when the fencing component incorrectly picks up the serial number associated with the target port, which is the same for all the configured coordinator disks.

Resolution::Symantec has updated the fencing library code to pick the serial number associated with the addressed logical unit when running a SCSI-extended inquiry on a disk array. The fencing component now ignores the serial numbers associated with the other entities in the array.
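
The selection rule described for incident 2276622 can be illustrated with a minimal Python sketch. It is not the fencing library's code. It assumes the Device Identification page (0x83) layout from SPC-3: a 4-byte page header followed by descriptors whose second byte carries the association in bits 5-4 (0 = addressed logical unit, 1 = target port, 2 = SCSI target device). The function names and the helpers that build sample pages are hypothetical and exist only for illustration.

```python
import struct

# Association values from the SPC-3 Device Identification VPD page (0x83).
ASSOC_LOGICAL_UNIT = 0
ASSOC_TARGET_PORT = 1
ASSOC_TARGET_DEVICE = 2


def parse_designators(page):
    """Yield (association, designator_type, designator) from a raw page 0x83 buffer."""
    if len(page) < 4 or page[1] != 0x83:
        raise ValueError("not a Device Identification VPD page")
    (page_length,) = struct.unpack(">H", page[2:4])
    end = min(len(page), 4 + page_length)
    offset = 4
    while offset + 4 <= end:
        association = (page[offset + 1] >> 4) & 0x03
        designator_type = page[offset + 1] & 0x0F
        length = page[offset + 3]
        start = offset + 4
        if start + length > end:
            break  # truncated descriptor: stop instead of reading past the page
        yield association, designator_type, bytes(page[start:start + length])
        offset = start + length


def logical_unit_ids(page):
    """Keep only the designators associated with the addressed logical unit."""
    return [d for assoc, _type, d in parse_designators(page)
            if assoc == ASSOC_LOGICAL_UNIT]


# Hypothetical helpers that build sample pages for the demonstration below.
def make_descriptor(association, designator_type, designator):
    # Byte 0: code set 1 (binary); byte 1: association and designator type.
    header = bytes([0x01, (association << 4) | designator_type, 0x00, len(designator)])
    return header + designator


def make_page(descriptors):
    body = b"".join(descriptors)
    return bytes([0x00, 0x83]) + struct.pack(">H", len(body)) + body


# Two disks behind the same target port: the port designator is identical,
# while the logical unit designators differ.
disk_a = make_page([make_descriptor(ASSOC_TARGET_PORT, 3, b"PORT0001"),
                    make_descriptor(ASSOC_LOGICAL_UNIT, 3, b"LUN-AAAA")])
disk_b = make_page([make_descriptor(ASSOC_TARGET_PORT, 3, b"PORT0001"),
                    make_descriptor(ASSOC_LOGICAL_UNIT, 3, b"LUN-BBBB")])
```

In this sketch, taking the first designator of each page would return the shared target port value for both disks, whereas filtering on the logical unit association returns a distinct value per disk.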

* Incident no::2366201  Tracking ID ::2365394

Symptom::When a VCS node starts up, if even one of the coordination points for the Veritas fencing module (VxFEN) is inaccessible to the node, then VxFEN fails to start on that node. VxFEN should be able to start on the node as long as a majority of the coordination points are accessible to the node when the node starts up.

Description::When a node starts up, the following conditions must be met before VxFEN starts on that node:
o VxFEN must be able to get the Universally Unique Identifier (UUID) or serial number of each coordination point specified in the /etc/vxfenmode file.
o VxFEN must be able to register the node with a majority of the coordination points (CPs) specified in the /etc/vxfenmode file.
However, if VxFEN fails to get the UUID or serial number of one of the specified CPs because of accessibility issues, VxFEN treats it as a fatal failure. The node then cannot join a cluster or start a cluster. As a result, every coordination point becomes a potential single point of failure and compromises high availability (HA).

Resolution::Symantec has modified the fencing module to fix the issue. Each VCS node now stores the UUIDs or serial numbers of all the coordination points that the node registers with. As a result, if a node is later unable to access a specified coordination point, VxFEN can use the stored UUIDs/serial numbers.
By design, the fix works only when a majority of coordination points are accessible to the node when the node starts. At the time of a fencing race, the racer needs to have its keys registered on a majority of coordination points in order to win the race. To enable this, fencing is designed not to start if a majority of coordination points are not available at the time of startup.
This fix applies only to clusters that use customized fencing.
As part of the fix, Symantec has introduced two optional attributes to the /etc/vxfenmode file.
db_ignore_list  : Specifies the type(s) of coordination points for which a node must not store the UUID/serial number. To specify multiple values, use a comma-separated list. VxFEN supports the values "none", "disk", and "server".
Note: By default, this feature is available only for coordination point servers. To turn it on for disks, you must set the value of db_ignore_list to "none".
db_entries_limit: Specifies the maximum number of UUIDs/serial numbers that a node can store. The default value for this attribute is 1000. If the default value is used, the node requires approximately 1 MB of disk space to store the UUIDs/serial numbers.

* Incident no::2382335  Tracking ID ::2208802

Symptom::In a shared diskgroup that contains more than one disk, the 'vxfentsthdw -g ' command fails to map a shared disk correctly to the nodes that share it.

Description::When you run the 'vxfentsthdw -g ' command, the Veritas fencing module (VxFEN) uses the serial number of a shared disk to map that disk to the nodes that share it. To determine the serial number of the shared disk, the module runs queries in the /usr/bin/ksh shell on all nodes. On certain platforms, the ksh shell may not exist in the default path, so the queries for the serial number may return a null result. A null result matches every other serial number queried in the cluster. As a result, the vxfentsthdw command maps shared disks incorrectly.

Resolution::Symantec has replaced the hardcoded path for the /usr/bin/ksh shell to make it appropriate for all operating systems.

* Incident no::2382460  Tracking ID ::2209661

Symptom::If you configure the Veritas fencing module (VxFEN) in one of the following two ways, then you may not be able to distinguish certain important messages in the log file.
o /etc/vxfenmode file contains 3 or more coordination points with single_cp=1
o /etc/vxfenmode file contains 1 disk as a coordination point with single_cp=1

Description::For the above configurations, certain messages are not appropriately highlighted in the following log file:
/var/VRTSvcs/log/vxfen/vxfend_A.log
For the first configuration, the log file contains the following important message:
Ignoring the single_cp attribute
For the second configuration, the log file contains the following important message:
Option single_cp is enabled. With this option, Symantec recommends the usage of a Co-ordination Point server protected via SFHA and reachable from each node of this client cluster via multiple completely redundant networks. VxFen detected a single disk.

Resolution::Symantec has updated the vxfend code to highlight the above messages with the word WARNING and proper formatting.
For the first configuration, the log file displays the following message:
*** WARNING: Ignoring the single_cp attribute
*** as 3 coordination points
*** have been specified.
For the second configuration, the log file displays the following message:
*** WARNING: Option single_cp is enabled. With this
*** option, Symantec recommends the usage of a
*** coordination point server protected via SFHA and
*** reachable from each node of this client cluster
*** via multiple completely redundant networks. VxFen
*** detected a single disk.

* Incident no::2382559  Tracking ID ::2208792

Symptom::In a cluster where the Veritas fencing module (VxFEN) is running, the vxfenswap utility fails with the following error:
I/O fencing does not appear to be configured on node 

Description::When you run the vxfenswap utility, it checks for the VxFEN status on the local node. To determine the status, the utility runs a query in the /usr/bin/ksh shell. On certain platforms, the query may return a null result because the ksh shell may not exist in the default path.
The utility therefore concludes that VxFEN is not configured.

Resolution::Symantec has replaced the hardcoded path for the /usr/bin/ksh shell to make it appropriate for all operating systems.

* Incident no::2386326  Tracking ID ::2375203

Symptom::The Veritas fencing module (VxFEN) fails to start and displays the following error message:
ERROR V-11-2-1003 At least three coordinator disks must be defined
If you run the 'vxfenadm -i ' command, the output indicates the same serial number for different disks. If you run the './etc/vx/diag.d/vxdmpinq -e 1 -p 131 ' command to determine the raw data size, the output indicates that the raw data size is greater than 96.

Description::The fencing module runs a SCSI3 query on a disk to determine its serial number. The buffer size for the query is 96 KB whereas the size of the output is much larger. Therefore, the serial number of the disk is truncated and appears to be the same for all disks.

Resolution::Symantec has updated the vxfenadm utility to use a larger buffer size for its SCSI3 queries.

* Incident no::2394176  Tracking ID ::2350983

Symptom::If you run the vxfenswap utility on a multinode VCS cluster, then after some time the vxfenswap operation stalls and no output appears on the console. However, the console does not freeze (the system does not hang). If you run the 'ps -ef | grep vxfen' command on every node, the output indicates that the 'vxfenconfig -o modify' process is running on some nodes but is not running on at least one node.

Description::The vxfenswap utility executes the 'vxfenconfig -o modify' command on each node using ssh or rsh. The utility performs the other tasks related to online replacement of coordination points (OCPR) via broadcast messages among cluster nodes. If the utility is unable to fork the 'vxfenconfig -o modify' command on one of the nodes, the vxfenconfig instances on the other nodes cannot proceed further.
This may occur due to intermittent failures in ssh/rsh communication or in network connectivity between the cluster nodes. The vxfenswap utility runs the ' ssh vxfenconfig -o modify &' command on each node. As a result, the ssh/rsh process runs as a background job, and the vxfenswap utility cannot capture the state of the background jobs. This is the root cause of the symptom.
As a workaround, you can run the 'vxfenswap -a cancel' command from the console or from one of the other nodes. The stalled vxfenswap process resumes, and VxFEN continues to use the old coordination points. However, this workaround is not comprehensive.

Resolution::Symantec has modified the vxfenswap utility to track the exit status of the processes related to the 'ssh vxfenconfig -o modify &' command. If any process fails, vxfenswap sends a message to the console stating:
Failed to validate the new set of coordination points
The vxfenswap utility then rolls back the entire OCPR operation to bring the cluster to its normal state. You therefore need not manually run the 'vxfenswap -a cancel' command.

* Incident no::2438261  Tracking ID ::2482167

Symptom::The vxfenswap utility fails to change the interaction policy for coordinator disks from SCSI3 raw to SCSI3 dmp by using the '/etc/vxfenmode.test' file. However, you can use the vxfenswap utility to change the policy from SCSI3 dmp to SCSI3 raw.

Description::Even when the '/etc/vxfenmode.test' file exists, vxfenswap reads /etc/vxfenmode first, and then reads /etc/vxfenmode.test. The utility must read only the '/etc/vxfenmode.test' file when it is available.

Resolution::Symantec has modified vxfenswap to read only /etc/vxfenmode.test when available. When the test file does not exist, vxfenswap reads /etc/vxfenmode.
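
The file selection rule described for incident 2438261 can be sketched in a few lines of Python. This is an illustration only, not the vxfenswap implementation (vxfenswap is not written this way as far as this README states). The function names are hypothetical, the key=value parsing is a simplifying assumption about the file format, and the scsi3_disk_policy key is used purely as sample data.

```python
import os
import tempfile


def select_vxfenmode(etc_dir="/etc"):
    """Return vxfenmode.test when it exists, otherwise vxfenmode. Never both."""
    test_file = os.path.join(etc_dir, "vxfenmode.test")
    main_file = os.path.join(etc_dir, "vxfenmode")
    return test_file if os.path.isfile(test_file) else main_file


def read_settings(path):
    """Parse simple key=value lines, ignoring '#' comments (assumed format)."""
    settings = {}
    with open(path) as handle:
        for line in handle:
            line = line.split("#", 1)[0].strip()
            if "=" in line:
                key, value = line.split("=", 1)
                settings[key.strip()] = value.strip()
    return settings


def demo():
    """Show which file wins before and after a test file is created."""
    with tempfile.TemporaryDirectory() as etc_dir:
        with open(os.path.join(etc_dir, "vxfenmode"), "w") as handle:
            handle.write("scsi3_disk_policy=raw\n")
        before = read_settings(select_vxfenmode(etc_dir))
        with open(os.path.join(etc_dir, "vxfenmode.test"), "w") as handle:
            handle.write("# proposed configuration\nscsi3_disk_policy=dmp\n")
        after = read_settings(select_vxfenmode(etc_dir))
    return before, after
```

Because only one file is ever read, a value in the existing configuration cannot leak into the proposed one, which is the behavior the resolution describes.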

* Incident no::2382452  Tracking ID ::2209143

Symptom::If you unconfigure Coordination Point Server by using the /opt/VRTScps/bin/configure_cps.pl utility, the following unexpected messages appear:
sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file

Description::The configure_cps.pl utility contains an irregular apostrophe character that causes the syntax error.

Resolution::Symantec has modified the configure_cps.pl utility to resolve the syntax error.