Custom Reports Using Data Collectors
Product: Cluster Server
Platform: HP-UX
Product version: 5.0.1
Product component: High Availability
Check category: All
Check category: Availability
Check description: Determines whether the resource owner defined by the User agent attribute exists.
Check procedure:
Check recommendation: Make sure that the User agent attribute specifies a valid UNIX account on the clustered system. To determine if the user name is valid, enter the following command:
# /usr/bin/id user_name
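As a minimal sketch, the same test can be scripted; the user name here is an assumption (root is used so the example runs anywhere), substitute the value of the User agent attribute:

```shell
#!/bin/sh
# Sketch: verify that the account named in the User agent attribute
# exists on this node. "root" is a stand-in for the real attribute value.
user_name=root
if /usr/bin/id "$user_name" >/dev/null 2>&1; then
    echo "$user_name: valid UNIX account"
else
    echo "$user_name: no such account"
fi
```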
Check category: Availability
Check description: Compares the checksums of the executable files on all cluster nodes. The node on which the application is currently running (ONLINE) is assumed to be the canonical copy. The check is skipped on the node where the application is currently running. On all other cluster nodes, if the checksum differs, the check fails. If the application is not running (ONLINE) on any node, the check is skipped on all nodes.
Check procedure:
Check recommendation: The checksum of each executable file should be the same on all nodes. Identify the definitive and correct executable file on one of the cluster nodes, then synchronize the files on the remaining failover nodes.
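A minimal sketch of the comparison, using cksum locally; the binary path and the remote node name are illustrative:

```shell
#!/bin/sh
# Sketch: compute the checksum of an application binary on the local
# (ONLINE) node. /bin/sh stands in for the real application binary.
app=/bin/sh
local_sum=$(cksum "$app" | awk '{print $1}')
echo "checksum of $app: $local_sum"
# On a real cluster, fetch the remote value over ssh and compare, e.g.:
#   remote_sum=$(ssh node2 cksum "$app" | awk '{print $1}')
#   [ "$local_sum" = "$remote_sum" ] || echo "mismatch on node2"
```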
Check category: Availability
Check description: Checks whether the application binaries exist and are executable.
Check procedure:
Check recommendation: Make sure that the scripts specified in the cluster configuration exist and that they are executable on all systems in the cluster.
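The existence and executability tests can be sketched as a loop; the script list is an assumption (/bin/sh is used so the loop runs anywhere):

```shell
#!/bin/sh
# Sketch: confirm that each configured script exists and is executable.
for script in /bin/sh; do
    if [ -x "$script" ]; then
        echo "$script: OK"
    else
        echo "$script: missing or not executable"
    fi
done
```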
Check category: Availability
Check description: Checks whether the application user account exists.
Check procedure:
Check recommendation: Make sure that the application user has a valid UNIX login, and that the account is enabled for shell access.
Check category: Availability
Check description: Checks whether all GAB ports are configured on the system.
Check procedure:
Check recommendation: It is recommended that you configure all the GAB ports on the system.
Check category: Availability
Check description: Verifies whether I/O fencing is properly configured for the cluster.
Check procedure:
Check recommendation: This check fails when I/O fencing is not running on the system or the fencing mode is not configured properly. When using SFCFS, it is recommended that you configure I/O fencing with the fencing mode set as per product requirements to avoid VxFS file system corruption.
Check category: Availability
Check description: Checks that the packages installed across all the nodes in a cluster are consistent.
Check procedure:
Check recommendation: Ensure that the packages installed on all the nodes in a cluster are consistent and that the package versions are identical. Inconsistent packages can cause errors in application failover.
Check category: Availability
Check description: Checks for the presence of VCS application agents in the system.
Check procedure:
Check recommendation: It is recommended that you install the missing VCS application agents listed in the output details.
Check category: Availability
Check description: Checks for the presence of VCS replication agents on the system.
Check procedure:
Check recommendation: It is recommended that you install the missing VCS replication agents listed in the output details.
Check category: Availability
Check description: Checks whether any VCS resource is marked as non-critical (Critical=0).
Check procedure:
Check recommendation: A group cannot fail over to an alternate system unless it has at least one resource marked as critical. Therefore, to ensure maximum high availability, log in as root and execute the following command to set the affected resources to critical:
# hares -modify resource_name Critical 1
Check category: Availability
Check description: Checks whether all the disks in the VxVM disk group are visible on the cluster node.
Check procedure:
Check recommendation: Make sure that all VxVM disks have been discovered. Do the following:
1. Run an operating system-specific disk discovery command such as lsdev (AIX), ioscan (HP-UX), fdisk (Linux), or format or devfsadm (Solaris).
2. Run the vxdctl enable command:
# vxdctl enable
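The refresh step can be wrapped in a small guard so it is safe to run on hosts where VxVM is not installed (this sketch assumes vxdctl is on the PATH when present):

```shell
#!/bin/sh
# Sketch: refresh VxVM's view of the disks after OS-level discovery.
# Guarded so the script is a no-op where VxVM is not installed.
if command -v vxdctl >/dev/null 2>&1; then
    vxdctl enable && status="VxVM device list refreshed"
else
    status="vxdctl not installed; nothing to do"
fi
echo "$status"
```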
Check category: Availability
Check description: Checks for valid Volume Manager (VxVM) licenses on the cluster systems.
Check procedure:
Check recommendation: Use the /opt/VRTS/bin/vxlicinst utility to install a valid VxVM license key.
Check category: Availability
Check description: On the local system where the DiskGroup resource is offline, it checks whether the unique disk identifiers (UDIDs) for the disks match those on the online systems.
Check procedure:
Check recommendation: Make sure that the UDIDs for the disks on the cluster nodes match. To find the UDID for a disk, enter the following command:
# vxdisk -s list disk_name
Note: The check does not handle SRDF replication. In the case of SRDF replication, use the 'clearclone=1' attribute (SFHA 6.0.5 onwards), which clears the clone flag and updates the disk UDID.
Check category: Availability
Check description: Verifies that all the disks in the disk group in a campus cluster have site names. Also verifies that all volumes on the disk group have the same number of plexes on each site in the campus cluster.
Check procedure:
Check recommendation: Make sure that the site name is added to each disk in a disk group. To verify the site name, enter the following command:
# vxdisk -s list disk_name
On each site in the campus cluster, make sure that all volumes on the disk group have the same number of plexes. To verify the plex and subdisk information of a volume created on a disk group, enter the following command:
# vxprint -g disk_group
Check category: Availability
Check description: Checks if the dig binary is present and is executable on the system.
Check procedure:
Check recommendation: Make sure that the dig binary is present in at least one of the following locations:
* /usr/bin/dig
* /bin/dig
* /usr/sbin/dig
To make the dig binary executable, enter the following command:
# chmod +x dig_binary_path
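The search over the three locations can be sketched as follows; the loop reports the first executable copy it finds:

```shell
#!/bin/sh
# Sketch: search the locations listed above for an executable dig binary.
found=""
for p in /usr/bin/dig /bin/dig /usr/sbin/dig; do
    if [ -x "$p" ]; then
        found="$p"
        break
    fi
done
if [ -n "$found" ]; then
    echo "dig found at $found"
else
    echo "dig not found in the expected locations"
fi
```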
Check category: Availability
Check description: Checks whether the Transaction Signature (TSIG) key file that is specified in the cluster configuration exists, is readable, and has a non-zero size.
Check procedure:
Check recommendation: Make sure that the TSIG key file exists and is a non-zero sized file. To make the file readable, enter the following command:
# chmod +r absolute_key_file_path
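The three conditions the check applies can be sketched as below; the key file path and contents are illustrative (a demo file is created so the test can run anywhere, on a cluster node point key_file at the path from the cluster configuration instead):

```shell
#!/bin/sh
# Sketch: verify that the TSIG key file exists, is readable, and is
# non-zero sized. The file here is a throwaway demo.
key_file=/tmp/tsig_demo.key
printf 'demo key material\n' > "$key_file"
if [ -r "$key_file" ] && [ -s "$key_file" ]; then
    echo "key file exists, is readable, and is non-zero sized"
else
    echo "key file missing, unreadable, or empty"
fi
rm -f "$key_file"
```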
Check category: Availability
Check description: Checks if stealth masters can reply to a Start of Authority (SOA) query for the configured domain.
Check procedure:
Check recommendation: Make sure that you configure the StealthMasters and Domain attributes with the correct values, and that the following SOA query for the domain works properly:
# dig @stealth_master -t SOA domain_name
Check category: Availability
Check description: Checks if any VCS agents have faulted and are not running.
Check procedure:
Check recommendation: VCS resources that belong to a type whose agent has faulted are not monitored. To restart the agent, as root do the following:
1. Start the agent:
# haagent -start Agent -sys node
2. Confirm that the agent has restarted by:
i. Checking the engine log: /var/VRTSvcs/log/engine_A.log
ii. Running:
# ps -ef | grep Agent
Check category: Availability
Check description: Checks whether any VCS resources are in a FAULTED state.
Check procedure:
Check recommendation: A group cannot fail over to a system where the VCS resource has faulted. Fix the problem and use the following command to clear the FAULTED resource state:
# hares -clear resource -sys node
Check category: Availability
Check description: Checks whether the network interface that is specified in the cluster configuration exists on the system.
Check procedure:
Check recommendation: In the cluster configuration, make sure you specify the correct network device.
Check category: Availability
Check description: Checks whether the route to the IP address exists on the network interface specified in the cluster configuration.
Check procedure:
Check recommendation: On the associated network device, add the route to the specified IP address.
Check category: Availability
Check description: Checks whether a valid fsck policy has been specified for all the Mount resources that are in the offline state to automatically recover the file systems.
Check procedure:
Check recommendation: Set the FsckOpt attribute for the affected Mount resource to either -Y (fix errors during fsck) or -N (do not fix errors during fsck).
Check category: Availability
Check description: Checks whether the specified mount point is available for mounting after failover happens.
Check procedure:
Check recommendation: If the mount point is mounted, unmount it. Enter the following command:
# umount mount_point
Check category: Availability
Check description: Verifies that the available mount point is not configured to mount a file when the system starts.
Check procedure:
Check recommendation: On a cluster node, make sure that the operating system-specific file system table file does not contain an entry for the mount point. These files are /etc/filesystems (AIX), /etc/fstab (HP-UX and Linux), and /etc/vfstab (Solaris).
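The check can be sketched as a lookup against the file system table; the mount point is an assumption, and the table path shown is the HP-UX/Linux one (substitute /etc/filesystems on AIX or /etc/vfstab on Solaris):

```shell
#!/bin/sh
# Sketch: confirm that a VCS-managed mount point has no entry in the
# operating system's file system table.
mount_point=/shared/app
fs_table=/etc/fstab
if grep -qw "$mount_point" "$fs_table" 2>/dev/null; then
    echo "WARNING: $mount_point is listed in $fs_table"
else
    echo "$mount_point has no entry in $fs_table"
fi
```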
Check category: Availability
Check description: Checks whether the specified mount point existing on a cluster node is available for mounting.
Check procedure:
Check recommendation: Create the specified mount point, and make sure that it is not in use.
Check category: Availability
Check description: Checks whether the File System (VxFS) installed on the cluster system where the Mount resource is currently offline has a valid license.
Check procedure:
Check recommendation: Use the /opt/VRTS/bin/vxlicinst utility to install a valid VxFS license on the target cluster systems.
Check category: Availability
Check description: Checks whether the lock directory specified in the cluster configuration is on shared storage.
Check procedure:
Check recommendation: Make sure that the directory specified in the LocksPathName attribute is on shared storage.
Check category: Availability
Check description: Verifies that the NFS server does not start automatically when the system starts.
Check procedure:
Check recommendation: In the system configuration file, disable the NFS server so the NFS daemons do not start when the system boots. On Solaris 10 and later, make sure the svcadm command does not start the NFS daemon when the system boots.
Check category: Availability
Check description: Checks whether the UP flag is set for the network interface specified in the cluster configuration.
Check procedure:
Check recommendation: Make sure that you configure the Device attribute of the NIC resource type to a network interface that is configured on the system with the UP flag set. To set the UP flag on a configured device, use the following command:
Linux:
# ip link set device_name up
Solaris/AIX/HP-UX:
For IPv4:
# ifconfig device_name inet up
For IPv6:
# ifconfig device_name inet6 up
Check category: Availability
Check description: Checks whether the ToleranceLimit attribute has been set for the VCS NIC resource type.
Check procedure:
Check recommendation: Setting the ToleranceLimit to a non-zero value prevents false failovers in the case of a spurious network outage. To set the ToleranceLimit for the NIC resource type, log in as root and enter the following command:
# hatype -modify NIC ToleranceLimit n
where n > 0.
Because a non-zero value delays failover and may compromise the high availability of the affected resource groups, use it only on networks that are prone to transient outages.
Check category: Availability
Check description: Checks whether the ORACLE_HOME directory location specified in the cluster configuration exists on the system.
Check procedure:
Check recommendation: Ensure that the target cluster system is configured to mount ORACLE_HOME.
Check category: Availability
Check description: Checks whether the user ID (UID) and group ID (GID) of the owner specified in the Oracle owner attribute match the UID and GID of the owner on the VCS node.
Check procedure:
Check recommendation: Make sure that the UID and GID of the Oracle owner match those specified for the owner on the VCS node.
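A minimal sketch of the comparison; the owner name and expected IDs are assumptions (root/0/0 are used so the example runs anywhere), substitute the values from the cluster configuration:

```shell
#!/bin/sh
# Sketch: compare the owner's UID/GID on this node with the values the
# cluster configuration expects.
owner=root
expected_uid=0
expected_gid=0
uid=$(id -u "$owner")
gid=$(id -g "$owner")
if [ "$uid" = "$expected_uid" ] && [ "$gid" = "$expected_gid" ]; then
    echo "$owner UID/GID match the configuration"
else
    echo "mismatch: $owner is $uid:$gid, expected $expected_uid:$expected_gid"
fi
```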
Check category: Availability
Check description: Verifies that the parameter file that is specified in the Oracle agent PFile or SPFile attribute exists.
Check procedure:
Check recommendation: Make sure that the parameter file (PFile or SPFile) that is specified in the cluster configuration exists.
Check category: Availability
Check description: Identifies and logs the application's checksum. The application is defined in the PathName attribute.
Check procedure:
Check recommendation: Make sure that the script specified in the PathName attribute value exists, and that it is executable on all systems in the cluster.
Check category: Availability
Check description: Checks if the path specified by the PathName attribute exists on the cluster node. If the path does not exist locally, the check determines if a Mount resource with a corresponding mount point is available to ensure that the path is on shared storage.
Check procedure:
Check recommendation: Make sure that the shared directory specified in the Share resource configuration exists either locally or through a Mount resource with a corresponding mount point.
Check category: Availability
Check description: Checks if the checksums of the VCS triggers are the same on all the nodes of the cluster.
Check procedure:
Check recommendation: Verify that the specified binaries in /opt/VRTSvcs/bin/triggers are identical on all nodes in the cluster.
Check category: Availability
Check description: Checks if installed VCS triggers are executable.
Check procedure:
Check recommendation: Ensure that the triggers installed in /opt/VRTSvcs/bin/triggers are executable by the root user.
Check category: Availability
Check description: Checks whether all the disks are visible to all the nodes in a cluster.
Check procedure:
Check recommendation: Make sure that all the disks are connected to all the nodes in a cluster. Run operating system-specific disk discovery commands such as lsdev (AIX), ioscan (HP-UX), fdisk (Linux), or format or devfsadm (Solaris).
If the disks are not visible, connect the disks to the nodes.
Check category: Availability
Check description: Checks whether duplicate disk groups are configured on the specified nodes.
Check procedure:
Check recommendation: To facilitate successful failover, make sure that there is only one disk group name configured for the specified node. To list the disk groups on a system, enter the following command:
# vxdg list
Check category: Availability
Check description: Checks if free swap space is below the threshold value specified by the HC_VFD_CHK_FREE_SWAP_THRESHOLD parameter in the sortdc.conf file.
Check procedure:
Check recommendation: Increase the swap space by adding an additional swap device.
Check category: Availability
Check description: Checks if the GAB_START entry in the GAB configuration file is set to 1.
Check procedure:
Check recommendation: Make sure that the GAB_START entry in the GAB configuration file is set to 1 so that the GAB module is enabled and starts after a system reboot.
Check category: Availability
Check description: Checks if the LLT_START entry in the LLT configuration file is set to 1.
Check procedure:
Check recommendation: Make sure that the LLT_START entry in the LLT configuration file is set to 1 so that the LLT module is enabled and starts after a system reboot.
Check category: Availability
Check description: Checks whether the nodes in a cluster have the same operating system, operating system version, and operating system patch level. These attributes must be identical on all systems in a VCS cluster.
Check procedure:
Check recommendation: Use operating system-specific commands to verify that the nodes in a cluster have the same operating system, version, and patch level. For example, 'uname -a', or 'oslevel' (AIX).
Check category: Availability
Check description: Checks if the VCS cluster ID is a non-zero value.
Check procedure:
Check recommendation: In the /etc/llttab file, set the VCS cluster ID to a unique, non-zero integer less than or equal to 65535.
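The validity rule can be sketched as a range check before the value is written to /etc/llttab; the candidate ID of 7 is illustrative:

```shell
#!/bin/sh
# Sketch: validate a candidate VCS cluster ID (non-zero, <= 65535).
cluster_id=7
case "$cluster_id" in
    *[!0-9]*|'')
        echo "invalid: cluster ID must be numeric" ;;
    *)
        if [ "$cluster_id" -ge 1 ] && [ "$cluster_id" -le 65535 ]; then
            echo "cluster ID $cluster_id is valid"
        else
            echo "invalid: cluster ID must be between 1 and 65535"
        fi ;;
esac
```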
Check category: Availability
Check description: The ClusterAddress attribute is a prerequisite for GCO. This check verifies that the ClusterAddress Cluster attribute is set and that it is the same as the virtual IP address in the ClusterService service group.
Check procedure:
Check recommendation: Set the value of the Address attribute of the webip resource to that of the ClusterAddress Cluster attribute. As root:
1. Get the value of the ClusterAddress Cluster attribute:
# haclus -value ClusterAddress -localclus
2. Modify the Address attribute of the webip resource:
# haconf -makerw
# hares -modify webip Address address
# haconf -dump -makero
where address is the output of the first command.
Check category: Availability
Check description: If the ClusterService service group is configured, it verifies that its OnlineRetryLimit is set.
Check procedure:
Check recommendation: Set the OnlineRetryLimit for the ClusterService service group. Enter:
# hagrp -modify ClusterService OnlineRetryLimit N
where N >= 1.
Check category: Availability
Check description: Checks whether the existing cluster configuration at the directory specified by the HC_VFD_CHK_VCS_CONFIG_DIR parameter can be used to start VCS on a system.
Check procedure:
Check recommendation: Fix the VCS configuration located at the directory specified by the HC_VFD_CHK_VCS_CONFIG_DIR parameter in the sortdc.conf file on the cluster node.
Check category: Availability
Check description: Checks if any of the private links in the cluster are in the jeopardy state.
Check procedure:
Check recommendation: Do the following:
1. Determine the connectivity of this node with the remaining nodes in the cluster. Enter:
# /sbin/lltstat -nvv
If the status is DOWN, this node cannot see that link to the other node(s).
2. Restore connectivity through this private link.
3. Verify that connectivity has been restored. Enter:
# /sbin/gabconfig -a | /bin/grep jeopardy
If this command does not produce any output, the link has been restored.
Check category: Availability
Check description: Checks if the VCS configuration is read-only.
Check procedure:
Check recommendation: Close the cluster configuration and save any changes. As root, execute:
# haconf -dump -makero
Check category: Availability
Check description: The VCS system name defined in the /etc/VRTSvcs/conf/sysname file must be identical to the node name defined by the set-node directive in the /etc/llttab file. This check also verifies that /etc/llthosts is consistent across all nodes of the cluster.
Check procedure:
Check recommendation: Make the contents of /etc/VRTSvcs/conf/sysname identical to the node name defined by the set-node directive in the /etc/llttab file.
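The comparison can be sketched as below; demo files and the node name node01 are assumptions so the example runs anywhere, on a cluster node read /etc/VRTSvcs/conf/sysname and /etc/llttab instead:

```shell
#!/bin/sh
# Sketch: compare the VCS system name with the LLT set-node value,
# using throwaway demo files in place of the real configuration.
sysname_file=/tmp/demo_sysname
llttab_file=/tmp/demo_llttab
echo node01 > "$sysname_file"
printf 'set-node node01\nset-cluster 7\n' > "$llttab_file"
sysname=$(cat "$sysname_file")
llt_node=$(awk '$1 == "set-node" {print $2}' "$llttab_file")
if [ "$sysname" = "$llt_node" ]; then
    echo "names match: $sysname"
else
    echo "mismatch: sysname=$sysname set-node=$llt_node"
fi
rm -f "$sysname_file" "$llttab_file"
```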
Check category: Availability
Check description: Checks if each cluster that is discovered in the set of input nodes has a unique name.
Check procedure:
Check recommendation: Cluster names should be unique. Change the cluster names if you plan to set up the Global Cluster Option (GCO) between clusters that have identical names.
Check category: Availability
Check description: Checks whether any VCS resource has been disabled.
Check procedure:
Check recommendation: Enable the VCS resource. Log in as root and execute the following commands:
# haconf -makerw
# hares -modify resource_name Enabled 1
# haconf -dump -makero
Check category: Availability
Check description: Checks whether any VCS service group with an enabled resource has been persistently frozen.
Check procedure:
Check recommendation: Enable all VCS resources in the service group. As root:
1. Enable VCS resources in the service group:
# haconf -makerw
# hares -modify resource_name Enabled 1
2. Unfreeze the VCS service group:
# hagrp -unfreeze group_name -persistent
# haconf -dump -makero
Check category: Availability
Check description: Checks if the values of the VCS resource attributes for virtual host names or addresses exist locally in the /etc/hosts file of the system. This is useful for name resolution in case of loss of network connectivity to the DNS server.
Check procedure:
Check recommendation: Add the value of specified VCS resource attributes to the system /etc/hosts file.
Check category: Availability
Check description: Checks whether HttpDir and ConfFile exist.
Check procedure:
Check recommendation: Make sure that HttpDir is a valid directory and that the ConfFile file exists on the clustered system.
Check category: Best practices
Check description: Checks whether the installed Storage Foundation / InfoScale products are at the latest software patch level.
Check procedure:
Check recommendation: To avoid known risks or issues, it is recommended that you install the latest versions of the Storage Foundation / InfoScale products on the system.
Check category: Best practices
Check description: Checks whether the IfconfigTwice attribute for the VCS IP resource type is set to 1. Setting the attribute to 1 ensures that when the IP address is brought online or failed over, the system sends multiple Address Resolution Protocol (ARP) packets to the network clients. Sending multiple packets reduces the risk of connection problems after a failover event.
Check procedure:
Check recommendation: Make sure that the IfconfigTwice attribute has been set to a value of 1 or larger.
Check category: Best practices
Check description: Checks whether the NetworkHosts attribute for the VCS NIC resource type has been configured. This attribute specifies the list of hosts that are pinged to determine if the network is active. If you do not specify this attribute, the agent must rely on the NIC broadcast address, which can flood the network with traffic.
Check procedure:
Check recommendation: Make sure you configure the NetworkHosts attribute with a list of IP addresses that can be pinged to determine if the network connection is active. To set the NetworkHosts attribute for the NIC resource, log in as root and enter the following command:
# hares -modify res_name NetworkHosts ip_address
where ip_address is a space-separated list of IP addresses.
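The kind of reachability probe the agent performs when NetworkHosts is set can be sketched as below; the host list is illustrative (127.0.0.1 is used so the probe succeeds without external connectivity):

```shell
#!/bin/sh
# Sketch: ping each configured network host once and report its state.
for host in 127.0.0.1; do
    if ping -c 1 "$host" >/dev/null 2>&1; then
        echo "$host reachable"
    else
        echo "$host unreachable"
    fi
done
```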