Troubleshooting Volume Replicator performance

To troubleshoot Volume Replicator performance and improve replication, perform the checks described below.

To calculate, check, and improve the replication performance

  1. When replication is active, run the following command at the command prompt. Run this command only on the Primary.
    vxrlink -i 5 stats <rlink_name>

    Note the value in the Blocks column. This value indicates the number of blocks that have been successfully sent to the remote node.

  2. Compute replication throughput using the following formula:
    (((# of blocks sent successfully * block size) / stats interval) / 1024) KB per second

    where block size is 512 bytes.

    Stats interval is the time interval, in seconds, that is specified for the -i parameter of the vxrlink stats command. In the command example, the time interval is 5 seconds.

  3. If the throughput computed in step 2 does not match the expected throughput, do the following:
    • Check if the DCM is active by checking the flags field in the output of the following command:

      vxprint -lPV

      If DCM is active, run the following command to resume replication:

      vxrvg -g <diskgroup> resync <rvg>

      You can also perform the Resync operation from VEA by selecting the Resynchronize Secondaries option from the Primary RVG right-click menu. Note that the Secondary becomes inconsistent during the DCM replay.

    • Check if there are any pending writes using the following command:

      vxrlink -i 5 status <rlink_name>

      If the application is not write intensive, the RLINK may be mostly up-to-date, with few pending updates to be sent to the Secondary.

      To determine the number of writes that are happening to the data volumes, run the Performance Monitor tool. This tool is generally installed when the operating system is installed.

      To launch the tool, run perfmon from the command prompt. Select the (+) button to open the Add Counters dialog. Select Dynamic Volume from the Performance Object drop-down list, and then select Write Block/Sec from the Select counters from list pane.

    • If there are pending writes in the Replicator Log, and replication is not using the expected bandwidth, check the Timeout, Stream, and Memory error columns in the output of the vxrlink stats command.

      If the number of time-out errors is high and the UDP protocol is used for replication, perform the following:

      If the network has a time relay component, change the replication packet size using the following command, to reduce the number of time-out errors and improve the replication throughput:

      vxrlink set packet_size=1400 <rlink_name>

      Some components in the network drop UDP packets larger than the MTU size, suspecting a denial of service (DoS) attack. Reducing the replication packet size to a value below the MTU, as in the command above, should improve the performance in this case.

    • If the number of memory errors is high, perform the following:

      Run the vxtune command. The output of the command displays the default values that are set for the following tunables:

      C:\Documents and Settings\administrator.INDSSMG>vxtune
      vol_max_nmpool_sz = 16384 kilobytes
      vol_max_rdback_sz = 8192 kilobytes
      vol_min_lowmem_sz = 1024 kilobytes
      vol_rvio_maxpool_sz = 32768 kilobytes
      compression_window = 0 kilobytes
      max_tcp_conn_count = 64
      nmcom_max_msgs = 512
      max_rcvgap = 5
      rlink_rdbklimit = 16384 kilobytes
      compression_speed = 7
      compression_threads = 10
      msgq_sequence = 1
      vol_maxkiocount = 1048576
      force_max_conn = False
      tcp_src_port_restrict = False
      nat_support = False

      Change the value of the NMCOM_POOL_SIZE (vol_max_nmpool_sz) tunable appropriately. The value is specified in kilobytes: the default (and minimum) value is 4096 (4 MB) and the maximum is 524288 (512 MB).

      After changing this value, restart the system so that the changes take effect.

      Note that the value that is specified for the NMCOM_POOL_SIZE tunable is global to the system. Thus, if the node is a Secondary for two RLINKs (Primary hosts), set the value of the tunable to accommodate both.
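
The throughput formula in step 2 can be sketched as a small Python helper. This is a minimal illustration, not part of the product: the function name and the sample block count are hypothetical, and it assumes you have already read the Blocks value for one interval from the vxrlink stats output by hand.

```python
# Block size used by the formula in step 2 (512 bytes per block).
BLOCK_SIZE_BYTES = 512


def replication_throughput_kb_per_sec(blocks_sent, stats_interval_sec):
    """Return replication throughput in KB per second for one stats interval.

    blocks_sent        -- value read from the Blocks column for the interval
    stats_interval_sec -- the value passed to -i (5 in the example command)
    """
    if stats_interval_sec <= 0:
        raise ValueError("stats interval must be a positive number of seconds")
    bytes_sent = blocks_sent * BLOCK_SIZE_BYTES
    # bytes per second, then converted to kilobytes per second
    return bytes_sent / stats_interval_sec / 1024


if __name__ == "__main__":
    # Hypothetical sample: 20000 blocks reported in one 5-second interval.
    print(replication_throughput_kb_per_sec(20000, 5))  # 2000.0
```

With these sample numbers, 20000 blocks * 512 bytes = 10,240,000 bytes, which over 5 seconds is 2,048,000 bytes per second, or 2000 KB per second. Compare this figure with the expected throughput before continuing with the checks in step 3.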