Volume Manager on Linux, patch detail

To use SORT, JavaScript must be enabled. How to enable JavaScript.
vm-rhel6_x86_64-5.1SP1PR2P1 Obsolete Go to Download Center to download. The latest patch(es) : sfha-rhel6_x86_64-5.1SP1PR2RP4
Basic information
Release type:	P-patch
Release date:	2011-07-08
OS update support:	None
Technote:	None
Documentation:	None
Popularity:	5314 viewed downloaded
Download size:	16.15 MB
Checksum:	4238392137
Applies to one or more of the following products:
VirtualStore 5.1SP1PR2 On RHEL6 x86-64
Storage Foundation 5.1SP1PR2 On RHEL6 x86-64
Storage Foundation Cluster File System 5.1SP1PR2 On RHEL6 x86-64
Storage Foundation Cluster File System for Oracle RAC 5.1SP1PR2 On RHEL6 x86-64
Storage Foundation for Oracle RAC 5.1SP1PR2 On RHEL6 x86-64
Storage Foundation HA 5.1SP1PR2 On RHEL6 x86-64
Obsolete patches, incompatibilities, superseded patches, or other requirements:
This patch is obsolete. It is superseded by:	Release date
sfha-rhel6_x86_64-5.1SP1PR2RP4	2013-08-21
vm-rhel6_x86_64-5.1SP1RP3P1 (obsolete)	2012-11-17
sfha-rhel6_x86_64-5.1SP1PR2RP3 (obsolete)	2012-10-02
vm-rhel6_x86_64-5.1SP1RP2P3 (obsolete)	2012-06-13
vm-rhel6_x86_64-5.1SP1RP2P2 (obsolete)	2011-10-28
vm-rhel6_x86_64-5.1SP1RP2P1 (obsolete)	2011-10-19
sfha-rhel6_x86_64-5.1SP1PR2RP2 (obsolete)	2011-09-28
Fixes the following incidents:
2395052, 2410744, 2424140
Patch ID:
VRTSvxvm-5.1.120.100-SP1PR2P1_RHEL6
Readme file
                          * * * READ ME * * *
             * * * Veritas Volume Manager 5.1 SP1 PR2 * * *
                         * * * P-patch 1 * * *
                         Patch Date: 2011-10-18


This document provides the following information:

   * PATCH NAME
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas Volume Manager 5.1 SP1 PR2 P-patch 1


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas Storage Foundation for Oracle RAC 5.1 SP1 PR2
   * Veritas Storage Foundation Cluster File System 5.1 SP1 PR2
   * Veritas Storage Foundation 5.1 SP1 PR2
   * Veritas Storage Foundation High Availability 5.1 SP1 PR2
   * Veritas Storage Foundation Cluster File System for Oracle RAC 5.1 SP1 PR2
   * Symantec VirtualStore 5.1 SP1 PR2


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL5 x86-64
RHEL6 x86-64
SLES10 x86-64
SLES11 x86-64


INCIDENTS FIXED BY THE PATCH
----------------------------
This patch fixes the following Symantec incidents:

Patch ID: 5.1.132.100

* 2440015 (Tracking ID: 2428170)

SYMPTOM:
I/O hangs when reading or writing to a volume after a total storage 
failure in CVM environments with Active-Passive arrays.

DESCRIPTION:
In the event of a storage failure, in active-passive environments, 
the CVM-DMP fail over protocol is initiated. This protocol is responsible for 
coordinating the fail-over of primary paths to secondary paths on all nodes in 
the 
cluster.
In the event of a total storage failure, where both the primary paths and 
secondary paths fail, in some situations the protocol fails to cleanup some 
internal structures, leaving the devices quiesced.

RESOLUTION:
After a total storage failure all devices should be un-quiesced, 
allowing the I/Os to fail. The CVM-DMP protocol has been changed to cleanup 
devices, even if all paths to a device have been removed.

* 2477272 (Tracking ID: 2169726)

SYMPTOM:
If a combination of cloned and non-cloned disks for a diskgroup is available at 
the time of import, then the diskgroup imported through vxdg import operation 
contains both cloned and non-cloned disks.

DESCRIPTION:
For a particular diskgroup, if some of the disks are not available during the 
diskgroup import operation and the corresponding cloned disks are present, then 
the diskgroup imported through vxdg import operation contains combination of 
cloned and non-cloned disks.
Example - 
Diskgroup named dg1 with the disks disk1 and disk2 exists on some machine. 
Clones of disks named disk1_clone disk2_clone are also available. If disk2 goes 
offline and the import for dg1 is performed, then the resulting diskgroup will 
contain disks disk1 and disk2_clone.

RESOLUTION:
The diskgroup import operation will consider cloned disks only if no non-cloned 
disk is available.

* 2497637 (Tracking ID: 2489350)

SYMPTOM:
In a Storage Foundation environment running Symantec Oracle Disk Manager (ODM),
Veritas File System (VxFS), Cluster volume Manager (CVM) and Veritas Volume
Replicator (VVR), kernel memory is leaked under certain conditions.

DESCRIPTION:
In CVR (CVM + VVR), under certain conditions (for example when I/O throttling
gets enabled or kernel messaging subsystem is overloaded), the I/O resources
allocated before are freed and the I/Os are being restarted afresh. While
freeing the I/O resources, VVR primary node doesn"t free the kernel memory
allocated for FS-VM private information data structure and causing the kernel
memory leak of 32 bytes for each restarted I/O.

RESOLUTION:
Code changes are made in VVR to free the kernel memory allocated for FS-VM
private information data structure before the I/O is restarted afresh.

* 2497796 (Tracking ID: 2235382)

SYMPTOM:
IOs can hang in DMP driver when IOs are in progress while carrying out path
failover.

DESCRIPTION:
While restoring any failed path to a non-A/A LUN, DMP driver is checking that
whether any pending IOs are there on the same dmpnode. If any are present then DMP
is marking the corresponding LUN with special flag so that path failover/failback
can be triggered by the pending IOs. There is a window here and by chance if all
the pending IOs return before marking the dmpnode, then any future IOs on the
dmpnode get stuck in wait queues.

RESOLUTION:
Make sure that whenever the LUN is having pending IOs then only to set the flag on
it so that failover can be triggered by pending IOs.

* 2507120 (Tracking ID: 2438426)

SYMPTOM:
The following messages are displayed after vxconfigd is started.

pp_claim_device: Could not get device number for /dev/rdsk/emcpower0 
pp_claim_device: Could not get device number for /dev/rdsk/emcpower1

DESCRIPTION:
Device Discovery Layer(DDL) has incorrectly marked a path under dmp device with 
EFI flag even though there is no corresponding Extensible Firmware Interface 
(EFI) device in /dev/[r]dsk/. As a result, Array Support Library (ASL) issues a 
stat command on non-existent EFI device and displays the above messages.

RESOLUTION:
Avoided marking EFI flag on Dynamic MultiPathing (DMP) paths which correspond to 
non-efi devices.

* 2507124 (Tracking ID: 2484334)

SYMPTOM:
The system panic occurs with the following stack while collecting the DMP 
stats.

dmp_stats_is_matching_group+0x314()
dmp_group_stats+0x3cc()
dmp_get_stats+0x194()
gendmpioctl()
dmpioctl+0x20()

DESCRIPTION:
Whenever new devices are added to the system, the stats table is adjusted to
accomodate the new devices in the DMP. There exists a race between the stats
collection thread and the thread which adjusts the stats table to accomodate
the new devices. The race can result the stats collection thread to access the
memory beyond the known size of the table causing the system panic.

RESOLUTION:
The stats collection code in the DMP is rectified to restrict the access to the 
known size of the stats table.

* 2508294 (Tracking ID: 2419486)

SYMPTOM:
Data corruption is observed with single path when naming scheme is changed 
from enclodure based (EBN) to OS Native (OSN).

DESCRIPTION:
The Data corruption can occur in the following configuration, 
when the naming scheme is changed while applications are on-line.

1. The DMP device is configured with single path or the devices are controlled
   by Third party Multipathing Driver (Ex: MPXIO, MPIO etc.,)

2. The DMP device naming scheme is EBN (enclosure based naming) and 
persistence=yes

3. The naming scheme is changed to OSN using the following command
   # vxddladm set namingscheme=osn


There is possibility of change in name of the VxVM device (DA record) while
the naming scheme is changing. As a result of this the device attribute list 
is updated with new DMP device names. Due to a bug in the code which updates 
the attribute list, the VxVM device records are mapped to wrong DMP devices.

Example:

Following are the device names with EBN naming scheme.

MAS-usp0_0   auto:cdsdisk    hitachi_usp0_0  prod_SC32    online
MAS-usp0_1   auto:cdsdisk    hitachi_usp0_4  prod_SC32    online
MAS-usp0_2   auto:cdsdisk    hitachi_usp0_5  prod_SC32    online
MAS-usp0_3   auto:cdsdisk    hitachi_usp0_6  prod_SC32    online
MAS-usp0_4   auto:cdsdisk    hitachi_usp0_7  prod_SC32    online
MAS-usp0_5   auto:none       -            -            online invalid
MAS-usp0_6   auto:cdsdisk    hitachi_usp0_1  prod_SC32    online
MAS-usp0_7   auto:cdsdisk    hitachi_usp0_2  prod_SC32    online
MAS-usp0_8   auto:cdsdisk    hitachi_usp0_3  prod_SC32    online
MAS-usp0_9   auto:none       -            -            online invalid
disk_0       auto:cdsdisk    -            -            online
disk_1       auto:none       -            -            online invalid

bash-3.00# vxddladm set namingscheme=osn

The follwoing is after executing the above command.
The MAS-usp0_9 is changed as MAS-usp0_6 and the following devices
are changed accordingly.

bash-3.00# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
MAS-usp0_0   auto:cdsdisk    hitachi_usp0_0  prod_SC32    online
MAS-usp0_1   auto:cdsdisk    hitachi_usp0_4  prod_SC32    online
MAS-usp0_2   auto:cdsdisk    hitachi_usp0_5  prod_SC32    online
MAS-usp0_3   auto:cdsdisk    hitachi_usp0_6  prod_SC32    online
MAS-usp0_4   auto:cdsdisk    hitachi_usp0_7  prod_SC32    online
MAS-usp0_5   auto:none       -            -            online invalid
MAS-usp0_6   auto:none       -            -            online invalid
MAS-usp0_7   auto:cdsdisk    hitachi_usp0_1  prod_SC32    online
MAS-usp0_8   auto:cdsdisk    hitachi_usp0_2  prod_SC32    online
MAS-usp0_9   auto:cdsdisk    hitachi_usp0_3  prod_SC32    online
c4t20000014C3D27C09d0s2 auto:none       -            -            online invalid
c4t20000014C3D26475d0s2 auto:cdsdisk    -            -            online

RESOLUTION:
Code changes are made to update device attribute list correctly even if name of
the VxVM device is changed while the naming scheme is changing.

* 2508418 (Tracking ID: 2390431)

SYMPTOM:
In a Disaster Recovery environment, when DCM (Data Change Map) is active and 
during SRL(Storage Replicator Log)/DCM flush, the system panics due to missing
parent on one of the DCM in an RVG (Replicated Volume Group).

DESCRIPTION:
The DCM flush happens during every log update and its frequency depends on the 
IO load. If the I/O load is high, the DCM flush happens very often and if there 
are more volumes in the RVG, the frequency is very high. Every DCM flush 
triggers the DCM flush on all the volumes in the RVG. If there are 50 volumes, 
in an RVG, then each DCM flush creates 50 children and is controlled by one 
parent SIO. Once all the 50 children are done, then the parent SIO releases 
itself for the next flush. Once the DCM flush of each child completes, it 
detaches itself from the parent by setting the parent field to NULL. It so 
happens that, if the 49th child is done and before it is detaching it from the 
parent, the 50th child completes and releases the parent_SIO for the next DCM 
flush. Before the 49th child detaches, the new DCM flush is started on the same 
50th child. After the next flush is started, the 49th child of the previous 
flush detaches itself from the parent and since it is a static SIO, it 
indirectly resets the new flush parent field. Also, the lock is not obtained 
before modifing the sio state field in a few scenarios.

RESOLUTION:
Before reducing the children count, detach the parent first. This will make 
sure the new flush will not race with the previous flush. Protect the field 
with the required lock in all the scenarios.

* 2511928 (Tracking ID: 2420386)

SYMPTOM:
Corrupted data is seen near the end of a sub-disk, on thin-reclaimable 
disks with either CDS EFI or sliced disk formats.

DESCRIPTION:
In environments with thin-reclaim disks running with either CDS-EFI 
disks or sliced disks, misaligned reclaims can be initiated. In some situations, 
when reclaiming a sub-disk, the reclaim does not take into account the correct 
public region start offset, which in rare instances can potentially result in 
reclaiming data before the sub-disk which is being reclaimed.

RESOLUTION:
The public offset is taken into account when initiating all reclaim
operations.

* 2515137 (Tracking ID: 2513101)

SYMPTOM:
When VxVM is upgraded from 4.1MP4RP2 to 5.1SP1RP1, the data on CDS disk gets
corrupted.

DESCRIPTION:
When CDS disks are initialized with VxVM version 4.1MP4RP2, the no of cylinders
are calculated based on the disk raw geometry. If the calculated no. of
cylinders exceed Solaris VTOC limit (65535), because of unsigned integer
overflow, truncated value of no of cylinders gets written in CDS label.
    After the VxVM is upgraded to 5.1SP1RP1, CDS label gets wrongly written in
the public region leading to the data corruption.

RESOLUTION:
The code changes are made  to suitably adjust the no. of tracks & heads so that
the calculated no. of cylinders be within Solaris VTOC limit.

* 2525333 (Tracking ID: 2148851)

SYMPTOM:
"vxdisk resize" operation fails on a disk with VxVM cdsdisk/simple/sliced layout
on Solaris/Linux platform with the following message:

      VxVM vxdisk ERROR V-5-1-8643 Device emc_clariion0_30: resize failed: New
      geometry makes partition unaligned

DESCRIPTION:
The new cylinder size selected during "vxdisk resize" operation is unaligned with
the partitions that existed prior to the "vxdisk resize" operation.

RESOLUTION:
The algorithm to select the new geometry has been redesigned such that the new
cylinder size is always aligned with the existing as well as new partitions.

* 2531983 (Tracking ID: 2483053)

SYMPTOM:
VVR Primary system consumes very high kernel heap memory and appear to 
be hung.

DESCRIPTION:
There is a race between REGION LOCK deletion thread which runs as 
part of SLAVE leave reconfiguration and the thread which process the DATA_DONE 
message coming from log client to logowner. Because of this race, the flags 
which stores the status information about the I/Os was not correctly updated. 
This used to cause a lot of SIOs being stuck in a queue consuming a large kernel 
heap.

RESOLUTION:
The code changes are made to take the proper locks while updating 
the SIOs" fields.

* 2531987 (Tracking ID: 2510523)

SYMPTOM:
In CVM-VVR configuration, I/Os on "master" and "slave" nodes hang when "master"
role is switched to the other node using "vxclustadm setmaster" command.

DESCRIPTION:
Under heavy I/O load, the I/Os are sometimes throttled in VVR, if number of
outstanding I/Os on SRL reaches a certain limit (2048 I/Os).
When "master" role is switched to the other node by using "vxclustadm setmaster"
command, the throttled I/Os on original master are never restarted. This causes
the I/O hang.

RESOLUTION:
Code changes are made in VVR to make sure the throttled I/Os are restarted
before "master" switching is started.

* 2531993 (Tracking ID: 2524936)

SYMPTOM:
Disk group is disabled after rescanning disks with "vxdctl enable"
command with the console output below,


 <timestamp> pp_claim_device:         0 
 <timestamp> Could not get metanode from ODM database  
 <timestamp> pp_claim_device:         0 
 <timestamp> Could not get metanode from ODM database  

The error messages below are also seen in vxconfigd debug log output,
              
<timestamp>  VxVM vxconfigd ERROR V-5-1-12223 Error in claiming /dev/<disk>: The 
process file table is full. 
<timestamp>  VxVM vxconfigd ERROR V-5-1-12223 Error in claiming /dev/<disk>: The 
process file table is full. 
...
<timestamp> VxVM vxconfigd ERROR V-5-1-12223 Error in claiming /dev/<disk>: The 
process file table is full.

AIX-

DESCRIPTION:
When the total physical memory in AIX machine is greater than or equal to
40GB & multiple of 40GB (like 80GB, 120GB), a limitation/bug in setulimit
function causes an overflowed value set as the new limit/size of the data area,
which results in memory allocation failures in vxconfigd. Creation of the shared
memory segment also fails during this course. Error handling of this case is 
missing in vxconfigd code, hence resulting in error in claiming disks and 
offlining configuration copies which in-turn results in disabling disk group.

AIX-

RESOLUTION:
Code changes are made to handle the failure case on shared memory segment
creation.

* 2552402 (Tracking ID: 2432006)

SYMPTOM:
System intermittently hangs during boot if disk is encapsulated.
When this problem occurs, OS boot process stops after outputing this:
"VxVM sysboot INFO V-5-2-3409 starting in boot mode..."

DESCRIPTION:
The boot process hung due to a dead lock between two threads, one VxVM
transaction thread and another thread attempting a read on root volume 
issued by dhcpagent.  Read I/O is deferred till transaction is finished but
read count incremented earlier is not properly adjusted.

RESOLUTION:
Proper care is taken to decrement pending read count if read I/O is deferred.

* 2553391 (Tracking ID: 2536667)

SYMPTOM:
[04DAD004]voldiodone+000C78 (F10000041116FA08) 
[04D9AC88]volsp_iodone_common+000208 (F10000041116FA08, 
0000000000000000, 
  0000000000000000) 
[04B7A194]volsp_iodone+00001C (F10000041116FA08) 
[000F3FDC]internal_iodone_offl+0000B0 (??, ??) 
[000F3F04]iodone_offl+000068 () 
[000F20CC]i_softmod+0001F0 () 
[0017C570].finish_interrupt+000024 ()

DESCRIPTION:
Panic happened due to accessing a stale DG pointer as DG got deleted before the 
I/O returned. It may happen on cluster configuration where commands generating 
private region i/os and "vxdg deport/delete" commands are executing 
simultaneously on two nodes of the cluster.

RESOLUTION:
Code changes are made to drain private region I/Os before deleting the DG.

* 2563291 (Tracking ID: 2527289)

SYMPTOM:
In a Campus Cluster setup, storage fault may lead to DETACH of all the
configured site. This also results in IOfailure on all the nodes in the Campus
Cluster.

DESCRIPTION:
Site detaches are done on site consistent dgs when any volume in the dg looses
all the mirrors of a Site. During the processing of the DETACH of last mirror in
a site we identify that it is the last mirror and DETACH the site which in turn
detaches all the objects of that site.

In Campus Cluster setup we attach a dco volume for any data volume created on a
site-consistent dg. The general configuration is to have one DCO mirror on each
site. Loss of a single mirror of the dco volume on any node will result in the
detach of that site. 

In a 2 site configuration this particular scenario would result in both the dco
mirrors being lost simultaneously. While the site detach for the first mirror is
being processed we also signal for DETACH of the second mirror which ends up
DETACHING the second site too. 

This is not hit in other tests as we already have a check to make sure that we
do not DETACH the last mirror of a Volume. This check is being subverted in this
particular case due to the type of storage failure.

RESOLUTION:
Before triggering the site detach we need to have an explicit check to see if we
are trying to DETACH the last ACTIVE site.

* 2574840 (Tracking ID: 2344186)

SYMPTOM:
In a master-slave configuration with FMR3/DCO volumes, reboot of a cluster node 
fails to join back the cluster again with following error messages in the console

[..]
Jul XX 18:44:09 vienna vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-11092 
cleanup_client: (Volume recovery in progress) 230
Jul XX 18:44:09 vienna vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-11467 
kernel_fail_join() :                Reconfiguration interrupted: Reason is 
retry to add a node failed (13, 0)
[..]

DESCRIPTION:
VxVM volumes with FMR3/DCO have inbuilt DRL mechanism to track the disk block of 
in-flight IOs in order to recover the data much quicker in case of a node crash. 
Thus, a joining node awaits the variable, responsible for recovery, to get unset 
to join the cluster. However, due to a bug in FMR3/DCO code, this variable was set 
forever, thus leading to node join failure.

RESOLUTION:
Modified the FMR3/DCO code to appropriately set and unset this recovery variable.


INSTALLING THE PATCH
--------------------
(rhel5 x86_64)
# rpm -Uhv VRTSvxvm-5.1.132.100-SP1RP2P1_RHEL5.x86_64.rpm

(rhel6 x86_64)
# rpm -Uhv VRTSvxvm-5.1.132.100-SP1RP2P1_RHEL6.x86_64.rpm

(sles10 x86_64)
# rpm -Uhv VRTSvxvm-5.1.132.100-SP1RP2P1_SLES10.x86_64.rpm

(sles11 x86_64)
# rpm -Uhv VRTSvxvm-5.1.132.100-SP1RP2P1_SLES11.x86_64.rpm


REMOVING THE PATCH
------------------
# rpm -e  <rpm-name>


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE