* * * READ ME * * *
* * * Veritas Volume Manager 5.1 SP1 RP1 * * *
* * * P-patch 2 * * *
Patch Date: 2012-06-19

This document provides the following information:

* PATCH NAME
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH


PATCH NAME
----------
Veritas Volume Manager 5.1 SP1 RP1 P-patch 2


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* Veritas Storage Foundation for Oracle RAC 5.1 SP1
* Veritas Storage Foundation Cluster File System 5.1 SP1
* Veritas Storage Foundation 5.1 SP1
* Veritas Storage Foundation High Availability 5.1 SP1
* Veritas Dynamic Multi-Pathing 5.1 SP1


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
HP-UX 11i v3 (11.31)


INCIDENTS FIXED BY THE PATCH
----------------------------
This patch fixes the following Symantec incidents:

Patch ID: PHCO_42992, PHKL_42993

* 2280285 (Tracking ID: 2365486)

SYMPTOM:
In a two-node SFRAC configuration, when "vxdisk scandisks" is run after the ports are enabled, the system panics with the following stack:

PANIC STACK:
.unlock_enable_mem()
.unlock_enable_mem()
dmp_update_path()
dmp_decode_update_dmpnode()
dmp_decipher_instructions()
dmp_process_instruction_buffer()
dmp_reconfigure_db()
gendmpioctl()
vxdmpioctl()
rdevioctl()
spec_ioctl()
vnop_ioctl()
vno_ioctl()
common_ioctl()
ovlya_addr_sc_flih_main()

DESCRIPTION:
An improper order of lock acquisition and release during DMP reconfiguration, while I/O activity was running in parallel, led to the above panic.

RESOLUTION:
The locks are now released in the same order in which they are acquired.

* 2532440 (Tracking ID: 2495186)

SYMPTOM:
When the TCP protocol is used for replication, I/O throttling occurs due to memory flow control.

DESCRIPTION:
In some slow network configurations, the overall I/O throughput is throttled back because of the replication I/O.

RESOLUTION:
The replication I/O is kept outside the normal I/O code path to improve I/O throughput.

* 2563291 (Tracking ID: 2527289)

SYMPTOM:
In a Campus Cluster setup, a storage fault may lead to the DETACH of all the configured sites. This also results in I/O failure on all the nodes in the Campus Cluster.

DESCRIPTION:
On site-consistent disk groups, a site is detached when any volume in the disk group loses all the mirrors of that site. While processing the DETACH of the last mirror in a site, VxVM identifies that it is the last mirror and detaches the site, which in turn detaches all the objects of that site. In a Campus Cluster setup, a DCO volume is attached to every data volume created on a site-consistent disk group, and the usual configuration is one DCO mirror on each site. Loss of a single mirror of the DCO volume on any node results in the detach of that site. In a two-site configuration, this scenario can cause both DCO mirrors to be lost simultaneously: while the site detach for the first mirror is being processed, a DETACH is also signaled for the second mirror, which ends up detaching the second site as well. Other cases are protected by an existing check that prevents detaching the last mirror of a volume, but that check is subverted here by the type of storage failure.

RESOLUTION:
Before triggering a site detach, an explicit check is made to verify that the last ACTIVE site is not being detached.
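The following minimal C sketch illustrates the kind of guard the resolution describes: a detach is refused if it would remove the last ACTIVE site. The structure and function names are hypothetical and are not taken from the VxVM source.

#include <stddef.h>

enum site_state { SITE_ACTIVE, SITE_DETACHED };

struct site {
    enum site_state  state;
    struct site     *next;       /* next site in the disk group's site list */
};

/* Count the sites that are still ACTIVE in the disk group's site list. */
static int active_site_count(const struct site *sites)
{
    int n = 0;
    for (const struct site *s = sites; s != NULL; s = s->next)
        if (s->state == SITE_ACTIVE)
            n++;
    return n;
}

/* Refuse a detach that would leave no ACTIVE site in the disk group. */
static int site_detach(struct site *target, struct site *sites)
{
    if (target->state == SITE_ACTIVE && active_site_count(sites) <= 1)
        return -1;   /* explicit check: do not detach the last ACTIVE site */
    target->state = SITE_DETACHED;
    return 0;
}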
* 2621549 (Tracking ID: 2621465)

SYMPTOM:
When a failed disk that belongs to a site becomes accessible again, it cannot be reattached to the disk group.

DESCRIPTION:
Because the disk has a site tag name set, the 'vxdg adddisk' command invoked by the 'vxreattach' command needs the '-f' option to add the disk back to the disk group.

RESOLUTION:
The '-f' option is added to the 'vxdg adddisk' command when it is invoked by the 'vxreattach' command.

* 2626900 (Tracking ID: 2608849)

SYMPTOM:
1. Under a heavy I/O load on the logclient node, write I/Os on the VVR Primary logowner take a very long time to complete.
2. I/Os on the "master" and "slave" nodes hang when the "master" role is switched multiple times using the "vxclustadm setmaster" command.

DESCRIPTION:
1. VVR does not allow more than 2048 I/Os outstanding on the SRL volume. Any I/Os beyond this threshold are throttled. The throttled I/Os are restarted after every SRL header flush operation. While restarting the throttled I/Os, the I/Os coming from the logclient are given higher priority, causing the logowner I/Os to starve.
2. In the CVM reconfiguration code path, the RLINK ports are not cleanly deleted on the old logowner. This prevents the RLINKs from connecting, leading to both a replication and an I/O hang.

RESOLUTION:
The algorithm that restarts the throttled I/Os is modified to give both local and remote I/Os a fair chance to proceed. Additionally, the CVM reconfiguration code path is changed to delete the RLINK ports cleanly before switching the master role.

* 2626915 (Tracking ID: 2417546)

SYMPTOM:
Raw devices are lost after an OS reboot, and a permissions issue arises because the dmpnode permissions change from 660 to 600.

DESCRIPTION:
On reboot, while creating raw devices, the next available device number is generated. Due to a counting bug, VxVM ended up creating one device fewer than required. The change in dmpnode permissions also caused a permissions issue.

RESOLUTION:
This issue is addressed by source changes that keep the correct counters and set the device permissions appropriately.

* 2626920 (Tracking ID: 2061082)

SYMPTOM:
The "vxddladm -c assign names" command does not work if the dmp_native_support tunable is enabled.

DESCRIPTION:
If the dmp_native_support tunable is set to "on", VxVM does not allow the names of dmpnodes to be changed. This holds true even for devices on which native support is not enabled, such as VxVM-labeled or Third Party Devices. So there is no way to selectively change the names of devices for which native support is not enabled.

RESOLUTION:
This enhancement is addressed by a code change that allows names to be changed selectively for devices on which native support is not enabled.

* 2636094 (Tracking ID: 2635476)

SYMPTOM:
The DMP (Dynamic Multi-Pathing) driver does not automatically enable the failed paths of Logical Units (LUNs) that are restored.

DESCRIPTION:
DMP's restore daemon probes each failed path at a default interval of 5 minutes (tunable) to detect whether that path can be enabled. As part of enabling the path, DMP issues an open() on the path's device number. Owing to a bug in the DMP code, the open() was issued on a wrong device partition, which resulted in failure for every probe. Thus, the path remained in failed status at the DMP layer even though it had been enabled at the array side.

RESOLUTION:
The DMP restore daemon code path is modified to issue the open() on the appropriate device partition.
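For illustration, a user-space analogue of such a probe is sketched below. It is not the in-kernel DMP code; the device node shown is an example of an HP-UX agile (persistent) DSF.

#include <fcntl.h>
#include <unistd.h>

/* Return 1 if the device node opens successfully (path looks healthy),
 * else 0. O_NDELAY avoids blocking on devices that are not ready. */
static int probe_path(const char *devnode)
{
    int fd = open(devnode, O_RDONLY | O_NDELAY);
    if (fd < 0)
        return 0;    /* probe failed; leave the path marked FAILED */
    close(fd);
    return 1;        /* probe succeeded; the path can be re-enabled */
}

/* Example usage: probe the correct device node, as per the fix above. */
/* probe_path("/dev/rdisk/disk4"); */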
* 2643651 (Tracking ID: 2643634)

SYMPTOM:
If standard (non-clone) disks and cloned disks of the same disk group are seen on a host, and the standard (non-clone) disks have no enabled configuration copy of the disk group, the disk group import fails with the following error message:

# vxdg import 
VxVM vxdg ERROR V-5-1-10978 Disk group : import failed:
Disk group has no valid configuration copies

DESCRIPTION:
When VxVM imports such a mixed configuration of standard (non-clone) disks and cloned disks, the standard (non-clone) disks are selected as the members of the disk group in 5.0MP3RP5HF1 and 5.1SP1RP2. This happens without administrators being aware that a mixed configuration exists and that the standard (non-clone) disks are the ones selected for the import. The issue is hard to diagnose from the error message alone and takes time to investigate.

RESOLUTION:
Syslog message enhancements are made so that administrators can determine whether such a mixed configuration is seen on a host, and which disks are selected for the import.

* 2666175 (Tracking ID: 2666163)

SYMPTOM:
A small memory leak may be seen in vxconfigd, the VxVM configuration daemon, when a Serial Split Brain (SSB) error is detected during the import process.

DESCRIPTION:
The leak may occur when an SSB error is detected during the import process, because when the SSB error is returned from a function, a dynamically allocated memory area in that function is not freed. SSB detection is a VxVM feature whereby VxVM detects whether the configuration copy in the disk private region has unexpectedly become stale. A typical cause of the SSB error is a disk group imported on different systems at the same time, where configuration copy updates on both systems result in inconsistent copies. VxVM cannot identify which configuration copy is the most up-to-date in this situation. As a result, VxVM may detect an SSB error on the next import and show the details through a CLI message.

RESOLUTION:
Code changes are made to avoid the memory leak, and a small message fix has been made.

* 2695225 (Tracking ID: 2675538)

SYMPTOM:
Data corruption can be observed on a CDS (Cross-platform Data Sharing) disk as part of LUN resize operations. The following pattern is found in the data region of the disk:

cyl alt 2 hd sec

DESCRIPTION:
A CDS disk maintains a SUN VTOC in the zeroth block and a backup label at the end of the disk. The VTOC maintains the disk geometry information, such as the number of cylinders, tracks, and sectors per track. The backup label is a duplicate of the VTOC, and the backup label location is determined from the VTOC contents. As part of a resize, the VTOC is not updated to the new size, which results in a wrong calculation of the backup label location. If the wrongly calculated backup label location falls in the public data region rather than at the end of the disk as designed, data corruption occurs.

RESOLUTION:
The VTOC contents are updated appropriately during LUN resize operations to prevent the data corruption.
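The arithmetic behind this corruption can be shown with a small, self-contained example. The geometry numbers and structure below are illustrative assumptions, not the actual VTOC layout: with a stale cylinder count, the computed backup label location lands inside the enlarged public region instead of at the end of the disk.

#include <stdio.h>

struct geom { unsigned ncyl, nhead, nsect; };  /* cylinders/tracks/sectors */

/* The backup label location is derived from the disk size implied by the
 * VTOC geometry: ncyl * nhead * nsect sectors. */
static unsigned long long backup_label_sector(const struct geom *g)
{
    return (unsigned long long)g->ncyl * g->nhead * g->nsect - 1;
}

int main(void)
{
    struct geom stale   = { 1024, 16, 128 };  /* geometry before the resize */
    struct geom current = { 4096, 16, 128 };  /* actual size after the resize */

    /* With the stale VTOC the label lands at sector 2097151, well inside
     * the enlarged 8388608-sector disk, so writing it clobbers user data. */
    printf("stale label location: %llu\n", backup_label_sector(&stale));
    printf("actual disk end:      %llu\n", backup_label_sector(&current));
    return 0;
}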
* 2695227 (Tracking ID: 2674465)

SYMPTOM:
Data corruption is observed when DMP node names are changed by the following commands for DMP devices that are controlled by a third-party multi-pathing driver (e.g. MPxIO and PowerPath):

# vxddladm [-c] assign names
# vxddladm assign names file=
# vxddladm set namingscheme=

DESCRIPTION:
The above commands, when executed, re-assign names to the devices. Accordingly, the in-core DMP database should be updated for each device to map the new device name to the appropriate device number. Due to a bug in the code, the mapping of names to device numbers was not done appropriately, which resulted in subsequent I/Os going to a wrong device, thus leading to data corruption.

RESOLUTION:
The DMP routines responsible for mapping the names to the right device numbers are modified to fix this corruption problem.

* 2695228 (Tracking ID: 2688747)

SYMPTOM:
Under a heavy I/O load on the logclient node, writes on the VVR Primary logowner take a very long time to complete. The writes appear to be hung.

DESCRIPTION:
VVR does not allow more than a specific number of I/Os (4096) outstanding on the SRL volume. Any I/Os beyond this threshold are throttled. The throttled I/Os are restarted periodically. While restarting, the I/Os belonging to the logclient get higher preference than the logowner I/Os, which can eventually lead to starvation or an I/O hang on the logowner.

RESOLUTION:
The I/O scheduling algorithm for restarted I/Os is changed to make sure that throttled local I/Os get a chance to proceed under all conditions.

* 2701152 (Tracking ID: 2700486)

SYMPTOM:
If the VVR Primary and Secondary nodes have the same host name and there is a loss of heartbeats between them, the vradmind daemon can dump core if an active stats session already exists on the Primary node. The following stack trace is observed:

pthread_kill()
_p_raise()
raise.raise()
abort()
__assert_c99
StatsSession::sessionInitReq()
StatsSession::processOpReq()
StatsSession::processOpMsgs()
RDS::processStatsOpMsg()
DBMgr::processStatsOpMsg()
process_message()
main()

DESCRIPTION:
On a loss of heartbeats between the Primary and Secondary nodes, and a subsequent reconnect, RVG information is sent to the Primary by the Secondary node. In this case, if a stats session already exists on the Primary, a STATS_SESSION_INIT request is sent back to the Secondary. However, the code used the "hostname" (as returned by `uname -a`) to identify the Secondary node. Since both nodes had the same host name, the resulting STATS_SESSION_INIT request was received at the Primary itself, causing vradmind to dump core.

RESOLUTION:
The code is modified to use the 'virtual host-name' information contained in the RLINKs, rather than hostname(1m), to identify the Secondary node. In a scenario where both the Primary and Secondary have the same host name, virtual host names are used to configure VVR.

* 2702110 (Tracking ID: 2700792)

SYMPTOM:
vxconfigd, the VxVM volume configuration daemon, may dump core with the following stack during Cluster Volume Manager (CVM) startup with "hares -online cvm_clus -sys [node]":

dg_import_finish()
dg_auto_import_all()
master_init()
role_assume()
vold_set_new_role()
kernel_get_cvminfo()
cluster_check()
vold_check_signal()
request_loop()
main()

DESCRIPTION:
During CVM startup, vxconfigd accesses the disk group record's pointer of a pending record while the transaction on the disk group is in progress. At times, vxconfigd incorrectly accesses the stale pointer while processing the current transaction, resulting in a core dump.

RESOLUTION:
Code changes are made to access the appropriate pointer of the disk group record that is active in the current transaction. Also, the disk group record pointer is appropriately initialized to NULL.
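The sketch below shows the defensive pattern of that resolution in hypothetical form: the pending record pointer is validated against the current transaction before use and reset to NULL afterwards. The types and names are illustrative, not vxconfigd internals.

#include <stddef.h>

struct dg_record {
    int active;    /* nonzero while part of the current transaction */
};

struct transaction {
    struct dg_record *pending;   /* may go stale between transactions */
};

/* Only hand out a record that is active in the current transaction;
 * a stale pending pointer is treated as "no record". */
static struct dg_record *current_dg_record(struct transaction *tx)
{
    if (tx->pending != NULL && tx->pending->active)
        return tx->pending;
    return NULL;
}

/* Re-initialize the pointer when the transaction completes, so later
 * transactions can never dereference a stale record. */
static void transaction_done(struct transaction *tx)
{
    tx->pending = NULL;
}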
* 2703370 (Tracking ID: 2700086)

SYMPTOM:
In the presence of "Not-Ready" EMC devices on the system, multiple DMP (path disabled/enabled) event messages are seen in the syslog.

DESCRIPTION:
vxconfigd enables BCV devices that are in the Not-Ready state for I/O because the SCSI inquiry succeeds, but it soon finds that they cannot be used for I/O and disables those paths. This activity takes place whenever the "vxdctl enable" or "vxdisk scandisks" command is executed.

RESOLUTION:
The state of a BCV device that is in the "Not-Ready" state is no longer changed, which prevents the I/O attempts and DMP event messages.

* 2703373 (Tracking ID: 2698860)

SYMPTOM:
Mirroring a large VxVM volume that is configured on THIN LUNs and has a mounted VxFS file system on top fails with the following error:

# vxassist -b -g $disk_group_name mirror $volume_name
VxVM vxplex ERROR V-5-1-14671 Volume is configured on THIN luns and not mounted.
Use 'force' option, to bypass smartmove. To take advantage of smartmove for
supporting thin luns, retry this operation after mounting the volume.
VxVM vxplex ERROR V-5-1-407 Attempting to cleanup after failure ...

Truss output error:
statvfs("", 0xFFBFEB54) Err#79 EOVERFLOW

DESCRIPTION:
The statvfs system call is invoked internally during the mirroring operation to retrieve statistics for the VxFS file system hosted on the volume. However, the statvfs system call supports a maximum of 4294967295 (2^32-1) blocks, so if the file system has more blocks than that, an EOVERFLOW error occurs. This also causes vxplex to terminate with the above errors.

RESOLUTION:
The 64-bit version of statvfs, the statvfs64 system call, is used to resolve the EOVERFLOW and vxplex errors, as sketched below.
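The fix amounts to switching to the 64-bit interface. A minimal sketch follows, assuming an example mount point of /mnt/vxfs_vol; it shows the statvfs64() call that avoids the EOVERFLOW on very large file systems.

#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
    struct statvfs64 sv;

    /* statvfs() stores block counts in 32-bit fields and fails with
     * EOVERFLOW beyond 2^32-1 blocks; statvfs64() uses 64-bit fields. */
    if (statvfs64("/mnt/vxfs_vol", &sv) != 0) {   /* example mount point */
        perror("statvfs64");
        return 1;
    }
    printf("file system blocks: %llu\n", (unsigned long long)sv.f_blocks);
    return 0;
}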
* 2711758 (Tracking ID: 2710579)

SYMPTOM:
Data corruption can be observed on a CDS (Cross-platform Data Sharing) disk as part of operations such as LUN resize, disk FLUSH, and disk ONLINE. The following pattern is found in the data region of the disk:

cyl alt 2 hd sec

DESCRIPTION:
A CDS disk maintains a SUN VTOC in the zeroth block and a backup label at the end of the disk. The VTOC maintains the disk geometry information, such as the number of cylinders, tracks, and sectors per track. The backup label is a duplicate of the VTOC, and the backup label location is determined from the VTOC contents. If the contents of the SUN VTOC located in the zeroth sector are incorrect, this may result in a wrong calculation of the backup label location. If the wrongly calculated backup label location falls in the public data region rather than at the end of the disk as designed, data corruption occurs.

RESOLUTION:
Writing the backup label is suppressed to prevent the data corruption.

* 2713862 (Tracking ID: 2390998)

SYMPTOM:
When the 'vxdctl' or 'vxdisk scandisks' command is run after migrating SAN ports, the system panics with the following stack trace:

.disable_lock()
dmp_close_path()
dmp_do_cleanup()
dmp_decipher_instructions()
dmp_process_instruction_buffer()
dmp_reconfigure_db()
gendmpioctl()
vxdmpioctl()

DESCRIPTION:
SAN port migration ends up with two path nodes for the same device number, with one node marked NODE_DEVT_USED, which means the same device number has been reused by another node. When the DMP device is opened, the actual open count is modified on the new node (the one not marked NODE_DEVT_USED). If the caller is referencing the old node (marked NODE_DEVT_USED), it then modifies the layered open count on that old node. This results in inconsistent open reference counts for the nodes and causes a panic when the open counts are checked while closing the DMP device.

RESOLUTION:
The code is changed so that the actual open count and the layered open count are modified on the same node when a DMP device is opened or closed.

* 2741105 (Tracking ID: 2722850)

SYMPTOM:
Disabling/enabling controllers while I/O is in progress results in a DMP (Dynamic Multi-Pathing) thread hang with the following stack:

dmp_handle_delay_open
gen_dmpnode_update_cur_pri
dmp_start_failover
gen_update_cur_pri
dmp_update_cur_pri
dmp_process_curpri
dmp_daemons_loop

DESCRIPTION:
DMP takes an exclusive lock to quiesce a node to be failed over, and releases the lock to perform the update operations. These update operations presume that the node remains in quiesced status. A small timing window exists between the lock release and the update operations, in which other threads can break in and unquiesce the node, leading to a hang during the update operations.

RESOLUTION:
The quiesce counter of a node is corrected so that other threads cannot unquiesce it while a thread is performing the update operations.

* 2744219 (Tracking ID: 2729501)

SYMPTOM:
In a Dynamic Multi-Pathing environment, excluding a path also excludes other paths with matching substrings.

DESCRIPTION:
Excluding a path using "vxdmpadm exclude vxvm path=<>" excludes all the paths with a matching substring. This is caused by the use of strncmp() for the comparison. Also, the size of the h/w path defined in the structure is larger than what is actually fetched.

RESOLUTION:
The size of the h/w path in the structure is corrected, and strcmp() is used for the comparison in place of strncmp(). The sketch below demonstrates the difference.
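The pitfall is easy to reproduce in isolation. In the sketch below the path strings are invented examples: strncmp() bounded by the excluded path's length matches any longer path sharing that prefix, while strcmp() matches only the exact path.

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *excluded = "0/0/1/0.1";   /* path the user asked to exclude */
    const char *other    = "0/0/1/0.10";  /* different path, same prefix */

    /* Buggy comparison: only the first strlen(excluded) characters are
     * compared, so "0/0/1/0.10" wrongly matches "0/0/1/0.1". */
    printf("strncmp: %s\n",
           strncmp(excluded, other, strlen(excluded)) == 0 ? "match"
                                                           : "no match");

    /* Fixed comparison: full-string equality, as in the resolution. */
    printf("strcmp:  %s\n",
           strcmp(excluded, other) == 0 ? "match" : "no match");
    return 0;
}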
* 2750454 (Tracking ID: 2423701)

SYMPTOM:
Upgrading VxVM changed the permissions of /etc/vx/vxesd from drwx------ to d---r-x--- during a live upgrade.

DESCRIPTION:
The '/etc/vx/vxesd' directory is shipped in VxVM with "drwx------" permissions. However, while the vxesd daemon is starting, if this directory is not present, it gets created with "d---r-x---" permissions.

RESOLUTION:
Changes are made so that when the vxesd daemon starts, '/etc/vx/vxesd' is created with 'drwx------' permissions.

* 2752178 (Tracking ID: 2741240)

SYMPTOM:
In a VxVM environment, "vxdg join", when executed during a heavy I/O load, fails with the message below:

VxVM vxdg ERROR V-5-1-4597 vxdg join [source_dg] [target_dg] failed
join failed : Commit aborted, restart transaction
join failed : Commit aborted, restart transaction

Half of the disks that were part of the source_dg become part of the target_dg, whereas the other half have no DG details.

DESCRIPTION:
VxVM implements "vxdg join" as a two-phase transaction. If the transaction fails after the first phase, during the second phase, half of the disks belonging to the source_dg become part of the target_dg and the other half of the disks are left in a complex, irrecoverable state. Also, under a heavy I/O load, the transaction retry limit can easily be exceeded.

RESOLUTION:
"vxdg join" is now designed as a one-phase atomic transaction, and the retry limit is eliminated.

* 2774907 (Tracking ID: 2771452)

SYMPTOM:
On a lossy, high-latency network, I/O hangs on the VVR Primary. Just before the I/O hang, the RLINK frequently connects and disconnects.

DESCRIPTION:
On a lossy, high-latency network, heartbeat timeouts cause the RLINK to be disconnected. As part of the RLINK disconnect, the communication port is deleted. During this process, the RVG is serialized and the I/Os are kept in a special queue, rv_restartq. The I/Os in rv_restartq are supposed to be restarted once the port deletion is successful. The port deletion involves the termination of all the communication server processes. Because of a bug in the port deletion logic, the global variable that keeps track of the number of communication server processes was decremented twice. This caused the port deletion process to hang, so the I/Os in rv_restartq were never restarted.

RESOLUTION:
The port deletion logic is fixed to make sure that the global variable that keeps track of the number of communication server processes is decremented correctly.


INSTALLING THE PATCH
--------------------
# swinstall -x autoreboot=true

After installing the patches, run swverify to make sure that the patches are installed correctly:

# swverify


REMOVING THE PATCH
------------------
To remove the patch, enter the following command:

# swremove -x autoreboot=true


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE