vm-sol_sparc-5.1SP1RP2P3
Obsolete
The latest patch(es): sfha-sol_sparc-5.1SP1RP4

 Basic information
Release type: P-patch
Release date: 2012-06-13
OS update support: None
Technote: None
Documentation: None
Popularity: 2868 viewed
Download size: 44.84 MB
Checksum: 3571820347

 Applies to one or more of the following products:
VirtualStore 5.1SP1 On Solaris 10 SPARC
VirtualStore 5.1SP1 On Solaris 9 SPARC
Dynamic Multi-Pathing 5.1SP1 On Solaris 10 SPARC
Dynamic Multi-Pathing 5.1SP1 On Solaris 9 SPARC
Storage Foundation 5.1SP1 On Solaris 10 SPARC
Storage Foundation 5.1SP1 On Solaris 9 SPARC
Storage Foundation Cluster File System 5.1SP1 On Solaris 10 SPARC
Storage Foundation Cluster File System 5.1SP1 On Solaris 9 SPARC
Storage Foundation for Oracle RAC 5.1SP1 On Solaris 10 SPARC
Storage Foundation for Oracle RAC 5.1SP1 On Solaris 9 SPARC
Storage Foundation HA 5.1SP1 On Solaris 10 SPARC
Storage Foundation HA 5.1SP1 On Solaris 9 SPARC

 Obsolete patches, incompatibilities, superseded patches, or other requirements:

This patch is obsolete. It is superseded by: Release date
sfha-sol_sparc-5.1SP1RP4 2013-08-21
vm-sol_sparc-5.1SP1RP3P2 (obsolete) 2013-03-15
sfha-sol_sparc-5.1SP1RP3 (obsolete) 2012-10-02

This patch supersedes the following patches: Release date
vm-sol_sparc-5.1SP1RP2P2 (obsolete) 2011-11-03
vm-sol_sparc-5.1SP1RP2P1 (obsolete) 2011-10-19
vm-sol_sparc-5.1SP1RP1P2 (obsolete) 2011-06-07
vm-sol_sparc-5.1SP1RP1P1 (obsolete) 2011-03-02
vm-sol_sparc-5.1SP1P2 (obsolete) 2010-12-07

This patch requires: Release date
sfha-sol_sparc-5.1SP1RP2 (obsolete) 2011-09-28

 Fixes the following incidents:
2280285, 2405446, 2440015, 2477272, 2497637, 2497796, 2507120, 2507124, 2508294, 2508418, 2511928, 2515137, 2525333, 2531983, 2531987, 2531993, 2532440, 2552402, 2553391, 2562911, 2563291, 2574840, 2583307, 2589679, 2603605, 2612969, 2621549, 2626900, 2626911, 2626920, 2633041, 2636094, 2643651, 2651421, 2666175, 2676703, 2695225, 2695227, 2695228, 2701152, 2702110, 2703370, 2703373, 2706036, 2711758, 2713862, 2741105, 2744219, 2750453, 2750454, 2750458, 2750462, 2752178, 2774907

 Patch ID:
142629-15

Readme file
                          * * * READ ME * * *
             * * * Veritas Volume Manager 5.1 SP1 RP2 * * *
                         * * * P-patch 3 * * *
                         Patch Date: 2012-06-13


This document provides the following information:

   * PATCH NAME
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas Volume Manager 5.1 SP1 RP2 P-patch 3


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas Storage Foundation for Oracle RAC 5.1 SP1
   * Veritas Storage Foundation Cluster File System 5.1 SP1
   * Veritas Storage Foundation 5.1 SP1
   * Veritas Storage Foundation High Availability 5.1 SP1
   * Veritas Dynamic Multi-Pathing 5.1 SP1
   * Symantec VirtualStore 5.1 SP1


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
Solaris 9 SPARC
Solaris 10 SPARC


INCIDENTS FIXED BY THE PATCH
----------------------------
This patch fixes the following Symantec incidents:

Patch ID: 142629-15

* 2280285 (Tracking ID: 2365486)

SYMPTOM:
In a two-node SFRAC configuration, when "vxdisk scandisks" is run after
enabling the ports, the system panics with the following stack:

PANIC STACK:

.unlock_enable_mem()
.unlock_enable_mem()
dmp_update_path()
dmp_decode_update_dmpnode()
dmp_decipher_instructions()
dmp_process_instruction_buffer()
dmp_reconfigure_db()
gendmpioctl()
vxdmpioctl()
rdevioctl()
spec_ioctl()
vnop_ioctl()
vno_ioctl()
common_ioctl()
ovlya_addr_sc_flih_main()

DESCRIPTION:
An improper order of acquiring and releasing locks during DMP reconfiguration,
while I/O activity was running in parallel, led to the above panic.

RESOLUTION:
The locks are now released in the same order in which they are acquired.

* 2405446 (Tracking ID: 2253970)

SYMPTOM:
Enhancement to customize private region I/O size based on maximum transfer size 
of underlying disk.

DESCRIPTION:
There are different types of array controllers which support data transfer
sizes starting from 256K and beyond. The VxVM tunable volmax_specialio controls
vxconfigd's configuration I/O size as well as the Atomic Copy I/O size. When
volmax_specialio is tuned to a value greater than 1MB to leverage the maximum
transfer sizes of the underlying disks, the import operation fails for disks
which cannot accept I/Os larger than 256K. If the tunable is set to 256K, the
larger transfer sizes of other disks are not leveraged.

RESOLUTION:
This enhancement leverages large disk transfer sizes as well as supports Array 
controllers with 256K transfer sizes.

* 2532440 (Tracking ID: 2495186)

SYMPTOM:
With TCP protocol used for replication, I/O throttling happens due to
memory flow control.

DESCRIPTION:
In some slow network configurations, the I/O throughput is throttled back due
to the replication I/O.

RESOLUTION:
The replication I/O is kept outside the normal I/O code path to improve its I/O
throughput performance.

* 2563291 (Tracking ID: 2527289)

SYMPTOM:
In a Campus Cluster setup, a storage fault may lead to the DETACH of all the
configured sites. This also results in I/O failure on all the nodes in the
Campus Cluster.

DESCRIPTION:
Site detaches are done on site-consistent disk groups when any volume in the
disk group loses all the mirrors of a site. While processing the DETACH of the
last mirror in a site, we identify that it is the last mirror and DETACH the
site, which in turn detaches all the objects of that site.

In a Campus Cluster setup, a DCO volume is attached to any data volume created
on a site-consistent disk group. The general configuration is to have one DCO
mirror on each site. Loss of a single mirror of the DCO volume on any node will
result in the detach of that site.

In a two-site configuration, this particular scenario results in both DCO
mirrors being lost simultaneously. While the site detach for the first mirror
is being processed, we also signal for the DETACH of the second mirror, which
ends up DETACHING the second site too.

This is not hit in other tests because there is already a check to make sure
that the last mirror of a volume is not DETACHED. This check is subverted in
this particular case due to the type of storage failure.

RESOLUTION:
Before triggering the site detach, an explicit check is made to see whether the
last ACTIVE site is about to be DETACHED.

* 2589679 (Tracking ID: 2589569)

SYMPTOM:
The vxdisksetup command takes a long time (approximately 2-4 minutes) to
initialize a sliced disk on an A/P array.

DESCRIPTION:
In VxVM (Veritas Volume Manager), the DKIOCGVTOC/DKIOCGGEOM IOCTL(s) are used
to detect an EFI disk. If these IOCTL(s) return the error ENOTSUP, the disk is
considered to have an EFI label. When the ENOTSUP error is returned from the
primary path, the DMP driver retries the IOCTL(s) on the secondary path, which
consumes more time.

RESOLUTION:
The IOCTL service routine is modified to restrict the DMP driver from retrying
the IOCTL(s) on the secondary path.

* 2603605 (Tracking ID: 2419948)

SYMPTOM:
A race between the SRL flush due to SRL overflow and the kernel logging code
leads to a panic.

DESCRIPTION:
When the RLINK is disconnected, the RLINK state is moved to HALT. The Primary
RVG SRL overflows because there is no replication, which initiates DCM logging.

This changes the state of the RLINK to DCM (since the RLINK is already
disconnected, the final state remains HALT).
During the SRL overflow, if the RLINK connection is restored, it goes through
many state changes before the connection completes.

If the SRL overflow and kernel logging code finishes in between the above state
transitions and does not find the RLINK in VOLRP_PHASE_HALT, the system panics.

RESOLUTION:
Consider the above state change as valid, and make sure the SRL overflow code
does not always expect the HALT state. Take action for the other states, or
wait for the full state transition of the RLINK connection to complete.

* 2612969 (Tracking ID: 2612960)

SYMPTOM:
Onlining a disk with GPT (GUID Partition Table) and VxVM aixdisk layout may 
result in vxconfigd dumping core and printing the following message:

Assertion failed: (0), file <file-name>, line <line-no>.

DESCRIPTION:
This problem occurs only on disks with the VxVM aixdisk layout which had a GPT
layout prior to being initialized with the VxVM aixdisk layout. The existence
of the GPT label on a disk with the VxVM aixdisk layout results in VxVM being
unable to discover the disk layout properly.

RESOLUTION:
While discovering the layout of the disk, VxVM first checks whether the disk
has the VxVM aixdisk layout. VxVM now clears the GPT label on disks which have
the VxVM aixdisk layout.

* 2621549 (Tracking ID: 2621465)

SYMPTOM:
When a failed disk that belongs to a site becomes accessible again, it cannot
be reattached to the disk group.

DESCRIPTION:
Because the disk has a site tag name set, the 'vxdg adddisk' command invoked by
the 'vxreattach' command needs the '-f' option to add the disk back to the disk
group.

RESOLUTION:
The '-f' option is added to the 'vxdg adddisk' command when it is invoked by
the 'vxreattach' command.
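
The following is a hedged illustration of the manual workaround implied by the
description (placeholder names in angle brackets; verify the disk media and
disk access names with 'vxdisk list' first): the site-tagged disk is forcibly
re-added to its disk group under its original disk media name.

        # vxdisk -o alldgs list
        # vxdg -g <dgname> -k -f adddisk <dm_name>=<da_name>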

* 2626900 (Tracking ID: 2608849)

SYMPTOM:
1. Under a heavy I/O load on the logclient node, write I/Os on the VVR Primary
logowner take a very long time to complete.

2. I/Os on the "master" and "slave" nodes hang when the "master" role is
switched multiple times using the "vxclustadm setmaster" command.

DESCRIPTION:
1.
VVR does not allow more than 2048 I/Os outstanding on the SRL volume. Any I/Os
beyond this threshold are throttled. The throttled I/Os are restarted after
every SRL header flush operation. While restarting the throttled I/Os, I/Os
coming from the logclient are given higher priority, causing the logowner I/Os
to starve.

2.
In the CVM reconfiguration code path, the RLINK ports are not cleanly deleted
on the old logowner. This causes the RLINKs not to connect, leading to both a
replication and an I/O hang.

RESOLUTION:
The algorithm which restarts the throttled I/Os is modified to give a fair
chance to both local and remote I/Os to proceed.
Additionally, code changes are made in the CVM reconfiguration code path to
delete the RLINK ports cleanly before switching the master role.

* 2626911 (Tracking ID: 2605444)

SYMPTOM:
Running "vxdmpadm disable/enable" on the primary path (EFI labelled) of an A/PF
array results in all paths getting disabled.

DESCRIPTION:
Enabling an EFI-labeled primary path disables the secondary path. When the
primary path is disabled, a failover occurs onto the secondary path. The name
of the secondary path undergoes a change, dropping the slice s2 from the name
(cxtxdxs2 becomes cxtxdx). The change in the name is not updated in the device
property list. This failure to update the list causes the secondary path to be
disabled when the primary path is enabled.

RESOLUTION:
The code path which changes the name of the secondary path is rectified to
update the property list.

* 2626920 (Tracking ID: 2061082)

SYMPTOM:
"vxddladm -c assign names" command does not work if dmp_native_support 
tunable is enabled.

DESCRIPTION:
If dmp_native_support tunable is set to "on" then VxVM does not allow change in
name of dmpnodes. This holds true even for device with native support not
enabled like VxVM labeled or Third Party Devices. So there is  no way for
selectively changing name of devices for which native support is not enabled.

RESOLUTION:
This enhancement is addressed by code change to selectively change name for
devices with native support not enabled.
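
As a hedged illustration using only commands already referenced in this
document, the state of the tunable can be checked before reassigning names:

        # vxdmpadm gettune dmp_native_support
        # vxddladm -c assign names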

* 2633041 (Tracking ID: 2509291)

SYMPTOM:
"vxconfigd" daemon hangs if disable/enable of host side fc switch ports is
exercised for several iterations and consequently, VxVM related commands 
don't return.

schedule
dmp_biowait
dmp_indirect_io
gendmpioctl
dmpioctl
dmp_ioctl
dmp_compat_ioctl
compat_blkdev_ioctl
compat_sys_ioctl
sysenter_do_call

DESCRIPTION:
When the fail-over thread corresponding to the LUN is scheduled, it goes ahead
and frees the memory allocated for the fail-over request and returns from the
Array Policy Module fail-over function call. When the thread is scheduled
again, it still points to the same fail-over request that was freed above. When
it tries to get the next value, a NULL value is returned. The fail-over thread
waiting for other LUNs never gets invoked, which results in the vxconfigd
daemon hang.

RESOLUTION:
Code changes have been made to the Array Policy Module to save the fail-over
request pointer after marking the request state field as fail-over has completed
successfully.

* 2636094 (Tracking ID: 2635476)

SYMPTOM:
DMP (Dynamic Multi Pathing) driver does not automatically enable the failed 
paths of Logical Units (LUNs) that are restored.

DESCRIPTION:
DMP's restore daemon probes each failed path at a default interval of 5
minutes (tunable) to detect whether that path can be enabled. As part of
enabling the path, DMP issues an open() on the path's device number. Owing to a
bug in the DMP code, the open() was issued on a wrong device partition, which
resulted in failure for every probe. Thus, the path remained in failed status
at the DMP layer even though it was enabled at the array side.

RESOLUTION:
Modified the DMP restore daemon code path to issue the open() on the appropriate
device partitions.
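
As a hedged illustration (the tunable name is assumed from the standard DMP
restore daemon tunables of this release and may differ on older releases;
verify with the administrator's guide), the restore daemon probe interval can
be inspected and adjusted:

        # vxdmpadm gettune dmp_restore_interval
        # vxdmpadm settune dmp_restore_interval=300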

* 2643651 (Tracking ID: 2643634)

SYMPTOM:
If standard (non-clone) disks and cloned disks of the same disk group are seen
on a host, the disk group import fails with the following error message when
the standard (non-clone) disks have no enabled configuration copy of the disk
group.

# vxdg import <dgname>
VxVM vxdg ERROR V-5-1-10978 Disk group <dgname>: import failed:
Disk group has no valid configuration copies

DESCRIPTION:
When VxVM imports such a mixed configuration of standard (non-clone) disks and
cloned disks, the standard (non-clone) disks are selected as the members of the
disk group in 5.0MP3RP5HF1 and 5.1SP1RP2. This is done without the
administrators being aware that there is a mixed configuration and that the
standard (non-clone) disks will be selected for the import. It is hard to
figure this out from the error message, and it takes time to investigate what
the issue is.

RESOLUTION:
Syslog message enhancements are made in the code so that administrators can
figure out whether such a mixed configuration is seen on a host, and also which
disks are selected for the import.
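
As a hedged illustration (the useclonedev and updateid import options are
assumed from standard VxVM clone-disk handling; placeholders in angle
brackets), the clone status of the disks can be reviewed before deciding how to
perform the import:

        # vxdisk -e list
        # vxdg -o useclonedev=on -o updateid import <dgname>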

* 2651421 (Tracking ID: 2649846)

SYMPTOM:
In a Sun Cluster environment with VxVM (Veritas Volume Manager) and EMC Power
Path (a Third Party Multi-Pathing Driver), "cldg create -t vxvm ...", which is
a Sun Cluster command, dumps core while creating a VxVM type disk group.

The Sun Cluster command is:
cldg create -t vxvm -n <hostname> -v <diskgroup name>
And the error message is:
umem allocator: redzone violation: write past end of buffer

DESCRIPTION:
The "cldg create -t vxvm -v <disk group> ..." Sun Cluster command needs VxVM's
assistance to get the sub-paths of the devices included in the disk group.
However, VxVM does not allocate enough memory to hold the device names. For
this reason, Sun Cluster reports the error message above.

RESOLUTION:
VxVM library has been modified to allocate adequate memory space to hold the 
device name.

* 2666175 (Tracking ID: 2666163)

SYMPTOM:
A small memory leak may be seen in vxconfigd, the VxVM configuration daemon,
when a Serial Split Brain (SSB) error is detected in the import process.

DESCRIPTION:
The leak may occur when a Serial Split Brain (SSB) error is detected in the
import process. This is because when the SSB error is returned from a function,
a dynamically allocated memory area in the same function is not freed. SSB
detection is a VxVM feature where VxVM detects whether the configuration copy
in the disk private region has become stale unexpectedly. A typical use case of
the SSB error is that a disk group is imported on different systems at the same
time and a configuration copy update on both systems results in an
inconsistency between the copies. VxVM cannot identify which configuration copy
is most up-to-date in this situation. As a result, VxVM may detect the SSB
error on the next import and show the details through a CLI message.

RESOLUTION:
Code changes are made to avoid the memory leak and also a small message fix has
been done.

* 2676703 (Tracking ID: 2553729)

SYMPTOM:
The following is observed during an 'Upgrade' of VxVM (Veritas Volume Manager):

i) The 'clone_disk' flag is seen on non-clone disks in the STATUS field when
'vxdisk -e list' is executed after an upgrade to 5.1SP1 from lower versions of
VxVM.


Eg:

DEVICE       TYPE           DISK        GROUP        STATUS
emc0_0054    auto:cdsdisk   emc0_0054    50MP3dg     online clone_disk
emc0_0055    auto:cdsdisk   emc0_0055    50MP3dg     online clone_disk

ii) Disk groups (dg) whose versions are less than 140 do not get imported after
an upgrade to VxVM versions 5.0MP3RP5HF1 or 5.1SP1RP2.

Eg:

# vxdg -C import <dgname>
VxVM vxdg ERROR V-5-1-10978 Disk group <dgname>: import failed:
Disk group version doesn't support feature; see the vxdg upgrade command

DESCRIPTION:
While upgrading VxVM:

i) After an upgrade to 5.1SP1 or higher versions:
If a dg which was created on lower versions is deported and imported back on
5.1SP1 after the upgrade, the "clone_disk" flag gets set on non-cloned disks
because of a design change in the UDID (unique disk identifier) of the disks.

ii) After an upgrade to 5.0MP3RP5HF1 or 5.1SP1RP2:
Import of dgs with versions less than 140 fails.

RESOLUTION:
Code changes are made to ensure that:
i) clone_disk flag does not get set for non-clone disks after the upgrade.
ii) Disk groups with versions less than 140 get imported after the upgrade.
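
As a hedged illustration (standard VxVM administration commands; substitute
real disk and disk group names), a clone_disk flag that was set spuriously on a
non-clone disk can be cleared, and an older disk group can be moved to a newer
version once imported:

        # vxdisk set emc0_0054 clone=off
        # vxdg upgrade <dgname>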

* 2695225 (Tracking ID: 2675538)

SYMPTOM:
Data corruption can be observed on a CDS (Cross-platform Data Sharing) disk, 
as part of LUN resize operations. The following pattern would be found in the 
data region of the disk.

<DISK-IDENTIFICATION> cyl <number-of-cylinders> alt 2 hd <number-of-tracks> sec 
<number-of-sectors-per-track>

DESCRIPTION:
The CDS disk maintains a SUN VTOC in the zeroth block and a backup label at the 
end of the disk. The VTOC maintains the disk geometry information like number of 
cylinders, tracks and sectors per track. The backup label is the duplicate of 
VTOC and the backup label location is determined from VTOC contents. As part of 
resize, VTOC is not updated to the new size, which results in the wrong 
calculation of the backup label location. If the wrongly calculated backup label 
location falls in the public data region rather than the end of the disk as 
designed, data corruption occurs.

RESOLUTION:
Update the VTOC contents appropriately for LUN resize operations to prevent the 
data corruption.

* 2695227 (Tracking ID: 2674465)

SYMPTOM:
Data corruption is observed when DMP node names are changed by the following
commands for DMP devices that are controlled by a third-party multi-pathing
driver (e.g. MPXIO and PowerPath):

# vxddladm [-c] assign names
# vxddladm assign names file=<path-name>
# vxddladm set namingscheme=<scheme-name>

DESCRIPTION:
The above commands, when executed, re-assign names to each device. Accordingly,
the in-core DMP database should be updated for each device to map the new
device name to the appropriate device number. Due to a bug in the code, the
mapping of names to device numbers was not done appropriately, which resulted
in subsequent I/Os going to a wrong device, thus leading to data corruption.

RESOLUTION:
The DMP routines responsible for mapping the names to the right device numbers
are modified to fix this corruption problem.

* 2695228 (Tracking ID: 2688747)

SYMPTOM:
Under a heavy I/O load on the logclient node, writes on the VVR Primary
logowner take a very long time to complete. Writes appear to be hung.

DESCRIPTION:
VVR does not allow more than a specific number of I/Os (4096) outstanding on
the SRL volume. Any I/Os beyond this threshold are throttled. The throttled
I/Os are restarted periodically. While restarting, I/Os belonging to the
logclient get higher preference compared to logowner I/Os, which can eventually
lead to starvation or an I/O hang situation on the logowner.

RESOLUTION:
Changes are made in the I/O scheduling algorithm for restarted I/Os to make
sure that throttled local I/Os get a chance to proceed under all conditions.

* 2701152 (Tracking ID: 2700486)

SYMPTOM:
If the VVR Primary and Secondary nodes have the same host-name, and there is a
loss of heartbeats between them, vradmind daemon can core-dump if an active
stats session already exists on the Primary node.

Following stack-trace is observed:

pthread_kill()
_p_raise() 
raise.raise()
abort() 
__assert_c99
StatsSession::sessionInitReq()
StatsSession::processOpReq()
StatsSession::processOpMsgs()
RDS::processStatsOpMsg()
DBMgr::processStatsOpMsg()
process_message()
main()

DESCRIPTION:
On loss of heartbeats between the Primary and Secondary nodes, and a subsequent
reconnect, RVG information is sent to the Primary by Secondary node. In this 
case, if a Stats session already exists on the Primary, a STATS_SESSION_INIT 
request is sent back to the Secondary. However, the code was using "hostname" 
(as returned by `uname -a`) to identify the secondary node. Since both the 
nodes had the same hostname, the resulting STATS_SESSION_INIT request was 
received at the Primary itself, causing vradmind to core dump.

RESOLUTION:
Code was modified to use 'virtual host-name' information contained in the 
RLinks, rather than hostname(1m), to identify the secondary node. In a scenario 
where both Primary and Secondary have the same host-name, virtual host-names 
are used to configure VVR.

* 2702110 (Tracking ID: 2700792)

SYMPTOM:
vxconfigd, the VxVM volume configuration daemon may dump a core with the
following stack during the Cluster Volume Manager(CVM) startup with "hares
-online cvm_clus -sys [node]".

  dg_import_finish()
  dg_auto_import_all()
  master_init()
  role_assume()
  vold_set_new_role()
  kernel_get_cvminfo()
  cluster_check()
  vold_check_signal()
  request_loop()
  main()

DESCRIPTION:
During CVM startup, vxconfigd accesses the disk group record's pointer of a
pending record while the transaction on the disk group is in progress. At times,
vxconfigd incorrectly accesses the stale pointer while processing the current
transaction, thus resulting in a core dump.

RESOLUTION:
Code changes are made to access the appropriate pointer of the disk group record
which is active in the current transaction. Also, the disk group record is
appropriately initialized to NULL value.

* 2703370 (Tracking ID: 2700086)

SYMPTOM:
In the presence of "Not-Ready" EMC devices on the system, multiple dmp (path
disabled/enabled) events messages are seen in the syslog

DESCRIPTION:
The issue is that vxconfigd enables the BCV devices which are in Not-Ready state
for IO as the SCSI inquiry succeeds, but soon finds that they cannot be used for
I/O and disables those paths. This activity takes place whenever "vxdctl enable"
or "vxdisk scandisks" command is executed.

RESOLUTION:
Avoid changing the state of the BCV device which is in "Not-Ready" to prevent IO
and dmp event messages.

* 2703373 (Tracking ID: 2698860)

SYMPTOM:
Mirroring a large VxVM volume that is built on THIN LUNs and has a mounted
VxFS file system on top fails with the following error:

Command error
# vxassist -b -g $disk_group_name mirror $volume_name
VxVM vxplex ERROR V-5-1-14671 Volume <volume_name> is configured on THIN luns
and not mounted. Use 'force' option, to bypass smartmove. To take advantage of
smartmove for supporting thin luns, retry this operation after mounting the
volume.
VxVM vxplex ERROR V-5-1-407 Attempting to cleanup after failure ...

Truss output error:
statvfs("<mount_point>", 0xFFBFEB54)              Err#79 EOVERFLOW

DESCRIPTION:
The statvfs system call is invoked internally during the mirroring operation
to retrieve statistics information for the VxFS file system hosted on the
volume. However, the statvfs system call only supports a maximum of 4294967295
(4GB-1) blocks, so if the total number of file system blocks is greater than
that, an EOVERFLOW error occurs. This also results in vxplex terminating with
the errors above.

RESOLUTION:
Use the 64-bit version of statvfs, i.e. the statvfs64 system call, to resolve
the EOVERFLOW and vxplex errors.

* 2706036 (Tracking ID: 2617336)

SYMPTOM:
System panics when a root disk with a swap partition is encapsulated on a
Solaris 10 system with kernel patch 147440-04 installed.

DESCRIPTION:
Systems upgraded to Solaris 10 kernel patch 147440-04 that have the swap
device encapsulated will recursively panic due to a NULL pointer passed to
vxioioctl from a new kernel routine, 'swapify()'.

RESOLUTION:
The vxio driver will not access the disk IOCTL return value pointer when it is
set to NULL.

* 2711758 (Tracking ID: 2710579)

SYMPTOM:
Data corruption can be observed on a CDS (Cross-platform Data Sharing) disk, 
as part of operations like LUN resize, Disk FLUSH, Disk ONLINE etc. The
following pattern would be found in the 
data region of the disk.

<DISK-IDENTIFICATION> cyl <number-of-cylinders> alt 2 hd <number-of-tracks> sec 
<number-of-sectors-per-track>

DESCRIPTION:
The CDS disk maintains a SUN VTOC in the zeroth block and a backup label at the 
end of the disk. The VTOC maintains the disk geometry information like number of 
cylinders, tracks and sectors per track. The backup label is the duplicate of 
VTOC and the backup label location is determined from VTOC contents. If the
content of SUN VTOC located in the zeroth sector are incorrect, this may result
in the wrong 
calculation of the backup label location. If the wrongly calculated backup label 
location falls in the public data region rather than the end of the disk as 
designed, data corruption occurs.

RESOLUTION:
Suppressed writing the backup label to prevent the data corruption.

* 2713862 (Tracking ID: 2390998)

SYMPTOM:
When running the 'vxdctl' or 'vxdisk scandisks' command after the process of
migrating SAN ports, the system panics; the following is the stack trace:
.disable_lock()
dmp_close_path()
dmp_do_cleanup()
dmp_decipher_instructions()
dmp_process_instruction_buffer()
dmp_reconfigure_db()
gendmpioctl()
vxdmpioctl()

DESCRIPTION:
SAN port migration ends up with two path nodes for the same device number, one
node marked as NODE_DEVT_USED, which means the same device number has been
reused by another node. When the DMP device is opened, the actual open count on
the new node (not marked with NODE_DEVT_USED) is modified. If the caller is
referencing the old node (marked with NODE_DEVT_USED), it then modifies the
layered open count on the old node. This results in inconsistent open reference
counts for the node and causes a panic while checking the open counts during
the DMP device close.

RESOLUTION:
The code change has been done to modify the actual open count and the layered
open count on the same node while performing the DMP device open/close.

* 2741105 (Tracking ID: 2722850)

SYMPTOM:
Disabling/enabling controllers while I/O is in progress results in dmp (Dynamic
Multi-Pathing) thread hang with following stack:

dmp_handle_delay_open
gen_dmpnode_update_cur_pri
dmp_start_failover
gen_update_cur_pri
dmp_update_cur_pri
dmp_process_curpri
dmp_daemons_loop

DESCRIPTION:
DMP takes an exclusive lock to quiesce the node to be failed over, and releases
the lock to do the update operations. These update operations presume that the
node remains in quiesced status. A small timing window exists between the lock
release and the update operations, in which other threads can break into this
window and unquiesce the node, which leads to a hang while performing the
update operations.

RESOLUTION:
Corrected the quiesce counter of a node to prevent other threads from
unquiescing it while a thread is performing the update operations.

* 2744219 (Tracking ID: 2729501)

SYMPTOM:
In a Dynamic Multi-Pathing environment, excluding a path also excludes other
sets of paths with matching substrings.

DESCRIPTION:
Excluding a path using "vxdmpadm exclude vxvm path=<path>" excludes all paths
with a matching substring. This is due to strncmp() being used for the
comparison. Also, the size of the hardware path defined in the structure is
larger than what is actually fetched.

RESOLUTION:
Correct the size of the hardware path in the structure and use strcmp() for the
comparison in place of strncmp().

* 2750453 (Tracking ID: 2439481)

SYMPTOM:
After doing a live upgrade on an encapsulated disk with a mirror, the mirror
disk entry does not get removed from the rootdg.

DESCRIPTION:
When the alternate disk (-d option) is specified in c#t#d# format to the
vxlustart script, the mirrored disk entries are not removed from the rootdg.

The vxlustart script does not handle the case where the alternate disk is
specified in c#t#d# format while the DA/DM names of the disks do not resemble
the c#t#d# format.

RESOLUTION:
Changes are made to allow specifying alt_disk in c#t#d# format with vxlustart
when the DA/DM names do not resemble the c#t#d# format.
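
As a hedged illustration only (confirm the exact options against the vxlustart
documentation for this release), the alternate boot disk is passed in c#t#d#
format as described above:

        # vxlustart -u 5.10 -d c1t1d0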

* 2750454 (Tracking ID: 2423701)

SYMPTOM:
An upgrade of VxVM changed the permissions of /etc/vx/vxesd from drwx------ to
d---r-x--- during live upgrade.

DESCRIPTION:
The '/etc/vx/vxesd' directory is shipped in VxVM with "drwx------" permissions.
However, while starting the vxesd daemon, if this directory is not present, it
gets created with "d---r-x---" permissions.

RESOLUTION:
Changes are made so that while starting the vxesd daemon, '/etc/vx/vxesd' gets
created with 'drwx------' permissions.
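
As a hedged illustration of a manual check and correction (standard Solaris
commands, not part of the fix itself), the directory permissions can be
verified and reset if they were already changed:

        # ls -ld /etc/vx/vxesd
        # chmod 700 /etc/vx/vxesd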

* 2750458 (Tracking ID: 2370250)

SYMPTOM:
When the vxlufinish script runs "fuser -k" on the list of file systems obtained
using the "lufslist" command, it fails with the following error:

$ ./vxlufinish -V -u 5.10

   VERITAS Volume Manager is finishing Live-Upgrade of OS release 5.10

/dev/dsk/c1t0d0s0:
/dev/vx/dsk/datadg/datavol:    29079o
/dev/vx/dsk/ocrvotedg/ocrvotevol:    26857o   24239o
# luumount -n dest.9041
# luactivate dest.9041
# lumount -n dest.9041 /altroot.5.10

   vxlufinish check is successful. Still you can expect
   errors in encapsulation or in luactivate because of incorrect
   installation. Now try running it with no -V option

$ Write failed: Broken pipe

DESCRIPTION:
The file systems which are excluded during "lucreate" get mounted as loopback
file systems (lofs) under the Alternate Boot Environment (ABE). These lofs
file systems point to the actual special device files under the Primary Boot
Environment (PBE).
The subroutine "unmount_all" also runs "fuser -k" on the lofs mounts. Hence the
issue.

RESOLUTION:
The solution involves the following steps (see the example after this list):
1) generate a list (L1) of mount points from "/etc/mnttab" of the PBE.
2) generate a list (L2) of lofs mount points using "/etc/mnttab" of the ABE.
3) do not run "fuser -k" on the lofs mounts in L2.

* 2750462 (Tracking ID: 2553942)

SYMPTOM:
"vxlustart -k" fails for the option of auto registration.

DESCRIPTION:
The .volume.inf file for sol10u10Build17 contains the string
VI"SOL_10_811_SPARC", whereas for other updates of Solaris 10 the string is
VI"SOL_10_910_SPARC". The subroutine "check_auto_registration" parses the year
and month from this string. On the basis of the year and month,
"auto_reg_required" gets set.

RESOLUTION:
Changed the logic so that "auto_reg_required" gets set correctly.

* 2752178 (Tracking ID: 2741240)

SYMPTOM:
In a VxVM environment, "vxdg join", when executed during a heavy I/O load,
fails with the message below.

VxVM vxdg ERROR V-5-1-4597 vxdg join [source_dg] [target_dg] failed
join failed : Commit aborted, restart transaction
join failed : Commit aborted, restart transaction

Half of the disks that were part of source_dg become part of target_dg whereas
the other half have no DG details.

DESCRIPTION:
VxVM implements "vxdg join" as a two-phase transaction. If the transaction
fails after the first phase and during the second phase, half of the disks
belonging to source_dg become part of target_dg and the other half of the disks
are left in a complex, irrecoverable state. Also, in a heavy I/O situation, any
retry limit (i.e. a limit on retrying transactions) can easily be exceeded.

RESOLUTION:
"vxdg join" is now designed as a single-phase atomic transaction and the retry
limit is eliminated.

* 2774907 (Tracking ID: 2771452)

SYMPTOM:
In lossy and high latency network, I/O gets hung on VVR primary. Just before the
I/O hang, Rlink frequently connects and disconnects.

DESCRIPTION:
In lossy and high latency network, because of heartbeat time outs, RLINK gets
disconnected. As a part of Rlink disconnect, the communication port is deleted.
During this process, the RVG is serialized and the I/Os are kept in a special
queue - rv_restartq. The I/Os in rv_restartq are supposed to be restarted once the
port deletion is successful.
The port deletion involves termination of all the communication server processes.
Because of a bug in the port deletion logic, the global variable which keeps track
of number of communication server processes got decremented twice.
This caused port deletion process to be hung leading to I/Os in rv_restartq never
being restarted.

RESOLUTION:
In port deletion logic, it's made sure that the global variable which keeps track
of number of communication server processes will get decremented correctly.

Patch ID: 142629-14

* 2583307 (Tracking ID: 2185069)

SYMPTOM:
In a CVR setup, while application I/Os are in progress on all the nodes of the
primary, bringing down a slave node results in a panic on the master node with
the following stack trace:

 #0 [ffff8800282a3680] machine_kexec at ffffffff8103695b
 #1 [ffff8800282a36e0] crash_kexec at ffffffff810b8f08
 #2 [ffff8800282a37b0] oops_end at ffffffff814cbbd0
 #3 [ffff8800282a37e0] no_context at ffffffff8104651b
 #4 [ffff8800282a3830] __bad_area_nosemaphore at ffffffff810467a5
 #5 [ffff8800282a3880] bad_area_nosemaphore at ffffffff81046873
 #6 [ffff8800282a3890] do_page_fault at ffffffff814cd658
 #7 [ffff8800282a38e0] page_fault at ffffffff814caf45
    [exception RIP: vol_rv_async_childdone+876]
    RIP: ffffffffa080b7ac  RSP: ffff8800282a3990  RFLAGS: 00010006
    RAX: ffff8801ee8a5200  RBX: ffff8801f6e17200  RCX: ffff8802324290c0
    RDX: ffff8801f7c8fac8  RSI: 0000000000000009  RDI: ffff8801f7c8fac8
    RBP: ffff8800282a3a00   R8: ffff8801f38d8000   R9: 0000000000000001
    R10: 000000000000003f  R11: 000000000000000c  R12: ffff8801f2580000
    R13: ffff88021bdfa7c0  R14: ffff8801f7c8fa00  R15: ffff8801ed46a200
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff8800282a3a08] volsiodone at ffffffffa0672c3e
 #9 [ffff8800282a3a88] vol_subdisksio_done at ffffffffa06764a7
#10 [ffff8800282a3ac8] volkcontext_process at ffffffffa0642a59
#11 [ffff8800282a3b18] voldiskiodone at ffffffffa062f1c1
#12 [ffff8800282a3bc8] voldiskiodone_intr at ffffffffa062f3a2
#13 [ffff8800282a3bf8] voldmp_iodone at ffffffffa05f7806
#14 [ffff8800282a3c08] bio_endio at ffffffff811a0d3d
#15 [ffff8800282a3c18] gendmpiodone at ffffffffa059594a
#16 [ffff8800282a3c68] dmpiodone at ffffffffa0596cf2
#17 [ffff8800282a3cb8] bio_endio at ffffffff811a0d3d
#18 [ffff8800282a3cc8] req_bio_endio at ffffffff8123f7fb
#19 [ffff8800282a3cf8] blk_update_request at ffffffff8124083f
#20 [ffff8800282a3d58] blk_update_bidi_request at ffffffff81240ba7
#21 [ffff8800282a3d88] blk_end_bidi_request at ffffffff81241c7f
#22 [ffff8800282a3db8] blk_end_request at ffffffff81241d20
#23 [ffff8800282a3dc8] scsi_io_completion at ffffffff8134a42f
#24 [ffff8800282a3e48] scsi_finish_command at ffffffff81341812
#25 [ffff8800282a3e88] scsi_softirq_done at ffffffff8134aa3d
#26 [ffff8800282a3eb8] blk_done_softirq at ffffffff81247275
#27 [ffff8800282a3ee8] __do_softirq at ffffffff81073bd7
#28 [ffff8800282a3f58] call_softirq at ffffffff810142cc
#29 [ffff8800282a3f70] do_softirq at ffffffff81015f35
#30 [ffff8800282a3f90] irq_exit at ffffffff810739d5
#31 [ffff8800282a3fa0] smp_call_function_single_interrupt at ffffffff8102eab5
#32 [ffff8800282a3fb0] call_function_single_interrupt at ffffffff81013e33
--- <IRQ stack> ---
#33 [ffff8801f3ca9af8] call_function_single_interrupt at ffffffff81013e33
    [exception RIP: page_waitqueue+125]
    RIP: ffffffff8110b16d  RSP: ffff8801f3ca9ba8  RFLAGS: 00000213
    RAX: 0000000000000b9d  RBX: ffff8801f3ca9ba8  RCX: 0000000000000034
    RDX: ffff880000027d80  RSI: 0000000000000000  RDI: 00000000000003df
    RBP: ffffffff81013e2e   R8: ea00000000000000   R9: 5000000000000000
    R10: 0000000000000000  R11: ffff8801ecd0f268  R12: ffffea0006c13d40
    R13: 0000000000001000  R14: ffffffff8119d881  R15: ffff8801f3ca9b18
    ORIG_RAX: ffffffffffffff04  CS: 0010  SS: 0018
#34 [ffff8801f3ca9bb0] unlock_page at ffffffff8110c16a
#35 [ffff8801f3ca9bd0] blkdev_write_end at ffffffff811a3cd0
#36 [ffff8801f3ca9c00] generic_file_buffered_write at ffffffff8110c944
#37 [ffff8801f3ca9cd0] __generic_file_aio_write at ffffffff8110e230
#38 [ffff8801f3ca9d90] blkdev_aio_write at ffffffff811a339c
#39 [ffff8801f3ca9dc0] do_sync_write at ffffffff8116c51a
#40 [ffff8801f3ca9ef0] vfs_write at ffffffff8116c818
#41 [ffff8801f3ca9f30] sys_write at ffffffff8116d251
#42 [ffff8801f3ca9f80] sysenter_dispatch at ffffffff8104ca7f

DESCRIPTION:
The reason for the panic is that access to an internal data structure is not
properly serialized, resulting in corruption of that data structure.

RESOLUTION:
The resolution is to properly serialize access to the internal data structure
so that its contents are not corrupted under any scenario.

Patch ID: 142629-13

* 2440015 (Tracking ID: 2428170)

SYMPTOM:
I/O hangs when reading or writing to a volume after a total storage 
failure in CVM environments with Active-Passive arrays.

DESCRIPTION:
In the event of a storage failure, in Active-Passive environments, the CVM-DMP
fail-over protocol is initiated. This protocol is responsible for coordinating
the fail-over of primary paths to secondary paths on all nodes in the cluster.
In the event of a total storage failure, where both the primary paths and
secondary paths fail, in some situations the protocol fails to clean up some
internal structures, leaving the devices quiesced.

RESOLUTION:
After a total storage failure, all devices should be un-quiesced, allowing the
I/Os to fail. The CVM-DMP protocol has been changed to clean up devices, even
if all paths to a device have been removed.

* 2477272 (Tracking ID: 2169726)

SYMPTOM:
After an import operation, the imported diskgroup contains a combination of
cloned and original disks. For example, after importing a diskgroup which has
four disks, two of the disks in the imported diskgroup are cloned disks and the
other two are original disks.

DESCRIPTION:
For a particular diskgroup, if some of the original disks are not available at
the time of the diskgroup import operation and the corresponding cloned disks
are present, then the diskgroup imported through the vxdg import operation
contains a combination of cloned and original disks.
Example -
A diskgroup named dg1 with the disks disk1 and disk2 exists on some machine.
Clones of the disks, named disk1_clone and disk2_clone, are also available. If
disk2 goes offline and the import of dg1 is performed, then the resulting
diskgroup will contain the disks disk1 and disk2_clone.

RESOLUTION:
The diskgroup import operation now considers cloned disks only if no original
disk is available. If any of the original disks exist at the time of the import
operation, then the import operation is attempted using original disks only.
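
As a hedged illustration (using the clone_disk and udid_mismatch status flags
that VxVM reports for cloned disks), the clone status of the disks can be
reviewed before the import so that a mixed selection is noticed:

        # vxdisk -e list
        # vxdisk -o alldgs list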

* 2497637 (Tracking ID: 2489350)

SYMPTOM:
In a Storage Foundation environment running Symantec Oracle Disk Manager (ODM),
Veritas File System (VxFS), Cluster volume Manager (CVM) and Veritas Volume
Replicator (VVR), kernel memory is leaked under certain conditions.

DESCRIPTION:
In CVR (CVM + VVR), under certain conditions (for example when I/O throttling
gets enabled or kernel messaging subsystem is overloaded), the I/O resources
allocated before are freed and the I/Os are being restarted afresh. While
freeing the I/O resources, VVR primary node doesn't free the kernel memory
allocated for FS-VM private information data structure and causing the kernel
memory leak of 32 bytes for each restarted I/O.

RESOLUTION:
Code changes are made in VVR to free the kernel memory allocated for FS-VM
private information data structure before the I/O is restarted afresh.

* 2497796 (Tracking ID: 2235382)

SYMPTOM:
IOs can hang in DMP driver when IOs are in progress while carrying out path
failover.

DESCRIPTION:
While restoring any failed path to a non-A/A LUN, the DMP driver checks whether
any pending I/Os exist on the same dmpnode. If any are present, DMP marks the
corresponding LUN with a special flag so that path failover/failback can be
triggered by the pending I/Os. There is a window here: if, by chance, all the
pending I/Os return before the dmpnode is marked, then any future I/Os on the
dmpnode get stuck in wait queues.

RESOLUTION:
The flag is now set on the LUN only while it has pending I/Os, so that failover
can be triggered by those pending I/Os.

* 2507120 (Tracking ID: 2438426)

SYMPTOM:
The following messages are displayed after vxconfigd is started.

pp_claim_device: Could not get device number for /dev/rdsk/emcpower0 
pp_claim_device: Could not get device number for /dev/rdsk/emcpower1

DESCRIPTION:
Device Discovery Layer(DDL) has incorrectly marked a path under dmp device with 
EFI flag even though there is no corresponding Extensible Firmware Interface 
(EFI) device in /dev/[r]dsk/. As a result, Array Support Library (ASL) issues a 
stat command on non-existent EFI device and displays the above messages.

RESOLUTION:
Avoided marking EFI flag on Dynamic MultiPathing (DMP) paths which correspond to 
non-efi devices.

* 2507124 (Tracking ID: 2484334)

SYMPTOM:
The system panics with the following stack while collecting the DMP stats.

dmp_stats_is_matching_group+0x314()
dmp_group_stats+0x3cc()
dmp_get_stats+0x194()
gendmpioctl()
dmpioctl+0x20()

DESCRIPTION:
Whenever new devices are added to the system, the stats table is adjusted to
accommodate the new devices in DMP. There exists a race between the stats
collection thread and the thread which adjusts the stats table to accommodate
the new devices. The race can cause the stats collection thread to access
memory beyond the known size of the table, causing the system panic.

RESOLUTION:
The stats collection code in the DMP is rectified to restrict the access to the 
known size of the stats table.

* 2508294 (Tracking ID: 2419486)

SYMPTOM:
Data corruption is observed with a single path when the naming scheme is
changed from enclosure based (EBN) to OS Native (OSN).

DESCRIPTION:
Data corruption can occur in the following configuration, when the naming
scheme is changed while applications are online:

1. The DMP device is configured with a single path, or the devices are
   controlled by a Third Party Multipathing Driver (e.g. MPXIO, MPIO etc.)

2. The DMP device naming scheme is EBN (enclosure based naming) with
persistence=yes

3. The naming scheme is changed to OSN using the following command
   # vxddladm set namingscheme=osn


There is a possibility that the name of the VxVM device (DA record) changes
while the naming scheme is being changed. As a result, the device attribute
list is updated with the new DMP device names. Due to a bug in the code which
updates the attribute list, the VxVM device records are mapped to wrong DMP
devices.

Example:

Following are the device names with EBN naming scheme.

MAS-usp0_0   auto:cdsdisk    hitachi_usp0_0  prod_SC32    online
MAS-usp0_1   auto:cdsdisk    hitachi_usp0_4  prod_SC32    online
MAS-usp0_2   auto:cdsdisk    hitachi_usp0_5  prod_SC32    online
MAS-usp0_3   auto:cdsdisk    hitachi_usp0_6  prod_SC32    online
MAS-usp0_4   auto:cdsdisk    hitachi_usp0_7  prod_SC32    online
MAS-usp0_5   auto:none       -            -            online invalid
MAS-usp0_6   auto:cdsdisk    hitachi_usp0_1  prod_SC32    online
MAS-usp0_7   auto:cdsdisk    hitachi_usp0_2  prod_SC32    online
MAS-usp0_8   auto:cdsdisk    hitachi_usp0_3  prod_SC32    online
MAS-usp0_9   auto:none       -            -            online invalid
disk_0       auto:cdsdisk    -            -            online
disk_1       auto:none       -            -            online invalid

bash-3.00# vxddladm set namingscheme=osn

The following is the output after executing the above command. MAS-usp0_9 has
changed to MAS-usp0_6 and the following devices have changed accordingly.

bash-3.00# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
MAS-usp0_0   auto:cdsdisk    hitachi_usp0_0  prod_SC32    online
MAS-usp0_1   auto:cdsdisk    hitachi_usp0_4  prod_SC32    online
MAS-usp0_2   auto:cdsdisk    hitachi_usp0_5  prod_SC32    online
MAS-usp0_3   auto:cdsdisk    hitachi_usp0_6  prod_SC32    online
MAS-usp0_4   auto:cdsdisk    hitachi_usp0_7  prod_SC32    online
MAS-usp0_5   auto:none       -            -            online invalid
MAS-usp0_6   auto:none       -            -            online invalid
MAS-usp0_7   auto:cdsdisk    hitachi_usp0_1  prod_SC32    online
MAS-usp0_8   auto:cdsdisk    hitachi_usp0_2  prod_SC32    online
MAS-usp0_9   auto:cdsdisk    hitachi_usp0_3  prod_SC32    online
c4t20000014C3D27C09d0s2 auto:none       -            -            online invalid
c4t20000014C3D26475d0s2 auto:cdsdisk    -            -            online

RESOLUTION:
Code changes are made to update the device attribute list correctly even if the
name of the VxVM device is changed while the naming scheme is changing.

* 2508418 (Tracking ID: 2390431)

SYMPTOM:
In a Disaster Recovery environment, when DCM (Data Change Map) is active and 
during SRL(Storage Replicator Log)/DCM flush, the system panics due to missing
parent on one of the DCM in an RVG (Replicated Volume Group).

DESCRIPTION:
The DCM flush happens during every log update and its frequency depends on the
I/O load. If the I/O load is high, the DCM flush happens very often, and if
there are more volumes in the RVG, the frequency is very high. Every DCM flush
triggers the DCM flush on all the volumes in the RVG. If there are 50 volumes
in an RVG, then each DCM flush creates 50 children and is controlled by one
parent SIO. Once all the 50 children are done, the parent SIO releases itself
for the next flush. Once the DCM flush of each child completes, it detaches
itself from the parent by setting the parent field to NULL. It can happen that
the 49th child is done but, before it detaches itself from the parent, the 50th
child completes and releases the parent SIO for the next DCM flush. Before the
49th child detaches, the new DCM flush is started on the same 50th child. After
the next flush is started, the 49th child of the previous flush detaches itself
from the parent and, since it is a static SIO, it indirectly resets the new
flush's parent field. Also, the lock is not obtained before modifying the SIO
state field in a few scenarios.

RESOLUTION:
Before reducing the children count, detach the parent first. This makes sure
that the new flush does not race with the previous flush. The field is
protected with the required lock in all the scenarios.

* 2511928 (Tracking ID: 2420386)

SYMPTOM:
Corrupted data is seen near the end of a sub-disk, on thin-reclaimable 
disks with either CDS EFI or sliced disk formats.

DESCRIPTION:
In environments with thin-reclaim disks running with either CDS-EFI 
disks or sliced disks, misaligned reclaims can be initiated. In some situations, 
when reclaiming a sub-disk, the reclaim does not take into account the correct 
public region start offset, which in rare instances can potentially result in 
reclaiming data before the sub-disk which is being reclaimed.

RESOLUTION:
The public offset is taken into account when initiating all reclaim
operations.

* 2515137 (Tracking ID: 2513101)

SYMPTOM:
When VxVM is upgraded from 4.1MP4RP2 to 5.1SP1RP1, the data on a CDS disk gets
corrupted.

DESCRIPTION:
When CDS disks are initialized with VxVM version 4.1MP4RP2, the number of
cylinders is calculated based on the disk raw geometry. If the calculated
number of cylinders exceeds the Solaris VTOC limit (65535), then, because of an
unsigned integer overflow, a truncated value of the number of cylinders gets
written in the CDS label.
    After VxVM is upgraded to 5.1SP1RP1, the CDS label gets wrongly written in
the public region, leading to the data corruption.

RESOLUTION:
The code changes are made to suitably adjust the number of tracks and heads so
that the calculated number of cylinders stays within the Solaris VTOC limit.

* 2525333 (Tracking ID: 2148851)

SYMPTOM:
"vxdisk resize" operation fails on a disk with VxVM cdsdisk/simple/sliced layout
on Solaris/Linux platform with the following message:

      VxVM vxdisk ERROR V-5-1-8643 Device emc_clariion0_30: resize failed: New
      geometry makes partition unaligned

DESCRIPTION:
The new cylinder size selected during "vxdisk resize" operation is unaligned with
the partitions that existed prior to the "vxdisk resize" operation.

RESOLUTION:
The algorithm to select the new geometry has been redesigned such that the new
cylinder size is always aligned with the existing as well as new partitions.
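
As a hedged illustration (syntax per the vxdisk resize operation; the device
name is taken from the error message above and the length is a placeholder),
the resize can be re-attempted after applying the fix:

        # vxdisk -g <dgname> resize emc_clariion0_30 length=<new_length>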

* 2531983 (Tracking ID: 2483053)

SYMPTOM:
The VVR Primary system consumes a very large amount of kernel heap memory and
appears to be hung.

DESCRIPTION:
There is a race between the REGION LOCK deletion thread, which runs as part of
the SLAVE leave reconfiguration, and the thread which processes the DATA_DONE
message coming from the log client to the logowner. Because of this race, the
flags which store the status information about the I/Os were not correctly
updated. This caused a lot of SIOs to get stuck in a queue, consuming a large
amount of kernel heap.

RESOLUTION:
The code changes are made to take the proper locks while updating the SIOs'
fields.

* 2531987 (Tracking ID: 2510523)

SYMPTOM:
In CVM-VVR configuration, I/Os on "master" and "slave" nodes hang when "master"
role is switched to the other node using "vxclustadm setmaster" command.

DESCRIPTION:
Under a heavy I/O load, the I/Os are sometimes throttled in VVR if the number
of outstanding I/Os on the SRL reaches a certain limit (2048 I/Os).
When the "master" role is switched to the other node by using the "vxclustadm
setmaster" command, the throttled I/Os on the original master are never
restarted. This causes the I/O hang.

RESOLUTION:
Code changes are made in VVR to make sure the throttled I/Os are restarted
before "master" switching is started.

* 2531993 (Tracking ID: 2524936)

SYMPTOM:
The disk group is disabled after rescanning disks with the "vxdctl enable"
command, with the console output below:


 <timestamp> pp_claim_device:         0 
 <timestamp> Could not get metanode from ODM database  
 <timestamp> pp_claim_device:         0 
 <timestamp> Could not get metanode from ODM database  

The error messages below are also seen in vxconfigd debug log output,
              
<timestamp>  VxVM vxconfigd ERROR V-5-1-12223 Error in claiming /dev/<disk>: The 
process file table is full. 
<timestamp>  VxVM vxconfigd ERROR V-5-1-12223 Error in claiming /dev/<disk>: The 
process file table is full. 
...
<timestamp> VxVM vxconfigd ERROR V-5-1-12223 Error in claiming /dev/<disk>: The 
process file table is full.

DESCRIPTION:
When the total physical memory of an AIX machine is greater than or equal to
40GB and a multiple of 40GB (like 80GB, 120GB), a limitation/bug in the
setulimit function causes an overflowed value to be set as the new limit/size
of the data area, which results in memory allocation failures in vxconfigd.
Creation of the shared memory segment also fails during this course. Error
handling for this case is missing in the vxconfigd code, resulting in errors in
claiming disks and in offlining configuration copies, which in turn results in
the disk group being disabled.

RESOLUTION:
Code changes are made to handle the failure case on shared memory segment
creation.

* 2552402 (Tracking ID: 2432006)

SYMPTOM:
System intermittently hangs during boot if a disk is encapsulated.
When this problem occurs, the OS boot process stops after outputting this:
"VxVM sysboot INFO V-5-2-3409 starting in boot mode..."

DESCRIPTION:
The boot process hung due to a deadlock between two threads: one VxVM
transaction thread, and another thread attempting a read on the root volume
issued by dhcpagent. The read I/O is deferred until the transaction is
finished, but the read count incremented earlier is not properly adjusted.

RESOLUTION:
Proper care is taken to decrement the pending read count if the read I/O is
deferred.

* 2553391 (Tracking ID: 2536667)

SYMPTOM:
The system panics with the following stack trace:

[04DAD004]voldiodone+000C78 (F10000041116FA08) 
[04D9AC88]volsp_iodone_common+000208 (F10000041116FA08, 
0000000000000000, 
  0000000000000000) 
[04B7A194]volsp_iodone+00001C (F10000041116FA08) 
[000F3FDC]internal_iodone_offl+0000B0 (??, ??) 
[000F3F04]iodone_offl+000068 () 
[000F20CC]i_softmod+0001F0 () 
[0017C570].finish_interrupt+000024 ()

DESCRIPTION:
The panic happens due to accessing a stale DG pointer, as the DG got deleted
before the I/O returned. It may happen in a cluster configuration where
commands generating private region I/Os and "vxdg deport/delete" commands are
executing simultaneously on two nodes of the cluster.

RESOLUTION:
Code changes are made to drain private region I/Os before deleting the DG.

* 2562911 (Tracking ID: 2375011)

SYMPTOM:
The user is not able to change the "dmp_native_support" tunable to "on" or
"off" in the presence of the root ZFS pool.

DESCRIPTION:
DMP does not allow the dmp_native_support tunable to be changed if any of the
ZFS pools are in use. Therefore, in the presence of the root ZFS pool, DMP
reports the following error when the user tries to change the
"dmp_native_support" tunable to "on" or "off":

# vxdmpadm settune dmp_native_support=off
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more zpools
VxVM vxdmpadm ERROR V-5-1-15686 The following zpool(s) could not be migrated as
they are in use -
     rpool

RESOLUTION:
DMP code has been changed to skip the root ZFS pool in its internal checks for
active ZFS pools prior to changing the value of dmp_native_support tunable.

* 2563291 (Tracking ID: 2527289)

SYMPTOM:
In a Campus Cluster setup, a storage fault may lead to the DETACH of all the
configured sites. This also results in I/O failure on all the nodes in the
Campus Cluster.

DESCRIPTION:
Site detaches are done on site-consistent disk groups when any volume in the
disk group loses all the mirrors of a site. While processing the DETACH of the
last mirror in a site, we identify that it is the last mirror and DETACH the
site, which in turn detaches all the objects of that site.

In a Campus Cluster setup, a DCO volume is attached to any data volume created
on a site-consistent disk group. The general configuration is to have one DCO
mirror on each site. Loss of a single mirror of the DCO volume on any node will
result in the detach of that site.

In a two-site configuration, this particular scenario results in both DCO
mirrors being lost simultaneously. While the site detach for the first mirror
is being processed, we also signal for the DETACH of the second mirror, which
ends up DETACHING the second site too.

This is not hit in other tests because there is already a check to make sure
that the last mirror of a volume is not DETACHED. This check is subverted in
this particular case due to the type of storage failure.

RESOLUTION:
Before triggering the site detach, an explicit check is made to see whether the
last ACTIVE site is about to be DETACHED.

* 2574840 (Tracking ID: 2344186)

SYMPTOM:
In a master-slave configuration with FMR3/DCO volumes, a cluster node that is
rebooted fails to rejoin the cluster, with the following error messages on the
console:

[..]
Jul XX 18:44:09 vienna vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-11092 
cleanup_client: (Volume recovery in progress) 230
Jul XX 18:44:09 vienna vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-11467 
kernel_fail_join() :                Reconfiguration interrupted: Reason is 
retry to add a node failed (13, 0)
[..]

DESCRIPTION:
VxVM volumes with FMR3/DCO have an inbuilt DRL mechanism to track the disk
blocks of in-flight I/Os in order to recover the data much more quickly in case
of a node crash.
Thus, a joining node waits for the variable responsible for recovery to be
unset before it joins the cluster. However, due to a bug in the FMR3/DCO code,
this variable remained set forever, thus leading to the node join failure.

RESOLUTION:
Modified the FMR3/DCO code to appropriately set and unset this recovery
variable.


INSTALLING THE PATCH
--------------------
o Before the upgrade:
  (a) Stop I/Os to all the VxVM volumes.
  (b) Unmount any filesystems with VxVM volumes.
  (c) Stop applications using any VxVM volumes.

For Solaris 9 and 10 releases, refer to the man pages for instructions on using the 'patchadd' and 'patchrm' scripts provided with Solaris.
Any other special or non-generic installation instructions are described below as special instructions. The following example installs this patch on a standalone machine:

        example# patchadd 142629-15
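
One way to confirm that the patch has been applied is to check the installed
patch list and the VRTSvxvm package information with standard Solaris commands
(shown here as an illustration only):

        example# showrev -p | grep 142629
        example# pkginfo -l VRTSvxvm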


REMOVING THE PATCH
------------------
The following example removes a patch from a standalone system:

        example# patchrm 142629-15


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE