* * * READ ME * * *
             * * * Veritas Volume Manager 5.1 SP1 RP2 * * *
                         * * * P-patch 3 * * *
                         Patch Date: 2012-06-13


This document provides the following information:

   * PATCH NAME
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas Volume Manager 5.1 SP1 RP2 P-patch 3


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas Storage Foundation for Oracle RAC 5.1 SP1
   * Veritas Storage Foundation Cluster File System 5.1 SP1
   * Veritas Storage Foundation 5.1 SP1
   * Veritas Storage Foundation High Availability 5.1 SP1
   * Veritas Dynamic Multi-Pathing 5.1 SP1
   * Symantec VirtualStore 5.1 SP1


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
Solaris 10 X64
Solaris 10 X86


INCIDENTS FIXED BY THE PATCH
----------------------------
This patch fixes the following Symantec incidents:

Patch ID: 142630-15

* 2280285 (Tracking ID: 2365486)

SYMPTOM:
In a two-node SFRAC configuration, the system panics with the following stack
when "vxdisk scandisks" is run after enabling ports:

PANIC STACK:

.unlock_enable_mem()
.unlock_enable_mem()
dmp_update_path()
dmp_decode_update_dmpnode()
dmp_decipher_instructions()
dmp_process_instruction_buffer()
dmp_reconfigure_db()
gendmpioctl()
vxdmpioctl()
rdevioctl()
spec_ioctl()
vnop_ioctl()
vno_ioctl()
common_ioctl()
ovlya_addr_sc_flih_main()

DESCRIPTION:
Locks were acquired and released in an improper order during DMP
reconfiguration while I/O activity was running in parallel, which led to the
above panic.

RESOLUTION:
Release the locks in the same order in which they are acquired.

* 2405446 (Tracking ID: 2253970)

SYMPTOM:
Enhancement to customize private region I/O size based on maximum transfer size 
of underlying disk.

DESCRIPTION:
Array controllers differ in the data transfer sizes they support, starting
from 256K and going beyond. The VxVM tunable volmax_specialio controls the size
of vxconfigd's configuration I/O as well as Atomic Copy I/O. When
volmax_specialio is tuned to a value greater than 1MB to leverage the maximum
transfer sizes of the underlying disks, the import operation fails for disks
that cannot accept I/O larger than 256K. If the tunable is set to 256K instead,
the large transfer sizes of the other disks are not leveraged.

RESOLUTION:
This enhancement leverages large disk transfer sizes as well as supports Array 
controllers with 256K transfer sizes.
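
The idea can be illustrated with a minimal Python sketch. It is not the VxVM
implementation; the function name split_io and its parameters are hypothetical.
It assumes each private region I/O is capped at the smaller of the tunable and
the maximum transfer size of the underlying disk.

```python
def split_io(length, volmax_specialio, disk_max_xfer):
    """Split `length` bytes into chunks no larger than the smaller limit.

    All names here are illustrative assumptions, not VxVM interfaces.
    """
    chunk = min(volmax_specialio, disk_max_xfer)
    if chunk <= 0:
        raise ValueError("transfer limits must be positive")
    sizes = []
    remaining = length
    while remaining > 0:
        n = min(chunk, remaining)
        sizes.append(n)
        remaining -= n
    return sizes


KB = 1024
# A 1MB request with the tunable at 2MB on a disk limited to 256K per I/O
print(split_io(1024 * KB, 2048 * KB, 256 * KB))
# The same request on a disk that accepts 4MB per I/O goes out in one piece
print(split_io(1024 * KB, 2048 * KB, 4096 * KB))
```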

* 2532440 (Tracking ID: 2495186)

SYMPTOM:
With TCP protocol used for replication, I/O throttling happens due to
memory flow control.

DESCRIPTION:
In some slow network configuration, the I/O throughput is throttled
back due to the replication I/O.

RESOLUTION:
The replication I/O is kept outside the normal I/O code path to improve
its I/O throughput performance.

* 2563291 (Tracking ID: 2527289)

SYMPTOM:
In a Campus Cluster setup, a storage fault may lead to DETACH of all the
configured sites. This also results in I/O failure on all the nodes in the
Campus Cluster.

DESCRIPTION:
Site detaches are done on site-consistent dgs when any volume in the dg loses
all the mirrors of a site. While processing the DETACH of the last mirror in
a site, we identify that it is the last mirror and DETACH the site, which in
turn detaches all the objects of that site.

In Campus Cluster setup we attach a dco volume for any data volume created on a
site-consistent dg. The general configuration is to have one DCO mirror on each
site. Loss of a single mirror of the dco volume on any node will result in the
detach of that site. 

In a 2 site configuration this particular scenario would result in both the dco
mirrors being lost simultaneously. While the site detach for the first mirror is
being processed we also signal for DETACH of the second mirror which ends up
DETACHING the second site too. 

This is not hit in other tests as we already have a check to make sure that we
do not DETACH the last mirror of a Volume. This check is being subverted in this
particular case due to the type of storage failure.

RESOLUTION:
Before triggering the site detach we need to have an explicit check to see if we
are trying to DETACH the last ACTIVE site.
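
A minimal Python sketch of such a guard follows. The function and site names
are hypothetical, and the logic is only an illustration of the check described
above, not the actual VxVM code.

```python
def may_detach_site(site, active_sites):
    """Return True only if detaching `site` leaves at least one ACTIVE site."""
    remaining = set(active_sites) - {site}
    return site in active_sites and len(remaining) > 0


# Both DCO mirrors fail at about the same time in a 2-site configuration.
active = {"siteA", "siteB"}
for site in ("siteA", "siteB"):
    if may_detach_site(site, active):
        active.discard(site)

print(sorted(active))  # the last ACTIVE site is kept
```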

* 2589679 (Tracking ID: 2589569)

SYMPTOM:
The vxdisksetup command takes a long time (approximately 2-4 minutes) to
initialize a sliced disk on an A/P array.

DESCRIPTION:
In VxVM (Veritas Volume Manager), the DKIOCGVTOC/DKIOCGGEOM IOCTL(s) are used to
detect an EFI disk. If these IOCTL(s) return the error ENOTSUP, the disk is
considered to have an EFI label. When the primary path returns ENOTSUP, the DMP
driver retries the IOCTL(s) on the secondary path, which consumes additional
time.

RESOLUTION:
The IOCTL service routine is modified to restrict DMP driver from retrying the 
IOCTL(s) on secondary path.
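
The retry rule can be sketched in Python as follows. This is an illustration
under assumptions: issue_ioctl and the send callback are hypothetical names,
and the real change lives in the DMP IOCTL service routine.

```python
import errno


def issue_ioctl(paths, send):
    """Try `send(path)` on each path in order; stop on success or ENOTSUP.

    `send` returns 0 on success or a positive errno value on failure.
    """
    err = errno.EIO
    for path in paths:
        err = send(path)
        if err == 0 or err == errno.ENOTSUP:
            # ENOTSUP describes the disk label, not a path fault,
            # so retrying on another path would only cost time.
            break
    return err


tried = []


def fake_send(path):
    """Stand-in for the real IOCTL: every path reports ENOTSUP."""
    tried.append(path)
    return errno.ENOTSUP


result = issue_ioctl(["primary", "secondary"], fake_send)
print(tried)
```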

* 2603605 (Tracking ID: 2419948)

SYMPTOM:
A race between the SRL flush due to SRL overflow and the kernel logging code
leads to a panic.

DESCRIPTION:
When the RLINK is disconnected, the RLINK state is moved to HALT. Because there
is no replication, the Primary RVG SRL overflows, which initiates DCM logging.

This changes the state of the RLINK to DCM. (Since the RLINK is already
disconnected, the final state remains HALT.) If the RLINK connection is
restored during the SRL overflow, the RLINK goes through many state changes
before the connection completes.

If the SRL overflow and kernel logging code finishes in between these state
transitions and does not find the RLINK in VOLRP_PHASE_HALT, the system
panics.

RESOLUTION:
Treat the above state changes as valid, and make sure the SRL overflow code
does not always expect the HALT state. The code takes action for the other
states or waits for the full state transition of the RLINK connection to
complete.

* 2612969 (Tracking ID: 2612960)

SYMPTOM:
Onlining a disk with GPT (GUID Partition Table) and VxVM aixdisk layout may 
result in vxconfigd dumping core and printing the following message:

Assertion failed: (0), file <file-name>, line <line-no>.

DESCRIPTION:
This problem occurs only on disks with VxVM aixdisk layout that had a GPT
layout before being initialized with VxVM aixdisk layout. The existence of the
GPT label on such a disk left VxVM unable to discover the disk layout properly.

RESOLUTION:
While discovering the layout of the disk, VxVM first checks if the disk has VxVM
aixdisk layout.  VxVM clears out GPT label on disks which have VxVM aixdisk 
layout.

* 2621549 (Tracking ID: 2621465)

SYMPTOM:
When a failed disk that belongs to a site becomes accessible again, it cannot
be reattached to the disk group.

DESCRIPTION:
As the disk has a site tag name set, 'vxdg adddisk' command invoked 
in 'vxreattach' command needs the option '-f' to add the disk back to the 
disk group.

RESOLUTION:
Add the option '-f' to 'vxdg adddisk' command when it is invoked 
in 'vxreattach' command.

* 2626900 (Tracking ID: 2608849)

SYMPTOM:
1. Under a heavy I/O load on the logclient node, write I/Os on the VVR Primary
logowner take a very long time to complete.

2. I/Os on "master" and "slave" nodes hang when "master" role is switched
multiple times using "vxclustadm setmaster" command.

DESCRIPTION:
1.
VVR cannot allow more than 2048 I/Os outstanding on the SRL volume. Any I/Os
beyond this threshold are throttled. The throttled I/Os are restarted after
every SRL header flush operation. When the throttled I/Os are restarted, I/Os
coming from the logclient are given higher priority, causing logowner I/Os to
starve.

2.
In the CVM reconfiguration code path, the RLINK ports are not cleanly deleted on
the old logowner. This prevents the RLINKs from connecting, leading to both
replication and I/O hang.

RESOLUTION:
The algorithm that restarts the throttled I/Os is modified to give a fair chance
to both local and remote I/Os to proceed.
Additionally, code changes are made in the CVM reconfiguration code path to
delete the RLINK ports cleanly before switching the master role.
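
One way to picture a fair restart is a simple round-robin between the two
sources of throttled I/O. The Python sketch below is an assumption made for
illustration (the function and queue names are hypothetical); the actual VVR
scheduling algorithm is not described in this document.

```python
from collections import deque


def restart_throttled(local_q, remote_q):
    """Alternate between local and remote queues so neither side starves."""
    order = []
    queues = [deque(local_q), deque(remote_q)]
    while any(queues):  # an empty deque is falsy
        for q in queues:
            if q:
                order.append(q.popleft())
    return order


# Local (logowner) and remote (logclient) I/Os are interleaved.
print(restart_throttled(["L1", "L2"], ["R1", "R2", "R3"]))
```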

* 2626911 (Tracking ID: 2605444)

SYMPTOM:
Using vxdmpadm to disable and then enable an EFI-labeled primary path in an
A/PF array results in all paths getting disabled.

DESCRIPTION:
Enabling an EFI-labeled primary path disables the secondary path. When the
primary path is disabled, a failover to the secondary path occurs. The name of
the secondary path changes, dropping the slice s2 from the name
(cxtxdxs2 becomes cxtxdx). This name change is not updated in the device
property list. Because the list is not updated, the secondary path is disabled
when the primary path is enabled.

RESOLUTION:
The code path which changes the name of the secondary path is rectified to 
update the property list.

* 2626920 (Tracking ID: 2061082)

SYMPTOM:
"vxddladm -c assign names" command does not work if dmp_native_support 
tunable is enabled.

DESCRIPTION:
If the dmp_native_support tunable is set to "on", VxVM does not allow the names
of dmpnodes to be changed. This holds true even for devices on which native
support is not enabled, such as VxVM-labeled or Third Party Devices. So there is
no way to selectively change the names of devices for which native support is
not enabled.

RESOLUTION:
This enhancement is addressed by code change to selectively change name for
devices with native support not enabled.

* 2633041 (Tracking ID: 2509291)

SYMPTOM:
The "vxconfigd" daemon hangs if disable/enable of host-side FC switch ports is
exercised for several iterations and, consequently, VxVM-related commands
do not return. The following stack is seen:

schedule
dmp_biowait
dmp_indirect_io
gendmpioctl
dmpioctl
dmp_ioctl
dmp_compat_ioctl
compat_blkdev_ioctl
compat_sys_ioctl
sysenter_do_call

DESCRIPTION:
When the fail-over thread corresponding to the LUN is scheduled, it frees the
memory allocated for the fail-over request and returns from the Array Policy
Module fail-over function call. When the thread is scheduled again, it still
points to the same fail-over request that was freed. When it tries to get the
next value, a NULL value is returned. The fail-over thread waiting for other
LUNs never gets invoked, which results in the vxconfigd daemon hang.

RESOLUTION:
Code changes have been made to the Array Policy Module to save the fail-over
request pointer after marking the request state field as fail-over has completed
successfully.

* 2636094 (Tracking ID: 2635476)

SYMPTOM:
DMP (Dynamic Multi Pathing) driver does not automatically enable the failed 
paths of Logical Units (LUNs) that are restored.

DESCRIPTION:
DMP's restore daemon probes each failed path at a default interval of 5 minutes
(tunable) to detect whether that path can be enabled. As part of enabling the
path, DMP issues an open() on the path's device number. Owing to a bug in the
DMP code, the open() was issued on the wrong device partition, which resulted in
failure for every probe. Thus, the path remained in failed status at the DMP
layer even though it was enabled on the array side.

RESOLUTION:
Modified the DMP restore daemon code path to issue the open() on the appropriate
device partitions.

* 2643651 (Tracking ID: 2643634)

SYMPTOM:
If standard (non-clone) disks and cloned disks of the same disk group are seen
on a host, dg import fails with the following error message when the
standard (non-clone) disks have no enabled configuration copy of the disk group.

# vxdg import <dgname>
VxVM vxdg ERROR V-5-1-10978 Disk group <dgname>: import failed:
Disk group has no valid configuration copies

DESCRIPTION:
When VxVM imports such a mixed configuration of standard (non-clone) disks
and cloned disks, the standard (non-clone) disks are selected as the members of
the disk group in 5.0MP3RP5HF1 and 5.1SP1RP2. This happens without
administrators being aware that there is a mixed configuration and that the
standard (non-clone) disks are selected for the import. The issue is hard to
figure out from the error message and takes time to investigate.

RESOLUTION:
Syslog message enhancements are made in the code so that administrators can
figure out whether such a mixed configuration is seen on a host and which disks
are selected for the import.

* 2651421 (Tracking ID: 2649846)

SYMPTOM:
In a Sun Cluster environment with VxVM (Veritas Volume Manager) and EMC
PowerPath (a third-party multi-pathing driver), the Sun Cluster command
"cldg create -t vxvm ..." dumps core while creating a VxVM type disk group.

The Sun Cluster command is:
cldg create -t vxvm -n <hostname> -v <diskgroup name>
The error message is:
umem allocator: redzone violation: write past end of buffer

DESCRIPTION:
The Sun Cluster command "cldg create -t vxvm -v <disk group>..." needs
VxVM's assistance to get the sub-paths of the devices included in the disk
group. However, VxVM does not allocate enough memory to hold the device names,
which is why Sun Cluster reports the error message.

RESOLUTION:
VxVM library has been modified to allocate adequate memory space to hold the 
device name.

* 2666175 (Tracking ID: 2666163)

SYMPTOM:
A small memory leak may be seen in vxconfigd, the VxVM configuration daemon,
when a Serial Split Brain (SSB) error is detected in the import process.

DESCRIPTION:
The leak may occur when a Serial Split Brain (SSB) error is detected in the
import process. When a function returns the SSB error, a memory area that was
dynamically allocated in the same function is not freed. SSB detection is a
VxVM feature where VxVM detects if the configuration copy in the disk private
region becomes stale unexpectedly. A typical cause of the SSB error is that a
disk group is imported on different systems at the same time and configuration
copy updates on both systems result in an inconsistency between the copies.
VxVM cannot identify which configuration copy is most up-to-date in this
situation. As a result, VxVM may detect an SSB error on the next import and
show the details through a CLI message.

RESOLUTION:
Code changes are made to avoid the memory leak and also a small message fix has
been done.

* 2676703 (Tracking ID: 2553729)

SYMPTOM:
The following is observed during 'Upgrade' of VxVM (Veritas Volume Manager):

i) The 'clone_disk' flag is seen on non-clone disks in the STATUS field when
'vxdisk -e list' is executed after upgrade to 5.1SP1 from lower versions of
VxVM.


Eg:

DEVICE       TYPE           DISK        GROUP        STATUS
emc0_0054    auto:cdsdisk   emc0_0054    50MP3dg     online clone_disk
emc0_0055    auto:cdsdisk   emc0_0055    50MP3dg     online clone_disk

ii) Disk groups (dg) whose versions are less than 140 do not get imported after 
upgrade to VxVM versions 5.0MP3RP5HF1 or 5.1SP1RP2.

Eg:

# vxdg -C import <dgname>
VxVM vxdg ERROR V-5-1-10978 Disk group <dgname>: import failed:
Disk group version doesn't support feature; see the vxdg upgrade command

DESCRIPTION:
While upgrading VxVM:

i) After upgrade to 5.1SP1 or higher versions:
If a dg that was created on a lower version is deported and imported back on
5.1SP1 after the upgrade, the "clone_disk" flag gets set on non-cloned disks
because of the design change in the UDID (unique disk identifier) of the disks.

ii) After upgrade to 5.0MP3RP5HF1 or 5.1SP1RP2:
Import of dg with versions less than 140 fails.

RESOLUTION:
Code changes are made to ensure that:
i) clone_disk flag does not get set for non-clone disks after the upgrade.
ii) Disk groups with versions less than 140 get imported after the upgrade.

* 2695225 (Tracking ID: 2675538)

SYMPTOM:
Data corruption can be observed on a CDS (Cross-platform Data Sharing) disk, 
as part of LUN resize operations. The following pattern would be found in the 
data region of the disk.

<DISK-IDENTIFICATION> cyl <number-of-cylinders> alt 2 hd <number-of-tracks> sec 
<number-of-sectors-per-track>

DESCRIPTION:
The CDS disk maintains a SUN VTOC in the zeroth block and a backup label at the 
end of the disk. The VTOC maintains the disk geometry information like number of 
cylinders, tracks and sectors per track. The backup label is the duplicate of 
VTOC and the backup label location is determined from VTOC contents. As part of 
resize, VTOC is not updated to the new size, which results in the wrong 
calculation of the backup label location. If the wrongly calculated backup label 
location falls in the public data region rather than the end of the disk as 
designed, data corruption occurs.

RESOLUTION:
Update the VTOC contents appropriately for LUN resize operations to prevent the 
data corruption.
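
The effect of a stale VTOC can be shown with simple arithmetic. The Python
sketch below uses hypothetical geometry values and a simplifying assumption
(the backup label occupies the last sector implied by the geometry); the exact
on-disk layout is not given in this document.

```python
def disk_end_sector(ncyl, nhead, nsect):
    """Total number of sectors implied by the given disk geometry."""
    return ncyl * nhead * nsect


def backup_label_sector(ncyl, nhead, nsect):
    # Simplifying assumption: the label sits in the last sector of the disk
    # as computed from the VTOC geometry.
    return disk_end_sector(ncyl, nhead, nsect) - 1


# Hypothetical LUN grown from 1000 to 2000 cylinders; the VTOC still says 1000.
stale_location = backup_label_sector(1000, 16, 128)
actual_end = disk_end_sector(2000, 16, 128)

print(stale_location, actual_end)
# The stale location is far from the real end of the disk, so a backup label
# written there can land inside the data region.
print(stale_location < actual_end - 1)
```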

* 2695227 (Tracking ID: 2674465)

SYMPTOM:
Data corruption is observed when DMP node names are changed by the following
commands for DMP devices that are controlled by a third-party multi-pathing
driver (e.g. MPXIO and PowerPath):

# vxddladm [-c] assign names
# vxddladm assign names file=<path-name>
# vxddladm set namingscheme=<scheme-name>

DESCRIPTION:
When executed, the above commands re-assign names to each device.
Accordingly, the in-core DMP database should be updated for each device to map
the new device name to the appropriate device number. Due to a bug in the code,
the names were not mapped to the device numbers correctly, which
resulted in subsequent I/Os going to the wrong device, thus leading to data
corruption.

RESOLUTION:
The DMP routines responsible for mapping the names to the right device numbers
are modified to fix this corruption problem.

* 2695228 (Tracking ID: 2688747)

SYMPTOM:
Under a heavy I/O load on the logclient node, writes on the VVR Primary logowner
take a very long time to complete. Writes appear to be hung.

DESCRIPTION:
VVR cannot allow more than a specific number of I/Os (4096) outstanding on the
SRL volume. Any I/Os beyond this threshold are throttled. The throttled I/Os are
restarted periodically. While restarting, I/Os belonging to the logclient get
higher preference than logowner I/Os, which can eventually lead to starvation or
an I/O hang on the logowner.

RESOLUTION:
The algorithm that schedules restarted I/Os is changed to make sure that
throttled local I/Os get the chance to proceed under all conditions.

* 2701152 (Tracking ID: 2700486)

SYMPTOM:
If the VVR Primary and Secondary nodes have the same host-name, and there is a
loss of heartbeats between them, vradmind daemon can core-dump if an active
stats session already exists on the Primary node.

The following stack trace is observed:

pthread_kill()
_p_raise() 
raise.raise()
abort() 
__assert_c99
StatsSession::sessionInitReq()
StatsSession::processOpReq()
StatsSession::processOpMsgs()
RDS::processStatsOpMsg()
DBMgr::processStatsOpMsg()
process_message()
main()

DESCRIPTION:
On loss of heartbeats between the Primary and Secondary nodes, and a subsequent
reconnect, RVG information is sent to the Primary by Secondary node. In this 
case, if a Stats session already exists on the Primary, a STATS_SESSION_INIT 
request is sent back to the Secondary. However, the code was using "hostname" 
(as returned by `uname -a`) to identify the secondary node. Since both the 
nodes had the same hostname, the resulting STATS_SESSION_INIT request was 
received at the Primary itself, causing vradmind to core dump.

RESOLUTION:
Code was modified to use 'virtual host-name' information contained in the 
RLinks, rather than hostname(1m), to identify the secondary node. In a scenario 
where both Primary and Secondary have the same host-name, virtual host-names 
are used to configure VVR.

* 2702110 (Tracking ID: 2700792)

SYMPTOM:
vxconfigd, the VxVM volume configuration daemon, may dump core with the
following stack during the Cluster Volume Manager (CVM) startup with "hares
-online cvm_clus -sys [node]".

  dg_import_finish()
  dg_auto_import_all()
  master_init()
  role_assume()
  vold_set_new_role()
  kernel_get_cvminfo()
  cluster_check()
  vold_check_signal()
  request_loop()
  main()

DESCRIPTION:
During CVM startup, vxconfigd accesses the disk group record's pointer of a
pending record while the transaction on the disk group is in progress. At times,
vxconfigd incorrectly accesses the stale pointer while processing the current
transaction, thus resulting in a core dump.

RESOLUTION:
Code changes are made to access the appropriate pointer of the disk group record
which is active in the current transaction. Also, the disk group record is
appropriately initialized to NULL value.

* 2703370 (Tracking ID: 2700086)

SYMPTOM:
In the presence of "Not-Ready" EMC devices on the system, multiple dmp (path
disabled/enabled) event messages are seen in the syslog.

DESCRIPTION:
The issue is that vxconfigd enables the BCV devices which are in Not-Ready state
for IO as the SCSI inquiry succeeds, but soon finds that they cannot be used for
I/O and disables those paths. This activity takes place whenever "vxdctl enable"
or "vxdisk scandisks" command is executed.

RESOLUTION:
Avoid changing the state of the BCV device which is in "Not-Ready" to prevent IO
and dmp event messages.

* 2703373 (Tracking ID: 2698860)

SYMPTOM:
Mirroring a large VxVM volume that is configured on THIN LUNs and has a
mounted VxFS file system on top of it fails with the following error:

Command error
# vxassist -b -g $disk_group_name mirror $volume_name
VxVM vxplex ERROR V-5-1-14671 Volume <volume_name> is configured on THIN luns
and not mounted. Use 'force' option, to bypass smartmove. To take advantage of
smartmove for supporting thin luns, retry this operation after mounting the
volume.
VxVM vxplex ERROR V-5-1-407 Attempting to cleanup after failure ...

Truss output error:
statvfs("<mount_point>", 0xFFBFEB54)              Err#79 EOVERFLOW

DESCRIPTION:
The statvfs system call is invoked internally during the mirroring
operation to retrieve statistics of the VxFS file system hosted
on the volume. However, the statvfs system call supports a maximum of only
4294967295 (4G-1) blocks, so if the total number of file system blocks is
greater than that, an EOVERFLOW error occurs. This also results
in vxplex terminating with the errors.

RESOLUTION:
Use the 64-bit version of statvfs, i.e., the statvfs64 system call, to resolve
the EOVERFLOW and vxplex errors.
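
The overflow condition is plain arithmetic, as the following Python sketch
shows. The 1 KB block size and the file system sizes are hypothetical values
chosen for illustration.

```python
UINT32_MAX = 2**32 - 1  # 4294967295, the limit quoted above


def blocks_for(size_bytes, block_size):
    """Number of whole blocks in a file system of the given size."""
    return size_bytes // block_size


def fits_in_32bit_statvfs(total_blocks):
    """True if the block count can be reported in a 32-bit field."""
    return total_blocks <= UINT32_MAX


TB = 1024**4
small = blocks_for(2 * TB, 1024)   # 2^31 blocks
large = blocks_for(8 * TB, 1024)   # 2^33 blocks

print(small, fits_in_32bit_statvfs(small))
print(large, fits_in_32bit_statvfs(large))  # this case would hit EOVERFLOW
```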

* 2706036 (Tracking ID: 2617336)

SYMPTOM:
System panics when a root disk with a swap partition is encapsulated on a
Solaris 10 system with kernel patch 147440-04 installed.

DESCRIPTION:
Systems that are upgraded to Solaris 10 kernel patch 147440-04 and have an
encapsulated swap device panic recursively because a NULL pointer is passed to
vxioioctl from the new kernel routine 'swapify()'.

RESOLUTION:
The vxio driver will not access the disk IOCTL return value pointer when it is
set to NULL.

* 2711758 (Tracking ID: 2710579)

SYMPTOM:
Data corruption can be observed on a CDS (Cross-platform Data Sharing) disk
as part of operations like LUN resize, Disk FLUSH, Disk ONLINE, etc. The
following pattern would be found in the data region of the disk.

<DISK-IDENTIFICATION> cyl <number-of-cylinders> alt 2 hd <number-of-tracks> sec 
<number-of-sectors-per-track>

DESCRIPTION:
The CDS disk maintains a SUN VTOC in the zeroth block and a backup label at the
end of the disk. The VTOC maintains the disk geometry information like number of
cylinders, tracks and sectors per track. The backup label is the duplicate of
VTOC and the backup label location is determined from VTOC contents. If the
contents of the SUN VTOC located in the zeroth sector are incorrect, this may
result in the wrong calculation of the backup label location. If the wrongly
calculated backup label location falls in the public data region rather than the
end of the disk as designed, data corruption occurs.

RESOLUTION:
Suppressed writing the backup label to prevent the data corruption.

* 2713862 (Tracking ID: 2390998)

SYMPTOM:
When running the 'vxdctl' or 'vxdisk scandisks' command after migrating SAN
ports, the system panicked with the following stack trace:
.disable_lock()
dmp_close_path()
dmp_do_cleanup()
dmp_decipher_instructions()
dmp_process_instruction_buffer()
dmp_reconfigure_db()
gendmpioctl()
vxdmpioctl()

DESCRIPTION:
SAN port migration ends up with two path nodes for the same device number, one
of which is marked as NODE_DEVT_USED, meaning that the same device number has
been reused by another node. When the dmp device is opened, the actual open
count on the new node (not marked with NODE_DEVT_USED) is modified. If the
caller is referencing the old node (marked with NODE_DEVT_USED), it then
modifies the layered open count on the old node. This results in inconsistent
open reference counts on the nodes and causes a panic when the open counts are
checked while closing the dmp device.

RESOLUTION:
Code changes ensure that the actual open count and the layered open count are
modified on the same node when a dmp device is opened or closed.

* 2741105 (Tracking ID: 2722850)

SYMPTOM:
Disabling/enabling controllers while I/O is in progress results in a dmp
(Dynamic Multi-Pathing) thread hang with the following stack:

dmp_handle_delay_open
gen_dmpnode_update_cur_pri
dmp_start_failover
gen_update_cur_pri
dmp_update_cur_pri
dmp_process_curpri
dmp_daemons_loop

DESCRIPTION:
DMP takes an exclusive lock to quiesce a node that is to be failed over, and
releases the lock to do update operations. These update operations presume that
the node is in quiesced status. A small timing window exists between the lock
release and the update operations, in which other threads can break in and
unquiesce the node, which leads to the hang while performing update operations.

RESOLUTION:
Corrected the quiesce counter of a node so that other threads cannot unquiesce
it while a thread is performing update operations.

* 2744219 (Tracking ID: 2729501)

SYMPTOM:
In a Dynamic Multi-Pathing environment, excluding a path also excludes other
paths with matching substrings.

DESCRIPTION:
Excluding a path using "vxdmpadm exclude vxvm path=<>" excludes all the paths
with a matching substring. This is due to strncmp() being used for comparison.
Also, the size of the h/w path defined in the structure is more than what is
actually fetched.

RESOLUTION:
Correct the size of the h/w path in the structure and use strcmp() for
comparison in place of strncmp().
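
The difference between the two comparisons can be mimicked in Python. In this
sketch the helper functions emulate the C routines, the device names are
hypothetical, and the comparison length (the length of the requested path) is
an assumption; the actual length used by the original code is not stated here.

```python
def strncmp_equal(a, b, n):
    """Mimic C `strncmp(a, b, n) == 0`: compare at most n characters."""
    return a[:n] == b[:n]


def strcmp_equal(a, b):
    """Mimic C `strcmp(a, b) == 0`: the whole strings must match."""
    return a == b


paths = ["c1t0d1", "c1t0d10", "c1t0d11", "c2t0d1"]  # hypothetical names
target = "c1t0d1"

prefix_hits = [p for p in paths if strncmp_equal(p, target, len(target))]
exact_hits = [p for p in paths if strcmp_equal(p, target)]

print(prefix_hits)  # bounded comparison also matches the longer names
print(exact_hits)   # full comparison matches only the requested path
```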

* 2750453 (Tracking ID: 2439481)

SYMPTOM:
After doing live upgrade on an encapsulated disk with mirror, mirror disk entry 
does not get removed from the rootdg.

DESCRIPTION:
When the alternate disk (-d option) is specified as c#t#d# to the vxlustart
script, the mirrored disk entries are not removed from the rootdg.

The vxlustart script does not handle the case where the alternate disk is
specified in c#t#d# format while the DA/DM names of the disks do not resemble
c#t#d# format.

RESOLUTION:
Changes are made to enable specifying alt_disk in c#t#d# format with vxlustart
when the DA/DM names do not resemble c#t#d# format.

* 2750454 (Tracking ID: 2423701)

SYMPTOM:
Upgrade of VxVM caused change in permissions of /etc/vx/vxesd during live
upgrade from drwx------  to d---r-x---.

DESCRIPTION:
'/etc/vx/vxesd' directory gets shipped in VxVM with "drwx------" permissions.
However, while starting the vxesd daemon, if this directory is not present, it
gets created with "d---r-x---".

RESOLUTION:
Changes are made so that while starting vxesd daemon '/etc/vx/vxesd' gets
created with 'drwx------' permissions.
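
The two permission strings correspond to octal modes 0700 and 0050. The Python
sketch below decodes them and shows one way to create a directory with
owner-only permissions. The helper name is hypothetical, a temporary directory
stands in for /etc/vx/vxesd, and this is not the vxesd code itself.

```python
import os
import stat
import tempfile

# Decode the two modes mentioned above.
print(stat.filemode(stat.S_IFDIR | 0o700))  # drwx------
print(stat.filemode(stat.S_IFDIR | 0o050))  # d---r-x---


def ensure_private_dir(path):
    """Create `path` if missing and force owner-only permissions (0700)."""
    os.makedirs(path, mode=0o700, exist_ok=True)
    os.chmod(path, 0o700)  # explicit chmod so the umask cannot alter the result


with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "vxesd")  # stand-in for /etc/vx/vxesd
    ensure_private_dir(target)
    mode = stat.S_IMODE(os.stat(target).st_mode)
    print(oct(mode))
```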

* 2750458 (Tracking ID: 2370250)

SYMPTOM:
When vxlufinish script runs "fuser -k" on the list of filesystem obtained using
"lufslist" command, it fails with the following error:

$ ./vxlufinish -V -u 5.10

   VERITAS Volume Manager is finishing Live-Upgrade of OS release 5.10

/dev/dsk/c1t0d0s0:
/dev/vx/dsk/datadg/datavol:    29079o
/dev/vx/dsk/ocrvotedg/ocrvotevol:    26857o   24239o
# luumount -n dest.9041
# luactivate dest.9041
# lumount -n dest.9041 /altroot.5.10

   vxlufinish check is successful. Still you can expect
   errors in encapsulation or in luactivate because of incorrect
   installation. Now try running it with no -V option

$ Write failed: Broken pipe

DESCRIPTION:
The filesystems which are excluded during "lucreate" get mounted as loopback
file system (lofs) under Alternate Boot Environment (ABE). These lofs
filesystems basically point to actual special device files under Primary Boot
Environment (PBE). 
The subroutine "unmount_all"  runs "fuser -k" on lofs mounts. Hence the issue.

RESOLUTION:
The solution involves following steps-
1) generate a list(L1) of mount points from "/etc/mnttab" of PBE.
2) generate a list(L2) of lofs mount points using "/etc/mnttab" of ABE. 
3) do not run "fuser -k" on lofs mounts in L2.

* 2750462 (Tracking ID: 2553942)

SYMPTOM:
"vxlustart -k" fails for the option of auto registration.

DESCRIPTION:
The .volume.inf file for sol10u10Build17 contains the string
VI"SOL_10_811_SPARC" whereas for other updates of Solaris 10 the string is
VI"SOL_10_910_SPARC". The subroutine "chech_auto_registration" parses the year
and month from this string. "auto_reg_required" gets set on the basis of the
year and month.

RESOLUTION:
Changed the logic so that "auto_reg_required" gets set correctly.
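
A parsing approach that copes with both strings can be sketched in Python.
This sketch assumes the numeric field encodes a one- or two-digit month
followed by a two-digit year (811 as August 2011, 910 as September 2010); that
reading and the function name are assumptions, and the sketch does not
reproduce the vxlustart logic for setting "auto_reg_required".

```python
import re


def parse_release(volume_inf_line):
    """Return (year, month) parsed from a VI"SOL_10_<MMYY>_..." string.

    Assumes the last two digits are the year and the leading digits the month.
    Returns None if the string does not match.
    """
    m = re.search(r'SOL_10_(\d{1,2})(\d{2})_', volume_inf_line)
    if not m:
        return None
    month = int(m.group(1))
    year = 2000 + int(m.group(2))
    return year, month


print(parse_release('VI"SOL_10_811_SPARC"'))
print(parse_release('VI"SOL_10_910_SPARC"'))
```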

* 2752178 (Tracking ID: 2741240)

SYMPTOM:
In a VxVM environment, "vxdg join" when executed during heavy IO load fails
with the below message.

VxVM vxdg ERROR V-5-1-4597 vxdg join [source_dg] [target_dg] failed
join failed : Commit aborted, restart transaction
join failed : Commit aborted, restart transaction

Half of the disks that were part of source_dg will become part of target_dg 
whereas other half will have no DG details.

DESCRIPTION:
VxVM implements vxdg join as a two-phase transaction. If the transaction fails
after the first phase and during the second phase, half of the disks belonging
to source_dg become part of target_dg and the other half of the disks are left
in a complex, irrecoverable state. Also, under heavy IO, any limit on the number
of transaction retries can easily be exceeded.

RESOLUTION:
"vxdg join" is now designed as a one phase atomic transaction and the retry 
limit is eliminated.

* 2774907 (Tracking ID: 2771452)

SYMPTOM:
In a lossy, high-latency network, I/O hangs on the VVR primary. Just before the
I/O hang, the RLINK frequently connects and disconnects.

DESCRIPTION:
In a lossy, high-latency network, the RLINK gets disconnected because of
heartbeat timeouts. As part of the RLINK disconnect, the communication port is
deleted. During this process, the RVG is serialized and the I/Os are kept in a
special queue, rv_restartq. The I/Os in rv_restartq are supposed to be restarted
once the port deletion is successful.
The port deletion involves termination of all the communication server
processes. Because of a bug in the port deletion logic, the global variable that
keeps track of the number of communication server processes was decremented
twice. This caused the port deletion process to hang, so the I/Os in rv_restartq
were never restarted.

RESOLUTION:
The port deletion logic now makes sure that the global variable that keeps track
of the number of communication server processes is decremented correctly.

Patch ID: 142630-14

* 2583307 (Tracking ID: 2185069)

SYMPTOM:
In a CVR setup, while application I/Os are in progress on all nodes of the
primary, bringing down a slave node results in a panic on the master node with
the following stack trace:

 #0 [ffff8800282a3680] machine_kexec at ffffffff8103695b
 #1 [ffff8800282a36e0] crash_kexec at ffffffff810b8f08
 #2 [ffff8800282a37b0] oops_end at ffffffff814cbbd0
 #3 [ffff8800282a37e0] no_context at ffffffff8104651b
 #4 [ffff8800282a3830] __bad_area_nosemaphore at ffffffff810467a5
 #5 [ffff8800282a3880] bad_area_nosemaphore at ffffffff81046873
 #6 [ffff8800282a3890] do_page_fault at ffffffff814cd658
 #7 [ffff8800282a38e0] page_fault at ffffffff814caf45
    [exception RIP: vol_rv_async_childdone+876]
    RIP: ffffffffa080b7ac  RSP: ffff8800282a3990  RFLAGS: 00010006
    RAX: ffff8801ee8a5200  RBX: ffff8801f6e17200  RCX: ffff8802324290c0
    RDX: ffff8801f7c8fac8  RSI: 0000000000000009  RDI: ffff8801f7c8fac8
    RBP: ffff8800282a3a00   R8: ffff8801f38d8000   R9: 0000000000000001
    R10: 000000000000003f  R11: 000000000000000c  R12: ffff8801f2580000
    R13: ffff88021bdfa7c0  R14: ffff8801f7c8fa00  R15: ffff8801ed46a200
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff8800282a3a08] volsiodone at ffffffffa0672c3e
 #9 [ffff8800282a3a88] vol_subdisksio_done at ffffffffa06764a7
#10 [ffff8800282a3ac8] volkcontext_process at ffffffffa0642a59
#11 [ffff8800282a3b18] voldiskiodone at ffffffffa062f1c1
#12 [ffff8800282a3bc8] voldiskiodone_intr at ffffffffa062f3a2
#13 [ffff8800282a3bf8] voldmp_iodone at ffffffffa05f7806
#14 [ffff8800282a3c08] bio_endio at ffffffff811a0d3d
#15 [ffff8800282a3c18] gendmpiodone at ffffffffa059594a
#16 [ffff8800282a3c68] dmpiodone at ffffffffa0596cf2
#17 [ffff8800282a3cb8] bio_endio at ffffffff811a0d3d
#18 [ffff8800282a3cc8] req_bio_endio at ffffffff8123f7fb
#19 [ffff8800282a3cf8] blk_update_request at ffffffff8124083f
#20 [ffff8800282a3d58] blk_update_bidi_request at ffffffff81240ba7
#21 [ffff8800282a3d88] blk_end_bidi_request at ffffffff81241c7f
#22 [ffff8800282a3db8] blk_end_request at ffffffff81241d20
#23 [ffff8800282a3dc8] scsi_io_completion at ffffffff8134a42f
#24 [ffff8800282a3e48] scsi_finish_command at ffffffff81341812
#25 [ffff8800282a3e88] scsi_softirq_done at ffffffff8134aa3d
#26 [ffff8800282a3eb8] blk_done_softirq at ffffffff81247275
#27 [ffff8800282a3ee8] __do_softirq at ffffffff81073bd7
#28 [ffff8800282a3f58] call_softirq at ffffffff810142cc
#29 [ffff8800282a3f70] do_softirq at ffffffff81015f35
#30 [ffff8800282a3f90] irq_exit at ffffffff810739d5
#31 [ffff8800282a3fa0] smp_call_function_single_interrupt at ffffffff8102eab5
#32 [ffff8800282a3fb0] call_function_single_interrupt at ffffffff81013e33
--- <IRQ stack> ---
#33 [ffff8801f3ca9af8] call_function_single_interrupt at ffffffff81013e33
    [exception RIP: page_waitqueue+125]
    RIP: ffffffff8110b16d  RSP: ffff8801f3ca9ba8  RFLAGS: 00000213
    RAX: 0000000000000b9d  RBX: ffff8801f3ca9ba8  RCX: 0000000000000034
    RDX: ffff880000027d80  RSI: 0000000000000000  RDI: 00000000000003df
    RBP: ffffffff81013e2e   R8: ea00000000000000   R9: 5000000000000000
    R10: 0000000000000000  R11: ffff8801ecd0f268  R12: ffffea0006c13d40
    R13: 0000000000001000  R14: ffffffff8119d881  R15: ffff8801f3ca9b18
    ORIG_RAX: ffffffffffffff04  CS: 0010  SS: 0018
#34 [ffff8801f3ca9bb0] unlock_page at ffffffff8110c16a
#35 [ffff8801f3ca9bd0] blkdev_write_end at ffffffff811a3cd0
#36 [ffff8801f3ca9c00] generic_file_buffered_write at ffffffff8110c944
#37 [ffff8801f3ca9cd0] __generic_file_aio_write at ffffffff8110e230
#38 [ffff8801f3ca9d90] blkdev_aio_write at ffffffff811a339c
#39 [ffff8801f3ca9dc0] do_sync_write at ffffffff8116c51a
#40 [ffff8801f3ca9ef0] vfs_write at ffffffff8116c818
#41 [ffff8801f3ca9f30] sys_write at ffffffff8116d251
#42 [ffff8801f3ca9f80] sysenter_dispatch at ffffffff8104ca7f

DESCRIPTION:
The panic occurs because access to an internal data structure is not
properly serialized, which results in corruption of that data structure.

RESOLUTION:
Access to the internal data structure is now properly serialized so
that its contents are not corrupted under any scenario.
Patch ID: 142630-13

* 2440015 (Tracking ID: 2428170)

SYMPTOM:
I/O hangs when reading or writing to a volume after a total storage 
failure in CVM environments with Active-Passive arrays.

DESCRIPTION:
In the event of a storage failure in active-passive environments, the CVM-DMP
failover protocol is initiated. This protocol is responsible for coordinating
the failover of primary paths to secondary paths on all nodes in the cluster.
In the event of a total storage failure, where both the primary paths and
secondary paths fail, in some situations the protocol fails to clean up some
internal structures, leaving the devices quiesced.

RESOLUTION:
After a total storage failure all devices should be un-quiesced,
allowing the I/Os to fail. The CVM-DMP protocol has been changed to clean up
devices, even if all paths to a device have been removed.

* 2477272 (Tracking ID: 2169726)

SYMPTOM:
If a combination of cloned and non-cloned disks for a diskgroup is available at 
the time of import, then the diskgroup imported through vxdg import operation 
contains both cloned and non-cloned disks.

DESCRIPTION:
For a particular diskgroup, if some of the disks are not available during the
diskgroup import operation and the corresponding cloned disks are present, then
the diskgroup imported through the vxdg import operation contains a combination
of cloned and non-cloned disks.
Example -
A diskgroup named dg1 with the disks disk1 and disk2 exists on some machine.
Clones of these disks, named disk1_clone and disk2_clone, are also available.
If disk2 goes offline and the import for dg1 is performed, then the resulting
diskgroup will contain disks disk1 and disk2_clone.

RESOLUTION:
The diskgroup import operation will consider cloned disks only if no non-cloned 
disk is available.
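
The selection rule can be sketched as follows. This is one reading of the
resolution, not the vxdg implementation; the function name and the
(name, is_clone) tuple layout are assumptions made for illustration.

```python
def select_import_disks(available):
    """Pick the disks to import for one disk group.

    `available` is a list of (name, is_clone) tuples (hypothetical layout).
    Cloned disks are considered only if no non-cloned disk is available.
    """
    originals = [name for name, is_clone in available if not is_clone]
    if originals:
        return originals
    return [name for name, is_clone in available if is_clone]
```

In the dg1 example, with disk2 offline, the candidates are disk1, disk1_clone
and disk2_clone, and under this rule only disk1 is selected, so the imported
diskgroup no longer mixes cloned and non-cloned disks.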

* 2497637 (Tracking ID: 2489350)

SYMPTOM:
In a Storage Foundation environment running Symantec Oracle Disk Manager (ODM),
Veritas File System (VxFS), Cluster Volume Manager (CVM) and Veritas Volume
Replicator (VVR), kernel memory is leaked under certain conditions.

DESCRIPTION:
In CVR (CVM + VVR), under certain conditions (for example, when I/O throttling
gets enabled or the kernel messaging subsystem is overloaded), the I/O resources
allocated earlier are freed and the I/Os are restarted afresh. While freeing
the I/O resources, the VVR primary node does not free the kernel memory
allocated for the FS-VM private information data structure, causing a kernel
memory leak of 32 bytes for each restarted I/O.

RESOLUTION:
Code changes are made in VVR to free the kernel memory allocated for FS-VM
private information data structure before the I/O is restarted afresh.

* 2497796 (Tracking ID: 2235382)

SYMPTOM:
I/Os can hang in the DMP driver when I/Os are in progress during path
failover.

DESCRIPTION:
While restoring a failed path to a non-A/A LUN, the DMP driver checks whether
any I/Os are pending on the same dmpnode. If any are present, DMP marks the
corresponding LUN with a special flag so that path failover/failback can be
triggered by the pending I/Os. There is a window here: if all the pending I/Os
happen to return before the dmpnode is marked, then any future I/Os on the
dmpnode get stuck in wait queues.

RESOLUTION:
The flag is now set on the LUN only when the LUN has pending I/Os, so that
failover can be triggered by the pending I/Os.

* 2507120 (Tracking ID: 2438426)

SYMPTOM:
The following messages are displayed after vxconfigd is started.

pp_claim_device: Could not get device number for /dev/rdsk/emcpower0 
pp_claim_device: Could not get device number for /dev/rdsk/emcpower1

DESCRIPTION:
The Device Discovery Layer (DDL) incorrectly marked a path under a DMP device
with the EFI flag even though there is no corresponding Extensible Firmware
Interface (EFI) device in /dev/[r]dsk/. As a result, the Array Support Library
(ASL) issues a stat command on the non-existent EFI device and displays the
above messages.

RESOLUTION:
The EFI flag is no longer set on Dynamic Multi-Pathing (DMP) paths that
correspond to non-EFI devices.

* 2507124 (Tracking ID: 2484334)

SYMPTOM:
A system panic occurs with the following stack while collecting the DMP
stats.

dmp_stats_is_matching_group+0x314()
dmp_group_stats+0x3cc()
dmp_get_stats+0x194()
gendmpioctl()
dmpioctl+0x20()

DESCRIPTION:
Whenever new devices are added to the system, the stats table is adjusted to
accommodate the new devices in the DMP. There exists a race between the stats
collection thread and the thread which adjusts the stats table to accommodate
the new devices. The race can cause the stats collection thread to access
memory beyond the known size of the table, causing the system panic.

RESOLUTION:
The stats collection code in the DMP is rectified to restrict the access to the 
known size of the stats table.

* 2508294 (Tracking ID: 2419486)

SYMPTOM:
Data corruption is observed with a single path when the naming scheme is
changed from enclosure based (EBN) to OS Native (OSN).

DESCRIPTION:
Data corruption can occur in the following configuration when the naming
scheme is changed while applications are on-line.

1. The DMP device is configured with a single path, or the devices are
   controlled by a third-party multipathing driver (for example, MPXIO or MPIO).

2. The DMP device naming scheme is EBN (enclosure based naming) and
   persistence=yes.

3. The naming scheme is changed to OSN using the following command
   # vxddladm set namingscheme=osn


The name of a VxVM device (DA record) can change while the naming scheme is
changing. As a result, the device attribute list is updated with the new DMP
device names. Due to a bug in the code which updates the attribute list, the
VxVM device records are mapped to the wrong DMP devices.

Example:

Following are the device names with EBN naming scheme.

MAS-usp0_0   auto:cdsdisk    hitachi_usp0_0  prod_SC32    online
MAS-usp0_1   auto:cdsdisk    hitachi_usp0_4  prod_SC32    online
MAS-usp0_2   auto:cdsdisk    hitachi_usp0_5  prod_SC32    online
MAS-usp0_3   auto:cdsdisk    hitachi_usp0_6  prod_SC32    online
MAS-usp0_4   auto:cdsdisk    hitachi_usp0_7  prod_SC32    online
MAS-usp0_5   auto:none       -            -            online invalid
MAS-usp0_6   auto:cdsdisk    hitachi_usp0_1  prod_SC32    online
MAS-usp0_7   auto:cdsdisk    hitachi_usp0_2  prod_SC32    online
MAS-usp0_8   auto:cdsdisk    hitachi_usp0_3  prod_SC32    online
MAS-usp0_9   auto:none       -            -            online invalid
disk_0       auto:cdsdisk    -            -            online
disk_1       auto:none       -            -            online invalid

bash-3.00# vxddladm set namingscheme=osn

The following is the output after executing the above command.
MAS-usp0_9 is changed to MAS-usp0_6 and the following devices
are changed accordingly.

bash-3.00# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
MAS-usp0_0   auto:cdsdisk    hitachi_usp0_0  prod_SC32    online
MAS-usp0_1   auto:cdsdisk    hitachi_usp0_4  prod_SC32    online
MAS-usp0_2   auto:cdsdisk    hitachi_usp0_5  prod_SC32    online
MAS-usp0_3   auto:cdsdisk    hitachi_usp0_6  prod_SC32    online
MAS-usp0_4   auto:cdsdisk    hitachi_usp0_7  prod_SC32    online
MAS-usp0_5   auto:none       -            -            online invalid
MAS-usp0_6   auto:none       -            -            online invalid
MAS-usp0_7   auto:cdsdisk    hitachi_usp0_1  prod_SC32    online
MAS-usp0_8   auto:cdsdisk    hitachi_usp0_2  prod_SC32    online
MAS-usp0_9   auto:cdsdisk    hitachi_usp0_3  prod_SC32    online
c4t20000014C3D27C09d0s2 auto:none       -            -            online invalid
c4t20000014C3D26475d0s2 auto:cdsdisk    -            -            online

RESOLUTION:
Code changes are made to update the device attribute list correctly even if the
name of the VxVM device changes while the naming scheme is changing.

* 2508418 (Tracking ID: 2390431)

SYMPTOM:
In a Disaster Recovery environment, when the DCM (Data Change Map) is active,
the system panics during an SRL (Storage Replicator Log)/DCM flush due to a
missing parent on one of the DCMs in an RVG (Replicated Volume Group).

DESCRIPTION:
A DCM flush happens during every log update, so its frequency depends on the
I/O load. If the I/O load is high, DCM flushes happen very often, and the more
volumes there are in the RVG, the higher the frequency. Every DCM flush
triggers a DCM flush on all the volumes in the RVG. If there are 50 volumes in
an RVG, each DCM flush creates 50 children controlled by one parent SIO. Once
all 50 children are done, the parent SIO releases itself for the next flush.
When the DCM flush of a child completes, the child detaches itself from the
parent by setting its parent field to NULL. A race can occur as follows: the
49th child is done but, before it detaches itself from the parent, the 50th
child completes and releases the parent SIO for the next DCM flush. Before the
49th child detaches, the new DCM flush is started on the same 50th child. After
the next flush is started, the 49th child of the previous flush detaches itself
from the parent and, since it is a static SIO, it indirectly resets the parent
field of the new flush. Also, in a few scenarios the lock is not obtained
before modifying the SIO state field.

RESOLUTION:
Before reducing the children count, detach the parent first. This will make 
sure the new flush will not race with the previous flush. Protect the field 
with the required lock in all the scenarios.

* 2511928 (Tracking ID: 2420386)

SYMPTOM:
Corrupted data is seen near the end of a sub-disk, on thin-reclaimable 
disks with either CDS EFI or sliced disk formats.

DESCRIPTION:
In environments with thin-reclaim disks running with either CDS-EFI 
disks or sliced disks, misaligned reclaims can be initiated. In some situations, 
when reclaiming a sub-disk, the reclaim does not take into account the correct 
public region start offset, which in rare instances can potentially result in 
reclaiming data before the sub-disk which is being reclaimed.

RESOLUTION:
The public offset is taken into account when initiating all reclaim
operations.
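
The effect of omitting the public region offset can be shown with simple
arithmetic. This is only a sketch: the function, its parameters and the sample
offsets are hypothetical, and it assumes that sub-disk offsets are relative to
the start of the public region.

```python
def reclaim_range(public_offset, sd_offset, sd_length, use_public_offset=True):
    """Return the (start, end) device offsets of a sub-disk reclaim.

    All values are in sectors. The sub-disk offset is assumed to be relative
    to the public region, so the device offset must add `public_offset`.
    """
    base = public_offset if use_public_offset else 0
    start = base + sd_offset
    return start, start + sd_length


# Sample values chosen for illustration only.
correct = reclaim_range(public_offset=65792, sd_offset=2048, sd_length=4096)
misaligned = reclaim_range(65792, 2048, 4096, use_public_offset=False)
```

Without the public offset the computed range starts earlier on the device than
the sub-disk being reclaimed, which matches the symptom of data being reclaimed
before the sub-disk.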

* 2515137 (Tracking ID: 2513101)

SYMPTOM:
When VxVM is upgraded from 4.1MP4RP2 to 5.1SP1RP1, the data on CDS disk gets
corrupted.

DESCRIPTION:
When CDS disks are initialized with VxVM version 4.1MP4RP2, the number of
cylinders is calculated based on the raw disk geometry. If the calculated
number of cylinders exceeds the Solaris VTOC limit (65535), then because of
unsigned integer overflow a truncated value for the number of cylinders gets
written in the CDS label.
After VxVM is upgraded to 5.1SP1RP1, the CDS label gets wrongly written in
the public region, leading to the data corruption.

RESOLUTION:
Code changes are made to suitably adjust the number of tracks and heads so that
the calculated number of cylinders is within the Solaris VTOC limit.
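
The truncation and one possible adjustment can be sketched as follows. The
16-bit field width is inferred from the 65535 limit, and the strategy of
doubling the number of heads is an assumption made for illustration; the
actual adjustment in VxVM may differ.

```python
VTOC_MAX_CYLINDERS = 65535  # Solaris VTOC limit quoted above


def cylinders(total_sectors, heads, sectors_per_track):
    """Number of whole cylinders for the given geometry."""
    return total_sectors // (heads * sectors_per_track)


def truncate_u16(value):
    """Value actually stored when written into a 16-bit unsigned field."""
    return value & 0xFFFF


def adjust_geometry(total_sectors, heads, sectors_per_track):
    """Hypothetical fix: double the heads until the cylinder count fits."""
    while cylinders(total_sectors, heads, sectors_per_track) > VTOC_MAX_CYLINDERS:
        heads *= 2
    return heads, sectors_per_track


total = 70000 * 16 * 128          # a disk with 70000 cylinders at 16 x 128
raw = cylinders(total, 16, 128)   # exceeds the VTOC limit
stored = truncate_u16(raw)        # truncated value that lands in the label
fixed = adjust_geometry(total, 16, 128)
```

In this example the raw count of 70000 cylinders is stored as 4464 after
truncation, while the adjusted geometry of 32 heads gives 35000 cylinders,
which fits within the limit.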

* 2525333 (Tracking ID: 2148851)

SYMPTOM:
"vxdisk resize" operation fails on a disk with VxVM cdsdisk/simple/sliced layout
on Solaris/Linux platform with the following message:

      VxVM vxdisk ERROR V-5-1-8643 Device emc_clariion0_30: resize failed: New
      geometry makes partition unaligned

DESCRIPTION:
The new cylinder size selected during "vxdisk resize" operation is unaligned with
the partitions that existed prior to the "vxdisk resize" operation.

RESOLUTION:
The algorithm to select the new geometry has been redesigned such that the new
cylinder size is always aligned with the existing as well as new partitions.

* 2531983 (Tracking ID: 2483053)

SYMPTOM:
The VVR Primary system consumes a very large amount of kernel heap memory and
appears to be hung.

DESCRIPTION:
There is a race between the REGION LOCK deletion thread, which runs as
part of SLAVE leave reconfiguration, and the thread which processes the
DATA_DONE message coming from the log client to the logowner. Because of this
race, the flags which store the status information about the I/Os were not
correctly updated. This caused a lot of SIOs to be stuck in a queue, consuming
a large amount of kernel heap.

RESOLUTION:
The code changes are made to take the proper locks while updating 
the SIOs' fields.

* 2531987 (Tracking ID: 2510523)

SYMPTOM:
In CVM-VVR configuration, I/Os on "master" and "slave" nodes hang when "master"
role is switched to the other node using "vxclustadm setmaster" command.

DESCRIPTION:
Under heavy I/O load, the I/Os are sometimes throttled in VVR, if number of
outstanding I/Os on SRL reaches a certain limit (2048 I/Os).
When "master" role is switched to the other node by using "vxclustadm setmaster"
command, the throttled I/Os on original master are never restarted. This causes
the I/O hang.

RESOLUTION:
Code changes are made in VVR to make sure the throttled I/Os are restarted
before "master" switching is started.

* 2531993 (Tracking ID: 2524936)

SYMPTOM:
The disk group is disabled after rescanning disks with the "vxdctl enable"
command, with the console output below:


 <timestamp> pp_claim_device:         0 
 <timestamp> Could not get metanode from ODM database  
 <timestamp> pp_claim_device:         0 
 <timestamp> Could not get metanode from ODM database  

The error messages below are also seen in the vxconfigd debug log output:
              
<timestamp>  VxVM vxconfigd ERROR V-5-1-12223 Error in claiming /dev/<disk>: The 
process file table is full. 
<timestamp>  VxVM vxconfigd ERROR V-5-1-12223 Error in claiming /dev/<disk>: The 
process file table is full. 
...
<timestamp> VxVM vxconfigd ERROR V-5-1-12223 Error in claiming /dev/<disk>: The 
process file table is full.

DESCRIPTION:
When the total physical memory in an AIX machine is greater than or equal to
40 GB and is a multiple of 40 GB (like 80 GB, 120 GB), a limitation/bug in the
setulimit function causes an overflowed value to be set as the new limit/size
of the data area, which results in memory allocation failures in vxconfigd.
Creation of the shared memory segment also fails as a result. Error handling
for this case is missing in the vxconfigd code, which results in errors in
claiming disks and in offlining of configuration copies, which in turn results
in the disk group being disabled.

RESOLUTION:
Code changes are made to handle the failure case on shared memory segment
creation.

* 2552402 (Tracking ID: 2432006)

SYMPTOM:
The system intermittently hangs during boot if the disk is encapsulated.
When this problem occurs, the OS boot process stops after outputting this:
"VxVM sysboot INFO V-5-2-3409 starting in boot mode..."

DESCRIPTION:
The boot process hangs due to a deadlock between two threads: a VxVM
transaction thread and another thread attempting a read on the root volume
issued by dhcpagent. The read I/O is deferred until the transaction is finished,
but the read count incremented earlier is not properly adjusted.

RESOLUTION:
The pending read count is now decremented if the read I/O is deferred.

* 2553391 (Tracking ID: 2536667)

SYMPTOM:
The system panics with the following stack trace:

[04DAD004]voldiodone+000C78 (F10000041116FA08)
[04D9AC88]volsp_iodone_common+000208 (F10000041116FA08, 0000000000000000,
  0000000000000000)
[04B7A194]volsp_iodone+00001C (F10000041116FA08) 
[000F3FDC]internal_iodone_offl+0000B0 (??, ??) 
[000F3F04]iodone_offl+000068 () 
[000F20CC]i_softmod+0001F0 () 
[0017C570].finish_interrupt+000024 ()

DESCRIPTION:
The panic occurs because a stale DG pointer is accessed: the DG was deleted
before the I/O returned. This may happen in a cluster configuration where
commands generating private region I/Os and "vxdg deport/delete" commands are
executing simultaneously on two nodes of the cluster.

RESOLUTION:
Code changes are made to drain private region I/Os before deleting the DG.

* 2562911 (Tracking ID: 2375011)

SYMPTOM:
User is not able to change the "dmp_native_support" tunable to "on" or "off"
in the presence of the root ZFS pool.

DESCRIPTION:
DMP does not allow the dmp_native_support tunable to be changed if any of the
ZFS pools is in use. Therefore, in the presence of a root ZFS pool, DMP reports
the following error when the user tries to change the "dmp_native_support"
tunable to "on" or "off":

# vxdmpadm settune dmp_native_support=off
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more zpools
VxVM vxdmpadm ERROR V-5-1-15686 The following zpool(s) could not be migrated as
they are in use -
     rpool

RESOLUTION:
DMP code has been changed to skip the root ZFS pool in its internal checks for
active ZFS pools prior to changing the value of dmp_native_support tunable.

* 2563291 (Tracking ID: 2527289)

SYMPTOM:
In a Campus Cluster setup, a storage fault may lead to DETACH of all the
configured sites. This also results in I/O failure on all the nodes in the
Campus Cluster.

DESCRIPTION:
Site detaches are done on site-consistent dgs when any volume in the dg loses
all the mirrors of a site. During the processing of the DETACH of the last
mirror in a site, we identify that it is the last mirror and DETACH the site,
which in turn detaches all the objects of that site.

In Campus Cluster setup we attach a dco volume for any data volume created on a
site-consistent dg. The general configuration is to have one DCO mirror on each
site. Loss of a single mirror of the dco volume on any node will result in the
detach of that site. 

In a 2 site configuration this particular scenario would result in both the dco
mirrors being lost simultaneously. While the site detach for the first mirror is
being processed we also signal for DETACH of the second mirror which ends up
DETACHING the second site too. 

This is not hit in other tests as we already have a check to make sure that we
do not DETACH the last mirror of a Volume. This check is being subverted in this
particular case due to the type of storage failure.

RESOLUTION:
Before triggering the site detach, an explicit check is now made to see whether
the site being DETACHED is the last ACTIVE site.
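
A minimal sketch of such a guard is shown below. The function name and the
representation of sites as a set of names are assumptions made for
illustration and do not reflect the actual VxVM data structures.

```python
def should_detach_site(site, active_sites):
    """Allow a site detach only if another ACTIVE site would remain.

    `active_sites` is a set of site names currently ACTIVE (hypothetical).
    """
    if site not in active_sites:
        return False  # nothing to detach
    remaining = set(active_sites) - {site}
    return len(remaining) > 0
```

In a two-site configuration, the first detach is allowed and leaves one ACTIVE
site. A second detach request arriving for the remaining site is refused, so
the last ACTIVE site stays attached.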

* 2574840 (Tracking ID: 2344186)

SYMPTOM:
In a master-slave configuration with FMR3/DCO volumes, a rebooted cluster node
fails to rejoin the cluster, with the following error messages on the console:

[..]
Jul XX 18:44:09 vienna vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-11092 
cleanup_client: (Volume recovery in progress) 230
Jul XX 18:44:09 vienna vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-11467 
kernel_fail_join() :                Reconfiguration interrupted: Reason is 
retry to add a node failed (13, 0)
[..]

DESCRIPTION:
VxVM volumes with FMR3/DCO have an inbuilt DRL mechanism to track the disk
blocks of in-flight I/Os in order to recover the data much more quickly in case
of a node crash. Thus, a joining node waits for the variable responsible for
recovery to be unset before it joins the cluster. However, due to a bug in the
FMR3/DCO code, this variable remained set forever, leading to the node join
failure.

RESOLUTION:
Modified the FMR3/DCO code to appropriately set and unset this recovery variable.

Patch ID: 142630-12

* 2169348 (Tracking ID: 2094672)

SYMPTOM:
The master node hangs under heavy I/O during a node reconfiguration caused by
a node leaving the cluster.

DESCRIPTION:
The reconfiguration is stuck because the I/O is not drained completely. The
master node is responsible for handling the I/O for both the primary and the
slave. When the slave node dies, the pending slave I/O on the master node is
not cleaned up properly. This leaves some I/Os in the queue undeleted.

RESOLUTION:
The I/O is now cleaned up during the node failure and reconfiguration scenario.

* 2169372 (Tracking ID: 2108152)

SYMPTOM:
vxconfigd, the VxVM volume configuration daemon, fails to get into enabled mode
at startup, and the "vxdctl enable" command displays the error "VxVM vxdctl
ERROR V-5-1-1589 enable failed: Error in disk group configuration copies ".

DESCRIPTION:
vxconfigd issues input/output control system call (ioctl) to read the disk
capacity from disks. However, if it fails, the error number is not propagated
back to vxconfigd. The subsequent disk operations to these failed devices were
causing vxconfigd to get into disabled mode.

RESOLUTION:
The fix is made to propagate the actual "error number" returned by the ioctl
failure back to vxconfigd.

* 2198041 (Tracking ID: 2196918)

SYMPTOM:
When creating a space-optimized snapshot by specifying the cache-object size
either as a percentage of the volume size or as an absolute size, the snapshot
creation can fail with an error similar to the following:
"VxVM vxassist ERROR V-5-1-10127 creating volume snap-dvol2-CV01:
        Volume or log length violates disk group alignment"

DESCRIPTION:
VxVM expects all virtual storage objects to have size aligned to a
value which is set diskgroup-wide. One can get this value with:
# vxdg list testdg|grep alignment
alignment: 8192 (bytes)

When the cachesize is specified as a percentage, the value might not align with
the dg alignment. If it is not aligned, the creation of the cache-volume could
fail with the specified error message.

RESOLUTION:
After computing the cache-size from specified percentage value, it
is aligned up to the diskgroup alignment value before trying to create the
cache-volume.
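
The rounding step can be sketched as follows. The function names are
hypothetical; the default alignment of 8192 bytes is taken from the sample
"vxdg list" output above, and the sizes are assumed to be in bytes.

```python
def align_up(size, alignment):
    """Round `size` up to the next multiple of `alignment`."""
    return ((size + alignment - 1) // alignment) * alignment


def cache_size_from_percent(volume_size, percent, alignment=8192):
    """Compute the cache size from a percentage, then align it up."""
    raw = volume_size * percent // 100
    return align_up(raw, alignment)
```

For example, 10 percent of a 1000000-byte volume is 100000 bytes, which is not
a multiple of 8192. Aligning up gives 106496 bytes (13 x 8192), which satisfies
the diskgroup alignment.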

* 2204146 (Tracking ID: 2200670)

SYMPTOM:
Some disks are left detached and not recovered by vxattachd.

DESCRIPTION:
If the shared disk group is not imported or the node is not part of the
cluster when storage connectivity to the failed node is restored, the vxattachd
daemon is not notified about the storage connectivity restore and does not
trigger a reattach. Even if the disk group is later imported or the node is
joined to the CVM cluster, the disks are not automatically reattached.

RESOLUTION:
i) Missing events for a deported diskgroup: The fix handles this by
listening to the import event of the diskgroup and triggering the brute-force
recovery for that specific diskgroup.
ii) Parallel recovery of volumes from the same disk: vxrecover automatically
serializes the recovery of objects that are from the same disk to avoid
back-and-forth head movements. An option is also provided in vxattachd and
vxrecover to control the number of parallel recoveries that can happen for
objects from the same disk.

* 2205859 (Tracking ID: 2196480)

SYMPTOM:
Initialization of VxVM cdsdisk layout fails on a disk of size less than 1 TB.

DESCRIPTION:
The disk geometry is derived to fabricate the cdsdisk label during the
initialization of VxVM cdsdisk layout on a disk of size less than 1 TB. The
disk geometry was violating one of the following requirements:

(1) Cylinder size is aligned with 8 KB.
(2) Number of cylinders is less than 2^16.
(3) The last sector in the device is not included in the last cylinder.
(4) Number of heads is less than 2^16.
(5) Tracksize is less than 2^16.

RESOLUTION:
The issue has been resolved by making sure that the disk geometry used in 
fabricating the cdsdisk label satisfies all the five requirements described 
above.
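
The five requirements can be expressed as a simple check. This is only a
sketch: the function and key names are hypothetical, a 512-byte sector is
assumed, tracksize is interpreted as sectors per track, and requirement (3) is
interpreted as the cylinders covering at most all sectors except the last one.

```python
def check_cds_geometry(total_sectors, cylinders, heads, sectors_per_track,
                       sector_size=512):
    """Return a dict of requirement name -> True if satisfied."""
    cyl_sectors = heads * sectors_per_track
    return {
        # (1) cylinder size is aligned with 8 KB
        "cyl_size_8k_aligned": (cyl_sectors * sector_size) % 8192 == 0,
        # (2) number of cylinders is less than 2^16
        "cylinders_lt_2_16": cylinders < 2 ** 16,
        # (3) last sector of the device lies outside the last cylinder
        "last_sector_excluded": cylinders * cyl_sectors <= total_sectors - 1,
        # (4) number of heads is less than 2^16
        "heads_lt_2_16": heads < 2 ** 16,
        # (5) tracksize is less than 2^16
        "tracksize_lt_2_16": sectors_per_track < 2 ** 16,
    }


good = check_cds_geometry(1048576, 1023, 16, 64)
bad = check_cds_geometry(1048576, 1024, 16, 64)
```

Here the geometry with 1023 cylinders passes all five checks, while 1024
cylinders of 1024 sectors each would cover every sector of the device and so
fails the check for requirement (3).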

* 2211971 (Tracking ID: 2190020)

SYMPTOM:
Under heavy I/O load, dmp_deamon requests 1 megabyte of contiguous memory,
which in turn slows down the system due to continuous page swapping.
 
DESCRIPTION:
dmp_deamon keeps calculating statistical information (every 1 second by
default). When the I/O load is high, the I/O statistics buffer allocation code
path dynamically allocates approximately 1 megabyte of contiguous memory per
CPU.
 
RESOLUTION:
To avoid repeated memory allocation/free calls in every DMP I/O stats daemon
interval, a two-buffer strategy was implemented for storing DMP stats records.
Two buffers of the same size are allocated at the beginning. One of the buffers
is used for writing active records while the other is read by the I/O stats
daemon. The two buffers are swapped every stats daemon interval.
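
The two-buffer strategy can be sketched as follows. This is a simplified,
single-threaded illustration with hypothetical names; it omits the locking
that a real kernel implementation needs around the swap.

```python
class DoubleBufferedStats:
    """Two fixed-size buffers allocated once and swapped every interval."""

    def __init__(self, size):
        self._active = [0] * size    # written by the I/O path
        self._standby = [0] * size   # read by the stats daemon

    def record(self, index, nbytes):
        """Account an I/O of `nbytes` against slot `index`."""
        self._active[index] += nbytes

    def swap_and_read(self):
        """Called once per stats interval: swap buffers, return a snapshot."""
        self._active, self._standby = self._standby, self._active
        snapshot = list(self._standby)
        # Clear the buffer that was just read so it is ready for reuse.
        for i in range(len(self._standby)):
            self._standby[i] = 0
        return snapshot
```

Because both buffers are allocated once in the constructor, no allocation or
free call is needed in the per-interval path, which is the point of the
resolution above.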

* 2215263 (Tracking ID: 2215262)

SYMPTOM:
A NetApp iSCSI LUN goes into error state while being initialized via VEA GUI.

DESCRIPTION:
VEA (vmprovider) calls the fstyp command to check the file system type
configured on the device before initializing it. fstyp sends an unsupported
pass-through ioctl to the DMP device, which causes an APM-specific function to
be called to check the path state of the device. The path state checking
function sends a SCSI inquiry command to get the device state, but the inquiry
returns an unexpected error because the memory allocated in the path state
checking function is not aligned. As a result, the path goes into the disabled
state.

RESOLUTION:
The memory allocation method in the NetApp APM path state checking function has
been fixed so that the start address of the memory is aligned. The error
analyzing function in the NetApp APM used the same memory allocation method and
has been fixed as well.

* 2215376 (Tracking ID: 2215256)

SYMPTOM:
Volume Manager is unable to recognize the devices connected through F5100 HBA

DESCRIPTION:
During device discovery, Volume Manager does not scan the LUNs that are
connected through a SAS HBA (F5100 is a new SAS HBA). As a result, commands
like 'vxdisk list' do not show the LUNs that are connected through the F5100
HBA.

RESOLUTION:
Modified the device discovery code in volume manager to include the paths/luns
that are connected through SAS HBA.

* 2220064 (Tracking ID: 2228531)

SYMPTOM:
Vradmind hangs in vol_klog_lock() on VVR (Veritas Volume Replicator) Secondary 
site.
Stack trace might look like:

genunix:cv_wait+0x38()
vxio:vol_klog_lock+0x5c()
vxio:vol_mv_close+0xc0()
vxio:vol_close_object+0x30()
vxio:vol_object_ioctl+0x198()
vxio:voliod_ioctl()
vxio:volsioctl_real+0x2d4()
specfs:spec_ioctl()
genunix:fop_ioctl+0x20()
genunix:ioctl+0x184()
unix:syscall_trap32+0xcc()

DESCRIPTION:
In this scenario, a flag value should be set for vradmind to be signalled and 
woken up. As the flag value is not set here, it causes an enduring sleep. A race 
condition exists between setting and resetting of the flag values, resulting in 
the hang.

RESOLUTION:
Code changes are made to hold a lock to avoid the race condition between 
setting and resetting of the flag values.

* 2227945 (Tracking ID: 2226304)

SYMPTOM:
On the Solaris 9 platform, newfs(1M)/mkfs_ufs(1M) cannot create a ufs file
system on a VxVM volume larger than 1 terabyte (TB), and displays the following
error:

# newfs /dev/vx/rdsk/<diskgroup name>/<volume>
newfs: construct a new file system /dev/vx/rdsk/<diskgroup name>/<volume>: 
(y/n)? y
Can not determine partition size: Inappropriate ioctl for device

# prtvtoc /dev/vx/rdsk/<diskgroup name>/<volume>
prtvtoc: /dev/vx/rdsk/<diskgroup name>/<volume>: Unknown problem reading VTOC

DESCRIPTION:
newfs(1M)/mkfs_ufs(1M) invokes the DKIOCGETEFI ioctl. During the enhancement of
EFI support on Solaris 10 in 5.0MP3RP3 or later, the DKIOCGETEFI ioctl
functionality was not implemented on Solaris 9 because of the following
limitations:

1. The EFI feature was not present in Solaris 9 FCS. It was introduced in
   Solaris 9 U3 (4/03), which includes 114127-03 (libefi) and 114129-02
   (libuuid and efi/uuid headers).

2. During the enhancement of EFI support on Solaris 10, for Solaris 9 the
   DKIOCGVTOC ioctl was supported only on volumes <= 1 TB, since the VTOC
   specification was defined only for a LUN/volume <= 1 TB. If the size of
   the volume is > 1 TB, the DKIOCGVTOC ioctl would return an inaccurate vtoc
   structure due to value overflow.

RESOLUTION:
The resolution is to enhance the VxVM code to handle DKIOCGETEFI ioctl 
correctly on VxVM volume on Solaris 9 platform. When newfs(1M)/mkfs_ufs(1M) 
invokes DKIOCGETEFI ioctl on a VxVM volume device, VxVM shall return the 
relevant EFI label information so that the UFS utilities can determine the 
volume size correctly.

* 2232052 (Tracking ID: 2230716)

SYMPTOM:
While trying to convert from SVM to VxVM, the user runs doconvert. The
conversion process does not report any error. However, after the host is
rebooted, the conversion is not completed and no diskgroup is created.

DESCRIPTION:
After /opt/VRTSvxvm/vmcvt/bin/doconvert is executed and the host is rebooted,
the conversion does not complete. This happens because the /etc/lvm/md.cf file
is not cleared of all metadevices. Upon reboot, VxVM tries to initialize the
disk and create the diskgroup, which fails with the following error:

"VxVM vxdisk ERROR V-5-1-15395 Device disk_1 is already in use by SVM.
If you want to initialize this device for VxVM use, please clear SVM metadata
by running 'metastat' and 'metaclear' commands."

The above error is seen in the svc log file.

RESOLUTION:
The conversion process now clears the SVM metadevices using metaclear.

* 2232829 (Tracking ID: 2232789)

SYMPTOM:
With NetApp metro cluster disk arrays, takeover operations (toggling of LUN
ownership within NetApp filer) can lead to IO failures on VxVM volumes.

Example of an IO error message at VxVM
VxVM vxio V-5-0-2 Subdisk disk_36-03 block 24928: Uncorrectable write error

DESCRIPTION:
During the takeover operation, the array fails the PGR and IO SCSI commands on
secondary paths with the following transient error codes - 0x02/0x04/0x0a
(NOT READY/LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE TRANSITION) or
0x02/0x04/0x01 (NOT READY/LOGICAL UNIT IS IN PROCESS OF BECOMING READY) -  
that are not handled properly within VxVM.

RESOLUTION:
Added the required logic to the APM so that SCSI commands that fail with
transient errors are retried for the duration of the NetApp filer
reconfiguration time (60 seconds) before the I/Os on VxVM volumes are failed.
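
The retry behavior described above can be sketched as follows. This is only an
illustration, not the APM code: the function name, the send() callback, and the
one-second delay are assumptions. The two sense triples and the 60-second
window come from the text above.

```python
import time

# (sense key, ASC, ASCQ) triples listed in the DESCRIPTION above.
TRANSIENT_SENSE = {
    (0x02, 0x04, 0x0A),  # NOT READY, asymmetric access state transition
    (0x02, 0x04, 0x01),  # NOT READY, in process of becoming ready
}


def issue_with_retry(send, window_secs=60.0, delay_secs=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Retry a command while it fails with a transient error.

    send() is a hypothetical callable that returns None on success or a
    (key, asc, ascq) tuple on failure. Returns True on success and False
    on a non-transient error or when the retry window has expired.
    """
    deadline = clock() + window_secs
    while True:
        sense = send()
        if sense is None:
            return True                # command succeeded
        if sense not in TRANSIENT_SENSE:
            return False               # hard error: fail immediately
        if clock() >= deadline:
            return False               # window expired: fail the I/O
        sleep(delay_secs)
```

The clock and sleep parameters are injectable only so that the loop can be
exercised without waiting for the full window.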

* 2234292 (Tracking ID: 2152830)

SYMPTOM:
Storage administrators sometimes create multiple copies/clones of the same 
device. Diskgroup import fails with a non-descriptive error message when
multiple copies (clones) of the same device exist and the original device(s)
are either offline or not available.

# vxdg import mydg
VxVM vxdg ERROR V-5-1-10978 Disk group mydg: import failed: 
No valid disk found containing disk group

DESCRIPTION:
If the original devices are offline or unavailable, vxdg import picks
up cloned disks for import. DG import fails by design unless the clones
are tagged and the tag is specified during DG import. While the import
failure is expected, the error message is non-descriptive and
does not tell the user what corrective action to take.

RESOLUTION:
A fix has been added to give a correct error message when duplicate clones
exist during import. Also, details of the duplicate clones are reported in
the syslog.

Example:

[At CLI level]
# vxdg import testdg             
VxVM vxdg ERROR V-5-1-10978 Disk group testdg: import failed:
DG import duplcate clone detected

[In syslog]
vxvm:vxconfigd: warning V-5-1-0 Disk Group import failed: Duplicate clone disks are
detected, please follow the vxdg (1M) man page to import disk group with
duplicate clone disks. Duplicate clone disks are: c2t20210002AC00065Bd0s2 :
c2t50060E800563D204d1s2  c2t50060E800563D204d0s2 : c2t50060E800563D204d1s2

* 2241149 (Tracking ID: 2240056)

SYMPTOM:
'vxdg move/split/join' may fail during high I/O load.

DESCRIPTION:
During heavy I/O load, the 'dg move' transaction may fail because of an 
open/close assertion, and the transaction is retried. Because the retry limit 
is set to 30, 'dg move' fails if the retries reach the limit.

RESOLUTION:
The default transaction retry limit is changed to unlimited, and a new option 
is introduced for 'vxdg move/split/join' to set the transaction retry limit:

vxdg [-f] [-o verify|override] [-o expand] [-o transretry=retrylimit] move 
src_diskgroup dst_diskgroup objects ...

vxdg [-f] [-o verify|override] [-o expand] [-o transretry=retrylimit] split 
src_diskgroup dst_diskgroup objects ...

vxdg [-f] [-o verify|override] [-o transretry=retrylimit] join src_diskgroup 
dst_diskgroup
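
The effect of the retry limit can be sketched as follows. This is a minimal
illustration and not vxdg code: run_transaction() and the attempt() callback
are hypothetical names. A limit of None models the new unlimited default, and
an integer models the value passed with '-o transretry=retrylimit'.

```python
def run_transaction(attempt, retry_limit=None):
    """Retry attempt() until it returns True.

    retry_limit=None means retry without limit (the new default).
    Returns the number of retries that were needed, or raises
    RuntimeError when the configured limit is reached.
    """
    retries = 0
    while not attempt():
        if retry_limit is not None and retries >= retry_limit:
            raise RuntimeError("transaction retry limit reached")
        retries += 1
    return retries
```

With the old fixed limit of 30, a transaction that needs 40 retries under heavy
I/O fails. With no limit it eventually completes.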

* 2253269 (Tracking ID: 2263317)

SYMPTOM:
vxdg(1M) man page does not clearly describe diskgroup import and destroy 
operations for the case in which original diskgroup is destroyed and cloned 
disks are present.

DESCRIPTION:
Diskgroup import with dgid is considered a recovery operation. Therefore, 
when importing with dgid, even though the original diskgroup is destroyed, both 
the original and the cloned disks are considered available disks. Hence, the 
original diskgroup is imported in such a scenario.
The existing vxdg(1M) man page does not clearly describe this scenario.

RESOLUTION:
Modified the vxdg(1M) man page to clearly describe the scenario.

* 2256728 (Tracking ID: 2248730)

SYMPTOM:
The command hangs if "vxdg import" is called from a script with STDERR
redirected.

DESCRIPTION:
If a script calls "vxdg import" with STDERR redirected, the script does not
finish until the DG import and recovery are finished. The pipe between the
script and vxrecover is not closed properly, which keeps the calling script
waiting for vxrecover to complete.

RESOLUTION:
Closed STDERR in vxrecover and redirected the output to
/dev/console.

* 2257706 (Tracking ID: 2257678)

SYMPTOM:
When the vxinstall command is run to install VxVM on a Linux/Solaris system 
whose root disk is on an LVM volume, the following error is reported:
# vxinstall
...
  The system is encapsulated. Reinstalling the Volume Manager
  at this stage could leave your system unusable.  Please un-encapsulate
  before continuing with the reinstallation.  Cannot continue further.
#

DESCRIPTION:
The vxinstall script checks whether the root device of the system is 
encapsulated (under VxVM control). This check was coded incorrectly, so LVM 
volumes were also detected as VxVM volumes. As a result, vxinstall could not 
proceed and emitted the above error message, which was a false positive.

RESOLUTION:
The resolution was to modify the code, so that LVM volumes with rootvol in 
their name are not detected as VxVM encapsulated volumes.

* 2272956 (Tracking ID: 2144775)

SYMPTOM:
failover_policy attribute not persistent across reboot

DESCRIPTION:
The failover_policy attribute was not implemented to be persistent across
reboot. Hence, on every reboot failover_policy switched back to the default.

RESOLUTION:
Added code changes to make failover_policy attribute settings persistent across
reboot.

* 2273573 (Tracking ID: 2270880)

SYMPTOM:
On Solaris 10 (SPARC only), if the size of EFI(Extensible Firmware Interface)
labeled disk is greater than 2TB, the disk capacity will be truncated to 2TB
when it is initialized with CDS(Cross-platform Data Sharing) under VxVM(Veritas
Volume Manager).

For example, the sector count shown by prtvtoc(1M) and the public region size
shown by vxdisk(1M) are truncated to approximately 2TB.

# prtvtoc /dev/rdsk/c0t500601604BA07D17d13
<snip>
*                          First      Sector    Last
* Partition  Tag  Flags    Sector     Count     Sector     Mount Directory
       2     15    00         48    4294967215  4294967262

# vxdisk list c0t500601604BA07D17d13 | grep public
public:    slice=2 offset=65744 len=4294901456 disk_offset=48

DESCRIPTION:
From VxVM 5.1 SP1 onwards, the CDS format is enhanced to support disks larger
than 1TB. VxVM uses the EFI layout to support CDS functionality for disks larger
than 1TB. However, on Solaris 10 (SPARC only), the disk capacity is truncated
to 2TB if the size of the EFI labeled disk is greater than 2TB.

This is because the library /usr/lib/libvxscsi.so in the Solaris 10 (SPARC only)
package does not contain the enhancement required on Solaris 10 to support the
CDS format for disks greater than 2TB.

RESOLUTION:
The VxVM package for Solaris has been changed to contain all the libvxscsi.so
binaries, each built for its respective Solaris platform (version), for
example libvxscsi.so.SunOS_5.9 and libvxscsi.so.SunOS_5.10.

From this fix onwards, the build of the binary for the appropriate platform is
installed as /usr/lib/libvxscsi.so during the installation of the VxVM 
package.

* 2276958 (Tracking ID: 2205108)

SYMPTOM:
On VxVM 5.1SP1 or later, device discovery operations such as vxdctl
enable, vxdisk scandisks and vxconfigd -k fail to claim new disks correctly.
For example, if the user provisions five new disks, VxVM, instead of creating
five different Dynamic Multi-Pathing (DMP) nodes, creates only one and includes
the rest as its paths. Also, the following message is displayed on the console
when this problem occurs.

NOTICE: VxVM vxdmp V-5-0-34 added disk array , datype =

Please note that the cabinet serial number following "disk array" and the value
of "datype" is not printed in the above message.

DESCRIPTION:
VxVM's DDL (Device Discovery Layer) is responsible for appropriately claiming
the newly provisioned disks. Due to a bug in one of the routines within this
layer, although the disks are claimed, their LSN (LUN Serial Number, a unique
identifier of disks) is ignored, and as a result every disk is wrongly 
categorized within one DMP node.

RESOLUTION:
Modified the problematic code within the DDL so that new disks are claimed
appropriately.

WORKAROUND:

If vxconfigd does not hang or dump core with this issue, a reboot can be used as
a workaround to recover from this situation. Alternatively, tear down and
rebuild the DMP/DDL database on the devices with the following steps:

# vxddladm excludearray all
# mv /etc/vx/jbod.info /etc/vx/jbod.info.org
# vxddladm disablescsi3
# devfsadm -Cv
# vxconfigd -k

# vxddladm includearray all
# mv /etc/vx/jbod.info.org /etc/vx/jbod.info
# vxddladm enablescsi3
# rm /etc/vx/disk.info /etc/vx/array.info
# vxconfigd -k

* 2291184 (Tracking ID: 2291176)

SYMPTOM:
vxrootadm does not set dump device correctly with LANG=ja ( i.e. Japanese).

DESCRIPTION:
The vxrootadm script tries to get dump device from dumpadm command output, 
but as language is set to Japanese, it is not able to grep English words 
from the output. As a result it fails to set dump device properly.

RESOLUTION:
Set the language environment variable to C (English) before parsing the dumpadm
command output. This fix has been made to the vxrootadm and vxunroot scripts
where they parse the dumpadm command output.
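
The idea can be sketched as follows. This is not the vxrootadm script: the
helper names are hypothetical, setting both LANG and LC_ALL is an assumption,
and the "Dump device:" label is assumed from typical dumpadm output in the C
locale.

```python
import os
import re
import subprocess


def c_locale_env(base=None):
    """Return a copy of the environment with the locale forced to C."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = "C"
    env["LC_ALL"] = "C"    # assumption: also override LC_* settings
    return env


def parse_dump_device(text):
    """Extract the dump device path from C-locale dumpadm output."""
    match = re.search(r"^\s*Dump device:\s*(\S+)", text, re.MULTILINE)
    return match.group(1) if match else None


def current_dump_device():
    """Run dumpadm in the C locale so the English label can be matched."""
    result = subprocess.run(["dumpadm"], env=c_locale_env(),
                            capture_output=True, text=True, check=True)
    return parse_dump_device(result.stdout)
```

If the command runs under a localized environment instead, the label is
translated, the pattern does not match, and parse_dump_device() returns None.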

* 2299691 (Tracking ID: 2299670)

SYMPTOM:
VxVM disk groups created on EFI (Extensible Firmware Interface) LUNs do not get
auto-imported during system boot in VxVM version 5.1SP1 and later.

DESCRIPTION:
While determining the disk format of EFI LUNs, the stat() system call on the
corresponding DMP devices fails with the ENOENT ("No such file or directory")
error because the DMP device nodes are not created in the root file system
during system boot. This leads to failure in auto-import of disk groups created
on EFI LUNs.

RESOLUTION:
VxVM code is modified to use OS raw device nodes if stat() fails on DMP device
nodes.

* 2316309 (Tracking ID: 2316297)

SYMPTOM:
The following error messages are printed on the console every time system
boots.           
 VxVM vxdisk ERROR V-5-1-534 Device [DEVICE NAME]: Device is in use

DESCRIPTION:
During system boot, while the Volume Manager diskgroup is being imported, the
vxattachd daemon tries to online the disk. Since the disk may already be
online, an attempt to re-online the disk gives the following error message:
 VxVM vxdisk ERROR V-5-1-534 Device [DEVICE NAME]: Device is in use

RESOLUTION:
The solution is to check if the disk is already in "online" state. If so, avoid
reonline.

* 2323999 (Tracking ID: 2323925)

SYMPTOM:
If the rootdisk is under VxVM control and /etc/vx/reconfig.d/state.d/install-db
file exists, the following messages are observed on the console:

UX:vxfs fsck: ERROR: V-3-25742: /dev/vx/dsk/rootdg/homevol:sanity check 
failed: cannot open /dev/vx/dsk/rootdg/homevol: No such device or address
UX:vxfs fsck: ERROR: V-3-25742: /dev/vx/dsk/rootdg/optvol:sanity check failed: 
cannot open /dev/vx/dsk/rootdg/optvol: No such device or address

DESCRIPTION:
The vxvm-startup script checks for the
/etc/vx/reconfig.d/state.d/install-db file. If the install-db file exists on the
system, VxVM assumes that volume manager is not configured and does not
start the volume configuration daemon "vxconfigd". When the "install-db" file
is present on a VxVM rootable system, this causes the failure.

RESOLUTION:
If install-db file exists on the system and the system is VxVM rootable, the
following warning message is displayed on the console:
"This is a VxVM rootable system.
 Volume configuration daemon could not be started due to the presence of
 /etc/vx/reconfig.d/state.d/install-db file.
 Remove the install-db file to proceed"

* 2328219 (Tracking ID: 2253552)

SYMPTOM:
vxconfigd leaks memory while reading the default tunables related to
smartmove (a VxVM feature).

DESCRIPTION:
In vxconfigd, the memory allocated for default tunables related to the
smartmove feature is not freed, causing a memory leak.

RESOLUTION:
The memory is released after its scope is over.

* 2337091 (Tracking ID: 2255182)

SYMPTOM:
If EMC CLARiiON arrays are configured with a different failovermode for each 
host controller (e.g. one HBA has failovermode set to 1 while the other has 2), 
then VxVM's vxconfigd daemon dumps core.

DESCRIPTION:
DDL (VxVM's Device Discovery Layer) determines the array type depending on the 
failovermode setting. DDL expects the same array type to be returned across all 
the paths going to that array. This fundamental assumption of DDL will be broken
with different failovermode settings thus leading to vxconfigd core dump.

RESOLUTION:
Validation code is added in DDL to detect such configurations, emit 
appropriate warning messages so that the user can take corrective action, and 
skip the later set of paths that report a different array type.

* 2349653 (Tracking ID: 2349352)

SYMPTOM:
Data corruption is observed on DMP device with single path during Storage 
reconfiguration (LUN addition/removal).

DESCRIPTION:
Data corruption can occur in the following configuration, when new LUNs are 
provisioned or removed under VxVM, while applications are on-line.
 
1. The DMP device naming scheme is EBN (enclosure based naming) and 
persistence=no
2. The DMP device is configured with single path or the devices are controlled 
by Third Party Multipathing Driver (Ex: MPXIO, MPIO etc.,)
 
Since persistent naming is turned off, the names of the VxVM devices (DA 
records) may change when LUNs are removed or added and one of the following 
commands is then run.
 
(a) vxdctl enable
(b) vxdisk scandisks
 
Execution of the above commands discovers all the devices and rebuilds the 
device attribute list with new DMP device names. The VxVM device records are 
then updated with these new attributes. Due to a bug in the code, the VxVM 
device records are mapped to the wrong DMP devices. 
 
Example:
 
The following are the devices before adding new LUNs:
 
sun6130_0_16 auto            -            -            nolabel
sun6130_0_17 auto            -            -            nolabel
sun6130_0_18 auto:cdsdisk    disk_0       prod_SC32    online nohotuse
sun6130_0_19 auto:cdsdisk    disk_1       prod_SC32    online nohotuse
 
The following are the devices after adding new LUNs:
 
sun6130_0_16 auto            -            -            nolabel
sun6130_0_17 auto            -            -            nolabel
sun6130_0_18 auto            -            -            nolabel
sun6130_0_19 auto            -            -            nolabel
sun6130_0_20 auto:cdsdisk    disk_0       prod_SC32    online nohotuse
sun6130_0_21 auto:cdsdisk    disk_1       prod_SC32    online nohotuse
 
The name of the VxVM device sun6130_0_18 is changed to sun6130_0_20.

RESOLUTION:
The code that updates the VxVM device records is rectified.

* 2353325 (Tracking ID: 1791397)

SYMPTOM:
Replication doesn't start if rlink detach and attach is done just after SRL
overflow.

DESCRIPTION:
When the SRL overflows, writes start being flushed from the SRL to the DCM
(Data Change Map). If the rlink is detached before the complete SRL is flushed
to the DCM, the rlink is left in the SRL flushing state. Because of this
flushing state, attaching the rlink again does not start the replication. The
problem is the way the rlink flushing state is interpreted.

RESOLUTION:
To fix this issue, we changed the logic to correctly interpret rlink flushing state.

* 2353327 (Tracking ID: 2179259)

SYMPTOM:
When using disks of size > 2TB, if the disk encounters a media error at an
offset > 2TB while the disk still responds to SCSI inquiry, data corruption can
occur in the case of a write operation.

DESCRIPTION:
The I/O retry logic in DMP assumes that the I/O offset is within the 2TB limit.
Hence, when using disks of size > 2TB, if the disk encounters a media error at
an offset > 2TB while the disk still responds to SCSI inquiry, the I/O is
reissued at a wrong offset within the 2TB range, causing data corruption in the
case of write I/Os.

RESOLUTION:
The fix for this issue is to change the I/O retry mechanism to work for >2TB
offsets as well, so that no offset truncation happens that could lead to data
corruption.
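
The arithmetic behind the truncation can be illustrated as follows. This is a
sketch under stated assumptions, not the DMP code: it assumes 512-byte sectors
and a retry path that keeps only the low 32 bits of the sector offset, which is
one way a 2TB boundary arises (2**32 sectors * 512 bytes = 2TB).

```python
SECTOR_SIZE = 512                    # bytes; assumed sector size
MASK_32 = 0xFFFFFFFF                 # low 32 bits of a sector offset
TWO_TB_SECTORS = (2 * 1024**4) // SECTOR_SIZE   # 2**32 sectors


def truncated_sector(sector):
    """Return the sector a 32-bit offset field would address."""
    return sector & MASK_32


# A write aimed 100 sectors past the 2TB boundary ...
intended = TWO_TB_SECTORS + 100
# ... would be reissued at sector 100, inside the first 2TB of the disk.
actual = truncated_sector(intended)
```

Offsets below the boundary are unaffected, which is why the problem appears
only on disks larger than 2TB.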

* 2353328 (Tracking ID: 2194685)

SYMPTOM:
vxconfigd dumps core in a scenario where array side ports are disabled/enabled
in a loop for some iterations. 

(gdb) where
#0  0x081ca70b in ddl_delete_node ()
#1  0x081cae67 in ddl_check_migration_of_devices ()
#2  0x081d0512 in ddl_reconfigure_all ()
#3  0x0819b6d5 in ddl_find_devices_in_system ()
#4  0x0813c570 in find_devices_in_system ()
#5  0x0813c7da in mode_set ()
#6  0x0807f0ca in setup_mode ()
#7  0x0807fa5d in startup ()
#8  0x08080da6 in main ()

DESCRIPTION:
Because the array side ports are disabled, the secondary paths get removed. The
primary paths then reuse the devno of the removed secondary paths, which is not
handled correctly in the current migration code. As a result, the DMP database
gets corrupted and subsequent discoveries lead to a vxconfigd core dump.

RESOLUTION:
The issue is due to incorrect setting of a DMP flag.
The flag setting has been fixed to prevent corruption of the DMP database in
the mentioned scenario.

* 2353403 (Tracking ID: 2337694)

SYMPTOM:
"vxdisk -o thin list" displays size as 0 for thin luns of capacity greater than 
2 TB.

DESCRIPTION:
The SCSI READ CAPACITY ioctl is invoked to get the disk capacity. SCSI READ 
CAPACITY returns data in extended data format if the disk capacity is 2 TB or 
greater. This extended data was parsed incorrectly while calculating the disk 
capacity.

RESOLUTION:
This issue has been resolved by properly parsing the extended data returned by 
the SCSI READ CAPACITY ioctl for disks of size 2 TB or greater.
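
As an illustration of the difference between the two data formats, the sketch
below parses both. It is not the VxVM parser, and the function names are
hypothetical. The layouts assumed are those of READ CAPACITY (10), with a
4-byte last block address, and READ CAPACITY (16), with an 8-byte last block
address, each followed by a 4-byte block length, all big-endian.

```python
import struct


def parse_read_capacity_10(data):
    """Short format: 4-byte last LBA, 4-byte block length."""
    last_lba, block_len = struct.unpack(">II", data[:8])
    return last_lba, block_len


def parse_read_capacity_16(data):
    """Extended format: 8-byte last LBA, 4-byte block length."""
    last_lba, block_len = struct.unpack(">QI", data[:12])
    return last_lba, block_len


def capacity_bytes(last_lba, block_len):
    """The device reports the LAST block address, so add one."""
    return (last_lba + 1) * block_len
```

Reading the extended data with the short layout picks up only the upper four
bytes of the 8-byte address, which gives a wrong (often zero) block count.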

* 2353410 (Tracking ID: 2286559)

SYMPTOM:
System panics in DMP (Dynamic Multi Pathing) kernel module due to kernel heap 
corruption while DMP path failover is in progress.

Panic stack may look like:

vpanic
kmem_error+0x4b4()
gen_get_enabled_ctlrs+0xf4()
dmp_get_enabled_ctlrs+0xf4()
dmp_info_ioctl+0xc8()
dmpioctl+0x20()
dmp_get_enabled_cntrls+0xac()
vx_dmp_config_ioctl+0xe8()
quiescesio_start+0x3e0()
voliod_iohandle+0x30()
voliod_loop+0x24c()
thread_start+4()

DESCRIPTION:
During path failover in DMP, the routine gen_get_enabled_ctlrs() allocates 
memory proportional to the number of enabled paths. However, while releasing 
the memory, the routine may end up freeing more memory because of the change in 
number of enabled paths.

RESOLUTION:
Code changes have been made in the routines to free allocated memory only.

* 2353421 (Tracking ID: 2334534)

SYMPTOM:
In a CVM (Cluster Volume Manager) environment, a node (SLAVE) joining the
cluster gets stuck, and the join hangs indefinitely unless the join operation
is stopped on the joining node (SLAVE) using the command
'/opt/VRTS/bin/vxclustadm stopnode'. While the CVM join is hung in user-land
(also called the vxconfigd level join), vxconfigd (Volume Manager Configuration
daemon) on the CVM MASTER node does not respond to any VxVM command that
communicates with the vxconfigd process.

When vxconfigd level CVM join is hung in user-land, "vxdctl -c mode" on joining
node (SLAVE) displays an output such as:
 
     bash-3.00#  vxdctl -c mode
     mode: enabled: cluster active - SLAVE
     master: mtvat1000-c1d
     state: joining
     reconfig: vxconfigd in join

DESCRIPTION:
As part of a CVM node join to the cluster, every node in the cluster first
updates the current CVM membership information in the kernel (this membership
information can be viewed by using the command '/opt/VRTS/bin/vxclustadm
nidmap') and then sends a signal to vxconfigd in user land to use that
membership when exchanging configuration records with the other nodes. Since
each node receives the signal (SIGIO) from the kernel independently, the
joining node's (SLAVE) vxconfigd can run ahead of the MASTER in its execution.
Any request coming from the joining node (SLAVE) is then denied by the MASTER
with the error "VE_CLUSTER_NOJOINERS" (join operation is not currently allowed,
error number: 234), because the MASTER's vxconfigd has not yet received the
updated membership from the kernel. While responding to the joining node
(SLAVE) with the error "VE_CLUSTER_NOJOINERS", if the current membership
changes (a change in CVM node ID) as part of the node join, the MASTER node
wrongly updates the internal vxconfigd data structure that is used to send
responses to joining (SLAVE) nodes. Because of this wrong update, when the
joining node later retries its request, the response from the MASTER is sent to
a wrong node, which does not exist in the cluster, and no response is sent to
the joining node. The joining node (SLAVE) never gets the response from the
MASTER for its request, so the CVM node join does not complete, which leads to
a cluster hang.

RESOLUTION:
The vxconfigd code is modified to handle the above scenario. vxconfigd on the
MASTER node processes a connection request coming from the joining node
(SLAVE) only after the MASTER node gets the updated CVM membership
information from the kernel.

* 2353425 (Tracking ID: 2320917)

SYMPTOM:
vxconfigd, the VxVM configuration daemon dumps core and loses disk group 
configuration while invoking the following VxVM reconfiguration steps:

1)	Volumes which were created on thin reclaimable disks are deleted.
2)	Before the space of the deleted volumes is reclaimed, the disks (whose 
volume is deleted) are removed from the DG with the 'vxdg rmdisk' command using 
the '-k' option.
3)	The disks are removed using the 'vxedit rm' command.
4)	New disks are added to the disk group using the 'vxdg adddisk' command.

The stack trace of the core dump is :
[
 0006f40c rec_lock3 + 330
 0006ea64 rec_lock2 + c
 0006ec48 rec_lock2 + 1f0
 0006e27c rec_lock + 28c
 00068d78 client_trans_start + 6e8
 00134d00 req_vol_trans + 1f8
 00127018 request_loop + adc
 000f4a7c main  + fb0
 0003fd40 _start + 108
]

DESCRIPTION:
When a volume is deleted from a disk group that uses thin reclaim LUNs, its 
subdisks are not removed immediately; instead, they are marked with a special 
flag. The reclamation happens at a scheduled time every day. The "vxdefault" 
command can be invoked to list and modify the settings.

After the disk is removed from the disk group using the 'vxdg -k rmdisk' and 
'vxedit rm' commands, the subdisk records are still in the core database and 
point to a disk media record which has been freed. When the next command is 
run to add a new disk to the disk group, vxconfigd dumps core when locking the 
disk media record which has already been freed.

The subsequent disk group deport and import commands erase all disk group 
configuration as it detects an invalid association between the subdisks and the 
removed disk.

RESOLUTION:
1)	The following message is printed when 'vxdg rmdisk' is used to 
remove a disk that has reclaim pending subdisks:

VxVM vxdg ERROR V-5-1-0 Disk <diskname> is used by one or more subdisks which
are pending to be reclaimed.
        Use "vxdisk reclaim <diskname>" to reclaim space used by these subdisks,
        and retry "vxdg rmdisk" command.
        Note: reclamation is irreversible.

2)	A check is added for when 'vxedit rm' is used to remove a disk. If the 
disk is in the removed state and has reclaim pending subdisks, the following 
error message is printed:

VxVM vxedit ERROR V-5-1-10127 deleting <diskname>:
        Record is associated

* 2353427 (Tracking ID: 2337353)

SYMPTOM:
The "vxdmpadm include" command includes all the excluded devices along with 
the device given in the command.

Example:

# vxdmpadm exclude vxvm dmpnodename=emcpower25s2
# vxdmpadm exclude vxvm dmpnodename=emcpower24s2

# more /etc/vx/vxvm.exclude
exclude_all 0
paths
emcpower24c /dev/rdsk/emcpower24c emcpower25s2
emcpower10c /dev/rdsk/emcpower10c emcpower24s2
#
controllers
#
product
#
pathgroups
#

# vxdmpadm include vxvm dmpnodename=emcpower24s2

# more /etc/vx/vxvm.exclude
exclude_all 0
paths
#
controllers
#
product
#
pathgroups
#

DESCRIPTION:
When a dmpnode is excluded, an entry is made in /etc/vx/vxvm.exclude file. This 
entry has to be removed when the dmpnode is included later. Due to a bug in 
comparison of dmpnode device names, all the excluded devices are included.

RESOLUTION:
The bug in the code which compares the dmpnode device names is rectified.
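
The intended behavior can be sketched using the entries from the example above.
The patch does not describe the faulty comparison itself, so this only shows an
exact-match removal. The function name, the list-of-lines representation, and
the assumption that the last field of each entry is the dmpnode name are
hypothetical.

```python
def include_node(path_entries, dmpnode):
    """Return the exclusion entries that remain after including dmpnode.

    An entry is dropped only if its last field equals the given dmpnode
    name exactly; every other excluded device stays excluded.
    """
    return [entry for entry in path_entries
            if entry.split()[-1] != dmpnode]


excluded = [
    "emcpower24c /dev/rdsk/emcpower24c emcpower25s2",
    "emcpower10c /dev/rdsk/emcpower10c emcpower24s2",
]
remaining = include_node(excluded, "emcpower24s2")
```

After including emcpower24s2, the entry for emcpower25s2 is still present in
the remaining list.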

* 2353428 (Tracking ID: 2339251)

SYMPTOM:
In Solaris 10 version, newfs/mkfs_ufs(1M) fails to create UFS file system 
on "VxVM volume > 2 Tera Bytes" with the following error:

    # newfs /dev/vx/rdsk/[disk group]/[volume]
    newfs: construct a new file system /dev/vx/rdsk/[disk group]/[volume]: (y/n)? y
    Can not determine partition size: Inappropriate ioctl for device

The truss output of newfs/mkfs_ufs(1M) shows that the ioctl() system calls 
used to identify the size of the disk or volume device fail with the ENOTTY 
error.

    ioctl(3, 0x042A, ...)                    Err#25 ENOTTY
    ...
    ioctl(3, 0x0412, ...)                    Err#25 ENOTTY

DESCRIPTION:
In Solaris 10 version, newfs/mkfs_ufs(1M) uses ioctl() system calls, to 
identify the size of the disk or volume device, when creating UFS file system 
on disk or volume devices "> 2TB". If the Operating System (OS) version is less 
than Solaris 10 Update 8, the above ioctl system calls are invoked on "volumes 
> 1TB" as well.

VxVM, Veritas Volume Manager exports the ioctl interfaces for VxVM volumes. 
VxVM 5.1 SP1 RP1 P1 and VxVM 5.0 MP3 RP3 introduced the support for Extensible 
Firmware Interface (EFI) for VxVM volumes in Solaris 9 and Solaris 10 
respectively. However the corresponding EFI specific build time definition in 
Veritas Kernel IO driver (VXIO) was not updated in Solaris 10 in VxVM 5.1 SP1 
RP1 P1 and onwards.

RESOLUTION:
The code changes add the build time definition for EFI in VXIO, which enables 
newfs/mkfs_ufs(1M) to successfully create a UFS file system on VxVM volume 
devices "> 2TB" ("> 1TB" if OS version is less than Solaris 10 Update 8).

* 2353464 (Tracking ID: 2322752)

SYMPTOM:
Duplicate device names are observed for NR (Not Ready) devices, when vxconfigd 
is restarted (vxconfigd -k).

# vxdisk list 

emc0_0052    auto            -            -            error
emc0_0052    auto:cdsdisk    -            -            error
emc0_0053    auto            -            -            error
emc0_0053    auto:cdsdisk    -            -            error

DESCRIPTION:
During vxconfigd restart, disk access records are rebuilt in the vxconfigd 
database. As part of this process, IOs are issued on all the devices to read 
the disk private regions. The failure of these IOs on NR devices resulted in 
the creation of duplicate disk access records.

RESOLUTION:
The vxconfigd code is modified so that it does not create duplicate disk access
records.

* 2357579 (Tracking ID: 2357507)

SYMPTOM:
Machine can panic while detecting unstable paths with following stack
trace.

#0  crash_nmi_callback 
#1  do_nmi 
#2  nmi 
#3  schedule 
#4  __down 
#5  __wake_up 
#6  .text.lock.kernel_lock 
#7  thread_return 
#8  printk 
#9  dmp_notify_event 
#10 dmp_restore_node

DESCRIPTION:
After detecting unstable paths, the restore daemon allocates memory to
report the event to userland daemons like vxconfigd. While requesting the memory
allocation, the restore daemon did not drop the spin lock, resulting in the
machine panic.

RESOLUTION:
Fixed the code so that spinlocks are not held while requesting memory
allocation in the restore daemon.

* 2357820 (Tracking ID: 2357798)

SYMPTOM:
VVR leaks memory due to an unfreed vol_ru_update structure. The leak is very
small, but it can accumulate to a large amount if VVR runs for many days.

DESCRIPTION:
VVR allocates an update structure for each write. If replication is up-to-date,
the next incoming write also creates a multi-update and adds it to the VVR
replication queue. While creating the multi-update, VVR wrongly marked the
original update with a flag indicating that the update is in the replication
queue, although the update was never added to the queue (adding it is not
required). When the update free routine is called, it checks this flag and, if
the flag is set, does not free the update, on the assumption that the update is
still in the replication queue and will be freed when it is removed from the
queue. Because the update was never in the queue, it is never freed and the
memory is leaked. The leak occurs only for the first write that arrives each
time the rlink becomes up-to-date, which is why it takes many days to leak a
large amount of memory.

RESOLUTION:
Marking some updates with the flag caused this memory leak. The flag is not
required because the update is not added to the replication queue. The fix is
to remove the marking and checking of the flag.

* 2360404 (Tracking ID: 2146833)

SYMPTOM:
The vxrootadm/vxmirror command may fail with error:

 VxVM mirror INFO V-5-2-22  Mirror voume swapvol...
 VxVM ERROR V-5-2-673 Mirroring of disk rootdisk failed:
 Error: VxVM vxdisk ERRROR V-5-1-0 Device has UFS FS on it.

DESCRIPTION:
With VxVM 5.1SP1, use of the -f option with 'vxdisk init' to initialize a disk 
that has a UFS FS on it is restricted. A new option, '-r', has been introduced 
for users who want to forcefully initialize such a disk. The root disk on 
Solaris does not have a foreign format but has a UFS FS. While encapsulating 
the root disk, VxVM tries to initialize the disk, which fails if the -r option 
is not specified.

RESOLUTION:
The fix is to add the -r option to 'vxdisk init'/'vxdisksetup' at all the 
places within the encapsulation scripts to ensure that a root disk is 
initialized successfully. The error message has also been made more 
informative.

For example:

bash-3.00# vxdisk -f init c0d40s2
VxVM vxdisk ERROR V-5-1-16114  The device is in use. This device may be a boot 
disk.  Device has a UFS FS on it.
If you still want to initialize this device for VxVM use, ensure that there is
no root FS on it.
Then remove the FS signature from each of the slice(s) as follows:
        dd if=/dev/zero of=/dev/vx/rdmp/c0d40s[n] oseek=18 bs=512 count=1
 [n] is the slice number.
 Or alternatively you can rerun the same command with -r option.

* 2360415 (Tracking ID: 2242268)

SYMPTOM:
An agenode that had already been freed was accessed, which led to a panic.
The panic stack looks like:

[0674CE30]voldrl_unlog+0001F0 (F100000070D40D08, F10001100A14B000,
   F1000815B002B8D0, 0000000000000000)
[06778490]vol_mv_write_done+000AD0 (F100000070D40D08, F1000815B002B8D0)
[065AC364]volkcontext_process+0000E4 (F1000815B002B8D0)
[066BD358]voldiskiodone+0009D8 (F10000062026C808)
[06594A00]voldmp_iodone+000040 (F10000062026C808)

DESCRIPTION:
The panic happened because a memory location that had already been freed was
accessed.

RESOLUTION:
Skip further processing of the data structure when its memory has 
already been freed.

* 2360419 (Tracking ID: 2237089)

SYMPTOM:
vxrecover failed to recover the data volumes with associated cache volume.

DESCRIPTION:
vxrecover doesn't wait till the recovery of the cache volumes is complete before 
triggering the recovery of the data volumes that are created on top of cache
volume. Due to this the recovery might fail for the data volumes.

RESOLUTION:
Code changes are done to serialize the recovery for different volume types.

* 2360719 (Tracking ID: 2359814)

SYMPTOM:
1. vxconfigbackup(1M) command fails with the following error:
ERROR V-5-2-3720 dgid mismatch

2. "-f" option for the vxconfigbackup(1M) is not documented in the man page.

DESCRIPTION:
1. In some cases, a *.dginfo file has two lines starting with "dgid:", which
causes vxconfigbackup to fail. The preceding awk command then returns two lines
instead of one for the $bkdgid variable and the comparison fails, resulting in
the "dgid mismatch" error even when the dgids are the same.
This happens if the temporary dginfo file was not removed during the last run of
vxconfigbackup, for example because the script was interrupted. The temporary
dginfo file is updated in append mode:

vxconfigbackup.sh:

   echo "TIMESTAMP" >> $DGINFO_F_TEMP 2>/dev/null

As a result, two or more sets of dginfo may be added to the dginfo file, which 
causes the config backup to fail with the dgid mismatch error.

2. "-f" option to force a backup is not documented in the man page of
vxconfigbackup(1M).

RESOLUTION:
1. The solution is to change the write from append mode to overwrite mode, so
that any stale content in the temporary file is discarded.

2. Updated the vxconfigbackup(1M) man page with the "-f" option.
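
The difference between the two write modes can be sketched as follows. This is
an illustration only and not the vxconfigbackup script: the helper names, the
file name, and the dgid value are hypothetical.

```python
import os
import tempfile


def write_dginfo(path, dgid, mode):
    """Write one dgid line; mode "a" appends, mode "w" overwrites."""
    with open(path, mode) as f:
        f.write("dgid: %s\n" % dgid)


def dgid_lines(path):
    """Return every line that starts with "dgid:"."""
    with open(path) as f:
        return [line for line in f if line.startswith("dgid:")]


def demo():
    """Return the dgid line counts after an append and after an overwrite."""
    dgid = "1300000000.10.host"    # hypothetical value
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.dginfo")
        write_dginfo(path, dgid, "w")    # stale file from an interrupted run
        write_dginfo(path, dgid, "a")    # old behavior: append
        appended = len(dgid_lines(path))
        write_dginfo(path, dgid, "w")    # fixed behavior: overwrite
        overwritten = len(dgid_lines(path))
    return appended, overwritten
```

With append mode the stale file ends up with two "dgid:" lines, which is the
condition that makes the comparison fail. With overwrite mode there is one.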

* 2364700 (Tracking ID: 2364253)

SYMPTOM:
In case of Space Optimized snapshots at secondary site, VVR leaks kernel memory.

DESCRIPTION:
In case of Space Optimized snapshots at secondary site, VVR proactively starts
the copy-on-write on the snapshot volume. The I/O buffer allocated for this
proactive copy-on-write was not freed even after the I/Os completed, which led
to the memory leak.

RESOLUTION:
After the proactive copy-on-write is complete, memory allocated for the I/O
buffers is released.

* 2367561 (Tracking ID: 2365951)

SYMPTOM:
Growing RAID5 volumes beyond 5TB fails with "Unexpected kernel error in 
configuration update" error.

Example :
# vxassist -g eqpwhkthor1  growby raid5_vol5 19324030976
VxVM vxassist ERROR V-5-1-10128  Unexpected kernel error in configuration update

DESCRIPTION:
VxVM stores the size required to grow RAID5 volumes in an integer variable, which
overflows for large volume sizes. This results in failure to grow the volume.

RESOLUTION:
VxVM code is modified to handle integer overflow conditions for RAID5 volumes.

* 2377317 (Tracking ID: 2408771)

SYMPTOM:
VxVM does not show all the discovered devices. The number of devices shown
by VxVM is lower than the number shown by the OS.

DESCRIPTION:
For every lunpath device discovered, VxVM creates a data structure and stores it
in a hash table. The hash value is computed from the unique minor number of the
lunpath. If the minor number exceeds 831231, an integer overflow occurs and the
data structure for this path is stored at the wrong location. When the hash list
is traversed later, accesses are limited by the total number of discovered
paths. Because devices with minor numbers greater than 831231 are hashed
incorrectly, DA records are not created for them.

RESOLUTION:
Integer overflow problem has been resolved by appropriately typecasting
the minor number and hence correct hash value is computed.

* 2379034 (Tracking ID: 2379029)

SYMPTOM:
Changing the enclosure name did not work for all devices in the enclosure. All of
the affected devices were present in /etc/vx/darecs.

# cat /etc/vx/darecs
ibm_ds8x000_02eb        auto    online 
format=cdsdisk, privoffset=256, pubslice=2, privslice=2
ibm_ds8x000_02ec        auto    online 
format=cdsdisk, privoffset=256, pubslice=2, privslice=2
# vxdmpadm setattr enclosure ibm_ds8x000 name=new_ibm_ds8x000
# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
ibm_ds8x000_02eb auto:cdsdisk    ibm_ds8x000_02eb  mydg         online
ibm_ds8x000_02ec auto:cdsdisk    ibm_ds8x000_02ec  mydg         online
new_ibm_ds8x000_02eb auto            -            -            error
new_ibm_ds8x000_02ec auto            -            -            error

DESCRIPTION:
/etc/vx/darecs stores only foreign devices and nopriv or simple devices; auto
devices should NOT be written to this file. A DA record is flushed to
/etc/vx/darecs at the end of a transaction if the R_NOSTORE flag is NOT set on
the DA record. Because of a bug in VxVM, when a disk that does not exist in
da_list (for example, after vxdisk rm) was initialized, the R_NOSTORE flag was
NOT set on the newly created DA record. As a result, duplicate entries were
created for these devices and the DAs went into the error state.

RESOLUTION:
Source has been modified to add R_NOSTORE flag for auto type DA record created by
auto_init() or auto_define().

# vxdmpadm setattr enclosure ibm_ds8x000 name=new_ibm_ds8x000
# vxdisk -o alldgs list
new_ibm_ds8x000_02eb auto:cdsdisk    ibm_ds8x000_02eb  mydg         online
new_ibm_ds8x000_02ec auto:cdsdisk    ibm_ds8x000_02ec  mydg         online

* 2382705 (Tracking ID: 1675599)

SYMPTOM:
vxconfigd leaks memory when a LUN controlled by a third-party driver (TPD) is
excluded and included in a loop. As a result, vxconfigd loses its license
information and the following error is seen in the system log:
        "License has expired or is not available for operation"

DESCRIPTION:
In the vxconfigd code, memory allocated for various data structures related to
the device discovery layer was not freed, which led to the memory leak.

RESOLUTION:
The memory is now released once it is no longer needed.

* 2382710 (Tracking ID: 2139179)

SYMPTOM:
DG import can fail with SSB (Serial Split Brain) though the SSB does not exist.

DESCRIPTION:
An association between DM and DA records is made during DG import if the SSB
IDs of the DM and DA records match. On a system with stale cloned disks, the
import attempted to associate the DM with a cloned DA, found an SSB ID
mismatch, and failed with an SSB mismatch error.

RESOLUTION:
The selection of DA to associate with DM is rectified to resolve the issue.

* 2382714 (Tracking ID: 2154287)

SYMPTOM:
In the presence of "Not-Ready" devices, where the SCSI inquiry on the device
succeeds but open or read/write operations fail, paths to such devices are
continuously marked as ENABLED and then DISABLED on every DMP restore task cycle.

DESCRIPTION:
The DMP restore task finds these paths connected and therefore enables them for
I/O, but soon finds that they cannot be used for I/O and disables them again.

RESOLUTION:
The fix is to not enable the path unless it is found to be connected and available
to open and issue I/O.
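
The change in the enable decision can be sketched as follows. This is an
illustrative Python model, not DMP code; the function names and the dictionary
keys are hypothetical.

```python
def restore_cycle_old(path):
    # Before the fix: a successful SCSI inquiry alone re-enables the path,
    # and the first failed open or I/O disables it again.
    events = []
    if path["inquiry_ok"]:
        events.append("ENABLED")
        if not (path["open_ok"] and path["io_ok"]):
            events.append("DISABLED")
    return events

def restore_cycle_new(path):
    # After the fix: the path is enabled only if it is connected and can
    # also be opened and used for I/O.
    if path["inquiry_ok"] and path["open_ok"] and path["io_ok"]:
        return ["ENABLED"]
    return []
```

For a Not-Ready device (inquiry succeeds, open and I/O fail) the old logic
produces an ENABLED/DISABLED pair on every cycle, while the new logic produces
no state change.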

* 2382717 (Tracking ID: 2197254)

SYMPTOM:
vxassist, the VxVM volume creation utility, does not function as expected when
creating a volume with "logtype=none".

DESCRIPTION:
When volumes are created on thinrclm disks, a Data Change Object (DCO) version 20
log is attached to every volume by default. If the user does not want this
default behavior, the "logtype=none" option can be passed to the vxassist
command. With VxVM on HP 11.31, however, this option does not work and a DCO
version 20 log is still created. When "logtype=none" is specified, the utility
sets a flag to prevent creation of the log, but VxVM did not check whether the
flag was set before creating the DCO log.

RESOLUTION:
The code has been fixed to check the flag that corresponds to "logtype=none"
before creating the DCO version 20 log by default.

* 2382720 (Tracking ID: 2216515)

SYMPTOM:
The system could not boot after vxunreloc. If the original offsets are used
during unrelocation, the boot disk is corrupted.

DESCRIPTION:
When a root disk that has no free space is encapsulated, the encapsulation
process takes some space from swap. It also creates a public slice that starts
at sector "0" and a -B0 subdisk to protect the cylinder "0" information. The
public region is therefore longer than it would be on a non-full disk that is
initialized. When the disk fails, rootvol and swapvol are relocated to a disk
that has already been re-initialized, so no space needs to be reserved for a -B0
subdisk; space for rootvol and swapvol is allocated from the public region and
the data is relocated. During unrelocation, however, vxunreloc tries to create
subdisks on the new target disk at the same offsets as on the original failed
disk. The subdisks then extend beyond the public region slice and overlap
another slice, causing data corruption.

RESOLUTION:
The source has been modified not to use the original offsets when an
encapsulated root disk is unrelocated. An informational message is displayed
during unrelocation to indicate this:
# /etc/vx/bin/vxunreloc  -g rootdg rootdg01
   VxVM  INFO V-5-2-0 Forcefully unrelocating the root disk without preserving
original offsets

* 2383705 (Tracking ID: 2204752)

SYMPTOM:
The following message is observed after the diskgroup creation:
"VxVM ERROR V-5-3-12240: GPT entries checksum mismatch"

DESCRIPTION:
This message is observed with a disk that was initialized as cds_efi and later
initialized as hpdisk. The harmless "checksum mismatch" message is displayed
even when the diskgroup initialization is successful.

RESOLUTION:
The harmless message "GPT entries checksum mismatch" has been removed.

* 2384473 (Tracking ID: 2064490)

SYMPTOM:
The vxcdsconvert utility fails if the disk capacity is greater than or equal
to 1 TB.

DESCRIPTION:
VxVM cdsdisk uses the GPT layout if the disk capacity is greater than or equal
to 1 TB and the VTOC layout if the disk capacity is less than 1 TB. The
vxcdsconvert utility was not able to convert to the GPT layout when the disk
capacity was greater than or equal to 1 TB.

RESOLUTION:
This issue has been resolved by converting to the proper cdsdisk layout
depending on the disk capacity.
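
The capacity-based layout choice can be sketched as below. This is a minimal
illustration only: the function name is hypothetical, 1 TB is taken as 2**40
bytes, and the boundary is treated as inclusive (as in the SYMPTOM), all of
which are assumptions rather than confirmed VxVM behavior.

```python
TB = 1 << 40  # assumption: 1 TB counted as 2**40 bytes

def cdsdisk_layout(capacity_bytes):
    # Disks at or above the threshold get GPT, smaller disks get VTOC.
    return "GPT" if capacity_bytes >= TB else "VTOC"
```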

* 2384844 (Tracking ID: 2356744)

SYMPTOM:
When "vxvm-recover" is executed manually, duplicate instances of the
Veritas Volume Manager (VxVM) daemons (vxattachd, vxcached, vxrelocd,
vxvvrsecdgd and vxconfigbackupd) are started.
When the user kills any of the daemons manually, the other instances of the
daemons remain on the system.

DESCRIPTION:
The Veritas Volume Manager (VxVM) daemons (vxattachd, vxcached, vxrelocd,
vxvvrsecdgd and vxconfigbackupd) do not have:

  1. A check for a duplicate instance.
  and
  2. A mechanism to clean up stale processes.

Because of this, when the user executes the startup script (vxvm-recover), all
daemons are started again, and if the user kills any of the daemons manually,
the other instances of the daemons remain on the system.

RESOLUTION:
The VxVM daemons are modified to do the "duplicate instance check" and "stale
process cleanup" appropriately.
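
A common way to implement a duplicate instance check together with stale entry
cleanup is a PID file. The following Python sketch shows that general
technique on a POSIX system; it is not the mechanism used by the VxVM daemons,
the function names are hypothetical, and the check-then-write sequence is not
atomic.

```python
import os

def pid_alive(pid):
    try:
        os.kill(pid, 0)          # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True              # exists, but is owned by another user
    return True

def acquire_instance(pidfile):
    """Return True if this process may run as the single instance."""
    try:
        with open(pidfile) as f:
            old_pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        old_pid = None
    if old_pid is not None and old_pid != os.getpid() and pid_alive(old_pid):
        return False             # duplicate instance: refuse to start
    # No live owner: the file is missing or stale, so claim it.
    with open(pidfile, "w") as f:
        f.write(str(os.getpid()))
    return True
```

A second start while the recorded process is alive returns False, while a PID
file left behind by a dead process is silently replaced.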

* 2386763 (Tracking ID: 2346470)

SYMPTOM:
Dynamic Multi-Pathing administration operations such as
"vxdmpadm exclude vxvm dmpnodename=<daname>" and
"vxdmpadm include vxvm dmpnodename=<daname>" trigger memory leaks in the heap
segment of the VxVM configuration daemon (vxconfigd).

DESCRIPTION:
vxconfigd allocates chunks of memory to store VxVM specific information 
of the disk being included during "vxdmpadm include vxvm dmpnodename=<daname>" 
operation. The allocated memory is not freed while excluding the same disk from 
VxVM control. Also when excluding a disk from VxVM control, another chunk of 
memory is temporarily allocated by vxconfigd to store more details of the device 
being excluded. However this memory is not freed at the end of exclude 
operation.

RESOLUTION:
Memory allocated during include operation of a disk is freed during 
corresponding exclude operation of the disk. Also temporary memory allocated 
during exclude operation of a disk is freed at the end of exclude operation.

* 2389095 (Tracking ID: 2387993)

SYMPTOM:
In presence of NR (Not-Ready) devices, vxconfigd (VxVM configuration daemon)
goes into disabled mode once restarted.

	# vxconfigd -k -x syslog
	# vxdctl mode
	mode: disabled

If vxconfigd is restarted in debug mode at level 9, the following message is
seen:

	# vxconfigd -k -x 9 -x syslog

	VxVM vxconfigd DEBUG  V-5-1-8856 DA_RECOVER() failed, thread 87: Kernel 
        and on-disk configurations don't match

DESCRIPTION:
When vxconfigd is restarted, all the VxVM devices are recovered. As part of
recovery, the capacity of each device is read, which can fail with EIO. This
error was not handled properly, and as a result vxconfigd went into the
DISABLED state.

RESOLUTION:
The EIO error code from the read capacity ioctl is now handled specifically.

* 2390804 (Tracking ID: 2249113)

SYMPTOM:
VVR volume recovery hangs in the vol_ru_recover_primlog_done() function in an
endless loop.

DESCRIPTION:
During SRL recovery, the SRL is read to apply the updates to the data volume.
The SRL can contain holes where some writes did not complete properly. These
holes must be skipped, so each such region is read as a dummy update and sent
to the secondary. If the dummy update is larger than max_write (>256K), the
code goes into an endless loop and keeps reading the same dummy update forever.

RESOLUTION:
Holes that are larger than VVR MAX_WRITE are now handled.
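
One way to handle a hole that is larger than the maximum write size is to walk
it in bounded pieces and always advance the offset. The Python sketch below
shows that idea only; the actual VVR change is not described in this document,
and the function name and the use of byte offsets are assumptions.

```python
MAX_WRITE = 256 * 1024  # bytes; 256K as quoted in the description

def dummy_updates(hole_start, hole_len, max_write=MAX_WRITE):
    """Yield (offset, length) pieces that together cover the hole."""
    offset = hole_start
    end = hole_start + hole_len
    while offset < end:
        length = min(max_write, end - offset)
        yield offset, length
        offset += length  # always advance, so the loop terminates
```

A 600K hole is then covered by two 256K pieces and one 88K piece, instead of
one oversized dummy update that is read again and again.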

* 2390815 (Tracking ID: 2383158)

SYMPTOM:
A panic occurs in vol_rv_mdship_srv_done() because the SIO has been freed and
holds an invalid node pointer.

DESCRIPTION:
vol_rv_mdship_srv_done() panics when referencing wrsio->wrsrv_node because
wrsrv_node holds an invalid pointer. The wrsio is also found to be freed or
reallocated for a different SIO. vol_rv_check_wrswaitq() is called whenever an
SIO completes; it walks the waitq and releases every SIO that has the
RV_WRSHIP_SRV_SIO_FLAG_LOGEND_DONE flag set. vol_rv_mdship_srv_done() sets this
flag and then performs further operations on wrsrv. If another SIO completes
during this time, it calls vol_rv_check_wrswaitq() and deletes both its own SIO
and any other SIO that has the RV_WRSHIP_SRV_SIO_FLAG_LOGEND_DONE flag set. An
SIO that is still being processed is thereby deleted, which causes the panic.

RESOLUTION:
The flag must be set just before calling the function vol_rv_mdship_srv_done(),
and at the end of SIOdone(), so that other SIOs cannot race with and delete the
currently running one.

* 2390822 (Tracking ID: 2369786)

SYMPTOM:
On a VVR Secondary cluster, if the SRL disk goes bad, vxconfigd may hang in the
transaction code path.

DESCRIPTION:
In VVR shared disk group environments, error handling is done cluster wide. On
the VVR Secondary, if the SRL disk goes bad because of a temporary or actual
disk failure, cluster-wide error handling starts. Error handling requires
serialization; in some cases serialization was not performed, which caused
error handling to go into an endless loop and hence the hang.

RESOLUTION:
I/O is now always serialized during error handling on the VVR Secondary, which
resolves this issue.

* 2397663 (Tracking ID: 2165394)

SYMPTOM:
If a cloned copy of a diskgroup and the destroyed diskgroup both exist on the
system, an import operation imports the destroyed diskgroup instead of the
cloned one. For example, consider a system with diskgroup dg containing disk
disk01. Disk disk01 is cloned to disk02. When diskgroup dg containing disk01 is
destroyed and diskgroup dg is then imported, VxVM should import dg with the
cloned disk, that is, disk02. However, it imports diskgroup dg with disk01.

DESCRIPTION:
After a diskgroup is destroyed, if a cloned copy of the same diskgroup exists on
the system, the next disk group import operation wrongly identifies the disks
to be imported, and hence the destroyed diskgroup is imported.

RESOLUTION:
The diskgroup import code is modified to identify the correct diskgroup when a 
cloned copy of the destroyed diskgroup exists.

* 2405446 (Tracking ID: 2253970)

SYMPTOM:
Enhancement to customize private region I/O size based on maximum transfer size 
of underlying disk.

DESCRIPTION:
There are different types of Array Controllers which support data transfer 
sizes starting from 256K and beyond. VxVM tunable volmax_specialio controls 
vxconfigd's configuration I/O as well as Atomic Copy I/O size. When 
volmax_specialio is tuned to a value greater than 1MB to leverage the maximum
transfer sizes of the underlying disks, the import operation fails for disks
that cannot accept I/O sizes larger than 256K. If the tunable is set to 256K,
the large transfer sizes of the other disks are not leveraged.

RESOLUTION:
All the above scenarios mentioned in Description are handled in this 
enhancement to leverage large disk transfer sizes as well as support Array 
controllers with 256K transfer sizes.

* 2411052 (Tracking ID: 2268408)

SYMPTOM:
1) On suppressing the underlying path of a powerpath controlled device, the
disk goes into the error state.
2) The "vxdmpadm exclude vxvm dmpnodename=<emcpower#>" command does not
suppress TPD devices.

DESCRIPTION:
During discovery, H/W path corresponding to the basename is not generated for 
powerpath controlled devices because basename does not contain the slice 
portion. Device name with s2 slice is expected while generating H/W name.

RESOLUTION:
Whole disk name i.e., device name with s2 slice is used to generate H/W path.

* 2411053 (Tracking ID: 2410845)

SYMPTOM:
If a DG (Disk Group) is imported with a reservation key, many 'reservation
conflict' messages are seen during DG deport.
                
    [DATE TIME] [HOSTNAME] multipathd: VxVM26000: add path (uevent)
    [DATE TIME] [HOSTNAME] multipathd: VxVM26000: failed to store path info
    [DATE TIME] [HOSTNAME] multipathd: uevent trigger error
    [DATE TIME] [HOSTNAME] multipathd: VxVM26001: add path (uevent)
    [DATE TIME] [HOSTNAME] multipathd: VxVM26001: failed to store path info
    [DATE TIME] [HOSTNAME] multipathd: uevent trigger error
    ..
    [DATE TIME] [HOSTNAME] kernel: sd 2:0:0:2: reservation conflict
    [DATE TIME] [HOSTNAME] kernel: sd 2:0:0:2: reservation conflict
    [DATE TIME] [HOSTNAME] kernel: sd 2:0:0:1: reservation conflict
    [DATE TIME] [HOSTNAME] kernel: sd 2:0:0:2: reservation conflict
    [DATE TIME] [HOSTNAME] kernel: sd 2:0:0:1: reservation conflict
    [DATE TIME] [HOSTNAME] kernel: sd 2:0:0:2: reservation conflict

DESCRIPTION:
When a PGR (Persistent Group Reservation) key is removed during DG deport, the
key must be preempted. The preempt operation failed with a reservation conflict
error because the key passed for preemption was not correct.

RESOLUTION:
Code changes are made to set the correct key value for the preemption 
operation.

* 2413077 (Tracking ID: 2385680)

SYMPTOM:
A panic occurs in vol_rv_async_childdone() because of a corrupted pripendingq.

DESCRIPTION:
The pripendingq is always corrupted in this panic. The head entry has been
freed but not removed from the queue. In the mdship_srv_done code, for the
error condition, the update is removed from the pripendingq only if the next or
prev pointers of updateq are non-null. As a result, the head pointer is not
removed in the abort scenario, and the entry is freed without being deleted
from the queue.

RESOLUTION:
The prev and next checks have been removed in all places. The abort case is
also handled for the following conditions:

1) The logendq is aborted because a slave node panicked: the update entry
exists but the update has not been removed from the pripendingq.

2) vol_kmsg_eagain type failures: the update exists, but it has been removed
from the pripendingq.

3) An abort occurs very early in mdship_sio_start(): the update is allocated
but is not in the pripendingq.

* 2413908 (Tracking ID: 2413904)

SYMPTOM:
Performing Dynamic LUN reconfiguration operations (adding and removing LUNs),
can cause corruption in DMP database. This in turn may lead to vxconfigd core
dump OR system panic.

DESCRIPTION:
When a LUN is removed from VxVM using 'vxdisk rm' while a new LUN is being
added, and the newly added LUN reuses the devno of the removed LUN, the DMP
database could become corrupted because this condition was not handled.

RESOLUTION:
The DMP code has been fixed to handle this condition.

* 2415566 (Tracking ID: 2369177)

SYMPTOM:
When using > 2TB disks, if the device responds to SCSI inquiry but fails to
service I/O, data corruption can occur because the write I/O is directed at an
incorrect offset.

DESCRIPTION:
When the failed I/O is retried, DMP assumes the offset to be a 32-bit value.
I/O offsets >2TB can therefore be truncated, so that the retry I/O is issued at
the wrong offset.

RESOLUTION:
Change the offset value to a 64 bit quantity to avoid truncation during I/O
retries from DMP.
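
The effect of the truncation can be shown with a small Python calculation. It
assumes the offset is counted in 512-byte sectors, which is consistent with a
32-bit value covering 2TB but is not stated in this document; the function
names are hypothetical.

```python
SECTOR = 512  # bytes; assumed sector size

def retry_offset_32(offset_sectors):
    # Keeping only the low 32 bits models storing the offset in a 32-bit field.
    return offset_sectors & 0xFFFFFFFF

def retry_offset_64(offset_sectors):
    # A 64-bit field holds the same offset without loss.
    return offset_sectors & 0xFFFFFFFFFFFFFFFF

three_tb = (3 * 2**40) // SECTOR  # an I/O that starts 3TB into the device
```

Under these assumptions, a retry at 3TB is truncated to 2**31 sectors, which is
1TB into the device, while the 64-bit value keeps the original offset.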

* 2415577 (Tracking ID: 2193429)

SYMPTOM:
Enclosure attributes such as iopolicy and recoveryoption do not persist across
reboots when the DMP driver is already configured before vold starts (for
example, with root support) with a different array type than the one stored in
array.info.

DESCRIPTION:
When the DMP driver is already configured before vold comes up (as happens with
root support), the enclosure attributes do not take effect if the enclosure
name in the kernel has changed since the previous boot cycle. When vold comes
up, da_attr_list is NULL. vold then receives events from the DMP kernel for
data structures that are already present in the kernel. On receiving this
information, it tries to write da_attr_list to array.info, but because
da_attr_list is NULL, array.info is overwritten with no data. Later, vold
cannot correlate the enclosure attributes in dmppolicy.info with the enclosures
in array.info, so the persistent attributes are not applied.

RESOLUTION:
Do not overwrite array.info if da_attr_list is NULL.
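
The guard can be sketched in Python as follows. This is an illustration of the
idea only: the function name and the one-entry-per-line file format are
hypothetical, and Python's None stands in for a NULL da_attr_list.

```python
def flush_array_info(path, da_attr_list):
    # Nothing has been collected yet: leave the existing file untouched.
    if not da_attr_list:
        return False
    with open(path, "w") as f:
        for entry in da_attr_list:
            f.write(f"{entry}\n")
    return True
```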

* 2417184 (Tracking ID: 2407192)

SYMPTOM:
Application I/O hangs on RVG volumes when the RVG logowner is being set on the
node that takes over the master role (either as part of "vxclustadm setmaster"
or as part of the original master leaving).

DESCRIPTION:
Whenever a node takes over the master role, RVGs are recovered on the new
master. Because of a race between the RVG recovery thread (initiated as part of
master takeover) and the thread that changes the RVG logowner (which is run as
part of "vxrvg set logowner=on"), RVG recovery does not complete, which leads
to the I/O hang.

RESOLUTION:
The race condition is handled with appropriate locks and a condition variable.

* 2421100 (Tracking ID: 2419348)

SYMPTOM:
The system panics when vxdclid issues a DMP passthru ioctl while vxconfigd is
reconfiguring the DMP database.

DESCRIPTION:
The panic is caused by a race condition between vxconfigd performing
dmp_reconfigure_db() and another process (vxdclid) executing dmp_passthru_ioctl().

The stack of vxdclid thread:-
000002a107684d51 dmp_get_path_state+0xc(606a5b08140, 301937d9c20, 0, 0, 0, 0)
000002a107684e01 do_passthru_ioctl+0x76c(606a5b08140, 8, 0, 606a506c840,
606a506c848, 0)
000002a107684f61 dmp_passthru_ioctl+0x74(11d000005ca, 40b, 3ad4c0, 100081,
606a3d477b0, 2a107685adc)
000002a107685031 dmpioctl+0x20(11d000005ca, 40b, 3ad4c0, 100081, 606a3d477b0,
2a107685adc)
000002a1076850e1 fop_ioctl+0x20(60582fdfc00, 40b, 3ad4c0, 100081, 606a3d477b0,
1296a58)
000002a107685191 ioctl+0x184(a, 6065a188430, 3ad4c0, ff0bc910, ff1303d8, 40b)
000002a1076852e1 syscall_trap32+0xcc(a, 40b, 3ad4c0, ff0bc910, ff1303d8, ff13a5a0)

And the stack of vxconfigd, which is doing the reconfiguration:-
vxdmp:dmp_get_iocount+0x68(0x7)
vxdmp:dmp_check_ios_drained+0x40()
vxdmp:dmp_check_ios_drained_in_dmpnode+0x40(0x60693cc0f00, 0x20000000)
vxdmp:dmp_decode_destroy_dmpnode+0x11c(0x2a10536b698, 0x102003, 0x0, 0x19caa70)
vxdmp:dmp_decipher_instructions+0x2e4(0x2a10536b758, 0x10, 0x102003, 0x0, 0x19caa70)
vxdmp:dmp_process_instruction_buffer+0x150(0x11d0003ffff, 0x3df634, 0x102003,
0x0, 0x19caa70)
vxdmp:dmp_reconfigure_db+0x48()
vxdmp:gendmpioctl(0x11d0003ffff, , 0x3df634, 0x102003, 0x604a7017298,
0x2a10536badc) 
vxdmp:dmpioctl+0x20(, 0x444d5040, 0x3df634, 0x102003, 0x604a7017298)

The vxdclid thread tries to get the dmpnode from the path_t structure, but at
the same time path_t has been freed as part of the reconfiguration, which
causes the panic.

RESOLUTION:
The dmpnode is now obtained from the lvl1tab table instead of the path_t
structure. Because an ioctl is in progress on this dmpnode, the dmpnode is
available at this time.

* 2421491 (Tracking ID: 2396293)

SYMPTOM:
On VxVM rooted systems, vxconfigd dumps core during boot with the following
assert, and the machine does not boot.
Assertion failed: (0), file auto_sys.c, line 1024
05/30 01:51:25:  VxVM vxconfigd ERROR V-5-1-0 IOT trap - core dumped

DESCRIPTION:
DMP deletes and regenerates device numbers dynamically on every boot. When the
static vxconfigd is started in boot mode, the root file system is read-only, so
new DSFs for DMP nodes are not created. DMP nevertheless configures devices in
userland and in the kernel. Because stale DSFs from the previous boot are still
present, the device numbers of the DSFs do not match those in the DMP kernel.
vxconfigd therefore sends I/Os to the wrong device numbers and claims disks
with the wrong format.

RESOLUTION:
The issue is fixed by getting the device numbers from vxconfigd instead of
doing a stat on the DMP DSFs.

* 2423086 (Tracking ID: 2033909)

SYMPTOM:
Disabling a controller of an A/P-G type array could lead to an I/O hang even
when there are available paths for I/O.

DESCRIPTION:
DMP was not clearing a flag, in an internal DMP data structure,  to enable I/O 
to all the LUNs during group failover operation.

RESOLUTION:
The DMP code has been modified to clear the appropriate flag for all the LUNs of
the LUN group so that the failover can occur when a controller is disabled.

* 2428179 (Tracking ID: 2425722)

SYMPTOM:
VxVM's subdisk operation - vxsd mv <source_subdisk> <destination_subdisk> - 
fails on subdisk sizes greater than or equal to 2TB. 

Eg: 

#vxsd -g nbuapp mv disk_1-03 disk_2-03 

VxVM vxsd ERROR V-5-1-740 New subdisks have different size than subdisk
disk_1-03, use -o force

DESCRIPTION:
The VxVM code uses a 32-bit unsigned integer variable to store the size of
subdisks, which can only accommodate values less than 2TB. For larger subdisk
sizes the integer overflows, and the subdisk move operation fails.

RESOLUTION:
The code has been modified to accommodate larger subdisk sizes.

* 2435050 (Tracking ID: 2421067)

SYMPTOM:
With VVR configured, 'vxconfigd' hangs on the primary site when trying to recover 
the SRL log, after a system or storage failure.

DESCRIPTION:
At the start of each SRL log disk we keep a config header. Part of this header 
includes a flag which is used by VVR to serialize the flushing of the SRL 
configuration table, to ensure only a single thread flushes the table at any one 
time.
In this instance, the 'VOLRV_SRLHDR_CONFIG_FLUSHING' flag was set in the config 
header, and then the config header was written to disk. At this point the storage 
became inaccessible.
During recovery the config header was read from disk, and when trying to
initiate a new flush of the SRL table, the system hung because the flag was
already set to indicate that a flush was in progress.

RESOLUTION:
When loading the SRL header from disk, the flag 'VOLRV_SRLHDR_CONFIG_FLUSHING' is 
now cleared.
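
Clearing a stale in-progress flag on load can be sketched as a bit operation.
The bit value below is hypothetical, as is the function name; only the flag
name comes from this document.

```python
VOLRV_SRLHDR_CONFIG_FLUSHING = 0x1  # hypothetical bit position

def load_srl_header_flags(flags_on_disk):
    # No flush can be in progress right after loading the header from disk,
    # so the in-progress bit is dropped and all other bits are preserved.
    return flags_on_disk & ~VOLRV_SRLHDR_CONFIG_FLUSHING
```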

* 2436283 (Tracking ID: 2425551)

SYMPTOM:
CVM reconfiguration takes 1 minute for each RVG configuration.

DESCRIPTION:
Every RVG is given 1 minute to drain its I/O. If the I/O is not drained, the
code waits for the full minute before aborting the I/Os waiting in the logendq.
Because this wait is applied to each RVG in turn, the total delay is 1 minute
per RVG.

RESOLUTION:
An overall limit of 1 minute for all RVGs should be sufficient: all RVGs are
aborted after 1 minute, instead of waiting separately for each RVG.
An alternate (long-term) solution is as follows:

Abort the RVG immediately when objiocount(rv) == queue_count(logendq). This
reduces the 1 minute delay further, down to the time actually required. With
this approach, the following must be taken care of:
1. rusio may be active, and must be deducted from the iocount.
2. Every I/O goes into the logendq before it is serviced, so it must be ensured
that the I/Os are not in the process of being serviced.
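
The difference between a per-RVG timeout and one overall timeout can be
sketched with a shared deadline. This Python model is illustrative only; the
function name, the polling approach, and the injectable clock are assumptions
made so that the behavior can be checked without waiting.

```python
import time

def drain_all(rvgs, is_drained, total_timeout=60.0, poll=0.01,
              clock=time.monotonic, sleep=time.sleep):
    """Wait for all RVGs against one shared deadline; return those to abort."""
    deadline = clock() + total_timeout
    to_abort = []
    for rvg in rvgs:
        # Every RVG is checked against the same deadline, so the total wait
        # is bounded by total_timeout regardless of the number of RVGs.
        while not is_drained(rvg) and clock() < deadline:
            sleep(poll)
        if not is_drained(rvg):
            to_abort.append(rvg)
    return to_abort
```

With three RVGs that never drain, the total wait in this model is 60 seconds
rather than 180 seconds.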

* 2436287 (Tracking ID: 2428875)

SYMPTOM:
In a CVR configuration with I/O issued from both the master and the slave,
rebooting the slave leads to a reconfiguration hang.

DESCRIPTION:
The I/Os on both the master and the slave fill up the SRL, and replication
switches to DCM mode. In DCM mode, a header flush that flushes the DCM and the
SRL header happens every 512 updates. Since most of the I/Os come from the
slave node, the I/Os throttled because of the header flush are queued in the
mdship_throttle_q. This queue is flushed at the end of the header flush. If the
slave node is rebooted while SIOs are in the throttle queue, the reconfiguration
code path does not flush the mdship_throttleq and waits for the SIOs to drain.
This leads to the reconfiguration hang because of a positive I/O count.

RESOLUTION:
All the SIOs queued in the mdship_throttleq are aborted when the node is
aborted. The SIOs for the nodes that did not leave are restarted.

* 2436288 (Tracking ID: 2411698)

SYMPTOM:
I/Os hang in CVR (Clustered Volume Replicator) environment.

DESCRIPTION:
In CVR environment, when CVM (Clustered Volume Manager) Slave node sends a 
write request to the CVM Master node, following tasks occur.

1) Master grabs the *REGION LOCK* for the write and permits slave to issue the 
write.
2) New IOs that arrive on the same region before the write that acquired the
*REGION LOCK* completes wait in a *REGION LOCK QUEUE*.
3) Once the IO that acquired the *REGION LOCK* is serviced by slave node, it 
responds to the Master about the same, and Master processes the IOs queued in 
the *REGION LOCK QUEUE*.

The problem occurs when the slave node dies before sending the response to the 
Master about completion of the IO that held the *REGION LOCK*.

RESOLUTION:
Code changes have been made to handle the condition described in the
"DESCRIPTION" section.

* 2440351 (Tracking ID: 2440349)

SYMPTOM:
The grow operation on a DCO volume may grow it into any 'site' not
honoring the allocation requirements strictly.

DESCRIPTION:
When a DCO volume is grown, it may not strictly honor the allocation
specification to use only a particular site, even when the site is specified
explicitly.

RESOLUTION:
The Data Change Object of Volume Manager is modified so that it strictly
honors the alloc specification if one is provided explicitly.

* 2442850 (Tracking ID: 2317703)

SYMPTOM:
When the vxesd daemon is invoked by device attach and removal operations in a
loop, it leaves open file descriptors with the vxconfigd daemon.

DESCRIPTION:
The issue is caused by multiple vxesd daemon threads trying to establish
contact with the vxconfigd daemon at the same time and losing track of the file
descriptor through which the communication channel was established.

RESOLUTION:
The fix for this issue is to maintain a single file descriptor that has a thread
safe reference counter thereby not having multiple communication channels
established between vxesd and vxconfigd by various threads of vxesd.
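
A single shared channel with a thread-safe reference counter can be sketched in
Python as below. This is a generic illustration of the pattern, not vxesd
code; the class name and the open/close callbacks are hypothetical.

```python
import threading

class SharedChannel:
    """One connection shared by all threads, guarded by a reference count."""

    def __init__(self, open_fn, close_fn):
        self._lock = threading.Lock()
        self._open_fn = open_fn
        self._close_fn = close_fn
        self._handle = None
        self._refs = 0

    def acquire(self):
        with self._lock:
            if self._refs == 0:
                # The first user opens the channel; later users reuse it.
                self._handle = self._open_fn()
            self._refs += 1
            return self._handle

    def release(self):
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                # The last user closes the channel, so no descriptor is leaked.
                self._close_fn(self._handle)
                self._handle = None
```

Because the open and the count update happen under the same lock, concurrent
callers cannot each open their own descriptor.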

* 2477291 (Tracking ID: 2428631)

SYMPTOM:
Shared DG import or Node Join fails with Hitachi Tagmastore storage.

DESCRIPTION:
CVM uses a different fence key for every DG. The key format is 'NPGRSSSS',
where N is the node id (A, B, C..) and 'SSSS' is the sequence number. Some
arrays (for example, Hitachi Tagmastore) restrict the total number of unique
keys that can be registered. This causes issues for configurations with a large
number of DGs or, more precisely, a large product of #DGs and #nodes in the
cluster.

RESOLUTION:
Having a unique key for each DG is not essential. Hence a tunable is added to
control this behavior. 

# vxdefault list
KEYWORD                        CURRENT-VALUE   DEFAULT-VALUE
...
same_key_for_alldgs            off             off
...

Default value of the tunable is 'off' to preserve the current behavior. If a
configuration hits the storage array limit on total number of unique keys, the
tunable value could be changed to 'on'. 

# vxdefault set same_key_for_alldgs on
# vxdefault list
KEYWORD                        CURRENT-VALUE   DEFAULT-VALUE
...
same_key_for_alldgs            on              off
...

This would make CVM generate same key for all subsequent DG imports/creates.
Already imported DGs need to be deported and re-imported for them to take into
consideration the changed value of the tunable.
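
The effect of the tunable on the number of unique keys can be sketched as
follows. The Python model is illustrative only: it assumes that 'PGR' is a
literal part of the key, that 'SSSS' is a zero-padded decimal sequence, and
that a fixed sequence of 0 is used when the tunable is on. None of these
details are confirmed by this document, and the function names are
hypothetical.

```python
def fence_key(node_index, dg_sequence, same_key_for_alldgs=False):
    node = chr(ord("A") + node_index)          # node id: A, B, C, ...
    seq = 0 if same_key_for_alldgs else dg_sequence
    return f"{node}PGR{seq:04d}"

def unique_keys(num_nodes, num_dgs, same_key_for_alldgs=False):
    return {fence_key(n, s, same_key_for_alldgs)
            for n in range(num_nodes) for s in range(num_dgs)}
```

In this model a 4-node cluster with 100 DGs registers 400 unique keys with the
tunable off and 4 unique keys with the tunable on.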

* 2479746 (Tracking ID: 2406292)

SYMPTOM:
In case of I/Os on volumes having multiple subdisks (for example, striped
volumes), the system panics with the following stack.

unix:panicsys+0x48()
unix:vpanic_common+0x78()
unix:panic+0x1c()
genunix:kmem_error+0x4b4()
vxio:vol_subdisksio_delete() - frame recycled
vxio:vol_plexsio_childdone+0x80()
vxio:volsiodone() - frame recycled
vxio:vol_subdisksio_done+0xe0()
vxio:volkcontext_process+0x118()
vxio:voldiskiodone+0x360()
vxio:voldmp_iodone+0xc()
genunix:biodone() - frame recycled
vxdmp:gendmpiodone+0x1ec()
ssd:ssd_return_command+0x240()
ssd:ssdintr+0x294()
fcp:ssfcp_cmd_callback() - frame recycled
qlc:ql_fast_fcp_post+0x184()
qlc:ql_status_entry+0x310()
qlc:ql_response_pkt+0x2bc()
qlc:ql_isr_aif+0x76c()
pcisch:pci_intr_wrapper+0xb8()
unix:intr_thread+0x168()
unix:ktl0+0x48()

DESCRIPTION:
On a striped volume, the IO is split into multiple parts, one for each
sub-disk in the stripe. The parts of the IO are processed in parallel by
different threads, so any two threads processing the IO completion can enter
into a race condition. Because of this race condition, one of the threads
accesses a stale address, causing the system panic.

RESOLUTION:
The critical section of code is modified to hold appropriate locks to avoid 
race condition.
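
The kind of locking described above can be sketched in user-space Python. This
is a simplified illustration, not the vxio code: the class ParentIO and the
function run_stripe_io are hypothetical names. The count of outstanding child
I/Os is decremented and tested under one lock, so only the thread that
completes the last child goes on to release the parent.

```python
import threading


class ParentIO:
    """Hypothetical parent I/O that is split into per-subdisk children."""

    def __init__(self, num_children):
        self._lock = threading.Lock()
        self.pending = num_children
        self.released_by = []  # ids of the children that released the parent

    def child_done(self, child_id):
        # Decrement and test in one critical section. Without the lock,
        # two completion threads could both decide they are the last one.
        with self._lock:
            self.pending -= 1
            is_last = self.pending == 0
        if is_last:
            # Only the last completer touches (and releases) the parent.
            self.released_by.append(child_id)
        return is_last


def run_stripe_io(num_subdisks):
    """Complete all children from separate threads and return the parent."""
    parent = ParentIO(num_subdisks)
    threads = [
        threading.Thread(target=parent.child_done, args=(i,))
        for i in range(num_subdisks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return parent
```

Calling run_stripe_io(8) leaves exactly one entry in released_by, whichever
thread happened to finish last.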

* 2480006 (Tracking ID: 2400654)

SYMPTOM:
"vxdmpadm listenclosure" command hangs because of duplicate enclosure entries in
/etc/vx/array.info file.

Example: 

Enclosure "emc_clariion0" has two entries.

# cat /etc/vx/array.info
DD4VM1S
emc_clariion0
0
EMC_CLARiiON
DISKS
disk
0
Disk
DD3VM2S
emc_clariion0
0
EMC_CLARiiON

DESCRIPTION:
When the "vxdmpadm listenclosure" command is run, vxconfigd reads its in-core
enclosure list, which is populated from the /etc/vx/array.info file. Because
the enclosure "emc_clariion0" (as shown in the example) is also the last entry
in the file, the command expects vxconfigd to return the enclosure information
at the last index of the enclosure list. However, because of the duplicate
enclosure entries, vxconfigd returns information for a different enclosure,
which leads to the hang.

RESOLUTION:
The code changes are made in vxconfigd to detect duplicate entries in
/etc/vx/array.info file and return the appropriate enclosure information as
requested by the vxdmpadm command.
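
To check a system for this condition before applying the patch, duplicate
entries can be spotted with a short script. The sketch below is an
illustration only. It assumes, based purely on the example above, that
/etc/vx/array.info consists of 4-line records whose second line is the
enclosure name; the real file layout may differ. The names RECORD_LINES and
duplicate_enclosures are hypothetical.

```python
from collections import Counter

RECORD_LINES = 4  # assumption inferred from the example, not a documented format


def duplicate_enclosures(text):
    """Return enclosure names that appear in more than one record."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # The second line of each assumed 4-line record holds the enclosure name.
    names = [
        lines[i + 1]
        for i in range(0, len(lines) - RECORD_LINES + 1, RECORD_LINES)
    ]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


SAMPLE = """DD4VM1S
emc_clariion0
0
EMC_CLARiiON
DISKS
disk
0
Disk
DD3VM2S
emc_clariion0
0
EMC_CLARiiON
"""

if __name__ == "__main__":
    print(duplicate_enclosures(SAMPLE))
```

For the sample contents shown in the SYMPTOM section, the function reports
emc_clariion0 as the only duplicated enclosure name.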

* 2484466 (Tracking ID: 2480600)

SYMPTOM:
I/O of large sizes like 512k and 1024k hang in CVR (Clustered Volume 
Replicator).

DESCRIPTION:
When large I/Os (for example, 1 MB) are performed on volumes under an RVG
(Replicated Volume Group), only a limited number of I/Os can be accommodated
within the RVIOMEM pool limit, so the pool remains full most of the time. If a
CVM (Clustered Volume Manager) slave is rebooted or goes down at this point,
the pending I/Os are aborted and the corresponding memory is freed. In one of
the cases, the memory is not freed, which leads to the hang.

RESOLUTION:
Code changes have been made to free the memory under all scenarios.

* 2484695 (Tracking ID: 2484685)

SYMPTOM:
In a Storage Foundation environment running Symantec Oracle Disk Manager (ODM),
Veritas File System (VxFS) and Volume Manager (VxVM), a system panic may occur
with the following stack trace:

  000002a10247a7a1 vpanic()
  000002a10247a851 kmem_error+0x4b4()
  000002a10247a921 vol_subdisksio_done+0xe0()
  000002a10247a9d1 volkcontext_process+0x118()
  000002a10247aaa1 voldiskiodone+0x360()
  000002a10247abb1 voldmp_iodone+0xc()
  000002a10247ac61 gendmpiodone+0x1ec()
  000002a10247ad11 ssd_return_command+0x240()
  000002a10247add1 ssdintr+0x294()
  000002a10247ae81 ql_fast_fcp_post+0x184()
  000002a10247af31 ql_24xx_status_entry+0x2c8()
  000002a10247afe1 ql_response_pkt+0x29c()
  000002a10247b091 ql_isr_aif+0x76c()
  000002a10247b181 px_msiq_intr+0x200()
  000002a10247b291 intr_thread+0x168()
  000002a10240b131 cpu_halt+0x174()
  000002a10240b1e1 idle+0xd4()
  000002a10240b291 thread_start+4()

DESCRIPTION:
A race condition exists between two I/Os (specifically, Volume Manager
subdisk-level staged I/Os) during 'done' processing. It causes one thread to
free the FS-VM private information data structure before the other thread
accesses it.

The likelihood of hitting the race increases with the number of CPUs.

RESOLUTION:
Avoid the race condition such that the slower thread doesn't access the freed 
FS-VM private information data structure.

* 2485278 (Tracking ID: 2386120)

SYMPTOM:
In some situations, the error messages printed in the syslog when a master
takeover fails are not enough to determine the root cause of the failure.

DESCRIPTION:
If the new master encounters errors during master takeover, the master
takeover operation fails. The code contains messages that log the reasons for
the failure, but these messages are not available on customer setups. They are
generally enabled only in internal development/testing scenarios.

RESOLUTION:
Some of the relevant messages have been modified such that they will
now be available on the customer setups as well, logging crucial information
for root cause analysis of the issue.

* 2485288 (Tracking ID: 2431470)

SYMPTOM:
vxpfto sets the PFTO (Powerfail Timeout) value on the wrong VxVM device.

DESCRIPTION:
vxpfto invokes the 'vxdisk set' command to set the PFTO value. vxdisk accepts
both DA (Disk Access) and DM (Disk Media) names for device specification. DA
and DM names can conflict: even within the same disk group, the same name can
refer to different devices, one as a DA name and another as a DM name. The
vxpfto command uses DM names when invoking the vxdisk command, but vxdisk
chooses a matching DA name before a DM name. This causes the wrong device to
be acted upon.

RESOLUTION:
The argument check in 'vxdisk set' is fixed to follow the common VxVM rule:
if a disk group is specified with the '-g' option, only a DM name is
supported; otherwise, the name can be a DA name.
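
The name resolution rule above can be sketched as follows. This Python snippet
is an illustration only and does not reflect the actual vxdisk implementation;
the function resolve_disk and its data structures are hypothetical.

```python
def resolve_disk(name, da_names, dm_names_by_dg, diskgroup=None):
    """Resolve a disk name following the rule described above.

    da_names:       set of DA (Disk Access) names known to the system.
    dm_names_by_dg: {disk group: {DM name: DA name backing that DM record}}.
    diskgroup:      value of the '-g' option, or None if it was not given.
    """
    if diskgroup is not None:
        # With '-g', only DM names of that disk group are considered,
        # even if a DA record with the same name exists.
        dm_names = dm_names_by_dg.get(diskgroup, {})
        if name in dm_names:
            return ("dm", dm_names[name])
        raise LookupError(f"no DM name {name!r} in disk group {diskgroup!r}")
    # Without '-g', the name is looked up as a DA name.
    if name in da_names:
        return ("da", name)
    raise LookupError(f"no DA name {name!r}")


if __name__ == "__main__":
    da = {"disk_0", "disk_1"}
    # In 'mydg', the DM name disk_0 is backed by the DA record disk_1.
    dm = {"mydg": {"disk_0": "disk_1"}}
    print(resolve_disk("disk_0", da, dm, diskgroup="mydg"))
    print(resolve_disk("disk_0", da, dm))
```

In this example the name disk_0 is ambiguous: with '-g mydg' it resolves to
the device behind DA record disk_1, and without '-g' it resolves to the DA
record disk_0 itself.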

* 2488042 (Tracking ID: 2431423)

SYMPTOM:
Panic in vol_mv_commit_check() while accessing the Data Change Map (DCM)
object. The stack trace of the panic is as follows:
 
 vol_mv_commit_check at ffffffffa0bef79e
 vol_ktrans_commit at ffffffffa0be9b93
 volconfig_ioctl at ffffffffa0c4a957
 volsioctl_real at ffffffffa0c5395c
 vols_ioctl at ffffffffa1161122
 sys_ioctl at ffffffff801a2a0f
 compat_sys_ioctl at ffffffff801ba4fb
 sysenter_do_call at ffffffff80125039

DESCRIPTION:
In case of DCM failure, object pointer is set to NULL as part of transaction. If
DCM is active then we try to access DCM object in transaction code path without
checking it to be NULL. DCM object pointer could be NULL in case of failed DCM.
Accessing object pointer without check for NULL caused this panic.

RESOLUTION:
The fix adds a NULL check for the DCM object in the transaction code path.

* 2491856 (Tracking ID: 2424833)

SYMPTOM:
The VVR primary node crashes while replicating over a lossy, high-latency
network with multiple TCP connections. In a debug VxVM build, a TED assert is
hit with the following stack:

brkpoint+000004 ()
ted_call_demon+00003C (0000000007D98DB8)
ted_assert+0000F0 (0000000007D98DB8, 0000000007D98B28,
   0000000000000000)
.hkey_legacy_gate+00004C ()
nmcom_send_msg_tcp+000C20 (F100010A83C4E000, 0000000200000002,
   0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000,
   000000DA000000DA, 0000000100000000)
.nmcom_connect_tcp+0007D0 ()
vol_rp_connect+0012D0 (F100010B0408C000)
vol_rp_connect_start+000130 (F1000006503F9308, 0FFFFFFFF420FC50)
voliod_iohandle+0000AC (F1000006503F9308, 0000000100000001,
   0FFFFFFFF420FC50)
voliod_loop+000CFC (0000000000000000)
vol_kernel_thread_init+00002C (0FFFFFFFF420FFF0)
threadentry+000054 (??, ??, ??, ??)

DESCRIPTION:
In a lossy, high-latency network, the connection between the VVR primary and
secondary can be closed and re-established frequently because of heartbeat
timeouts or DATA acknowledgement timeouts. In the TCP multi-connection
scenario, the VVR primary sends its very first message (called
NMCOM_HANDSHAKE) to the secondary on the zeroth socket connection and then
sends an "NMCOM_SESSION" message on each of the remaining connections. If
sending the NMCOM_HANDSHAKE message fails for any reason, the VVR primary
tries to send it through another connection without checking whether that
connection is valid.

RESOLUTION:
Code changes are made in VVR to use the other connections only after all the
connections are established.

Patch ID: 142630-11

* 2280640 (Tracking ID: 2205108)

SYMPTOM:
On VxVM 5.1SP1 or later, device discovery operations such as vxdctl enable,
vxdisk scandisks and vxconfigd -k fail to claim new disks correctly. For
example, if the user provisions five new disks, VxVM creates only one Dynamic
Multi-Pathing (DMP) node instead of five and includes the remaining disks as
its paths. The following message is also displayed on the console when this
problem occurs:

NOTICE: VxVM vxdmp V-5-0-34 added disk array , datype =

Please note that the cabinet serial number following "disk array" and the value
of "datype" is not printed in the above message.

DESCRIPTION:
VxVM's DDL (Device Discovery Layer) is responsible for appropriately claiming
newly provisioned disks. Due to a bug in one of the routines within this
layer, the disks are claimed but their LSN (LUN Serial Number, a unique
identifier of disks) is ignored. As a result, every disk is wrongly grouped
under a single DMP node.

RESOLUTION:
The problematic code within the DDL is modified so that new disks are claimed
appropriately.

WORKAROUND:

If vxconfigd does not hang or dump core because of this issue, the situation
can be recovered either by rebooting the system or by tearing down and
rebuilding the DMP/DDL device database with the following steps:

# vxddladm excludearray all
# mv /etc/vx/jbod.info /etc/vx/jbod.info.org
# vxddladm disablescsi3
# devfsadm -Cv
# vxconfigd -k

# vxddladm includearray all
# mv /etc/vx/jbod.info.org /etc/vx/jbod.info
# vxddladm enablescsi3
# rm /etc/vx/disk.info /etc/vx/array.info
# vxconfigd -k

* 2291967 (Tracking ID: 2286559)

SYMPTOM:
System panics in DMP (Dynamic Multi Pathing) kernel module due to kernel heap 
corruption while DMP path failover is in progress.

Panic stack may look like:

vpanic
kmem_error+0x4b4()
gen_get_enabled_ctlrs+0xf4()
dmp_get_enabled_ctlrs+0xf4()
dmp_info_ioctl+0xc8()
dmpioctl+0x20()
dmp_get_enabled_cntrls+0xac()
vx_dmp_config_ioctl+0xe8()
quiescesio_start+0x3e0()
voliod_iohandle+0x30()
voliod_loop+0x24c()
thread_start+4()

DESCRIPTION:
During path failover in DMP, the routine gen_get_enabled_ctlrs() allocates 
memory proportional to the number of enabled paths. However, while releasing 
the memory, the routine may end up freeing more memory because of the change in 
number of enabled paths.

RESOLUTION:
Code changes have been made in the routines to free only the memory that was
allocated.

* 2299977 (Tracking ID: 2299670)

SYMPTOM:
VxVM disk groups created on EFI (Extensible Firmware Interface) LUNs do not get
auto-imported during system boot in VxVM version 5.1SP1 and later.

DESCRIPTION:
While determining the disk format of EFI LUNs, the stat() system call on the
corresponding DMP devices fails with the ENOENT ("No such file or directory")
error because the DMP device nodes are not created in the root file system
during system boot. This leads to failure of the auto-import of disk groups
created on EFI LUNs.

RESOLUTION:
VxVM code is modified to use OS raw device nodes if stat() fails on DMP device
nodes.
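
The fallback described above can be sketched in user-space Python. This is an
illustration of the general pattern only, not the VxVM code; the function name
pick_device_node and the paths passed to it are hypothetical.

```python
import os


def pick_device_node(dmp_node, raw_node):
    """Return dmp_node if it can be stat()ed, otherwise fall back to raw_node.

    Both arguments are hypothetical device paths supplied by the caller.
    """
    try:
        os.stat(dmp_node)
        return dmp_node
    except FileNotFoundError:
        # ENOENT: the DMP device node has not been created yet (for example,
        # early during boot), so use the OS raw device node instead.
        return raw_node


if __name__ == "__main__":
    # os.devnull stands in for an OS device node that is known to exist.
    print(pick_device_node("/nonexistent/dmp/node", os.devnull))
```

Only ENOENT triggers the fallback in this sketch; other stat() errors still
propagate to the caller.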

* 2318820 (Tracking ID: 2317540)

SYMPTOM:
The system panics due to kernel heap corruption while the DMP device driver is
being unloaded.

The panic stack on Solaris (when kmem_flags is set to either 0x100 or 0xf) is
similar to the following:

vpanic()
kmem_error+0x4b4()
dmp_free_stats_table+0x118()
dmp_free_modules+0x24()
vxdmp`_fini+0x178()
moduninstall+0x148()
modunrload+0x6c()
modctl+0x54()
syscall_trap+0xac()

DESCRIPTION:
When the DMP kernel device driver is unloaded, it frees all of its allocated
kernel heap memory. While doing so, DMP tries to free more than the allocated
size for one of the buffers, which leads to a system panic when kernel memory
auditing is enabled.

RESOLUTION:
The source code is modified to free the kernel buffer using a size that
matches its allocation size.

* 2320613 (Tracking ID: 2313021)

SYMPTOM:
In a Sun Cluster environment, nodes fail to join the CVM cluster after a
reboot and display the following messages on the console:

<> vxio: [ID 557667 kern.notice] NOTICE: VxVM vxio V-5-3-1251 joinsio_done:
Overlapping reconfiguration, failing the join for node 1. The join will be
retried.
<> vxio: [ID 976272 kern.notice] NOTICE: VxVM vxio V-5-3-672 abort_joinp:
aborting joinp for node 1 with err 11
<> vxvm:vxconfigd: [ID 702911 daemon.notice] V-5-1-12144 CVM_VOLD_JOINOVER
command received with error

DESCRIPTION:
A reboot of a node within CVM cluster involves a "node leave" followed 
by a "node join" reconfiguration. During CVM reconfiguration, each node 
exchanges reconfiguration messages with other nodes using the UDP protocol. At 
the end of a CVM reconfiguration, the messages exchanged should be deleted from 
all the nodes in the cluster. However, due to a bug in CVM, the messages weren't 
deleted as part of the "node leave" reconfiguration processing in some nodes 
that resulted in failure of subsequent "node join" reconfigurations.

RESOLUTION:
After every CVM reconfiguration, the processed reconfiguration messages on 
all the nodes in the CVM cluster are deleted properly.

* 2322742 (Tracking ID: 2108152)

SYMPTOM:
vxconfigd, the VxVM volume configuration daemon, fails to enter enabled mode
at startup, and the "vxdctl enable" command displays the error "VxVM vxdctl
ERROR V-5-1-1589 enable failed: Error in disk group configuration copies".

DESCRIPTION:
vxconfigd issues input/output control system call (ioctl) to read the disk
capacity from disks. However, if it fails, the error number is not propagated
back to vxconfigd. The subsequent disk operations to these failed devices were
causing vxconfigd to get into disabled mode.

RESOLUTION:
The fix is made to propagate the actual "error number" returned by the ioctl
failure back to vxconfigd.

* 2322757 (Tracking ID: 2322752)

SYMPTOM:
Duplicate device names are observed for NR (Not Ready) devices, when vxconfigd 
is restarted (vxconfigd -k).

# vxdisk list 

emc0_0052    auto            -            -            error
emc0_0052    auto:cdsdisk    -            -            error
emc0_0053    auto            -            -            error
emc0_0053    auto:cdsdisk    -            -            error

DESCRIPTION:
During vxconfigd restart, disk access records are rebuilt in the vxconfigd
database. As part of this process, I/Os are issued on all the devices to read
the disk private regions. The failure of these I/Os on NR devices resulted in
the creation of duplicate disk access records.

RESOLUTION:
The vxconfigd code is modified so that it does not create duplicate disk
access records.

* 2333255 (Tracking ID: 2253552)

SYMPTOM:
vxconfigd leaks memory while reading the default tunables related to
smartmove (a VxVM feature).

DESCRIPTION:
In vxconfigd, the memory allocated for default tunables related to the
smartmove feature is not freed, causing a memory leak.

RESOLUTION:
The memory is released after its scope is over.

* 2333257 (Tracking ID: 1675599)

SYMPTOM:
vxconfigd leaks memory while a LUN controlled by a Third Party Driver (TPD) is
excluded and included in a loop. As a result, vxconfigd loses its license
information and the following error is seen in the system log:
        "License has expired or is not available for operation"

DESCRIPTION:
In vxconfigd code, memory allocated for various data structures related to
device discovery layer is not freed which led to the memory leak.

RESOLUTION:
The memory is released after its scope is over.

* 2337237 (Tracking ID: 2337233)

SYMPTOM:
Excluding a TPD device with the "vxdmpadm exclude" command does not work. The
excluded device is still shown in the "vxdisk list" output.

Example:
# vxdmpadm exclude vxvm dmpnodename=emcpower22s2


# cat /etc/vx/vxvm.exclude
exclude_all 0
paths
emcpower22c /pseudo/emcp@22 emcpower22s2
#
controllers
#
product
#
pathgroups
#


# vxdisk scandisks


# vxdisk list | grep emcpower22s2
emcpower22s2 auto:sliced     -            -            online

DESCRIPTION:
Because of a bug in the logic of path name comparison, DMP ends up including the
disks in device discovery which are part of the exclude list.

RESOLUTION:
The code in DMP is corrected to handle path name comparison appropriately.

* 2337354 (Tracking ID: 2337353)

SYMPTOM:
The "vxdmpadm include" command includes all the excluded devices, not just
the device given in the command.

Example:

# vxdmpadm exclude vxvm dmpnodename=emcpower25s2
# vxdmpadm exclude vxvm dmpnodename=emcpower24s2

# more /etc/vx/vxvm.exclude
exclude_all 0
paths
emcpower24c /dev/rdsk/emcpower24c emcpower25s2
emcpower10c /dev/rdsk/emcpower10c emcpower24s2
#
controllers
#
product
#
pathgroups
#

# vxdmpadm include vxvm dmpnodename=emcpower24s2

# more /etc/vx/vxvm.exclude
exclude_all 0
paths
#
controllers
#
product
#
pathgroups
#

DESCRIPTION:
When a dmpnode is excluded, an entry is made in /etc/vx/vxvm.exclude file. This 
entry has to be removed when the dmpnode is included later. Due to a bug in 
comparison of dmpnode device names, all the excluded devices are included.

RESOLUTION:
The bug in the code which compares the dmpnode device names is rectified.
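
The intended behavior can be sketched as follows. This Python snippet is an
illustration only, not the DMP code. It assumes, based on the examples above,
that the third field of each entry in the "paths" section of
/etc/vx/vxvm.exclude is the dmpnode name; the function name include_dmpnode is
hypothetical.

```python
def include_dmpnode(path_entries, dmpnode):
    """Return the 'paths' entries that remain excluded after an include.

    Only entries whose dmpnode field is exactly equal to the requested
    name are dropped; all other excluded devices stay in the list.
    """
    kept = []
    for entry in path_entries:
        fields = entry.split()
        # Assumed layout: <path name> <physical path> <dmpnode name>
        if len(fields) == 3 and fields[2] == dmpnode:
            continue
        kept.append(entry)
    return kept


if __name__ == "__main__":
    entries = [
        "emcpower24c /dev/rdsk/emcpower24c emcpower25s2",
        "emcpower10c /dev/rdsk/emcpower10c emcpower24s2",
    ]
    for line in include_dmpnode(entries, "emcpower24s2"):
        print(line)
```

With the entries from the SYMPTOM section, including emcpower24s2 removes only
the second entry, and the entry for emcpower25s2 stays in the exclude list.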

* 2339254 (Tracking ID: 2339251)

SYMPTOM:
On Solaris 10, newfs/mkfs_ufs(1M) fails to create a UFS file system on a VxVM
volume larger than 2 terabytes (TB) with the following error:

    # newfs /dev/vx/rdsk/[disk group]/[volume]
    newfs: construct a new file system /dev/vx/rdsk/[disk group]/[volume]: (y/n)? y
    Can not determine partition size: Inappropriate ioctl for device

The truss output of newfs/mkfs_ufs(1M) shows that the ioctl() system calls
used to identify the size of the disk or volume device fail with the ENOTTY
error.

    ioctl(3, 0x042A, ...)                    Err#25 ENOTTY
    ...
    ioctl(3, 0x0412, ...)                    Err#25 ENOTTY

DESCRIPTION:
On Solaris 10, newfs/mkfs_ufs(1M) uses ioctl() system calls to identify the
size of the disk or volume device when creating a UFS file system on disk or
volume devices "> 2TB". If the Operating System (OS) version is earlier than
Solaris 10 Update 8, these ioctl system calls are invoked on "volumes > 1TB"
as well.

Veritas Volume Manager (VxVM) exports the ioctl interfaces for VxVM volumes.
VxVM 5.1 SP1 RP1 P1 and VxVM 5.0 MP3 RP3 introduced the support for Extensible
Firmware Interface (EFI) for VxVM volumes in Solaris 9 and Solaris 10
respectively. However, the corresponding EFI-specific build-time definition in
the Veritas Kernel IO driver (VXIO) was not updated for Solaris 10 in VxVM 5.1
SP1 RP1 P1 and onwards.

RESOLUTION:
The build-time definition for EFI has been added to VXIO. As a result,
newfs/mkfs_ufs(1M) now successfully creates a UFS file system on VxVM volume
devices "> 2TB" ("> 1TB" if the OS version is earlier than Solaris 10 Update
8).

* 2346469 (Tracking ID: 2346470)

SYMPTOM:
Dynamic Multi-Pathing administration operations such as
"vxdmpadm exclude vxvm dmpnodename=<daname>" and
"vxdmpadm include vxvm dmpnodename=<daname>" trigger memory leaks in the heap
segment of the VxVM Configuration Daemon (vxconfigd).

DESCRIPTION:
vxconfigd allocates chunks of memory to store VxVM specific information 
of the disk being included during "vxdmpadm include vxvm dmpnodename=<daname>" 
operation. The allocated memory is not freed while excluding the same disk from 
VxVM control. Also when excluding a disk from VxVM control, another chunk of 
memory is temporarily allocated by vxconfigd to store more details of the device 
being excluded. However this memory is not freed at the end of exclude 
operation.

RESOLUTION:
Memory allocated during include operation of a disk is freed during 
corresponding exclude operation of the disk. Also temporary memory allocated 
during exclude operation of a disk is freed at the end of exclude operation.

* 2349497 (Tracking ID: 2320917)

SYMPTOM:
vxconfigd, the VxVM configuration daemon, dumps core and loses the disk group
configuration when the following VxVM reconfiguration steps are performed:

1)	Volumes which were created on thin reclaimable disks are deleted.
2)	Before the space of the deleted volumes is reclaimed, the disks (whose
volumes are deleted) are removed from the DG with the 'vxdg rmdisk' command
using the '-k' option.
3)	The disks are removed using the 'vxedit rm' command.
4)	New disks are added to the disk group using the 'vxdg adddisk' command.
The stack trace of the core dump is :
[
 0006f40c rec_lock3 + 330
 0006ea64 rec_lock2 + c
 0006ec48 rec_lock2 + 1f0
 0006e27c rec_lock + 28c
 00068d78 client_trans_start + 6e8
 00134d00 req_vol_trans + 1f8
 00127018 request_loop + adc
 000f4a7c main  + fb0
 0003fd40 _start + 108
]

DESCRIPTION:
When a volume is deleted from a disk group that uses thin reclaim LUNs, its
subdisks are not removed immediately; instead, they are marked with a special
flag. The reclamation happens at a scheduled time every day. The "vxdefault"
command can be invoked to list and modify the settings.

After the disk is removed from the disk group using the 'vxdg -k rmdisk' and
'vxedit rm' commands, the subdisk records are still in the in-core database
and point to the disk media record, which has been freed. When the next
command is run to add another new disk to the disk group, vxconfigd dumps core
while locking the disk media record that has already been freed.

Subsequent disk group deport and import commands erase all of the disk group
configuration because they detect an invalid association between the subdisks
and the removed disk.

RESOLUTION:
1)	The following message is printed when 'vxdg rmdisk' is used to remove
a disk that has reclaim-pending subdisks:

VxVM vxdg ERROR V-5-1-0 Disk <diskname> is used by one or more subdisks which
are pending to be reclaimed.
        Use "vxdisk reclaim <diskname>" to reclaim space used by these subdisks,
        and retry "vxdg rmdisk" command.
        Note: reclamation is irreversible.

2)	A check is added for when 'vxedit rm' is used to remove a disk. If the
disk is in the removed state and has reclaim-pending subdisks, the following
error message is printed:

VxVM vxedit ERROR V-5-1-10127 deleting <diskname>:
        Record is associated

* 2349553 (Tracking ID: 2353493)

SYMPTOM:
On Solaris 10, the "pkgchk" command on the VxVM package fails with the
following error:
# pkgchk -a VRTSvxvm
 ERROR: /usr/lib/libvxscsi.so.SunOS_5.10
    pathname does not exist

DESCRIPTION:
During installation of the VxVM package, the VxVM library libvxscsi.so was
not installed at the path /usr/lib/libvxscsi.so.SunOS_5.10, which is a
prerequisite for successful execution of the 'pkgchk' command.

RESOLUTION:
VxVM's installation scripts are modified to include the library at the correct
location.

* 2353429 (Tracking ID: 2334757)

SYMPTOM:
vxconfigd consumes a lot of memory when the DMP tunable dmp_probe_idle_lun is
set to on. The "pmap" command on the vxconfigd process shows a continuously
growing heap.

DESCRIPTION:
The DMP path restoration daemon probes idle LUNs (VxVM disks on which no I/O
requests are scheduled) and generates notify events to vxconfigd. vxconfigd in
turn sends notifications of these events to its clients. If vxconfigd cannot
deliver these events for any reason (for example, because a client is busy
processing an earlier event), it holds on to them. Because its clients consume
events slowly, the memory consumption of vxconfigd grows.

RESOLUTION:
dmp_probe_idle_lun is set to off by default.

* 2357935 (Tracking ID: 2349352)

SYMPTOM:
Data corruption is observed on DMP device with single path during Storage 
reconfiguration (LUN addition/removal).

DESCRIPTION:
Data corruption can occur in the following configuration when new LUNs are
provisioned or removed under VxVM while applications are online.
 
1. The DMP device naming scheme is EBN (enclosure based naming) and 
persistence=no
2. The DMP device is configured with single path or the devices are controlled 
by Third Party Multipathing Driver (Ex: MPXIO, MPIO etc.,)
 
Because persistent naming is turned off, the names of the VxVM devices (DA
records) can change when LUNs are removed or added and one of the following
commands is then run:
 
(a) vxdctl enable
(b) vxdisk scandisks
 
Executing the above commands discovers all the devices and rebuilds the device
attribute list with new DMP device names. The VxVM device records are then
updated with these new attributes. Due to a bug in the code, the VxVM device
records are mapped to the wrong DMP devices.
 
Example:
 
The following are the devices before adding new LUNs:
 
sun6130_0_16 auto            -            -            nolabel
sun6130_0_17 auto            -            -            nolabel
sun6130_0_18 auto:cdsdisk    disk_0       prod_SC32    online nohotuse
sun6130_0_19 auto:cdsdisk    disk_1       prod_SC32    online nohotuse
 
The following are the devices after adding new LUNs:
 
sun6130_0_16 auto            -            -            nolabel
sun6130_0_17 auto            -            -            nolabel
sun6130_0_18 auto            -            -            nolabel
sun6130_0_19 auto            -            -            nolabel
sun6130_0_20 auto:cdsdisk    disk_0       prod_SC32    online nohotuse
sun6130_0_21 auto:cdsdisk    disk_1       prod_SC32    online nohotuse
 
The name of the VxVM device sun6130_0_18 is changed to sun6130_0_20.

RESOLUTION:
The code that updates the VxVM device records is rectified.

* 2364294 (Tracking ID: 2364253)

SYMPTOM:
In case of Space Optimized snapshots at secondary site, VVR leaks kernel memory.

DESCRIPTION:
In case of Space Optimized snapshots at the secondary site, VVR proactively
starts the copy-on-write on the snapshot volume. The I/O buffer allocated for
this proactive copy-on-write was not freed even after the I/Os completed,
which led to the memory leak.

RESOLUTION:
After the proactive copy-on-write is complete, memory allocated for the I/O
buffers is released.

* 2366071 (Tracking ID: 2366066)

SYMPTOM:
The VxVM (Veritas Volume Manager) vxstat command displays absurd statistics for
READ & WRITE operations on VxVM objects. The absurd values are close to the
maximum value of a 32-bit unsigned integer.

For example  :
# vxstat -g <disk group name> -i <interval>

                      OPERATIONS          BLOCKS           AVG TIME(ms)
TYP NAME              READ     WRITE      READ     WRITE   READ  WRITE

<Start Time>
vol <volume name>             10       303       112      2045   6.15  14.43

<Start Time> + 60 seconds
vol <volume name>              2        67        32       476   6.00  14.28

<Start Time> + 60*2 seconds
vol <volume name>      4294967288 4294966980 4294967199 4294965129   0.00   0.00

DESCRIPTION:
vxio, a VxVM driver, uses 32-bit unsigned integer variable to keep track of the
number of READ & WRITE blocks on VxVM objects.
Whenever the 32-bit unsigned integer overflows, vxstat displays the absurd
statistics as shown in SYMPTOM section above.

RESOLUTION:
Both vxio driver and vxstat command have been modified to accommodate larger
number of READ & WRITE blocks on VxVM objects.
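
The absurd values in the SYMPTOM section are consistent with 32-bit unsigned
arithmetic. For example, 4294967288 equals 2^32 - 8, which is how a difference
of -8 appears when it is stored in a 32-bit unsigned variable. The following
Python sketch only illustrates this arithmetic; it is not the vxio or vxstat
code, and the helper names as_u32 and as_u64 are hypothetical.

```python
U32_MASK = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def as_u32(value):
    """Reduce an integer to what a 32-bit unsigned variable would hold."""
    return value & U32_MASK


def as_u64(value):
    """Reduce an integer to what a 64-bit unsigned variable would hold."""
    return value & U64_MASK


if __name__ == "__main__":
    # A small negative difference shows up as a value just below 2**32.
    print(as_u32(-8))
    # A counter that passes 2**32 wraps in 32 bits but not in 64 bits.
    blocks = 2**32 + 5
    print(as_u32(blocks), as_u64(blocks))
```

The same reduction reproduces the other values in the example: 4294966980,
4294967199 and 4294965129 correspond to -316, -97 and -2167 respectively.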

Patch ID: 142630-10

* 2256685 (Tracking ID: 2080730)

SYMPTOM:
On Linux, exclusion of devices using the "vxdmpadm exclude" CLI is not
persistent across reboots.

DESCRIPTION:
On Linux, names of OS devices (/dev/sd*) are not persistent. The
"vxdmpadm exclude" CLI uses the OS device names to keep track of
devices to be excluded by VxVM/DMP. As a result, on reboot, if the OS
device names change, then the devices which are intended to be excluded
will be included again.

RESOLUTION:
The resolution is to use persistent physical path names to keep track of the
devices that have been excluded.

* 2256686 (Tracking ID: 2152830)

SYMPTOM:
Storage administrators sometimes create multiple copies (clones) of the same
device. Disk group import fails with a non-descriptive error message when
multiple clones of the same device exist and the original device(s) are either
offline or not available.

# vxdg import mydg
VxVM vxdg ERROR V-5-1-10978 Disk group mydg: import failed: 
No valid disk found containing disk group

DESCRIPTION:
If the original devices are offline or unavailable, vxdg import picks
up cloned disks for import. DG import fails by design unless the clones
are tagged and the tag is specified during DG import. Although the import
failure is expected, the error message is non-descriptive and does not
suggest any corrective action for the user.

RESOLUTION:
A fix has been added to give the correct error message when duplicate clones
exist during import. Details of the duplicate clones are also reported in
the syslog.

Example:

[At CLI level]
# vxdg import testdg             
VxVM vxdg ERROR V-5-1-10978 Disk group testdg: import failed:
DG import duplcate clone detected

[In syslog]
vxvm:vxconfigd: warning V-5-1-0 Disk Group import failed: Duplicate clone disks are
detected, please follow the vxdg (1M) man page to import disk group with
duplicate clone disks. Duplicate clone disks are: c2t20210002AC00065Bd0s2 :
c2t50060E800563D204d1s2  c2t50060E800563D204d0s2 : c2t50060E800563D204d1s2

* 2256688 (Tracking ID: 2202710)

SYMPTOM:
Transactions on Rlink are not allowed during SRL to DCM flush.

DESCRIPTION:
The present implementation doesn't allow an rlink transaction to go through if
an SRL to DCM flush is in progress. When the SRL overflows, VVR starts reading
from the SRL and marks the dirty regions in the corresponding DCMs of the data
volumes; this is called an SRL to DCM flush. During an SRL to DCM flush,
transactions on the rlink are not allowed. The time needed to complete the SRL
flush depends on the SRL size and can range from minutes to many hours. If the
user initiates any transaction on the rlink, it hangs until the SRL flush
completes.

RESOLUTION:
The code is changed to allow rlink transactions during an SRL flush. The fix
stops the SRL flush so that the transaction can go ahead, and restarts the
flush after the transaction completes.

* 2256689 (Tracking ID: 2233889)

SYMPTOM:
The volume recovery happens in a serial fashion when any of the volumes has a
log volume attached to it.

DESCRIPTION:
When recovery is initiated on a disk group, vxrecover creates lists of each type
of volumes such as cache volume, data volume, log volume etc. The log volumes
are recovered in a serial fashion by design. Due to a bug the data volumes are
added to the log volume list if there exists a log volume. Hence even the data
volumes were recovered in a serial fashion if any of the volumes has a log
volume attached.

RESOLUTION:
The code was fixed such that the data volume list, cache volume list and the log
volume list are maintained separately and the data volumes are not added to the
log volumes list. The recovery for the volumes in each list is done in parallel.

* 2256690 (Tracking ID: 2226304)

SYMPTOM:
On the Solaris 9 platform, newfs(1M)/mkfs_ufs(1M) cannot create a UFS file
system on a VxVM volume larger than 1 terabyte (TB) and displays the following
error:

# newfs /dev/vx/rdsk/<diskgroup name>/<volume>
newfs: construct a new file system /dev/vx/rdsk/<diskgroup name>/<volume>: (y/n)? y
Can not determine partition size: Inappropriate ioctl for device

# prtvtoc /dev/vx/rdsk/<diskgroup name>/<volume>
prtvtoc: /dev/vx/rdsk/<diskgroup name>/<volume>: Unknown problem reading VTOC

DESCRIPTION:
newfs(1M)/mkfs_ufs(1M) invokes the DKIOCGETEFI ioctl. During the enhancement
of EFI support on Solaris 10 in 5.0MP3RP3 or later, the DKIOCGETEFI ioctl
functionality was not implemented on Solaris 9 because of the following
limitations:

1.	The EFI feature is not present in Solaris 9 FCS. It was introduced in
Solaris 9 U3 (4/03), which includes 114127-03 (libefi) and 114129-02 (libuuid
and efi/uuid headers).

2.	During the enhancement of EFI support on Solaris 10, the DKIOCGVTOC
ioctl on Solaris 9 was supported only on volumes <= 1 TB because the VTOC
specification was defined only for LUNs/volumes <= 1 TB. If the size of the
volume is > 1 TB, the DKIOCGVTOC ioctl would return an inaccurate vtoc
structure due to value overflow.

RESOLUTION:
The resolution is to enhance the VxVM code to handle DKIOCGETEFI ioctl 
correctly on VxVM volume on Solaris 9 platform. When newfs(1M)/mkfs_ufs(1M) 
invokes DKIOCGETEFI ioctl on a VxVM volume device, VxVM shall return the 
relevant EFI label information so that the UFS utilities can determine the 
volume size correctly.

* 2256691 (Tracking ID: 2197254)

SYMPTOM:
vxassist, the VxVM volume creation utility, doesn't function as expected when
creating a volume with 'logtype=none'.

DESCRIPTION:
While creating volumes on thinrclm disks, a Data Change Object (DCO) version 20
log is attached to every volume by default. If the user does not want this
default behavior, the 'logtype=none' option can be specified as a parameter to
the vxassist command. But with VxVM on HP 11.31, this option does not work and
a DCO version 20 log is created by default. The reason for this inconsistency
is that when the 'logtype=none' option is specified, the utility sets a flag
to prevent creation of the log. However, VxVM wasn't checking whether the flag
is set before creating the DCO log, which led to this issue.

RESOLUTION:
This is a logic error that is addressed by a code fix. VxVM now checks the flag
set by 'logtype=none' before creating the DCO version 20 log by default.

* 2256692 (Tracking ID: 2240056)

SYMPTOM:
'vxdg move/split/join' may fail during high I/O load.

DESCRIPTION:
During heavy I/O load, the 'dg move' transaction may fail because of an
open/close assertion, and the transaction is retried. Because the retry limit
is set to 30, 'dg move' fails if the retries hit the limit.

RESOLUTION:
The default transaction retry limit is changed to unlimited, and a new option
is introduced to 'vxdg move/split/join' to set the transaction retry limit, as
follows:

vxdg [-f] [-o verify|override] [-o expand] [-o transretry=retrylimit] move 
src_diskgroup dst_diskgroup objects ...

vxdg [-f] [-o verify|override] [-o expand] [-o transretry=retrylimit] split 
src_diskgroup dst_diskgroup objects ...

vxdg [-f] [-o verify|override] [-o transretry=retrylimit] join src_diskgroup 
dst_diskgroup
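
The retry behavior described above (unlimited by default, capped when a limit
is given) can be sketched as follows. This is a minimal illustration and not
the vxdg implementation; TransientError, run_with_retry and make_flaky are
hypothetical names introduced only for this example.

```python
import itertools


class TransientError(Exception):
    """Hypothetical stand-in for a retryable transaction failure."""


def run_with_retry(transaction, retry_limit=None):
    """Call transaction() until it succeeds.

    retry_limit=None retries without bound. An integer caps the number of
    retries; once the cap is reached the last error propagates.
    Returns a (result, retries_used) tuple.
    """
    for retries in itertools.count():
        try:
            return transaction(), retries
        except TransientError:
            if retry_limit is not None and retries >= retry_limit:
                raise


def make_flaky(failures):
    """Build a transaction that fails a fixed number of times, then succeeds."""
    state = {"left": failures}

    def transaction():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("object busy")
        return "moved"

    return transaction


# With no limit, a transaction that fails 40 times still completes.
print(run_with_retry(make_flaky(40)))
```

With retry_limit=30 the same transaction would raise after the thirtieth
retry, which mirrors the failure mode described for the old fixed limit.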

* 2256722 (Tracking ID: 2215256)

SYMPTOM:
Volume Manager is unable to recognize devices connected through an F5100 HBA.

DESCRIPTION:
During device discovery, Volume Manager does not scan the LUNs that are
connected through a SAS HBA (F5100 is a new SAS HBA). As a result, commands
such as 'vxdisk list' do not show the LUNs that are connected through the
F5100 HBA.

RESOLUTION:
Modified the device discovery code in volume manager to include the paths/luns
that are connected through SAS HBA.

* 2257684 (Tracking ID: 2245121)

SYMPTOM:
Rlinks do not connect for NAT (Network Address Translation) configurations.

DESCRIPTION:
When VVR (Veritas Volume Replicator) replicates over a Network Address
Translation (NAT) based firewall, rlinks fail to connect, resulting in
replication failure.

Rlinks do not connect because the exchange of VVR heartbeats fails. For NAT
based firewalls, conversion of a mapped IPv6 (Internet Protocol Version 6)
address to an IPv4 (Internet Protocol Version 4) address is not handled. The
VVR heartbeat is therefore exchanged with an incorrect IP address, which leads
to VVR heartbeat failure.
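
The address conversion that the description refers to can be shown with
Python's standard ipaddress module. This is only a sketch of the general
technique (unwrapping an IPv4-mapped IPv6 address); normalize_peer is a
hypothetical helper name and the addresses are documentation examples, not
values from VVR.

```python
import ipaddress


def normalize_peer(addr):
    """Return the IPv4 form of an IPv4-mapped IPv6 address.

    Any other address (plain IPv4 or ordinary IPv6) is returned unchanged.
    """
    ip = ipaddress.ip_address(addr)
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        return ip.ipv4_mapped
    return ip


for peer in ("::ffff:192.0.2.10", "192.0.2.10", "2001:db8::1"):
    print(peer, "->", normalize_peer(peer))
```

Comparing peers only after this kind of normalization avoids treating
::ffff:192.0.2.10 and 192.0.2.10 as two different hosts.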

RESOLUTION:
Code fixes have been made to appropriately handle the exchange of VVR 
heartbeats under NAT based firewall.

* 2268733 (Tracking ID: 2248730)

SYMPTOM:
The command hangs if "vxdg import" is called from a script with STDERR
redirected.

DESCRIPTION:
If a script calls "vxdg import" with STDERR redirected, the script does not
finish until the DG import and recovery are finished. The pipe between the
script and vxrecover is not closed properly, which keeps the calling script
waiting for vxrecover to complete.
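
The underlying pipe behavior can be reproduced without VxVM. The sketch
below (POSIX only) uses a duplicated descriptor to stand in for the copy of
STDERR that a background process inherits; it is an illustration of general
pipe semantics, not of the vxdg or vxrecover code, and all names in it are
hypothetical.

```python
import os


def at_eof(read_fd):
    """Return True if a non-blocking read sees end-of-file.

    Returns False when the pipe is empty but a writer still holds it open.
    """
    try:
        return os.read(read_fd, 1) == b""
    except BlockingIOError:
        return False


read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)

# Stands in for the STDERR copy inherited by a background child process.
inherited_fd = os.dup(write_fd)
# The foreground command exits and closes its own copy.
os.close(write_fd)

# A writer is still open, so the reader does not see end-of-file yet.
eof_while_child_holds_pipe = at_eof(read_fd)

# Once the background process closes STDERR, the reader sees end-of-file.
os.close(inherited_fd)
eof_after_child_closes = at_eof(read_fd)
os.close(read_fd)

print(eof_while_child_holds_pipe, eof_after_child_closes)
```

A reader that waits for end-of-file on the redirected stream therefore
blocks until every process holding the write end has closed it.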

RESOLUTION:
STDERR is now closed in vxrecover, and its output is redirected to
/dev/console.

* 2276324 (Tracking ID: 2270880)

SYMPTOM:
On Solaris 10 (SPARC only), if the size of an EFI (Extensible Firmware
Interface) labeled disk is greater than 2TB, the disk capacity is truncated to
2TB when the disk is initialized with CDS (Cross-platform Data Sharing) under
VxVM (Veritas Volume Manager).

For example, the sector count shown by prtvtoc(1M) and the public region size
shown by vxdisk(1M) are truncated to approximately 2TB.

# prtvtoc /dev/rdsk/c0t500601604BA07D17d13
<snip>
*                          First      Sector    Last
* Partition  Tag  Flags    Sector     Count     Sector     Mount Directory
       2     15    00         48    4294967215  4294967262

# vxdisk list c0t500601604BA07D17d13 | grep public
public:    slice=2 offset=65744 len=4294901456 disk_offset=48
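
The figures in the output above can be checked with simple arithmetic. The
sketch below assumes 512-byte sectors, which the output does not state. It
shows that the slice ends just below 2**32 sectors, which is consistent with
a 32-bit sector limit, although the cause is not described in those terms
here.

```python
SECTOR_BYTES = 512      # assumed sector size
TIB = 2**40

# Values copied from the prtvtoc output.
first_sector = 48
sector_count = 4294967215
last_sector = 4294967262

# The three prtvtoc columns are mutually consistent.
assert first_sector + sector_count - 1 == last_sector

# Sectors remaining between the end of the slice and 2**32.
headroom = 2**32 - (last_sector + 1)
size_tib = sector_count * SECTOR_BYTES / TIB

# Value copied from the vxdisk output.
public_len = 4294901456
public_tib = public_len * SECTOR_BYTES / TIB

print(headroom, size_tib, public_tib)
```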
 
DESCRIPTION:
From VxVM 5.1 SP1 onwards, the CDS format is enhanced to support disks larger
than 1TB. VxVM uses the EFI layout to support CDS functionality for disks
larger than 1TB. However, on Solaris 10 (SPARC only), the disk capacity is
truncated to 2TB if the size of the EFI labeled disk is greater than 2TB.

This is because the library /usr/lib/libvxscsi.so in the Solaris 10 (SPARC
only) package does not contain the enhancement required on Solaris 10 to
support the CDS format for disks greater than 2TB.

RESOLUTION:
The VxVM package for Solaris has been changed to contain all the libvxscsi.so
binaries, each built for its respective Solaris platform (version), for
example libvxscsi.so.SunOS_5.9 and libvxscsi.so.SunOS_5.10.

From this fix onwards, the binary built for the appropriate platform is
installed as /usr/lib/libvxscsi.so during the installation of the VxVM package.
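
The naming scheme in the examples above suggests a per-platform suffix made
of the system name and release. The sketch below only illustrates that
naming pattern; libvxscsi_variant is a hypothetical helper, and how the
package actually selects and installs the binary is not described here.

```python
# Install target named in the text above.
INSTALL_TARGET = "/usr/lib/libvxscsi.so"


def libvxscsi_variant(sysname, release):
    """Build a per-platform file name such as libvxscsi.so.SunOS_5.10.

    The pattern is inferred from the two example names in the text; it is
    an assumption, not a documented rule.
    """
    return f"libvxscsi.so.{sysname}_{release}"


for release in ("5.9", "5.10"):
    print(libvxscsi_variant("SunOS", release), "->", INSTALL_TARGET)
```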


INSTALLING THE PATCH
--------------------
o Before the upgrade:
  (a) Stop I/Os to all the VxVM volumes.
  (b) Unmount any file systems on VxVM volumes.
  (c) Stop applications using any VxVM volumes.

For Solaris 9 and 10 releases, refer to the man pages for instructions on using
the 'patchadd' and 'patchrm' scripts provided with Solaris. Any other special
or non-generic installation instructions should be described below as special
instructions. The following example installs a patch to a standalone machine:

        example# patchadd 146884-xx


REMOVING THE PATCH
------------------
The following example removes a patch from a standalone system:

        example# patchrm 146884-xx


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE