vm-rhel6_x86_64-6.0P1
Obsolete
The latest patch(es): sfha-rhel6_x86_64-6.0RP1

 Basic information
Release type: P-patch
Release date: 2012-01-23
OS update support: None
Technote: None
Documentation: None
Popularity: 1089 viewed
Download size: 19.24 MB
Checksum: 3881719515

 Applies to one or more of the following products:
VirtualStore 6.0 On RHEL6 x86-64
Dynamic Multi-Pathing 6.0 On RHEL6 x86-64
Storage Foundation 6.0 On RHEL6 x86-64
Storage Foundation Cluster File System 6.0 On RHEL6 x86-64
Storage Foundation for Oracle RAC 6.0 On RHEL6 x86-64
Storage Foundation HA 6.0 On RHEL6 x86-64

 Obsolete patches, incompatibilities, superseded patches, or other requirements:

This patch is obsolete. It is superseded by: sfha-rhel6_x86_64-6.0RP1 (release date: 2012-03-22)

 Fixes the following incidents:
2614438, 2616853, 2625608, 2626744, 2626924, 2630063, 2630088, 2630106, 2630109, 2630110, 2630112, 2630114, 2630115, 2630132, 2630136, 2637183, 2642760, 2643609, 2643647, 2644354

 Patch ID:
VRTSvxvm-6.0.000.100-6.0P1_RHEL6

Readme file
                          * * * READ ME * * *
                 * * * Veritas Volume Manager 6.0 * * *
                         * * * P-patch 1 * * *
                         Patch Date: 2012-01-17


This document provides the following information:

   * PATCH NAME
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas Volume Manager 6.0 P-patch 1


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Veritas Storage Foundation for Oracle RAC 6.0
   * Veritas Storage Foundation Cluster File System 6.0
   * Veritas Storage Foundation 6.0
   * Veritas Storage Foundation High Availability 6.0
   * Veritas Dynamic Multi-Pathing 6.0
   * Symantec VirtualStore 6.0


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL5 x86-64
RHEL6 x86-64
SLES10 x86-64
SLES11 x86-64


INCIDENTS FIXED BY THE PATCH
----------------------------
This patch fixes the following Symantec incidents:

Patch ID: 6.0.000.100

* 2614438 (Tracking ID: 2507947)

SYMPTOM:
I/O verification on some of the volumes failed when array-side switch ports were 
disabled and enabled for several iterations.

DESCRIPTION:
If array-side switch ports are disabled and enabled several times while I/O is in 
progress, I/Os on some or all of the volumes may stop responding. This can happen 
because failover is not initiated for all the required paths: all the paths of 
certain dmpnodes may remain in the blocked state, and I/Os get stuck in DMP's 
defer queue.

RESOLUTION:
DMP is changed to check for path failover when a PR command fails with a 
transport failure. The current primary path is now updated at all places where a 
path's pending I/O count is decremented, and DMP's defer queue processing is 
triggered after the current primary path update. The Engenio APM is changed to 
reduce the SCSI command timeout value to 20 seconds.

* 2616853 (Tracking ID: 2612190)

SYMPTOM:
When the storage of a shared instant DCO (with SF 6.0) is disconnected and two nodes
join one after another, VxVM utilities or vxconfigd get stuck. The following kernel
stack is observed for vxconfigd on the master node:
vol_kmsg_send_wait+0x1fc
cvm_obj_sendmsg_wait+0x9c
volmv_fmr_wait_iodrain+0xb8
vol_mv_precommit+0x590
vol_commit_iolock_objects+0x104
vol_ktrans_commit+0x298
volsioctl_real+0x2ac
fop_ioctl+0x20
ioctl+0x184
syscall_trap32+0xcc

DESCRIPTION:
The error case is not handled when a joiner node tries to read the DCO volume
which got marked as BADLOG. This leads to a protocol hang in which the joiner
stops responding, so master node indefinitely hangs waiting for the response
from the joiner.

RESOLUTION:
The code path where the joiner tries to read a DCO that was marked as
BADLOG is fixed.

* 2625608 (Tracking ID: 2486301)

SYMPTOM:
During SFHA CPI installation, the VxFS package installation fails on a system that
has a large number of LUNs coming from an A/P-F array.

DESCRIPTION:
The VxVM package post-installation scripts invoke udevtrigger to generate hwpath
information. udevtrigger is an asynchronous command and causes a large number of
udev events to be generated. During this time, end I/O error messages may also be
seen on the console or in the /var/log/messages file for the secondary paths of
A/P-F LUNs. The VxFS package installation then proceeds and waits for the
/dev/vxportal device file to be created. This file is created by the corresponding
udev event, but because udev is still busy with the earlier events, the creation
can take long enough for the VxFS post-installation scripts to time out and fail.

RESOLUTION:
udevtrigger has been removed from the VxVM post-installation scripts and replaced
with equivalent code to generate hwpath information. SFHA CPI installation now
works correctly, and users should not see any I/O error messages generated via the
udev daemon during CPI installation.
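
The description above refers to the VxFS post-installation scripts waiting for the
udev-created /dev/vxportal node. A minimal shell sketch of such a bounded wait
(the 60-second limit and the messages are assumptions, not the actual script
contents):

   # Illustrative sketch only: wait up to 60 seconds for udev to create
   # /dev/vxportal; the timeout value is an assumption.
   timeout=60
   while [ ! -e /dev/vxportal ] && [ "$timeout" -gt 0 ]; do
       sleep 1
       timeout=$((timeout - 1))
   done
   [ -e /dev/vxportal ] || { echo "vxportal device was not created in time" >&2; exit 1; }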

* 2626744 (Tracking ID: 2626199)

SYMPTOM:
"vxdmpadm list dmpnode" command shows the path-type value 
as "primary/secondary" 
for a LUN in an Active-Active array as below when it is suppose to be NULL 
value.

<snippet starts>
dmpdev          = c6t0d3 
state           = enabled 
...
array-type      = A/A 
###path         = name state type transport ctlr hwpath aportID aportWWN attr 
path            = c23t0d3 enabled(a) secondary FC c30 2/0/0/2/0/0/0.0x50060e800 
5c0bb00 - - - 
</snippet ends>

DESCRIPTION:
For a LUN under an Active-Active array, the path-type value is supposed to 
be NULL. In this specific case, other commands such as "vxdmpadm getsubpaths 
dmpnode=<>" showed the correct (NULL) value for path-type.

RESOLUTION:
The "vxdmpadm list dmpnode" code path failed to initialize the path-
type variable and by default set path-type to "primary or secondary" even for 
Active-Active array LUN's. This is fixed by initializing the path-type variable 
to NULL.

* 2626924 (Tracking ID: 2623182)

SYMPTOM:
VxVM temporary files (/tmp/vx.*) persist even after reboots.

DESCRIPTION:
The file /usr/lib/vxvm/voladm.d/lib/vxadm_lib.sh is used to create temporary 
directories and files. A script that sources this file should call quit() instead 
of exit(); quit() cleans up the temporary directories and files before calling 
exit. The following scripts sourced vxadm_lib.sh but never called quit() and 
exited without cleanup:
1. /etc/vx/bin/vxreattach
2. /etc/vx/bin/vxdmpasm
3. /etc/init.d/vxvm-boot
4. /etc/rc3.d/S16vxvm-recover

RESOLUTION:
The above-mentioned VxVM scripts are modified to invoke quit() instead of exit(), 
so that the temporary directories and files are removed and do not persist across 
reboots.
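
A minimal shell sketch of the quit-versus-exit pattern described above (the
function body, variable name, and path are illustrative assumptions, not the
actual contents of vxadm_lib.sh):

   # Illustrative sketch only; names and cleanup details are assumptions.
   VXTMPDIR=$(mktemp -d /tmp/vx.XXXXXX)

   quit()
   {
       # Remove the temporary directory before exiting so that nothing
       # is left behind in /tmp.
       rm -rf "$VXTMPDIR"
       exit "${1:-0}"
   }

   # A script that sources the library should finish with quit, not exit:
   quit 0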

* 2630063 (Tracking ID: 2518625)

SYMPTOM:
A cluster reconfiguration hang is seen when the following sequence of events occurs
in this order:
1. The instant DCO (with the SF 6.0 release) loses all storage from the master node
   while I/O is going on.
2. A new node joins the cluster immediately.
3. The current master leaves the cluster.

When the issue is seen, the kernel thread "volcvm_vxreconfd_thread" on the master
shows the following stack:

delay+0x90()
cvm_await_mlocks+0x114()
volmvcvm_cluster_reconfig_exit+0x94()
volcvm_master+0x970()
volcvm_vxreconfd_thread+0x580()
thread_start+4()

The kernel thread "volcvm_vxreconfd_thread" on the slave has the following stack:

vol_kmsg_send_wait+0x1fc()
cvm_slave_req_rebuild_done+0x78()
volmvcvm_cluster_reconfig_exit+0xb0()
volcvm_master+0x970()
volcvm_vxreconfd_thread+0x580()
thread_start+4()

The following output of vxdctl is seen on all nodes.
# vxdctl -c mode
mode: enabled: cluster active - SLAVE
master: node1
reconfig: master update

DESCRIPTION:
After the storage failure, the DCO is marked as BADLOG in the kernel on all existing
nodes, and a process is initiated to mark the same in the Volume Manager
configuration. If a node joins before this process completes, the joining node does
not mark the DCO as BADLOG. A protocol that depends on this flag is subsequently
initiated, but due to the above-mentioned inconsistency, the protocol hangs.

RESOLUTION:
If the DCO is marked BADLOG on the other nodes, the joining node now marks it the same way.

* 2630088 (Tracking ID: 2583410)

SYMPTOM:
The diagnostic utility vxfmrmap is not supported with the instant DCO of the SF 6.0 release.

DESCRIPTION:
The diagnostic utility vxfmrmap errors out with a "not supported" message when used
with the instant DCO of the SF 6.0 release.

RESOLUTION:
Support is added for vxfmrmap to understand and print the maps used in the instant
DCO of the SF 6.0 release.

* 2630106 (Tracking ID: 2612544)

SYMPTOM:
VxVM commands take more time to run after disk group import operation.

DESCRIPTION:
If any VxVM command is executed immediately after a disk group import, the command 
takes longer to complete. This happens because VxVM tries to complete recovery 
operations on the imported disk group, and the vxconfigd daemon is busy with those 
operations. The increase in time is noticeable if a large number of disks are 
present in the disk group. Once the recovery-related tasks are completed, VxVM 
commands work normally.

RESOLUTION:
The fix reduces the number of VxVM commands issued after a disk group import.

* 2630109 (Tracking ID: 2618217)

SYMPTOM:
Possible data corruption because a few I/Os are not tracked after detaching a 
mirror volume added for a snapshot (linked volumes) in FMR4. Applications such as 
Oracle complain about data corruption.

DESCRIPTION:
When a mirror volume added for snapshot purposes (linked volumes) is detached due 
to I/O errors, VxVM starts tracking subsequent I/Os on the volume in order to 
properly resynchronize the volume contents. In FMR4 this tracking was not enabled 
atomically; during this window a few I/Os were not tracked, which led to data 
corruption after snapshot resynchronization operations.

RESOLUTION:
The tracking of I/Os is now enabled atomically so that all I/Os are tracked 
properly after detaches.

* 2630110 (Tracking ID: 2621521)

SYMPTOM:
Break-off mirror snapshot volumes using the SF 6.0-based instant DCO could be
corrupted when any snapshot operation is done subsequently.

DESCRIPTION:
With volumes having the SF 6.0-based instant DCO, the snapshot maps are updated
from DRL logs. Volume Manager maintains a list of updates that are still to be
applied to the snapshot maps. Due to a code issue, some valid DRL updates were
missed for the snapshot map updates. In such cases the snapshot maps become
inconsistent, and a further snapshot operation causes corruption in the volumes
involved.

RESOLUTION:
The issue has been fixed by making sure that the list is updated atomically, so
that no intermediate state is visible to other I/Os doing DRL.

* 2630112 (Tracking ID: 2621612)

SYMPTOM:
/boot/grub/menu.lst is not restored correctly on a virtual machine after vxunroot, 
which forces the user to manually select the fallback boot device to continue the 
boot process.

DESCRIPTION:
In Linux, /boot/grub/menu.lst is the boot-loader configuration file that contains 
the root device. The root device can be specified using a UUID, a label, or a 
symlink such as /dev/disk/by-id/* or /dev/disk/by-uuid/* (these symlinks are 
created by udev during boot to provide a persistent naming scheme). During 
vxunroot, the root-device volume entry must be replaced with the appropriate 
symlink so that the next reboot succeeds. On a virtual machine, where a SCSI_ID is 
not available for the disks, the root device was being populated with a 
/dev/disk/by-uuid/* path that pointed to a stale UUID entry (from before the 
reboot). (A UUID is a universally unique identifier for each partition of a disk.)

RESOLUTION:
The root-device entry in /boot/grub/menu.lst is now populated correctly on both 
physical and virtual setups by using the correct symlink, such as 
/dev/disk/by-id/*, /dev/disk/by-path/*, or /dev/disk/by-uuid/*.
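
To see which persistent symlinks udev currently provides for a disk and its
partitions, a quick shell check can be used (the device name "sda" is an
illustrative example only):

   # Illustrative only; "sda" is an example device name.
   ls -l /dev/disk/by-id/   | grep sda
   ls -l /dev/disk/by-uuid/ | grep sda
   ls -l /dev/disk/by-path/ | grep sda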

* 2630114 (Tracking ID: 2622147)

SYMPTOM:
The system hangs when the load peaks abruptly on a volume with the SF 6.0-based
instant DCO.

DESCRIPTION:
When the I/O load peaks abruptly on a volume with the instant DCO in SF 6.0,
DRL logging can get stuck. I/Os on the volume are resumed when DRL is done for
the whole batch of I/Os. When the load is such that DRL cannot be done for the
whole batch in a single pass, it is split into iterations. Due to a code issue,
this partitioning of DRL updates caused the code to loop, writing the same set
of changes to disk over and over.

RESOLUTION:
DRL updates are no longer partitioned. They are either skipped if not enough free
memory is available, or done in a single batch.

* 2630115 (Tracking ID: 2627126)

SYMPTOM:
An I/O hang is observed on the system because many I/Os are stuck in the DMP global queue.

DESCRIPTION:
Many I/Os and paths are stuck in dmp_delayq and dmp_path_delayq respectively, and
the DMP daemon cannot process them because of a race condition between "processing
the dmp_delayq" and "waking up the DMP daemon". The lock is held while the
dmp_delayq is processed and is released only for a very short duration. If any
path is busy during that window, it returns an I/O error, leading to the I/O hang.

RESOLUTION:
The global delay queue pointers are now copied to local variables, and the lock is
held only for that copy; the I/Os in the queue are then processed using the local
queue variables.

* 2630132 (Tracking ID: 2613524)

SYMPTOM:
I/O failure is observed on a NetApp array when configured in Asymmetric Logical
Unit Access (ALUA) mode.

DESCRIPTION:
The I/O failure was identified as a corner-case scenario hit during NetApp pNATE
longevity tests.

RESOLUTION:
Added fixes in Dynamic Multipathing component to avoid the I/O failure.

* 2630136 (Tracking ID: 2612470)

SYMPTOM:
Volume recovery and plex reattach using the vxrecover command leave some plexes in 
the DETACHED state.

DESCRIPTION:
The vxrecover utility collects all volumes that need recovery or plex reattach and 
creates tasks for them. In a specific corner case, if the first volume picked up is 
small enough that its recovery completes within 5 seconds, any plex reattaches 
required for this volume are skipped, due to a bug in vxrecover.

RESOLUTION:
The specific corner case is identified in the vxrecover utility, and for such a 
volume, if any plex needs to be reattached, the required task is created before 
moving to the next volume.

WORKAROUND:
Re-run vxrecover manually, as shown below. For shared disk groups, run it on the 
CVM master node or use the -c option.
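
A minimal example of re-running the recovery for one disk group (the disk group
name is a placeholder, as elsewhere in this document):

# vxrecover -g <dgname>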

* 2637183 (Tracking ID: 2647795)

SYMPTOM:
With the SmartMove feature enabled, data corruption is seen on the file system 
while moving a subdisk because the subdisk contents are not copied properly.

DESCRIPTION:
With the FS SmartMove feature enabled, the subdisk move operation queries VxFS for 
the status of a region before deciding whether to synchronize it. While getting 
information about multiple such regions in one ioctl to VxFS, if the start offset 
is not aligned to the region size, one I/O can span two regions. VxVM was not 
properly checking the status of such regions and skipped the synchronization of 
that region, causing data corruption.

RESOLUTION:
Code changes are done to properly check the region state even if the region spans 
two bits in the FSMAP.

* 2642760 (Tracking ID: 2637828)

SYMPTOM:
A system panic is seen in the vol_adminsio_done() function while running operations 
such as vxplex att, vxsd mv, or vxsnap addmir. The kernel stack is given below:

vol_adminsio_done+0x44b 
voliod_iohandle+0xf2 
voliod_loop+0x11ba 
child_rip+0xa

DESCRIPTION:
Utilities such as 'vxtask monitor' are signaled after the task is updated in the 
kernel. Due to a bug, the structures used in signaling are freed before the signal 
is sent, causing a NULL pointer dereference that leads to the panic.

RESOLUTION:
The code is rearranged so that cleanup does not happen before signaling is done, 
avoiding this race condition. The bug is specific to a new enhancement in the 
admin I/O deprioritization feature and does not apply to older releases.

* 2643609 (Tracking ID: 2553729)

SYMPTOM:
The following symptoms can be seen during an upgrade of VxVM (Veritas Volume 
Manager):

i) The 'clone_disk' flag is seen on non-clone disks in the STATUS field when 
'vxdisk -e list' is executed after an upgrade to 5.1SP1 from lower versions of VxVM.


Eg:

DEVICE       TYPE           DISK        GROUP        STATUS
emc0_0054    auto:cdsdisk   emc0_0054    50MP3dg     online clone_disk
emc0_0055    auto:cdsdisk   emc0_0055    50MP3dg     online clone_disk

ii) Clone disk groups (dg) whose versions are less than 140 do not get imported
after upgrade to VxVM versions 5.0MP3RP5HF1 or 5.1SP1RP2.

Eg:

# vxdg -C import <dgname>
VxVM vxdg ERROR V-5-1-10978 Disk group <dgname>: import failed:
Disk group version doesn't support feature; see the vxdg upgrade command

DESCRIPTION:
While upgrading VxVM:

i) After an upgrade to 5.1SP1 or higher versions:
If a disk group created on lower versions is deported and imported back on 5.1SP1 
after the upgrade, the "clone_disk" flag gets set on non-cloned disks because of a 
design change in the UDID (unique disk identifier) of the disks.

ii) After an upgrade to 5.0MP3RP5HF1 or 5.1SP1RP2:
Import of a clone disk group with a version less than 140 fails.

RESOLUTION:
Code changes are made to ensure that:
i) clone_disk flag does not get set for non-clone disks after the upgrade.
ii) Clone disk groups with versions less than 140 get imported after the 
upgrade.
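
To check, after an upgrade, whether any disks carry the clone_disk flag, the
extended disk listing shown in the example above can be filtered (illustrative
check):

# vxdisk -e list | grep clone_disk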

* 2643647 (Tracking ID: 2643634)

SYMPTOM:
If standard (non-clone) disks and cloned disks of the same disk group are seen on
a host, the disk group import fails with the following error message when the
standard (non-clone) disks have no enabled configuration copy of the disk group.

# vxdg import <dgname>
VxVM vxdg ERROR V-5-1-10978 Disk group <dgname>: import failed:
Disk group has no valid configuration copies

DESCRIPTION:
When VxVM imports such a mixed configuration of standard (non-clone) disks and
cloned disks, the standard (non-clone) disks are selected as the members of the
disk group in 5.0MP3RP5HF1 and 5.1SP1RP2. This happens without administrators
being aware that there is a mixed configuration and that the standard (non-clone)
disks are the ones selected for the import. The cause is hard to figure out from
the error message and takes time to investigate.

RESOLUTION:
Syslog message enhancements are made in the code so that administrators can figure
out whether such a mixed configuration is seen on a host and which disks are
selected for the import.

* 2644354 (Tracking ID: 2647600)

SYMPTOM:
A system panic occurs with the following stack trace because of a NULL pointer dereference.
strlen()
vol_do_prnt()
vol_cmn_err()
vol_rp_ack_timeout()
vol_rp_search_queues()
nmcom_server_proc()
nmcom_server_proc_enter()
vxvm_start_thread_enter()

DESCRIPTION:
Analysis of the panic shows that it is caused by a missing entry in
volrv_msgs_names[]. There are 12 entries, but the entry for VOLRV_MSG_PRIM_SEQ is
missing.

The ack timeout happened when the VOLRV_MSG_PIM_SEQ message was sent.

RESOLUTION:
An entry is added: volrv_msgs_names[VOLRV_MSG_PIM_SEQ] = { "prim_seq" }.


INSTALLING THE PATCH
--------------------
o Before the upgrade (see the example below):
  (a) Stop I/Os to all the VxVM volumes.
  (b) Unmount any file systems with VxVM volumes.
  (c) Stop applications using any VxVM volumes.
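
  A minimal sketch of checking for and unmounting VxFS file systems before the
  upgrade (the mount point below is an illustrative example, not a required path):

   # List currently mounted VxFS file systems, then unmount each one.
   mount -t vxfs
   umount /mnt/data    # example mount point; repeat for each mounted file system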

o Select the appropriate RPMs for your system, and upgrade to the new patch.

# rpm -Uhv VRTSvxvm-6.0.000.100-6.0P1_RHEL5.x86_64.rpm 

# rpm -Uhv VRTSvxvm-6.0.000.100-6.0P1_RHEL6.x86_64.rpm

# rpm -Uhv VRTSvxvm-6.0.000.100-6.0P1_SLES10.x86_64.rpm

# rpm -Uhv VRTSvxvm-6.0.000.100-6.0P1_SLES11.x86_64.rpm
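
After the upgrade, the installed package version can be verified (illustrative check):

# rpm -q VRTSvxvm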


REMOVING THE PATCH
------------------
# rpm -e  <rpm-name>
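
The exact installed package name to pass to rpm -e can be found with a query
(illustrative check):

# rpm -qa | grep VRTSvxvm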


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE