Volume Manager on AIX, patch detail

vm-aix-Patch-7.3.1.100

Basic information

Release type:	Patch
Release date:	2018-04-27
OS update support:	None
Technote:	None
Documentation:	None
Download size:	95.7 MB
Checksum:	3666126432

Applies to one or more of the following products:

InfoScale Enterprise 7.3.1 On AIX 7.1
InfoScale Enterprise 7.3.1 On AIX 7.2
InfoScale Foundation 7.3.1 On AIX 7.1
InfoScale Foundation 7.3.1 On AIX 7.2
InfoScale Storage 7.3.1 On AIX 7.1
InfoScale Storage 7.3.1 On AIX 7.2

Obsolete patches, incompatibilities, superseded patches, or other requirements:

None.

Fixes the following incidents:

3932464, 3933874, 3933875, 3933877, 3933878, 3933880, 3933882, 3933883, 3933890, 3933897, 3933899, 3933900, 3933904, 3933907, 3933910, 3937541, 3937545, 3937549, 3937550, 3937808, 3937811, 3938392

Patch ID:

VRTSvxvm.bff

Readme file

* * * READ ME * * *
* * * Veritas Volume Manager 7.3.1 * * *
* * * Patch 100 * * *
Patch Date: 2018-04-16

This document provides the following information:

* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH

PATCH NAME
----------
Veritas Volume Manager 7.3.1 Patch 100

OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
AIX

PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxvm

BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* InfoScale Enterprise 7.3.1
* InfoScale Foundation 7.3.1
* InfoScale Storage 7.3.1

SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 7.3.1.100
* 3932464 (3926976) Frequent loss of VxVM functionality due to vxconfigd unable to validate license.
* 3933874 (3852146) Shared DiskGroup(DG) fails to import when "-c" and "-o noreonline" options
are specified together
* 3933875 (3872585) System panics with storage key exception.
* 3933877 (3914789) System may panic when reclaiming on secondary in VVR environment.
* 3933878 (3918408) Data corruption when volume grow is attempted on thin reclaimable disks whose space is just freed.
* 3933880 (3864063) Application I/O hangs because of a race between the Master Pause SIO (Staging
I/O) and the Error Handler SIO.
* 3933882 (3865721) Vxconfigd may hang while pausing the replication in CVR(cluster Veritas Volume
Replicator) environment.
* 3933883 (3867236) Application IO hang happens because of a race between Master Pause SIO(Staging IO)
and RVWRITE1 SIO.
* 3933890 (3879324) VxVM DR tool fails to handle busy device problem while LUNs are removed from OS
* 3933897 (3907618) vxdisk resize leads to data corruption on filesystem
* 3933899 (3910675) Disks directly attached to the system cannot be exported in FSS environment
* 3933900 (3915523) Local disk from other node belonging to private DG(diskgroup) is exported to the
node when a private DG is imported on current node.
* 3933904 (3921668) vxrecover command with -m option fails when executed on the slave nodes.
* 3933907 (3873123) If the disk with CDS EFI label is used as remote disk on the cluster node,
restarting the vxconfigd daemon on that particular node causes vxconfigd to go into disabled state
* 3933910 (3910228) Registration of GAB(Global Atomic Broadcast) port u fails on slave nodes after
multiple new devices are added to the system.
* 3937541 (3911930) Provide a way to clear the PGR_FLAG_NOTSUPPORTED on the device instead of using
exclude/include commands
* 3937545 (3932246) vxrelayout operation fails to complete.
* 3937549 (3934910) DRL map leaks during snapshot creation/removal cycle with dg reimport.
* 3937550 (3935232) Replication and IO hang during master takeover because of racing between log
owner change and master switch.
* 3937808 (3931936) VxVM(Veritas Volume Manager) command hangs on master node after
restarting slave node.
* 3937811 (3935974) When client process shuts down abruptly or resets connection during
communication with the vxrsyncd daemon, it may terminate vxrsyncd daemon.
* 3938392 (3909630) OS Panic happens while registering DMP(Dynamic Multi Pathing) statistic
information.

DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: 7.3.1.100

* 3932464 (Tracking ID: 3926976)

SYMPTOM:
An excessive number of connections are found in the open state, causing an FD
leak and eventually license errors.

DESCRIPTION:
The vxconfigd reports license errors as it fails to open the license files. The
failure to open is due to FD exhaustion, caused by excessive FIFO connections
left in open state.

The FIFO connections are used by clients (vx commands) to communicate with
vxconfigd. Usually these should get closed once the client exits. One such
client, "vxdclid", is a daemon that connects frequently and leaves the
connection in open state, causing the FD leak.

This issue is applicable to Solaris platform only.

RESOLUTION:
A library API call was leaving the connection in open state on exit; this has
been fixed.
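
The leak pattern and its fix can be illustrated with a minimal Python sketch.
This is not the vxconfigd or vxdclid code: the function names are hypothetical,
and an os.pipe() pair stands in for the FIFO connection. The leaky variant
returns without closing its descriptors, so every call consumes two more; the
fixed variant closes them on every exit path.

```python
import os


def fd_is_open(fd: int) -> bool:
    """Return True if fd still refers to an open descriptor."""
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True


def client_call_leaky() -> tuple:
    """Hypothetical client call that forgets to close its connection."""
    read_fd, write_fd = os.pipe()      # stand-in for the FIFO connection
    os.write(write_fd, b"request")
    os.read(read_fd, 7)
    return read_fd, write_fd           # defect: both descriptors stay open


def client_call_fixed() -> tuple:
    """Same call, but the connection is closed on every exit path."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, b"request")
        os.read(read_fd, 7)
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return read_fd, write_fd           # returned only so callers can inspect them
```

A long-running daemon that repeats the leaky call will eventually hit the
per-process descriptor limit, at which point unrelated opens (such as opening
a license file) start to fail.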

* 3933874 (Tracking ID: 3852146)

SYMPTOM:
In a CVM cluster, when importing a shared diskgroup specifying both -c and -o
noreonline options, the following error may be returned:
VxVM vxdg ERROR V-5-1-10978 Disk group <dgname>: import failed: Disk for disk
group not found.

DESCRIPTION:
The -c option will update the disk ID and disk group ID on the private region
of the disks in the disk group being imported. Such updated information is not
yet seen by the slave because the disks have not been re-onlined (given that
noreonline option is specified). As a result, the slave cannot identify the
disk(s) based on the updated information sent from the master, causing the
import to fail with the error Disk for disk group not found.

RESOLUTION:
The code is modified to handle the working of the "-c" and "-o noreonline"
options together.

* 3933875 (Tracking ID: 3872585)

SYMPTOM:
System running with VxFS and VxVM panics with storage key exception with the
following stack:

simple_lock
dispatch
flih_util
touchrc
pin_seg_range
pin_com
pinx_plock
plock_pinvec
plock
mfspurr_sc_flih01

DESCRIPTION:
The xntpd process on a VxFS-mounted file system could panic with a storage key
exception. The xntpd binary page faulted and did an IO, after which the OS
detected the storage key exception because it could not locate its keyset. Code
review found that in a few error cases in VxVM, the storage keys may not be
restored after they are replaced.

RESOLUTION:
The storage keys are now restored even in the error cases in the vxio and dmp
layers.

* 3933877 (Tracking ID: 3914789)

SYMPTOM:
System may panic when reclaiming on secondary in VVR(Veritas Volume Replicator)
environment. The panic is caused by access to an invalid address; the error
message is similar to "data access MMU miss".

DESCRIPTION:
VxVM maintains a linked list to keep memory segment information. When its
content is accessed at a certain offset, the linked list is traversed. Due to a
code defect, when the offset is equal to the segment chunk size, the end of that
segment is returned instead of the start of the next segment. This can result in
silent memory corruption because memory outside the segment boundary is
accessed. The system can panic when the out-of-boundary address is not
allocated yet.

RESOLUTION:
Code changes have been made to fix the out-of-boundary access.
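
The boundary condition described above can be sketched in Python. This is an
illustration only, not the VxVM code: the function names are hypothetical and
the segment list is reduced to a count of equal-sized chunks. Valid offsets
inside a segment run from 0 to chunk_size - 1, so a comparison with '>' instead
of '>=' leaves an offset equal to chunk_size in the current segment, one
position past its end.

```python
from typing import Tuple


def locate_buggy(num_segments: int, chunk_size: int, offset: int) -> Tuple[int, int]:
    """Return (segment index, offset in segment) using the defective comparison."""
    index = 0
    while offset > chunk_size and index + 1 < num_segments:  # defect: '>' not '>='
        offset -= chunk_size
        index += 1
    return index, offset


def locate_fixed(num_segments: int, chunk_size: int, offset: int) -> Tuple[int, int]:
    """Return (segment index, offset in segment) with the boundary handled."""
    index = 0
    while offset >= chunk_size and index + 1 < num_segments:
        offset -= chunk_size
        index += 1
    return index, offset


print(locate_buggy(3, 4096, 4096))  # (0, 4096): past the end of segment 0
print(locate_fixed(3, 4096, 4096))  # (1, 0): start of segment 1
```

Both functions agree for every offset that is not an exact multiple of the
chunk size, which is why a defect of this kind can stay hidden for a long time.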

* 3933878 (Tracking ID: 3918408)

SYMPTOM:
Data corruption when volume grow is attempted on thin reclaimable disks whose space is just freed.

DESCRIPTION:
When the space in the volume is freed by deleting some data or subdisks, the corresponding subdisks are marked for
reclamation. It might take some time for the periodic reclaim task to start if not issued manually. In the meantime, if
same disks are used for growing another volume, it can happen that reclaim task will go ahead and overwrite the data
written on the new volume. Because of this race condition between reclaim and volume grow operation, data corruption
occurs.

RESOLUTION:
Code changes are done to handle race condition between reclaim and volume grow operation. Also reclaim is skipped for
those disks which have already become part of a new volume.

* 3933880 (Tracking ID: 3864063)

SYMPTOM:
Application I/O hangs after the Master Pause command is issued.

DESCRIPTION:
Some flags (VOL_RIFLAG_DISCONNECTING or VOL_RIFLAG_REQUEST_PENDING) in VVR
(Veritas Volume Replicator) kernel are not cleared because of a race between the
Master Pause SIO and the Error Handler SIO. This causes the RU (Replication
Update) SIO to fail to proceed, which leads to I/O hang.

RESOLUTION:
The code is modified to handle the race condition.

* 3933882 (Tracking ID: 3865721)

SYMPTOM:
Vxconfigd hangs while processing a transaction when replication is paused in a
Clustered VVR environment.

DESCRIPTION:
In Clustered VVR (CVM VVR) environment, while pausing replication which is in
DCM (Data Change Map) mode, the master pause SIO (staging IO) cannot finish
serialization since there are metadata shipping SIOs in the throttle queue
with the activesio count added. Meanwhile, because the master pause SIO's
SERIALIZE flag is set, the DCM flush SIO cannot be started to flush the
throttle queue. This leads to a dead loop and a hang. Since the master pause
routine needs to sync up with the transaction routine, vxconfigd hangs in the
transaction.

RESOLUTION:
Code changes were made to flush the metadata shipping throttle queue if master
pause SIO can not finish serialization.

* 3933883 (Tracking ID: 3867236)

SYMPTOM:
Application IO hang happens after issuing Master Pause command.

DESCRIPTION:
The flag VOL_RIFLAG_REQUEST_PENDING in VVR(Veritas Volume Replicator) kernel is
not cleared because of a race between Master Pause SIO and RVWRITE1 SIO. As a
result, the RU (Replication Update) SIO fails to proceed, which causes the IO
hang.

RESOLUTION:
Code changes have been made to handle the race condition.

* 3933890 (Tracking ID: 3879324)

SYMPTOM:
VxVM(Veritas Volume Manager) DR(Dynamic Reconfiguration) tool fails to
handle busy device problem while LUNs are removed from OS

DESCRIPTION:
OS devices may still be busy after they are removed from the OS. This causes the
'luxadm -e offline <disk>' operation to fail and leaves stale entries in the
'vxdisk list' output, like:
emc0_65535 auto - - error
emc0_65536 auto - - error

RESOLUTION:
Code changes have been done to address busy devices issue.

* 3933897 (Tracking ID: 3907618)

SYMPTOM:
vxdisk resize leads to data corruption on filesystem with MSDOS labelled disk having VxVM sliced format.

DESCRIPTION:
vxdisk resize changes the geometry on the device if required. When vxdisk resize is in progress, absolute offsets, i.e. offsets starting
from the start of the device, are used. For an MSDOS labelled disk, the full disk is devoted to slice 4 rather than slice 0. Thus when IO is
scheduled on the device, an extra 32 sectors get added to the IO, which is not required since the IO already starts from the start of
the device. This leads to data corruption since the IO on the device is shifted by 32 sectors.

RESOLUTION:
Code changes have been made to not add 32 sectors to the IO when vxdisk resize is in progress to avoid corruption.
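
The offset arithmetic can be sketched as follows. This is a simplified
illustration, not the VxVM code: the function and constant names are
hypothetical, and it assumes the 32 sectors are a fixed shift that is normally
added to slice-relative offsets.

```python
SLICE_SHIFT_SECTORS = 32  # shift described above; the constant name is hypothetical


def device_sector_buggy(offset: int, absolute: bool) -> int:
    """Pre-fix behaviour: the shift is applied even to absolute offsets."""
    return offset + SLICE_SHIFT_SECTORS


def device_sector_fixed(offset: int, absolute: bool) -> int:
    """Post-fix behaviour: absolute offsets are used unchanged."""
    if absolute:
        return offset  # already measured from the start of the device
    return offset + SLICE_SHIFT_SECTORS


# During a resize the offset is absolute, so sector 100 must stay sector 100.
print(device_sector_buggy(100, absolute=True))  # 132: IO lands 32 sectors too far
print(device_sector_fixed(100, absolute=True))  # 100
```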

* 3933899 (Tracking ID: 3910675)

SYMPTOM:
Disks directly attached to the system cannot be exported in FSS environment

DESCRIPTION:
In some cases, UDID (Unique Disk Identifier) of the disk directly connected to a
cluster node might not be globally unique, i.e. a different disk directly
connected to a different node in the cluster might have the same UDID. This
leads to issues while exporting the device in FSS (Flexible Storage Sharing)
environment since two different disks have the same UDID, which is not expected.

RESOLUTION:
A new option "islocal=yes" has been added to the vxddladm addjbod command so
that hostguid will get appended to UDID to make it unique.

* 3933900 (Tracking ID: 3915523)

SYMPTOM:
Local disk from other node belonging to private DG is exported to the node when
a private DG is imported on current node.

DESCRIPTION:
When a DG is imported, all the disks belonging to the DG are automatically
exported to the current node so as to make sure that the DG gets imported. This
is done to have the same behaviour with local disks as with SAN. Because disks
are selected for export by DG name, disks that belong to a different private DG
with the same name on another node get exported to the current node as well.
This leads to the wrong disk getting selected while the DG gets imported.

RESOLUTION:
Instead of DG name, DGID (diskgroup ID) is used to decide whether disk needs to
be exported or not.
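
The difference between matching on DG name and matching on DGID can be shown
with a small Python sketch. The disk records, field names, and ID strings are
invented for illustration and do not reflect the real on-disk format.

```python
from typing import Dict, List

# Hypothetical records: two nodes each own a private disk group named "datadg".
DISKS: List[Dict[str, str]] = [
    {"disk": "node1_disk0", "dg_name": "datadg", "dgid": "1001.node1"},
    {"disk": "node2_disk0", "dg_name": "datadg", "dgid": "2002.node2"},
]


def disks_to_export_by_name(disks: List[Dict[str, str]], dg_name: str) -> List[str]:
    """Pre-fix selection: every disk whose DG name matches is exported."""
    return [d["disk"] for d in disks if d["dg_name"] == dg_name]


def disks_to_export_by_dgid(disks: List[Dict[str, str]], dgid: str) -> List[str]:
    """Post-fix selection: only disks with the matching DGID are exported."""
    return [d["disk"] for d in disks if d["dgid"] == dgid]


print(disks_to_export_by_name(DISKS, "datadg"))      # ['node1_disk0', 'node2_disk0']
print(disks_to_export_by_dgid(DISKS, "1001.node1"))  # ['node1_disk0']
```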

* 3933904 (Tracking ID: 3921668)

SYMPTOM:
Running the vxrecover command with -m option fails when run on the
slave node with message "The command can be executed only on the master."

DESCRIPTION:
The issue occurs as currently vxrecover -g <dgname> -m command on shared
disk groups is not shipped using the command shipping framework from CVM
(Cluster Volume Manager) slave node to the master node.

RESOLUTION:
The code is changed to ship the vxrecover -m command to the master
node when it is triggered from the slave node.

* 3933907 (Tracking ID: 3873123)

SYMPTOM:
When a remote disk on the node is an EFI disk, vold enable fails.
The following message gets logged, and vxconfigd eventually goes into
disabled state:
Kernel and on-disk configurations don't match; transactions are disabled.

DESCRIPTION:
This is because one of the cases of EFI remote disk is not properly handled
in the disk recovery part when vxconfigd is enabled.

RESOLUTION:
Code changes have been done to set the EFI flag on darec in recovery code.

* 3933910 (Tracking ID: 3910228)

SYMPTOM:
Registration of GAB(Global Atomic Broadcast) port u fails on slave nodes after
multiple new devices are added to the system.

DESCRIPTION:
Vxconfigd sends a command to GAB for port u registration and waits for a
response from GAB. If vxconfigd is interrupted during this timeframe by any
module other than GAB, it is not able to receive the signal from GAB
of successful registration. Since the signal is not received, vxconfigd
believes the registration did not succeed and treats it as a failure.

RESOLUTION:
Mask the signals which vxconfigd can receive before waiting for the signal from
GAB for registration of gab u port.

* 3937541 (Tracking ID: 3911930)

SYMPTOM:
Valid PGR operations sometimes fail on a dmpnode.

DESCRIPTION:
As part of the PGR operations, if the inquiry command finds that PGR is not
supported on the dmpnode, a flag PGR_FLAG_NOTSUPPORTED is set on the dmpnode.
Further PGR operations check this flag and issue PGR commands only if this flag
is NOT set.
This flag remains set even if the hardware is changed so as to support PGR.

RESOLUTION:
A new command (namely enablepr) is provided in the vxdmppr utility to clear this
flag on the specified dmpnode.
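
The sticky-flag behaviour can be modelled with a short Python sketch. It is a
toy model under stated assumptions, not the DMP implementation: the class,
its methods, and the bit value of the flag are hypothetical; only the flag
name and the enablepr command name come from the text above.

```python
PGR_FLAG_NOTSUPPORTED = 0x1  # bit value is an assumption for illustration


class DmpNode:
    """Toy model of a dmpnode that caches the result of the PGR inquiry."""

    def __init__(self, hardware_supports_pgr: bool) -> None:
        self.hardware_supports_pgr = hardware_supports_pgr
        self.flags = 0

    def inquiry(self) -> None:
        """Set the flag when PGR is found unsupported; it is never cleared here."""
        if not self.hardware_supports_pgr:
            self.flags |= PGR_FLAG_NOTSUPPORTED

    def pgr_operation(self) -> bool:
        """Issue a PGR command only if the flag is not set."""
        if self.flags & PGR_FLAG_NOTSUPPORTED:
            return False
        return self.hardware_supports_pgr

    def enablepr(self) -> None:
        """Clear the flag, modelled on the enablepr command described above."""
        self.flags &= ~PGR_FLAG_NOTSUPPORTED


node = DmpNode(hardware_supports_pgr=False)
node.inquiry()                     # flag gets set
node.hardware_supports_pgr = True  # hardware is changed to support PGR
print(node.pgr_operation())        # False: the stale flag still blocks PGR
node.enablepr()
print(node.pgr_operation())        # True
```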

* 3937545 (Tracking ID: 3932246)

SYMPTOM:
vxrelayout operation fails to complete.

DESCRIPTION:
If we lose connectivity to underlying storage while volume relayout is in
progress, some intermediate volumes for the relayout could be in disabled or
undesirable state due to I/O errors. Once the storage connectivity is back,
such intermediate volumes should be recovered by the vxrecover utility and the
vxrelayout operation should resume automatically. But due to a bug in the
vxrecover utility, the volumes remained in disabled state, due to which the
vxrelayout operation didn't complete.

RESOLUTION:
Changes are done in vxrecover utility to enable the intermediate volumes.

* 3937549 (Tracking ID: 3934910)

SYMPTOM:
IO errors on data volume or file system happen after some cycles of snapshot
creation/removal with dg reimport.

DESCRIPTION:
When the snapshot of the data volume is removed and the dg is reimported, the
DRL map stays active rather than being inactivated. When a new snapshot is
created, the DRL is re-enabled and a new DRL map is allocated with the first
write to the data volume. The original active DRL map is not used and is
leaked. After some such cycles, the extents of the DCO volume are exhausted by
the active but unused DRL maps; then no more DRL maps can be allocated and
IOs fail or cannot be issued on the data volume.

RESOLUTION:
Code changes are done to inactivate the DRL map if the DRL is disabled during
the volume start, so that it can be reused later safely.

* 3937550 (Tracking ID: 3935232)

SYMPTOM:
Replication and IO hang may happen on new master node during master
takeover.

DESCRIPTION:
If a log owner change kicks in while a master switch is in progress, flag
VOLSIO_FLAG_RVC_ACTIVE is set by the log owner change SIO.
RVG(Replicated Volume Group) recovery initiated by the master switch clears
flag VOLSIO_FLAG_RVC_ACTIVE after RVG recovery is done. When the log owner
change is done, as flag VOLSIO_FLAG_RVC_ACTIVE has been cleared, resetting
flag VOLOBJ_TFLAG_VVR_QUIESCE is skipped. The presence of flag
VOLOBJ_TFLAG_VVR_QUIESCE makes replication and application IO on the RVG
stay in pending state.

RESOLUTION:
Code changes have been done to make log owner change wait until the master
switch has completed.

* 3937808 (Tracking ID: 3931936)

SYMPTOM:
In FSS(Flexible Storage Sharing) environment, after restarting a slave node,
VxVM commands on the master node hang. As a result, failed disks on the slave
node cannot rejoin the disk group.

DESCRIPTION:
When lost remote disks on the slave node come back, the operations to online
these disks and add them to the disk group are performed on the master node.
Disk online includes operations from both the master and the slave node. On the
slave node these disks should be offlined and then reonlined, but due to a code
defect the reonline is missed, so these disks are kept in reonlining state. The
subsequent add disk to disk group operation needs to issue private region IOs
on the disk. These IOs are shipped to the slave node to complete. As the disks
are in reonline state, a busy error gets returned and the remote IOs keep
retrying, hence the VxVM command hangs on the master node.

RESOLUTION:
Code changes have been made to fix the issue.

* 3937811 (Tracking ID: 3935974)

SYMPTOM:
While communicating with a client process, the vxrsyncd daemon terminates.
After some time it gets restarted, or a reboot may be required to start it.

DESCRIPTION:
When the client process shuts down abruptly and the vxrsyncd daemon attempts to
write to the client socket, a SIGPIPE signal is generated. The default action
for this signal is to terminate the process. Hence vxrsyncd gets terminated.

RESOLUTION:
This SIGPIPE signal should be handled in order to prevent the termination of
vxrsyncd.
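
One common way to handle SIGPIPE is to ignore it, so that a write to a vanished
peer fails with EPIPE instead of terminating the process. The Python sketch
below shows that general technique; it is not the vxrsyncd fix, and the readme
does not say which handling the patch uses. The function name is hypothetical
and a pipe stands in for the client socket. Python already ignores SIGPIPE at
startup; the explicit call mirrors what a C daemon would do with
signal(SIGPIPE, SIG_IGN).

```python
import errno
import os
import signal


def write_after_peer_exit() -> int:
    """Write to a pipe whose reader is gone and return the resulting errno."""
    previous = signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    read_fd, write_fd = os.pipe()
    os.close(read_fd)                  # the "client" disappears abruptly
    try:
        os.write(write_fd, b"reply")
        return 0
    except BrokenPipeError as exc:
        return exc.errno               # EPIPE is reported; the process survives
    finally:
        os.close(write_fd)
        if previous is not None:       # restore the earlier disposition
            signal.signal(signal.SIGPIPE, previous)


print(write_after_peer_exit() == errno.EPIPE)  # True
```

With the signal ignored, the daemon can treat EPIPE like any other failed
write: drop that client connection and keep serving the others.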

* 3938392 (Tracking ID: 3909630)

SYMPTOM:
OS panic happens with the following stack after some DMP devices are migrated
to TPD(Third Party Driver) devices.

void vxdmp:dmp_register_stats+0x120()
int vxdmp:gendmpstrategy+0x244()
vxdmp:dmp_restart_io()
int vxdmp:dmp_process_deferbp+0xec()
void vxdmp:dmp_process_deferq+0x68()
void vxdmp:dmp_daemons_loop+0x160()

DESCRIPTION:
When updating the CPU index for a new path migrated to TPD, IOs on this path
are unquiesced before the last CPU's stats table is increased. As a result,
while registering the IO stat for a restarted IO on this path, if the last
CPU's stats table needs to be accessed, an invalid memory access and panic may
happen.

RESOLUTION:
Code changes have been made to fix this issue.

INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Please note that the installation of this P-Patch will cause downtime.

To install the patch perform the following steps on at least one node in the cluster:
1. Copy the patch vm-aix-Patch-7.3.1.100.tar.gz to /tmp
2. Untar vm-aix-Patch-7.3.1.100.tar.gz to /tmp/hf
# mkdir /tmp/hf
# cd /tmp/hf
# gunzip /tmp/vm-aix-Patch-7.3.1.100.tar.gz
# tar xf /tmp/vm-aix-Patch-7.3.1.100.tar
3. Install the hotfix (please note that the installation of this P-Patch will cause downtime).
# pwd
/tmp/hf
# ./installVRTSvxvm731P100 [<host1> <host2>...]

You can also install this patch together with 7.3.1 maintenance release using Install Bundles
1. Download this patch and extract it to a directory
2. Change to the Veritas InfoScale 7.3.1 directory and invoke the installer script
with -patch_path option where -patch_path should point to the patch directory
# ./installer -patch_path [<path to this patch>] [<host1> <host2>...]

Install the patch manually:
--------------------------
If the currently installed VRTSvxvm is below 7.3.1.000 level,
upgrade VRTSvxvm to 7.3.1.000 level before installing this patch.

AIX maintenance levels and APARs can be downloaded from the IBM web site:
http://techsupport.services.ibm.com
1. Since the patch process will configure the new kernel extensions,
a) Stop I/Os to all the VxVM volumes.
b) Ensure that no VxVM volumes are in use or open or mounted before starting the installation procedure.
c) Stop applications using any VxVM volumes.
2. Check whether root support or DMP native support is enabled. If it is enabled, it will be retained after patch upgrade.
# vxdmpadm gettune dmp_native_support
If the current value is 'on', DMP native support is enabled on this machine.
# vxdmpadm native list vgname=rootvg
If the output is some list of hdisks, root support is enabled on this machine
3. Proceed with patch installation as mentioned below
a. Before applying this VxVM 7.3.1.100 patch, stop the VEA Server's vxsvc process:
# /opt/VRTSob/bin/vxsvcctrl stop
b. If your system has Veritas Operations Manager (VOM) configured, check whether the vxdclid daemon is running. If it is running, stop the vxdclid daemon.
Command to check the status of vxdclid daemon
# /opt/VRTSsfmh/etc/vxdcli.sh status
Command to stop the vxdclid daemon
# /opt/VRTSsfmh/etc/vxdcli.sh stop
c. To apply this patch, use following command:
# installp -ag -d ./VRTSvxvm.bff VRTSvxvm
d. To apply and commit this patch, use following command:
# installp -acg -d ./VRTSvxvm.bff VRTSvxvm
NOTE: Please refer to the installp(1M) man page for a clear understanding of the APPLY & COMMIT states of the package/patch.
e. Reboot the system to complete the patch upgrade.
# reboot
f. If you stopped the vxdclid daemon before the upgrade, restart it using the following command:
# /opt/VRTSsfmh/etc/vxdcli.sh start

REMOVING THE PATCH
------------------
Run the Uninstaller script to automatically remove the patch:
------------------------------------------------------------
To uninstall the patch perform the following step on at least one node in the cluster:
# /opt/VRTS/install/uninstallVRTSvxvm731P100 [<host1> <host2>...]

Remove the patch manually:
-------------------------
1. Check whether root support or DMP native support is enabled or not:
# vxdmpadm gettune dmp_native_support
If the current value is "on", DMP native support is enabled on this machine.
# vxdmpadm native list vgname=rootvg
If the output is some list of hdisks, root support is enabled on this machine
If disabled: go to step 3.
If enabled: go to step 2.
2. If root support or DMP native support is enabled:
a. It is essential to disable DMP native support.
Run the following command to disable DMP native support as well as root support
# vxdmpadm settune dmp_native_support=off
b. If only root support is enabled, run the following command to disable root support
# vxdmpadm native disable vgname=rootvg
c. Reboot the system
# reboot
3.
a. Before backing out patch, stop the VEA server's vxsvc process:
# /opt/VRTSob/bin/vxsvcctrl stop
b. If your system has Veritas Operations Manager (VOM) configured, check whether the vxdclid daemon is running. If it is running, stop the vxdclid daemon.
Command to check the status of vxdclid daemon
# /opt/VRTSsfmh/etc/vxdcli.sh status
Command to stop the vxdclid daemon
# /opt/VRTSsfmh/etc/vxdcli.sh stop
c. To reject the patch if it is in "APPLIED" state, use the following command and re-enable DMP support
# installp -r VRTSvxvm 7.3.1.100
d. Reboot the system
# reboot
e. If you stopped the vxdclid daemon before backing out the patch, restart it using the following command:
# /opt/VRTSsfmh/etc/vxdcli.sh start

SPECIAL INSTRUCTIONS
--------------------
NONE

OTHERS
------
NONE