Storage Foundation HA on Linux, patch detail

sfha-rhel6_x86_64-Patch-6.1.1.700
Basic information
Release type:	Patch
Release date:	2018-08-01
OS update support:	RHEL6 x86-64 Update 10
Technote:	None
Documentation:	None
Download size:	74.87 MB
Checksum:	2228805314
Applies to one or more of the following products:
Cluster Server 6.1 On RHEL6 x86-64
Dynamic Multi-Pathing 6.1 On RHEL6 x86-64
File System 6.1 On RHEL6 x86-64
Storage Foundation 6.1 On RHEL6 x86-64
Storage Foundation Cluster File System 6.1 On RHEL6 x86-64
Storage Foundation for Oracle RAC 6.1 On RHEL6 x86-64
Storage Foundation HA 6.1 On RHEL6 x86-64
Volume Manager 6.1 On RHEL6 x86-64
Obsolete patches, incompatibilities, superseded patches, or other requirements:
This patch supersedes the following patches:	Release date
fs-rhel6_x86_64-Patch-6.1.1.400 (obsolete)	2016-08-02
fs-rhel6_x86_64-6.1.1.200 (obsolete)	2014-12-18
fs-rhel6_x86_64-6.1.1.100 (obsolete)	2014-11-25
This patch requires:	Release date
sfha-rhel6_x86_64-6.1.1	2014-07-24
Fixes the following incidents:
3370486, 3370758, 3372831, 3376505, 3383149, 3389433, 3389434, 3418489, 3422580, 3422584, 3422586, 3422604, 3422614, 3422619, 3422624, 3422626, 3422629, 3422634, 3422636, 3422638, 3422649, 3422657, 3424575, 3424619, 3430467, 3436431, 3436433, 3440232, 3444900, 3445234, 3445235, 3445236, 3446001, 3446112, 3446126, 3447244, 3447245, 3447306, 3447530, 3447894, 3449714, 3452709, 3452727, 3452811, 3453105, 3453163, 3455455, 3456729, 3457363, 3458036, 3458677, 3458799, 3466074, 3470254, 3470255, 3470260, 3470262, 3470265, 3470270, 3470272, 3470273, 3470274, 3470275, 3470276, 3470279, 3470282, 3470286, 3470290, 3470292, 3470300, 3470301, 3470303, 3470304, 3470321, 3470322, 3470345, 3470346, 3470347, 3470350, 3470352, 3470353, 3470354, 3470382, 3470383, 3470384, 3470385, 3484547, 3490147, 3492363, 3494534, 3498795, 3502847, 3504362, 3506487, 3506660, 3506679, 3506707, 3506709, 3506718, 3506747, 3507583, 3512292, 3515438, 3518943, 3519809, 3520113, 3521945, 3522003, 3522541, 3527969, 3528770, 3529243, 3529852, 3530038, 3530046, 3531906, 3532861, 3536233, 3536289, 3538792, 3539906, 3540122, 3541125, 3541262, 3543944, 3583963, 3607232, 3617774, 3617776, 3617781, 3617788, 3617790, 3617793, 3617877, 3620279, 3620284, 3620288, 3621420, 3628867, 3632970, 3636210, 3640641, 3644006, 3645825, 3646467, 3652109, 3660421, 3662397, 3674567, 3682022, 3728108, 3729172, 3729811, 3736352, 3765326, 3778391, 3790099, 3790115, 3790117, 3794198, 3795623, 3800388, 3800421, 3808593, 3812272, 3831498, 3851511, 3852147, 3852733, 3852736, 3852800, 3854390, 3854526, 3856384, 3857121, 3859012, 3861494, 3861521, 3864007, 3864010, 3864013, 3864035, 3864036, 3864037, 3864038, 3864040, 3864041, 3864042, 3864093, 3864112, 3864146, 3864148, 3864150, 3864151, 3864153, 3864154, 3864155, 3864156, 3864160, 3864161, 3864163, 3864164, 3864166, 3864167, 3864170, 3864172, 3864173, 3864177, 3864178, 3864179, 3864182, 3864184, 3864185, 3864186, 3864247, 3864248, 3864250, 3864255, 3864256, 3864259, 3864260, 3864480, 
3864625, 3864628, 3864980, 3864986, 3864987, 3864988, 3864989, 3865411, 3865415, 3865631, 3865633, 3865640, 3865653, 3866241, 3866595, 3866643, 3866651, 3866675, 3866968, 3867134, 3867137, 3867315, 3867706, 3867709, 3867710, 3867711, 3867712, 3867714, 3867881, 3867925, 3869659, 3870704, 3872661, 3874662, 3874736, 3874961, 3875113, 3875161, 3875230, 3875458, 3875564, 3875633, 3876065, 3877070, 3877142, 3878983, 3889639, 3890226, 3890251, 3890262, 3890556, 3890659, 3918062, 3918063, 3918064, 3918066, 3953563, 3953564, 3953565, 3953566, 3953847, 3953853, 3953861, 3953862, 3953863, 3953864, 3953865
Patch ID:
VRTSaslapm-6.1.1.620-RHEL6
VRTSvxvm-6.1.1.600-RHEL6
VRTSvxfs-6.1.1.500-RHEL6
VRTSodm-6.1.1.500-RHEL6
VRTSglm-6.1.1.100-RHEL6
VRTSgms-6.1.1.100-RHEL6
VRTSllt-6.1.1.500-RHEL6
VRTSgab-6.1.0.400-Linux_RHEL6
VRTSvxfen-6.1.1.300-RHEL6
VRTSamf-6.1.1.300-RHEL6
VRTSdbac-6.1.1.200-RHEL6
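The Patch ID strings above follow a `<package>-<version>-<platform>` pattern. As a minimal sketch only (this is not part of the Veritas tooling, and the function name `parse_patch_id` is hypothetical), such strings could be split into a package name and a comparable version tuple, for example to compare against the package versions installed on a host:

```python
def parse_patch_id(patch_id):
    """Split e.g. 'VRTSvxvm-6.1.1.600-RHEL6' into (name, version tuple, platform).

    Assumes the '<package>-<version>-<platform>' layout seen in the list above.
    """
    name, version, platform = patch_id.split("-", 2)
    return name, tuple(int(part) for part in version.split(".")), platform


# Two of the Patch IDs listed above, used as sample input.
vxvm = parse_patch_id("VRTSvxvm-6.1.1.600-RHEL6")
gab = parse_patch_id("VRTSgab-6.1.0.400-Linux_RHEL6")

# Version tuples compare element by element, so 6.1.1.600 sorts after 6.1.1.400.
is_newer = vxvm[1] > (6, 1, 1, 400)
```

Comparing integer tuples avoids the pitfalls of comparing version strings lexically.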
Readme file
                          * * * READ ME * * *
            * * * Symantec Storage Foundation HA 6.1.1 * * *
                      * * * Patch 6.1.1.700 * * *
                         Patch Date: 2018-07-13


This document provides the following information:

   * PATCH NAME
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * SUMMARY OF INCIDENTS FIXED BY THE PATCH
   * DETAILS OF INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Symantec Storage Foundation HA 6.1.1 Patch 6.1.1.700


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL6 x86-64


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSamf
VRTSaslapm
VRTSdbac
VRTSgab
VRTSglm
VRTSgms
VRTSllt
VRTSodm
VRTSvxfen
VRTSvxfs
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Symantec Cluster Server 6.1
   * Symantec Dynamic Multi-Pathing 6.1
   * Symantec File System 6.1
   * Symantec Storage Foundation 6.1
   * Symantec Storage Foundation Cluster File System HA 6.1
   * Symantec Storage Foundation for Oracle RAC 6.1
   * Symantec Storage Foundation HA 6.1
   * Veritas Volume Manager 6.1


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: VRTSvxvm-6.1.1.600
* 3953847 (3951290) Retpoline support for VxVM on RHEL6.10 and RHEL6.x retpoline
kernels
* 3890226 (3880936) vxvmconvert hangs on RHEL 6.8.
* 3890251 (3478478) While Performing LVM to VxVM conversion, it fails to prepare the volume group for conversion.
* 3890262 (3879234) dd read on the Veritas Volume Manager (VxVM) character device fails with
Input/Output error while accessing end of device.
* 3953853 (3951938) Retpoline support for ASLAPM rpm on RHEL6.10 and RHEL6.x
retpoline kernels
* 3607232 (3596330) 'vxsnap refresh' operation fails with `Transaction aborted waiting for IO
drain` error
* 3640641 (3622068) After mirroring encapsulated root disk, rootdg fails to import if any disk in
the disk group becomes unavailable.
* 3662397 (3662392) In the Cluster Volume Manager (CVM) environment, if I/Os are getting executed
on slave node, corruption can happen when the vxdisk resize(1M) command is
executing on the master node.
* 3674567 (3636089) VVR(Veritas Volume Replicator) secondary was panicked due to a bad mutex
produced by vxconfigd while committing the transactions.
* 3682022 (3648719) The server panics while adding or removing LUNs or HBAs.
* 3729172 (3726110) On systems with high number of CPUs, Dynamic Multi-Pathing (DMP) devices may perform considerably slower than OS device paths.
* 3736352 (3729078) VVR(Veritas Volume Replicator) secondary site panic occurs during patch
installation because of flag overlap issue.
* 3778391 (3565212) IO failure is seen during controller giveback operations
on Netapp Arrays in ALUA mode.
* 3790099 (3581646) Logical Volumes fail to migrate back to OS devices when Dynamic Multipathing (DMP) native support is disabled while root("/") is mounted on
LVM.
* 3790115 (3623617) The command "vxdmpadm settune dmp_native_support=on" may fail with a perl error
* 3790117 (3776520) Filters are not updated properly in lvm.conf file in VxDMP initrd (initial ramdisk) while Dynamic Multipathing (DMP) Native Support is being
enabled.
* 3795623 (3795622) With Dynamic Multipathing (DMP) Native Support enabled, Logical Volume Manager (LVM) global_filter is not updated properly in lvm.conf file.
* 3800388 (3581264) VxVM package uninstallation succeeds even if DMP Native Support is enabled and
I/Os are in progress on LV.
* 3800421 (3762580) In Linux kernels greater than or equal to RHEL6.6 (e.g. RHEL7 and SLES11SP3), the vxfen module fails to register the SCSI-3 PR keys to EMC devices when PowerPath co-exists
with DMP (Dynamic Multi-Pathing).
* 3808593 (3795788) Performance degrades when many application sessions open the same data file on the VxVM volume.
* 3812272 (3811946) When invoking "vxsnap make" command with cachesize option to create space optimized snapshot, the command succeeds but a plex I/O error message is displayed in syslog.
* 3852147 (3852146) Shared DiskGroup(DG) fails to import when "-c" and "-o noreonline" options
are specified together
* 3852800 (3819670) When smartmove with the 'vxevac' command is run in the background by pressing the 'ctrl-z' key and issuing the 'bg' command, the execution of 'vxevac' is terminated abruptly.
* 3854390 (3853049) The display of stats is delayed beyond the set interval for vxstat, and multiple
sessions of vxstat impact the I/O performance.
* 3854526 (3544020) Volume Manager (VxVM) tunables are getting reset to default after patch upgrade.
* 3856384 (3525490) System panic occurs when VVR receives partial data, because of incorrect type casting.
* 3857121 (3857120) Oracle database process hangs when being stopped.
* 3859012 (3859009) global_filter of lvm.conf is not updated due to some paths of LVM dmpnode are
reused during DDL(Device Discovery Layer) discovery cycle.
* 3861494 (3557636) Sequential execution of udev-rules may delay the processing of udev events, this
may cause probable data corruption due to IO getting issued to incorrect disk.
* 3864093 (3825467) SLES11-SP4 build fails.
* 3864112 (3860503) Poor performance of vxassist mirroring is observed on some high end servers.
* 3864480 (3823283) While unencapsulating a boot disk in a SAN (Storage Area Network) environment,
the Linux operating system gets stuck in GRUB after reboot.
* 3864625 (3599977) During a replica connection, referencing a port that is already deleted in another thread causes a system panic.
* 3864628 (3686698) vxconfigd was getting hung due to deadlock between two threads
* 3864980 (3594158) The spinlock and unspinlock are referenced to different objects when interleaving with a kernel transaction.
* 3864986 (3621232) The vradmin ibc command cannot be started or executed on the Veritas Volume Replicator (VVR) secondary node.
* 3864987 (3513392) Reference to replication port that is already deleted caused panic.
* 3864988 (3625890) vxdisk resize operation on CDS disks fails with an error message of "Invalid
attribute specification"
* 3864989 (3521726) System panicked for double freeing IOHINT.
* 3865411 (3665644) The system panics due to an invalid page pointer in the Linux bio structure.
* 3865415 (3736502) Memory leakage is found when transaction aborts.
* 3865631 (3564260) VVR commands are unresponsive when replication is paused and resumed in a loop.
* 3865633 (3721565) vxconfigd hang is seen.
* 3865640 (3749557) System hangs because of high memory usage by vxvm.
* 3865653 (3795739) In a split brain scenario, cluster formation takes very long time.
* 3866241 (3790136) File system hang observed due to I/Os in Dirty Region Logging (DRL).
* 3866595 (3866051) A driver name over 32 bytes may prevent vxconfigd from starting up
* 3866643 (3539548) After adding removed MPIO disk back, 'vxdisk list' or 'vxdmpadm listctlr all'
commands may show duplicate entry for DMP node with error state.
* 3866651 (3596282) Snap operations fail with error "Failed to allocate a new map due to no free
map available in DCO".
* 3866675 (3840359) Some VxVM commands fail on using the localized messages.
* 3867134 (3486861) Primary node panics when storage is removed while replication is going on with heavy
IOs.
* 3867137 (3674614) Restarting the vxconfigd(1M) daemon on the slave (joiner) node during node-join
operation may cause the vxconfigd(1M) daemon to become unresponsive on the
master and the joiner node.
* 3867315 (3672759) The vxconfigd(1M) daemon may core dump when DMP database is corrupted.
* 3867706 (3645370) vxevac command fails to evacuate disks with Dirty Region Log(DRL) plexes.
* 3867709 (3767531) In Layered volume layout with FSS configuration, when few
of the FSS_Hosts are rebooted, Full resync is happening for non-affected disks
on master.
* 3867710 (3788644) Reuse raw device number when checking for available raw devices.
* 3867711 (3802750) VxVM (Veritas Volume Manager) volume I/O-shipping functionality is not disabled even after the user issues the correct command to disable it.
* 3867712 (3807879) User data is corrupted because the backup EFI GPT disk label is written
during the VxVM disk-group flush operation.
* 3867714 (3819832) No syslog message seen when dmp detects controller
disabled/enabled
* 3867881 (3867145) When VVR SRL occupation > 90%, then output the SRL occupation is shown by 10
percent.
* 3867925 (3802075) Foreign disks defined by udev rules whose names contain a digit go into the error
state after vxdisk scandisks.
* 3869659 (3868444) Disk header timestamp is updated even if the disk group(DG) import fails.
* 3874736 (3874387) Disk header information is not logged to the syslog
sometimes even if the disk is missing and dg import fails.
* 3874961 (3871750) VxVM vxstat commands run in parallel report abnormal disk I/O statistics data
* 3875113 (3816219) VxDMP event source daemon keeps reporting UDEV change event in syslog.
* 3875161 (3860184) Turning off DMP(Dynamic Multipathing) Native support succeeds even if I/Os are in
progress on LV.
* 3875230 (3554608) Mirroring a volume on 6.1 creates a larger plex than the original.
* 3875564 (3875563) While dumping the disk header information, human readable
timestamp was not converted correctly from corresponding epoch time.
* 3632970 (3631230) VRTSvxvm patch version 6.0.5 and 6.1.1 or previous will not work with RHEL6.6
update.
* 3372831 (2573229) On RHEL6, the server panics when Dynamic Multi-Pathing (DMP) executes
PERSISTENT RESERVE IN command with REPORT CAPABILITIES service action on
powerpath controlled device.
* 3440232 (3408320) Thin reclamation fails for EMC 5875 arrays.
* 3457363 (3462171) When SCSI-3 persistent reservation command 'ioctls' are
issued on non-SCSI devices, dmpnode gets disabled.
* 3470254 (3370637) In VxVM, the SmartIO feature gets enabled for volumes with type as root/swap that are under VxVM control.
* 3470255 (2847520) The resize operation on a shared linked-volume can cause data corruption on the target volume.
* 3470260 (3415188) I/O hangs during replication in Veritas Volume Replicator (VVR).
* 3470262 (3077582) A Veritas Volume Manager (VxVM) volume may become inaccessible causing the read/write operations to fail.
* 3470265 (3326964) VxVM hangs in Clustered Volume Manager (CVM) environments in the presence of FMR operations.
* 3470270 (3403390) After a crash, the linked-to volume goes into NEEDSYNC state.
* 3470272 (3385753) Replication to the Disaster Recovery (DR) site hangs even
though Replication links (Rlinks) are in the connected state.
* 3470273 (3416404) The vxdisksetup(1M) command shows a warning message about disk geometry mismatch on console.
* 3470274 (3373208) DMP wrongly sends the SCSI PR OUT command with APTPL bit value as '0' to arrays.
* 3470275 (3417044) System becomes unresponsive while creating a VVR TCP
connection.
* 3470276 (3287940) Logical unit number (LUN) from EMC CLARiiON array and having NR (Not Ready) state are shown in the state of online invalid by Veritas Volume Manager (VxVM).
* 3470279 (3300418) VxVM volume operations on shared volumes cause unnecessary read I/Os.
* 3470282 (3374200) A system panic or exceptional IO delays are observed while executing snapshot operations, such as, refresh.
* 3470286 (3417164) The virtual machine cache area size cannot be shrunk when the data size in the cache is more than the resize value.
* 3470290 (2999871) The vxinstall(1M) command gets into a hung state when it is
invoked through Secure Shell (SSH) remote execution.
* 3470292 (3416098) The vxvmconvert utility throws error during execution.
* 3470300 (3340923) For Asymmetric Logical Unit Access (ALUA) array type Logical Unit Numbers (LUN), Dynamic Multi-Pathing (DMP) disables and enables the unavailable asymmetric access state paths on I/O load.
* 3470301 (2812161) In a Veritas Volume Replicator (VVR) environment, after the Rlink is detached,
the vxconfigd(1M) daemon on the secondary host may hang.
* 3470303 (3314647) The vxcdsconvert(1M) command fails with error: Plex column offset is not strictly increasing for column/plex.
* 3470304 (3390162) The NetMapper tool which scans the User Datagram Protocol (UDP) port 4145 causes the vxnetd daemon to consume 100% CPU, the rlink disconnects, and finally the system hangs.
* 3470321 (3336714) The slab of I/O request in Linux may get corrupted.
* 3470322 (3399323) The reconfiguration of Dynamic Multipathing (DMP) database fails.
* 3470345 (3281004) For DMP minimum queue I/O policy with large number of CPUs a couple of issues
are observed.
* 3470347 (3444765) In Cluster Volume Manager (CVM), shared volume recovery may take long time for large configurations.
* 3470350 (3437852) The system panics when Symantec Replicator Option goes to
PASSTHRU mode.
* 3470352 (3450758) The slave node was not able to join CVM cluster and resulted in panic.
* 3470353 (3236772) Heavy I/O loads on primary sites result in
transaction/session timeouts between the primary and secondary sites.
* 3470354 (3446415) A pool may get added to the file system when the file system
shrink operation is performed on FileStore.
* 3470382 (3368361) When site consistency is configured within a private disk group and CVM is up,
the reattach operation of a detached site fails.
* 3470383 (3455460) The vxfmrshowmap and verify_dco_header utilities fail with an error.
* 3470384 (3440790) The vxassist(1M) command with parameter mirror and the vxplex(1M) command with parameter att hang.
* 3470385 (3373142) Updates to vxassist and vxedit man pages for behavioral
changes after 6.0.
* 3490147 (3485907) Panic occurs in the I/O code path.
* 3492363 (3402487) The page-size allocation fails in Fast Mirror Resynchronization (FMR) operation due to fragmentation or size limitation.
* 3506660 (3325022) The VirtIO-disk interface exported disks from an SLES11 SP2 or SLES11 SP3 host are not visible.
* 3506679 (3435225) In a given CVR setup, rebooting the master node causes one of
the slaves to panic.
* 3506707 (3400504) Upon disabling the host side Host Bus Adapter (HBA) port,
extended attributes of some devices are not seen anymore.
* 3506709 (3259732) In a CVR environment, rebooting the primary slave followed by connect-disconnect
in loop causes rlink to detach.
* 3506718 (3433931) The 'vxvmconvert' utility fails to get the correct LVM version.
* 3506747 (3496077) The vxvmconvert(1m) command fails with an error message while converting Logical Volume Manager (LVM) Volume Groups (VG) into VxVM disk group.
* 3515438 (3535309) VRTSvxvm patch failed to install on sfcache configure system
while upgrading SFRAC from 6.1 to 6.1.1.
* 3522541 (3533888) If the default value is not chosen, the vxunroot(1M) command throws an error and goes into an infinite loop.
* 3531906 (3526500) Disk IO failures occur with DMP IO timeout error messages when DMP (Dynamic Multi-pathing) IO statistics daemon is not running.
* 3536289 (3492062) Dynamic Multi-Pathing (DMP) fails to get page 0x83 LUN identifier for EMC symmetrix LUNS and continuously logs error messages.
* 3538792 (3528498) When the Veritas Volume Manager (VxVM) smartIO feature is disabled on a VxVM volume, the system panics.
* 3539906 (3394933) The vxrecover(1M) command does not trigger volume recovery operations (plex attach, volume resynchronization, etc.) in parallel for remote devices during node join operation in Cluster Volume Manager (CVM), which results in longer time for recoveries.
* 3540122 (3482026) The vxattachd(1M) daemon reattaches plexes of manually detached site.
* 3541262 (3543284) Storage devices are not visible in the vxdisk list or the vxdmpadm getdmpnode outputs.
* 3543944 (3520991) The vxconfigd(1M) daemon dumps core due to memory corruption.
* 3444900 (3399131) For PowerPath (PP) enclosure, both DA_TPD and DA_COEXIST_TPD flags are set.
* 3445234 (3358904) The system with Asymmetric Logical Unit Access (ALUA) enclosures sometimes panics during path fault scenarios.
* 3445235 (3374117) I/O hangs on VxVM SmartIO enabled data volume with one detached plex.
* 3445236 (3381006) The system panics when it collects stats for SmartIO by the VxIO driver.
* 3446001 (3380481) When you select a removed disk during the "5 Replace a failed or removed disk" operation, the vxdiskadm(1M) command displays an error message.
* 3446112 (3288744) In a Flexible Storage Sharing (FSS) diskgroup, whenever a new mirror is added to a volume, the Data Change Object (DCO) associated with the volume is not mirrored.
* 3446126 (3338208) The writes from fenced out LDOM guest node on Active-Passive (AP/F) shared storage device fails with an unexpected error.
* 3447244 (3427171) When I/Os are issued on a volume associated with the Veritas Volume Manager (VxVM) block level SmartIO caching immediately after a system reboot, a system panic happens.
* 3447245 (3385905) Data corruption occurs after VxVM makes cache area offline and online again without a reboot.
* 3447306 (3424798) Veritas Volume Manager (VxVM) mirror attach operations
(e.g., plex attach, vxassist mirror, and third-mirror break-off snapshot
resynchronization) may take longer time under heavy application I/O load.
* 3447530 (3410839) In FSS (Flexible Storage Sharing) environment, volume allocation operations may lead to full mirror synchronization for the layered volume recovery, when any node goes down.
* 3447894 (3353211) A. After EMC Symmetrix BCV (Business Continuance Volume) device switches to
read-write mode, continuous vxdmp (Veritas Dynamic Multi Pathing) error messages
flood syslog.
B. DMP metanode/path under DMP metanode gets disabled unexpectedly.
* 3449714 (3417185) Rebooting the host, after the exclusion of a dmpnode while I/O is in progress on it, leads to the vxconfigd(1M) to dump core.
* 3452709 (3317430) The vxdiskunsetup(1M) utility throws an error after upgrade from 5.1SP1RP4.
* 3452727 (3279932) The vxdisksetup and vxdiskunsetup utilities were failing on
disk which is part of a deported disk group (DG), even if "-f" option is specified.
* 3452811 (3445120) Change tunable VOL_MIN_LOWMEM_SZ value to trigger early readback.
* 3453105 (3079819) vxconfigbackup and vxconfigrestore fail on FSS(Flexible Storage Sharing) disk groups with remote disks.
* 3453163 (3331769) The vxconfigrestore(1M) command does not restore the latest configuration.
* 3455455 (3409612) The value of reclaim_on_delete_start_time cannot be set to
values outside the range: 22:00-03:59
* 3456729 (3428025) When heavy parallel I/O load is issued, the system that runs Symantec Replication Option (VVR) and is configured as VVR primary crashes.
* 3458036 (3418830) A node boot-up hangs while starting the vxconfigd(1M) daemon.
* 3458799 (3197987) When 'vxddladm assign names file=<filename>' is executed and the file has one or more invalid values
for enclosure vendor ID or product ID, vxconfigd(1M) dumps core.
* 3470346 (3377383) The vxconfigd crashes when a disk under Dynamic Multi-pathing
(DMP) reports device failure.
* 3484547 (3383625) When a cluster node that contributes the storage to the FSS (Flexible Storage Sharing) disk group rejoins the cluster, the local disks brought back by that node do not get reattached.
* 3498795 (3394926) vx* commands hang when mirror-stripe format is used after a reboot of master node.
Patch ID: VRTSdbac-6.1.1.200
* 3953865 (3951435) Support for RHEL 6.10 and RHEL 6.x RETPOLINE kernels.
* 3831498 (3850806) 6.1.1 vcsmm module does not load with RHEL6.7 (2.6.32-573.el6.x86_64 kernel)
Patch ID: VRTSamf-6.1.1.300
* 3953864 (3951435) Support for RHEL 6.10 and RHEL 6.x RETPOLINE kernels.
* 3918066 (3918061) Veritas Infoscale Availability does not support Red Hat Enterprise Linux 6 Update 9 
(RHEL6.9).
* 3794198 (3794154) Veritas Cluster Server (VCS) does not support Red Hat Enterprise Linux 6 Update 7
(RHEL6.7).
* 3389433 (3338946) The Process resource fails to register for offline monitoring
with the AMF kernel driver.
* 3389434 (3341320) The "Cannot delete event (rid %d) in reaper" error message is
repeatedly logged in the Syslog file.
Patch ID: VRTSvxfen-6.1.1.300
* 3953863 (3951435) Support for RHEL 6.10 and RHEL 6.x RETPOLINE kernels.
* 3918064 (3918061) Veritas Infoscale Availability does not support Red Hat Enterprise Linux 6 Update 9 
(RHEL6.9).
* 3794198 (3794154) Veritas Cluster Server (VCS) does not support Red Hat Enterprise Linux 6 Update 7
(RHEL6.7).
* 3370486 (3031216) The dash (-) in a disk group name causes vxfentsthdw(1M) and
vxfenswap(1M) utilities to fail.
* 3530046 (3471571) Cluster nodes may panic if you stop the HAD process by force on a node and 
reboot that node.
* 3532861 (3532859) The Coordpoint agent monitor fails if the cluster has a large
number of coordination points.
Patch ID: VRTSgab-6.1.0.400
* 3953862 (3951435) Support for RHEL 6.10 and RHEL 6.x RETPOLINE kernels.
* 3918063 (3918061) Veritas Infoscale Availability does not support Red Hat Enterprise Linux 6 Update 9 
(RHEL6.9).
* 3794198 (3794154) Veritas Cluster Server (VCS) does not support Red Hat Enterprise Linux 6 Update 7
(RHEL6.7).
* 3728108 (3728106) On Linux, the value corresponding to 15 minute CPU load average as shown in /proc/loadavg file wrongly increases to about 4.
Patch ID: VRTSllt-6.1.1.500
* 3953861 (3951435) Support for RHEL 6.10 and RHEL 6.x RETPOLINE kernels.
* 3918062 (3918061) Veritas Infoscale Availability does not support Red Hat Enterprise Linux 6 Update 9 
(RHEL6.9).
* 3889639 (3877459) Veritas Cluster Server (VCS) does not support Red Hat Enterprise Linux 6 Update 8
(RHEL6.8).
* 3794198 (3794154) Veritas Cluster Server (VCS) does not support Red Hat Enterprise Linux 6 Update 7
(RHEL6.7).
* 3646467 (3642131) VCS support for RHEL 6.6
* 3376505 (3410309) The LLT driver fails to load and logs a message in the syslog.
* 3458677 (3460985) The system panics and logs an error message in the syslog.
Patch ID: VRTSglm-6.1.1.100
* 3953565 (3951759) GLM support for RHEL6.10 and RHEL6.x retpoline kernels
Patch ID: VRTSgms-6.1.1.100
* 3953566 (3951761) GMS support for RHEL6.10 and RHEL6.x retpoline kernels
Patch ID: VRTSodm-6.1.1.500
* 3953564 (3951754) ODM support for RHEL6.10 and RHEL6.x retpoline kernels
* 3864151 (3716577) Failed thread fork causing ODM ERROR V-41-4-1-354-22
* 3864248 (3757609) CPU usage going high because of contention over ODM_IO_LOCK
* 3466074 (3349649) The Oracle Disk Manager (ODM) module fails to load on RHEL6.5
* 3507583 (3521933) Internal conformance testing with SSD cache enabled fails.
* 3527969 (3529371) The package verification of VRTSodm on Linux fails.
* 3424619 (3349649) ODM modules fail to load on RHEL6.5
Patch ID: VRTSvxfs-6.1.1.500
* 3953563 (3951752) VxFS support for RHEL6.10 and RHEL6.x retpoline kernels
* 3652109 (3553328) During internal testing full fsck failed to clean the file
system cleanly.
* 3729811 (3719523) 'vxupgrade' retains the superblock replica of old layout versions.
* 3765326 (3736398) NULL pointer dereference panic in lazy unmount.
* 3852733 (3729158) Deadlock occurs due to incorrect locking order between write 
advise and dalloc flusher thread.
* 3852736 (3457801) Kernel panics in block_invalidatepage().
* 3861521 (3549057) The "relatime" mount option is shown in /proc/mounts but it is
not supported by VxFS.
* 3864007 (3558087) The ls -l and other commands which use the stat system call may
take a long time to complete.
* 3864010 (3269553) VxFS returns inappropriate message for read of hole via 
Oracle Disk Manager (ODM).
* 3864013 (3811849) System panics while executing lookup() in a directory with large directory hash(LDH).
* 3864035 (3790721) High CPU usage caused by vx_send_bcastgetemapmsg_remaus
* 3864036 (3233276) With a large file system, primary to secondary migration takes longer duration.
* 3864037 (3616907) System is unresponsive causing the NMI watchdog service to 
stall.
* 3864038 (3596329) Fix native-aio races with exiting threads
* 3864040 (3633683) vxfs thread consumes high CPU while running an application
that makes excessive sync() calls.
* 3864041 (3613048) Support vectored AIO on Linux
* 3864042 (3466020) File system is corrupted with an error message "vx_direrr: vx_dexh_keycheck_1".
* 3864146 (3690078) The system panics at vx_dev_strategy() routine due to stack overflow.
* 3864148 (3695367) Unable to remove volume from multi-volume VxFS using "fsvoladm" command.
* 3864150 (3602322) System panics while flushing the dirty pages of the inode.
* 3864153 (3685391) Execute permissions for a file not honored correctly.
* 3864154 (3689354) Users having write permission on file cannot open the file
with O_TRUNC if the file has setuid or setgid bit set.
* 3864155 (3707662) Race between reorg processing and fsadm timer thread (alarm expiry) leads to panic in vx_reorg_emap.
* 3864156 (3662284) File Change Log (FCL) read may return ENXIO.
* 3864160 (3691633) Remove RCQ Full messages
* 3864161 (3708836) fallocate causes data corruption
* 3864163 (3712961) Stack overflow is detected in vx_dio_physio() right after submitting the I/O.
* 3864164 (3762125) Directory size increases abnormally.
* 3864166 (3731844) umount -r option fails for vxfs 6.2.
* 3864167 (3735697) vxrepquota reports error
* 3864170 (3743572) File system may hang when reaching the 1 billion inode
limit
* 3864172 (3808091) fallocate caused data corruption
* 3864173 (3779916) vxfsconvert fails to upgrade layout version for a vxfs file
system with large number of inodes.
* 3864177 (3808033) When using 6.2.1 ODM on RHEL7, Oracle resource cannot be killed after forced umount via VCS.
* 3864178 (1428611) 'vxcompress' can spew many GLM block lock messages over the 
LLT network.
* 3864179 (3622323) Cluster Filesystem mounted as read-only panics when it gets sharing and/or compression statistics with the fsadm_vxfs(1M) command.
* 3864182 (3853338) Files on VxFS are corrupted while running the sequential
asynchronous write workload under high memory pressure.
* 3864184 (3857444) The default permission of /etc/vx/vxfssystem file is incorrect.
* 3864185 (3859032) System panics in vx_tflush_map() due to NULL pointer 
de-reference.
* 3864186 (3855726) Panic in vx_prot_unregister_all().
* 3864247 (3861713) High %sys CPU seen on Large CPU/Memory configurations.
* 3864250 (3833816) Read returns stale data on one node of the CFS.
* 3864255 (3827491) Data relocation is not executed correctly if the IOTEMP policy is set to AVERAGE.
* 3864256 (3830300) Degraded CPU performance during backup of Oracle archive logs
on CFS vs local filesystem
* 3864259 (3856363) Filesystem inodes have incorrect blocks.
* 3864260 (3846521) "cp -p" fails if modification time in nanoseconds has 10
digits.
* 3866968 (3866962) Data corruption seen when dalloc writes are going on the file and 
simultaneously fsync started on the same file.
* 3870704 (3867131) Kernel panic in internal testing.
* 3872661 (3857254) Assert failure because of missed flush before taking 
filesnap of the file.
* 3874662 (3871489) Performance issue observed when number of HBAs increased
on high end servers.
* 3875458 (3616694) Internal assert failure because of race condition between forced 
unmount thread and inactive processing thread.
* 3875633 (3869174) Write system call deadlock on rhel5 and sles10.
* 3876065 (3867128) Assert failed in internal native AIO testing.
* 3877070 (3880121) Internal assert failure when coalescing the extents on clone.
* 3877142 (3891801) Internal test hit debug assert.
* 3878983 (3872202) VxFS internal test hits an assert.
* 3890556 (2919310) During stress testing on cluster file system, an assertion failure was hit 
because of a missing linkage between the directory and the associated 
attribute inode.
* 3890659 (3514407) Internal stress test hit debug assert.
* 3851511 (3821686) VxFS module failed to load on SLES11 SP4.
* 3660421 (3660422) On RHEL 6.6, umount(8) system call hangs if an application is watching for inode
events using inotify(7) APIs.
* 3520113 (3451284) Internal testing hits an assert "vx_sum_upd_efree1"
* 3521945 (3530435) Panic in Internal test with SSD cache enabled.
* 3529243 (3616722) System panics because of race between the writeback cache offline thread and the writeback data flush thread.
* 3536233 (3457803) File System gets disabled intermittently with metadata IO error.
* 3583963 (3583930) When the external quota file is restored or over-written, the old quota records are preserved.
* 3617774 (3475194) Veritas File System (VxFS) fscdsconv(1M) command fails with metadata overflow.
* 3617776 (3473390) The multiple stack overflows with Veritas File System (VxFS) on RHEL6 lead to panics or system crashes.
* 3617781 (3557009) After the fallocate() function reserves allocation space, it results in the wrong file size.
* 3617788 (3604071) High CPU usage consumed by the vxfs thread process.
* 3617790 (3574404) Stack overflow during rename operation.
* 3617793 (3564076) The MongoDB noSQL db creation fails with an ENOTSUP error.
* 3617877 (3615850) Write system call hangs with invalid buffer length
* 3620279 (3558087) The ls -l command hangs when the system takes backup.
* 3620284 (3596378) The copy of a large number of small files is slower on vxfs compared to ext4
* 3620288 (3469644) The system panics in the vx_logbuf_clean() function.
* 3621420 (3621423) The VxVM caching should not be disabled while mounting a file system in a situation where the VxFS cache area is not present.
* 3628867 (3595896) While creating OracleRAC 12.1.0.2 database, the node panics.
* 3636210 (3633067) While converting from ext3 file system to VxFS using vxfsconvert, it is observed that many inodes are missing.
* 3644006 (3451686) During internal stress testing on cluster file system (CFS),
a debug assert is hit due to an invalid cache generation count on the incore inode.
* 3645825 (3622326) Filesystem is marked with fullfsck flag as an inode is
marked bad during checkpoint promote
* 3370758 (3370754) Internal test with SmartIO write-back SSD cache hit debug asserts.
* 3383149 (3383147) The ACA operator precedence error may occur while turning off
delayed allocation.
* 3422580 (1949445) System is unresponsive when files were created on large directory.
* 3422584 (2059611) The system panics due to a NULL pointer dereference while
flushing bitmaps to the disk.
* 3422586 (2439261) When the vx_fiostats_tunable value is changed from zero to
non-zero, the system panics.
* 3422604 (3092114) The information output displayed by the "df -i" command may be inaccurate for 
cluster mounted file systems.
* 3422614 (3297840) A metadata corruption is found during the file removal process.
* 3422619 (3294074) System call fsetxattr() is slower on Veritas File System (VxFS) than ext3 file system.
* 3422624 (3352883) During the rename operation, lots of nfsd threads hang.
* 3422626 (3332902) While shutting down, the system running the fsclustadm(1M) 
command panics.
* 3422629 (3335272) The mkfs (make file system) command dumps core when the log 
size provided is not aligned.
* 3422634 (3337806) The find(1) command may panic the systems with Linux kernels
with versions greater than 3.0.
* 3422636 (3340286) After a file system is resized, the tunable setting of
dalloc_enable gets reset to a default value.
* 3422638 (3352059) High memory usage occurs when VxFS uses Veritas File Replicator (VFR) on the target even when no jobs are running.
* 3422649 (3394803) A panic is observed in VxFS routine vx_upgrade7() function
while running the vxupgrade command(1M).
* 3422657 (3412667) The RHEL 6 system panics with Stack Overflow.
* 3430467 (3430461) The nested unmounts fail if the parent file system is disabled.
* 3436431 (3434811) The vxfsconvert(1M) in VxFS 6.1 hangs.
* 3436433 (3349651) Veritas File System (VxFS) modules fail to load on RHEL 6.5 and display an error message.
* 3494534 (3402618) The mmap read performance on VxFS is slow
* 3502847 (3471245) The mongodb fails to insert any record.
* 3504362 (3472551) The attribute validation (pass 1d) of full fsck takes too much time to complete.
* 3506487 (3506485) The system does not allow write-back caching with Symantec Volume Replicator (VVR).
* 3512292 (3348520) In a Cluster File System (CFS) cluster having multi volume file system of a smaller size, execution of the fsadm command causes system hang if the free space in the file system is low.
* 3518943 (3534779) Internal stress testing on Cluster File System (CFS) hits a
debug assert.
* 3519809 (3463464) Internal kernel functionality conformance test hits a kernel panic due to null pointer dereference.
* 3522003 (3523316) The writeback cache feature does not work for write size of 2MB.
* 3528770 (3449152) Failed to set 'thin_friendly_alloc' tunable in case of cluster file system (CFS).
* 3529852 (3463717) Information regarding Cluster File System (CFS) that does not support the 'thin_friendly_alloc' tunable is not updated in the vxtunefs(1M) command  man page.
* 3530038 (3417321) The vxtunefs(1M) tunable man page gives an incorrect
* 3541125 (3541083) The vxupgrade(1M) command for layout version 10 creates
64-bit quota files with inappropriate permission configurations.
* 3424575 (3349651) Veritas File System (VxFS) modules fail to load on RHEL 6.5 and display an error message.
* 3418489 (3370720) Performance degradation is seen with Smart IO feature enabled.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: VRTSvxvm-6.1.1.600

* 3953847 (Tracking ID: 3951290)

SYMPTOM:
Retpoline support for VxVM on RHEL6.10 and RHEL6.x retpoline
kernels

DESCRIPTION:
RHEL6.10 is a new release that ships with a retpoline kernel. Red Hat has
also released retpoline kernels for older RHEL6.x releases. The VxVM module 
must be recompiled with a retpoline-aware GCC to support retpoline
kernels.

RESOLUTION:
Compiled VxVM with retpoline GCC.

* 3890226 (Tracking ID: 3880936)

SYMPTOM:
vxvmconvert hangs while analysing the LVM Volume Groups for further 
conversion.

DESCRIPTION:
The issue occurs due to additional messages in the duplicate warnings when 
executing "pvdisplay" command.
The "PV UUID"s are not properly extracted because the keyword used to extract 
them is not correct.

RESOLUTION:
The code has been modified to extract PV UUIDs correctly.
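
As a rough illustration of the parsing problem described above (not the actual vxvmconvert code), the following Python sketch pulls PV UUIDs out of pvdisplay-style text while ignoring extra warning lines. The sample text, the wording of the warning, and the name extract_pv_uuids are assumptions made up for this example.

```python
import re

# Hypothetical pvdisplay-style output; the warning wording is invented for illustration.
SAMPLE = """\
  WARNING: duplicate PV found (illustrative message)
  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               myvg
  PV UUID               AAAAAA-0001
  WARNING: duplicate PV found (illustrative message)
  --- Physical volume ---
  PV Name               /dev/sdc
  VG Name               myvg
  PV UUID               BBBBBB-0002
"""

# Anchor on the exact field label at the start of the line so that
# warning text mentioning "PV UUID" elsewhere on a line is not picked up.
PV_UUID_RE = re.compile(r"^\s*PV UUID\s+(\S+)\s*$")


def extract_pv_uuids(text):
    """Return the PV UUID values in the order they appear."""
    uuids = []
    for line in text.splitlines():
        match = PV_UUID_RE.match(line)
        if match:
            uuids.append(match.group(1))
    return uuids
```

Matching the whole field label at the start of the line, rather than a loose keyword anywhere in the line, is what keeps the extra warning lines from disturbing the extraction in this sketch.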

* 3890251 (Tracking ID: 3478478)

SYMPTOM:
The conversion of an LVM volume to a VxVM volume fails while preparing the LVM volumes for conversion.

<snippet of error on CLI while performing lvm to vxvm conversion>
..
VxVM  ERROR V-5-2-4614 Error while preparing the volume group for conversion.
...
vxlvmconv: making log directory /etc/vx/lvmconv/myvg.d/log.
vxlvmconv: starting conversion for VG "myvg" - Thu Jun 23 12:52:55 IST 2016
vxlvmconv: Checking disk connectivity
vxlvmconv: can not access disks  /dev/sdx.
vxlvmconv: Disk connectivity check failed. Conversion aborted.
..
<snippet of error on CLI while performing lvm to vxvm conversion>

DESCRIPTION:
While converting an LVM volume group to VxVM volumes, the conversion adds the disks to an existing disk group and replaces the LVM volumes with VxVM volumes.
It fails at the second stage, when it has to migrate extents: it loses connectivity with the disks, cannot access them, and aborts the VxVM conversion 
as a result.

RESOLUTION:
Code changes have been made to overcome the disk-connectivity failure.

* 3890262 (Tracking ID: 3879234)

SYMPTOM:
dd read on the Veritas Volume Manager (VxVM) character device fails with
Input/Output error while accessing end of device like below:

[root@dn pmansukh_debug]# dd if=/dev/vx/rdsk/hfdg/vol1 of=/dev/null bs=65K
dd: reading `/dev/vx/rdsk/hfdg/vol1': Input/output error
15801+0 records in
15801+0 records out
1051714560 bytes (1.1 GB) copied, 3.96065 s, 266 MB/s

DESCRIPTION:
The issue occurs because of changes in the Linux API
generic_file_aio_read. Because of the many changes in this API,
it does not properly handle end of device reads/writes. The Linux code has
been changed to use blkdev_aio_read, which is a GPL symbol and hence
cannot be used.

RESOLUTION:
Made changes in the code to handle end of device reads/writes properly.
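
A minimal sketch of end-of-device handling, assuming the usual convention that a read crossing the end of a device returns the remaining bytes and a read at the end returns zero bytes rather than an error. This is not the VxVM code; clamp_read and the device size used below are hypothetical. The block size matches the dd example above (bs=65K is 66560 bytes, and 15801 full blocks are 1051714560 bytes).

```python
def clamp_read(offset, length, device_size):
    """Return how many bytes may be read at `offset` without passing the end of the device."""
    if offset < 0 or length < 0 or device_size < 0:
        raise ValueError("offset, length and device_size must be non-negative")
    if offset >= device_size:
        # At or past the end of the device: report end-of-file (0 bytes),
        # not an I/O error.
        return 0
    # A request that crosses the end is shortened to the bytes that remain.
    return min(length, device_size - offset)


BLOCK = 65 * 1024                       # bs=65K from the dd example
FULL_BLOCKS = 15801                     # full records reported by dd
HYPOTHETICAL_SIZE = FULL_BLOCKS * BLOCK + 4096   # assumed size, for illustration only

# The last request starts after 15801 full blocks and is shortened to a partial read.
last_read = clamp_read(FULL_BLOCKS * BLOCK, BLOCK, HYPOTHETICAL_SIZE)
```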

* 3953853 (Tracking ID: 3951938)

SYMPTOM:
Retpoline support for ASLAPM on RHEL6.10 and RHEL6.x retpoline
kernels

DESCRIPTION:
RHEL6.10 is a new release that ships with a retpoline kernel. Red Hat has
also released retpoline kernels for older RHEL6.x releases. The APM module
must be recompiled with a retpoline-aware GCC to support retpoline
kernels.

RESOLUTION:
Compiled APM with retpoline GCC.

* 3607232 (Tracking ID: 3596330)

SYMPTOM:
'vxsnap refresh' operation fails with the following indications:

Errors occur from DR (Disaster Recovery) Site of VVR (Veritas
Volume Replicator):

o   vxio: [ID 160489 kern.notice] NOTICE: VxVM vxio V-5-3-1576 commit:
Timedout waiting for rvg [RVG] to quiesce, iocount [PENDING_COUNT] msg 0
o   vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-8011 Internal
transaction failed: Transaction aborted waiting for io drain

At the same time, following errors occur from Primary Site of VVR:

vxio: [ID 218356 kern.warning] WARNING: VxVM VVR vxio V-5-0-267 Rlink
[RLINK] disconnecting due to ack timeout on update message

DESCRIPTION:
VM (Volume Manager) Transactions on DR site get aborted as pending IOs could
not be drained in stipulated time leading to failure of FMR (Fast-Mirror
Resync) 'snap' operations. These IOs could not be drained because of IO
throttling. A bug/race in conjunction with timing in VVR causes a miss in
clearing this throttling condition/state.

RESOLUTION:
Code changes have been done to fix the race condition which ensures clearance
of throttling state at appropriate time.

* 3640641 (Tracking ID: 3622068)

SYMPTOM:
After mirroring encapsulated root disk, rootdg fails to import if any disk in
the disk group becomes unavailable.
This makes the system unbootable if /usr is mounted on a separate disk that is
also encapsulated.

DESCRIPTION:
Starting from VxVM(Veritas Volume Manager) 6.1, by default all non-detached
disks within a disk group should be accessible to the system so that the disk
group can be successfully imported. It impacts system boot with alternative
encapsulated boot disk when original encapsulated boot disk is unavailable.

RESOLUTION:
Code changes have been made to skip such check when trying to import rootdg.

* 3662397 (Tracking ID: 3662392)

SYMPTOM:
In the CVM environment, if I/Os are getting executed on slave node, corruption
can happen when the vxdisk resize(1M) command is executing on the master
node.

DESCRIPTION:
During the first stage of resize transaction, the master node re-adjusts the
disk offsets and public/private partition device numbers.
On a slave node, the public/private partition device numbers are not adjusted
properly. Because of this, the partition starting offset is added twice,
which causes the corruption. The window during which the public/private
partition device numbers are adjusted is small. Corruption is observed only
if I/O occurs during this window. After the resize operation completes its execution,
no further corruption will happen.

RESOLUTION:
The code has been changed to add partition starting offset properly to an I/O
on slave node during execution of a resize command.
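
The effect of adding the partition starting offset twice can be shown with a small worked example. This is only an arithmetic sketch with made-up numbers and hypothetical function names, not the VxVM I/O path.

```python
def to_disk_offset(io_offset, partition_start):
    """Translate a partition-relative offset to a disk offset (start added once)."""
    return partition_start + io_offset


def buggy_to_disk_offset(io_offset, partition_start):
    """Model of the defect: the partition start is applied a second time."""
    return partition_start + to_disk_offset(io_offset, partition_start)


# Illustrative values only.
PARTITION_START = 2048
IO_OFFSET = 1000

correct = to_disk_offset(IO_OFFSET, PARTITION_START)       # 3048
wrong = buggy_to_disk_offset(IO_OFFSET, PARTITION_START)   # 5096
# The I/O lands exactly one partition-start further into the disk than intended.
misplacement = wrong - correct                              # 2048
```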

* 3674567 (Tracking ID: 3636089)

SYMPTOM:
The VVR secondary panics while committing transactions, with the following
stack:
vpanic()
vol_rv_sec_write_childdone+0x18()
vol_rv_transaction_prepare+0x63c()
vol_commit_iolock_objects+0xbc()
vol_ktrans_commit+0x2d8()
volsioctl_real+0x2ac()

DESCRIPTION:
While unwinding updates in vol_rv_transaction_prepare(), a double SIO (Staged I/O)
done occurs, which results in the panic.

RESOLUTION:
Code changes were made to fix the issue.

* 3682022 (Tracking ID: 3648719)

SYMPTOM:
The server panics with the following stack trace while adding or removing LUNs or HBAs:
dmp_decode_add_path()
dmp_decipher_instructions()
dmp_process_instruction_buffer()
dmp_reconfigure_db()
gendmpioctl()
vxdmpioctl()

DESCRIPTION:
While deleting a dmpnode, Dynamic Multi-Pathing (DMP) releases the memory associated with the dmpnode structure.
In case the dmpnode doesn't get deleted for some reason, and if any other tasks access the freed memory of this dmpnode, then the server panics.

RESOLUTION:
The code is modified to prevent tasks from accessing the memory that is freed by the dmpnode, which is deleted. The change also fixes the memory leak issue in the buffer allocation code path.

* 3729172 (Tracking ID: 3726110)

SYMPTOM:
On systems with a high number of CPUs, DMP devices may perform considerably slower than OS device paths.

DESCRIPTION:
In high CPU configuration, I/O statistics related functionality in DMP takes more CPU time because DMP statistics are collected on per CPU basis. This stat collection happens in DMP I/O code path hence it reduces the I/O performance. Because of this, DMP devices perform slower than OS device paths.

RESOLUTION:
The code is modified to remove some of the stats collection functionality from the DMP I/O code path. Along with this, the following need to be turned off:
1. Turn off idle lun probing.
#vxdmpadm settune dmp_probe_idle_lun=off
2. Turn off statistic gathering functionality.
#vxdmpadm iostat stop

Notes:
1. Apply this patch if the system configuration has a large number of CPUs and DMP performs considerably slower than OS device paths. This issue does not apply to normal systems.

* 3736352 (Tracking ID: 3729078)

SYMPTOM:
In a VVR environment, a panic may occur after SF (Storage Foundation) patch
installation or uninstallation on the secondary site.

DESCRIPTION:
The VXIO kernel reset invoked by SF patch installation removes all disk group
objects that do not have the preserve flag set. Because the preserve flag overlaps
with the RVG (Replicated Volume Group) logging flag, the RVG object is not removed,
but its rlink object is removed, which results in a system panic when VVR starts.

RESOLUTION:
Code changes have been made to fix this issue.

* 3778391 (Tracking ID: 3565212)

SYMPTOM:
While performing controller giveback operations on NetApp ALUA arrays, the
below messages are observed in /etc/vx/dmpevents.log

[Date]: I/O error occured on Path <path> belonging to Dmpnode <dmpnode>
[Date]: I/O analysis done as DMP_PATH_BUSY on Path <path> belonging to Dmpnode <dmpnode>
[Date]: I/O analysis done as DMP_IOTIMEOUT on Path <path> belonging to Dmpnode <dmpnode>

DESCRIPTION:
During the asymmetric access state transition, DMP puts the buffer pointer
in the delay queue based on the flags observed in the logs. This delay
resulted in timeout and thereby filesystem went into disabled state.

RESOLUTION:
DMP code is modified to perform immediate retries instead of putting the
buffer pointer in the delay queue for transition in progress case.

* 3790099 (Tracking ID: 3581646)

SYMPTOM:
Sometimes Logical Volumes may fail to migrate back to OS devices when Dynamic Multipathing (DMP) Native Support is disabled when the root is
mounted on LVM.

DESCRIPTION:
lvmetad caches open count on devices which are present in accept section of filter in lvm.conf file. When DMP Native Support is enabled, all
non-VxVM devices are put in reject section of filter so that only "/dev/vx/dmp" devices remain in accept section of filter in lvm.conf file.
So lvmetad caches open count on "/dev/vx/dmp" devices. When DMP Native Support is disabled "/dev/vx/dmp" devices are not put in reject
section of filter causing a stale open count for lvmetad which is causing physical volumes to point to stale devices even when DMP Native
Support is disabled.

RESOLUTION:
Code changes have been made to add "/dev/vx/dmp" devices in reject section of filter in lvm.conf file so lvmetad releases open count on
these devices.

* 3790115 (Tracking ID: 3623617)

SYMPTOM:
The command "vxdmpadm settune dmp_native_support=on" may fail with the following Perl
error:

Can't locate Sys/Syslog.pm in @INC (@INC contains: /usr/local/lib64/perl5
/usr/local/share/perl5 /usr/lib64/perl5/vendor_perl
/usr/share/perl5/vendor_perl
/usr/lib64/perl5 /usr/share/perl5 .) at /usr/lib/vxvm/bin/vxupdatelvm line 73.
BEGIN failed--compilation aborted at /usr/lib/vxvm/bin/vxupdatelvm line 73.

DESCRIPTION:
One of the VxVM-specific scripts used for enabling Dynamic Multipathing (DMP) Native
Support uses the Perl syslog module. A minimal package installation of the OS
might not contain the Perl syslog module, which leads to this error.

RESOLUTION:
Code changes have been done to use internally developed perl module and avoid the
dependency.

* 3790117 (Tracking ID: 3776520)

SYMPTOM:
Filters are not updated properly in lvm.conf file in VxDMP initrd while DMP Native Support is being enabled. As a result, root Logical Volume
(LV) is mounted on OS device upon reboot.

DESCRIPTION:
From LVM version 105, global_filter was introduced as part of lvm.conf file. VxDMP updates the initrd lvm.conf file with the filters required for
DMP Native Support to function. While updating lvm.conf, VxDMP checks for the filter field to be updated, but it should check for the
global_filter field in the latest LVM version. As a result, the lvm.conf file is not updated with the proper filters.

RESOLUTION:
The code is modified to properly update global_filter field in lvm.conf file in VxDMP initrd.
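
The field selection described above can be sketched as follows. This is an illustrative Python sketch, not the VxDMP script: the function name set_lvm_filter is hypothetical, the filter patterns in the usage note are examples only (the exact filters VxDMP writes are not stated here), and the sketch handles only single-line, uncommented assignments.

```python
import re


def set_lvm_filter(conf_text, patterns):
    """Rewrite global_filter if lvm.conf defines it; otherwise rewrite filter.

    Assumption: the field is assigned on a single uncommented line.
    """
    value = "[ " + ", ".join('"%s"' % p for p in patterns) + " ]"
    for key in ("global_filter", "filter"):
        # Match an uncommented assignment of exactly this key at line start.
        rx = re.compile(r"^(\s*)%s\s*=.*$" % key, re.MULTILINE)
        if rx.search(conf_text):
            # A function replacement avoids escape processing of the patterns.
            return rx.sub(
                lambda m: "%s%s = %s" % (m.group(1), key, value),
                conf_text,
                count=1,
            )
    return conf_text  # neither field present: leave the text unchanged
```

For example, with a file that defines both fields, set_lvm_filter(text, ["a|/dev/vx/dmp/.*|", "r|.*|"]) rewrites only the global_filter line and leaves the filter line as it was.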

* 3795623 (Tracking ID: 3795622)

SYMPTOM:
With Dynamic Multi-Pathing (DMP) Native Support enabled, LVM global_filter is not updated properly in lvm.conf file to reject the newly added paths.

DESCRIPTION:
With DMP Native Support enabled, when new paths are added to existing LUNs, LVM global_filter is not updated properly in lvm.conf file to reject the newly added paths. This can lead to duplicate PV (physical volumes) found error reported by LVM commands.

RESOLUTION:
The code is modified to properly update global_filter field in lvm.conf file when new paths are added to existing disks.

* 3800388 (Tracking ID: 3581264)

SYMPTOM:
VxVM package uninstallation succeeds even when DMP (Dynamic Multipathing) Native
Support is on and I/Os are in progress on an LV (Logical Volume).

DESCRIPTION:
While uninstalling the VxVM package, no precaution is taken to log an error
message if DMP native support could not be successfully disabled.
As a result, the uninstallation proceeds without any error message.

RESOLUTION:
Code changes have been made to display an error message if DMP native support
could not be disabled during uninstallation. Additionally, if the
rootvg is enabled, the uninstallation will proceed with an error message;
otherwise uninstallation will fail.

* 3800421 (Tracking ID: 3762580)

SYMPTOM:
In Linux kernels greater than or equal to RHEL6.6 (e.g. RHEL7 and SLES11SP3), the vxfen module fails to register the SCSI-3 PR keys to EMC devices when PowerPath co-exists
with DMP (Dynamic Multi-Pathing). The following logs are printed while setting up fencing for the cluster.

VXFEN: vxfen_reg_coord_pt: end ret = -1
vxfen_handle_local_config_done: Could not register with a majority of the
coordination points.

DESCRIPTION:
In Linux kernels greater than or equal to RHEL6.6 (e.g. RHEL7 and SLES11SP3), the interface used by DMP to send the SCSI commands to block devices does not transfer the
data to or from the device. Therefore, the SCSI-3 PR keys do not get registered.

RESOLUTION:
The code is modified to use SCSI request_queue to send the SCSI commands to the
underlying block device.
Additional patch is required from EMC to support processing SCSI commands via the request_queue mechanism on EMC PowerPath devices. Please contact EMC for patch details
for a specific kernel version.

* 3808593 (Tracking ID: 3795788)

SYMPTOM:
Performance degradation is seen when many application sessions open the same data file on Veritas Volume Manager (VxVM) volume.

DESCRIPTION:
This issue occurs because of lock contention. When many application sessions open the same data file on the VxVM volume, the exclusive lock is occupied on all CPUs. If there are a lot of CPUs in the system, this process could be quite time-consuming, which leads to performance degradation at the initial start of applications.

RESOLUTION:
The code is modified to change the exclusive lock to a shared lock when the data file on the volume is opened.

* 3812272 (Tracking ID: 3811946)

SYMPTOM:
When invoking "vxsnap make" command with cachesize option to create space optimized snapshot, the command succeeds but the following error message is displayed in syslog:

kernel: VxVM vxio V-5-0-603 I/O failed.  Subcache object <subcache-name> does
not have a valid sdid allocated by cache object <cache-name>.
kernel: VxVM vxio V-5-0-1276 error on Plex <plex-name> while writing volume
<volume-name> offset 0 length 2048

DESCRIPTION:
When space optimized snapshot is created using "vxsnap make" command along with cachesize option, cache and subcache objects are created by the same command. During the creation of snapshot, I/Os from the volumes may be pushed onto a subcache even though the subcache ID has not yet been allocated. As a result, the I/O fails.

RESOLUTION:
The code is modified to make sure that I/Os on the subcache are
pushed only after the subcache ID has been allocated.

* 3852147 (Tracking ID: 3852146)

SYMPTOM:
Shared DiskGroup fails to import when "-c" and "-o noreonline" options are
specified together with the below error:

VxVM vxdg ERROR V-5-1-10978 Disk group <dgname>: import failed:
Disk for disk group not found

DESCRIPTION:
When the "-c" option is specified, the DISKID and DGID of the disks in
the DG are updated. When the information about the disks in the DG is passed
to the slave node, the slave node does not have the latest information,
because the disks are not re-onlined when "-o noreonline" is specified.
Since the slave node does not have the latest information, it cannot
identify the proper disks belonging to the DG, which leads to the DG import
failing with "Disk for disk group not found".

RESOLUTION:
Code changes have been done to handle the working of "-c" and "-o noreonline"
together.

* 3852800 (Tracking ID: 3819670)

SYMPTOM:
When smartmove with the 'vxevac' command is run in the background by pressing the 'ctrl-z' key and using the 'bg' command, the execution of 'vxevac' is terminated abruptly.

DESCRIPTION:
As part of the "vxevac" command for data movement, VxVM submits the data as a task in the kernel, and uses the select() primitive on the task file descriptor to wait for task finishing events to arrive. When "ctrl-z" and bg are used to run vxevac in the background, select() returns -1 with errno EINTR. VxVM wrongly interprets this as a user termination action, and hence vxevac is terminated.
Instead of terminating vxevac, the select() should be retried until the task completes.

RESOLUTION:
The code is modified so that when select() returns with errno EINTR, it checks whether vxevac task is finished. If not, the select() is retried.
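
The retry logic described above can be sketched in Python. Note that since Python 3.5 select.select() already retries on EINTR by itself, so this sketch takes the wait as a caller-supplied function that may raise InterruptedError (the Python exception for EINTR). The names wait_for_task, wait_once, and task_finished are hypothetical and are not part of vxevac.

```python
def wait_for_task(wait_once, task_finished):
    """Wait until the task is finished, retrying the wait when it is interrupted.

    wait_once     -- blocks until an event arrives; may raise InterruptedError (EINTR)
    task_finished -- returns True once the task has completed
    """
    while True:
        try:
            wait_once()
        except InterruptedError:
            # EINTR: a signal interrupted the wait (for example the stop and
            # continue signals from ctrl-z followed by bg). This is not a
            # request to terminate, so check the task and retry if needed.
            if task_finished():
                return "finished"
            continue
        if task_finished():
            return "finished"
```

The design point is that an interrupted wait is treated as a reason to re-check the task state, not as a termination request.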

* 3854390 (Tracking ID: 3853049)

SYMPTOM:
On a server with a large number of CPUs, the stats output of vxstat is delayed beyond
the set interval. Also, multiple sessions of vxstat impact I/O
performance.

DESCRIPTION:
vxstat acquires an exclusive lock on each CPU in order to gather the stats.
This affects the consolidation and display of stats in an environment with a
huge number of CPUs and disks. The output of stats for an interval of 1 second can
get delayed beyond the set interval. Also, the lock is acquired in
the I/O path, which affects I/O performance due to contention for these locks.

RESOLUTION:
The code is modified to remove the exclusive spin lock.

* 3854526 (Tracking ID: 3544020)

SYMPTOM:
Volume Manager tunables are reset to their defaults after a patch upgrade.

DESCRIPTION:
The file "/kernel/drv/vxio.conf" which is used to store all the tunable values is
getting replaced with the default template during patch upgrade.
This leads to values of all the tunables getting reset to default after patch
upgrade.

RESOLUTION:
Changes are done to preserve the file "/kernel/drv/vxio.conf" as part of patch
upgrade.

* 3856384 (Tracking ID: 3525490)

SYMPTOM:
In a VVR (Veritas Volume Replicator) environment, the system panics with the following
stack:

#0 crash_kexec at ffffffff800b1509
#1 __die at ffffffff80065137
#2 do_page_fault at ffffffff80067430
#3 error_exit at ffffffff8005ddf9
    [exception RIP: nmcom_wait_msg+449]
#4 nmcom_server_proc at ffffffff889f04b6 [vxio]

DESCRIPTION:
When VVR sends messages over UDP, it uses a different data structure than for TCP. When
partial data is received, the message is explicitly typecast to the same
structure for both TCP and UDP. This causes some pointers in the UDP message header
to become invalid and panics the system while copying data.

RESOLUTION:
Code changes have been made to fix the problem.

* 3857121 (Tracking ID: 3857120)

SYMPTOM:
After the Oracle database process is stopped, the process does not start the Oracle
database again but hangs.

DESCRIPTION:
Two threads from VxVM (Veritas Volume Manager) try to asynchronously
manipulate the per-CPU data structures for I/O counting, which causes a race
condition that might leak the I/O count. As a result, the volume cannot be closed,
and the process hangs.

RESOLUTION:
VxVM code has been changed to avoid the race condition.

* 3859012 (Tracking ID: 3859009)

SYMPTOM:
The pvs command shows duplicate PV messages because the global_filter of
lvm.conf is not updated after a fiber switch or storage controller is rebooted.

DESCRIPTION:
When a fiber switch or storage controller reboots, the device numbers of some paths may get reused
during the DDL reconfiguration cycle. In this case, VxDMP (Veritas Dynamic Multi-Pathing) does not
treat them as newly added devices. For devices that belong to an LVM dmpnode, VxDMP
does not trigger an lvm.conf update. As a result, the global_filter of
lvm.conf is not updated.

RESOLUTION:
The code has been changed to update lvm.conf correctly.

* 3861494 (Tracking ID: 3557636)

SYMPTOM:
Following errors may be seen with probable data corruption:
A serial number mismatch may be logged in syslog:
VxVM vxdmp V-5-3-1970 dmp_verify_devid:Graceful DR steps are not followed by the
user on the path <DeviceMajor/DeviceMinor>. The device with old serial number
<SerialNo1> is replaced with a new device with serial number <SerialNo2>

Or udev threads may get timed out or killed with the following messages:
systemd-udevd: worker [ThreadId] /devices/pci0000:00/0000:00:06.0/0000:05:00.0/host7/rport-7:0-2/target7:0:1/7:0:1:6/block/sdaj timeout; kill it
systemd-udevd: seq 21010 '/devices/pci0000:00/0000:00:06.0/0000:05:00.0/host7/rport-7:0-2/target7:0:1/7:0:1:6/block/sdaj' killed

DESCRIPTION:
In Linux, VxVM uses udev framework to listen to disk add/remove events. ESD (Event
source daemon) updates the VxVM and DMP configuration based on these events.
As the udev rules are executed serially, the processing of the udev rule for VxVM may
get delayed; in turn, device add or remove events are delivered late to ESD.
For instance, after a disk is removed, if the udev remove event gets delayed, then the path
may not be disabled in the DMP kernel immediately. In addition, udev threads may be
timed out or killed.
In both situations, I/O might be issued to an incorrect disk, which may result in
probable data corruption.

RESOLUTION:
In order to ensure udev events are processed quickly, ESD uses libudev library to
listen to disk add or remove events.

* 3864093 (Tracking ID: 3825467)

SYMPTOM:
An error about the symbol d_lock is displayed.

DESCRIPTION:
This symbol is already present in this kernel, so a duplicate of it is
declared in the VxVM kernel code.

RESOLUTION:
The code is modified to remove the definition, which resolves the issue.

* 3864112 (Tracking ID: 3860503)

SYMPTOM:
Poor performance of vxassist mirroring is observed compared to using the raw dd
utility to do mirroring.

DESCRIPTION:
There is heavy lock contention on high-end servers with a large number of CPUs,
because copying each region needs to obtain some unnecessary CPU locks.

RESOLUTION:
VxVM code has been changed to decrease the lock contention.

* 3864480 (Tracking ID: 3823283)

SYMPTOM:
The Linux operating system gets stuck in GRUB after reboot. A manual kernel load is
required to make the operating system functional.

DESCRIPTION:
During unencapsulation of a boot disk in SAN environment, multiple entries
corresponding to root disk are found in by-id device directory. As a
result, a parse command fails, leading to the creation of an improper menu
file in grub directory. This menu file defines the device path to load
kernel and other modules.

RESOLUTION:
The code is modified to handle multiple entries for SAN boot disk.

* 3864625 (Tracking ID: 3599977)

SYMPTOM:
During a replica connection, referencing a port that is already deleted in another thread causes a system panic with a similar stack trace as below:
.simple_lock()
soereceive()
soreceive()
.kernel_add_gate_cstack()
kmsg_sys_rcv()
nmcom_get_next_mblk()
nmcom_get_next_msg()
nmcom_wait_msg_tcp()
nmcom_server_proc_tcp()
nmcom_server_proc_enter()
vxvm_start_thread_enter()

DESCRIPTION:
During a replica connection, a port is created before increasing the count. This is to protect the port from getting deleted. However, another thread deletes the port before the count is increased and after the port is created.
While the replica connection thread proceeds, it refers to the port that is already deleted, which causes a NULL pointer reference and a system panic.

RESOLUTION:
The code is modified to prevent asynchronous access to the count that is associated with the port by means of locks.
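
The idea of protecting the count with a lock can be sketched as follows. This is a user-space Python illustration with hypothetical names (PortTable, create_and_hold, release, try_delete), not the VVR kernel code. Creating the port and taking the reference happen under one lock, so a concurrent delete cannot slip in between the two steps.

```python
import threading


class PortTable:
    """Toy table of ports with reference counts, all guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refs = {}  # port id -> reference count

    def create_and_hold(self, port_id):
        # Creation and the count increment are one atomic step.
        with self._lock:
            self._refs[port_id] = self._refs.get(port_id, 0) + 1

    def release(self, port_id):
        with self._lock:
            self._refs[port_id] -= 1
            if self._refs[port_id] == 0:
                del self._refs[port_id]

    def try_delete(self, port_id):
        """Delete the port only if nobody holds a reference to it."""
        with self._lock:
            if self._refs.get(port_id, 0) > 0:
                return False  # still in use by a connection thread
            self._refs.pop(port_id, None)
            return True
```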

* 3864628 (Tracking ID: 3686698)

SYMPTOM:
vxconfigd hangs due to a deadlock between two threads.

DESCRIPTION:
Two threads wait for the same lock, causing a deadlock between
them. This blocks all vx commands.
The untimeout function does not return until the pending callback (which
is set through the timeout function) is cancelled or has completed its
execution (if it has already started). Therefore, locks acquired by the callback
routine should not be held across a call to the untimeout routine, or a deadlock may
result.

Thread 1:
    untimeout_generic()
    untimeout()
    voldio()
    volsioctl_real()
    fop_ioctl()
    ioctl()
    syscall_trap32()

Thread 2:
    mutex_vector_enter()
    voldsio_timeout()
    callout_list_expire()
    callout_expire()
    callout_execute()
    taskq_thread()
    thread_start()

RESOLUTION:
Code changes have been made to call untimeout outside the lock
taken by callback handler.
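
The same locking rule can be illustrated in user space with Python's threading.Timer standing in for the kernel timeout/untimeout pair. This is only an analogy under that assumption; the class and method names are hypothetical. The callback takes the lock, so the code that cancels the timer and waits for it must not hold that lock while waiting.

```python
import threading


class Watchdog:
    """Timer whose callback takes the same lock as the arm/disarm code."""

    def __init__(self):
        self.lock = threading.Lock()
        self.fired = 0
        self.timer = None

    def _callback(self):
        # The callback needs the lock, like the timeout handler in the trace.
        with self.lock:
            self.fired += 1

    def arm(self, delay):
        with self.lock:
            self.timer = threading.Timer(delay, self._callback)
            self.timer.start()

    def disarm(self):
        # Detach the timer while holding the lock...
        with self.lock:
            timer, self.timer = self.timer, None
        # ...but cancel and wait for it only after the lock is released.
        # Waiting with the lock held could deadlock against a callback
        # that has already started and is blocked on the same lock.
        if timer is not None:
            timer.cancel()
            timer.join()
```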

* 3864980 (Tracking ID: 3594158)

SYMPTOM:
The system panics on a VVR secondary node with the following stack trace:
.simple_lock()
soereceive()
soreceive()
.kernel_add_gate_cstack()
kmsg_sys_rcv()
nmcom_get_next_mblk()
nmcom_get_next_msg()
nmcom_wait_msg_tcp()
nmcom_server_proc_tcp()
nmcom_server_proc_enter()
vxvm_start_thread_enter()

DESCRIPTION:
You may issue a spinlock or unspinlock to the replica to check whether to use a checksum in the received packet. During the lock or unlock operation, if there is a transaction that is being processed with the replica, which rebuilds the replica object in the kernel, then there is a possibility that the replica referenced in spinlock is different than the one which the replica has referenced in unspinlock (especially when the replica is referenced through several pointers). As a result, the system panics.

RESOLUTION:
The code is modified to set the flag in the port attribute to indicate whether to use the checksum during a port creation. Hence, for each packet that is received, you only need to check the flag in the port attribute rather than referencing it to the replica object. As part of the change, the spinlock or unspinlock statements are also removed.

* 3864986 (Tracking ID: 3621232)

SYMPTOM:
When the vradmin ibc command is executed to initiate the In-band Control (IBC) procedure, the vradmind (VVR daemon) on the VVR secondary node goes into the disconnected state. As a result, subsequent IBC procedures or vradmin ibc commands cannot be started or executed on the VVR secondary node, and a message similar to the following appears on the VVR primary node:

VxVM VVR vradmin ERROR V-5-52-532 Secondary is undergoing a state transition. Please re-try the command after some time.
VxVM VVR vradmin ERROR V-5-52-802 Cannot start command execution on Secondary.

DESCRIPTION:
When the IBC procedure reaches the "commands finish" state, the vradmind on the VVR secondary node goes into a disconnected state, which the vradmind on the primary node fails to detect.
In such a scenario, the vradmind on the primary node does not send the handshake request to the secondary node that would change the primary node's state from disconnected to running. As a result, the vradmind on the primary node continues to be in the disconnected state and the vradmin ibc command fails to run on the VVR secondary node despite being in the running state on the VVR primary node.

RESOLUTION:
The code is modified to make sure the vradmind on VVR primary node is notified while it goes into the disconnected state on VVR secondary node. As a result, it can send out a handshake request to take the secondary node out of the disconnected state.

* 3864987 (Tracking ID: 3513392)

SYMPTOM:
The secondary node panics when it is rebooted while heavy I/O is in progress on the primary node. The following stack trace is displayed:

PID: 18862  TASK: ffff8810275ff500  CPU: 0   COMMAND: "vxiod"
#0 [ffff880ff3de3960] machine_kexec at ffffffff81035b7b
#1 [ffff880ff3de39c0] crash_kexec at ffffffff810c0db2
#2 [ffff880ff3de3a90] oops_end at ffffffff815111d0
#3 [ffff880ff3de3ac0] no_context at ffffffff81046bfb
#4 [ffff880ff3de3b10] __bad_area_nosemaphore at ffffffff81046e85
#5 [ffff880ff3de3b60] bad_area_nosemaphore at ffffffff81046f53
#6 [ffff880ff3de3b70] __do_page_fault at ffffffff810476b1
#7 [ffff880ff3de3c90] do_page_fault at ffffffff8151311e
#8 [ffff880ff3de3cc0] page_fault at ffffffff815104d5
#9 [ffff880ff3de3d78] volrp_sendsio_start at ffffffffa0af07e3 [vxio]
#10 [ffff880ff3de3e08] voliod_iohandle at ffffffffa09991be [vxio]
#11 [ffff880ff3de3e38] voliod_loop at ffffffffa0999419 [vxio]
#12 [ffff880ff3de3f48] kernel_thread at ffffffff8100c0ca

DESCRIPTION:
If the replication stage I/Os are started after serialization of the replica volume,
the replication port can be deleted and set to NULL while replica connection
changes are handled. This causes the panic, because the code does not check whether
the replication port is still valid before referencing it.

RESOLUTION:
Code changes have been made to abort the stage I/O if the replication port is NULL.

* 3864988 (Tracking ID: 3625890)

SYMPTOM:
After running the vxdisk resize command, the following message is displayed:
"VxVM vxdisk ERROR V-5-1-8643 Device <disk name> resize failed: Invalid
attribute specification"

DESCRIPTION:
Two cylinders are reserved for special use on CDS (Cross-platform Data Sharing)
VTOC (Volume Table of Contents) disks. When a disk is expanded to a
particular size on the storage side, VxVM (Veritas Volume Manager) may calculate
the cylinder number as 2, which causes vxdisk resize to fail with the error
message "Invalid attribute specification".

RESOLUTION:
The code is modified to avoid the failure of resizing a CDS VTOC disk.

* 3864989 (Tracking ID: 3521726)

SYMPTOM:
When using Symantec Replication Option, the system panics while freeing
memory, with the following stack trace on AIX:

pvthread+011500 STACK:
[0001BF60]abend_trap+000000 ()
[000C9F78]xmfree+000098 ()
[04FC2120]vol_tbmemfree+0000B0 ()
[04FC2214]vol_memfreesio_start+00001C ()
[04FCEC64]voliod_iohandle+000050 ()
[04FCF080]voliod_loop+0002D0 ()
[04FC629C]vol_kernel_thread_init+000024 ()
[0025783C]threadentry+00005C ()

DESCRIPTION:
In certain scenarios, when a write I/O gets throttled or unwound in VVR, the
memory related to one of the data structures is freed. When this I/O is restarted,
the same memory is illegally accessed and freed again even though it was
already freed. This causes the system panic.

RESOLUTION:
Code changes have been done to fix the illegal memory access issue.

* 3865411 (Tracking ID: 3665644)

SYMPTOM:
The system panics with the following stack trace due to an invalid page pointer in the Linux bio structure:
crash_kexec()
die()
do_page_fault()
error_exit()
blk_recount_segments()
bio_phys_segments()
init_request_from_bio()
make_request()
generic_make_request()
gendmpstrategy()
generic_make_request()
dmp_indirect_io()
dmpioctl()
dmp_ioctl()
dmp_compat_ioctl()

DESCRIPTION:
A falsified page pointer is returned when Dynamic Multi-Pathing (DMP) allocates memory by calling the Linux vmalloc() function and maps the allocated virtual address to the physical page in the backtrace.
The issue is observed because DMP does not call the appropriate Linux kernel API, which leads to a system panic.

RESOLUTION:
The code is modified to call the correct Linux kernel API when DMP maps the virtual address allocated by the vmalloc() function to the physical page.

* 3865415 (Tracking ID: 3736502)

SYMPTOM:
When FMR is configured in VVR environment, 'vxsnap refresh' fails with below
error message:
"VxVM VVR vxsnap ERROR V-5-1-10128 DCO experienced IO errors during the
operation. Re-run the operation after ensuring that DCO is accessible".
Also, multiple messages of connection/disconnection of replication
link(rlink) are seen.

DESCRIPTION:
Inherently triggered rlink connections and disconnections cause transaction
retries. During a transaction, memory is allocated for Data Change Object (DCO)
maps and is not cleared when the transaction aborts.
This leads to a memory leak and eventually to exhaustion of maps.

RESOLUTION:
The fix has been added to clear the allocated DCO maps when transaction
aborts.
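
The leak-and-exhaust pattern described above, and the cleanup on abort, can
be sketched as follows. This is only an illustration under stated
assumptions: MapPool and run_transaction are hypothetical names, and the
pool size is arbitrary. It is not the VxVM DCO implementation.

```python
class MapPool:
    """Hypothetical fixed-size pool standing in for the DCO map area."""

    def __init__(self, size):
        self.free = size

    def allocate(self):
        if self.free == 0:
            raise RuntimeError("no free map available")
        self.free -= 1

    def release(self, count):
        self.free += count


def run_transaction(pool, maps_needed, commit):
    """Allocate maps, then give them back if the transaction aborts."""
    taken = 0
    try:
        for _ in range(maps_needed):
            pool.allocate()
            taken += 1
        if not commit:
            raise RuntimeError("transaction aborted")
        return True
    except RuntimeError:
        pool.release(taken)  # the cleanup step the fix describes
        return False


pool = MapPool(4)
for _ in range(100):  # many aborted retries, as with a flapping rlink
    run_transaction(pool, 2, commit=False)
```

Without the release() call in the abort path, the third retry would already
find the pool empty.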

* 3865631 (Tracking ID: 3564260)

SYMPTOM:
VVR commands are unresponsive when replication is paused and resumed in a loop.

DESCRIPTION:
While Veritas Volume Replicator (VVR) is sending updates, a request to pause replication is deferred until acknowledgements of the updates are received or until an error occurs. If the acknowledgements are delayed or the delivery fails, the pause operation continues to be deferred, which results in unresponsiveness.

RESOLUTION:
The code is modified to resolve the issue that caused unresponsiveness.

* 3865633 (Tracking ID: 3721565)

SYMPTOM:
A vxconfigd hang is seen with the following stack trace:
genunix:cv_wait_sig_swap_core
genunix:cv_wait_sig_swap
genunix:pause
unix:syscall_trap32

DESCRIPTION:
In an FMR environment, a write is done on a source volume that has a space-optimized
(SO) snapshot. Memory is acquired first, and then ILOCKs are acquired on the individual
SO volumes for pushed writes. On the other hand, a user write on an SO snapshot first
acquires the ILOCK and then acquires memory. This causes a deadlock.

RESOLUTION:
The code is modified to resolve the deadlock.
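
The description above is a lock-order inversion between two resources. The
notes do not say how the code was changed. One common remedy is to make both
paths acquire the resources in the same order, sketched below with two
ordinary locks standing in for the memory reservation and the ILOCK (both
stand-ins are assumptions).

```python
import threading

memory = threading.Lock()   # stands in for the memory reservation
ilock = threading.Lock()    # stands in for the ILOCK on an SO volume
done = []


def pushed_write():
    # Source-volume write pushed to the snapshot: memory first, then ILOCK.
    with memory:
        with ilock:
            done.append("pushed")


def snapshot_write():
    # User write on the snapshot, reordered to take the locks in the same
    # order as pushed_write(). Taking ilock first here is the inverted
    # order that can deadlock against pushed_write().
    with memory:
        with ilock:
            done.append("user")


threads = [threading.Thread(target=f) for f in (pushed_write, snapshot_write) * 50]
for t in threads:
    t.start()
for t in threads:
    t.join()
```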

* 3865640 (Tracking ID: 3749557)

SYMPTOM:
System hangs and becomes unresponsive because of heavy memory consumption by
vxvm.

DESCRIPTION:
In the Dirty Region Logging (DRL) update code path, an erroneous condition
led to an infinite loop that keeps consuming memory. This consumes
large amounts of memory and makes the system unresponsive.

RESOLUTION:
The code is fixed to avoid the infinite loop, which prevents the hang that
was caused by high memory usage.

* 3865653 (Tracking ID: 3795739)

SYMPTOM:
In a split brain scenario, cluster formation takes a very long time.

DESCRIPTION:
In a split brain scenario, the surviving nodes in the cluster try to preempt the keys of the nodes leaving the cluster. If the keys have already been preempted by one of the surviving nodes, the other surviving nodes receive a Unit Attention. DMP (Dynamic Multi-Pathing) then retries the preempt command after a delay of 1 second if it receives a Unit Attention. Cluster formation cannot complete until the PGR keys of all the leaving nodes are removed from all the disks. If the number of disks is very large, the preemption of keys takes a lot of time, which leads to a very long cluster formation time.

RESOLUTION:
The code is modified to avoid adding a delay for the first couple of retries when reading PGR keys. This allows faster cluster formation with arrays that clear the Unit Attention condition sooner.
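
A rough sketch of why skipping the delay on early retries matters when
there are many disks. The function names, the default of two immediate
retries (one reading of "couple"), and the disk and retry counts are all
assumptions made for illustration.

```python
def retry_delays(max_retries, immediate_retries=2, delay_seconds=1.0):
    """Return the wait before each retry: none for the first few, then a
    fixed delay. immediate_retries=2 is an assumed reading of 'couple'."""
    return [0.0 if attempt < immediate_retries else delay_seconds
            for attempt in range(max_retries)]


def total_wait(disks, retries_per_disk, **kwargs):
    """Total time spent waiting across all disks, handled one after another."""
    return disks * sum(retry_delays(retries_per_disk, **kwargs))


old = total_wait(1000, 2, immediate_retries=0)  # delay on every retry
new = total_wait(1000, 2)                       # first two retries immediate
```

With these hypothetical numbers the waiting time drops from 2000 seconds to
zero, provided the array clears the Unit Attention within the first two
retries.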

* 3866241 (Tracking ID: 3790136)

SYMPTOM:
A file system hang can sometimes be observed due to I/Os hung in DRL.

DESCRIPTION:
Some I/Os might hang in the DRL of a mirrored volume due to incorrect
calculation of the outstanding I/Os on the volume and the number of active I/Os
that are currently in progress on the DRL. The value of the outstanding I/Os on
the volume can get modified incorrectly, which prevents I/Os on the DRL from
progressing and in turn results in a hang.

RESOLUTION:
Code changes have been made to avoid incorrect modification of the value of
outstanding I/Os on the volume and to prevent the hang.

* 3866595 (Tracking ID: 3866051)

SYMPTOM:
After a driver with a name longer than 32 bytes is loaded in the kernel,
vxconfigd cannot be restarted.

DESCRIPTION:
If the kernel has any driver with a name longer than 32 bytes and vxconfigd is
restarted, the process stack is corrupted due to a defect in the code that
handles the size of the driver name. Hence, vxconfigd is unable to start up.

RESOLUTION:
Code changes are made to fix the memory corruption issue.

* 3866643 (Tracking ID: 3539548)

SYMPTOM:
Adding an MPIO (Multi Path I/O) disk that had been removed earlier may result in
the following two issues:
1. 'vxdisk list' command shows duplicate entry for DMP (Dynamic Multi-Pathing)
node with error state.
2. 'vxdmpadm listctlr all' command shows duplicate controller names.

DESCRIPTION:
1. Under certain circumstances, deleted MPIO disk record information is left in
/etc/vx/disk.info file with its device number as -1 but its DMP node name is
reassigned to other MPIO disk. When the deleted disk is added back, it is
assigned the same name, without validating for conflict in the name.
2. When some devices are removed and added back to the system, we are adding a
new controller for each and every path that we have discovered. This leads to
duplicated controller entries in DMP database.

RESOLUTION:
1. Code is modified to properly remove all stale information about any disk
before updating MPIO disk names.
2. Code changes have been made to add the controller for selected paths only.

* 3866651 (Tracking ID: 3596282)

SYMPTOM:
FMR (Fast Mirror Resync) operations fail with error "Failed to allocate a new
map due to no free map available in DCO".

"vxio: [ID 609550 kern.warning] WARNING: VxVM vxio V-5-3-1721
voldco_allocate_toc_entry: Failed to allocate a new map due to no free map
available in DCO of [volume]"

It often leads to disabling of the snapshot.

DESCRIPTION:
For instant space-optimized snapshots, stale maps are left behind for DCO (Data
Change Object) objects at the time of creation of cache objects. So, over
time, if space-optimized snapshots are created that use a new cache object,
stale maps accumulate and eventually consume all the available DCO
space, resulting in the error.

RESOLUTION:
Code changes have been done to ensure no stale entries are left behind.

* 3866675 (Tracking ID: 3840359)

SYMPTOM:
When localized messages are used, some VxVM commands fail while executing vxrootadm. The error message is as follows:
VxVM vxmkrootmir ERROR V-5-2-3943 The Master Boot Record (MBR) could not be copied to the root disk mirror.To manually install it, follow the procedures in the VxVM Boot Disk Recovery chapter of the VxVM Trouble Shooting Guide.

DESCRIPTION:
The issue occurs when the output of the sfdisk command appears in the localized format. When the output is not in English, it does not match the expected messages and the command fails.

RESOLUTION:
The code is modified to convert the output of necessary commands in the scripts into English language before comparing it with the expected output.
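
One common way to get non-localized output from a command before comparing
it with expected text is to run it with LC_ALL=C. The notes do not state
which mechanism the scripts use, so the sketch below is an assumption.
run_in_c_locale is a hypothetical helper, and a small child Python process
stands in for a real tool such as sfdisk.

```python
import os
import subprocess
import sys


def run_in_c_locale(argv):
    """Run a command with LC_ALL=C and LANG=C so its messages are not
    localized, and return its standard output as text."""
    env = dict(os.environ, LC_ALL="C", LANG="C")
    result = subprocess.run(argv, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout


# Stand-in for a real tool: a child process that reports the locale it sees.
child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
seen = run_in_c_locale(child).strip()
```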

* 3867134 (Tracking ID: 3486861)

SYMPTOM:
The primary node panics with the stack below when storage is removed while
replication is in progress with heavy I/O.
Stack:
oops_end
no_context
page_fault
vol_rv_async_done
vol_rv_flush_loghdr_done
voliod_iohandle
voliod_loop

DESCRIPTION:
In a VVR environment, when a write to a data volume fails on the primary node,
error handling is initiated. As a part of it, the SRL header is flushed. Because
the primary storage is removed, the flush fails. The panic occurs because
invalid values are accessed while logging the error message.

RESOLUTION:
Code is modified to resolve the issue.

* 3867137 (Tracking ID: 3674614)

SYMPTOM:
Restarting the vxconfigd(1M) daemon on the slave (joiner) node during node-join
operation  may cause the vxconfigd(1M) daemon to become unresponsive on the
master and the joiner node.

DESCRIPTION:
When vxconfigd(1M) daemon is restarted in the middle of the node join process,
it wrongly assumes that this is a slave rejoin case, and sets the rejoin flag.
Since the rejoin flag is wrongly set, the import operation of the disk groups on
the slave node fails, and the join process is not terminated smoothly. As a
result, the vxconfigd(1M) daemon becomes unresponsive on the master and the
slave node.

RESOLUTION:
The code is modified to differentiate between the rejoin scenario and the
vxconfigd(1M) daemon restart scenario.

* 3867315 (Tracking ID: 3672759)

SYMPTOM:
When a DMP database is corrupted, the vxconfigd(1M) daemon may core dump with the following stack trace:
database is corrupted.
  ddl_change_dmpnode_state ()
  ddl_data_corruption_msgs ()
  ddl_reconfigure_all ()
  ddl_find_devices_in_system ()
  find_devices_in_system ()
  req_change_state ()
  request_loop ()
  main ()

DESCRIPTION:
The issue is observed because the corrupted DMP database is not properly destroyed.

RESOLUTION:
The code is modified to remove the corrupted DMP database.

* 3867706 (Tracking ID: 3645370)

SYMPTOM:
After running the vxevac command, if the user tries to rollback or commit the evacuation for a disk containing DRL plex, the action fails with the following errors:

/etc/vx/bin/vxevac -g testdg  commit testdg02 testdg03
VxVM vxsd ERROR V-5-1-10127 deleting plex %1:
        Record is associated
VxVM vxassist ERROR V-5-1-324 fsgen/vxsd killed by signal 11, core dumped
VxVM vxassist ERROR V-5-1-12178 Could not commit subdisk testdg02-01 in
volume testvol
VxVM vxevac ERROR V-5-2-3537 Aborting disk evacuation

/etc/vx/bin/vxevac -g testdg rollback testdg02 testdg03
VxVM vxsd ERROR V-5-1-10127 deleting plex %1:
        Record is associated
VxVM vxassist ERROR V-5-1-324 fsgen/vxsd killed by signal 11, core dumped
VxVM vxassist ERROR V-5-1-12178 Could not rollback subdisk testdg02-01 in
volume
testvol
VxVM vxevac ERROR V-5-2-3537 Aborting disk evacuation

DESCRIPTION:
When the user uses the vxevac command, new plexes are created on the target disks. Later, during the commit or rollback operation, VxVM deletes the plexes on the source or the target disks.
For deleting a plex, VxVM should delete its subdisks first, otherwise the plex deletion fails with the following error message:
VxVM vxsd ERROR V-5-1-10127 deleting plex %1:
        Record is associated
The error is displayed because the code does not handle the deletion of subdisks of plexes marked for DRL (dirty region logging) correctly.

RESOLUTION:
The code is modified to handle evacuation of disks with DRL plexes correctly.

* 3867709 (Tracking ID: 3767531)

SYMPTOM:
In a layered volume layout with FSS configuration, when a few of the
FSS hosts are rebooted, a full resync happens for non-affected disks on the
master.

DESCRIPTION:
In a configuration with multiple FSS hosts and a layered volume created on
the hosts, when the slave nodes are rebooted, a few of the sub-volumes of
non-affected disks are fully synced on the master.

RESOLUTION:
Code changes have been made to sync only the needed part of the
sub-volume.

* 3867710 (Tracking ID: 3788644)

SYMPTOM:
When DMP (Dynamic Multi-Pathing) native support is enabled for an Oracle ASM
environment, constantly adding and removing DMP devices causes an error
like:
/etc/vx/bin/vxdmpraw enable oracle dba 775 emc0_3f84
VxVM vxdmpraw INFO V-5-2-6157
Device enabled : emc0_3f84
Error setting raw device (Invalid argument)

DESCRIPTION:
There is a limit (8192) on the maximum raw device number N (exclusive) of
/dev/raw/rawN. This limit is defined in the boot configuration file. When a raw
device is bound to a dmpnode, /dev/raw/rawN is used for the binding. The rawN
number is calculated by a one-way incremental process, so even if the device is
unbound later, the "released" rawN number is not reused in the next binding.
When the rawN number grows beyond the maximum limit, the error is reported.

RESOLUTION:
The code has been changed to always use the smallest available rawN number instead
of calculating it by a one-way incremental process.
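
The "smallest available number" policy can be sketched as below. The limit
of 8192 (exclusive) comes from the description above. The function name and
the choice to start numbering at 1 are assumptions.

```python
def next_raw_number(in_use, limit=8192, first=1):
    """Return the smallest raw device number N in [first, limit) that is not
    in use, or None when every number is taken. first=1 is an assumption."""
    used = set(in_use)
    for n in range(first, limit):
        if n not in used:
            return n
    return None


bound = {1, 2, 3, 4}
bound.discard(2)                 # unbind /dev/raw/raw2
reused = next_raw_number(bound)  # the released number is handed out again
```

A one-way counter would have returned 5 here and, after enough bind and
unbind cycles, would run past the limit even though most numbers are free.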

* 3867711 (Tracking ID: 3802750)

SYMPTOM:
Once VxVM (Veritas Volume Manager) volume I/O-shipping functionality is turned on, it is not getting disabled even after the user issues the correct command to disable it.

DESCRIPTION:
VxVM (Veritas Volume Manager) volume I/O-shipping functionality is turned off by default. The following two commands can be used to turn it on and off:
    vxdg -g <dgname> set ioship=on
    vxdg -g <dgname> set ioship=off

The command to turn off I/O-shipping is not working as intended because I/O-shipping flags are not reset properly.

RESOLUTION:
The code is modified to correctly reset I/O-shipping flags when the user issues the CLI command.

* 3867712 (Tracking ID: 3807879)

SYMPTOM:
Writing the backup EFI GPT disk label during the disk-group flush
operation may cause data corruption on volumes in the disk group. The backup
label could incorrectly get flushed to the disk public region and overwrite the
user data with the backup disk label.

DESCRIPTION:
For EFI disks initialized under VxVM (Veritas Volume Manager), it
is observed that during a disk-group flush operation, vxconfigd (Veritas
configuration daemon) could end up writing the EFI GPT backup label to the volume
public region, thereby causing user data corruption. When this issue happens,
the real user data are replaced with the backup EFI disk label.

RESOLUTION:
The code is modified to prevent the writing of the EFI GPT backup
label during the VxVM disk-group flush operation.

* 3867714 (Tracking ID: 3819832)

SYMPTOM:
No syslog message is seen when DMP detects that a controller is disabled or enabled.

DESCRIPTION:
Whenever an action is taken on the storage side, whether addition or
removal, DMP detects the failure events or addition notifications, and as
part of discovery it disables or enables the controller. The message for this
was not logged in the syslog file.

RESOLUTION:
Code changes have been made to show the syslog message for the controller
disable/enable event.

* 3867881 (Tracking ID: 3867145)

SYMPTOM:
When the VVR SRL occupation is above 90%, the SRL occupation is reported only
in steps of 10 percent.

DESCRIPTION:
This is an enhancement. Previously, SRL occupation above 90% was shown with
a gap of 10 percent.
The enhancement is to show the logs with a granularity of 1 percent.

RESOLUTION:
Changes have been made to show the syslog messages with 1 percent granularity when
the SRL is more than 90% full.
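
The change in reporting granularity can be illustrated as follows. This is
a sketch only: should_log is a hypothetical function, and the 10-percent
step at or below the threshold is an assumption about behavior that the
notes do not describe.

```python
def should_log(previous_pct, current_pct, threshold=90):
    """Decide whether a new SRL-occupation message is due.

    Above the threshold a message is due on every whole-percent change;
    at or below it, only when a 10-percent boundary is crossed (the coarse
    step below the threshold is an assumption, not stated in the notes).
    """
    step = 1 if current_pct > threshold else 10
    return current_pct // step != previous_pct // step


samples = [85, 89, 90, 91, 92, 92, 95]
logged = [cur for prev, cur in zip(samples, samples[1:]) if should_log(prev, cur)]
```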

* 3867925 (Tracking ID: 3802075)

SYMPTOM:
Disks that have a digit in their name and are added as a foreign path using
"vxddladm addforeign" go into the ERROR state after running vxdisk scandisks.

DESCRIPTION:
When a disk is added as a foreign device using 'vxddladm addforeign' and device
discovery is then performed using vxdisk scandisks, VxVM uses the whole-disk
name, which is not the exact name of the disk when digits are added to the disk
name by a udev rule. The actual name of the disk is now used instead of the
whole-disk name.

RESOLUTION:
The code is modified to use the exact disk-device name, so that the
foreign disk is added successfully.

* 3869659 (Tracking ID: 3868444)

SYMPTOM:
Disk header timestamp is updated even if the disk group import fails.

DESCRIPTION:
During a dg import operation, disk header timestamps are updated in the join operation. If the dg import fails, this makes it difficult for
support to determine which disk has the latest config copy and to decide whether a
forced dg import is safe.

RESOLUTION:
The old disk header timestamp and sequence number are now dumped to the syslog, which can be consulted when deciding whether a forced dg
import is safe.

* 3874736 (Tracking ID: 3874387)

SYMPTOM:
Disk header information is not logged to the syslog sometimes
even if the disk is missing and dg import fails.

DESCRIPTION:
In scenarios where the disk has a config copy enabled
and has an active disk record, the disk header information was not
logged even though the disk was missing and the dg import failed.

RESOLUTION:
The disk header information is now dumped even if the disk record is
active and attached to the disk group.

* 3874961 (Tracking ID: 3871750)

SYMPTOM:
VxVM (Veritas Volume Manager) vxstat commands that run in parallel report
abnormal disk I/O statistics, as shown below:
# /usr/sbin/vxstat -g <dg name> -u k -dv -i 1 -S
......
dm  emc0_2480                       4294967210 4294962421           -382676k
4294967.38 4294972.17
......

DESCRIPTION:
After VxVM I/O statistics were optimized for large numbers of CPUs and disks,
there is a race condition when multiple vxstat commands run to collect disk I/O
statistics. It causes a disk's latest I/O statistic value to become smaller
than the previous one. VxVM then treats the value as an overflow, so an
abnormally large I/O statistic value is printed.

RESOLUTION:
Code changes have been made to eliminate the race condition.
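
The abnormal values in the sample output sit just below 2^32, which is what
an unsigned 32-bit subtraction produces when the latest sample is smaller
than the previous one. The sketch below reproduces the first value shown.
The 32-bit counter width and the sample values are assumptions inferred from
that output, not taken from the VxVM source.

```python
MASK32 = 0xFFFFFFFF  # assumes a 32-bit unsigned counter


def delta_u32(previous, latest):
    """Difference between two samples, treated as unsigned 32-bit."""
    return (latest - previous) & MASK32


# Normal case: the counter moved forward by 86 operations.
forward = delta_u32(100, 186)
# Racy case: the "latest" sample is 86 behind the previous one, so the
# subtraction wraps around instead of going negative.
wrapped = delta_u32(186, 100)
```

Here wrapped evaluates to 4294967210, the same figure as the first column
of the emc0_2480 line in the symptom.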

* 3875113 (Tracking ID: 3816219)

SYMPTOM:
VxDMP  (Veritas Dynamic Multi-Pathing) event source daemon (vxesd) keeps
reporting a lot of messages in syslog as below:
"vxesd: Device sd*(*/*) is changed"

DESCRIPTION:
The vxesd daemon registers with the UDEV framework and keeps VxDMP up-to-date
with the devices' status. Due to some change at the device, vxesd keeps reporting
this kind of change event received from udev. VxDMP only cares about "add" and
"remove" UDEV events, so logging can be avoided for UDEV "change" events.

RESOLUTION:
The code is modified to stop logging UDEV change-event related messages in
syslog.

* 3875161 (Tracking ID: 3860184)

SYMPTOM:
vxdmpadm settune dmp_native_support=off succeeds even when I/O's are in progress
on LV (Logical Volume).

DESCRIPTION:
While turning off the DMP native support, no precaution is taken to log an error
message if DMP native support could not be successfully disabled. As a result,
even when the I/Os are in progress DMP Native support is turned off without any
error.

RESOLUTION:
Code changes have been made to display an error message if DMP native support
could not be disabled. With this fix, the following error messages can be seen:
# vxdmpadm settune dmp_native_support=off
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more volume groups
VxVM vxdmpadm ERROR V-5-1-15686 The following vgs could not be migrated as they
are in use -
VxVM vxdmpadm ERROR V-5-1-15686 The following vgs could not be migrated due to
unknown error -

* 3875230 (Tracking ID: 3554608)

SYMPTOM:
Mirroring a volume on 6.1 creates a larger plex than the original.

DESCRIPTION:
When mirroring a volume, VxVM should ignore the disk alignment and use the DG
(Disk Group) alignment. After VxVM received the configuration from the user
command, it did not check whether the disk alignment would be used. Because the
disk alignment is not ignored, the length of the mirror is based on the disk
alignment. Hence the issue.

RESOLUTION:
Code changes have been made to use the DG alignment instead of the disk alignment
when mirroring a volume.
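
A small worked example of how rounding a length up to a coarser alignment
yields a plex larger than the original volume. All numbers are hypothetical
and in sectors. Neither alignment value comes from these notes.

```python
def round_up(length, alignment):
    """Round length up to the next multiple of alignment."""
    return -(-length // alignment) * alignment


# Hypothetical values in sectors; neither alignment comes from the notes.
volume_len = 2048010
dg_alignment = 1
disk_alignment = 16

plex_with_dg_align = round_up(volume_len, dg_alignment)
plex_with_disk_align = round_up(volume_len, disk_alignment)
```

With these values the DG alignment keeps the plex at 2048010 sectors, while
the disk alignment grows it to 2048016.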

* 3875564 (Tracking ID: 3875563)

SYMPTOM:
While dumping the disk header information, the human-readable timestamp was
not converted correctly from the corresponding epoch time.

DESCRIPTION:
When a disk group import fails because one of the disks is missing, the
disk header information is dumped to the syslog.
However, the human-readable timestamp was not converted correctly from the
corresponding epoch time.

RESOLUTION:
Code changes have been made to dump the disk header information correctly.
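
For reference, converting an epoch value to a human-readable timestamp can
be checked independently with a few lines of Python. This is a generic
sketch: the function name and the output format are assumptions and do not
reflect the format that VxVM writes to the syslog.

```python
from datetime import datetime, timezone


def epoch_to_text(epoch_seconds):
    """Render seconds since the Unix epoch as a UTC timestamp string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


stamp = epoch_to_text(1533081600)
```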

* 3632970 (Tracking ID: 3631230)

SYMPTOM:
VRTSvxvm patch version 6.0.5 and 6.1.1 will not work with RHEL6.6 update.

# rpm -ivh VRTSvxvm-6.1.1.000-GA_RHEL6.x86_64.rpm
Preparing...                ########################################### [100%]
   1:VRTSvxvm               ########################################### [100%]
Installing file /etc/init.d/vxvm-boot
creating VxVM device nodes under /dev
WARNING: No modules found for 2.6.32-494.el6.x86_64, using compatible modules
for
2.6.32-71.el6.x86_64.
FATAL: Error inserting vxio (/lib/modules/2.6.32-
494.el6.x86_64/veritas/vxvm/vxio.ko): Unknown symbol in module, or unknown
parameter
(see dmesg) ERROR: modprobe error for vxio. See documentation.
warning: %post(VRTSvxvm-6.1.1.000-GA_RHEL6.x86_64) scriptlet failed, exit
status 1

#

Or after OS update, the system log file will have the following messages logged.

vxio: disagrees about version of symbol poll_freewait
vxio: Unknown symbol poll_freewait
vxio: disagrees about version of symbol poll_initwait
vxio: Unknown symbol poll_initwait

DESCRIPTION:
Installation of VRTSvxvm patch version 6.0.5 and 6.1.1 fails on RHEL6.6 due to
the changes in poll_initwait() and poll_freewait() interfaces.

RESOLUTION:
The VxVM package has been recompiled with the RHEL6.6 build environment.

* 3372831 (Tracking ID: 2573229)

SYMPTOM:
On RHEL6, the server panics when Dynamic Multi-Pathing (DMP) executes the
PERSISTENT RESERVE IN command with the REPORT CAPABILITIES service action on a
PowerPath-controlled device. The following stack trace is displayed:

enqueue_entity at ffffffff81068f09
enqueue_task_fair at ffffffff81069384
enqueue_task at ffffffff81059216
activate_task at ffffffff81059253
pull_task at ffffffff81065401
load_balance_fair at ffffffff810657b7
thread_return at ffffffff81527d30
schedule_timeout at ffffffff815287b5
wait_for_common at ffffffff81528433
wait_for_completion at ffffffff8152854d
blk_execute_rq at ffffffff8126d9dc
emcp_scsi_cmd_ioctl at ffffffffa04920a2 [emcp]
PowerPlatformBottomDispatch at ffffffffa0492eb8 [emcp]
PowerSyncIoBottomDispatch at ffffffffa04930b8 [emcp]
PowerBottomDispatchPirp at ffffffffa049348c [emcp]
PowerDispatchX at ffffffffa049390d [emcp]
MpxSendScsiCmd at ffffffffa061853e [emcpmpx]
ClariionKLam_groupReserveRelease at ffffffffa061e495 [emcpmpx]
MpxDefaultRegister at ffffffffa061df0a [emcpmpx]
MpxTestPath at ffffffffa06227b5 [emcpmpx]
MpxExtraTry at ffffffffa06234ab [emcpmpx]
MpxTestDaemonCalloutGuts at ffffffffa062402f [emcpmpx]
MpxIodone at ffffffffa0624621 [emcpmpx]
MpxDispatchGuts at ffffffffa0625534 [emcpmpx]
MpxDispatch at ffffffffa06256a8 [emcpmpx]
PowerDispatchX at ffffffffa0493921 [emcp]
GpxDispatch at ffffffffa0644775 [emcpgpx]
PowerDispatchX at ffffffffa0493921 [emcp]
GpxDispatchDown at ffffffffa06447ae [emcpgpx]
VluDispatch at ffffffffa068b025 [emcpvlumd]
GpxDispatch at ffffffffa0644752 [emcpgpx]
PowerDispatchX at ffffffffa0493921 [emcp]
GpxDispatchDown at ffffffffa06447ae [emcpgpx]
XcryptDispatchGuts at ffffffffa0660b45 [emcpxcrypt]
XcryptDispatch at ffffffffa0660c09 [emcpxcrypt]
GpxDispatch at ffffffffa0644752 [emcpgpx]
PowerDispatchX at ffffffffa0493921 [emcp]
GpxDispatch at ffffffffa0644775 [emcpgpx]
PowerDispatchX at ffffffffa0493921 [emcp]
PowerSyncIoTopDispatch at ffffffffa04978b9 [emcp]
emcp_send_pirp at ffffffffa04979b9 [emcp]
emcp_pseudo_blk_ioctl at ffffffffa04982dc [emcp]
__blkdev_driver_ioctl at ffffffff8126f627
blkdev_ioctl at ffffffff8126faad
block_ioctl at ffffffff811c46cc
dmp_ioctl_by_bdev at ffffffffa074767b [vxdmp]
dmp_kernel_scsi_ioctl at ffffffffa0747982 [vxdmp]
dmp_scsi_ioctl at ffffffffa0786d42 [vxdmp]
dmp_send_scsireq at ffffffffa078770f [vxdmp]
dmp_do_scsi_gen at ffffffffa077d46b [vxdmp]
dmp_pr_check_aptpl at ffffffffa07834dd [vxdmp]
dmp_make_mp_node at ffffffffa0782c89 [vxdmp]
dmp_decode_add_disk at ffffffffa075164e [vxdmp]
dmp_decipher_instructions at ffffffffa07521c7 [vxdmp]
dmp_process_instruction_buffer at ffffffffa075244e [vxdmp]
dmp_reconfigure_db at ffffffffa076f40e [vxdmp]
gendmpioctl at ffffffffa0752a12 [vxdmp]
dmpioctl at ffffffffa0754615 [vxdmp]
dmp_ioctl at ffffffffa07784eb [vxdmp]
dmp_compat_ioctl at ffffffffa0778566 [vxdmp]
compat_blkdev_ioctl at ffffffff8128031d
compat_sys_ioctl at ffffffff811e0bfd
sysenter_dispatch at ffffffff81050c20

DESCRIPTION:
Dynamic Multi-Pathing (DMP) uses the PERSISTENT RESERVE IN command with the REPORT
CAPABILITIES service action to discover target capabilities. On RHEL6, the system
panics unexpectedly when DMP executes the PERSISTENT RESERVE IN command with the
REPORT CAPABILITIES service action on a PowerPath-controlled device coming from
an EMC CLARiiON/VNX array. This bug has been reported to EMC PowerPath
engineering.

RESOLUTION:
The Dynamic Multi-Pathing (DMP) code is modified to execute PERSISTENT RESERVE
IN command with the REPORT CAPABILITIES service action to discover target
capabilities only on non-third party controlled devices.

* 3440232 (Tracking ID: 3408320)

SYMPTOM:
Thin reclamation fails for EMC 5875 arrays with the following message:
# vxdisk reclaim <disk>
Reclaiming thin storage on:
Disk <disk> : Reclaim Partially Done. Device Busy.

DESCRIPTION:
As a result of recent changes in EMC Microcode 5875, Thin Reclamation for EMC
5875 arrays fails because reclaim request length exceeds the maximum
"write_same" length supported by the array.

RESOLUTION:
The code has been modified to correctly set the maximum "write_same" length of
the array.
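
The failure above arises when a single reclaim request is longer than the
array accepts. A sketch of keeping every request within a maximum length is
shown below. split_reclaim is a hypothetical helper and the block counts
are made-up numbers, not values for EMC 5875 arrays.

```python
def split_reclaim(offset, length, max_write_same):
    """Split one reclaim range into requests no longer than the array's
    maximum WRITE SAME length (all values in blocks)."""
    if max_write_same <= 0:
        raise ValueError("max_write_same must be positive")
    requests = []
    end = offset + length
    while offset < end:
        chunk = min(max_write_same, end - offset)
        requests.append((offset, chunk))
        offset += chunk
    return requests


# Hypothetical numbers: a 10000-block range against a 4096-block limit.
parts = split_reclaim(0, 10000, 4096)
```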

* 3457363 (Tracking ID: 3462171)

SYMPTOM:
When SCSI-3 Persistent Reservation command ioctls are issued on
non-SCSI devices, dmpnode gets disabled with the following messages in the
system log:
[..]
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-3-0 dmp_scsi_ioctl:
devno=0x11700000002 ret=0x19
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-0-0 [Warn] SCSI error
opcode=0x5e returned rq_status=0x7 cdb_status=0x0 key=0x0 asc=0x0 ascq=0x0 on
path 279/0x2
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-3-1476 dmp_notify_events:
Total number of events = 1
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-3-0 dmp_pr_send_cmd failed
with transport error: uscsi_rqstatus = 7ret = -1 status = 0 on dev 279/0x2
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-0-112 [Warn] disabled path
279/0x0 belonging to the dmpnode 302/0x0 due to path failure
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-3-1476 dmp_notify_events:
Total number of events = 2
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-0-111 [Warn] disabled
dmpnode 302/0x0
[..]

DESCRIPTION:
Non-SCSI devices do not support SCSI persistent reservation commands. Thus,
SCSI-3 persistent reservation command ioctls on a non-SCSI device fail with
unsupported error codes. As a result, Dynamic Multi-Pathing (DMP) treats the
ioctl failure as a path error, which causes the dmpnode to get disabled.

RESOLUTION:
The DMP code is modified such that SCSI-3 persistent reservation
commands are not sent on non-SCSI devices.

* 3470254 (Tracking ID: 3370637)

SYMPTOM:
In VxVM, the SmartIO feature gets enabled for volumes with type as root/swap.

DESCRIPTION:
The SmartIO feature should be disabled for volumes of root/swap use type.

RESOLUTION:
The code is modified to disable the SmartIO feature for volumes with root/swap use type.

* 3470255 (Tracking ID: 2847520)

SYMPTOM:
Users can create linked volume using 'vxsnap addmir ... mirdg=<dg> mirvol=<vol>' CLI. The target volume then can be used to create snapshot using 'vxsnap make source=<src-vol> snap=<tgt-vol> snapdg=<tgt-dg>' CLI. When such linked volume is created in the clustered environment and the resize operation is executed on the volume, it can cause corruption of target volume. So, if such a target volume is used to create snapshot, it would be corrupted as well. If the linked volume had a VxFS filesystem created on it and the user tries to mount the snapshot created using such a corrupted target volume, it might fail with the following error message:

UX:vxfs mount: ERROR: V-3-26883: fsck log replay exits with 12
UX:vxfs mount: ERROR: V-3-26881: Cannot be mounted until it has been cleaned by
fsck. Please run "fsck -V vxfs -y /dev/vx/dsk/snapdg/snapvol" before mounting.

DESCRIPTION:
When a linked volume is resized, maps are updated to keep track of the regions that are inconsistent between the source volume and the target volume for the grown or shrunk region. Such an update of the map should happen from only one node in the cluster. Due to a defect, the map was updated concurrently from two different nodes, which made the maps inconsistent. When this map was used to synchronize the target volume, the data on the target volume was not synchronized correctly, which corrupted the target volume.

RESOLUTION:
The code is modified to make sure that the map is updated only from one node if the volume is shared.

* 3470260 (Tracking ID: 3415188)

SYMPTOM:
Filesystem/IO hangs during data replication with Symantec Replication Option (VVR) with the following stack trace:
schedule()
volsync_wait()
volopobjenter()
vol_object_ioctl()
voliod_ioctl()
volsioctl_real()
vols_ioctl()
vols_compat_ioctl()
compat_sys_ioctl()
sysenter_dispatch()

DESCRIPTION:
One of the Symantec Replication Option structures associated with the Storage Replicator Log (SRL) can become invalid because of an improper locking mechanism in the code, which leads to the I/O or file system hang.

RESOLUTION:
The code is changed to have appropriate locks to protect the code.

* 3470262 (Tracking ID: 3077582)

SYMPTOM:
A Veritas Volume Manager (VxVM) volume may become inaccessible causing the read/write operations to fail with the following error:
# dd if=/dev/vx/dsk/<dg>/<volume> of=/dev/null count=10
dd read error: No such device
0+0 records in
0+0 records out

DESCRIPTION:
If I/Os to the disks time out due to hardware failures such as a weak Storage Area Network (SAN) cable link or a Host Bus Adapter (HBA) failure, VxVM assumes that the disk is faulty or slow and sets the failio flag on the disk. Due to this flag, all the subsequent I/Os fail with the "No such device" error.

RESOLUTION:
The code is modified such that vxdisk now provides a way to clear the failio flag. To check whether the failio flag is set on the disks, use the vxkprint(1M) utility (under /etc/vx/diag.d). To reset the failio flag, execute the "vxdisk set <disk_name> failio=off" command, or deport and import the disk group that holds these disks.

* 3470265 (Tracking ID: 3326964)

SYMPTOM:
VxVM hangs in a CVM environment in the presence of Fast Mirror Resync (FMR)/FlashSnap operations with the following stack trace:
voldco_cvm_serialize()
voldco_serialize()
voldco_handle_dco_error()
voldco_mapor_sio_done()
voliod_iohandle()
voliod_loop()
child_rip()
voliod_loop()
child_rip()

DESCRIPTION:
During split brain testing in the presence of FMR activities, when errors occur on the Data Change Object (DCO), the DCO error handling code sets a flag that causes the same error to be set again in its handler. Consequently, the VxVM Staged I/O (SIO) loops around the same code and causes the hang.

RESOLUTION:
The code is changed to appropriately handle the scenario.

* 3470270 (Tracking ID: 3403390)

SYMPTOM:
The linked-to volume goes into NEEDSYNC state if the system crashes while I/Os are ongoing on the linked-from volume or the linked-from volume is open by any application.

DESCRIPTION:
In this case, when the system comes up, volume recovery is performed on the linked-from volume which also recovers the linked-to volumes associated to it. But still, the linked-to volumes were shown in NEEDSYNC state even though no recovery was required.

RESOLUTION:
The code is modified to prevent linked-to volume from going into NEEDSYNC
state in the above mentioned scenarios and hence no recovery is required for the linked-to volume.

* 3470272 (Tracking ID: 3385753)

SYMPTOM:
Replication to the Disaster Recovery (DR) site hangs even though Replication
links (Rlinks) are in the connected state.

DESCRIPTION:
Depending on network conditions under User Datagram Protocol (UDP), Symantec
Replication Option (Veritas Volume Replicator (VVR)) has its own
flow control mechanism to control the amount of data sent over the
network. Under error-prone network conditions that cause
timeouts, VVR's flow control values become invalid, resulting in a replication
hang.

RESOLUTION:
The code is modified to ensure valid values for the flow control even under
error prone network conditions.

* 3470273 (Tracking ID: 3416404)

SYMPTOM:
The vxdisksetup(1M) command shows the following warning message:
"Warning: The disk CHS geometry (261, 255, 63) reported by the operating system does not match the geometry stored on the disk label (1024, 128, 32)."

DESCRIPTION:
The warning message is displayed due to changes in the parted tool, which is internally used in the vxdisksetup(1M) command. In the new version of parted, error messages are re-directed to stderr rather than stdout.

RESOLUTION:
The code is modified to redirect the errors to stderr, not stdout.
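
The stream distinction described above can be illustrated with a minimal Python sketch. It is not the vxdisksetup code: the child process below is a hypothetical stand-in for a tool that, like newer parted versions, writes warnings to stderr, and the names CHILD and run_tool are assumptions made for illustration. A caller that handles only stdout would not see (or suppress) the warning.

```python
import subprocess
import sys

# Hypothetical stand-in for a tool that prints its result on stdout
# and its warnings on stderr.
CHILD = (
    "import sys;"
    "print('partition table ok');"
    "print('Warning: geometry mismatch', file=sys.stderr)"
)

def run_tool():
    """Run the stand-in tool and return (stdout, stderr) as text."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,   # capture normal output
        stderr=subprocess.PIPE,   # capture warnings separately
        universal_newlines=True,
    )
    return proc.stdout, proc.stderr

out, err = run_tool()
```

Here the warning text ends up only in err, so a wrapper has to handle that stream explicitly if it wants to filter the message.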

* 3470274 (Tracking ID: 3373208)

SYMPTOM:
Veritas Dynamic Multipathing (DMP) wrongly sends the SCSI PR OUT command with the Activate Persist Through Power Loss (APTPL) bit set to "0" to arrays that support the APTPL capabilities.

DESCRIPTION:
DMP correctly recognizes the APTPL bit settings and stores them in the database. DMP verifies this information before sending the SCSI PR OUT command so that the APTPL bit can be set appropriately in the command. However, due to a defect in the code, DMP did not handle the node's device number properly. As a result, the APTPL bit was set incorrectly in the SCSI PR OUT command.

RESOLUTION:
The code is modified to handle the node's device number properly in the DMP SCSI command code path.

* 3470275 (Tracking ID: 3417044)

SYMPTOM:
The system becomes unresponsive while creating Veritas Volume Replication (VVR)
TCP connection. The vxiod kernel thread reports the following stack trace:

mt_pause_trigger()
wait_for_lock()
spinlock_usav()
kfree()
t_kfree()
kmsg_sys_free()
nmcom_connect()
vol_rp_connect()
vol_rp_connect_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION:
When multiple TCP connections are configured, some of these connections are
still in the active state and the connection request process function attempts
to free a memory block. If this block is already freed by a previous connection,
then the kernel thread may become unresponsive on an HP-UX platform.

RESOLUTION:
The code is modified to resolve the issue of freeing a memory block which is
already freed by another connection.

* 3470276 (Tracking ID: 3287940)

SYMPTOM:
Logical unit numbers (LUNs) from the EMC CLARiiON array that are in the NR (Not Ready) state are shown in the online invalid state by Veritas Volume Manager (VxVM).

DESCRIPTION:
Logical unit numbers (LUNs) from the EMC CLARiiON array that are in the NR state are shown in the online invalid state by Veritas Volume Manager (VxVM). The EMC CLARiiON array has no mechanism to communicate the NR state of a LUN, so VxVM cannot recognize it. However, read operations on these LUNs fail. Due to a defect in the disk online operation, the read failure is ignored and the disk online operation succeeds. Thus, these LUNs are shown as online invalid.

RESOLUTION:
Changes have been made to recognize and propagate the disk read operation failure during the online operation. So the EMC CLARiiON disks with NR state are shown in error state.

* 3470279 (Tracking ID: 3300418)

SYMPTOM:
VxVM volume operations on shared volumes cause unnecessary read I/Os on the
disks that have both configuration copy and log copy disabled on slaves.

DESCRIPTION:
The unnecessary disk read I/Os are generated on slaves when VxVM is refreshing
the private region information into memory during VxVM transaction. In fact,
there is no need to refresh the private region information if the disk already
has disabled the configuration copy and log copy.

RESOLUTION:
The code has been changed to skip the refreshing if both configuration copy and
log copy are already disabled on master and slaves.

* 3470282 (Tracking ID: 3374200)

SYMPTOM:
A Linux-based system panic or exceptional IO delay is observed on the volume when a snapshot operation is executed. In case of a panic, the following stack trace is reported:
spin_lock_irqsave()
volpage_getlist_internal()
volpage_getlist()
voldco_needupdate_instant()
volfmr_needupdate_instant()

DESCRIPTION:
Volume Manager uses a system-wide pool of memory to manage I/O and snapshot operations (the paging module). The default size is 6 MB on Linux, which suffices for a 1 TB volume. For snapshot operations, the default size is increased dynamically without considering the I/O performed by other volumes. Thus, during a snapshot operation, contention on the paging module occurs, leading to delays in I/O handling or sometimes panics.

RESOLUTION:
The code is modified such that the default paging module size on Linux is increased to 64 MB. If the issue persists, you can manually increase the paging module size using the volpagemod_max_memsz tunable. In addition, to avoid contention during the snapshot operation without manual intervention, the paging module size is increased considering the other online volumes in the system.

* 3470286 (Tracking ID: 3417164)

SYMPTOM:
The "sfcache(1M) resize" command fails when the size value is less than the data stored in the cache.

DESCRIPTION:
The cache area contains the metadata and cached data. The cache area size can be shrunk to a value that is the sum of the metadata size and the cached data size. However, the cache area cannot be shrunk to match the metadata size value.

RESOLUTION:
The code is modified such that the minimal size to which a cache area can be shrunk is calculated matching the metadata size.

* 3470290 (Tracking ID: 2999871)

SYMPTOM:
The vxinstall(1M) command gets into a hung state when it is invoked
through Secure Shell (SSH) remote execution.

DESCRIPTION:
The vxconfigd process which starts from the vxinstall script fails
to close the inherited file descriptors, causing the vxinstall to enter the hung
state.

RESOLUTION:
The code is modified to handle the inherited file descriptors for
the vxconfigd process.

* 3470292 (Tracking ID: 3416098)

SYMPTOM:
"vxvmconvert" utility throws error during execution. The
following error message is displayed.

"[: 100-RHEL6: integer expression expected".

DESCRIPTION:
In the vxvmconvert utility, the sed (stream editor) expression that is used to
parse and extract the Logical Volume Manager (LVM) version is not appropriate
and fails for particular Red Hat Enterprise Linux (RHEL) versions.

RESOLUTION:
Code changes are done so that the expression parses correctly.

* 3470300 (Tracking ID: 3340923)

SYMPTOM:
The following kind of messages are seen in system log corresponding to
the unavailable asymmetric access state paths.

..
..
kernel: [859680.551729] end_request: I/O error, dev
sdc, sector 44515143296
kernel: [859680.552219] VxVM vxdmp V-5-0-112 disabled
path 8/0x20 belonging to the dmpnode 201/0x40 due to path failure
kernel: [859690.554947] VxVM vxdmp V-5-0-1652
dmp_alua_update_alua_info: change in aas detected for node=8/32 old_aas: 0
new_aas: 3
kernel: [859690.554980] VxVM vxdmp V-5-0-148 enabled
path 8/0x20 belonging to the dmpnode 201/0x40
..
..

DESCRIPTION:
The paths with unavailable asymmetric access state do not service I/O. Hence, by design, DMP returns a path failure for such paths so that they are not selected for I/Os. Due to a defect, DMP reports such a path as okay, which causes the path to be enabled again.

RESOLUTION:
The code is modified to fix this issue.

* 3470301 (Tracking ID: 2812161)

SYMPTOM:
In a VVR environment, after the Rlink is detached, the vxconfigd(1M) daemon on
the secondary host may hang. The following stack trace is observed:
cv_wait
delay_common
delay
vol_rv_service_message_start
voliod_iohandle
voliod_loop
...

DESCRIPTION:
There is a race condition if there is a node crash on the primary site of VVR
and if any subsequent Rlink is detached. The vxconfigd(1M) daemon on the
secondary site may hang, because it is unable to clear the I/Os received from
the primary site.

RESOLUTION:
The code is modified to resolve the race condition.

* 3470303 (Tracking ID: 3314647)

SYMPTOM:
The vxcdsconvert(1M) command fails with following options when disk group (DG) contains multiple volumes:

/etc/vx/bin/vxcdsconvert -o novolstop -g <DGNAME> group move_subdisks_ok=yes
evac_subdisks_ok=yes

relayout: ERROR! Plex column offset is not strictly increasing for column/plex
0/<VOLUME NAME>

DESCRIPTION:
When the vxcdsconvert(1M) command is invoked with the "group" option, it converts non-CDS disks to Cross-platform Data Sharing (CDS) formatted disks. The command aligns and resizes subdisks within the DG to make sure that the alignment of all volumes is 8K, and finally sets the CDS flag on the DG.

The command may need to relocate subdisks when it formats the disks with CDS format. To do this, it uses some global variables which indicate the subdisks from volume which need to be re-analyzed. To align all volumes at 8K boundaries, same global variables are used and since these global variables were already set during disk initialization, stale values were used during volume alignment which causes the error.

RESOLUTION:
The code is modified so that before volumes are considered for 8K alignment, the global variables, which indicate whether some subdisks need to be reanalyzed for alignment or not, are reset.

* 3470304 (Tracking ID: 3390162)

SYMPTOM:
The NetMapper tool, which scans the User Datagram Protocol (UDP) port 4145, causes the vxnetd daemon to consume 100% CPU, the rlink
to disconnect, and finally the system to hang.

DESCRIPTION:
Because the vxnetd daemon skips the zero-byte UDP packet sent by NetMapper, the packet stays in the socket receive buffer, and the vxnetd daemon receives the same packet ceaselessly.

RESOLUTION:
The code is modified so that the zero-byte UDP packet is no longer skipped.
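
The behavior of zero-byte datagrams can be shown with a minimal user-space Python sketch. It is not the vxnetd code, which the source does not show; the loopback sockets, the ephemeral port, and the drain_one helper are assumptions made for illustration. The point is that a zero-byte datagram is a real queue entry: it leaves the receive buffer only when it is actually received, after which the next datagram becomes visible.

```python
import socket

def drain_one(sock):
    """Receive exactly one datagram, including an empty one.

    recvfrom() removes the datagram from the socket receive buffer
    even when its payload is zero bytes long.
    """
    data, sender_addr = sock.recvfrom(65535)
    return data, sender_addr

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))      # ephemeral loopback port, not 4145
receiver.settimeout(2.0)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"", receiver.getsockname())        # zero-byte probe
sender.sendto(b"hello", receiver.getsockname())   # normal payload

first, _ = drain_one(receiver)       # consumes the empty datagram
second, _ = drain_one(receiver)      # the real payload is now reachable
sender.close()
receiver.close()
```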

* 3470321 (Tracking ID: 3336714)

SYMPTOM:
The slab of I/O request in Linux may get corrupted.

DESCRIPTION:
A slab of I/O requests may get corrupted in the Dynamic Multi-Pathing (DMP) SCSI bypass I/O path when a member pointer of the Linux kernel I/O request structure is set to NULL after the I/O request pointer is freed.

RESOLUTION:
The code is modified to delete the needless code, which sets the member pointer to NULL.

* 3470322 (Tracking ID: 3399323)

SYMPTOM:
The reconfiguration of Dynamic Multipathing (DMP) database fails with the below error: VxVM vxconfigd DEBUG  V-5-1-0 dmp_do_reconfig: DMP_RECONFIGURE_DB failed: 2

DESCRIPTION:
As part of the DMP database reconfiguration process, controller information from DMP user-land database is not removed even though it is removed from DMP kernel database. This creates inconsistency between the user-land and kernel-land DMP database. Because of this, subsequent DMP reconfiguration fails with above error.

RESOLUTION:
The code changes have been made to properly remove the controller information from the user-land DMP database.

* 3470345 (Tracking ID: 3281004)

SYMPTOM:
For DMP minimum queue I/O policy with large number of CPUs, the following
issues are observed since the VxVM 5.1 SP1 release:
1. CPU usage is high.
2. I/O throughput is down if there are many concurrent I/Os.

DESCRIPTION:
The earlier minimum queue I/O policy is used to consider the host controller
I/O load to select the least loaded path. For VxVM 5.1 SP1 version, an addition
was made to consider the I/O load of the underlying paths of the selected host
based controllers. However, this resulted in the performance issues, as there
were lock contentions with the I/O processing functions and the DMP statistics
daemon.

RESOLUTION:
The code is modified such that the host controller paths I/O load is not
considered to avoid the lock contention.

* 3470347 (Tracking ID: 3444765)

SYMPTOM:
In Cluster Volume Manager (CVM), shared volume recovery may take long time for large configurations.

DESCRIPTION:
In the CVM environment, the volume recovery operation involves following tasks:
1. Unconditional Data Change Object (DCO) volume recovery.
2. Recovery using the vxvol noderecover(1M) command.

But the DCO volume recovery is done serially and the noderecover(1M) command is executed separately for each volume which needs recovery. Therefore, the complete recovery operation can take longer time.

RESOLUTION:
The code changes are done to recover DCO volume only if required and the recovery of multiple DCO volumes will be done in parallel. Similarly, single vxvol noderecover(1M) command will be issued for multiple volumes.

* 3470350 (Tracking ID: 3437852)

SYMPTOM:
The system panics when Symantec Replicator Option goes into PASSTHRU
mode. The panic stack trace might look like:

vol_rp_halt()
vol_rp_state_trans()
vol_rv_replica_reconfigure()
vol_rv_error_handle()
vol_rv_errorhandler_callback()
vol_klog_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION:
When Storage Replicator Log (SRL) gets faulted for any reason, VVR
goes into the PASSTHRU Mode. At this time, a few updates are erroneously freed.
When these updates are accessed during the correct processing, access to these
updates results in panic as the updates are already freed.

RESOLUTION:
The code changes have been made not to free the updates erroneously.

* 3470352 (Tracking ID: 3450758)

SYMPTOM:
The slave node panics when it tries to join the cluster with the following stack:
bad_area_nosemaphore()
page_fault()
vol_plex_iogen()
volobject_iogen()
vol_cvol_volobject_iogen()
vol_cvol_init_iogen()
vol_cache_linkdone()
cvm_msg_cfg_end()
vol_kmsg_request_receive()
vol_kmsg_receiver()
kernel_thread()

DESCRIPTION:
The panic happens when the node generates a Staged I/O (SIO) for the plex object in the process of validating the cache object (parent) and plex object (child) associations. Some of the fields of the plex object which have not been populated are accessed as part of the SIO generation. This access to NULL fields leads to panic.

RESOLUTION:
The code changes have been made to avoid accessing those NULL fields of plex object while the slave node joins the cluster.

* 3470353 (Tracking ID: 3236772)

SYMPTOM:
Replication with heavy I/O loads on the primary site results in the following
errors on the secondary site:
1. "Transaction aborted waiting for io drain"
and/or
2. "vradmin ERROR Lost connection to host"

DESCRIPTION:
A deadlock between 'transaction' and messages delivered to the secondary site
results in repeated timeouts of transactions on the secondary site. These
repeated transaction timeouts cause transaction failures and/or session timeouts
between primary and secondary sites.

RESOLUTION:
The code is modified to resolve the deadlock condition.

* 3470354 (Tracking ID: 3446415)

SYMPTOM:
When the file system shrink operation is performed on FileStore, a
different pool may get added to the file system
Example:

# fs create mirrored <FS> 8g 2 pool01 protection=disk

# fs list
FS    STATUS  SIZE   LAYOUT    MIRRORS  COLUMNS  USE%  NFS SHARED  CIFS SHARED  SECONDARY TIER  POOL LIST
====  ======  =====  ========  =======  =======  ====  ==========  ===========  ==============  ==============
<FS>  online  8.00G  mirrored  2        -        1%    no          no           no              pool01


# fs shrinkto primary <FS> 4g

# fs list
FS    STATUS  SIZE   LAYOUT    MIRRORS  COLUMNS  USE%  NFS SHARED  CIFS SHARED  SECONDARY TIER  POOL LIST
====  ======  =====  ========  =======  =======  ====  ==========  ===========  ==============  ==============
<FS>  online  4.00G  mirrored  2        -        2%    no          no           no              pool01, pool02

DESCRIPTION:
While performing shrink operation on volume, associated DCO volume gets
recreated. But during this operation, specified site tags on command line are
not taken into consideration. Thus, new DCO volumes may get created on the disk
that has different site tag. Due to this, different pools are added to file
system on FileStore.

RESOLUTION:
The code is modified to properly allocate DCO volume within the
specified pool of the disk during the volume shrink operation.

* 3470382 (Tracking ID: 3368361)

SYMPTOM:
When site consistency is configured within a private disk group and Cluster
Volume Manager (CVM) is up, the reattach operation of a detached site fails.

DESCRIPTION:
When you try to reattach the detached site configured in a private disk-group
with CVM up on that node, the reattach operation fails with
the following error

"Disk (disk_name) do not have connectivity from one or more cluster nodes".

The reattach operation fails because the code does not check the shared
attribute of the disk group when it applies the disk connectivity
check for a private disk group.

RESOLUTION:
The code is modified to make the disk connectivity check explicit for a shared
disk group by checking the shared attribute of a disk group.

* 3470383 (Tracking ID: 3455460)

SYMPTOM:
The vxfmrshowmap and verify_dco_header utilities fail with the following error
message:
vxfmrshowmap:
VxVM  ERROR V-5-1-15443 seek to <offset-address> for <device-name> FAILED:Error
0

verify_dco_header:
Cannot lseek to offset:<offset-address>

DESCRIPTION:
The issue occurs because the large offsets are not handled properly while
seeking using 'lseek', as a part of the vxfmrshowmap and verify_dco_header
utilities.

RESOLUTION:
The code is modified to properly handle large offsets as a part of the
vxfmrshowmap and verify_dco_header utilities.
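
One common way for large offsets to be mishandled is to hold them in a signed 32-bit field. The source does not state that this was the actual defect, so the Python sketch below is an illustration under that assumption; the helper names are hypothetical.

```python
import ctypes

def as_32bit_offset(offset):
    """Value the offset takes if stored in a signed 32-bit field."""
    return ctypes.c_int32(offset).value

def as_64bit_offset(offset):
    """Value the offset takes if stored in a signed 64-bit field."""
    return ctypes.c_int64(offset).value

three_gib = 3 * 1024 ** 3                  # an offset beyond 2 GiB

wrapped = as_32bit_offset(three_gib)       # wraps to a negative number
preserved = as_64bit_offset(three_gib)     # keeps the original value
```

A seek to the wrapped (negative) value fails, whereas the 64-bit value addresses the intended location.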

* 3470384 (Tracking ID: 3440790)

SYMPTOM:
Sometimes the vxassist(1M) command with the mirror parameter and the vxplex(1M) command with the att parameter hang. The hang is observed in the vxtask list output, which shows no progress of the tasks.

DESCRIPTION:
Sometimes the vxassist(1M) command with the mirror parameter and the vxplex(1M) command with the att parameter hang, showing no progress of the operation in the vxtask list output. This operation requires kernel memory to copy data from one plex to another, and this memory is allocated from a dedicated Veritas Volume Manager (VxVM) memory pool. Due to a defect in calculating the free memory in the pool, the task waits forever for memory even though enough memory is available in the pool.

RESOLUTION:
The code is modified to properly compute the free memory available in the VxVM memory pool used for these operations.

* 3470385 (Tracking ID: 3373142)

SYMPTOM:
Manual pages for vxedit and vxassist do not contain details about updated
behavior of these commands.

DESCRIPTION:
1. vxedit manual page
On this page it explains that if the reserve flag is set
for a disk, then vxassist will not allocate a data subdisk on that disk unless
the disk is specified on the vxassist command line. But Data Change Object (DCO)
volume creation by vxassist or vxsnap command will not honor the reserved flag.

2. vxassist manual page
DCO allocation policy has been updated starting from 6.0. The allocation policy
may not succeed if there is insufficient disk space. The vxassist command then
uses available space on the remaining disks of the disk group. This may prevent
certain disk group from splitting or moving if the DCO plexes cannot accompany
their parent data volume.

RESOLUTION:
The manual pages for both commands have been updated to reflect the new
behavioral changes.

* 3490147 (Tracking ID: 3485907)

SYMPTOM:
Panic occurs in the I/O code path. The following stack trace is observed:
...
volkcontext_process()
volkiostart()
vxiostrategy()
...

or
...
voliod_iohandle()
voliod_loop()
...

DESCRIPTION:
When the snapshot reattach operation is in progress on a volume, the metadata
of the snapshot gets updated. If a parallel I/O during this operation reads an
incorrect state of the metadata, I/Os of zero size are created, which leads to a
system panic.

RESOLUTION:
The code is modified to avoid the generation of I/Os of zero length, on volumes
which are under the snapshot operations.

* 3492363 (Tracking ID: 3402487)

SYMPTOM:
Page-size allocation fails in Fast Mirror Resynchronization (FMR) operation due to fragmentation or size limitation. As a result, vxiod daemon is found hogging the CPU.

DESCRIPTION:
Paging module used for Fast Mirror Resynchronization (FMR) operation requires a large number of pages. The kernel memory allocator in Volume Manager on Linux is such that the page-size allocation (8K) is done directly from the mapped area. Page-size allocation in some situations fails because of fragmentation or size limitation.

RESOLUTION:
The code is modified to introduce new API which allocates memory virtually.

* 3506660 (Tracking ID: 3325022)

SYMPTOM:
The disks that are exported using the VirtIO-disk interface from an SLES11 SP2 or SP3 host are invisible to Veritas Volume Manager running inside the Kernel-based Virtual Machine (KVM) guests.

DESCRIPTION:
Array Support Library (ASL) claims devices during device discovery. But it doesn't support devices that are exported from a host running SLES11 SP2 or SLES11 SP3, to guest using VirtIO-disk interfaces. Therefore the devices are not visible to Veritas Volume Manager running inside a KVM guest. For example, if disks vda, vdb are the only exported disks to guest "guest1" using the VirtIO-disk interface, they're not visible in "vxdisk list" output:
guest1:~ # vxdisk list
DEVICE TYPE DISK GROUP STATUS
This is because none of the ASLs claimed those devices:
guest1:~ # vxddladm list devices
DEVICE               TARGET-ID    STATE   DDL-STATUS (ASL)
===============================================================
vdb                  -            Online  -
vda                  -            Online  -

RESOLUTION:
The code is fixed to make ASL claim the disk exported using VirtIO-disk Interface.

* 3506679 (Tracking ID: 3435225)

SYMPTOM:
In a given CVR setup, rebooting the master node causes one of the slaves to
panic with following stack:

pse_sleep_thread
vol_rwsleep_rdlock
vol_kmsg_send_common
vol_kmsg_send_prealloc
cvm_obj_sendmsg_prealloc
vol_rv_async_done
volkcontext_process
voldiskiodone

DESCRIPTION:
The issue is triggered by one of the code paths sleeping in interrupt context.

RESOLUTION:
The code is modified so that sleep is not invoked in interrupt context.

* 3506707 (Tracking ID: 3400504)

SYMPTOM:
While disabling the host side HBA port, extended attributes of some devices are
not present anymore. This happens even when there is a redundant controller
present on the host which is in enabled state.

An example output is shown below where the 'srdf' attribute of an EMC device
(which has multiple paths through multiple controllers) gets affected.

Before the port is disabled-
# vxdisk -e list emc1_4028
emc1_4028    auto:cdsdisk   emc1_4028    dg21        online
c6t5000097208191154d112s2 srdf-r1

After the port is disabled-
# vxdisk -e list emc1_4028
emc1_4028    auto:cdsdisk   emc1_4028    dg21        online
c6t5000097208191154d112s2 -

DESCRIPTION:
The code which prints the extended attributes used to print the attributes of
the first path in the list of all paths. If the first path belongs to the
controller which is disabled, its attributes will be empty.

RESOLUTION:
The code is modified to look for the path in enabled state among all the paths
and then print the attributes of such path.
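
The selection rule described in the resolution can be sketched in Python as follows. The dictionary layout, the path names path_a and path_b, and the function name are hypothetical; only the rule itself (use an enabled path rather than the first path) comes from the text above.

```python
def attributes_to_show(paths):
    """Return the attributes of the first enabled path, or '-' if none."""
    for path in paths:
        if path["enabled"]:
            return path["attrs"]
    return "-"

# Hypothetical two-path device: the first path sits behind the disabled port.
paths = [
    {"name": "path_a", "enabled": False, "attrs": ""},
    {"name": "path_b", "enabled": True, "attrs": "srdf-r1"},
]

old_behavior = paths[0]["attrs"] or "-"   # first path only: attribute lost
new_behavior = attributes_to_show(paths)  # first enabled path: attribute kept
```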

* 3506709 (Tracking ID: 3259732)

SYMPTOM:
In a Clustered Volume Replicator (CVR) environment, if the SRL size grows and if
it is followed by a slave node leaving and then re-joining the cluster then
rlink is detached.

DESCRIPTION:
After the slave re-joins the cluster, it does not correctly receive and process
the SRL resize information received from the master. This means that application
writes initiated on this slave may corrupt the SRL causing rlink to detach.

RESOLUTION:
The code is modified so that when a slave joins the cluster, the SRL resize
related information is correctly received and processed by the slave.

* 3506718 (Tracking ID: 3433931)

SYMPTOM:
The "vxvmconvert" utility fails to get the correct Logical Volume Manager (LVM) version and reports the following message:
"vxvmconvert is not supported if LVM version is lower than 2.02.61 and Disk having multiple path"

DESCRIPTION:
In the "vxvmconvert" utility, an incorrect stream editor (sed) expression is used to parse and extract the LVM version. As a result, the expression fails for build versions with more than two digits.

RESOLUTION:
The code is modified to parse the expression correctly.
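
A version check that tolerates three-digit build numbers and vendor suffixes can be sketched as below. This is a Python illustration, not the sed expression used by vxvmconvert (which the source does not show); the sample string "2.02.100(2)-RHEL6" and the function names are assumptions. The 2.02.61 threshold is taken from the message quoted above.

```python
import re

MIN_LVM = (2, 2, 61)  # threshold quoted in the vxvmconvert message

def parse_lvm_version(text):
    """Return (major, minor, build) from a string such as
    '2.02.100(2)-RHEL6'; raise ValueError if no version is found."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no LVM version found in %r" % text)
    # Convert each numeric field to int so 100 compares greater than 61.
    return tuple(int(part) for part in match.groups())

def is_supported(text):
    """True if the parsed version is at least MIN_LVM."""
    return parse_lvm_version(text) >= MIN_LVM
```

Comparing integer tuples avoids both failure modes described in these entries: a suffix such as "-RHEL6" never reaches the numeric comparison, and a three-digit build number is compared by value rather than by its leading digits.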

* 3506747 (Tracking ID: 3496077)

SYMPTOM:
The vxvmconvert(1M) command fails while converting Logical Volume Manager (LVM) Volume Groups (VG) into a VxVM disk group and displays the following error message:
"The specified Volume Group (VG name) was not found."

DESCRIPTION:
A race condition is observed between the "blockdev" and "vgdisplay" commands: if the "blockdev" command has not flushed the buffered data onto the device, the "vgdisplay" command fails to provide accurate output. As a result, VxVM assumes that the Volume Group (VG) is not present and displays the error message.

RESOLUTION:
The code is modified to allow a finite number of trials for the "vgdisplay" command before failure so that there is sufficient time for the buffers to get flushed.
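
The bounded-retry approach described in the resolution can be sketched in Python as follows. The helper names, the attempt count, and the delay are assumptions; make_flaky is a hypothetical stand-in for a query that only succeeds once buffered data has been flushed.

```python
import time

def retry(check, attempts=5, delay=0.0):
    """Call check() up to `attempts` times and return its first truthy
    result; return None if every attempt fails."""
    for attempt in range(attempts):
        result = check()
        if result:
            return result
        if attempt < attempts - 1:
            time.sleep(delay)   # give pending writes time to land
    return None

def make_flaky(succeed_on, value):
    """Build a check that returns None until call number `succeed_on`."""
    state = {"calls": 0}
    def check():
        state["calls"] += 1
        return value if state["calls"] >= succeed_on else None
    check.state = state
    return check

flaky = make_flaky(3, "vg01")
found = retry(flaky, attempts=5)
never = retry(make_flaky(10, "vg01"), attempts=2)
```

Bounding the number of attempts keeps a genuinely missing volume group from turning into an endless wait.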

* 3515438 (Tracking ID: 3535309)

SYMPTOM:
VRTSvxvm patch failed to install on an sfcache configured system while upgrading
SFRAC from 6.1 to 6.1.1. The patch installation failed with the following error
message:
UX:vxfs fscache: ERROR: V-3-25963: Failed to open /dev/vxca: No such file or
directory
SFCache ERROR V-3-25963: Failed to open /dev/vxca: No such file or directory

DESCRIPTION:
SFRAC installer stops daemon/services and removes all modules related to SFRAC
packages before the upgrade process starts.
The vxrecover(1M) command gets executed during the VRTSvxvm package upgrade and
attempts to start the cache volume on sfcache configured system. However, it
fails because sfcache related module does not exist at the time of upgrade task.

RESOLUTION:
The code is modified to redirect the vxrecover(1M) command's standard error and
output to /dev/null at configuration stage when SFRAC starts all services and
daemons.

* 3522541 (Tracking ID: 3533888)

SYMPTOM:
If the default value is not chosen, the vxunroot(1M) command reports an error and enters an infinite loop. The following error messages are displayed:
"VxVM vxunroot ERROR V-5-2-2534 Grub menu entry is not a valid menu entry
VxVM vxunroot INFO V-5-2-2607 The grub boot menu entries in the current
configuration are:
        0. "title Red Hat Enterprise Linux (2.6.32-358.el6.x86_64)"

DESCRIPTION:
When the vxunroot(1M) command is executed and the file descriptor is not closed, the command prints an error and enters an infinite loop.

RESOLUTION:
The code is modified to solve this issue.

* 3531906 (Tracking ID: 3526500)

SYMPTOM:
Disk IO failures occur with DMP IO timeout error messages when DMP (Dynamic Multi-pathing) IO statistics daemon is not running. Following are the timeout error messages:

VxVM vxdmp V-5-3-0 I/O failed on path 65/0x40 after 1 retries for disk 201/0x70
VxVM vxdmp V-5-3-0 Reached DMP Threshold IO TimeOut (100 secs) I/O with start
3e861909fa0 and end 3e86190a388 time
VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x206) on dmpnode 201/0x70

DESCRIPTION:
When IO is submitted to DMP, it sets the start time on the IO buffer. The value of the start time depends on whether the DMP IO statistics daemon is running or not. When the IO is returned as error from SCSI to DMP, instead of retrying the IO on alternate paths, DMP failed that IO with 300 seconds timeout error, but the IO has elapsed only few milliseconds in its execution. The miscalculation of DMP timeout happens only when DMP IO statistics daemon is not running.

RESOLUTION:
The code is modified to calculate the appropriate DMP IO timeout value when the DMP IO statistics daemon is not running.
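
The miscalculation described above can be pictured as an elapsed-time check that mixes two different time sources. The following Python sketch is an illustration only, not the DMP code; the function names, the threshold constant, and the assumption that the start stamp was left at a default value are all hypothetical.

```python
# Hypothetical illustration only; none of these names come from DMP.
THRESHOLD_SECS = 300


def elapsed_secs(start, now):
    """Elapsed time is meaningful only if both stamps use the same clock."""
    return now - start


def timed_out(start, now, threshold=THRESHOLD_SECS):
    """Return True when the I/O appears to have exceeded the threshold."""
    return elapsed_secs(start, now) >= threshold


# Assumed buggy pattern: the start stamp comes from one source (here a
# default of 0) while completion is measured in seconds since boot.
uptime_now = 5000.0
start_unset = 0.0
false_timeout = timed_out(start_unset, uptime_now)

# Corrected pattern: stamp and compare with the same clock.
start_same_clock = 4999.995
real_result = timed_out(start_same_clock, uptime_now)

print(false_timeout, real_result)
```

In the first case the I/O is reported as timed out although it ran for only a few milliseconds; in the second case the same I/O is correctly treated as a candidate for retry.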

* 3536289 (Tracking ID: 3492062)

SYMPTOM:
Dynamic Multi-Pathing (DMP) fails to get the page 0x83 LUN identifier for EMC Symmetrix LUNs and continuously logs the following error message:
"VxVM vxdmp V-5-3-1984 dmp_restore_callback: devid could not be extracted from pg 0x83 on path"

DESCRIPTION:
If DMP fails to get the page 0x83 LUN identifier for EMC Symmetrix LUNs during discovery, DMP should set a device identifier unsupported flag on the corresponding DMP node. Currently, no code sets this flag, so the restore daemon treats such devices as a device identifier mismatch case and logs error messages.

RESOLUTION:
The code is modified to set the device identifier unsupported flag if the identifier cannot be extracted during the discovery process, so that the restore daemon does not treat the device as a device identifier mismatch case. Also, the message type is changed from NOTE to LOG.

* 3538792 (Tracking ID: 3528498)

SYMPTOM:
When VxVM SmartIO is disabled on a VxVM volume, VxIO causes a system panic with the following stack trace:

COMMAND: "vxiod"
machine_kexec()
crash_kexec()
panic()
vol_cvol_delete_iogen()
vol_cache_request_start()
voliod_iohandle()
voliod_loop()
kthread()
kernel_thread()

DESCRIPTION:
When the VxVM SmartIO is disabled on a VxVM volume, the cached data for the volume is deleted asynchronously.
VxVM keeps a list of asynchronous delete operations in a queue protected by a lock. In a special case, this queue is modified without holding the lock. The corrupted queue results in a system panic.

RESOLUTION:
The code is modified to hold the lock while modifying the queue.
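
The general pattern behind this fix, where every modification of a shared queue takes the same lock, can be sketched in Python as follows. This is an illustration only and not the VxVM code; the class and function names are hypothetical.

```python
import threading
from collections import deque


class DeleteQueue:
    """Hypothetical queue of pending asynchronous delete operations."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = deque()

    def enqueue(self, op):
        # Every modification takes the lock, with no special-case bypass.
        with self._lock:
            self._items.append(op)

    def dequeue(self):
        with self._lock:
            return self._items.popleft() if self._items else None

    def __len__(self):
        with self._lock:
            return len(self._items)


def producer(queue, start, count):
    """Enqueue `count` consecutive operation IDs starting at `start`."""
    for op_id in range(start, start + count):
        queue.enqueue(op_id)


q = DeleteQueue()
threads = [
    threading.Thread(target=producer, args=(q, n * 1000, 1000))
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(q))
```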

* 3539906 (Tracking ID: 3394933)

SYMPTOM:
In a Flexible Storage Sharing (FSS) disk group, when a node that provides storage rejoins the cluster, the volume recovery operations are started. These recovery operations are not started in parallel for all remote devices. As a result, the volume recovery operations take longer and also impact the application I/O performance.

DESCRIPTION:
Veritas Volume Manager (VxVM) controls the number of recovery operations that can be started in parallel. If more than the desired maximum number of recovery operations are to be run on a particular disk, VxVM ensures that no more than that maximum number of tasks run at any given time. Due to a bug, the vxrecover operation treats all the remote disks as the same disk. Hence, parallel recovery is not allowed on volumes in an FSS disk group even when it is possible.

RESOLUTION:
The code is modified in vxrecover to identify different remote disks as different, to ensure that parallel recovery operations run.
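
The effect of treating all remote disks as one disk can be sketched with a per-disk concurrency limit. The Python example below is an illustration only, not the vxrecover code; the limit, the task records, and the key functions are hypothetical.

```python
from collections import defaultdict

MAX_PER_DISK = 1  # hypothetical per-disk limit on concurrent recoveries


def first_wave(tasks, key):
    """Return the tasks that may start immediately under the per-disk limit.

    `key` maps a task to the identity used to count running tasks per disk.
    """
    running = defaultdict(int)
    started = []
    for task in tasks:
        disk_id = key(task)
        if running[disk_id] < MAX_PER_DISK:
            running[disk_id] += 1
            started.append(task)
    return started


tasks = [
    {"volume": "vol1", "disk": "node2_disk1", "remote": True},
    {"volume": "vol2", "disk": "node2_disk2", "remote": True},
    {"volume": "vol3", "disk": "node3_disk1", "remote": True},
]


def coarse_key(task):
    """Assumed buggy keying: every remote disk collapses to one identity."""
    return "remote" if task["remote"] else task["disk"]


def disk_key(task):
    """Corrected keying: each remote disk keeps its own identity."""
    return task["disk"]


print(len(first_wave(tasks, coarse_key)), len(first_wave(tasks, disk_key)))
```

With the coarse key only one recovery starts at a time; with the per-disk key all three recoveries can start together.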

* 3540122 (Tracking ID: 3482026)

SYMPTOM:
The vxattachd(1M) daemon reattaches plexes of a manually detached site.

DESCRIPTION:
The vxattachd daemon reattaches plexes for a manually detached site, that is, a site whose state is OFFLINE. There was no check to differentiate between a manually detached site and a site that was detached due to IO failure. Hence, the vxattachd(1M) daemon also brought the plexes online for a manually detached site.

RESOLUTION:
The code is modified to differentiate between a manually detached site and a site detached due to IO failure.

* 3541262 (Tracking ID: 3543284)

SYMPTOM:
Storage devices are not visible in the vxdisk list or the vxdmpadm getdmpnode outputs.

DESCRIPTION:
The Device Discovery Layer (DDL) discovers storage devices and iterates through the Array Support Libraries (ASLs). DDL maintains a bitmap data structure for the ASLs present on the host. In some cases with 64 ASLs, the code did not consider the 64th bit in the bitmap, and hence the corresponding ASL was not called in the device claim cycle.
As a result, the affected devices are not visible in VxVM.

RESOLUTION:
The DDL code is modified so that the device details are listed by VxVM.
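
The skipped 64th bit is a typical off-by-one in a bitmap scan. The Python sketch below is an illustration only, not the DDL code; the constant and the function names are hypothetical.

```python
NUM_ASLS = 64  # hypothetical: exactly 64 ASLs are registered


def set_bit(bitmap, index):
    """Return the bitmap with the bit at `index` set."""
    return bitmap | (1 << index)


def asls_called(bitmap, bits_scanned):
    """Indices of ASLs whose bit is set among the first `bits_scanned` bits."""
    return [i for i in range(bits_scanned) if bitmap & (1 << i)]


bitmap = 0
for index in range(NUM_ASLS):
    bitmap = set_bit(bitmap, index)

buggy = asls_called(bitmap, NUM_ASLS - 1)  # scans bits 0..62 only
fixed = asls_called(bitmap, NUM_ASLS)      # scans bits 0..63

print(len(buggy), len(fixed))
```

In the buggy scan the ASL at index 63 (the 64th ASL) is never called, so the devices it would claim stay undiscovered.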

* 3543944 (Tracking ID: 3520991)

SYMPTOM:
The vxconfigd(1M) daemon dumps core due to memory corruption and displays the following stack trace:

1.

malloc_y()
malloc_common
misc.xalloc()
dll_iter_next()
ddl_claim_single_disk()
ddl_thread_claim_disk()
ddl_task_start()
volddl_thread_start()

OR

2.

free_y()
free_common()
ddl_name_value_api.nv_free
ddl_vendor_claim_device()
ddl_claim_single_disk()
ddl_thread_claim_disk()
ddl_task_start()
volddl_thread_start()

DESCRIPTION:
Device Discovery Layer (DDL) maintains a bitmap data structure for the Array Support Libraries (ASLs) present on the host. If the number of ASLs is greater than or equal to 64, then, while setting the 64th bit in the bitmap data structure associated with the ASLs, DDL performs an out-of-bounds write operation. This memory corruption causes either the vxconfigd daemon to dump core or other unexpected issues.

RESOLUTION:
The Device Discovery Layer (DDL) code is modified to fix this issue.

* 3444900 (Tracking ID: 3399131)

SYMPTOM:
The following command fails with an error for a path managed by Third Party
Driver (TPD) which co-exists with DMP.
# vxdmpadm -f disable path=<path name>
VxVM vxdmpadm ERROR V-5-1-11771 Operation not supported

DESCRIPTION:
Third party drivers manage devices with or without the co-existence of the
Dynamic Multi-Pathing driver. Disabling the paths managed by a third party
driver which does not co-exist with DMP is not supported. But due to a bug in
the code, disabling the paths managed by a third party driver which co-exists
with DMP also fails, because the same flags are set for all third party driver
devices.

RESOLUTION:
The code has been modified to block this command only for the third party
drivers which cannot co-exist with DMP.

* 3445234 (Tracking ID: 3358904)

SYMPTOM:
The system with Asymmetric Logical Unit Access (ALUA) enclosures sometimes panics during path fault scenarios, with the following stack:

dmp_alua_get_owner_state()
dmp_alua_get_path_state()
dmp_get_path_state()
dmp_get_enabled_ctlrs()
dmp_info_ioctl()
gendmpioctl()
dmpioctl()
vol_dmp_ktok_ioctl()
dmp_get_enabled_cntrls()
vx_dmp_config_ioctl()
quiescesio_start()
voliod_iohandle()
voliod_loop()
kernel_thread()

DESCRIPTION:
A system running with ALUA logical unit numbers (LUNs) sometimes panics during path fault scenarios. This happens due to a possible NULL pointer access.

RESOLUTION:
The code is modified to fix the NULL pointer access problem, thus fixing the issue.

* 3445235 (Tracking ID: 3374117)

SYMPTOM:
Write I/O issued on a layered Veritas Volume Manager (VxVM) volume may hang if it has a detached plex and has VxVM SmartIO enabled.

DESCRIPTION:
VxVM uses interlocking to prevent parallel writes that are issued on a data volume. In certain scenarios that involve a detached plex, write I/O issued on a SmartIO enabled VxVM volume tries to take the interlock multiple times, which leads to I/O hang.

RESOLUTION:
The code is modified to avoid double interlocking in the scenarios described above.
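
Taking a non-reentrant lock twice from the same I/O path is a classic self-deadlock. The Python sketch below is an illustration only, not the VxVM interlock code; the `Interlock` class and the I/O identifiers are hypothetical, and a timeout is used so that the example terminates instead of hanging.

```python
import threading


class Interlock:
    """Hypothetical non-reentrant write interlock with an ownership check."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def acquire(self, io_id, timeout=0.1):
        # Guard: an I/O that already holds the interlock must not take it again.
        if self._owner == io_id:
            return True
        if self._lock.acquire(timeout=timeout):
            self._owner = io_id
            return True
        return False

    def release(self, io_id):
        if self._owner == io_id:
            self._owner = None
            self._lock.release()


# Without the guard, a second acquire by the same path can never succeed.
raw = threading.Lock()
raw.acquire()
second_try = raw.acquire(timeout=0.1)  # would block forever without a timeout
raw.release()

# With the guard, the repeated request is recognized and does not block.
guarded = Interlock()
first = guarded.acquire("write-1")
again = guarded.acquire("write-1")
guarded.release("write-1")

print(second_try, first, again)
```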

* 3445236 (Tracking ID: 3381006)

SYMPTOM:
When I/Os are issued on a volume associated with Veritas Volume Manager (VxVM) block level SmartIO caching, the system panics with the following stack trace:

volmv_accum_cache_stats()
vol_cache_write_done()
volkcontext_process()
voldiskiodone()
...

DESCRIPTION:
An improper lock is taken when collecting the SmartIO stats.
This leads to a system panic, because another thread can set the pointer to NULL.

RESOLUTION:
The code is modified so that a proper lock is held before collecting stats for SmartIO in kernel.

* 3446001 (Tracking ID: 3380481)

SYMPTOM:
When the vxdiskadm(1M) command selects a removed disk during the "5 Replace a failed or removed disk" operation, the vxdiskadm(1M) command displays the following error message:
"/usr/lib/vxvm/voladm.d/lib/vxadm_syslib.sh: line 2091:return: -1: invalid option".

DESCRIPTION:
From bash version 4.0, bash doesn't accept negative error values. If VxVM scripts return negative values to bash, the error message is displayed.

RESOLUTION:
The code is modified so that VxVM scripts don't return negative values to bash.

* 3446112 (Tracking ID: 3288744)

SYMPTOM:
In a Flexible Storage Sharing (FSS) diskgroup, whenever a new mirror is added to a volume, the Data Change Object (DCO) associated with the volume is not mirrored.

DESCRIPTION:
In an FSS environment, if a volume's DCO object is not mirrored across all hosts which are contributing to the storage of the volume, there is a possibility that volume recovery and mirror attaches for detached mirrors require full data re-synchronization. There is also a possibility of losing snapshots if nodes which are contributing to the storage of the DCO volume go down. This essentially defeats the purpose of having the default DCO volume in FSS environments.

RESOLUTION:
The code is modified so that whenever a new data mirror is added to a volume in an FSS diskgroup, a new DCO mirror is also added on the same disks as the data volume. This increases the redundancy of the DCO volume associated with the data volume and helps avoid the full data re-synchronization cases described above.

* 3446126 (Tracking ID: 3338208)

SYMPTOM:
Writes from a fenced-out LDOM guest node on an Active-Passive (AP/F) shared storage device fail with the following error message:

..
Mon Oct 14 06:03:39.411: I/O retry(6) on Path c0d8s2 belonging to Dmpnode
emc_clariion0_48
Mon Oct 14 06:03:39.951: SCSI error occurred on Path c0d8s2: opcode=0x2a
reported device not ready (status=0x2, key=0x2, asc=0x4, ascq=0x3) LUN not
ready, manual intervention required
Mon Oct 14 06:03:39.951: I/O analysis done as DMP_PATH_FAILURE on Path c0d8s2
belonging to Dmpnode emc_clariion0_48
Mon Oct 14 06:03:40.311: Marked as failing Path c0d1s2 belonging to Dmpnode
emc_clariion0_48
Mon Oct 14 06:03:40.671: Disabled Path c0d8s2 belonging to Dmpnode
emc_clariion0_48 due to path failure
..
..

DESCRIPTION:
The write SCSI commands from a fenced-out host should fail with a reservation conflict from the shared device. This error code needs to be propagated to upper layers for appropriate action.
In the DMP ioctl interface, DMP first sends the command through the available active paths. If the command fails, DMP retries the command through the passive paths. There the command fails with a not ready error code, and this error code gets propagated to the upper layer instead of the reservation conflict.

RESOLUTION:
The code is modified to prevent retrying of the IO SCSI commands on passive paths in case of Active-Passive (AP/F) shared storage device.

* 3447244 (Tracking ID: 3427171)

SYMPTOM:
When I/Os are issued on a volume associated with the Veritas Volume Manager (VxVM) block level SmartIO caching immediately after a system reboot, a system panic happens with the following stack trace:
vol_unspinlock()
volmv_accum_cache_stats()
vol_cache_read_done()
volkcontext_process()
voldiskiodone()
...

DESCRIPTION:
VxVM detects all the CPUs on which I/O is happening. VxVM SmartIO uses this facility to enable stats collection for each CPU detected. A problem with SmartIO has caused SmartIO not to detect CPU changes and to use wrong pointers for unlocking, thus leading to the system panic.

RESOLUTION:
The code is modified to properly identify the newly detected CPU before using locks.

* 3447245 (Tracking ID: 3385905)

SYMPTOM:
Corrupt data might be returned for reading a data volume with VxVM (Veritas Volume Manager) SmartIO caching enabled, if the VxVM cache area is brought offline with the command "sfcache offline <cachearea>" and later brought online without a system reboot.

DESCRIPTION:
When the cache area is brought offline, a few in-core data structures of the cache area are not reset. When it is brought online again, the cache area is re-created afresh, because the warm cache is not used in this case. Since the in-core structures now have stale information about the cache area, data corruption results.

RESOLUTION:
The code is modified to reinitialize the in-core data structures of the cache area, if VxVM doesn't use the warm cache during the cache area online operations.

* 3447306 (Tracking ID: 3424798)

SYMPTOM:
Veritas Volume Manager (VxVM) mirror attach operations (e.g., plex attach,
vxassist mirror, and third-mirror break-off snapshot resynchronization) may take
a long time under heavy application I/O load. The vxtask list command shows
that the tasks are in the 'auto-throttled (waiting)' state for a long time.

DESCRIPTION:
With the AdminIO de-prioritization feature, VxVM administrative I/Os (e.g., plex
attach, vxassist mirror, and third-mirror break-off snapshot resynchronization)
are de-prioritized under heavy application I/O load, but this can lead to very
slow progress of these operations.

RESOLUTION:
The code is modified to disable the AdminIO de-prioritization feature.

* 3447530 (Tracking ID: 3410839)

SYMPTOM:
Volume allocation operations, such as make, grow, and convert, may internally lead to layered layouts. In FSS (Flexible Storage Sharing) environment, when any node goes down, this eventually causes full mirror synchronization for the layered volume recovery.

DESCRIPTION:
During volume allocation operations, the vxassist(1M) command sometimes automatically creates a layered layout for better resiliency. The creation depends on criteria like volume size or certain upper limit settings.
Layered layouts prevent the complete volume from being disabled, in case the disable operation affects underlying disk(s). However, layered layouts don't support DCO (Data Change Object). Hence, volume recovery cannot do optimized mirror synchronization when any node goes down. This is a likely scenario in an FSS environment.

RESOLUTION:
The code is modified to add checks in vxassist allocation operations, to prevent internal change to layered layout for volumes in a FSS diskgroup.

* 3447894 (Tracking ID: 3353211)

SYMPTOM:
A. After EMC Symmetrix BCV (Business Continuance Volume) device switches to
read-write mode, continuous vxdmp (Veritas Dynamic Multi Pathing) error messages
flood syslog as shown below:

NOTE VxVM vxdmp V-5-3-1061 dmp_restore_node: The path 18/0x2 has not yet aged -
299
NOTE VxVM vxdmp 0 dmp_tur_temp_pgr: open failed: error = 6 dev=0x24/0xD0
NOTE VxVM vxdmp V-5-3-1062 dmp_restore_node: Unstable path 18/0x230 will not be
available for I/O until 300 seconds
NOTE VxVM vxdmp V-5-3-1061 dmp_restore_node: The path 18/0x2 has not yet aged -
299
NOTE VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x6) on dmpnode
36/0xD0
..
..

B. DMP metanode/path under DMP metanode gets disabled unexpectedly.

DESCRIPTION:
A. DMP caches the last discovery NDELAY open for the BCV dmpnode paths. BCV
device switching to read-write mode is an array side operation. Typically in
such cases, the system administrators are required to run the following command:

1. vxdisk rm <accessname>

OR

In case of parallel backup jobs,

1. vxdisk offline <accessname>
2. vxdisk online <accessname>

This causes DMP to close cached open, and during the next discovery, the device
is opened in read-write mode. If the above steps are skipped then it causes the
DMP device to go in state where one of the paths is in read-write mode and the
others remain in NDELAY mode.

If the upper layers request a NORMAL open, DMP has the code to close the
NDELAY cached open and reopen in NORMAL mode. When the dmpnode is online, this
happens only for one of the paths of the dmpnode.
B. DMP performs error analysis for paths on which I/O has failed. In some cases,
the SCSI probes that are sent fail with return values or sense codes that are
not handled by DMP. This causes the paths to get disabled.

RESOLUTION:
A. The code is modified for the DMP EMC ASL (Array Support Library) to handle
case A for EMC Symmetrix arrays.
B. The DMP code is modified to handle the SCSI conditions correctly for case B.

* 3449714 (Tracking ID: 3417185)

SYMPTOM:
Rebooting the host after the exclusion of a dmpnode, while I/O is in progress on it, causes the vxconfigd(1M) daemon to dump core.

DESCRIPTION:
The function which deletes the path after exclusion does not update the corresponding data structures properly. Consequently, rebooting the host after the exclusion of a dmpnode, while I/O is in progress on it, causes the vxconfigd(1M) daemon to dump core with the following stack:

ddl_find_devno_in_table()
ddl_get_disk_policy()
devintf_add_autoconfig_main()
devintf_add_autoconfig()
mode_set()
req_vold_enable()
request_loop()
main()

RESOLUTION:
The code is modified to update the data structure properly.

* 3452709 (Tracking ID: 3317430)

SYMPTOM:
The vxdiskunsetup(1M) utility fails during execution with the following error message:
"Device unexport failed: Operation is not supported".

DESCRIPTION:
The vxdisksetup(1M) utility calls the "vxdisk unexport" command without checking if the disk is exported or checking the CVM protocol version. When the bits are upgraded from 5.1SP1RP4, the CVM protocol version is not updated, which causes the error.

RESOLUTION:
The code is modified so that the "vxdisk unexport" command is called after proper checks.

* 3452727 (Tracking ID: 3279932)

SYMPTOM:
The vxdisksetup and vxdiskunsetup utilities fail on a disk which
is part of a deported disk group (DG), even if the '-f' option is specified.
The vxdisksetup command fails with the following error:

VxVM vxedpart ERROR V-5-1-10089 partition modification failed :
Device or resource busy

The vxdiskunsetup command fails with the following error:

VxVM vxdisk ERROR ERROR V-5-1-0 Device <Disk name> appears to be
owned by disk group <Disk group>. Use -f option to force destroy.
VxVM vxdiskunsetup ERROR V-5-2-5052 <Disk name>: Disk destroy failed.

DESCRIPTION:
The vxdisksetup and vxdiskunsetup utilities internally call the
'vxdisk' utility. Due to a defect in vxdisksetup and vxdiskunsetup, the vxdisk
operation fails on a disk which is part of a deported DG, even if the '-f'
option is specified by the user.

RESOLUTION:
The vxdisksetup and vxdiskunsetup utilities are modified so that the
operation succeeds when the "-f" option is specified.

* 3452811 (Tracking ID: 3445120)

SYMPTOM:
Changing the "vol_min_lowmem_sz" tunable triggers early readback.

DESCRIPTION:
The default value of the "vol_min_lowmem_sz" tunable is not consistent on all platforms.

RESOLUTION:
The code is modified to set the default value of the tunable to 32MB on all platforms.

* 3453105 (Tracking ID: 3079819)

SYMPTOM:
In Cluster Volume Manager (CVM), configuration backup and configuration restore operations fail on FSS disk groups.

DESCRIPTION:
When CVM takes a configuration backup or restores the configuration of disk groups, CVM carries out a SCSI inquiry on the disks belonging to a disk group. If a disk group has remote, lfailed, or lmissing disks on the specific node of a cluster, the SCSI inquiry on such disks fails because the node does not have connectivity to such disks. As a result, configuration backup and configuration restore may fail.

RESOLUTION:
The code is modified to address the afore-mentioned issue in the FSS environment, so that CVM only considers the disks physically connected to a node for configuration backup and restore.

* 3453163 (Tracking ID: 3331769)

SYMPTOM:
Latest disk group configuration fails to get restored while the disk group configuration corresponding to the old backup is restored.

DESCRIPTION:
When the configuration backup of a disk group is restored, the private region content gets updated on the disk. If the configuration backup is taken for the first time, the vxconfigbackup(1M) command reads the data from the disk and updates the buffer cache in turn. If you modify the disk group configuration and take a backup again within a short span, vxconfigbackup(1M) gets the data from the buffer cache. In such cases, the buffer cache has stale data, and hence vxconfigbackup(1M) backs up the old configuration data again.
The vxconfigrestore(1M) command then fails to restore the latest configuration when it is invoked on that backup.

RESOLUTION:
The code is modified to make sure vxconfigbackup always
queries for the required configuration data from disk and bypasses the buffer cache.

* 3455455 (Tracking ID: 3409612)

SYMPTOM:
Running "vxtune reclaim_on_delete_start_time <value>" fails if the specified
value is outside the range of 22:00-03:59 (for example, setting it to 04:00 or
19:30 fails).

DESCRIPTION:
Tunable reclaim_on_delete_start_time can be set to any time value within 00:00
to 23:59. But because of the wrong regular expression to parse time, it cannot
be set to all values in 00:00 - 23:59.

RESOLUTION:
The regular expression has been updated to parse time format correctly.
Now all values in 00:00-23:59 can be set.
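
A regular expression that accepts every 24-hour HH:MM value from 00:00 to 23:59 can be sketched in Python as follows. This is an illustration only; it is not the expression used by vxtune, and the function name is hypothetical.

```python
import re

# Hours 00-19 or 20-23, a colon, then minutes 00-59.
TIME_RE = re.compile(r"(?:[01][0-9]|2[0-3]):[0-5][0-9]")


def is_valid_start_time(value):
    """Return True if `value` is a 24-hour HH:MM time in 00:00-23:59."""
    return TIME_RE.fullmatch(value) is not None


for candidate in ("04:00", "19:30", "22:00", "03:59", "24:00", "12:60"):
    print(candidate, is_valid_start_time(candidate))
```

Using `fullmatch` rather than a partial match keeps trailing characters, such as a stray newline or extra digits, from being accepted.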

* 3456729 (Tracking ID: 3428025)

SYMPTOM:
When heavy parallel I/O load is issued, the system that runs Symantec Replication Option (VVR) and is configured as VVR primary panics with the stack:

vol_alloc()
vol_zalloc()
volsio_alloc_kc_types()
vol_subdisk_iogen_base()
volobject_iogen()
vol_rv_iogen()
volobject_iogen()
vol_rv_batch_write_start()
volkcontext_process()
vol_rv_start_next_batch()
vol_rv_batch_write_done()
[...]
vol_rv_batch_write_done()
volkcontext_process()
vol_rv_start_next_batch()
vol_rv_batch_write_done()
volkcontext_process()
vol_rv_start_next_batch()
vol_rv_batch_kio()
volkiostart()
vol_linux_kio_start()
vol_fsvm_strategy()

DESCRIPTION:
Heavy parallel I/O load leads to I/O throttling in Symantec Replication Option (VVR). Improper throttle handling leads to kernel stack overflow.

RESOLUTION:
The code is modified to handle I/O throttle correctly, which avoids stack overflow and subsequent panic.

* 3458036 (Tracking ID: 3418830)

SYMPTOM:
The node boot-up hangs while starting the vxconfigd(1M) daemon. It happens when the '/etc/VRTSvcs/conf/sysname' file or the '/etc/vx/vxvm-hostprefix' file is not present. You can see the following messages on the console.

# Starting up VxVM
...
# VxVM general startup...

After these messages, the system hangs.

DESCRIPTION:
While generating a unique prefix, the code calls scanf() instead of sscanf() to fetch the prefix. As a result, while starting the vxconfigd(1M) daemon, the system waits for user input because of scanf(), which results in the hang.

RESOLUTION:
The code is modified to address this issue.
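
The difference between the two calls is that scanf() reads from standard input, while sscanf() parses a string that is already in memory. The Python sketch below shows the same distinction by analogy. It is an illustration only, not the vxconfigd code; the function names and the sample host name are hypothetical.

```python
import io
import re

PREFIX_RE = re.compile(r"\s*(\S+)")


def prefix_from_string(text):
    """sscanf-style: parse the first token from text already held in memory."""
    match = PREFIX_RE.match(text)
    return match.group(1) if match else None


def prefix_from_stream(stream):
    """scanf-style: read a line from an input stream before parsing.

    On an interactive standard input with no data pending, the readline()
    call blocks until the user types something.
    """
    return prefix_from_string(stream.readline())


hostname = "node01.example.com\n"  # hypothetical data that is already available
print(prefix_from_string(hostname))

# A StringIO object stands in for standard input so that the example terminates.
print(prefix_from_stream(io.StringIO("typed-by-user\n")))
```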

* 3458799 (Tracking ID: 3197987)

SYMPTOM:
When the "vxddladm assign names file=<filename>" command is executed and the file has one or more invalid values for enclosure vendor ID or product ID, the vxconfigd(1M) daemon dumps core.

DESCRIPTION:
When the input file that is provided to the "vxddladm assign names file=<filename>" command has an invalid vendor ID or product ID, the vxconfigd(1M) daemon is unable to find the enclosure that is referred to and makes an invalid memory reference. The following stack trace can be seen:

strncasecmp() from /lib/libc.so.6
ddl_load_namefile()
req_ddl_set_names()
request_loop()
main()

RESOLUTION:
The code is modified to make the vxconfigd(1M) daemon verify the validity of input vendor ID and product ID before it makes a memory reference to the corresponding enclosure in its internal data structures.
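
The validate-before-dereference pattern described in the resolution can be sketched in Python as follows. This is an illustration only, not the vxconfigd code; the enclosure table, the function names, and the returned messages are hypothetical.

```python
# Hypothetical table of known enclosures keyed by (vendor ID, product ID).
ENCLOSURES = {("EMC", "SYMMETRIX"): "emc0"}


def find_enclosure(vendor_id, product_id):
    """Return the enclosure name, or None if the IDs are missing or unknown."""
    if not vendor_id or not product_id:
        return None
    # Case-insensitive comparison of the identifiers.
    return ENCLOSURES.get((vendor_id.upper(), product_id.upper()))


def load_name_entry(vendor_id, product_id, new_name):
    """Apply one entry of a name file, skipping entries with invalid IDs."""
    enclosure = find_enclosure(vendor_id, product_id)
    if enclosure is None:
        # Validate first: never touch an enclosure record that was not found.
        return f"skipped: unknown enclosure {vendor_id!r}/{product_id!r}"
    return f"renamed {enclosure} to {new_name}"


print(load_name_entry("emc", "symmetrix", "lab_array"))
print(load_name_entry("BOGUS", "NONE", "lab_array"))
```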

* 3470346 (Tracking ID: 3377383)

SYMPTOM:
vxconfigd crashes when a disk under DMP reports a device failure. After this,
the following error is seen when a VxVM (Veritas Volume Manager) command is
executed:
"VxVM vxdisk ERROR V-5-1-684 IPC failure: Configuration daemon is not
accessible"

DESCRIPTION:
If a disk fails and reports a certain failure to DMP (Veritas Dynamic
Multipathing), then vxconfigd crashes because that error is not handled
properly.

RESOLUTION:
The code is modified to properly handle the device failures reported by a failed
disk under DMP.

* 3484547 (Tracking ID: 3383625)

SYMPTOM:
When a cluster node that contributes the storage to the FSS (Flexible Storage Sharing) disk group rejoins the cluster, the local disks brought back by that node do not get reattached.

DESCRIPTION:
When any cluster node that contributes the storage to the FSS (Flexible Storage Sharing) disk group rejoins the cluster, the vxattachd daemon fails to identify the local disks of the joiner and hence the local disks brought back by that node are not available to the other cluster nodes.

RESOLUTION:
The code is modified to identify the disks properly.

* 3498795 (Tracking ID: 3394926)

SYMPTOM:
During cluster reconfiguration, VxVM (Veritas Volume Manager) commands hang intermittently with the following stack
on a slave node in a CVM (Veritas Cluster Volume Manager) environment with the Flexible Storage Sharing (FSS) feature in
use:
vol_commit_iowait_objects+
volktcvm_kmsgwait_objects
volktcvm_iolock_wait
vol_ktrans_commit
volconfig_ioctl
volsioctl_real
vols_ioctl
vols_compat_ioctl

DESCRIPTION:
During a transaction, if a joining node attempts to perform error processing and the processing gets aborted, VxVM commands
may hang, because the I/O drain does not proceed on the slave node due to the FSS diskgroup in CVM.

RESOLUTION:
Code changes are made to handle this case.

Patch ID: VRTSdbac-6.1.1.200

* 3953865 (Tracking ID: 3951435)

SYMPTOM:
RHEL6.x RETPOLINE kernels and RHEL 6.10 are not supported

DESCRIPTION:
Red Hat has released RHEL 6.10, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL6.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 6.10 and for RETPOLINE kernels on RHEL6.x is now introduced.

* 3831498 (Tracking ID: 3850806)

SYMPTOM:
VRTSdbac patch version does not work with RHEL6.7 (2.6.32-573.el6.x86_64 
kernel) and is unable to load the vcsmm module on RHEL6.7.

DESCRIPTION:
Installation of VRTSdbac patch version 6.1.1 fails on RHEL6.7 as 
the VCSMM module is not available on RHEL6.7 kernel 2.6.32-573.el6.x86_64. 
The system log file logs the following messages:
Starting VCSMM: 
ERROR: No appropriate modules found.
Error in loading module "vcsmm". See documentation.
Error : VCSMM driver could not be loaded.
Error : VCSMM could not be started.
Error : VCSMM could not be started.

RESOLUTION:
The VRTSdbac package is re-compiled with RHEL6.7 kernel in the build 
environment to mitigate the failure.

Patch ID: VRTSamf-6.1.1.300

* 3953864 (Tracking ID: 3951435)

SYMPTOM:
RHEL6.x RETPOLINE kernels and RHEL 6.10 are not supported

DESCRIPTION:
Red Hat has released RHEL 6.10, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL6.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 6.10 and for RETPOLINE kernels on RHEL6.x is now introduced.

* 3918066 (Tracking ID: 3918061)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 6 Update 9 
(RHEL6.9).

DESCRIPTION:
Veritas Infoscale Availability does not support RHEL versions later than RHEL6
Update 8.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 6 Update 9
(RHEL6.9) is now introduced.

* 3794198 (Tracking ID: 3794154)

SYMPTOM:
Veritas Cluster Server (VCS) does not support Red Hat Enterprise Linux 6 Update 7
(RHEL6.7).

DESCRIPTION:
VCS did not support RHEL versions released after RHEL6 Update 6.

RESOLUTION:
VCS support for Red Hat Enterprise Linux 6 Update 7 (RHEL6.7) is now introduced.

* 3389433 (Tracking ID: 3338946)

SYMPTOM:
The Process resource fails to register for offline monitoring with the
AMF kernel driver.

DESCRIPTION:
Sometimes when a process offline registration is requested and the
system is under a heavy load, the Asynchronous Monitoring Framework (AMF)
library fails to verify if the resource is actually offline. This failure
causes the registration to fail.

RESOLUTION:
The AMF library code is modified to handle the failure.

* 3389434 (Tracking ID: 3341320)

SYMPTOM:
The "Cannot delete event (rid %d) in reaper" error message is
repeatedly logged in the Syslog file.

DESCRIPTION:
This message is logged redundantly into Syslog when the agent
sends an unregister request to the Asynchronous Monitoring Framework (AMF) for
an event whose state has already changed.

RESOLUTION:
The AMF code is modified so that the error message does not appear
in the Syslog.

Patch ID: VRTSvxfen-6.1.1.300

* 3953863 (Tracking ID: 3951435)

SYMPTOM:
RHEL6.x RETPOLINE kernels and RHEL 6.10 are not supported

DESCRIPTION:
Red Hat has released RHEL 6.10, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL6.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 6.10 and for RETPOLINE kernels on RHEL6.x is now introduced.

* 3918064 (Tracking ID: 3918061)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 6 Update 9 
(RHEL6.9).

DESCRIPTION:
Veritas Infoscale Availability does not support RHEL versions later than RHEL6
Update 8.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 6 Update 9
(RHEL6.9) is now introduced.

* 3794198 (Tracking ID: 3794154)

SYMPTOM:
Veritas Cluster Server (VCS) does not support Red Hat Enterprise Linux 6 Update 7
(RHEL6.7).

DESCRIPTION:
VCS did not support RHEL versions released after RHEL6 Update 6.

RESOLUTION:
VCS support for Red Hat Enterprise Linux 6 Update 7 (RHEL6.7) is now introduced.

* 3370486 (Tracking ID: 3031216)

SYMPTOM:
1. If the specified disk group name contains a dash (-), (e.g. vxfendg-vmax) 
and there is another disk group in the system (e.g. vxfendg) whose name matches
the characters to the left of the dash in the specified disk group name, the
vxfentsthdw(1M) utility fails.

2. If the current disk group name contains a dash (-), (e.g. vxfendg-vmax) and
the replacement disk group name (e.g. vxfendg) matches the characters to the
left of the dash in the current disk group name, the vxfenswap(1M) utility fails
with the following message:
vxfenconfig ERROR V-11-2-1004 There must be an odd number of coordinator
disks defined

DESCRIPTION:
1. Under condition 1, the "vxfentsthdw -g <replacementDiskGroup>" command
attempts to test all the disks in both the replacement disk group and the
current fencing disk group. Since the disks in the current disk group are in
use and have keys, the utility fails and asks the user to shut down
fencing and continue.

2. Under condition 2, the "vxfenswap -g <replacementDiskGroup>" command fails
to validate the new coordination points because it conflates the disks in the
replacement disk group with the disks in the original disk group, which leads
to an even number of coordinator disks.

RESOLUTION:
The code is modified to correctly determine the disks of the disk
group which has a dash (-) in its name.
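
How two disk groups can be conflated when one name contains a dash can be sketched in Python as follows. This is an illustration only, not the vxfentsthdw or vxfenswap code; the disk names and the assumption that the name was effectively cut at the dash are hypothetical.

```python
# Hypothetical disk groups and member disks.
DISKS = {
    "vxfendg": ["disk_0", "disk_1", "disk_2"],
    "vxfendg-vmax": ["vmax_0", "vmax_1", "vmax_2"],
}


def disks_by_truncated_name(requested):
    """Assumed buggy pattern: names are compared only up to the first dash."""
    stem = requested.split("-")[0]
    return [
        disk
        for group, members in DISKS.items()
        if group.split("-")[0] == stem
        for disk in members
    ]


def disks_by_exact_name(requested):
    """Corrected pattern: the whole disk group name must match."""
    return list(DISKS.get(requested, []))


conflated = disks_by_truncated_name("vxfendg")
exact = disks_by_exact_name("vxfendg")
print(len(conflated), len(exact))
```

With the truncated comparison the two groups contribute six disks in total, an even number, which is consistent with the vxfenconfig error shown in the symptom. With the exact comparison only the three disks of the requested group are returned.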

* 3530046 (Tracking ID: 3471571)

SYMPTOM:
Cluster nodes may panic if you stop the HAD process by force on a node and
reboot or shut down that node.

DESCRIPTION:
When you execute the following steps on any of the nodes in a cluster, the 
kernel may panic on any of the nodes in the cluster:

1.      Stop the HAD process with the force flag:
# hastop -local -force
OR
# hastop -all -force

2.      Reboot or shut down the node.

Cluster nodes may panic because forcefully stopping VCS on a node leaves all
the applications, file systems, CVM, and other processes on that node online.
If you reboot the node with applications and processes in the online state, VCS
triggers a fencing race to avoid data corruption. The sub-cluster that loses
the race panics. To ensure that the rebooted node loses the race, increase the
LLT peerinact timeout on that node by running the lltconfig command.

RESOLUTION:
Symantec has fixed this issue so that the rebooted node loses the
fencing race and panics. The fix prevents the case where the rebooted node can
win the fencing race and therefore cause other nodes in the cluster to panic.

* 3532861 (Tracking ID: 3532859)

SYMPTOM:
The Coordination Point (Coordpoint) agent monitor fails if the cluster
includes more than 15 disks, or disks in raw mode with more than 15 paths, or 15
disks and 1 or more Coordination Point Servers.

DESCRIPTION:
The issue occurs because the Coordpoint agent acquires its data about
the coordination points using the vxfenconfig -L(1M) command, which cannot
handle this many coordination points and prints an invalid message.

RESOLUTION:
The code is modified to fix this issue.

Patch ID: VRTSgab-6.1.0.400

* 3953862 (Tracking ID: 3951435)

SYMPTOM:
RHEL6.x RETPOLINE kernels and RHEL 6.10 are not supported

DESCRIPTION:
Red Hat has released RHEL 6.10, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL6.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 6.10 and for RETPOLINE kernels on RHEL6.x is now introduced.

* 3918063 (Tracking ID: 3918061)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 6 Update 9 
(RHEL6.9).

DESCRIPTION:
Veritas Infoscale Availability does not support RHEL versions later than RHEL6
Update 8.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 6 Update 9
(RHEL6.9) is now introduced.

* 3794198 (Tracking ID: 3794154)

SYMPTOM:
Veritas Cluster Server (VCS) does not support Red Hat Enterprise Linux 6 Update 7
(RHEL6.7).

DESCRIPTION:
VCS did not support RHEL versions released after RHEL6 Update 6.

RESOLUTION:
VCS support for Red Hat Enterprise Linux 6 Update 7 (RHEL6.7) is now introduced.

* 3728108 (Tracking ID: 3728106)

SYMPTOM:
On Linux, the value corresponding to 15 minute CPU load average increases to about 4 even when clients of GAB are not running and the CPU usage is relatively low.

DESCRIPTION:
GAB reads the count of currently online CPUs to correctly adapt the client heartbeat timeout to the system load. Due to an issue, it accidentally overwrites the kernel's load average value. As a result, even though the actual CPU usage does not increase, the value observed in /proc/loadavg for the 15 minute CPU load average increases.

RESOLUTION:
The code is modified so that the GAB module does not overwrite the kernel's load average value.
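
To check whether a system shows this symptom, the 15 minute value can be read from /proc/loadavg. The following Python sketch parses the standard Linux /proc/loadavg format. The sample string and the thresholds used to flag a suspicious gap are made up for illustration and are not values documented for this patch.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    one, five, fifteen = (float(value) for value in fields[:3])
    running, total = (int(value) for value in fields[3].split("/"))
    return {
        "1min": one,
        "5min": five,
        "15min": fifteen,
        "running": running,
        "total": total,
        "last_pid": int(fields[4]),
    }


# Made-up sample: low short-term load but a 15 minute average near 4.
sample = "0.12 0.30 4.01 1/345 6789"
info = parse_loadavg(sample)

# Arbitrary illustrative thresholds, not documented values.
suspicious = info["15min"] > 1.0 and info["1min"] < 0.5
```

On a live system, the same function could be fed the text read from /proc/loadavg instead of the sample string.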

Patch ID: VRTSllt-6.1.1.500

* 3953861 (Tracking ID: 3951435)

SYMPTOM:
RHEL6.x RETPOLINE kernels and RHEL 6.10 are not supported

DESCRIPTION:
Red Hat has released RHEL 6.10, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL6.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.

RESOLUTION:
Support for RHEL 6.10 and for RETPOLINE kernels on RHEL6.x is now introduced.

* 3918062 (Tracking ID: 3918061)

SYMPTOM:
Veritas Infoscale Availability does not support Red Hat Enterprise Linux 6 Update 9 
(RHEL6.9).

DESCRIPTION:
Veritas Infoscale Availability does not support RHEL versions later than RHEL6
Update 8.

RESOLUTION:
Veritas Infoscale Availability support for Red Hat Enterprise Linux 6 Update 9
(RHEL6.9) is now introduced.

* 3889639 (Tracking ID: 3877459)

SYMPTOM:
Veritas Cluster Server (VCS) does not support Red Hat Enterprise Linux 6 Update 8
(RHEL6.8).

DESCRIPTION:
VCS did not support RHEL versions released after RHEL6 Update 7.

RESOLUTION:
VCS support for Red Hat Enterprise Linux 6 Update 8 (RHEL6.8) is now introduced.

* 3794198 (Tracking ID: 3794154)

SYMPTOM:
Veritas Cluster Server (VCS) does not support Red Hat Enterprise Linux 6 Update 7
(RHEL6.7).

DESCRIPTION:
VCS did not support RHEL versions released after RHEL6 Update 6.

RESOLUTION:
VCS support for Red Hat Enterprise Linux 6 Update 7 (RHEL6.7) is now introduced.

* 3646467 (Tracking ID: 3642131)

SYMPTOM:
Low Latency Transport (LLT) fails to start on Red Hat Enterprise Linux 
(RHEL) 6 Update 6.

DESCRIPTION:
On RHEL 6.6, LLT fails to start due to kABI incompatibility. The 
following error appears: 

# rpm -ivh VRTSllt-6.1.1.000-RHEL6.x86_64.rpm 
Preparing...                ########################################### [100%]
   1:VRTSllt                ########################################### [100%]

# /etc/init.d/llt start
Starting LLT: 
LLT: loading module...
ERROR: No appropriate modules found.
Error in loading module "llt". See documentation.
LLT:Error: cannot find compatible module binary

Alternatively, after an OS update, the following messages are logged in the 
system log file: 

kernel: llt: disagrees about version of symbol ib_create_cq
kernel: llt: Unknown symbol ib_create_cq
kernel: llt: disagrees about version of symbol rdma_resolve_addr
kernel: llt: Unknown symbol rdma_resolve_addr
kernel: llt: disagrees about version of symbol ib_dereg_mr
kernel: llt: Unknown symbol ib_dereg_mr
kernel: llt: disagrees about version of symbol rdma_reject
kernel: llt: Unknown symbol rdma_reject
kernel: llt: disagrees about version of symbol rdma_disconnect
kernel: llt: Unknown symbol rdma_disconnect
kernel: llt: disagrees about version of symbol rdma_resolve_route
kernel: llt: Unknown symbol rdma_resolve_route
kernel: llt: disagrees about version of symbol rdma_bind_addr
kernel: llt: Unknown symbol rdma_bind_addr
kernel: llt: disagrees about version of symbol rdma_create_qp
kernel: llt: Unknown symbol rdma_create_qp

RESOLUTION:
The VRTSllt package now includes a RHEL 6.6 compatible kernel module.

* 3376505 (Tracking ID: 3410309)

SYMPTOM:
The Low Latency Transport (LLT) driver fails to load and logs the
following message in the syslog when a mismatch is observed in the RDMA-specific
symbols.
llt: disagrees about version of symbol rdma_connect
llt: Unknown symbol rdma_connect
llt: disagrees about version of symbol rdma_destroy_id
llt: Unknown symbol rdma_destroy_id

DESCRIPTION:
The LLT driver fails to load when an external Open Fabrics 
Enterprise Distribution (OFED) (OFA or MLNX_OFED) stack is installed on a 
system. The OFED replaces the native RDMA-related drivers (shipped with the OS)
with the external OFED drivers. Since LLT is built against the native 
RDMA drivers, LLT fails due to a symbol mismatch when the startup script tries
to load LLT.

RESOLUTION:
The LLT startup script is modified to detect whether an external
OFED is installed on the system. If the script detects an external OFED, it 
loads a non-RDMA version of the LLT driver. Because this driver does not 
contain RDMA-specific symbols, it loads successfully, but it does not provide 
the RDMA functionality. In this case, LLT can be used in either Ethernet or 
UDP mode.
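
The selection logic described in this resolution can be sketched in Python as follows. This is only an illustration of the decision, not the actual startup script: the function names and the returned labels are hypothetical, and how the script detects an external OFED is not described here, so the sketch takes that result as a boolean input.

```python
def choose_llt_driver(external_ofed_installed):
    """Pick which LLT driver build to load (labels are hypothetical)."""
    if external_ofed_installed:
        # Avoids the RDMA symbol mismatch against the external OFED stack.
        return "llt-non-rdma"
    return "llt-rdma"


def available_transports(external_ofed_installed):
    """List the LLT transports usable with the chosen driver."""
    transports = ["ethernet", "udp"]
    if not external_ofed_installed:
        transports.append("rdma")
    return transports


driver_with_ofed = choose_llt_driver(True)
transports_with_ofed = available_transports(True)
transports_native = available_transports(False)
```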

* 3458677 (Tracking ID: 3460985)

SYMPTOM:
The system panics and logs the following error message in the syslog: 
kernel:BUG: soft lockup - CPU#8 stuck for 67s! [llt_rfreem:7125]

DESCRIPTION:
With the LLT-RDMA feature, Low Latency Transport (LLT) threads 
can be busy for a longer period of time without scheduling out of CPU 
depending on workload. The Linux kernel may falsely detect this activity as 
an unresponsive task or a soft lockup, thus causing the panic. This issue is 
observed in the LLT threads, such as lltd, llt_rdlv and llt_rfreem.

RESOLUTION:
The code is modified in order to avoid false alarms for soft
lockups. The LLT threads now schedule themselves out of the CPU after the
specified time, which is configured by default based on the 
watchdog_threshold/softlockup_threshold value in the system configuration. 
The default value can be changed using the lltconfig utility (lltconfig -T 
yieldthresh:value). This value decides when the LLT threads schedule 
themselves out.
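
The idea of a busy thread scheduling itself out after a threshold can be shown with a small Python sketch. This is a user-space analogy, not the LLT kernel code: the function and parameter names are hypothetical, and the clock is injected so that the example is deterministic.

```python
def run_with_yield(items, work, yield_threshold, clock, yield_cpu):
    """Process items, calling yield_cpu() once yield_threshold has elapsed."""
    slice_start = clock()
    yields = 0
    for item in items:
        work(item)
        if clock() - slice_start >= yield_threshold:
            yield_cpu()  # give other tasks a chance to run
            yields += 1
            slice_start = clock()
    return yields


# Deterministic demo with a fake clock: each unit of work "takes" 1 second.
fake_now = [0]


def fake_work(_item):
    fake_now[0] += 1


yield_count = run_with_yield(
    range(10), fake_work, 3, lambda: fake_now[0], lambda: None
)
```

In a real program, the clock could be time.monotonic and the yield callback something like a zero-length sleep. The threshold plays the role that the yieldthresh value plays for the LLT threads.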

Patch ID: VRTSglm-6.1.1.100

* 3953565 (Tracking ID: 3951759)

SYMPTOM:
GLM support for RHEL6.10 and RHEL6.x retpoline kernels

DESCRIPTION:
RHEL6.10 is a new release and has a retpoline kernel. Red Hat has also
released retpoline kernels for older RHEL6.x releases. The GLM module 
must be recompiled with a retpoline-aware GCC to support retpoline
kernels.

RESOLUTION:
Compiled GLM with Retpoline GCC.

Patch ID: VRTSgms-6.1.1.100

* 3953566 (Tracking ID: 3951761)

SYMPTOM:
GMS support for RHEL6.10 and RHEL6.x retpoline kernels

DESCRIPTION:
RHEL6.10 is a new release and has a retpoline kernel. Red Hat has also
released retpoline kernels for older RHEL6.x releases. The GMS module 
must be recompiled with a retpoline-aware GCC to support retpoline kernels.

RESOLUTION:
Compiled GMS with Retpoline GCC.

Patch ID: VRTSodm-6.1.1.500

* 3953564 (Tracking ID: 3951754)

SYMPTOM:
ODM support for RHEL6.10 and RHEL6.x retpoline kernels

DESCRIPTION:
RHEL6.10 is a new release and has a retpoline kernel. Red Hat has also
released retpoline kernels for older RHEL6.x releases. The ODM module 
must be recompiled with a retpoline-aware GCC to support retpoline
kernels.

RESOLUTION:
Compiled ODM with Retpoline GCC.

* 3864151 (Tracking ID: 3716577)

SYMPTOM:
After an upgrade from SFRAC 6.0.1 to 6.1.1, an attempt to create an INDEX of
100k+ BLOB fails with an error similar to the following:
Errors in file <file name> 
ORA-00600: internal error code, arguments: [ksfd_odmio1], [0x7F5FBBF8FF70], 
[0x7F5FBBF8FF98], [1], [ODM ERROR V-41-4-1-354-22 Invalid argument], [], [], 
[], [], [], [], []
...

DESCRIPTION:
The process in question received a SIGALRM during a clone() system call, which
caused the clone() call to back out of what it was doing. Backing out
of the partially completed clone() call confused ODM's process-exit-detection
mechanism, causing it to conclude that the process calling clone() was the one
being torn down. ODM tore down its kernel structures for that process, and the
next time the process tried to issue an I/O (more than 30 minutes later in this
case), it received the error.

RESOLUTION:
Fixed the code, so that a failed clone will not confuse ODM.

* 3864248 (Tracking ID: 3757609)

SYMPTOM:
High CPU usage because of contention over ODM_IO_LOCK

DESCRIPTION:
While performing ODM I/O, ODM_IO_LOCK is taken to update some of the ODM
counters. This leads to contention when multiple iodones try to update
these counters at the same time, which results in high CPU usage.

RESOLUTION:
Code modified to remove the lock contention.

* 3466074 (Tracking ID: 3349649)

SYMPTOM:
Oracle Disk Manager (ODM) module fails to load on RHEL6.5 with the following system log error message:

kernel: vxodm: disagrees about version of symbol putname
kernel: vxodm: disagrees about version of symbol getname

DESCRIPTION:
In RHEL6.5, the kernel interfaces for "getname" and "putname" used by VxFS have changed.

RESOLUTION:
The code is modified to use the latest definitions of the "getname" and "putname" kernel interfaces.

* 3507583 (Tracking ID: 3521933)

SYMPTOM:
Internal conformance testing with SSD cache enabled fails.

DESCRIPTION:
When the SSD cache is enabled, a failure is observed due to a mirror read error for the "ctl" file type. The mirror read fails due to a "re-silvering support" restriction based on the Oracle version, such as 10gR2 or 11g.

RESOLUTION:
The code is modified such that the re-silvering support is independent of Oracle versions.

* 3527969 (Tracking ID: 3529371)

SYMPTOM:
The package verification of VRTSodm on Linux using the rpm(1M) command with the "-V" option fails.

DESCRIPTION:
The package verification fails because the permission mode of the /dev/odm directory after the odm device is mounted differs from the entry in the rpm database.

RESOLUTION:
The creation-time permission mode for the /dev/odm directory is corrected.

* 3424619 (Tracking ID: 3349649)

SYMPTOM:
ODM modules fail to load on RHEL6.5, and the following error messages are
reported in the system log:
kernel: vxodm: disagrees about version of symbol putname
kernel: vxodm: disagrees about version of symbol getname

DESCRIPTION:
In RHEL6.5 the kernel interfaces for getname and putname used by
VxFS have changed.

RESOLUTION:
Code modified to use latest definitions of getname and putname
kernel interfaces.

Patch ID: VRTSvxfs-6.1.1.500

* 3953563 (Tracking ID: 3951752)

SYMPTOM:
VxFS support for RHEL6.10 and RHEL6.x retpoline kernels

DESCRIPTION:
RHEL6.10 is a new release and has a retpoline kernel. Red Hat has also
released retpoline kernels for older RHEL6.x releases. The VxFS module 
must be recompiled with a retpoline-aware GCC to support retpoline
kernels.

RESOLUTION:
Compiled VxFS with retpoline GCC.

* 3652109 (Tracking ID: 3553328)

SYMPTOM:
During internal testing, it was found that the per-node LCT file was
corrupted, which caused attribute inode reference counts to mismatch and
resulted in an fsck failure.

DESCRIPTION:
During clone creation LCT from 0th pindex is copied to the new
clone's LCT. Any update to this LCT file from non-zeroth pindex can cause count
mismatch in the new fileset.

RESOLUTION:
The code is modified to handle this issue.

* 3729811 (Tracking ID: 3719523)

SYMPTOM:
'vxupgrade' does not clear the superblock replica of old layout versions.

DESCRIPTION:
While upgrading the file system to a new layout version, a new superblock inode is allocated and an extent is allocated for the replica superblock. After writing the new superblock (primary + replica), VxFS frees the extent of the old superblock replica.
If the primary superblock is later corrupted, the full fsck searches for a replica to repair the file system. If it finds the replica of the old superblock, it restores the file system to the old layout instead of the new one, which is incorrect.
To keep the file system at the new version, vxupgrade must clear the replica of the old superblock so that a full fsck does not detect it later.

RESOLUTION:
Clear the replica of old superblock as part of vxupgrade.

* 3765326 (Tracking ID: 3736398)

SYMPTOM:
Panic occurs in the lazy unmount path during deinit of VxFS-VxVM API.

DESCRIPTION:
The panic occurs when an exiting thread drops the last reference to 
a lazy-unmounted VxFS file system which is the last VxFS mount on the system. The 
exiting thread does unmount, which then makes call into VxVM to de-initialize the 
private FS-VM API as it is the last VxFS mounted file system. The function to be 
called in VxVM is looked-up via the files under /proc. This requires a file to be 
opened, but the exit processing has removed the structures needed by the thread 
to open a file, which results in a panic.

RESOLUTION:
The code is modified to pass the deinit work to a worker thread.

* 3852733 (Tracking ID: 3729158)

SYMPTOM:
The fuser and other commands hang on VxFS file systems.

DESCRIPTION:
The hang is seen when two threads contend for two locks, ILOCK and 
PLOCK. The writeadvise thread owns the ILOCK but is waiting for the PLOCK, 
while the dalloc thread owns the PLOCK and is waiting for the ILOCK.

RESOLUTION:
The code is modified to correct the order of locking. PLOCK is now 
acquired before ILOCK.
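
The effect of a consistent lock order can be shown with a minimal Python sketch. The two threading.Lock objects only stand in for PLOCK and ILOCK, and the thread names are borrowed from the description above. This is an analogy, not VxFS code.

```python
import threading

plock = threading.Lock()  # stands in for PLOCK
ilock = threading.Lock()  # stands in for ILOCK
completed = []


def with_both_locks(name):
    # Every thread takes PLOCK first and ILOCK second, so no thread can
    # hold ILOCK while waiting for PLOCK, and the circular wait is gone.
    with plock:
        with ilock:
            completed.append(name)


threads = [
    threading.Thread(target=with_both_locks, args=(name,))
    for name in ("writeadvise", "dalloc")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

If one thread took the locks in the opposite order, each thread could end up holding one lock while waiting for the other, which is the hang described above.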

* 3852736 (Tracking ID: 3457801)

SYMPTOM:
Kernel panics in block_invalidatepage().

DESCRIPTION:
The address-space struct of a page has "a_ops" as "vx_empty_aops".  
This is an empty structure, so do_invalidatepage() calls block_invalidatepage() - 
but these pages have VxFS's page buffer-heads attached, not kernel buffer-heads.  
So, block_invalidatepage() panics.

RESOLUTION:
Code is modified to fix this by flushing pages before 
vx_softcnt_flush.

* 3861521 (Tracking ID: 3549057)

SYMPTOM:
The "relatime" mount option is wrongly shown in /proc/mounts.

DESCRIPTION:
The "relatime" mount option is wrongly shown in /proc/mounts. VxFS does
not understand the relatime mount option; it comes from the Linux kernel.

RESOLUTION:
Code is modified to handle the issue.

* 3864007 (Tracking ID: 3558087)

SYMPTOM:
When the stat system call is executed on a VxFS file system with the delayed
allocation feature enabled, it may take a long time or cause high CPU
consumption.

DESCRIPTION:
When the delayed allocation (dalloc) feature is turned on, the
flushing process takes a long time. The process keeps the get page lock held, and
needs writers to keep the inode reader writer lock held. The stat system call may
keep waiting for the inode reader writer lock.

RESOLUTION:
Delayed allocation code is redesigned to keep the get page lock
unlocked while flushing.

* 3864010 (Tracking ID: 3269553)

SYMPTOM:
VxFS returns inappropriate message for read of hole via ODM.

DESCRIPTION:
Sometimes sparse files containing temp or backup/restore files are
created outside the Oracle database. And, Oracle can read these files only using
the ODM. As a result, ODM fails with an ENOTSUP error.

RESOLUTION:
The code is modified to return zeros instead of an error.
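
The "return zeros for a hole" behavior can be sketched in Python with a toy sparse-file model. The extent map, function name, and sample data are hypothetical. The sketch only illustrates reading across a hole, not the VxFS or ODM read path.

```python
def read_sparse(extents, offset, length):
    """Read from a toy sparse file.

    extents maps a start offset to the bytes allocated there; any range
    not covered by an extent is a hole and reads back as zeros.
    """
    buf = bytearray(length)  # zero-filled, so holes need no special case
    for start, data in extents.items():
        lo = max(start, offset)
        hi = min(start + len(data), offset + length)
        if lo < hi:
            buf[lo - offset:hi - offset] = data[lo - start:hi - start]
    return bytes(buf)


# Bytes 4..7 are a hole between two allocated extents.
extents = {0: b"AAAA", 8: b"BBBB"}
result = read_sparse(extents, 2, 8)
```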

* 3864013 (Tracking ID: 3811849)

SYMPTOM:
The system panics due to a size mismatch in the cluster-wide buffers containing hash bucket data. The offending stack looks like the following:

   $cold_vm_hndlr
   bubbledown
   as_ubcopy
   vx_populate_bpdata
   vx_getblk_clust
   $cold_vx_getblk
   vx_exh_getblk
   vx_exh_get_bucket
   vx_exh_lookup
   vx_dexh_lookup
   vx_dirscan
   vx_dirlook
   vx_pd_lookup
   vx_lookup_pd
   vx_lookup
   lookupname
   lstat
   syscall

On some platforms, instead of a panic, LDH corruption can be reported. Full fsck can report some metadata inconsistencies, which look like the following sample messages: 

fileset 999 primary-ilist inode 263 has invalid alternate directory index
        (fileset 999 attribute-ilist inode 8193), clear index? (ynq)y
fileset 999 primary-ilist inode 29879 has invalid alternate directory index
        (fileset 999 attribute-ilist inode 8194), clear index? (ynq)y
fileset 999 primary-ilist inode 1070691 has invalid alternate directory index
        (fileset 999 attribute-ilist inode 24582), clear index? (ynq)y
fileset 999 primary-ilist inode 1262102 has invalid alternate directory index
        (fileset 999 attribute-ilist inode 8198), clear index? (ynq)y

DESCRIPTION:
On a very fragmented file system with FS block sizes 1K, 2K or 4K, any segment of the hash inode (i.e. buckets/CDF/directory segment with fixed size: 8K) can 
spread across multiple extents.

Instead of initializing the buffers on the final bmap after all allocations are finished, the LDH code allocates the buffer-cache buffers as the allocations come along. As a result, small allocations can be merged in the final bmap; for example, two CFS nodes can end up having buffers that represent the same metadata with different sizes. This leads to panics because the buffers are passed around the cluster, or the corruption reaches the LDH portions on the disk.

RESOLUTION:
The code is modified to separate the allocation and buffer initialization in LDH code paths.

* 3864035 (Tracking ID: 3790721)

SYMPTOM:
High CPU usage is seen on the vxfs thread process. The backtrace of such threads
usually looks like this:

schedule
schedule_timeout
__down
down
vx_send_bcastgetemapmsg_remaus
vx_send_bcastgetemapmsg
vx_recv_getemapmsg
vx_recvdele
vx_msg_recvreq
vx_msg_process_thread
vx_kthread_init
kernel_thread

DESCRIPTION:
The locking mechanism in vx_send_bcastgetemapmsg_process() is inefficient. Every
time vx_send_bcastgetemapmsg_process() is called, it performs a series of down-up
operations on a certain semaphore. This can result in a huge CPU cost when multiple
threads contend for this semaphore.

RESOLUTION:
The locking mechanism in vx_send_bcastgetemapmsg_process() is optimized
so that it performs the down-up operation on the semaphore only once.

* 3864036 (Tracking ID: 3233276)

SYMPTOM:
On a 40 TB file system, the fsclustadm setprimary command takes more than 2 minutes to execute. An unmount operation that causes a primary migration also takes longer.

DESCRIPTION:
The old primary needs to process the delegated allocation units while migrating
from primary to secondary. The inefficient implementation of the allocation unit
list is consuming more time while removing the element from the list. As the file system size increases, the allocation unit list also increases, which results in additional migration time.

RESOLUTION:
The code is modified to process the allocation unit list efficiently. With this modification, the primary migration is completed in 1 second on the 40 TB file system.

* 3864037 (Tracking ID: 3616907)

SYMPTOM:
While performing the garbage collection operation, VxFS causes the non-maskable 
interrupt (NMI) service to stall.

DESCRIPTION:
With a highly fragmented Reference Count Table (RCT), when a garbage collection 
operation is performed, the CPU could be used for a longer duration. The CPU 
could be busy if a potential entry that could be freed is not identified.

RESOLUTION:
The code is modified such that the CPU is released after a specified time 
interval.

* 3864038 (Tracking ID: 3596329)

SYMPTOM:
System panic in aio codepaths

DESCRIPTION:
On Linux, there has always been an issue with threads exiting with async DIOs 
inflight.  With the kernel's export restrictions, it is not possible for VxFS 
to take/drop a hold on a thread's mm_struct.  This leads to the issue where 
VxFS can use the mm_struct after it has been destroyed (due to thread exit).
On RHEL7, the issue is worse in that no threads (cloned and non-cloned) wait 
for any AIO before the mm_struct/task_struct is destroyed.  This can lead to 
panic and memory corruption (as VxFS continues to use stale pointers).

RESOLUTION:
The code is modified to fix the references to destroyed structs in VxFS for AIO 
while maintaining good I/O throughput.

* 3864040 (Tracking ID: 3633683)

SYMPTOM:
"top" command output shows vxfs thread consuming high CPU while 
running an application that makes excessive sync() calls.

DESCRIPTION:
To process the sync() system call, vxfs scans through the inode cache, 
which is a costly operation. If a user application issues excessive 
sync() calls and vxfs file systems are mounted, the vxfs sync 
processing thread can consume high CPU.

RESOLUTION:
All the sync() requests issued in the last 60 seconds are combined into a 
single request.
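
One way to picture combining requests within a window is the Python sketch below. It is a simplified model under stated assumptions, not the VxFS implementation: the class and method names are hypothetical, the periodic tick() call stands in for whatever drives the real sync processing thread, and time is passed in explicitly so that the example is deterministic.

```python
class SyncCoalescer:
    """Merge all requests that arrive within one window into one scan."""

    def __init__(self, do_scan, window=60.0):
        self.do_scan = do_scan
        self.window = window
        self.pending = 0
        self.window_start = None

    def request(self, now):
        # The first request of a batch opens the window.
        if self.pending == 0:
            self.window_start = now
        self.pending += 1

    def tick(self, now):
        """Run one scan for the whole batch once the window has elapsed."""
        if self.pending and now - self.window_start >= self.window:
            merged = self.pending
            self.pending = 0
            self.do_scan()
            return merged
        return 0


scans = []
coalescer = SyncCoalescer(lambda: scans.append("scan"))
for t in range(0, 60, 10):  # six requests at t = 0, 10, ..., 50
    coalescer.request(t)

early = coalescer.tick(30)   # window not over yet, nothing runs
merged = coalescer.tick(60)  # one scan serves all six requests
```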

* 3864041 (Tracking ID: 3613048)

SYMPTOM:
The system can panic with the following stack:
 
machine_kexec
crash_kexec
oops_end
die
do_invalid_op
invalid_op
aio_complete
vx_naio_worker
vx_kthread_init

DESCRIPTION:
VxFS does not correctly support IOCB_CMD_PREADV and IOCB_CMD_PWRITEV, which 
causes a BUG to fire in the kernel code (in fs/aio.c:__aio_put_req()).

RESOLUTION:
Support is added for the vectored AIO commands, and the increment of ->ki_users 
is fixed so that it is guarded by the required spinlock.

* 3864042 (Tracking ID: 3466020)

SYMPTOM:
File system is corrupted with the following error message in the log:

WARNING: msgcnt 28 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren
 WARNING: msgcnt 27 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren
 WARNING: msgcnt 26 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren
 WARNING: msgcnt 25 mesg 096: V-2-96: vx_setfsflags -
 /dev/vx/dsk/a2fdc_cfs01/trace_lv01 file system fullfsck flag set - vx_direr
 WARNING: msgcnt 24 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile
file system dir inode 3277090 dev/block 0/0 diren

DESCRIPTION:
In case an error is returned from the vx_dirbread() function via the 
vx_dexh_keycheck1() function, the FULLFSCK flag is set on the file system 
unconditionally. A corrupted Large Directory Hash (LDH) can lead to the 
incorrect block being read, which results in the FULLFSCK flag being set. The 
system does not verify whether it reads the incorrect value due to a corrupted 
LDH. Subsequently, the FULLFSCK flag is set unnecessarily, because a corrupted 
LDH is fixed online by recreating the hash.

RESOLUTION:
The code is modified such that when an LDH corruption is detected, the system 
removes the LDH, instead of setting FULLFSCK. The LDH is recreated the next 
time the directory is modified.

* 3864146 (Tracking ID: 3690078)

SYMPTOM:
The system panics at vx_dev_strategy() routine with the following stack trace:
vx_snap_strategy()
vx_logbuf_write() 
vx_logbuf_io()
vx_logbuf_flush() 
vx_logflush()
vx_mapstrategy() 
vx_snap_strategy() 
vx_clonemap() 
vx_unlockmap() 
vx_holdmap() 
vx_extmaptran() 
vx_extmapchange() 
vx_extprevfind() 
vx_extentalloc() 
vx_te_bmap_alloc() 
vx_bmap_alloc_typed() 
vx_bmap_alloc() 
vx_get_alloc() 
vx_cfs_pagealloc() 
vx_alloc_getpage() 
vx_do_getpage() 
vx_internal_alloc() 
vx_write_alloc() 
vx_write1() 
vx_write_common_slow()
vx_write_common() 
vx_vop_write()
vx_writev() 
vx_naio_write_v2() 
vfs_writev()

DESCRIPTION:
The issue was observed due to the low handoff limit of vx_extprevfind.

RESOLUTION:
The code is modified to avoid the stack overflow.

* 3864148 (Tracking ID: 3695367)

SYMPTOM:
Unable to remove volume from multi-volume VxFS using "fsvoladm" command. It fails with "Invalid argument" error.

DESCRIPTION:
Volumes are not added to the in-core volume list structure correctly. Therefore, removing a volume from a multi-volume VxFS using the "fsvoladm" command fails.

RESOLUTION:
The code is modified to add volumes in the in-core volume list structure correctly.

* 3864150 (Tracking ID: 3602322)

SYMPTOM:
System may panic while flushing the dirty pages of the inode.

DESCRIPTION:
Panic may occur due to the synchronization problem between one
thread that flushes the inode, and the other thread that frees the chunks that
contain the inodes on the freelist. 

The thread that frees the chunks of inodes on the freelist grabs an inode and 
clears the inode pointer while deinitializing the inode. This may result in a 
NULL pointer dereference if the flusher thread is working on the same inode.

RESOLUTION:
The code is modified to resolve the race condition by taking proper
locks on the inode and freelist whenever a pointer in the inode is
dereferenced. 

If the inode pointer is already de-initialized to NULL, then the flushing is 
attempted on the next inode.

* 3864153 (Tracking ID: 3685391)

SYMPTOM:
Execute permissions for a file not honored correctly.

DESCRIPTION:
The user was able to execute the file despite not having execute permissions.

RESOLUTION:
The code is modified such that an error is reported when the execute permissions are not applied.

* 3864154 (Tracking ID: 3689354)

SYMPTOM:
Users having write permission on file cannot open the file with O_TRUNC
if the file has setuid or setgid bit set.

DESCRIPTION:
On Linux, the kernel triggers an explicit mode change as part of
O_TRUNC processing to clear the setuid/setgid bit. Only the file owner or a
privileged user is allowed to perform a mode change operation. Hence, for a
non-privileged user who is not the file owner, the mode change operation fails,
causing the open() system call to return EPERM.

RESOLUTION:
The mode change request to clear the setuid/setgid bit that comes as part of
O_TRUNC processing is now allowed for other users.
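
The mode change involved here is only a matter of clearing two bits. The Python sketch below shows it with the standard stat module constants. It is a simplification for illustration: the function name is hypothetical, and the exact rules that the kernel applies when deciding which bits to clear are not modeled.

```python
import stat


def mode_after_truncate(mode):
    """Return the mode with the setuid and setgid bits cleared."""
    return mode & ~(stat.S_ISUID | stat.S_ISGID)


# A regular file with the setuid bit and rwxr-xr-x permissions.
before = stat.S_IFREG | stat.S_ISUID | 0o755
after = mode_after_truncate(before)

# All other mode bits, including the file type, are left untouched.
still_regular = stat.S_ISREG(after)
```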

* 3864155 (Tracking ID: 3707662)

SYMPTOM:
A race between reorg processing and the fsadm timer thread (alarm expiry) leads to a panic in vx_reorg_emap with the following stack:

vx_iunlock
vx_reorg_iunlock_rct_reorg
vx_reorg_emap
vx_extmap_reorg
vx_reorg
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
fop_ioctl
ioctl

DESCRIPTION:
When the timer expires (fsadm with -t option), vx_do_close() calls vx_reorg_clear() on local mount which performs cleanup on reorg rct inode. Another thread currently active in vx_reorg_emap() will panic due to null pointer dereference.

RESOLUTION:
When fop_close is called in the alarm handler context, the cleanup is deferred until the kernel thread performing the reorg completes its operation.

* 3864156 (Tracking ID: 3662284)

SYMPTOM:
File Change Log (FCL) read may return ENXIO as follows:

# file changelog 
changelog: ERROR: cannot read `changelog' (No such device or address)

DESCRIPTION:
VxFS reads FCL file and returns ENXIO when there is a HOLE in the file.

RESOLUTION:
The code is modified to zero out the user buffer when hitting a hole if FCL read
is from user space.

* 3864160 (Tracking ID: 3691633)

SYMPTOM:
Remove RCQ Full messages

DESCRIPTION:
Too many unnecessary RCQ Full messages were logged in the system log.

RESOLUTION:
The RCQ Full messages are removed from the code.

* 3864161 (Tracking ID: 3708836)

SYMPTOM:
When using fallocate together with delayed extending write, data corruption may happen.

DESCRIPTION:
When doing fallocate after EOF, vxfs grows the file by splitting the last extent of the file into two parts, then converts the part after EOF to a ZFOD extent. During this procedure, a stale file size is used to calculate the start offset of the newly zeroed extent. This may overwrite the blocks which contain the unflushed data generated by the extending write and cause data corruption.

RESOLUTION:
The code is modified to use up-to-date file size instead of the stale file size, to make sure the new ZFOD extent is created correctly.

* 3864163 (Tracking ID: 3712961)

SYMPTOM:
An SFCFS cluster with ODM panics after the following steps:
1)	use RDMA heartbeat for LLT
2)	use FSS
3)	disconnect one LLT link, one machine will panic

Panic stack is as follows:

vx_dio_physio
vx_dio_rdwri
fdd_write_end
fdd_rw
fdd_odm_rw
odm_vx_io
odm_io_start
odm_io_req
odm_io
odm_io_stat
odm_ioctl_ctl
odm_ioctl_ctl_unlocked
vfs_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath

DESCRIPTION:
VxFS detects stack overflow and calls BUG_ON() to panic the kernel. The stack
overflow is detected right after VxFS submits the I/O to lower layer. The stack
overflow does not happen in VxFS, but somewhere in lower layers.

RESOLUTION:
There are two kernel parameters which can be used to resolve this issue. By
configuring these two parameters, a thread hand-off can be added before
submitting the I/Os to VxVM when there's no sufficient stack space left. These
parameters are not run time parameters. These can be set at module load time only.

The following two module parameters need to be configured for this solution:

1. vxfs_io_proxy_vxvm  - 	If set, include VxVM devices in I/O hand-off decisions.

2. vxfs_io_proxy_level - 	When free stack space is less than this, an I/O is
handed-off to a proxy. The default value of vxfs_io_proxy_level is 4K bytes.

Set the above VxFS kernel parameters using a .conf file as follows.

a) Create the vxfs.conf file inside the /etc/modprobe.d directory.
    
      touch /etc/modprobe.d/vxfs.conf
       
b) Now copy the following lines into the vxfs.conf file.
   options vxfs vxfs_io_proxy_vxvm=1
   options vxfs vxfs_io_proxy_level=6144

Check through the crash debugger or any other debugger whether these values
have been set.

For example:

crash> vxfs_io_proxy_vxvm
vxfs_io_proxy_vxvm = $3 = 1
crash> vxfs_io_proxy_level
vxfs_io_proxy_level = $4 = 6144
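
When the same settings must be applied to several nodes, the two option lines can be generated and checked programmatically. The Python sketch below renders the lines shown in step b) and parses them back. The helper names are hypothetical, and the sketch deliberately works on strings only and does not write to /etc/modprobe.d.

```python
def render_vxfs_options(proxy_vxvm=1, proxy_level=6144):
    """Render the modprobe option lines for the two VxFS parameters."""
    return (
        f"options vxfs vxfs_io_proxy_vxvm={proxy_vxvm}\n"
        f"options vxfs vxfs_io_proxy_level={proxy_level}\n"
    )


def parse_vxfs_options(text):
    """Collect name=value pairs from 'options vxfs ...' lines."""
    params = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "options" and parts[1] == "vxfs":
            for assignment in parts[2:]:
                name, _, value = assignment.partition("=")
                params[name] = int(value)
    return params


conf_text = render_vxfs_options()
parsed = parse_vxfs_options(conf_text)
```

Because these are module load time parameters, a changed file takes effect only when the vxfs module is next loaded.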

* 3864164 (Tracking ID: 3762125)

SYMPTOM:
Directory size sometimes keeps increasing even though the number of files inside it doesn't 
increase.

DESCRIPTION:
This issue occurs only on CFS. A variable in the directory inode structure marks the start of 
directory free space. But when the directory ownership changes, the variable may become stale, which 
could cause this issue.

RESOLUTION:
The code is modified to reset this free space marking variable when there is an 
ownership change. The space search now starts from the beginning of the directory inode.

* 3864166 (Tracking ID: 3731844)

SYMPTOM:
umount -r option fails for vxfs 6.2 with error "invalid options"

DESCRIPTION:
Until 6.2, vxfs did not have a umount helper on Linux. A helper was added in 6.2,
and as a result each call to Linux's umount also invokes the umount
helper binary. The -r option, which was previously handled only by the Linux
native umount, is therefore forwarded to the umount.vxfs helper, which exits while
processing the option string because vxfs does not support read-only remounts.

RESOLUTION:
The umount.vxfs code is changed so that it does not exit on the
"-r" option. Read-only remounts are still not supported, so if umount -r
actually fails and the OS umount attempts a read-only remount, the mount.vxfs
binary exits with an error. This solves the problem of Linux's default
scripts not working for vxfs.

* 3864167 (Tracking ID: 3735697)

SYMPTOM:
vxrepquota reports errors such as the following:
# vxrepquota -u /vx/fs1
UX:vxfs vxrepquota: ERROR: V-3-20002: Cannot access /dev/vx/dsk/sfsdg/fs1:ckpt1: No such file or directory
UX:vxfs vxrepquota: ERROR: V-3-24996: Unable to get disk layout version

DESCRIPTION:
vxrepquota checks each mount point entry in the mounted file system
table. If a checkpoint mount point entry appears before the mount point
specified in the vxrepquota command, vxrepquota reports errors, although the
command itself can succeed.

RESOLUTION:
vxrepquota now skips checkpoint mount point entries in the mounted file system table.
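
The skip can be pictured as a filter over the mounted file system table. This is only a sketch: it assumes a checkpoint device is written as <volume>:<checkpoint name>, as in the /dev/vx/dsk/sfsdg/fs1:ckpt1 path from the error message, and the mount point /vx/fs1_ckpt1 is made up for the example. It is not the vxrepquota implementation.

```python
def non_checkpoint_mounts(mount_table):
    """Yield (device, mount_point) pairs, skipping checkpoint mounts.

    Assumption: a checkpoint device looks like <volume>:<checkpoint name>,
    so a ':' in the last path component marks a checkpoint entry.
    """
    for device, mount_point in mount_table:
        last_component = device.rsplit("/", 1)[-1]
        if ":" in last_component:
            continue  # checkpoint entry: not a device node, so skip it
        yield device, mount_point


# The checkpoint entry comes first, which is the ordering that caused the errors.
table = [
    ("/dev/vx/dsk/sfsdg/fs1:ckpt1", "/vx/fs1_ckpt1"),  # hypothetical mount point
    ("/dev/vx/dsk/sfsdg/fs1", "/vx/fs1"),
]
print(list(non_checkpoint_mounts(table)))
```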

* 3864170 (Tracking ID: 3743572)

SYMPTOM:
The file system may hang when it reaches the 1 billion inode limit. The
stack of the hung thread is as follows:

vx_svar_sleep_unlock
vx_event_wait
vx_async_waitmsg
vx_msg_send
llt_msgalloc
vx_cfs_getias
vx_update_ilist
vx_find_partial_au
vx_cfs_noinode
vx_noinode
vx_dircreate_tran
vx_pd_create
vx_dirlook
vx_create1_pd
vx_create1
vx_create_vp
vx_create

DESCRIPTION:
The maximum number of inodes supported by VxFS is 1 billion. 
When the file system is running out of inodes and the maximum inode 
allocation unit(IAU) limit is reached, VxFS can still create two extra IAUs 
if there is a hole in the last IAU. Because of the hole, when a secondary 
requests more inodes, the primary still thinks there is a hole available and 
notifies the secondary to retry. However, the secondary fails to find a slot 
since the 1 billion limit is hit, then it goes back to the primary to 
request free inodes again, and this loops infinitely, hence the hang.

RESOLUTION:
When the maximum IAU number is reached, the primary is prevented from
creating the extra IAUs.

* 3864172 (Tracking ID: 3808091)

SYMPTOM:
When using fallocate together with delayed extending write, data corruption may 
happen.

DESCRIPTION:
When fallocate is performed past EOF, VxFS grows the file by splitting the last
extent of the file into two parts and converting the part after EOF to a ZFOD
extent. During this procedure, a stale file size is used to calculate the start
offset of the newly zeroed extent. This may overwrite blocks that contain
unflushed data generated by the extending write, causing data corruption.

RESOLUTION:
Use up to date file size instead of the stale file size, to make sure the new 
ZFOD extent is created correctly.

* 3864173 (Tracking ID: 3779916)

SYMPTOM:
vxfsconvert fails to upgrade the layout version for a VxFS file system with a
large number of inodes. The error message shows an inode discrepancy.

DESCRIPTION:
vxfsconvert walks through the ilist and converts inodes. It stores
chunks of inodes in a buffer and processes them as a batch. The inode number
parameter for this inode buffer is of type unsigned integer. The offset of a
particular inode in the ilist is calculated by multiplying the inode number by
the size of the inode structure. For large inode numbers, the product
inode_number * inode_size can overflow the unsigned integer limit, giving a wrong
offset within the ilist file. vxfsconvert therefore reads the wrong inode and
eventually fails.

RESOLUTION:
The inode number parameter is defined as unsigned long to avoid 
overflow.
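
The overflow can be reproduced with simple arithmetic. The sketch below emulates 32-bit and 64-bit unsigned multiplication with bit masks. The inode size of 256 bytes and the inode number are assumptions chosen for illustration, and the function names are hypothetical.

```python
UINT32_MASK = 0xFFFFFFFF
UINT64_MASK = 0xFFFFFFFFFFFFFFFF
INODE_SIZE = 256  # bytes; assumed for illustration only


def ilist_offset_uint(inode_number):
    """Offset as computed with a 32-bit unsigned int (wraps modulo 2**32)."""
    return (inode_number * INODE_SIZE) & UINT32_MASK


def ilist_offset_ulong(inode_number):
    """Offset as computed with a 64-bit unsigned long (LP64 Linux)."""
    return (inode_number * INODE_SIZE) & UINT64_MASK


ino = 20_000_000  # hypothetical large inode number
print(ilist_offset_ulong(ino))  # true offset: 5120000000
print(ilist_offset_uint(ino))   # wrapped offset: 825032704
```

With these assumed values the 32-bit result points about 4 GB short of the real position in the ilist, which matches the "reads the wrong inode" behavior described above.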

* 3864177 (Tracking ID: 3808033)

SYMPTOM:
After a service group is set offline via VOM or VCS, the Oracle process is left in an unkillable state.

DESCRIPTION:
Whenever ODM issues an async request to FDD, FDD is required to do iodone processing on it, regardless of how far the request gets. The forced unmount causes FDD to take one of the early error branches, which misses the iodone routine for this particular async request. From ODM's perspective, the request is submitted, but iodone is never called. This has several bad consequences, one of which is that a user thread waiting for the request is blocked uninterruptibly forever.

RESOLUTION:
The code is modified to add iodone routine in the error handling code.

* 3864178 (Tracking ID: 1428611)

SYMPTOM:
'vxcompress' command can cause many GLM block lock messages to be 
sent over the network. This can be observed with 'glmstat -m' output under the 
section "proxy recv", as shown in the example below -

bash-3.2# glmstat -m
         message     all      rw       g      pg       h     buf     oth    loop
master send:
           GRANT     194       0       0       0       2       0     192      98
          REVOKE     192       0       0       0       0       0     192      96
        subtotal     386       0       0       0       2       0     384     194

master recv:
            LOCK     193       0       0       0       2       0     191      98
         RELEASE     192       0       0       0       0       0     192      96
        subtotal     385       0       0       0       2       0     383     194

    master total     771       0       0       0       4       0     767     388

proxy send:
            LOCK      98       0       0       0       2       0      96      98
         RELEASE      96       0       0       0       0       0      96      96
      BLOCK_LOCK    2560       0       0       0       0    2560       0       0
   BLOCK_RELEASE    2560       0       0       0       0    2560       0       0
        subtotal    5314       0       0       0       2    5120     192     194

DESCRIPTION:
'vxcompress' creates placeholder inodes (called IFEMR inodes) to
hold the compressed data of files. After the compression is finished, the IFEMR
inodes exchange their bmap with the original file and are later handed to
inactive processing. Inactive processing truncates the IFEMR extents (the
original extents of the regular file, which is now compressed) by sending
cluster-wide buffer invalidation requests. These invalidations need a GLM block
lock. Regular file data need not be invalidated across the cluster, which makes
these GLM block lock requests unnecessary.

RESOLUTION:
Pertinent code has been modified to skip the invalidation for the 
IFEMR inodes created during compression.

* 3864179 (Tracking ID: 3622323)

SYMPTOM:
Cluster Filesystem mounted as read-only panics when it gets sharing and/or compression statistics using the fsadm_vxfs(1M) command with the following stack:
	 
	- vx_irwlock
	- vx_clust_fset_curused
	- vx_getcompstats
	- vx_aioctl_getcompstats
	- vx_aioctl_common
	- vx_aioctl
	- vx_unlocked_ioctl
	- vx_ioctl
	- vfs_ioctl
	- do_vfs_ioctl
	- sys_ioctl
	- system_call_fastpath

DESCRIPTION:
When a file system is mounted as read-only, part of the initial setup is skipped, including the loading of a few internal structures. These structures are referenced while gathering statistics for sharing and/or compression. As a result, a panic occurs.

RESOLUTION:
The code is modified to only allow "fsadm -HS all" to gather sharing and/or compression statistics on read-write file systems. On read-only file systems, this command fails.

* 3864182 (Tracking ID: 3853338)

SYMPTOM:
Files on VxFS are corrupted while running the sequential write workload
under high memory pressure.

DESCRIPTION:
VxFS may occasionally miss writes under an excessive write workload.
Corruption occurs because of the race between the writer thread which is doing
sequential asynchronous writes and the flusher thread which flushes the in-core
dirty pages. Due to an overlapping write, they are serialized 
over a page lock. Because of an optimization, this lock is released, leading to
a small window where the waiting thread could race.

RESOLUTION:
The code is modified to fix the race by reloading the inode write
size after taking the page lock.

* 3864184 (Tracking ID: 3857444)

SYMPTOM:
The default permission of /etc/vx/vxfssystem file is incorrect.

DESCRIPTION:
When the file "/etc/vx/vxfssystem" is created, no permission is passed, so the file is created with permission 000.

RESOLUTION:
The code is modified to create the file "/etc/vx/vxfssystem" with default permission as "600".
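
The fix amounts to passing an explicit mode when the file is created. Below is a minimal sketch of that idea using os.open. The helper name create_private_file is hypothetical, and the demonstration writes to a temporary directory rather than to /etc/vx.

```python
import os
import stat
import tempfile


def create_private_file(path):
    """Create path with an explicit mode of 0600 and return the resulting mode.

    O_EXCL makes the call fail if the file already exists. The process umask
    can only clear bits, so group and other never gain access.
    """
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)
    return stat.S_IMODE(os.stat(path).st_mode)


def demo():
    """Create a stand-in file in a scratch directory and report its mode."""
    with tempfile.TemporaryDirectory() as scratch:
        return create_private_file(os.path.join(scratch, "vxfssystem"))
```

Calling oct(demo()) shows the mode the file ended up with, which is 0o600 under a typical umask.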

* 3864185 (Tracking ID: 3859032)

SYMPTOM:
System panics in vx_tflush_map() due to NULL pointer dereference.

DESCRIPTION:
When VxFS is converted using vxconvert, new blocks are allocated to structural
files such as the smap, and these blocks can contain garbage. This is done with
the expectation that fsck will rebuild the correct smap. However, fsck does not
distinguish between an EAU that is fully EXPANDED and one that is ALLOCATED.
Consequently, if an allocation is made to a file whose last allocation came
from such an affected EAU, a sub-transaction is created on an EAU that is in
the allocated state. Map buffers of such EAUs are not initialized properly in
the VxFS private buffer cache, so these buffers are released back as stale
during the transaction commit. Later, if a file-system-wide sync tries to flush
the metadata, it can refer to these buffer pointers and panic, because the
buffers have already been released and reused.

RESOLUTION:
The code is modified in fsck to correctly set the state of the EAU on
disk. The code paths involved are also modified to avoid doing
transactions on unexpanded EAUs.

* 3864186 (Tracking ID: 3855726)

SYMPTOM:
Panic happens in vx_prot_unregister_all(). The stack looks like this:

- vx_prot_unregister_all
- vxportalclose
- __fput
- fput
- filp_close
- sys_close
- system_call_fastpath

DESCRIPTION:
The panic is caused by a NULL fileset pointer, which is due to referencing the
fileset before it's loaded, plus, there's a race on fileset identity array.

RESOLUTION:
Skip the fileset if it's not loaded yet. Add the identity array lock to prevent
the possible race.

* 3864247 (Tracking ID: 3861713)

SYMPTOM:
Contention observed on vx_sched_lk and vx_worklist_lk spinlock when profiled using lockstats.

DESCRIPTION:
Internal worker threads take a lock to sleep on a CV while waiting
for work. This lock is global. If there are large numbers of CPUs and large numbers of worker threads, contention
can be seen on vx_sched_lk and vx_worklist_lk using lockstat, along with an increased %sys CPU.

RESOLUTION:
Make the lock more scalable in large CPU configs

* 3864250 (Tracking ID: 3833816)

SYMPTOM:
In a CFS cluster, one node returns stale data.

DESCRIPTION:
In a 2-node CFS cluster, when node 1 opens the file and writes to
it, the locks are used with CFS_MASTERLESS flag set. But when node 2 tries to
open the file and write to it, the locks on node 1 are normalized as part of
HLOCK revoke. But after the Hlock revoke on node 1, when node 2 takes the PG
Lock grant to write, there is no PG lock revoke on node 1, so the dirty pages on
node 1 are not flushed and invalidated. The problem results in reads returning
stale data on node 1.

RESOLUTION:
The code is modified to cache the PG lock before normalizing it in
vx_hlock_putdata, so that after the normalizing, the cache grant is still with
node 1. When node 2 requests the PG lock, there is a revoke on node 1, which
flushes and invalidates the pages.

* 3864255 (Tracking ID: 3827491)

SYMPTOM:
Data relocation is not executed correctly if the IOTEMP policy is set to AVERAGE.

DESCRIPTION:
Database table is not created correctly which results in an error on the database query. This affects the relocation policy of data and the files are not relocated properly.

RESOLUTION:
The code is modified to fix the database table creation issue. The relocation policy based calculations are now done correctly.

* 3864256 (Tracking ID: 3830300)

SYMPTOM:
Heavy CPU usage while Oracle archive processes are running on a clustered
file system.

DESCRIPTION:
The poor read performance in this case is caused by fragmentation, which
mainly happens when there are multiple archivers running on the same node.
The allocation pattern of the Oracle archiver processes is:

1. write header with O_SYNC
2. ftruncate-up the file to its final size ( a few GBs typically)
3. do lio_listio with 1MB iocbs

The problem occurs because all the allocations in this manner go through
internal allocations i.e. allocations below file size instead of allocations
past the file size. Internal allocations are done at max 8 Pages at once. So if
there are multiple processes doing this, they all get these 8 Pages alternately
and the fs becomes very fragmented.

RESOLUTION:
Added a tunable, which will allocate zfod extents when ftruncate
tries to increase the size of the file, instead of creating a hole. This will
eliminate the allocations internal to file size thus the fragmentation. Fixed
the earlier implementation of the same fix, which ran into
locking issues. Also fixed the performance issue while writing from secondary node.
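
The effect of interleaved 8-page allocations can be seen in a toy model. The simulation below assumes a single linear free area and strict round-robin turns between writers, which is a simplification of the real allocator. All function names are hypothetical.

```python
CHUNK_PAGES = 8  # maximum pages per internal allocation, per the description


def count_extents(blocks):
    """Count runs of consecutive block numbers in an ordered block list."""
    return sum(1 for i, b in enumerate(blocks) if i == 0 or b != blocks[i - 1] + 1)


def interleaved_allocation(num_files, pages_per_file):
    """Writers take turns grabbing up to CHUNK_PAGES pages from one free area."""
    files = [[] for _ in range(num_files)]
    next_free = 0
    while any(len(f) < pages_per_file for f in files):
        for f in files:
            want = min(CHUNK_PAGES, pages_per_file - len(f))
            f.extend(range(next_free, next_free + want))
            next_free += want
    return files


def preallocated(num_files, pages_per_file):
    """Each file receives its full size as one contiguous extent up front."""
    return [list(range(i * pages_per_file, (i + 1) * pages_per_file))
            for i in range(num_files)]


fragmented = interleaved_allocation(num_files=2, pages_per_file=64)
contiguous = preallocated(num_files=2, pages_per_file=64)
print([count_extents(f) for f in fragmented])  # one extent per 8-page chunk
print([count_extents(f) for f in contiguous])  # one extent per file
```

In this model a single writer stays contiguous, because its chunks are adjacent. Fragmentation appears only when several writers alternate, which is the situation the tunable is meant to avoid.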

* 3864259 (Tracking ID: 3856363)

SYMPTOM:
vxfs reports mapbad errors in the syslog as below:
vxfs: msgcnt 15 mesg 003: V-2-3: vx_mapbad - vx_extfind - /dev/vx/dsk/vgems01/lvems01 file system free extent bitmap in au 0 marked bad.

And, full fsck reports following metadata inconsistencies:

fileset 999 primary-ilist inode 6 has invalid number of blocks (18446744073709551583)
fileset 999 primary-ilist inode 6 failed validation clear? (ynq)n
pass2 - checking directory linkage
fileset 999 directory 8192 block devid/blknum 0/393216 offset 68 references free inode
                                ino 6 remove entry? (ynq)n
fileset 999 primary-ilist inode 8192 contains invalid directory blocks
                                clear? (ynq)n
pass3 - checking reference counts
fileset 999 primary-ilist inode 5 unreferenced file, reconnect? (ynq)n
fileset 999 primary-ilist inode 5 clear? (ynq)n
fileset 999 primary-ilist inode 8194 unreferenced file, reconnect? (ynq)n
fileset 999 primary-ilist inode 8194 clear? (ynq)n
fileset 999 primary-ilist inode 8195 unreferenced file, reconnect? (ynq)n
fileset 999 primary-ilist inode 8195 clear? (ynq)n
pass4 - checking resource maps

DESCRIPTION:
While processing the VX_IEZEROEXT extop, VxFS frees the extent without 
setting VX_TLOGDELFREE flag. Similarly, there are other cases where the flag 
VX_TLOGDELFREE is not set in the case of the delayed extent free, this could 
result in mapbad errors and invalid block counts.

RESOLUTION:
Since the flag VX_TLOGDELFREE needs to be set on every extent free,
the code is modified to discard this flag and implicitly treat every extent
free as a delayed extent free.

* 3864260 (Tracking ID: 3846521)

SYMPTOM:
cp -p fails with EINVAL for files with a 10-digit
modification time. The EINVAL error is returned if the value in the tv_nsec
field is outside the range of 0 to 999,999,999. VxFS supports the
update in usec, but when copying to user space, the usec value is converted to
nsec. In this case, usec has crossed its upper boundary limit of
999,999.

DESCRIPTION:
In a cluster, it is possible that the time differs across nodes.
When updating mtime, VxFS therefore checks whether the inode is a cluster inode
and whether the inode's mtime is newer than the current node time. If so, it
increments tv_usec instead of changing mtime to an older time value. The
tv_usec counter can overflow as a result, which produces a 10-digit
mtime.tv_nsec.

RESOLUTION:
Code is modified to reset usec counter for mtime/atime/ctime when 
upper boundary limit i.e. 999999 is reached.
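
The arithmetic behind the 10-digit value is shown in the sketch below. It models only the usec increment and the usec-to-nsec conversion. Resetting the counter to 0 is an assumption based on the wording of the resolution, since the source does not say what the counter is reset to or whether a carry into seconds takes place. The function names are hypothetical.

```python
USEC_MAX = 999_999
NSEC_MAX = 999_999_999


def usec_to_nsec(usec):
    """Conversion applied when the timestamp is copied to user space."""
    return usec * 1000


def bump_usec_unchecked(usec):
    """Behavior before the fix: increment with no upper bound."""
    return usec + 1


def bump_usec_with_reset(usec):
    """Behavior after the fix (assumed): reset once the limit is reached."""
    return 0 if usec >= USEC_MAX else usec + 1


bad_nsec = usec_to_nsec(bump_usec_unchecked(USEC_MAX))
good_nsec = usec_to_nsec(bump_usec_with_reset(USEC_MAX))
print(bad_nsec, bad_nsec > NSEC_MAX)   # 10-digit value, out of range
print(good_nsec, good_nsec <= NSEC_MAX)
```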

* 3866968 (Tracking ID: 3866962)

SYMPTOM:
Data corruption seen when dalloc writes are going on the file and 
simultaneously fsync started on the same file.

DESCRIPTION:
If dalloc writes are in progress on a file and synchronous flushing is
started on the same file at the same time, the synchronous flushing tries
to flush all the dirty pages of the file without considering the underlying
allocation. In this case, flushing can happen on unallocated blocks, which can
result in data loss.

RESOLUTION:
Code is modified to flush data till actual allocation in case of dalloc 
writes.

* 3870704 (Tracking ID: 3867131)

SYMPTOM:
Kernel panic in internal testing.

DESCRIPTION:
In internal testing, vdl_fsnotify_sb is found to be NULL because it is not
allocated and initialized in the initialization routine, vx_fill_super().
vdl_fsnotify_sb is initialized in vx_fill_super() only when the kernel's
fsnotify feature is available, and the fsnotify feature is not available in
the RHEL5/SLES10 kernels.

RESOLUTION:
Code is added to check whether the fsnotify feature is available in the
running kernel.

* 3872661 (Tracking ID: 3857254)

SYMPTOM:
Assert failure because of missed flush before taking filesnap of the file.

DESCRIPTION:
If the delayed extended write on the file is not completed but the snap of the file is taken, then the inode size is not updated correctly. This will trigger internal assert because of incorrect inode size.

RESOLUTION:
The code is modified to flush the delayed extended write before taking filesnap.

* 3874662 (Tracking ID: 3871489)

SYMPTOM:
IO service times increased with IO intensive workload on high end 
server.

DESCRIPTION:
VxFS has worklist threads that sleep on a single condition
variable. While waking up the worker threads, contention can be seen on the
OS sleep dispatch locks, and the service time for the I/O can increase because
of this contention.

RESOLUTION:
The number of condition variables is scaled to reduce contention.
Padding is also added to the condition variable structure to avoid cache
allocation problems, and only the exact number of threads required is
woken up.

* 3875458 (Tracking ID: 3616694)

SYMPTOM:
Internal assert failure because of race condition between forced unmount 
thread and inactive processing thread.

DESCRIPTION:
There is a possible race condition between forced unmount and inactive
processing. On Linux, the last close of a file is done during unmount, but this
may not trigger inactive processing because a dentry still holds the inode. The
dentry can be purged under memory pressure, which triggers inactive processing,
and this can happen during unmount as well. If the inactivation happens after
the unmount has already walked the inactive list, and the inode is then added
to the inactive list, the assert triggers because of the wrong flags on the
inode.

RESOLUTION:
Code is modified to resolve this race condition.

* 3875633 (Tracking ID: 3869174)

SYMPTOM:
The write system call might deadlock on RHEL5 and SLES10.

DESCRIPTION:
The issue is caused by page fault handling while a page lock is held.
On RHEL5 and SLES10, a write may hold page locks. If a page fault then
happens, the page fault handler waits on a lock that the writer already
holds, resulting in a deadlock.

RESOLUTION:
The code now prefaults the pages before taking the page lock, so that the
deadlock is avoided.

* 3876065 (Tracking ID: 3867128)

SYMPTOM:
Assert failed in internal native AIO testing.

DESCRIPTION:
On RHEL5/SLES10, in NAIO, the iovec comes from the kernel stack. When
the work item is handed off to the worker thread, the work item points
to an iovec structure in a stack frame that no longer exists. The iovec's
memory can therefore be corrupted when it is reused for a new stack frame.

RESOLUTION:
Code is modified to allocate the iovec dynamically in naio hand-off code and 
copy it into the work item before doing handoff.

* 3877070 (Tracking ID: 3880121)

SYMPTOM:
Internal assert failure when coalescing the extents on clone.

DESCRIPTION:
When coalescing extents on a clone, resolving overlay extents is not
supported, but the code still tried to resolve these overlay extents. This
resulted in an internal assert failure.

RESOLUTION:
Code is modified to not resolve these overlay extents when 
coalescing.

* 3877142 (Tracking ID: 3891801)

SYMPTOM:
Internal test hit debug assert.

DESCRIPTION:
A debug assert was hit while creating a page in the shared page cache for
a ZFOD extent. This is the same as creating pages for holes, which VxFS does not do.

RESOLUTION:
Added a check for page creation so that we don't create shared pages
for zfod extent.

* 3878983 (Tracking ID: 3872202)

SYMPTOM:
VxFS internal test hits an assert.

DESCRIPTION:
In page create case VxFS was taking the ipglock twice in a thread,
due to which the VxFS test hit the internal assert.

RESOLUTION:
Removed the ipglock from vx_wb_dio_write().

* 3890556 (Tracking ID: 2919310)

SYMPTOM:
During stress testing on cluster file system, an assertion failure was hit 
because of a missing linkage between the directory and the associated 
attribute inode.

DESCRIPTION:
As per the designed behavior, the node which owns the inode of the file, 
receives the request to remove the file from the directory. If the directory 
has an alternate index (hash directory) present, then in the file remove 
receive handler, the attribute inode is read from the disk. However, VxFS 
does not create a linkage between the directory and the corresponding inode, 
which results in an assert failure.

RESOLUTION:
The code is modified to set the directory inode's i_dirhash field to the
attribute inode. This change is exercised while bringing the inode incore
during file or directory removal.

* 3890659 (Tracking ID: 3514407)

SYMPTOM:
Internal stress test hit debug assert.

DESCRIPTION:
In deli-cache code, when it reuses the inode, it is updating the
inode generation count only for reorg inodes.

RESOLUTION:
Code is added to update inode generation count unconditionally.

* 3851511 (Tracking ID: 3821686)

SYMPTOM:
VxFS module might not get loaded on SLES11 SP4.

DESCRIPTION:
SLES11 SP4 is a new release, so the VxFS module failed to load
on it.

RESOLUTION:
Added VxFS support for SLES11 SP4.

* 3660421 (Tracking ID: 3660422)

SYMPTOM:
On RHEL 6.6, the umount(8) command hangs if an application is watching for inode
events using the inotify(7) APIs.

DESCRIPTION:
On RHEL 6.6, additional counters were added to the superblock to track inotify
watches. These new counters were not implemented in VxFS.
Hence, during umount, the operation hangs until the counters in the
superblock drop to zero, which never happens because they are not handled in
VxFS.

RESOLUTION:
Code is modified to handle additional counters added in RHEL6.6.

* 3520113 (Tracking ID: 3451284)

SYMPTOM:
While allocating extent during write operation, if summary and bitmap data for
filesystem allocation unit get mismatched then the assert hits.

DESCRIPTION:
Consider an extent that was allocated using the SMAP on a deleted inode, where
part of the AU space is moved from the deleted inode to a new inode. At this
point the SMAP state is set to VX_EAU_ALLOCATED and the EMAP is not
initialized. When more space is needed for the new inode, VxFS tries to
allocate from the same AU using the EMAP and can hit the
"f:vx_sum_upd_efree1:2a" assert, because the EMAP is not initialized.

RESOLUTION:
Code has been modified to expand AU while moving partial AU space from one inode
to other inode.

* 3521945 (Tracking ID: 3530435)

SYMPTOM:
Panic in Internal test with SSD cache enabled.

DESCRIPTION:
The record end of the writeback log record was wrongly modified while adding a
skip list node in the punch hole case where the expunge flag is set and the
insertion of the new node is skipped.

RESOLUTION:
The code is modified to skip modification of the writeback log record when the
expunge flag is set and the left end of the record is smaller than or equal to
the end offset of the next punch hole request.

* 3529243 (Tracking ID: 3616722)

SYMPTOM:
Race between the writeback cache offline thread and the writeback data flush thread causes null pointer dereference, resulting in system panic.

DESCRIPTION:
While writeback is being disabled, the writeback cache information is deinitialized from each inode, which removes the writeback bmap lock pointer. If writeback flushing is still in progress in another thread that holds the writeback bmap lock, that thread hits a null pointer dereference when it releases the lock, because the lock pointer has already been removed by the first thread.

RESOLUTION:
The code is modified to handle such race conditions.

* 3536233 (Tracking ID: 3457803)

SYMPTOM:
File System gets disabled with the following message in the system log:
WARNING: V-2-37: vx_metaioerr - vx_iasync_wait - /dev/vx/dsk/testdg/test  file system meta data write error in dev/block

DESCRIPTION:
The inode's incore information becomes inconsistent because one of its fields is modified without locking protection.

RESOLUTION:
The inode's field is now properly protected by taking the lock.

* 3583963 (Tracking ID: 3583930)

SYMPTOM:
When external quota file is over-written or restored from backup, new settings which were added after the backup still remain.

DESCRIPTION:
The internal quota file is not always updated with correct limits, so the quotaon operation is to copy the quota limits from external to internal quota file. To complete the copy operation, the extent of external file is compared to the extent of internal file at the corresponding offset.     
If the external quota file is overwritten (or restored to its original copy) and the size of internal file is more than that of external, the quotaon operation does not clear the additional (stale) quota records in the internal file. Later, the sync operation (part of quotaon) copies these stale records from internal to external file. Hence, both internal and external files contain stale records.

RESOLUTION:
The code has been modified to remove the stale records in the internal file at the time of quotaon.

* 3617774 (Tracking ID: 3475194)

SYMPTOM:
Veritas File System (VxFS) fscdsconv(1M) command fails with the following error message:
...
UX:vxfs fscdsconv: INFO: V-3-26130: There are no files violating the CDS limits for this target.
UX:vxfs fscdsconv: INFO: V-3-26047:  Byteswapping in progress ...
UX:vxfs fscdsconv: ERROR: V-3-25656:  Overflow detected
UX:vxfs fscdsconv: ERROR: V-3-24418: fscdsconv: error processing primary inode 
list for fset 999
UX:vxfs fscdsconv: ERROR: V-3-24430: fscdsconv: failed to copy metadata
UX:vxfs fscdsconv: ERROR: V-3-24426: fscdsconv: Failed to migrate.

DESCRIPTION:
The fscdsconv(1M) command takes a filename argument that is used as a recovery file, which restores the original file system if a failure occurs while the file system conversion is in progress. This file has two parts: a control part and a data part. The control part stores information about all the metadata, such as inodes and extents. In this instance, the length of the control part is underestimated for some file systems where there are few inodes but the average number of extents per file is very large (this can be seen in the fsadm -E report).

RESOLUTION:
Make recovery file sparse, start the data part after 1TB offset, and then the control part can do allocating writes to the hole from the beginning of the file.

* 3617776 (Tracking ID: 3473390)

SYMPTOM:
In memory pressure scenarios, you see panics or system crashes due to stack overflows.

DESCRIPTION:
Specifically on RHEL6, the memory allocation routines consume much more memory than other distributions like SLES, or even RHEL5. Due to this, multiple overflows are reported for the RHEL6 platform. Most of these overflows occur when VxFS tries to allocate memory under memory pressure.

RESOLUTION:
The code is modified to fix multiple overflows by adding handoff code paths, adjusting handoff limits, removing on-stack structures and reducing the number of function frames on stack wherever possible.

* 3617781 (Tracking ID: 3557009)

SYMPTOM:
When the fallocate command is run with the -l option to specify the length of the reserve allocation, the resulting file size is not the requested length but a multiple of the file system block size. For example:
If block size = 8K:
# fallocate -l 8860 testfile1
# ls -l
total 16
drwxr-xr-x. 2 root root    96 Jul  1 11:40 lost+found/
-rw-r--r--. 1 root root 16384 Jul  1 11:41 testfile1
The file size should be 8860, but it's 16384(which is 2*8192).

DESCRIPTION:
The vx_fallocate() function on Veritas File System (VxFS) creates larger file than specified because it allocates the extent in blocks. So the reserved file size is multiples of block size, instead of what the fallocate command specifies.

RESOLUTION:
The code is modified so that the vx_fallocate() function on VxFS sets the reserved file size to what it specifies, instead of multiples of block size.
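
The numbers in the example follow from rounding the requested length up to whole blocks. The sketch below reproduces that arithmetic. The function names are hypothetical and do not correspond to VxFS internals.

```python
def blocks_needed(length, block_size):
    """Number of whole blocks required to hold length bytes (ceiling division)."""
    return -(-length // block_size)


def reported_size_before_fix(length, block_size):
    """File size was set to the full allocation, a multiple of the block size."""
    return blocks_needed(length, block_size) * block_size


def reported_size_after_fix(length, block_size):
    """File size is the requested length; the allocation is still whole blocks."""
    return length


# Values from the example above: 8K block size, fallocate -l 8860.
print(blocks_needed(8860, 8192))
print(reported_size_before_fix(8860, 8192))
print(reported_size_after_fix(8860, 8192))
```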

* 3617788 (Tracking ID: 3604071)

SYMPTOM:
With the thin reclaim feature turned on, you can observe high CPU usage by the VxFS kernel threads. The backtrace of such threads usually looks like this:
	 
	 - vx_dalist_getau
	 - vx_recv_bcastgetemapmsg
	 - vx_recvdele
	 - vx_msg_recvreq
	 - vx_msg_process_thread
	 - vx_kthread_init

DESCRIPTION:
The routine that gets the broadcast information of a node, which contains maps of the Allocation Units (AUs) for which the node holds the delegations, uses an inefficient locking mechanism. Every time this routine is called, it performs a series of down-up operations on a certain semaphore. This can result in a huge CPU cost when many threads call the routine in parallel.

RESOLUTION:
The code is modified to optimize the locking mechanism in the routine to get the broadcast information of a node which contains maps of Allocation Units (AUs) for which node holds the delegations, so that it only does down-up operation on the semaphore once.

* 3617790 (Tracking ID: 3574404)

SYMPTOM:
System panics because of a stack overflow during rename operation. The following stack trace can be seen during the panic:

machine_kexec 
crash_kexec 
oops_end 
no_context 
__bad_area_nosemaphore 
bad_area_nosemaphore 
__do_page_fault 
do_page_fault 
page_fault 
task_tick_fair 
scheduler_tick 
update_process_times 
tick_sched_timer 
__run_hrtimer 
hrtimer_interrupt 
local_apic_timer_interrupt 
smp_apic_timer_interrupt 
apic_timer_interrupt 
--- <IRQ stack> ---
apic_timer_interrupt 
mempool_free_slab 
mempool_free 
vx_pgbh_free  
vx_pgbh_detach  
vx_releasepage  
try_to_release_page 
shrink_page_list.clone.3 
shrink_inactive_list 
shrink_mem_cgroup_zone 
shrink_zone 
zone_reclaim 
get_page_from_freelist 
__alloc_pages_nodemask 
alloc_pages_current 
__get_free_pages 
vx_getpages  
vx_alloc  
vx_bc_getfreebufs  
vx_bc_getblk  
vx_getblk_bp  
vx_getblk_cmn  
vx_getblk  
vx_getmap  
vx_getemap  
vx_extfind  
vx_searchau_downlevel  
vx_searchau_downlevel  
vx_searchau_downlevel  
vx_searchau_downlevel  
vx_searchau_uplevel  
vx_searchau  
vx_extentalloc_device  
vx_extentalloc  
vx_bmap_ext4  
vx_bmap_alloc_ext4  
vx_bmap_alloc  
vx_write_alloc3  
vx_tran_write_alloc  
vx_idalloc_off1  
vx_idalloc_off  
vx_int_rename  
vx_do_rename  
vx_rename1  
vx_rename  
vfs_rename 
sys_renameat 
sys_rename 
system_call_fastpath

DESCRIPTION:
The stack overflows by 88 bytes in the rename code path. The thread_info structure is overwritten with VxFS page buffer head addresses.

RESOLUTION:
Local structures in vx_write_alloc3 and vx_int_rename are now allocated dynamically. This saves 256 bytes of stack and gives enough room.

* 3617793 (Tracking ID: 3564076)

SYMPTOM:
The MongoDB noSQL db creation fails with an ENOTSUP error. MongoDB uses
posix_fallocate to create a file first. When it writes at offset which is not
aligned with File System block boundary, an ENOTSUP error comes up.

DESCRIPTION:
On a file system with 8k bsize and 4k page size, the application creates a file
using posix_fallocate, and then writes at some offset which is not aligned with
fs block boundary. In this case, the pre-allocated extent is split at the
unaligned offset into two parts for the write. However the alignment requirement
of the split fails the operation.

RESOLUTION:
Split the extent down to block boundary.

* 3617877 (Tracking ID: 3615850)

SYMPTOM:
The write system call writes up to count bytes from the pointed buffer to the file referred to by the file descriptor field:

ssize_t write(int fd, const void *buf, size_t count);

When the count parameter is invalid, sometimes it can cause the write() to hang on VxFS file system. E.g. with a 10000 bytes buffer, but the count is set to 30000 by mistake, then you may encounter such problem.

DESCRIPTION:
On recent linux kernels, you cannot take a page-fault while holding a page locked so as to avoid a deadlock. This means uiomove can copy less than requested, and any partially populated pages created in routine which establish a virtual mapping for the page are destroyed.
This can cause an infinite loop in the write code path when the given user-buffer is not aligned with a page boundary and the length given to write() causes an EFAULT; uiomove() does a partial copy, segmap_release destroys the partially populated pages and unwinds the uio. The operation is then repeated.

RESOLUTION:
The code is modified to move the pre-faulting to the buffered IO write loops. The system either shortens the length of the copy if all of the requested pages cannot be faulted, or fails with EFAULT if no pages are pre-faulted. This prevents the infinite loop.
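
The decision made after pre-faulting can be modeled in a few lines of Python. This is a simplified sketch, not the VxFS code: the function name and the mapped_end parameter (the first address past the accessible user memory) are hypothetical inputs introduced for illustration.

```python
import errno


def prefault_copy_length(buf_addr, mapped_end, count):
    """Model of the fix: copy only as far as the user buffer can be faulted in.

    buf_addr   -- start address of the user buffer
    mapped_end -- first address past the accessible user memory (hypothetical input)
    count      -- length passed to write()
    """
    accessible = max(0, mapped_end - buf_addr)
    if accessible == 0:
        # Nothing could be pre-faulted: fail instead of retrying forever.
        raise OSError(errno.EFAULT, "no page of the buffer could be pre-faulted")
    # Some pages were faulted in: shorten the copy to what is accessible.
    return min(count, accessible)
```

With an unaligned buffer starting at address 4196, accessible memory ending at 16384, and a count of 30000, the model shortens the copy to the accessible part instead of looping.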

* 3620279 (Tracking ID: 3558087)

SYMPTOM:
Run simultaneous dd threads on a mount point and start the ls -l command on the same mount point. Then the system hangs.

DESCRIPTION:
When the delayed allocation (dalloc) feature is turned on, the flushing process takes much time. The process keeps the glock held, and needs writers to keep the irwlock held. The ls -l command starts stat internally and keeps waiting for irwlock to read ACLs.

RESOLUTION:
Redesign dalloc to keep the glock unlocked while flushing.

* 3620284 (Tracking ID: 3596378)

SYMPTOM:
The copy of a large number of small files is slower on Veritas File System (VxFS) compared to EXT4.

DESCRIPTION:
VxFS implements the fsetxattr() system call in a synchronized way. Hence, before returning to the system call, the VxFS will take some time to flush the data to the disk. In this way, the VxFS guarantees the file system consistency in case of file system crash. However, this implementation has a side-effect that it serializes the whole processing, which takes more time.

RESOLUTION:
The code is modified to change the transaction to flush the data in a delayed way.

* 3620288 (Tracking ID: 3469644)

SYMPTOM:
The system panics in the vx_logbuf_clean() function when it traverses the chain of transactions off the intent log buffer. The stack trace is as follows:

vx_logbuf_clean ()
vx_logadd ()
vx_log()
vx_trancommit()
vx_exh_hashinit ()
vx_dexh_create ()
vx_dexh_init ()
vx_pd_rename ()
vx_rename1_pd()
vx_do_rename ()
vx_rename1 ()
vx_rename ()
vx_rename_skey ()

DESCRIPTION:
The system panics as the vx_logbuf_clean() function tries to access an already freed transaction from the transaction chain to flush it to the log.

RESOLUTION:
The code has been modified to make sure that the transaction gets flushed to the log before it is freed.

* 3621420 (Tracking ID: 3621423)

SYMPTOM:
The Veritas Volume Manager (VxVM) caching is disabled or stopped after mounting a file system in a situation where the Veritas File System (VxFS) cache area is not present.

DESCRIPTION:
When the VxFS cache area is not present and the VxVM cache area is present and in ENABLED state, if you mount a file system on any of the volumes, the VxVM caching gets stopped for that volume, which is not an expected behavior.

RESOLUTION:
The code is modified not to disable VxVM caching for any mounted file system if the VxFS cache area is not present.

* 3628867 (Tracking ID: 3595896)

SYMPTOM:
While creating an Oracle RAC 12.1.0.2 database, the node panics with the following stack:
aio_complete()
vx_naio_do_work()
vx_naio_worker()
vx_kthread_init()

DESCRIPTION:
For a zero size request (with a correctly aligned buffer), Veritas File System (VxFS) wrongly queues the work internally and returns -EIOCBQUEUED. The kernel calls function aio_complete() for this zero size request. However, while VxFS is performing the queued work internally, the aio_complete() function gets called again. The double call of the aio_complete() function results in the panic.

RESOLUTION:
The code is modified so that the zero size requests will not queue elements inside VxFS work queue.

* 3636210 (Tracking ID: 3633067)

SYMPTOM:
While converting from ext3 file system to VxFS using vxfsconvert, it is observed that many inodes are missing.

DESCRIPTION:
When vxfsconvert(1M) is run on an ext3 file system, it misses an entire block group of inodes. This happens because of an incorrect calculation of the block group number of a given inode in a border case. The inode which is the last inode for a given block group is calculated to have the correct inode offset, but is calculated to be in the next block group. This causes the entire next block group to be skipped when the code attempts to find the next consecutive inode.

RESOLUTION:
The code is modified to correct the calculation of block group number.
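
The border case can be illustrated with the standard ext2/ext3 inode-to-group arithmetic, where inode numbers start at 1. The sketch below is in Python; the "off by one" variant is a hypothetical illustration of the kind of error described and is not taken from the vxfsconvert source.

```python
def block_group_of(ino, inodes_per_group):
    """ext2/ext3 inode numbers start at 1, so subtract 1 before dividing."""
    return (ino - 1) // inodes_per_group


def index_in_group(ino, inodes_per_group):
    """Offset of the inode within its block group's inode table."""
    return (ino - 1) % inodes_per_group


def block_group_off_by_one(ino, inodes_per_group):
    """Illustrative wrong variant: forgets that inode numbers are 1-based."""
    return ino // inodes_per_group
```

For example, with 8192 inodes per group, inode 8192 is the last inode of group 0 (offset 8191), but the wrong variant places it in group 1.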

* 3644006 (Tracking ID: 3451686)

SYMPTOM:
During internal stress testing on cluster file system (CFS), a debug
assert is hit due to an invalid cache generation count on the incore inode.

DESCRIPTION:
Reset of the cache generation count in incore inode used in Disk
Layout Version(DLV) 10 was missed during inode reuse, causing the debug assert.

RESOLUTION:
The code is modified to reset the cache generation count in incore
inode during inode reuse.

* 3645825 (Tracking ID: 3622326)

SYMPTOM:
The file system is marked with the fullfsck flag because an inode is marked bad
during checkpoint promote.

DESCRIPTION:
VxFS incorrectly skipped pushing data to the clone inode. As a result,
the inode is marked bad during checkpoint promote, which in turn results in
the file system being marked with the fullfsck flag.

RESOLUTION:
Code is modified to push the proper data to clone inode.

* 3370758 (Tracking ID: 3370754)

SYMPTOM:
Internal test with SmartIO write-back SSD cache hit debug asserts.

DESCRIPTION:
The debug asserts are hit due to race condition in various code segments for write-back SSD cache feature.

RESOLUTION:
The code is modified to fix the race conditions.

* 3383149 (Tracking ID: 3383147)

SYMPTOM:
A C operator precedence error may occur while turning off delayed
allocation.

DESCRIPTION:
Due to the C operator precedence issue, VxFS evaluates a condition
wrongly.

RESOLUTION:
The code is modified to evaluate the condition correctly.

* 3422580 (Tracking ID: 1949445)

SYMPTOM:
The system is unresponsive when files are created in a large directory. The following stack is logged:

vxg_grant_sleep()                                             
vxg_cmn_lock()
vxg_api_lock()                                             
vx_glm_lock()
vx_get_ownership()                                                  
vx_exh_coverblk()  
vx_exh_split()                                                 
vx_dexh_setup() 
vx_dexh_create()                                              
vx_dexh_init() 
vx_do_create()

DESCRIPTION:
For large directories, large directory hash (LDH) is enabled to improve the lookup feature. When a system takes ownership of the LDH inode twice in the same thread context (while building the hash for a directory), it becomes unresponsive.

RESOLUTION:
The code is modified to avoid taking ownership again if we already have the ownership of the LDH inode.

* 3422584 (Tracking ID: 2059611)

SYMPTOM:
The system panics due to a NULL pointer dereference while flushing the
bitmaps to the disk and the following stack trace is displayed:
...
...
vx_unlockmap+0x10c
vx_tflush_map+0x51c
vx_fsq_flush+0x504
vx_fsflush_fsq+0x190
vx_workitem_process+0x1c
vx_worklist_process+0x2b0
vx_worklist_thread+0x78

DESCRIPTION:
The vx_unlockmap() function unlocks a map structure of the file
system. If the map is being used, the hold count is incremented. The
vx_unlockmap() function attempts to check whether this is an empty mlink doubly
linked list. The asynchronous vx_mapiodone routine can change the link at random
even though the hold count is zero.

RESOLUTION:
The code is modified to change the evaluation rule inside the
vx_unlockmap() function, so that further evaluation can be skipped over when map
hold count is zero.

* 3422586 (Tracking ID: 2439261)

SYMPTOM:
When the vx_fiostats_tunable is changed from zero to non-zero, the
system panics with the following stack trace:
vx_fiostats_do_update
vx_fiostats_update
vx_read1
vx_rdwr
vno_rw
rwuio
pread

DESCRIPTION:
When vx_fiostats_tunable is changed from zero to non-zero, all the
incore-inode fiostats attributes are set to NULL. When these attributes are
accessed, the system panics due to the NULL pointer dereference.

RESOLUTION:
The code has been modified to check that the file I/O stat attributes are
present before dereferencing the pointers.

* 3422604 (Tracking ID: 3092114)

SYMPTOM:
The information output by the "df -i" command can often be inaccurate for 
cluster mounted file systems.

DESCRIPTION:
In Cluster File System 5.0 release a concept of delegating metadata to nodes in 
the cluster is introduced. This delegation of metadata allows CFS secondary 
nodes to update metadata without having to ask the CFS primary to do it. This 
provides greater node scalability. 
However, the "df -i" information is still collected by the CFS primary 
regardless of which node (primary or secondary) the "df -i" command is executed 
on.

For inodes the granularity of each delegation is an Inode Allocation Unit 
[IAU], thus IAUs can be delegated to nodes in the cluster.
When using a VxFS 1Kb file system block size each IAU will represent 8192 
inodes.
When using a VxFS 2Kb file system block size each IAU will represent 16384 
inodes.
When using a VxFS 4Kb file system block size each IAU will represent 32768 
inodes.
When using a VxFS 8Kb file system block size each IAU will represent 65536 
inodes.
Each IAU contains a bitmap that determines whether each inode it represents is 
either allocated or free, the IAU also contains a summary count of the number 
of inodes that are currently free in the IAU.
The "df -i" information can be considered as a simple sum of all the IAU 
summary counts.
Using a 1Kb block size IAU-0 will represent inodes numbers      0 -  8191
Using a 1Kb block size IAU-1 will represent inodes numbers   8192 - 16383
Using a 1Kb block size IAU-2 will represent inodes numbers  16384 - 24575
etc.
The inaccurate "df -i" count occurs because the CFS primary has no visibility 
of the current IAU summary information for IAU that are delegated to Secondary 
nodes.
Therefore the number of allocated inodes within an IAU that is currently 
delegated to a CFS Secondary node is not known to the CFS Primary. As a 
result, the "df -i" count information for the currently delegated IAUs is 
collected from the Primary's copy of the IAU summaries. Because the Primary's 
copy of the IAU is stale, the "df -i" count is only accurate when no 
IAUs are currently delegated to CFS secondary nodes.
In other words - the IAUs currently delegated to CFS secondary nodes will cause 
the "df -i" count to be inaccurate.
Once an IAU is delegated to a node it can "timeout" after 3 minutes of 
inactivity. However, not all IAU delegations will timeout. One IAU will always 
remain delegated to each node for performance reasons. Also, an IAU whose inodes 
are all allocated (so no free inodes remain in the IAU) would not timeout 
either.
The issue can be best summarized as:
The more IAUs that remain delegated to CFS secondary nodes, the greater the 
inaccuracy of the "df -i" count.
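
The IAU arithmetic above can be sketched in Python. The relation "8 inodes per byte of file system block size" is inferred from the four block sizes listed in this description and is an assumption, not a documented formula. All function names are hypothetical.

```python
BITS_PER_BYTE = 8


def inodes_per_iau(block_size_bytes):
    # Matches the figures above: 1Kb -> 8192, 2Kb -> 16384, 4Kb -> 32768, 8Kb -> 65536.
    return block_size_bytes * BITS_PER_BYTE


def iau_inode_range(iau_index, block_size_bytes):
    """Return the first and last inode number represented by the given IAU."""
    per_iau = inodes_per_iau(block_size_bytes)
    first = iau_index * per_iau
    return first, first + per_iau - 1


def df_i_free_count(iau_free_summaries):
    """'df -i' free count modeled as a simple sum of the per-IAU free summaries."""
    return sum(iau_free_summaries)
```

If the summary for a delegated IAU is stale on the primary, the sum is wrong by exactly the difference between the stale and the current summary of that IAU.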

RESOLUTION:
Allow the delegations for IAU's whose inodes are all allocated (so no free 
inodes in the IAU) to "timeout" after 3 minutes of inactivity.

* 3422614 (Tracking ID: 3297840)

SYMPTOM:
A metadata corruption is found during the file removal process with the inode block count getting negative.

DESCRIPTION:
When the user removes or truncates a file having the shared indirect blocks, there can be an instance where the block count can be updated to reflect the removal of the shared indirect blocks when the blocks are not removed from the file. The next iteration of the loop updates the block count again while removing these blocks. This will eventually lead to the block count being a negative value after all the blocks are removed from the file. The removal code expects the block count to be zero before updating the rest of the metadata.

RESOLUTION:
The code is modified to update the block count and other tracking metadata in the same transaction as the blocks are removed from the file.

* 3422619 (Tracking ID: 3294074)

SYMPTOM:
System call fsetxattr() is slower on Veritas File System (VxFS) than ext3 file system.

DESCRIPTION:
VxFS implements the fsetxattr() system call synchronously. Hence, it takes some time to flush the data to the disk before returning from the system call, to guarantee file system consistency in case of a file system crash.

RESOLUTION:
The code is modified to allow the transaction to flush the data in a delayed way.

* 3422624 (Tracking ID: 3352883)

SYMPTOM:
During the rename operation, many nfsd threads waiting for a mutex hang with the following stack traces:
vxg_svar_sleep_unlock 
vxg_get_block
vxg_api_initlock  
vx_glm_init_blocklock
vx_cbuf_lookup  
vx_getblk_clust 
vx_getblk_cmn 
vx_getblk
vx_fshdchange
vx_unlinkclones
vx_clone_dispose
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init


vxg_svar_sleep_unlock 
vxg_grant_sleep 
vxg_cmn_lock 
vxg_api_trylock 
vx_glm_trylock 
vx_glmref_trylock 
vx_mayfrzlock_try 
vx_walk_fslist 
vx_log_sync 
vx_workitem_process 
vx_worklist_process 
vx_worklist_thread 
vx_kthread_init

DESCRIPTION:
A race condition is observed between the NFS rename and the additional dentry alias created by the current vx_splice_alias() function. 
This race condition causes two different directory dentries to point to the same inode, which results in a mutex deadlock in the lock_rename() function.

RESOLUTION:
The code is modified to change the vx_splice_alias() function to prevent the creation of an additional dentry alias.

* 3422626 (Tracking ID: 3332902)

SYMPTOM:
The system running the fsclustadm(1M) command panics while shutting down. The 
following stack trace is logged along with the panic:

machine_kexec
crash_kexec
oops_end
page_fault [exception RIP: vx_glm_unlock]
vx_cfs_frlpause_leave [vxfs]
vx_cfsaioctl [vxfs]
vxportalkioctl [vxportal]
vfs_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath

DESCRIPTION:
There exists a race-condition between "fsclustadm(1M) cfsdeinit"
and "fsclustadm(1M) frlpause_disable". The "fsclustadm(1M) cfsdeinit" fails
after cleaning the Group Lock Manager (GLM), without downgrading the CFS state.
Under the false CFS state, the "fsclustadm(1M) frlpause_disable" command enters
and accesses the GLM lock, which "fsclustadm(1M) cfsdeinit" frees resulting in a
panic.

Another race condition exists between the code in vx_cfs_deinit() and the code
in fsck. Although fsck holds a reservation, this cannot prevent
vx_cfs_deinit() from freeing vx_cvmres_list, because there is no check
for vx_cfs_keepcount.

RESOLUTION:
The code is modified to add appropriate checks in the "fsclustadm(1M) 
cfsdeinit" and "fsclustadm(1M) frlpause_disable" to avoid the race-condition.

* 3422629 (Tracking ID: 3335272)

SYMPTOM:
The mkfs (make file system) command dumps core when the log size 
provided is not aligned. The following stack trace is displayed:

(gdb) bt
#0  find_space ()
#1  place_extents ()
#2  fill_fset ()
#3  main ()
(gdb)

DESCRIPTION:
While creating the VxFS file system using the mkfs command, if the 
log size provided is not aligned properly, you may end up in doing 
miscalculations for placing the RCQ extents and finding no place. This leads to 
illegal memory access of AU bitmap and results in core dump.

RESOLUTION:
The code is modified to place the RCQ extents in the same AU where 
log extents are allocated.

* 3422634 (Tracking ID: 3337806)

SYMPTOM:
On Linux kernels later than 3.0, when the find(1) command is run, the kernel may panic
in the link_path_walk() function with the following stack trace:

do_page_fault
page_fault
link_path_walk
path_lookupat
do_path_lookup
user_path_at_empty
vfs_fstatat
sys_newfstatat
system_call_fastpath

DESCRIPTION:
VxFS overloads a bit of the dentry flag at 0x1000 for internal
usage. Linux did not use this bit before kernel version 3.0. Therefore it
is possible that both Linux and VxFS contend for this bit, which panics the kernel.

RESOLUTION:
The code is modified not to use the 0x1000 bit in the dentry flag.

* 3422636 (Tracking ID: 3340286)

SYMPTOM:
The tunable setting of dalloc_enable gets reset to a default value
after a file system is resized.

DESCRIPTION:
The file system resize operation triggers the file system re-initialization
process. 
During this process, the tunable value of dalloc_enable gets reset to the
default value instead of retaining the old tunable value.

RESOLUTION:
The code is fixed such that the old tunable value of dalloc_enable is retained.

* 3422638 (Tracking ID: 3352059)

SYMPTOM:
Due to memory leak, high memory usage occurs with vxfsrepld on target when no jobs are running.

DESCRIPTION:
On the target side, high memory usage may occur even when
there are no jobs running because the memory allocated for some structures is not freed for every job iteration.

RESOLUTION:
The code is modified to resolve the memory leaks.

* 3422649 (Tracking ID: 3394803)

SYMPTOM:
The vxupgrade(1M) command causes VxFS to panic with the following stack trace:
panic_save_regs_switchstack()
panic
bad_kern_reference()
$cold_pfault()
vm_hndlr()
bubbleup()
vx_fs_upgrade()
vx_upgrade()
$cold_vx_aioctl_common()
vx_aioctl()
vx_ioctl()
vno_ioctl()
ioctl()
syscall()

DESCRIPTION:
The panic is caused by dereferencing a NULL device (one
of the devices in the DEVLIST is showing as a NULL device).

RESOLUTION:
The code is modified to skip the NULL devices when the devices in DEVLIST are
processed.

* 3422657 (Tracking ID: 3412667)

SYMPTOM:
On RHEL 6, the inode update operation may create a deep stack and cause a system panic due to stack overflow. The stack trace is as follows:
dequeue_entity()
dequeue_task_fair()
dequeue_task()
deactivate_task()
thread_return()
io_schedule()
get_request_wait()
blk_queue_bio()
generic_make_request()
submit_bio()
vx_dev_strategy()
vx_bc_bwrite()
vx_bc_do_bawrite()
vx_bc_bawrite()
vx_bwrite()
vx_async_iupdat()
vx_iupdat_local()
vx_iupdat_clustblks()
vx_iupdat_local()
vx_iupdat()
vx_iupdat_tran()
vx_tflush_inode()
vx_fsq_flush()
vx_tranflush()
vx_traninit()
vx_get_alloc()
vx_tran_get_alloc()
vx_alloc_getpage()
vx_do_getpage()
vx_internal_alloc()
vx_write_alloc()
vx_write1()
vx_write_common_slow()
vx_write_common()
vx_vop_write()
vx_writev()
vx_naio_write_v2()
do_sync_readv_writev()
do_readv_writev()
vfs_writev()
nfsd_vfs_write()
nfsd_write()
nfsd3_proc_write()
nfsd_dispatch()
svc_process_common()
svc_process()
nfsd()
kthread()
kernel_thread()

DESCRIPTION:
Some VxFS operations may need an inode update. This may create a very deep stack and cause a system panic due to stack overflow.

RESOLUTION:
The code is modified to add a handoff point in the inode update function. If the stack usage reaches a threshold, it will start a separate thread to do the work to limit stack usage.
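
The handoff idea can be sketched in Python with a worker thread. This is a user-space analogy only: the threshold constant and function name are hypothetical, and the sketch uses call depth as a stand-in for the stack usage in bytes that kernel code would check.

```python
import threading

HANDOFF_DEPTH = 3  # hypothetical threshold; stands in for a stack-usage limit


def inode_update(depth, trace):
    """Run the update inline while the call chain is shallow, else hand it off."""
    if depth >= HANDOFF_DEPTH:
        # Deep call chain: do the work on a separate thread, which starts
        # with a fresh (shallow) stack of its own.
        worker = threading.Thread(target=trace.append, args=(("worker", depth),))
        worker.start()
        worker.join()  # the caller still waits for the result
        return
    # Shallow call chain: doing the work in the current thread is safe.
    trace.append(("inline", depth))
```

The caller keeps the same synchronous behavior in both branches; only the stack on which the work runs changes.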

* 3430467 (Tracking ID: 3430461)

SYMPTOM:
Nested unmounts, as well as force unmounts, fail if the parent file system is disabled, which in turn prevents unmounting the child file system.

DESCRIPTION:
If a file system is mounted inside another vxfs mount, and if the parent file system gets disabled, then it is not possible to sanely unmount the child even with the force unmounts. This issue is observed because a disabled file system does not allow directory look up on it. On Linux, a file system can be unmounted only by providing the path of the mount point.

RESOLUTION:
The code is modified to allow the exceptional path lookup for unmounts. These are read-only operations and hence are safer. This makes it possible for the unmount of the child file system to proceed.

* 3436431 (Tracking ID: 3434811)

SYMPTOM:
In VxFS 6.1, the vxfsconvert(1M) command hangs within the vxfsl3_getext()
function with the following stack trace:

search_type()
bmap_typ()
vxfsl3_typext()
vxfsl3_getext()
ext_convert()
fset_convert()
convert()

DESCRIPTION:
There is a type casting problem for extent size. It may cause a non-zero value to overflow and turn into zero by mistake. This further leads to infinite looping inside the function.

RESOLUTION:
The code is modified to remove the intermediate variable and avoid type casting.
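
The effect of such a narrowing cast can be shown in Python by masking a value to a fixed width. The 32-bit width and the function names are assumptions for illustration; the description does not state the actual types involved.

```python
def to_u32(value):
    """Emulate a cast to an unsigned 32-bit integer (assumed width)."""
    return value & 0xFFFFFFFF


def advance_with_truncation(offset, extent_size):
    """Step through an extent using a truncated intermediate variable."""
    step = to_u32(extent_size)  # a non-zero size can become 0 here
    return offset + step


def advance_without_truncation(offset, extent_size):
    """Step through an extent using the size directly, with no narrowing cast."""
    return offset + extent_size
```

When the step becomes 0, the offset never advances, which is the kind of condition that keeps a scanning loop from terminating.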

* 3436433 (Tracking ID: 3349651)

SYMPTOM:
Veritas File System (VxFS) modules fail to load on RHEL6.5 and display the following error message:
kernel: vxfs: disagrees about version of symbol putname
kernel: vxfs: disagrees about version of symbol getname

DESCRIPTION:
In RHEL6.5, the kernel interfaces for getname() and putname() functions used by VxFS have changed.

RESOLUTION:
The code is modified to use the latest kernel interface definitions for the getname() and putname() functions.

* 3494534 (Tracking ID: 3402618)

SYMPTOM:
The mmap read performance on VxFS is slow.

DESCRIPTION:
The mmap read performance on VxFS is not good, because the read ahead operation is not triggered while the mmap reads are executed.

RESOLUTION:
An enhancement has been made to the read ahead operation. It helps improve the mmap read performance.

* 3502847 (Tracking ID: 3471245)

SYMPTOM:
MongoDB fails to insert any record because lseek fails to seek to the EOF.

DESCRIPTION:
Fallocate doesn't update the inode's i_size on Linux, which makes lseek unable to seek to the EOF.

RESOLUTION:
Before returning from the vx_fallocate() function, call the vx_getattr() function to update the Linux inode with the VxFS inode.
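
The expected user-visible behavior can be checked with a short Python sketch: after preallocating a new file, seeking to the end should report the preallocated length. The function name is hypothetical, and the sketch assumes a Linux system where os.posix_fallocate is available.

```python
import os
import tempfile


def eof_after_fallocate(path, length):
    """Preallocate `length` bytes in a new file and return the offset of EOF."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, length)
        # With a correctly updated i_size, SEEK_END lands at `length`.
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        print(eof_after_fallocate(os.path.join(tmpdir, "data.0"), 65536))
```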

* 3504362 (Tracking ID: 3472551)

SYMPTOM:
The attribute validation (pass 1d) of full fsck takes too much time to complete.

DESCRIPTION:
The current implementation of full fsck Pass 1d (attribute inode validation) is single threaded. This causes slow full fsck performance on large file systems, especially the ones having a large number of attribute inodes.

RESOLUTION:
Pass 1d is modified to work in parallel using multiple threads, which enables full fsck to process the attribute inode validation faster.
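
The general shape of such a parallel pass can be sketched in Python with a thread pool. This is an analogy, not the fsck implementation: the inode records, the validity check, and the worker count are all hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def validate_attribute_inode(inode):
    """Placeholder check (hypothetical): valid if the link count is positive."""
    return inode["nlink"] > 0


def pass_1d(attribute_inodes, workers=4):
    """Validate attribute inodes in parallel and return the numbers of the bad ones."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with the inodes.
        results = list(pool.map(validate_attribute_inode, attribute_inodes))
    return [ino["number"] for ino, ok in zip(attribute_inodes, results) if not ok]
```

Each inode is validated independently, which is what makes the pass straightforward to split across threads.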

* 3506487 (Tracking ID: 3506485)

SYMPTOM:
The system does not allow write-back caching with VVR.

DESCRIPTION:
If the volume or vset is a part of a RVG (Replicated Volume Group) on which the file system is mounted with the write-back feature, then the mount operation should succeed without enabling the write-back feature to maintain write order fidelity.
Similarly, if the write-back feature is enabled on the file system, then an attempt to add that volume or vset to RVG should fail.

RESOLUTION:
The code is modified to add the required limitation.

* 3512292 (Tracking ID: 3348520)

SYMPTOM:
In a Cluster File System (CFS) cluster having multi volume file system of a smaller size, execution of the fsadm command causes system hang if the free space in the file system is low. The following stack trace is displayed:
 
vx_svar_sleep_unlock()
vx_extentalloc_device() 
vx_extentalloc()
vx_reorg_emap()
vx_extmap_reorg()
vx_reorg()
vx_aioctl_full()
vx_aioctl_common()
vx_aioctl()
vx_unlocked_ioctl()
vfs_ioctl()
do_vfs_ioctl()
sys_ioctl()
tracesys()

And 

vxg_svar_sleep_unlock() 
vxg_grant_sleep()
vxg_api_lock()
vx_glm_lock()
vx_cbuf_lock()
vx_getblk_clust()
vx_getblk_cmn()
vx_getblk()
vx_getmap()
vx_getemap()
vx_extfind()
vx_searchau_downlevel() 
vx_searchau_uplevel()
vx_searchau()
vx_extentalloc_device() 
vx_extentalloc()
vx_reorg_emap()
vx_extmap_reorg()
vx_reorg()
vx_aioctl_full()
vx_aioctl_common()
vx_aioctl()
vx_unlocked_ioctl()
vfs_ioctl()
do_vfs_ioctl()
sys_ioctl()
tracesys()

DESCRIPTION:
While performing the fsadm operation, the secondary node in the CFS cluster is unable to allocate space from the EAU (Extent Allocation Unit) delegation given by the primary node. It requests the primary node for another delegation.
While giving such delegations, the primary node does not verify whether the EAU has exclusion zones set on it. It only verifies if it has enough free space.
On the secondary node, the extent allocation cannot be done from an EAU that has an exclusion zone set, resulting in a loop.

RESOLUTION:
The code is modified such that the primary node does not delegate to the secondary node an EAU that has an exclusion zone set on it.

* 3518943 (Tracking ID: 3534779)

SYMPTOM:
Internal stress testing on Cluster File System (CFS) hits a debug assert.

DESCRIPTION:
The assert was hit while refreshing the incore reference count
queue (rcq) values from the disk in response to a loadfs message. This refresh
races with an rcq processing thread that has already advanced the incore
rcq indexes on a primary node in CFS.

RESOLUTION:
The code is modified to avoid selective updates in incore rcq.

* 3519809 (Tracking ID: 3463464)

SYMPTOM:
Internal kernel functionality conformance test hits a kernel panic due to null pointer dereference.

DESCRIPTION:
In the vx_fsadm_query() function, the error handling code path incorrectly sets the nodeid to null in the file system structure. Because nodeid is cleared, any subsequent access to this field results in the kernel panic.

RESOLUTION:
The code is modified to improve the error handling code path.

* 3522003 (Tracking ID: 3523316)

SYMPTOM:
The writeback cache feature does not work for write size of 2MB.

DESCRIPTION:
In the vx_wb_possible() function, the condition that checks write size compatibility with write-back caching skips write requests of 2MB from caching.

RESOLUTION:
The code is modified such that the conditions for checking compatibility of write size in vx_wb_possible() function allows write request of 2 MB for caching.
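
One way a size check can exclude exactly the 2MB case is an exclusive upper bound. The Python sketch below contrasts the two forms. It is a guess at the shape of the problem for illustration only: the constant, the function names, and the idea that the bug was a strict comparison are assumptions, not statements about the actual vx_wb_possible() code.

```python
MAX_WB_WRITE_SIZE = 2 * 1024 * 1024  # 2MB, the size named in the symptom


def wb_possible_exclusive(size):
    """Hypothetical faulty check: a write of exactly 2MB is rejected."""
    return 0 < size < MAX_WB_WRITE_SIZE


def wb_possible_inclusive(size):
    """Hypothetical corrected check: a write of exactly 2MB is accepted."""
    return 0 < size <= MAX_WB_WRITE_SIZE
```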

* 3528770 (Tracking ID: 3449152)

SYMPTOM:
The vxtunefs(1M) command fails to set the thin_friendly_alloc tunable in CFS.

DESCRIPTION:
The thin_friendly_alloc tunable is not supported on CFS. But when the vxtunefs(1M) command is used to set it in CFS, a false successful message is displayed.

RESOLUTION:
The code is modified to report error for the attempt to set the thin_friendly_alloc tunable in CFS.

* 3529852 (Tracking ID: 3463717)

SYMPTOM:
CFS does not support the 'thin_friendly_alloc' tunable, and the vxtunefs(1M) command man page is not updated with this information.

DESCRIPTION:
Since the man page does not explicitly mention that the 'thin_friendly_alloc' tunable is not supported, it is assumed that CFS supports this feature.

RESOLUTION:
The man page pertinent to the vxtunefs(1M) command is updated to denote that CFS does not support the 'thin_friendly_alloc' tunable.

* 3530038 (Tracking ID: 3417321)

SYMPTOM:
The vxtunefs(1M) man page gives an incorrect

DESCRIPTION:
According to the current design, the tunable 'delicache_enable' is
enabled by default both in case of local mount and cluster mount. But, the man
page is not updated accordingly. It still specifies that this tunable is enabled
by default only in case of a local mount. The man page needs to be updated to
correct the

RESOLUTION:
The code is modified to update the man page of the vxtunefs(1M) command to
display the correct contents for the 'delicache_enable' tunable. Additional
information is provided with respect to the performance benefits, in case of CFS
being limited as compared to the local mount.
Also, in case of CFS, unlike the other CFS tunable parameters, there is a need
to explicitly turn this tunable on or off on each node.

* 3541125 (Tracking ID: 3541083)

SYMPTOM:
The vxupgrade(1M) command for layout version 10 creates 64-bit quota
files with inappropriate permission configurations.

DESCRIPTION:
Layout version 10 supports 64-bit quota feature. Thus, while
upgrading to version 10, 32-bit external quota files are converted to 64-bit.
During this conversion process, 64-bit files are created without specifying any
permission. Hence, random permissions are assigned to the 64-bit file, which
creates an impression that the conversion process was not successful as expected.

RESOLUTION:
The code is modified such that appropriate permissions are provided
while creating 64-bit quota files.

* 3424575 (Tracking ID: 3349651)

SYMPTOM:
Veritas File System (VxFS) modules fail to load on RHEL6.5 and display the following error message:
kernel: vxfs: disagrees about version of symbol putname
kernel: vxfs: disagrees about version of symbol getname

DESCRIPTION:
In RHEL6.5, the kernel interfaces for getname() and putname() functions used by VxFS have changed.

RESOLUTION:
The code is modified to use the latest kernel interface definitions for the getname() and putname() functions.

* 3418489 (Tracking ID: 3370720)

SYMPTOM:
I/Os pause periodically, and this results in performance degradation. No explicit error is seen.

DESCRIPTION:
To avoid the kernel stack overflow, the work which consumes a large amount of stack is not done in the context of the original thread. Instead, such work items are added to a high priority work queue to be processed by a set of worker threads. If all the worker threads are busy, then there is an issue wherein the processing of the newly added work items in the work queue is subjected to an additional delay which in turn results in periodic stalls.

RESOLUTION:
The code is modified such that the high priority work items are processed by a set of dedicated worker threads. These dedicated threads do not have an issue when all the threads are busy and hence do not trigger periodic stalls.



INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Note that the installation of this P-Patch will cause downtime.

To install the patch perform the following steps on at least one node in the cluster:
1. Copy the patch sfha-rhel6_x86_64-Patch-6.1.1.700.tar.gz to /tmp
2. Untar sfha-rhel6_x86_64-Patch-6.1.1.700.tar.gz to /tmp/hf
    # mkdir /tmp/hf
    # cd /tmp/hf
    # gunzip /tmp/sfha-rhel6_x86_64-Patch-6.1.1.700.tar.gz
    # tar xf /tmp/sfha-rhel6_x86_64-Patch-6.1.1.700.tar
3. Install the hotfix. (Note that the installation of this P-Patch will cause downtime.)
    # pwd
    /tmp/hf
    # ./installSFHA611P700 [<host1> <host2>...]

You can also install this patch together with 6.1.1 maintenance release using Install Bundles
1. Download this patch and extract it to a directory
2. Change to the Veritas InfoScale 6.1.1 directory and invoke the installmr script
   with -patch_path option where -patch_path should point to the patch directory
    # ./installmr -patch_path [<path to this patch>] [<host1> <host2>...]

Install the patch manually:
--------------------------
Manual installation is not supported.


REMOVING THE PATCH
------------------
Manual uninstallation is not supported.


SPECIAL INSTRUCTIONS
--------------------
NONE


OTHERS
------
NONE