sfha-sol11_x64-Patch-6.0.5.3100

 Basic information
Release type: Patch
Release date: 2021-02-04
OS update support: None
Technote: None
Documentation: None
Popularity: 223 viewed
Download size: 65.68 MB
Checksum: 3892918364

 Applies to one or more of the following products:
VirtualStore 6.0.1 On Solaris 11 X64
Cluster Server 6.0.1 On Solaris 11 X64
Dynamic Multi-Pathing 6.0.1 On Solaris 11 X64
Storage Foundation 6.0.1 On Solaris 11 X64
Storage Foundation Cluster File System 6.0.1 On Solaris 11 X64
Storage Foundation for Oracle RAC 6.0.1 On Solaris 11 X64
Storage Foundation HA 6.0.1 On Solaris 11 X64

 Obsolete patches, incompatibilities, superseded patches, or other requirements:
None.

 Fixes the following incidents:
2705336, 2912412, 2912435, 2923805, 2927359, 2928921, 2933290, 2933291, 2933292, 2933294, 2933296, 2933300, 2933301, 2933309, 2933313, 2933325, 2933326, 2933751, 2933822, 2937367, 2947029, 2959557, 2972674, 2976664, 2978227, 2978234, 2978236, 2982161, 2983249, 2984589, 2987373, 2999566, 3007184, 3018873, 3021281, 3027250, 3040130, 3056103, 3059000, 3108176, 3131798, 3131799, 3131826, 3248029, 3248031, 3248042, 3248046, 3248051, 3248054, 3248089, 3248090, 3248094, 3248096, 3248099, 3284764, 3296988, 3299685, 3306410, 3310758, 3321730, 3322294, 3323912, 3338024, 3338026, 3338030, 3338063, 3338762, 3338776, 3338779, 3338780, 3338787, 3338790, 3339230, 3339884, 3340029, 3351946, 3351947, 3359278, 3364285, 3364289, 3364302, 3364307, 3364317, 3364333, 3364335, 3364338, 3364349, 3370650, 3372909, 3380905, 3396539, 3402484, 3402643, 3405172, 3426534, 3430687, 3469683, 3496010, 3496715, 3498950, 3498976, 3498978, 3499005, 3499008, 3499011, 3499030, 3501358, 3514824, 3515559, 3515842, 3517702, 3521727, 3526501, 3531332, 3540777, 3544831, 3552411, 3579957, 3581566, 3584297, 3590573, 3593181, 3597560, 3600161, 3603811, 3612801, 3614184, 3621240, 3622069, 3638039, 3648603, 3654163, 3682640, 3683021, 3690795, 3713320, 3726112, 3734985, 3737823, 3774137, 3788751, 3796626, 3796633, 3796644, 3796652, 3796684, 3796687, 3796727, 3796731, 3796733, 3796745, 3796759, 3796763, 3796766, 3799822, 3799901, 3799999, 3800394, 3800396, 3800449, 3800452, 3800788, 3801225, 3805938, 3806808, 3807761, 3809787, 3814808, 3816233, 3821416, 3821425, 3821490, 3826918, 3851967, 3852297, 3852512, 3856482, 3862425, 3862435, 3864474, 3864751, 3866970, 3871503, 3875894, 3914135, 3934424, 3934645, 3938132, 3940138, 3947295, 3990430, 4003603, 4003629, 4003630, 4004119, 4004573, 4004574, 4004575, 4004576, 4004578, 4005400, 4005401, 4005558, 4005559, 4005560, 4005566, 4005681, 4005702, 4005749, 4005752, 4005753, 4005754, 4005765, 4005767, 4005771, 4005773, 4005776, 4005873, 4005890, 4006754, 4006755, 4007647, 4008103, 4008104, 4008105, 4008106, 4019779

 Patch ID:
VRTSvxvm-6.0.500.5100
VRTSaslapm-6.0.500.800
VRTSllt-6.0.500.1100
VRTSgab-6.0.500.1100
VRTSvxfen-6.0.500.3100
VRTSamf-6.0.500.1100
VRTSodm-6.0.500.4200
VRTSvxfs-6.0.500.6200

Readme file
                          * * * READ ME * * *
            * * * Veritas Storage Foundation HA 6.0.5 * * *
                      * * * Patch 6.0.5.3100 * * *
                         Patch Date: 2020-11-30


This document provides the following information:

   * PATCH NAME
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * SUMMARY OF INCIDENTS FIXED BY THE PATCH
   * DETAILS OF INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Veritas Storage Foundation HA 6.0.5 Patch 6.0.5.3100


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
Solaris 11 X86


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSamf
VRTSaslapm
VRTSgab
VRTSllt
VRTSodm
VRTSvxfen
VRTSvxfs
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Symantec VirtualStore 6.0.1
   * Veritas Cluster Server 6.0.1
   * Veritas Dynamic Multi-Pathing 6.0.1
   * Veritas Storage Foundation 6.0.1
   * Veritas Storage Foundation Cluster File System HA 6.0.1
   * Veritas Storage Foundation for Oracle RAC 6.0.1
   * Veritas Storage Foundation HA 6.0.1


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: VRTSvxvm-6.0.500.5100
* 3856482 (3852146) Shared DiskGroup(DG) fails to import when "-c" and "-o
noreonline" options are specified together.
* 3934424 (3932241) VxVM (Veritas Volume Manager) creates some required files under the /tmp and
/var/tmp directories. These directories can be modified by non-root users,
which can affect the functioning of Veritas Volume Manager.
* 3934645 (3934618) System fails to start up after the boot disk is encapsulated.
* 3938132 (3269268) The vxmake utility may dump core when there are many VxVM records pending.
* 3940138 (3921994) Backup of a disk group fails, and temporary files such as
<DiskGroup>.bslist, .cslist, and .perm are seen in the directory /var/temp.
* 3947295 (3947294) /usr/lib/vxvm/bin/vxattachd: line 1228: /tmp/vx.XXXX/vx.XXXX: No such file or
directory
* 4004119 (3964779) Changes to support Solaris 11.4 with Volume Manager
* 4005558 (3386862) "vxdmpadm iostat show" does not display IO statistics on LDOM with Solaris 11.
* 4005559 (3722219) VVR (Veritas Volume Replicator) Rlink status sometimes does not change to
up-to-date even after synchronization completes.
* 4005560 (3857120) Commands like vxdg deport which try to close a VxVM volume might hang.
* 4005566 (3926067) vxassist relayout and vxassist convert commands may fail in a Campus Cluster environment.
* 4005749 (3890031) Node panics because a lock is released twice in the vxio kernel extension.
* 4005752 (3893756) 'vxconfigd' holds a task device for a long time; after the kernel task-ID counter wraps around, this may create a boundary issue.
* 4005753 (3390959) The vxconfigd(1M) daemon hangs in the kernel while 
processing the I/O request.
* 4005754 (3900463) vxassist may fail to create a volume on disks having size in terabytes when '-o
ordered' clause is used along with 'col_switch' attribute while creating the volume.
* 4005765 (3910228) Registration of GAB(Global Atomic Broadcast) port u fails on slave nodes after 
multiple new devices are added to the system.
* 4005767 (3795739) In a split brain scenario, cluster formation takes very long time.
* 4005771 (3187997) In a split brain scenario, cluster formation takes very long time.
* 4005773 (3868533) IO hang happens because of a deadlock situation.
* 4005776 (3864063) Application I/O hangs because of a race between the Master Pause SIO (Staging
I/O) and the Error Handler SIO.
Patch ID: VRTSvxvm-6.0.500.300
* 3496010 (3107699) VxDMP (Veritas Dynamic MultiPathing)  causes system panic 
after a shutdown/reboot.
* 3496715 (3281004) For DMP minimum queue I/O policy with large number of CPUs a couple of issues 
are observed.
* 3501358 (3399323) The reconfiguration of Dynamic Multipathing (DMP) database fails.
* 3521727 (3521726) System panics because an IOHINT is freed twice.
* 3526501 (3526500) Disk IO failures occur with DMP IO timeout error messages when the DMP (Dynamic Multi-pathing) IO statistics daemon is not running.
* 3531332 (3077582) A Veritas Volume Manager (VxVM) volume may become inaccessible causing the read/write operations to fail.
* 3540777 (3539548) After adding removed MPIO disk back, 'vxdisk list' or 'vxdmpadm listctlr all' 
commands may show duplicate entry for DMP node with error state.
* 3552411 (3482026) The vxattachd(1M) daemon reattaches plexes of manually detached site.
* 3600161 (3599977) During a replica connection, referencing a port that is already deleted in another thread causes a system panic.
* 3603811 (3594158) The spinlock and unspinlock are referenced to different objects when interleaving with a kernel transaction.
* 3612801 (3596330) 'vxsnap refresh' operation fails with `Transaction aborted waiting for IO 
drain` error
* 3614184 (3614182) First system reboot after migration from Solaris Multi-Pathing (MPXIO) to
Symantec Dynamic Multi-Pathing (DMP) native takes extremely long time.
* 3621240 (3621232) The vradmin ibc command cannot be started or executed on the Veritas Volume Replicator (VVR) secondary node.
* 3622069 (3513392) Reference to replication port that is already deleted caused panic.
* 3638039 (3625890) vxdisk resize operation on CDS disks fails with an error message of "Invalid
attribute specification"
* 3648603 (3564260) VVR commands are unresponsive when replication is paused and resumed in a loop.
* 3654163 (2916877) vxconfigd hangs on a node leaving the cluster.
* 3683021 (3544980) Vxconfigd reports an error message like "vxconfigd V-5-1-7920 di_init() failed"
after a SAN tape online event.
* 3690795 (2573229) On RHEL6, the server panics when Dynamic Multi-Pathing (DMP) executes 
PERSISTENT RESERVE IN command with REPORT CAPABILITIES service action on 
powerpath controlled device.
* 3713320 (3596282) Snap operations fail with error "Failed to allocate a new map due to no free 
map available in DCO".
* 3734985 (3727939) Data corruption may occur due to stale device entries in the /dev/vx/[r]dmp directories.
* 3737823 (3736502) Memory leak occurs when a transaction aborts.
* 3774137 (3565212) IO failure is seen during controller giveback operations 
on Netapp Arrays in ALUA mode.
* 3788751 (3788644) Reuse raw device number when checking for available raw devices.
* 3799822 (3573262) System panic during space optimized snapshot operations  
on recent UltraSPARC architectures.
* 3800394 (3672759) The vxconfigd(1M) daemon may core dump when DMP database is corrupted.
* 3800396 (3749557) System hangs because of high memory usage by vxvm.
* 3800449 (3726110) On systems with high number of CPU's, Dynamic Multipathing (DMP) devices may perform 
considerably slower than OS device 
paths.
* 3800452 (3437852) The system panics when Symantec Replicator Option goes to
PASSTHRU mode.
* 3800788 (3648719) The server panics while adding or removing LUNs or HBAs.
* 3801225 (3662392) In the Cluster Volume Manager (CVM) environment, if I/Os are getting executed 
on slave node, corruption can happen when the vxdisk resize(1M) command is 
executing on the master node.
* 3805938 (3790136) File system hang observed due to IO's in Dirty Region Logging (DRL).
* 3806808 (3645370) vxevac command fails to evacuate disks with Dirty Region Log(DRL) plexes.
* 3807761 (3729078) VVR (Veritas Volume Replicator) secondary site panic occurs during patch
installation because of a flag overlap issue.
* 3809787 (3677359) VxDMP (Veritas Dynamic MultiPathing)  causes system panic
after a shutdown or reboot.
* 3814808 (3606480) Enabling dmp_native_support fails since the devid and phys_path are not 
populated.
* 3816233 (3686698) vxconfigd hangs due to a deadlock between two threads.
* 3821425 (3783356) VxDMP Module failure handling in Solaris
* 3826918 (3819670) poll() returning -1 with errno EINTR should be handled correctly in
vol_admintask_wait().
Patch ID: VRTSaslapm-6.0.500.800
* 4019779 (4019778) VRTSaslapm support for Solaris 11.4
Patch ID: VRTSodm-6.0.500.4200
* 4003630 (3968788) ODM module failed to load on Solaris 11.4.
* 4006755 (4006757) ODM module failed to load on Solaris 11.4 x86 arch.
* 4007647 (3616753) On Solaris 11, the ODM module does not load automatically after
the machine is rebooted.
Patch ID: VRTSodm-6.0.500.400
* 3799901 (3451730) Installation of VRTSodm, VRTSvxfs in a zone
fails when running zoneadm -z Zone attach -U
Patch ID: VRTSodm-6.0.500.100
* 3515842 (3481825) The system is unable to turn off the vxodm service for Smart Flash Cache only.
The system panics when it attempts to use the ZFS raw device.
* 3544831 (3525858) The system panics when the Oracle Disk Manager (ODM)  device (/dev/odm) is mounted in a Solaris Zone.
Patch ID: VRTSodm-6.0.500.000
* 3322294 (3323866) Some ODM operations may fail with "ODM ERROR V-41-4-1-328-22 Invalid argument"
Patch ID: VRTSodm-6.0.300.000
* 3018873 (3018869) On Solaris 11 update 1, the fsadm command shows that the mount point is not a
VxFS file system.
Patch ID: VRTSvxfs-6.0.500.6200
* 3990430 (3911048) LDH corruption and file system hang.
* 4003603 (3972641) Handle deprecated Solaris function calls in VxFS code
* 4003629 (3968785) VxFS module failed to load on Solaris 11.4.
* 4004573 (3560187) The kernel may panic when the buffer is freed in the vx_dexh_preadd_space() 
function with the message "Data Key Miss Fault in kernel mode".
* 4004574 (3779916) vxfsconvert fails to upgrade the layout version for a VxFS file
system with a large number of inodes.
* 4004575 (3870832) Panic due to a race between force umount and the NFS lock
manager vnode get operation.
* 4004576 (3904794) Extending a qio file fails with the EINVAL error if the reservation block is not set.
* 4004578 (3457803) File System gets disabled intermittently with metadata IO error.
* 4005873 (3978615) VxFS file system fails to mount after an OS upgrade and the first reboot.
* 4005890 (3898565) Solaris no longer supports F_SOFTLOCK.
* 4006754 (4006756) VxFS module failed to load on Solaris 11.4 x86 arch.
Patch ID: VRTSvxfs-6.0.500.400
* 2927359 (2927357) Assert hit in internal testing.
* 2933300 (2933297) Compression support for the dedup ioctl.
* 2972674 (2244932) Internal assert failure during testing.
* 3040130 (3137886) Thin Provisioning Logging does not work for reclaim
operations triggered via fsadm command.
* 3248031 (2628207) Full fsck operation takes a long time.
* 3682640 (3637636) Cluster File System (CFS) node initialization and protocol upgrade may hang
during the rolling upgrade.
* 3726112 (3704478) The library routine that gets the mount point fails to return
the mount point of the root file system.
* 3796626 (3762125) Directory size increases abnormally.
* 3796633 (3729158) Deadlock occurs due to incorrect locking order between write advise and dalloc flusher thread.
* 3796644 (3269553) VxFS returns inappropriate message for read of hole via 
Oracle Disk Manager (ODM).
* 3796652 (3686438) NMI panic in the vx_fsq_flush function.
* 3796684 (3601198) Replication makes the copies of 64-bit external quota files too.
* 3796687 (3604071) High CPU usage consumed by the vxfs thread process.
* 3796727 (3617191) Checkpoint creation takes a lot of time.
* 3796731 (3558087) The ls -l command and other commands that use the stat system call may
take a long time to complete.
* 3796733 (3695367) Unable to remove volume from multi-volume VxFS using "fsvoladm" command.
* 3796745 (3667824) System panics during delayed allocation (dalloc) flushing.
* 3796759 (3615043) Data loss when writing to a file while dalloc is on.
* 3796763 (2560032) System panics after SFHA is upgraded from 5.1SP1 to 5.1SP1RP2 or from 6.0.1 to 6.0.5
* 3796766 (3451730) Installation of VRTSodm, VRTSvxfs in a zone
fails when running zoneadm -z Zone attach -U
* 3799999 (3602322) System panics while flushing the dirty pages of the inode.
* 3821416 (3817734) The message displayed to the user when a file system mount fails
directly instructs the user to run fsck with the -y|Y option.
* 3821490 (3451730) Installation of VRTSodm, VRTSvxfs in a zone
fails when running zoneadm -z Zone attach -U
* 3851967 (3852324) Assert failure during internal stress testing.
* 3852297 (3553328) During internal testing, full fsck failed to clean the file
system completely.
* 3852512 (3846521) "cp -p" fails if the modification time in nanoseconds has 10
digits.
* 3862425 (3859032) System panics in vx_tflush_map() due to NULL pointer 
de-reference.
* 3862435 (3833816) Read returns stale data on one node of the CFS.
* 3864751 (3867147) Assert failed in internal dedup testing.
* 3866970 (3866962) Data corruption is seen when dalloc writes to a file are in progress and
fsync is started on the same file simultaneously.
Patch ID: VRTSvxfs-6.0.500.100
* 3402643 (3413926) Internal testing hangs due to high memory consumption resulting in fork failure.
* 3469683 (3469681) File system is disabled while free space defragmentation is going on.
* 3498950 (3356947) When there are multi-threaded writes with fsync calls between them, VxFS becomes slow.
* 3498976 (3434811) The vxfsconvert(1M) in VxFS 6.1 hangs.
* 3498978 (3424564) fsppadm fails with ENODEV and "file is encrypted or is not a 
database" errors
* 3499005 (3469644) System panics in the vx_logbuf_clean() function.
* 3499008 (3484336) The fidtovp() system call can panic in the vx_itryhold_locked () function.
* 3499011 (3486726) VFR logs too much data on the target node.
* 3499030 (3484353) The file system may hang with a partitioned directory feature enabled.
* 3514824 (3443430) Fsck allocates too much memory.
* 3515559 (3498048) While the system is making a backup, the "ls -l" command on the same file system may hang.
* 3517702 (3517699) Return code 240 for command fsfreeze(1M) is not documented in man page for fsfreeze.
* 3579957 (3233315) The "fsck" utility dumps core during a full scan.
* 3581566 (3560968) The delicache_enable tunable is not persistent in the Cluster File System (CFS) environment.
* 3584297 (3583930) While external quota file is restored or over-written, old quota records are preserved.
* 3590573 (3331010) Command fsck(1M) dumped core with segmentation fault
* 3593181 (3331105) The fsck command cannot handle the case where two reorg inodes
point to the same source inode.
* 3597560 (3597482) The pwrite(2) function fails with the EOPNOTSUPP error.
Patch ID: VRTSvxfs-6.0.500.000
* 2705336 (2059611) The system panics due to a NULL pointer dereference while
flushing bitmaps to the disk.
* 2933290 (2756779) The code is modified to improve the fix for the read and write performance
concerns on Cluster File System (CFS) when it runs applications that rely on
the POSIX file-record using the fcntl lock.
* 2933301 (2908391) It takes a long time to remove checkpoints from the VxFS file system, when there
are a large number of files present.
* 2947029 (2926684) In rare cases, the system may panic while performing a logged write.
* 2959557 (2834192) You are unable to mount the file system after the full fsck(1M) utility is run.
* 2978234 (2972183) The fsppadm(1M) enforce command takes a long time on the secondary nodes
compared to the primary nodes.
* 2978236 (2977828) The file system is marked bad after an inode table overflow
error.
* 2982161 (2982157) During internal testing, the "f:vx_trancommit:4" debug assert was hit when the available transaction space is less than required.
* 2983249 (2983248) The vxrepquota(1M) command dumps core.
* 2999566 (2999560) The 'fsvoladm'(1M) command fails to clear the 'metadataok' flag on a volume.
* 3027250 (3031901) The 'vxtunefs(1M)' command accepts the garbage value for the 'max_buf_dat_size' tunable.
* 3056103 (3197901) Prevent duplicate symbols in the VxFS libvxfspriv.a and
vxfspriv.so libraries.
* 3059000 (3046983) Invalid CFS node number in ".__fsppadm_fclextract", causes the DST policy 
enforcement failure.
* 3108176 (2667658) The 'fscdsconv endian' conversion operation fails because of a macro overflow.
* 3131798 (2839871) On a system with DELICACHE enabled, several file system operations may
hang.
* 3131799 (2833450) The fstyp(1M) command displays a negative value for ninode
on file systems larger than 2 terabytes (TB).
* 3131826 (2966277) High file system activity such as read/write/open/lookup may panic
the system.
* 3248029 (2439261) When the vx_fiostats_tunable value is changed from zero to
non-zero, the system panics.
* 3248042 (3072036) Read operations from secondary node in CFS can sometimes fail with the ENXIO 
error code.
* 3248046 (3092114) The information output displayed by the "df -i" command may be inaccurate for 
cluster mounted file systems.
* 3248051 (3121933) The pwrite(2) function fails with the EOPNOTSUPP error.
* 3248054 (3153919) The fsadm (1M) command may hang when the structural file set re-organization is 
in progress.
* 3248089 (3003679) When running the fsppadm(1M) command and removing a file with the named
stream attributes (nattr) at the same time, the file system does not respond.
* 3248090 (2963763) When the thin_friendly_alloc() and delicache_enable() functionality is enabled,
VxFS may enter a deadlock.
* 3248094 (3192985) Checkpoints quota usage on Cluster File System (CFS) can be negative.
* 3248096 (3214816) With the DELICACHE feature enabled, frequent creation and deletion of the inodes
of a user may result in corruption of the user quota file.
* 3248099 (3189562) Oracle daemons hang in the vx_growfile() kernel function.
* 3284764 (3042485) During internal stress testing, the f:vx_purge_nattr:1 assert fails.
* 3296988 (2977035) A debug assert issue was encountered in vx_dircompact() function while running an internal noise test in the Cluster File System (CFS) environment
* 3299685 (2999493) The file system check validation fails with an error message after a successful full fsck operation during internal testing.
* 3306410 (2495673) Mismatch of concurrent I/O related data in an inode is observed during communication between the nodes in a cluster.
* 3310758 (3310755) Internal testing hits a debug assert "vx_rcq_badrecord:9:corruptfs".
* 3321730 (3214328) A mismatch is observed between the states for the Global Lock Manager (GLM) grant level and the Global Lock Manager (GLM) data in a Cluster File System (CFS) inode.
* 3323912 (3259634) A Cluster File System (CFS) with blocks larger than 4GB may
become corrupt.
* 3338024 (3297840) A metadata corruption is found during the file removal process.
* 3338026 (3331419) System panic because of kernel stack overflow.
* 3338030 (3335272) The mkfs (make file system) command dumps core when the log 
size provided is not aligned.
* 3338063 (3332902) While shutting down, the system running the fsclustadm(1M)
command panics.
* 3338762 (3096834) Intermittent vx_disable messages are displayed in the system log.
* 3338776 (3224101) After you enable the optimization for updating the i_size across the cluster
nodes lazily, the system panics.
* 3338779 (3252983) On a high-end system greater than or equal to 48 CPUs, some file system operations may hang.
* 3338780 (3253210) File system hangs when it reaches the space limitation.
* 3338787 (3261462) File system with size greater than 16TB corrupts with vx_mapbad messages in the system log.
* 3338790 (3233284) FSCK binary hangs while checking the Reference Count Table (RCT).
* 3339230 (3308673) A fragmented file system is disabled when delayed allocations
feature is enabled.
* 3339884 (1949445) System is unresponsive when files are created in a large directory.
* 3340029 (3298041) With the delayed allocation feature enabled on a locally 
mounted file system, observable performance degradation might be experienced 
when writing to a file and extending the file size.
* 3351946 (3194635) The internal stress test on a locally mounted file system exited with an error message.
* 3351947 (3164418) Internal stress test on a locally mounted VxFS file system results in data corruption in a no-space-on-device scenario while doing a split on a Zero Fill-On-Demand (ZFOD) extent.
* 3359278 (3364290) The kernel may panic in Veritas File System (VxFS) when it is
internally working on reference count queue (RCQ) record.
* 3364285 (3364282) The fsck(1M) command fails to correct the inode list file.
* 3364289 (3364287) Debug assert may be hit in the vx_real_unshare() function in the cluster environment.
* 3364302 (3364301) Assert failure because of improper handling of inode lock while truncating a reorg inode.
* 3364307 (3364306) Stack overflow seen in extent allocation code path.
* 3364317 (3364312) The fsadm(1M) command is unresponsive while processing the VX_FSADM_REORGLK_MSG message.
* 3364333 (3312897) System can hang when the Cluster File System (CFS) primary node is disabled.
* 3364335 (3331109) The full fsck does not repair the corrupted reference count queue (RCQ) record.
* 3364338 (3331045) Kernel Oops in unlock code of map while referring freed mlink due to a race with iodone routine for delayed writes.
* 3364349 (3359200) Internal test on Veritas File System (VxFS) fsdedup(1M) feature in cluster file system environment results in
a hang.
* 3370650 (2735912) The performance of tier relocation using the fsppadm(1M)
enforce command degrades while migrating a large number of files.
* 3372909 (3274592) Internal noise test on cluster file system is unresponsive while executing the fsadm(1M) command
* 3380905 (3291635) Internal testing found debug assert "vx_freeze_block_threads_all:7c" on locally mounted file systems while processing preambles for transactions.
* 3396539 (3331093) Issue with the MountAgent process for VxFS: while doing repeated
switchovers on HP-UX, the MountAgent process gets stuck.
* 3402484 (3394803) A panic is observed in the VxFS routine vx_upgrade7()
while running the vxupgrade(1M) command.
* 3405172 (3436699) An assert failure occurs because of a race condition between clone mount thread and directory removal thread while pushing data on clone.
* 3426534 (3426511) Unloading the VxFS modules may fail on Solaris 11 even after successful uninstallation of the VxFS package.
* 3430687 (3444775) Internal noise testing on cluster file system results in a kernel panic in function vx_fsadm_query() with an error message.
Patch ID: VRTSvxfs-6.0.300.000
* 2928921 (2843635) Internal testing encounters some failures.
* 2933290 (2756779) The code is modified to improve the fix for the read and write performance
concerns on Cluster File System (CFS) when it runs applications that rely on
the POSIX file-record using the fcntl lock.
* 2933291 (2806466) A reclaim operation on a file system that is mounted on a
Logical Volume Manager (LVM) may panic the system.
* 2933292 (2895743) Accessing named attributes for some files stored in CFS seems to be slow.
* 2933294 (2750860) Performance of the write operation with small request size
may degrade on a large file system.
* 2933296 (2923105) Removal of the VxFS module from the kernel takes a longer time.
* 2933309 (2858683) Reserve extent attributes changed after vxrestore, for files greater than
8192 bytes.
* 2933313 (2841059) Full fsck fails to clear the corruption in attribute inode 15.
* 2933325 (2905820) If the file is being read via the NFSv4 client, then removing
the same file on the NFSv4 server may hang if the file system is VxFS.
* 2933326 (2827751) High kernel memory allocation is observed when Oracle Disk
Manager (ODM) is used with non-VxVM devices.
* 2933751 (2916691) Customers experience hangs when performing dedup operations.
* 2933822 (2624262) Filestore:Dedup:fsdedup.bin hit oops at vx_bc_do_brelse
* 2937367 (2923867) Internal test hits an assert "f:xted_set_msg_pri1:1".
* 2976664 (2906018) The vx_iread errors are displayed after successful log replay and mount of the 
file system.
* 2978227 (2857751) The internal testing hits the assert "f:vx_cbdnlc_enter:1a".
* 2984589 (2977697) A core dump is generated while you are removing the clone.
* 2987373 (2881211) File ACLs are not preserved properly in checkpoints if the file has a hard link.
* 3007184 (3018869) On Solaris 11 update 1, the fsadm command shows that the mount point is not a
VxFS file system.
* 3021281 (3013950) Solaris 11 update 1 validation encounters the following test 
assert "f:vx_info_init:2" during internal testing.
Patch ID: VRTSvxfs-6.0.100.200
* 2912412 (2857629) File system corruption can occur requiring a full fsck of the 
system.
* 2912435 (2885592) vxdump to the vxcompress file system is aborted
* 2923805 (2590918) Delay in freeing unshared extents upon primary switch over.
Patch ID: VRTSamf-6.0.5.1100
* 4008106 (4008102) Veritas Cluster Server does not support Oracle Solaris 11.4.
Patch ID: VRTSamf-6.0.5.100
* 3871503 (3871501) On Solaris 11.2 SRU 8 or above with Asynchronous 
monitoring Framework (AMF) enabled, VCS agent processes may not respond or 
may encounter AMF errors during registration.
* 3875894 (3873866) Veritas Infoscale Availability does not support Oracle Solaris 11.3.
Patch ID: VRTSvxfen-6.0.5.3100
* 3864474 (3864470) The I/O fencing configuration script exits after a limited number of retries.
* 3914135 (3913303) Non root users should have read permissions for VxFEN log files.
* 4005681 (3960112) Fencing support for vSAN over iSCSI
* 4005702 (3935040) The Cluster Server component creates some required files in 
the /tmp and /var/tmp directories.
* 4008105 (4008102) Veritas Cluster Server does not support Oracle Solaris 11.4.
Patch ID: VRTSvxfen-6.0.500.300
* 3864474 (3864470) The I/O fencing configuration script exits after a limited number of retries.
* 3914135 (3913303) Non root users should have read permissions for VxFEN log files.
Patch ID: VRTSgab-6.0.5.1100
* 4005400 (3984685) In a rare case, peer MAC/IP learning may fail, and the current debug capability is not enough to detect the problem.
* 4008104 (4008102) Veritas Cluster Server does not support Oracle Solaris 11.4.
Patch ID: VRTSllt-6.0.5.1100
* 4005400 (3984685) In a rare case, peer MAC/IP learning may fail, and the current debug capability is not enough to detect the problem.
* 4005401 (3935040) The Cluster Server component creates some required files in 
the /tmp and /var/tmp directories.
* 4008103 (4008102) Veritas Cluster Server does not support Oracle Solaris 11.4.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following incidents:

Patch ID: VRTSvxvm-6.0.500.5100

* 3856482 (Tracking ID: 3852146)

SYMPTOM:
In a CVM cluster, when importing a shared diskgroup specifying both -c and -o
noreonline options, the following error may be returned: 
VxVM vxdg ERROR V-5-1-10978 Disk group <dgname>: import failed: Disk for disk
group not found.

DESCRIPTION:
The -c option will update the disk ID and disk group ID on the private region
of the disks in the disk group being imported. Such updated information is not
yet seen by the slave because the disks have not been re-onlined (given that
noreonline option is specified). As a result, the slave cannot identify the
disk(s) based on the updated information sent from the master, causing the
import to fail with the error Disk for disk group not found.

RESOLUTION:
The code is modified to handle the working of the "-c" and "-o noreonline"
options together.
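
For illustration only, the failing import combined the two options in a single
command of the following form; the disk group name "mydg" and the "-s" (shared)
flag below are placeholders for an actual CVM shared disk group:

# vxdg -s -c -o noreonline import mydg

With the fix, this combination succeeds even though the disks are not
re-onlined on the slave nodes.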

* 3934424 (Tracking ID: 3932241)

SYMPTOM:
VxVM (Veritas Volume Manager) creates some required files under /tmp
and /var/tmp directories.

DESCRIPTION:
VxVM (Veritas Volume Manager) creates some required files under /tmp and
/var/tmp directories. 
The non-root users have access to these folders, and they may accidentally modify,
move or delete those files.
Such actions may interfere with the normal functioning of the Veritas Volume
Manager.

RESOLUTION:
This hotfix addresses the issue by moving the required Veritas Volume Manager
files to a secure location.

* 3934645 (Tracking ID: 3934618)

SYMPTOM:
System fails to start up after the boot disk is encapsulated. The system keeps
panicking with the following messages:
NOTICE: VxVM vxio V-5-0-74 Cannot open disk ROOTDISK: kernel error 6
Cannot open mirrored root device, error 6
Cannot remount root on /pseudo/vxio@0:0 fstype ufs

panic[cpu0]/thread=180e000: vfs_mountroot: cannot remount root
genunix:vfs_mountroot+398 ()
genunix:main+10c ()

DESCRIPTION:
Due to a bug in DMP (Dynamic Multi-Pathing), after the boot disk is
encapsulated, DMP gets a wrong device number for the boot disk. This causes the
open of the boot device to fail, which in turn prevents the OS from mounting
the file system, hence the issue.

RESOLUTION:
Code changes have been done to initialize the boot device number correctly 
in boot mode.

* 3938132 (Tracking ID: 3269268)

SYMPTOM:
The vxmake utility may dump core when there are many VxVM records pending.

DESCRIPTION:
The vxmake command may fail and generate a core when there are many VxVM
records pending. In this situation, a VxVM object is freed and later used
again. As the object has an invalid address, accessing it later leads to a
core dump.

RESOLUTION:
Appropriate code changes are done to avoid the core dump.

* 3940138 (Tracking ID: 3921994)

SYMPTOM:
Temporary files such as <DiskGroup>.bslist .cslist .perm are seen in the 
directory /var/temp.

DESCRIPTION:
When ADD and REMOVE operations on the disks of a disk group are done in the
interval between two backups, the next backup of the same disk group fails,
which is why the temporary files listed in the symptom are left behind in the
directory.

RESOLUTION:
The syntax errors in the code are corrected to handle the vxconfigbackup issue.
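
For reference, a disk group configuration backup of the kind described above is
taken with the vxconfigbackup(1M) utility; the disk group name below is a
placeholder:

# vxconfigbackup mydg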

* 3947295 (Tracking ID: 3947294)

SYMPTOM:
"vxattachd" generates following error while trying to remove a file on restart:

/usr/lib/vxvm/bin/vxattachd: line 1228:
/tmp/vx.23445.13597.16922.3573/vx.8791.9854.20413.3573: No such file or
directory
/bin/cat: /tmp/vx.23445.13597.16922.3573/vx.19111.10626.17489.3573: No such file
or directory
/bin/rm: cannot remove
`/tmp/vx.23445.13597.16922.3573/vx.19111.10626.17489.3573': No such file or
directory

DESCRIPTION:
The issue occurs while trying to remove a temporary file that is not present on
the system.

RESOLUTION:
Made code changes so that error does not occur even if the file is not present.

* 4004119 (Tracking ID: 3964779)

SYMPTOM:
Loading the VxVM modules, that is, vxio and vxspec, fails on Solaris 11.4.

DESCRIPTION:
The function page_numtopp_nolock has been replaced and renamed as pp_for_pfn_canfail. The _depends_on attribute has been deprecated and cannot be used. VxVM was making use of this attribute to specify the dependency between the modules.

RESOLUTION:
The changes are mainly around the way unmapped buffers are handled in the vxio driver.
The Solaris API that was previously used is a private API and is no longer valid.
The hat_getpfnum() with ppmapin()/ppmapout() calls were replaced with bp_copyin()/bp_copyout() in the I/O code path.
In I/O shipping, they were replaced with a miter approach and hat_kpm_paddr_mapin()/hat_kpm_paddr_mapout().

* 4005558 (Tracking ID: 3386862)

SYMPTOM:
"vxdmpadm iostat show" command display zeroes in I/O statistics on Solaris 
11 
LDOM after throwing i/os
vxdmpadm iostat show dmpnodename=dmp_node_name interval=10 
count=10
                       cpu usage = 6us    per cpu memory = 0b
                                   OPERATIONS            BLOCKS          AVG 
TIME(ms)
PATHNAME             READS    WRITES     READS    WRITES     READS    WRITES
c3t500009780800A1E4d614s2          0         0         0         0    0.00     
0.00
c4t500009780800A1E5d614s2          0         0         0         0    0.00     
0.00

c3t500009780800A1E4d614s2          0         0         0         0    0.00     
0.00
c4t500009780800A1E5d614s2          0         0         0         0    0.00     
0.00

DESCRIPTION:
Starting with Solaris 11, the _ncpu value has increased. Because memory is not
allocated for storing the iostat records, DMP is unable to store the
statistics. The statistics come out as NULL, so no records are maintained,
which causes the DMP iostat output to show zeroes.

RESOLUTION:
The macro that represents the maximum size of the per-CPU iostat buffer has
been changed to resolve this issue.
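
After the fix, the command shown in the symptom should report non-zero
statistics while I/O is in progress; the DMP node name below is a placeholder:

# vxdmpadm iostat show dmpnodename=dmp_node_name interval=10 count=10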

* 4005559 (Tracking ID: 3722219)

SYMPTOM:
VVR (Veritas Volume Replicator) Rlink status sometimes does not change to
up-to-date even after synchronization completes.

The issue is intermittent; the vxrlink status command may show an incorrect or
negative number of outstanding writes, as follows:
Rlink <rlink_name> has -5044 outstanding writes, occupying 87440 Kbytes (8%) on
the SRL

DESCRIPTION:
In VVR, an internal variable that records the number of outstanding writes for
the rlink was not protected by a spin lock in a few functions. This may result
in an incorrect value of pending writes, causing the 'vxrlink status' command
to display an inconsistent value of outstanding writes.

RESOLUTION:
Code changes have been made to update the rlink outstanding write count under
the spin lock in all places.
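
To check the Rlink status and the outstanding write count, the vxrlink status
command can be used; the disk group and Rlink names below are placeholders:

# vxrlink -g mydg status rlk_remote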

* 4005560 (Tracking ID: 3857120)

SYMPTOM:
If the volume is shared in a CVM configuration, the following stack traces will
be seen under a vxiod daemon suggesting an attempt to drain I/O. In this case,
CVM halt will be blocked and eventually time out.
The stack trace may appear as:
sleep+0x3f0  
vxvm_delay+0xe0  
volcvm_iodrain_dg+0x150  
volcvmdg_abort_complete+0x200  
volcvm_abort_sio_start+0x140  
voliod_iohandle+0x80

or 

cv_wait+0x3c() 
delay_common+0x6c 
vol_mv_close_check+0x68 
vol_close_device+0x1e4
vxioclose+0x24
spec_close+0x14c
fop_close+0x8c 
closef2+0x11c
closeall+0x3c 
proc_exit+0x46c
exit+8
post_syscall+0x42c
syscall_trap+0x188

Because vxconfigd is busy in a transaction trying to close the volume or drain
the I/O, all other threads that send requests to vxconfigd hang.

DESCRIPTION:
VxVM maintains an I/O count of the in-progress I/O on the volume. When two
threads from VxVM asynchronously manipulate the I/O count on the volume, the
race between these threads might leave a stale I/O count on the volume even
though the volume has actually completed all I/Os. Because of this invalid
pending I/O count on the volume, the volume cannot be closed.

RESOLUTION:
This issue has been fixed in the VxVM code manipulating the I/O count to avoid
the race condition between the two threads.

* 4005566 (Tracking ID: 3926067)

SYMPTOM:
In a Campus Cluster environment, vxassist relayout command may fail with 
following error:
VxVM vxassist ERROR V-5-1-13124  Site  offline or detached
VxVM vxassist ERROR V-5-1-4037 Relayout operation aborted. (20)

vxassist convert command also might fail with following error:
VxVM vxassist ERROR V-5-1-10128  No complete plex on the site.

DESCRIPTION:
For the vxassist "relayout" and "convert" operations in a Campus Cluster
environment, VxVM (Veritas Volume Manager) needs to sort the plexes of the
volume according to sites. When the number of plexes of a volume is greater
than 100, the sorting of plexes fails due to a bug in the code. Because of
this sorting failure, the vxassist relayout/convert operations fail.

RESOLUTION:
Code changes are done to properly sort the plexes according to site.
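
For illustration, typical invocations of the affected operations take the
following form; the disk group, volume name, and target layouts below are
placeholders:

# vxassist -g mydg relayout vol1 layout=stripe
# vxassist -g mydg convert vol1 layout=mirror-stripe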

* 4005749 (Tracking ID: 3890031)

SYMPTOM:
System panics with the following thread stack:
000094C4].unlock_enable_mem+0001B8 ()
[05FB6D3C]voldsio_timeout+00065C (0000000000000000)
[F1000000C0187698]csq_protect+0003B8 (??, ??, ??, ??, ??, ??)
[F1000000C018A3A0]str_to+000120 ()
[00014D70].hkey_legacy_gate+00004C ()
[00510498]Netintr+0002F8 ()
[00510160]netisr_thread+000020 ()
[00231734]threadentry+000094 (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF6070

DESCRIPTION:
Due to a bug in the code, a spinlock was freed twice, which caused the system
panic.

RESOLUTION:
Code changes were done to remove the code line that introduced the double free
of the spinlock.

* 4005752 (Tracking ID: 3893756)

SYMPTOM:
Under certain circumstances, after vxconfigd has been running for a long time, a task might be left dangling in the system. It can be seen by issuing 'vxtask -l list'.

DESCRIPTION:
- voltask_dump() gets a task ID by calling vol_task_dump in the kernel (ioctl); the task ID is the minor number of the task device.
- The task ID (or minor number) increases by 1 when a new task is registered.
- Task IDs start from 160 and wrap around when they reach 65536. A global counter 'vxtask_next_minor' indicates the next task ID.
- When vxconfigd opens a task device by calling voltask_dump() and holds it, it gets a task ID too (say 165). From then on, a vnode with this
  minor number (major=273, minor=165) exists in the kernel.
- As time goes by, the task ID increases, reaches 65536, then wraps around and starts from 160 again.
- When the task ID comes around to 165 again for a CLI command (say 'vxdisk -othin,fssize list'), that command's task device gets the same
  major and minor number (165) as vxconfigd's.
- At the same time, vxconfigd is still holding this vnode. vxdisk does not know this; it opens the task device and registers a task structure
  in the kernel hash table. This adds a reference to the same vnode that vxconfigd is holding, so the reference count of the common snode is
  now 2.
- When vxdisk (fsusage_collect_stats_task) has done its job, it calls voltask_complete->close()->spec_close(), trying to remove this task
  (165). But the OS function spec_close() (from specfs) gets in the way: it checks the reference count of the common snode (vnode->v_data-
  >snode->s_commonvp->v_data->common snode). spec_close() finds that the value of s_count is 2, so it only drops the reference by one
  and returns success to the caller, without calling the actual closing function volsclose().
- Because volsclose() is not called by spec_close(), its subsequent functions are not called either: volsclose_real()->voltask_close()
  ->vxtask_rm_task(). Among those, vxtask_rm_task() does the actual job of removing a task from the kernel hash table.
- After calling close(), fsusage_collect_stats_task returns, and the vxdisk command exits. From this point on, the task is left dangling in
  the kernel hash table until vxconfigd exits.

RESOLUTION:
The source is changed so that vxconfigd does not hold a task device.
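
A dangling task of this kind can be observed with the command mentioned in the
symptom:

# vxtask -l list

A task that remains listed long after its originating command has exited may be
an instance of this issue.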

* 4005753 (Tracking ID: 3390959)

SYMPTOM:
The vxconfigd(1M) daemon hangs in the kernel, while processing the I/O request. 
The following stack trace is observed :
slpq_swtch_core()
sleep_pc()
biowait()
physio()
dmpread()
spec_rdwr()
vno_rw()
read()
syscall()

DESCRIPTION:
The vxconfigd(1M) daemon hangs while processing the I/O request.
It is noticed that b_dev of one path was set to -1. Because the code does not
check for -1 during disk open, the open reference count is wrongly increased.

RESOLUTION:
The code is modified to add debug messages that confirm the current probable
cause analysis when this issue occurs, and to add a check for b_dev when
opening a disk.

* 4005754 (Tracking ID: 3900463)

SYMPTOM:
vxassist may fail to create a volume on disks having size in terabytes when '-o
ordered' clause is used along with 'col_switch' attribute while creating the volume.

Following error may be reported:
VxVM vxvol ERROR V-5-1-1195 Volume <volume-name> has more than one associated
sparse plex but no complete plex

DESCRIPTION:
The problem is seen especially when a user attempts to create a volume on
large-sized disks using the '-o ordered' option along with the 'col_switch'
attribute. The error reports the plex as sparse because the plex length is
calculated incorrectly in the code due to an integer overflow of a variable
that handles the col_switch attribute.

RESOLUTION:
The code is fixed to avoid the integer overflow.

* 4005765 (Tracking ID: 3910228)

SYMPTOM:
Registration of GAB (Global Atomic Broadcast) port u fails on slave nodes after
multiple new devices are added to the system.

DESCRIPTION:
Vxconfigd sends a command to GAB for port u registration and waits for a
response from GAB. If during this timeframe vxconfigd is interrupted by any
module other than GAB, it is not able to receive the signal from GAB indicating
successful registration. Since the signal is not received, vxconfigd believes
the registration did not succeed and treats it as a failure.

RESOLUTION:
The signals that vxconfigd can receive are masked before it waits for the
signal from GAB for the registration of port u.

* 4005767 (Tracking ID: 3795739)

SYMPTOM:
In a split brain scenario, cluster formation takes very long time.

DESCRIPTION:
In a split brain scenario, the surviving nodes in the cluster try to preempt the keys of the nodes leaving the cluster. If the keys have already been preempted by one of the surviving nodes, the other surviving nodes receive Unit Attention. DMP (Dynamic Multipathing) then retries the preempt command after a delay of 1 second if it receives Unit Attention. Cluster formation cannot complete until the PGR keys of all the leaving nodes are removed from all the disks. If the number of disks is very large, the preemption of keys takes a lot of time, leading to a very long cluster formation time.

RESOLUTION:
The code is modified to avoid adding delay for first couple of retries when reading PGR keys. This allows faster cluster formation with arrays that clear the Unit Attention condition sooner.

* 4005771 (Tracking ID: 3187997)

SYMPTOM:
In a split brain scenario, cluster formation takes very long time.

DESCRIPTION:
In a split brain scenario, the surviving nodes in the cluster try to preempt the keys of the nodes leaving the cluster. If the keys have already been
preempted by one of the surviving nodes, the other surviving nodes might receive a Reservation Conflict error. To confirm that the keys have really
been removed, DMP (Dynamic Multipathing) re-registers the keys from the current node. This re-registration of the keys takes time when the number
of disks is very large, leading to a delay in cluster formation.

RESOLUTION:
The code is modified to avoid re-registration of keys if the keys have been already removed. This is confirmed by reading keys from the disk and 
checking if keys are present or not.

* 4005773 (Tracking ID: 3868533)

SYMPTOM:
IO hang happens when starting replication. The VxIO daemon hangs with a stack
like the following:

vx_cfs_getemap at ffffffffa035e159 [vxfs]
vx_get_freeexts_ioctl at ffffffffa0361972 [vxfs]
vxportalunlockedkioctl at ffffffffa06ed5ab [vxportal]
vxportalkioctl at ffffffffa06ed66d [vxportal]
vol_ru_start at ffffffffa0b72366 [vxio]
voliod_iohandle at ffffffffa09f0d8d [vxio]
voliod_loop at ffffffffa09f0fe9 [vxio]

DESCRIPTION:
While performing DCM replay with the Smart Move feature enabled, the VxIO
kernel needs to issue an IOCTL to the VxFS kernel to get the file system free
regions. To complete this IOCTL, the VxFS kernel needs to clone a map by
issuing IO to the VxIO kernel. If an RLINK disconnection happens at just that
time, the RV is serialized to complete the disconnection. As the RV is
serialized, all IOs, including the clone map IO from VxFS, are queued to
rv_restartq, hence the deadlock.

RESOLUTION:
Code changes have been made to handle the deadlock situation.

* 4005776 (Tracking ID: 3864063)

SYMPTOM:
Application I/O hangs after the Master Pause command is issued.

DESCRIPTION:
Some flags (VOL_RIFLAG_DISCONNECTING or VOL_RIFLAG_REQUEST_PENDING) in VVR
(Veritas Volume Replicator) kernel are not cleared because of a race between the
Master Pause SIO and the Error Handler SIO. This causes the RU (Replication
Update) SIO to fail to proceed, which leads to I/O hang.

RESOLUTION:
The code is modified to handle the race condition.

Patch ID: VRTSvxvm-6.0.500.300

* 3496010 (Tracking ID: 3107699)

SYMPTOM:
VxDMP causes system panic after a shutdown or reboot and displays
the following stack trace:
mutex_enter() 
volinfo_ioct()
volsioctl_real()
cdev_ioctl()
dmp_signal_vold()
dmp_throttle_paths()
dmp_process_stats()
dmp_daemons_loop()
thread_start()
OR
panicsys()
vpanic_common()
panic+0x1c()
mutex_enter()
cdev_ioctl()
dmp_signal_vold()
dmp_check_path_state()
dmp_restore_callback()
dmp_process_scsireq()
dmp_daemons()
thread_start()

DESCRIPTION:
In a special scenario of system shutdown/reboot, the DMP
(Dynamic MultiPathing) I/O statistic daemon tries to call the ioctl functions
in VXIO module which is being unloaded and this causes system panic.

RESOLUTION:
The code is modified to stop the DMP I/O statistic daemon before
system shutdown/reboot. Also added a code change to avoid to other probe to vxio
devices during shutdown.

* 3496715 (Tracking ID: 3281004)

SYMPTOM:
For DMP minimum queue I/O policy with large number of CPUs, the following 
issues are observed since the VxVM 5.1 SP1 release: 
1. CPU usage is high. 
2. I/O throughput is down if there are many concurrent I/Os.

DESCRIPTION:
The earlier minimum queue I/O policy considered the host controller I/O load
to select the least loaded path. In VxVM 5.1 SP1, an addition was made to also
consider the I/O load of the underlying paths of the selected host-based
controllers. However, this resulted in performance issues, as there was lock
contention between the I/O processing functions and the DMP statistics daemon.

RESOLUTION:
The code is modified such that the host controller paths I/O load is not 
considered to avoid the lock contention.

* 3501358 (Tracking ID: 3399323)

SYMPTOM:
The reconfiguration of Dynamic Multipathing (DMP) database fails with the below error: VxVM vxconfigd DEBUG  V-5-1-0 dmp_do_reconfig: DMP_RECONFIGURE_DB failed: 2

DESCRIPTION:
As part of the DMP database reconfiguration process, controller information from DMP user-land database is not removed even though it is removed from DMP kernel database. This creates inconsistency between the user-land and kernel-land DMP database. Because of this, subsequent DMP reconfiguration fails with above error.

RESOLUTION:
The code changes have been made to properly remove the controller information from the user-land DMP database.

* 3521727 (Tracking ID: 3521726)

SYMPTOM:
When using Symantec Replication Option, system panic happens while freeing
memory with the following stack trace on AIX,

pvthread+011500 STACK:
[0001BF60]abend_trap+000000 ()
[000C9F78]xmfree+000098 ()
[04FC2120]vol_tbmemfree+0000B0 ()
[04FC2214]vol_memfreesio_start+00001C ()
[04FCEC64]voliod_iohandle+000050 ()
[04FCF080]voliod_loop+0002D0 ()
[04FC629C]vol_kernel_thread_init+000024 ()
[0025783C]threadentry+00005C ()

DESCRIPTION:
In certain scenarios, when a write IO gets throttled or unwound in VVR, the
memory related to one of the data structures is freed. When this IO is
restarted, the same memory gets illegally accessed and freed again even though
it was already freed, causing a system panic.

RESOLUTION:
Code changes have been done to fix the illegal memory access issue.

* 3526501 (Tracking ID: 3526500)

SYMPTOM:
Disk IO failures occur with DMP IO timeout error messages when DMP (Dynamic Multi-pathing) IO statistics daemon is not running. Following are the timeout error messages:

VxVM vxdmp V-5-3-0 I/O failed on path 65/0x40 after 1 retries for disk 201/0x70
VxVM vxdmp V-5-3-0 Reached DMP Threshold IO TimeOut (100 secs) I/O with start 
3e861909fa0 and end 3e86190a388 time
VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x206) on dmpnode 201/0x70

DESCRIPTION:
When IO is submitted to DMP, it sets the start time on the IO buffer. The value of the start time depends on whether the DMP IO statistics daemon is running. When the IO is returned as an error from SCSI to DMP, instead of retrying the IO on alternate paths, DMP fails that IO with a 300-second timeout error even though the IO has elapsed only a few milliseconds in its execution. The miscalculation of the DMP timeout happens only when the DMP IO statistics daemon is not running.

RESOLUTION:
The code is modified to calculate an appropriate DMP IO timeout value when the DMP IO statistics daemon is not running.

* 3531332 (Tracking ID: 3077582)

SYMPTOM:
A Veritas Volume Manager (VxVM) volume may become inaccessible causing the read/write operations to fail with the following error:
# dd if=/dev/vx/dsk/<dg>/<volume> of=/dev/null count=10
dd read error: No such device
0+0 records in
0+0 records out

DESCRIPTION:
If I/Os to the disks timeout due to some hardware failures like weak Storage Area Network (SAN) cable link or Host Bus Adapter (HBA) failure, VxVM assumes that the disk is faulty or slow and it sets the failio flag on the disk. Due to this flag, all the subsequent I/Os fail with the No such device error.

RESOLUTION:
The code is modified such that vxdisk now provides a way to clear the failio flag. To check whether the failio flag is set on the disks, use the vxkprint(1M) utility (under /etc/vx/diag.d). To reset the failio flag, execute the vxdisk set <disk_name> failio=off command, or deport and import the disk group that holds these disks.
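
For example, assuming the affected disk is named disk_1 (a placeholder), one
way to inspect and clear the flag is:

# /etc/vx/diag.d/vxkprint | grep failio
# vxdisk set disk_1 failio=off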

* 3540777 (Tracking ID: 3539548)

SYMPTOM:
Adding MPIO(Multi Path I/O) disk that had been removed earlier may result in 
following two issues:
1. 'vxdisk list' command shows duplicate entry for DMP (Dynamic Multi-Pathing) 
node with error state.
2. 'vxdmpadm listctlr all' command shows duplicate controller names.

DESCRIPTION:
1. Under certain circumstances, deleted MPIO disk record information is left in 
/etc/vx/disk.info file with its device number as -1 but its DMP node name is 
reassigned to other MPIO disk. When the deleted disk is added back, it is 
assigned the same name, without validating for conflict in the name. 
2. When some devices are removed and added back to the system, we are adding a 
new controller for each and every path that we have discovered. This leads to 
duplicated controller entries in DMP database.

RESOLUTION:
1. Code is modified to properly remove all stale information about any disk 
before updating MPIO disk names. 
2. Code changes have been made to add the controller for selected paths only.
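
The duplicate entries described above can be checked for with the commands from
the symptom:

# vxdisk list
# vxdmpadm listctlr all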

* 3552411 (Tracking ID: 3482026)

SYMPTOM:
The vxattachd(1M) daemon reattaches plexes of manually detached site.

DESCRIPTION:
The vxattachd daemon reattaches plexes for a manually detached site, that is, a site whose state is OFFLINE. There was no check to differentiate between a manually detached site and a site detached due to IO failure; hence, the vxattachd(1M) daemon brings the plexes online for the manually detached site as well.

RESOLUTION:
The code is modified to differentiate between manually detached site and the site detached due to IO failure.

* 3600161 (Tracking ID: 3599977)

SYMPTOM:
During a replica connection, referencing a port that is already deleted in another thread causes a system panic with a similar stack trace as below:
.simple_lock()
soereceive()
soreceive()
.kernel_add_gate_cstack()
kmsg_sys_rcv()
nmcom_get_next_mblk()
nmcom_get_next_msg()
nmcom_wait_msg_tcp()
nmcom_server_proc_tcp()
nmcom_server_proc_enter()
vxvm_start_thread_enter()

DESCRIPTION:
During a replica connection, a port is created before increasing the count. This is to protect the port from getting deleted. However, another thread deletes the port before the count is increased and after the port is created. 
While the replica connection thread proceeds, it refers to the port that is already deleted, which causes a NULL pointer reference and a system panic.

RESOLUTION:
The code is modified to prevent asynchronous access to the count that is associated with the port by means of locks.

* 3603811 (Tracking ID: 3594158)

SYMPTOM:
The system panics on a VVR secondary node with the following stack trace:
.simple_lock()
soereceive()
soreceive()
.kernel_add_gate_cstack()
kmsg_sys_rcv()
nmcom_get_next_mblk()
nmcom_get_next_msg()
nmcom_wait_msg_tcp()
nmcom_server_proc_tcp()
nmcom_server_proc_enter()
vxvm_start_thread_enter()

DESCRIPTION:
You may issue a spinlock or unspinlock to the replica to check whether to use a checksum in the received packet. During the lock or unlock operation, if there is a transaction that is being processed with the replica, which rebuilds the replica object in the kernel, then there is a possibility that the replica referenced in spinlock is different than the one which the replica has referenced in unspinlock (especially when the replica is referenced through several pointers). As a result, the system panics.

RESOLUTION:
The code is modified to set the flag in the port attribute to indicate whether to use the checksum during a port creation. Hence, for each packet that is received, you only need to check the flag in the port attribute rather than referencing it to the replica object. As part of the change, the spinlock or unspinlock statements are also removed.

* 3612801 (Tracking ID: 3596330)

SYMPTOM:
'vxsnap refresh' operation fails with following indicants:

Errors occur from DR (Disaster Recovery) Site of VVR (Veritas 
Volume Replicator):

o	vxio: [ID 160489 kern.notice] NOTICE: VxVM vxio V-5-3-1576 commit: 
Timedout waiting for rvg [RVG] to quiesce, iocount [PENDING_COUNT] msg 0
o	vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-8011 Internal 
transaction failed: Transaction aborted waiting for io drain

At the same time, following errors occur from Primary Site of VVR:

vxio: [ID 218356 kern.warning] WARNING: VxVM VVR vxio V-5-0-267 Rlink 
[RLINK] disconnecting due to ack timeout on update message

DESCRIPTION:
VM (Volume Manager) transactions on the DR site get aborted because pending
IOs could not be drained in the stipulated time, leading to failure of FMR
(Fast-Mirror Resync) 'snap' operations. These IOs could not be drained because
of IO throttling. A bug/race in conjunction with timing in VVR causes the
clearing of this throttling condition/state to be missed.

RESOLUTION:
Code changes have been done to fix the race condition which ensures clearance 
of throttling state at appropriate time.
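
For reference, the affected operation is a snapshot refresh of the following
form; the disk group, snapshot volume, and source volume names below are
placeholders:

# vxsnap -g mydg refresh snapvol source=datavol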

* 3614184 (Tracking ID: 3614182)

SYMPTOM:
First system reboot after migration from Solaris Multi-Pathing (MPXIO) to
Symantec Dynamic Multi-Pathing (DMP) native takes extremely long time.
Node may appear hung. This happens when there are a large number of
zpools to be migrated. In some cases, the system may take 20+ hours to come up
after first reboot.

DESCRIPTION:
This issue is specific to a configuration where the zpools (and the LUNs hosting
the zpools) are shared between multiple systems, and the zpools are already
imported on a different system than the one where MPXIO to DMP migration is
being performed. An example of this would be multiple nodes of a Symantec VCS
(Veritas Cluster Server) configuration. Such zpools should be skipped during
DMP migration. Instead, the migration logic was unnecessarily running time
consuming ZFS commands on each of these zpools. These commands were failing and
contributing to extremely long boot up times.

RESOLUTION:
DMP migration code was changed to detect above mentioned zpools and skip
running ZFS commands on them.

* 3621240 (Tracking ID: 3621232)

SYMPTOM:
When the vradmin ibc command is executed to initiate the In-band Control (IBC) procedure, the vradmind (VVR daemon) on the VVR secondary node goes into the disconnected state. As a result, subsequent IBC procedures or vradmin ibc commands cannot be started or executed on the VVR secondary node, and a message similar to the following appears on the VVR primary node:
    
VxVM VVR vradmin ERROR V-5-52-532 Secondary is undergoing a state transition. Please re-try the command after some time.
VxVM VVR vradmin ERROR V-5-52-802 Cannot start command execution on Secondary.

DESCRIPTION:
When the IBC procedure reaches the command-finish state, the vradmind on the VVR secondary node goes into the disconnected state, which the vradmind on the primary node fails to detect.
In this scenario, the vradmind on the primary refrains from sending a handshake request to the secondary node, which could change its state from disconnected back to running. As a result, the vradmind remains in the disconnected state and the vradmin ibc command fails to run on the VVR secondary node, despite being in the running state on the VVR primary node.

RESOLUTION:
The code is modified to make sure the vradmind on the VVR primary node is notified when the vradmind on the VVR secondary node goes into the disconnected state. As a result, it can send out a handshake request to take the secondary node out of the disconnected state.

* 3622069 (Tracking ID: 3513392)

SYMPTOM:
The Secondary panics when rebooted while heavy I/Os are going on the Primary.

PID: 18862  TASK: ffff8810275ff500  CPU: 0   COMMAND: "vxiod"
#0 [ffff880ff3de3960] machine_kexec at ffffffff81035b7b
#1 [ffff880ff3de39c0] crash_kexec at ffffffff810c0db2
#2 [ffff880ff3de3a90] oops_end at ffffffff815111d0
#3 [ffff880ff3de3ac0] no_context at ffffffff81046bfb
#4 [ffff880ff3de3b10] __bad_area_nosemaphore at ffffffff81046e85
#5 [ffff880ff3de3b60] bad_area_nosemaphore at ffffffff81046f53
#6 [ffff880ff3de3b70] __do_page_fault at ffffffff810476b1
#7 [ffff880ff3de3c90] do_page_fault at ffffffff8151311e
#8 [ffff880ff3de3cc0] page_fault at ffffffff815104d5
#9 [ffff880ff3de3d78] volrp_sendsio_start at ffffffffa0af07e3 [vxio]
#10 [ffff880ff3de3e08] voliod_iohandle at ffffffffa09991be [vxio]
#11 [ffff880ff3de3e38] voliod_loop at ffffffffa0999419 [vxio]
#12 [ffff880ff3de3f48] kernel_thread at ffffffff8100c0ca

DESCRIPTION:
If replication stage I/Os are started after serialization of the replica volume,
the replication port can be deleted and set to NULL while the replica connection
changes are handled. This causes the panic, because the stage I/O code did not
check whether the replication port is still valid before referencing it.

RESOLUTION:
Code changes have been made to abort the stage I/O if the replication port is NULL.

* 3638039 (Tracking ID: 3625890)

SYMPTOM:
After running the vxdisk resize command, the following message is displayed: 
"VxVM vxdisk ERROR V-5-1-8643 Device <disk name> resize failed: Invalid 
attribute specification"

DESCRIPTION:
CDS (Cross-platform Data Sharing) VTOC (Volume Table of Contents) disks reserve
two cylinders for special usage. When a disk is expanded to a particular size on
the storage side, VxVM (Veritas Volume Manager) may calculate the cylinder number
as 2, which causes the vxdisk resize operation to fail with the error
message "Invalid attribute specification".

RESOLUTION:
The code is modified to avoid the failure of resizing a CDS VTOC disk.

* 3648603 (Tracking ID: 3564260)

SYMPTOM:
VVR commands are unresponsive when replication is paused and resumed in a loop.

DESCRIPTION:
While Veritas Volume Replicator (VVR) is in the process of sending updates, pausing the replication is deferred until acknowledgements of the updates are received or an error occurs. If, for some reason, the acknowledgements get delayed or the delivery fails, the pause operation continues to be deferred, resulting in unresponsiveness.

RESOLUTION:
The code is modified to resolve the issue that caused unresponsiveness.

* 3654163 (Tracking ID: 2916877)

SYMPTOM:
vxconfigd hangs if a node leaves the cluster while I/O error handling is in
progress. The stack observed is as follows:
volcvm_iodrain_dg 
volcvmdg_abort_complete
volcvm_abort_sio_start 
voliod_loop
vol_kernel_thread_init

DESCRIPTION:
A bug in the DCO error handling code can lead to an infinite loop if a node leaves
the cluster while I/O error handling is in progress. This causes vxconfigd to hang
and stop responding to VxVM commands like vxprint, vxdisk, etc.

RESOLUTION:
The DCO error handling code has been changed so that I/O errors are handled
correctly, and the hang is avoided.

* 3683021 (Tracking ID: 3544980)

SYMPTOM:
Vxconfigd reports an error message like the following when scanning disks:
"vxconfigd V-5-1-7920 di_init() failed"

DESCRIPTION:
In the Solaris discovery codepath, when the di_init() system call fails, walknode()
is called to retrieve the device information. If any error occurs during discovery,
all disks disappear after the disk scan. The OS vendor suggested retrying
di_init() with a delay when it fails, instead of calling walknode().

RESOLUTION:
Code changes have been made to deprecate the call to walknode() and to retry
di_init() up to 3 times with a delay when it fails.

* 3690795 (Tracking ID: 2573229)

SYMPTOM:
On RHEL6, the server panics when Dynamic Multi-Pathing (DMP) executes 
PERSISTENT RESERVE IN command with REPORT CAPABILITIES service action on 
powerpath controlled device. The following stack trace is displayed:

enqueue_entity at ffffffff81068f09
enqueue_task_fair at ffffffff81069384
enqueue_task at ffffffff81059216
activate_task at ffffffff81059253
pull_task at ffffffff81065401
load_balance_fair at ffffffff810657b7
thread_return at ffffffff81527d30
schedule_timeout at ffffffff815287b5
wait_for_common at ffffffff81528433
wait_for_completion at ffffffff8152854d
blk_execute_rq at ffffffff8126d9dc
emcp_scsi_cmd_ioctl at ffffffffa04920a2 [emcp]
PowerPlatformBottomDispatch at ffffffffa0492eb8 [emcp]
PowerSyncIoBottomDispatch at ffffffffa04930b8 [emcp]
PowerBottomDispatchPirp at ffffffffa049348c [emcp]
PowerDispatchX at ffffffffa049390d [emcp]
MpxSendScsiCmd at ffffffffa061853e [emcpmpx]
ClariionKLam_groupReserveRelease at ffffffffa061e495 [emcpmpx]
MpxDefaultRegister at ffffffffa061df0a [emcpmpx]
MpxTestPath at ffffffffa06227b5 [emcpmpx]
MpxExtraTry at ffffffffa06234ab [emcpmpx]
MpxTestDaemonCalloutGuts at ffffffffa062402f [emcpmpx]
MpxIodone at ffffffffa0624621 [emcpmpx]
MpxDispatchGuts at ffffffffa0625534 [emcpmpx]
MpxDispatch at ffffffffa06256a8 [emcpmpx]
PowerDispatchX at ffffffffa0493921 [emcp]
GpxDispatch at ffffffffa0644775 [emcpgpx]
PowerDispatchX at ffffffffa0493921 [emcp]
GpxDispatchDown at ffffffffa06447ae [emcpgpx]
VluDispatch at ffffffffa068b025 [emcpvlumd]
GpxDispatch at ffffffffa0644752 [emcpgpx]
PowerDispatchX at ffffffffa0493921 [emcp]
GpxDispatchDown at ffffffffa06447ae [emcpgpx]
XcryptDispatchGuts at ffffffffa0660b45 [emcpxcrypt]
XcryptDispatch at ffffffffa0660c09 [emcpxcrypt]
GpxDispatch at ffffffffa0644752 [emcpgpx]
PowerDispatchX at ffffffffa0493921 [emcp]
GpxDispatch at ffffffffa0644775 [emcpgpx]
PowerDispatchX at ffffffffa0493921 [emcp]
PowerSyncIoTopDispatch at ffffffffa04978b9 [emcp]
emcp_send_pirp at ffffffffa04979b9 [emcp]
emcp_pseudo_blk_ioctl at ffffffffa04982dc [emcp]
__blkdev_driver_ioctl at ffffffff8126f627
blkdev_ioctl at ffffffff8126faad
block_ioctl at ffffffff811c46cc
dmp_ioctl_by_bdev at ffffffffa074767b [vxdmp]
dmp_kernel_scsi_ioctl at ffffffffa0747982 [vxdmp]
dmp_scsi_ioctl at ffffffffa0786d42 [vxdmp]
dmp_send_scsireq at ffffffffa078770f [vxdmp]
dmp_do_scsi_gen at ffffffffa077d46b [vxdmp]
dmp_pr_check_aptpl at ffffffffa07834dd [vxdmp]
dmp_make_mp_node at ffffffffa0782c89 [vxdmp]
dmp_decode_add_disk at ffffffffa075164e [vxdmp]
dmp_decipher_instructions at ffffffffa07521c7 [vxdmp]
dmp_process_instruction_buffer at ffffffffa075244e [vxdmp]
dmp_reconfigure_db at ffffffffa076f40e [vxdmp]
gendmpioctl at ffffffffa0752a12 [vxdmp]
dmpioctl at ffffffffa0754615 [vxdmp]
dmp_ioctl at ffffffffa07784eb [vxdmp]
dmp_compat_ioctl at ffffffffa0778566 [vxdmp]
compat_blkdev_ioctl at ffffffff8128031d
compat_sys_ioctl at ffffffff811e0bfd
sysenter_dispatch at ffffffff81050c20

DESCRIPTION:
Dynamic Multi-Pathing (DMP) uses the PERSISTENT RESERVE IN command with the REPORT
CAPABILITIES service action to discover target capabilities. On RHEL6, the system
panics unexpectedly when DMP executes this command on a PowerPath-controlled
device coming from an EMC CLARiiON/VNX array. This bug has been reported to EMC
PowerPath engineering.

RESOLUTION:
The Dynamic Multi-Pathing (DMP) code is modified to execute PERSISTENT RESERVE 
IN command with the REPORT CAPABILITIES service action to discover target 
capabilities only on non-third party controlled devices.

* 3713320 (Tracking ID: 3596282)

SYMPTOM:
FMR (Fast Mirror Resync) operations fail with error "Failed to allocate a new 
map due to no free map available in DCO".
 
"vxio: [ID 609550 kern.warning] WARNING: VxVM vxio V-5-3-1721
voldco_allocate_toc_entry: Failed to allocate a new map due to no free map
available in DCO of [volume]"

It often leads to disabling of the snapshot.

DESCRIPTION:
For instant space optimized snapshots, stale maps are left behind for DCO (Data 
Change Object) objects at the time of creation of cache objects. So, over the 
time if space optimized snapshots are created that use a new cache object, 
stale maps get accumulated, which eventually consume all the available DCO 
space, resulting in the error.

RESOLUTION:
Code changes have been done to ensure no stale entries are left behind.

* 3734985 (Tracking ID: 3727939)

SYMPTOM:
Data corruption or VTOC label corruption may occur due to stale device 
entries present in the /dev/vx/[r]dmp directories.

DESCRIPTION:
The /dev/vx/[r]dmp directories, where the DMP device entries are created, are mounted as tmpfs (swap) during a boot cycle. These directories can be unmounted while the vxconfigd(1M) daemon is running and the DMP devices are in use.
In such situations, the DMP device entries are recreated on disk instead of on the swap device.
The directories are re-mounted on tmpfs during the next boot cycle, so the device entries created on disk eventually become stale. A subsequent unmount of these directories then exposes those stale entries. Thus, data corruption or VTOC label corruption occurs.

RESOLUTION:
The code is modified to clear the stale entries in the /dev/vx/[r]dmp directories before starting the vxconfigd(1M) daemon.

* 3737823 (Tracking ID: 3736502)

SYMPTOM:
When FMR is configured in a VVR environment, 'vxsnap refresh' fails with the
following error message:
"VxVM VVR vxsnap ERROR V-5-1-10128 DCO experienced IO errors during the
operation. Re-run the operation after ensuring that DCO is accessible".
Multiple messages about connection/disconnection of the replication
link (rlink) are also seen.

DESCRIPTION:
Internally triggered rlink connection/disconnection causes transaction retries.
During a transaction, memory is allocated for the Data Change Object (DCO) maps
but is not cleared when the transaction aborts.
This leads to a memory leak and eventually to exhaustion of the maps.

RESOLUTION:
The fix clears the allocated DCO maps when a transaction aborts.

* 3774137 (Tracking ID: 3565212)

SYMPTOM:
While performing controller giveback operations on NetApp ALUA arrays, the
following messages are observed in /etc/vx/dmpevents.log:

[Date]: I/O error occurred on Path <path> belonging to Dmpnode <dmpnode>
[Date]: I/O analysis done as DMP_PATH_BUSY on Path <path> belonging to Dmpnode <dmpnode>
[Date]: I/O analysis done as DMP_IOTIMEOUT on Path <path> belonging to Dmpnode <dmpnode>

DESCRIPTION:
During the asymmetric access state transition, DMP puts the buffer pointer
in the delay queue based on the flags observed in the logs. This delay
results in a timeout, and the file system goes into the disabled state.

RESOLUTION:
The DMP code is modified to perform immediate retries instead of putting the
buffer pointer in the delay queue when the transition is in progress.

* 3788751 (Tracking ID: 3788644)

SYMPTOM:
When DMP (Dynamic Multi-Pathing) native support is enabled in an Oracle ASM
environment, constantly adding and removing DMP devices causes errors
like:
/etc/vx/bin/vxdmpraw enable oracle dba 775 emc0_3f84
VxVM vxdmpraw INFO V-5-2-6157
Device enabled : emc0_3f84
Error setting raw device (Invalid argument)

DESCRIPTION:
The maximum raw device number N (exclusive) for /dev/raw/rawN is limited to 8192;
this limit is defined in the boot configuration file. When binding a raw device
to a dmpnode, /dev/raw/rawN is used for the binding, and rawN is calculated by a
one-way incremental process. Even if the device is unbound later on, the
"released" rawN number is not reused in the next binding. When the rawN number
grows beyond the maximum limit, the error is reported.

RESOLUTION:
The code has been changed to always use the smallest available rawN number
instead of calculating it by a one-way incremental process.
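
A minimal sketch of this allocation policy in C follows. The bitmap, the RAW_MAX constant, the assumption that numbering starts at raw1, and the function names are illustrative, not the actual VxVM implementation.

    #include <stdint.h>

    #define RAW_MAX 8192                      /* /dev/raw/rawN, N < 8192 */

    static uint8_t raw_used[RAW_MAX / 8];     /* 1 bit per raw device number */

    static int raw_alloc(void)                /* smallest free N, or -1 */
    {
        for (int n = 1; n < RAW_MAX; n++) {   /* assume numbering starts at 1 */
            if (!(raw_used[n / 8] & (1 << (n % 8)))) {
                raw_used[n / 8] |= (1 << (n % 8));
                return n;                     /* lowest unused number wins */
            }
        }
        return -1;                            /* all numbers in use */
    }

    static void raw_free(int n)               /* unbinding releases the number */
    {
        raw_used[n / 8] &= ~(1 << (n % 8));   /* so it can be reused next time */
    }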

* 3799822 (Tracking ID: 3573262)

SYMPTOM:
On recent UltraSPARC-T4 architectures, a panic is observed with the
topmost stack frame pointing to bcopy during snapshot operations involving
space-optimized snapshots.

<trap>SPARC-T4:bcopy_more()
SPARC-T4:bcopy() 
vxio:vol_cvol_bplus_delete()
vxio:vol_cvol_dshadow1_done()
vxio:voliod_iohandle()
vxio:voliod_loop()

DESCRIPTION:
The bcopy kernel library routine on Solaris was optimized to take advantage
of recent UltraSPARC-T4 architectures, but it has known issues with large
copies in some patch versions of Solaris 10. The use of bcopy was causing
in-core corruption of cache object metadata. The corruption later led to a
system panic.

RESOLUTION:
The code is modified to copy the buffer word by word
instead of using the bcopy kernel library routine.
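
The following is a hedged sketch of such a word-by-word copy in C, assuming word-aligned buffers (as metadata buffers typically are). It is a stand-in for the actual routine, not the shipped code.

    #include <stddef.h>
    #include <stdint.h>

    static void word_copy(void *dst, const void *src, size_t len)
    {
        uintptr_t *d = dst;
        const uintptr_t *s = src;
        size_t words = len / sizeof(uintptr_t);

        while (words--)                  /* one machine word at a time */
            *d++ = *s++;

        /* tail bytes, if len is not a multiple of the word size */
        unsigned char *db = (unsigned char *)d;
        const unsigned char *sb = (const unsigned char *)s;
        for (size_t i = 0; i < len % sizeof(uintptr_t); i++)
            db[i] = sb[i];
    }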

* 3800394 (Tracking ID: 3672759)

SYMPTOM:
When the DMP database is corrupted, the vxconfigd(1M) daemon may core dump with the following stack trace:
  ddl_change_dmpnode_state ()
  ddl_data_corruption_msgs ()
  ddl_reconfigure_all ()
  ddl_find_devices_in_system ()
  find_devices_in_system ()
  req_change_state ()
  request_loop ()
  main ()

DESCRIPTION:
The issue is observed because the corrupted DMP database is not properly destroyed.

RESOLUTION:
The code is modified to remove the corrupted DMP database.

* 3800396 (Tracking ID: 3749557)

SYMPTOM:
System hangs and becomes unresponsive because of heavy memory consumption by 
vxvm.

DESCRIPTION:
In the Dirty Region Logging (DRL) update code path, an erroneous condition was
present that led to an infinite loop that keeps consuming memory. The resulting
consumption of large amounts of memory makes the system unresponsive.

RESOLUTION:
The code has been fixed to avoid the infinite loop, preventing the hang
caused by high memory usage.

* 3800449 (Tracking ID: 3726110)

SYMPTOM:
On systems with a high number of CPUs, Dynamic Multi-Pathing (DMP) devices may
perform considerably slower than OS device paths.

DESCRIPTION:
In high-CPU configurations, the I/O statistics functionality in DMP takes more
CPU time because DMP statistics are collected on a per-CPU basis. This statistics
collection happens in the DMP I/O code path, so it reduces I/O performance.
Because of this, DMP devices perform slower than OS device paths.

RESOLUTION:
Code changes are made to remove some of the statistics collection functionality
from the DMP I/O code path. In addition, the following tunables need to be turned off:
1. Turn off idle LUN probing:
# vxdmpadm settune dmp_probe_idle_lun=off
2. Turn off the statistics gathering functionality:
# vxdmpadm iostat stop

Notes:
1. Apply this patch if the system configuration has a large number of CPUs and
DMP is performing considerably slower than OS device paths. This issue is not
applicable to normal systems.

* 3800452 (Tracking ID: 3437852)

SYMPTOM:
The system panics when Symantec Replicator Option goes into PASSTHRU
mode. The panic stack trace might look like:

vol_rp_halt()
vol_rp_state_trans()
vol_rv_replica_reconfigure()
vol_rv_error_handle()
vol_rv_errorhandler_callback()
vol_klog_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION:
When the Storage Replicator Log (SRL) gets faulted for any reason, VVR
goes into PASSTHRU mode. At this point, a few updates are erroneously freed.
When these updates are later accessed during processing, the access
results in a panic because the updates have already been freed.

RESOLUTION:
The code is modified so that the updates are not freed erroneously.

* 3800788 (Tracking ID: 3648719)

SYMPTOM:
The server panics with the following stack trace while adding or removing LUNs or HBAs:
dmp_decode_add_path()
dmp_decipher_instructions()
dmp_process_instruction_buffer()
dmp_reconfigure_db()
gendmpioctl()
vxdmpioctl()

DESCRIPTION:
While deleting a dmpnode, Dynamic Multi-Pathing (DMP) releases the memory associated with the dmpnode structure.
If the dmpnode does not get deleted for some reason, and another task accesses the freed memory of this dmpnode, the server panics.

RESOLUTION:
The code is modified to prevent tasks from accessing memory that was freed for the deleted dmpnode. The change also fixes a memory leak in the buffer allocation code path.

* 3801225 (Tracking ID: 3662392)

SYMPTOM:
In a CVM environment, if I/Os are being executed on the slave node while the
vxdisk resize(1M) command is executing on the master node, corruption can
occur.

DESCRIPTION:
During the first stage of the resize transaction, the master node re-adjusts the
disk offsets and the public/private partition device numbers.
On a slave node, the public/private partition device numbers are not adjusted
properly. Because of this, the partition starting offset is added twice,
which causes the corruption. The window during which the public/private
partition device numbers are adjusted is small, so corruption is observed only
if I/O occurs during this window.
After the resize operation completes its execution, no further corruption
happens.

RESOLUTION:
The code has been changed to add the partition starting offset properly to I/Os
on the slave node during execution of a resize command.

* 3805938 (Tracking ID: 3790136)

SYMPTOM:
A file system hang can sometimes be observed due to I/Os hung in the DRL.

DESCRIPTION:
I/Os can hang in the DRL of a mirrored volume due to an incorrect calculation of
the outstanding I/Os on the volume and the number of active I/Os currently in
progress on the DRL. The value of the outstanding I/Os on the volume can get
modified incorrectly, preventing I/Os on the DRL from progressing, which
results in a hang-like scenario.

RESOLUTION:
Code changes have been made to avoid incorrect modification of the outstanding
I/O count on the volume and prevent the hang.

* 3806808 (Tracking ID: 3645370)

SYMPTOM:
After running the vxevac command, if the user tries to rollback or commit the evacuation for a disk containing DRL plex, the action fails with the following errors:

/etc/vx/bin/vxevac -g testdg  commit testdg02 testdg03
VxVM vxsd ERROR V-5-1-10127 deleting plex %1:
        Record is associated
VxVM vxassist ERROR V-5-1-324 fsgen/vxsd killed by signal 11, core dumped
VxVM vxassist ERROR V-5-1-12178 Could not commit subdisk testdg02-01 in 
volume testvol
VxVM vxevac ERROR V-5-2-3537 Aborting disk evacuation

/etc/vx/bin/vxevac -g testdg rollback testdg02 testdg03
VxVM vxsd ERROR V-5-1-10127 deleting plex %1:
        Record is associated
VxVM vxassist ERROR V-5-1-324 fsgen/vxsd killed by signal 11, core dumped
VxVM vxassist ERROR V-5-1-12178 Could not rollback subdisk testdg02-01 in 
volume
testvol
VxVM vxevac ERROR V-5-2-3537 Aborting disk evacuation

DESCRIPTION:
When the user uses the vxevac command, new plexes are created on the target disks. Later, during the commit or rollback operation, VxVM deletes the plexes on the source or the target disks.
To delete a plex, VxVM must delete its subdisks first; otherwise, the plex deletion fails with the following error message:
VxVM vxsd ERROR V-5-1-10127 deleting plex %1:
        Record is associated 
The error is displayed because the code does not handle the deletion of subdisks of plexes marked for DRL (dirty region logging) correctly.

RESOLUTION:
The code is modified to handle evacuation of disks with DRL plexes correctly.

* 3807761 (Tracking ID: 3729078)

SYMPTOM:
In a VVR environment, a panic may occur after SF (Storage Foundation) patch
installation or uninstallation on the secondary site.

DESCRIPTION:
The VXIO kernel reset invoked by SF patch installation removes all disk group
objects that do not have the preserved flag set. Because the preserved flag
overlaps with the RVG (Replicated Volume Group) logging flag, the RVG object is
not removed, but its rlink object is, which results in a system panic when VVR
is started.

RESOLUTION:
Code changes have been made to fix this issue.

* 3809787 (Tracking ID: 3677359)

SYMPTOM:
VxDMP causes system panic after a shutdown or reboot with the
following stack trace:
mutex_enter() 
volinfo_ioct()
volsioctl_real()
cdev_ioctl()
dmp_signal_vold()
dmp_throttle_paths()
dmp_process_stats()
dmp_daemons_loop()
thread_start()
OR
panicsys()
vpanic_common()
panic+0x1c()
mutex_enter()
cdev_ioctl()
dmp_signal_vold()
dmp_check_path_state()
dmp_restore_callback()
dmp_process_scsireq()
dmp_daemons()
thread_start()

DESCRIPTION:
In a special scenario of system shutdown or reboot, the DMP
(Dynamic MultiPathing) I/O statistics daemon tries to call the ioctl functions
in the VXIO module while it is being unloaded. As a result, the system panics.

RESOLUTION:
The code is modified to stop the DMP I/O statistics daemon and the DMP
restore daemon before system shutdown or reboot. The code is also modified to
avoid other probes to vxio devices during shutdown.

* 3814808 (Tracking ID: 3606480)

SYMPTOM:
Issue 1:
Enabling dmp_native_support tunable fails on Solaris with following error:
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more zpools
VxVM vxdmpadm ERROR V-5-1-15686 The following zpool(s) could not be migrated 
as failed to obtain root pool information - <zpool name>

Issue 2:
After enabling dmp_native_support, ZFS can panic due to an I/O error, because
phys_path for the zpool points to the OS device path and not to the DMP
(Dynamic Multipathing) device.

DESCRIPTION:
Issue 1:

Enabling the dmp_native_support tunable fails because the devid (device ID)
on the Dynamic Multipathing (DMP) device is not available. The devid is not
updated by DMP while bringing the root disk under DMP control.
This causes the dmp_native_support enable command to fail.

Issue 2:

Ideally, phys_path should be set to NULL when the zpool is migrated
from the OS device to the DMP device. Instead, phys_path was getting set to the
OS device when the zpool was migrated with dmp_native_support set to on.

RESOLUTION:
Issue 1:
Code changes are done such that the devid on the DMP devices get updated for the 
root disk.

Issue 2:
Code changes are done so that phys_path corresponding to DMP device is populated 
for ZFS pools when the migration happens.

* 3816233 (Tracking ID: 3686698)

SYMPTOM:
vxconfigd hangs due to a deadlock between two threads.

DESCRIPTION:
Two threads wait for the same lock, causing a deadlock between them that blocks
all vx commands.
The untimeout function does not return until the pending callback (set through
the timeout function) is cancelled, or until the pending callback has completed
its execution if it has already started. Therefore, locks acquired by the
callback routine must not be held across the call to the untimeout routine, or
a deadlock may result.

Thread 1: 
    untimeout_generic()   
    untimeout()
    voldio()
    volsioctl_real()
    fop_ioctl()
    ioctl()
    syscall_trap32()
 
Thread 2:
    mutex_vector_enter()
    voldsio_timeout()
    callout_list_expire()
    callout_expire()
    callout_execute()
    taskq_thread()
    thread_start()

RESOLUTION:
Code changes have been made to call untimeout outside the lock
taken by the callback handler.
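
As a user-space analogy of the rule stated above, the sketch below uses pthreads: the cancel path changes state under the callback's lock but drops that lock before waiting for the callback to finish. Holding cb_lock across the wait would reproduce the deadlock. All names are illustrative, not the VxVM symbols.

    #include <pthread.h>

    static pthread_mutex_t cb_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_t       cb_thread;      /* runs the periodic callback */
    static int             stop;

    static void *callback(void *arg)       /* analogue of the timeout handler */
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&cb_lock);  /* the callback needs cb_lock */
            if (stop) {
                pthread_mutex_unlock(&cb_lock);
                break;
            }
            /* ... periodic work done under cb_lock ... */
            pthread_mutex_unlock(&cb_lock);
        }
        return NULL;
    }

    static void cancel_callback(void)      /* analogue of the untimeout caller */
    {
        pthread_mutex_lock(&cb_lock);
        stop = 1;                          /* state change under the lock ... */
        pthread_mutex_unlock(&cb_lock);    /* ... but drop it before waiting  */
        pthread_join(cb_thread, NULL);     /* deadlocks if cb_lock were held  */
    }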

* 3821425 (Tracking ID: 3783356)

SYMPTOM:
After the DMP module fails to load, dmp_idle_vector is not NULL.

DESCRIPTION:
After a module load failure, the DMP resources were not cleared from system
memory, so some of them retained non-NULL values. When the load was retried, the
code tried to free this invalid data, and the system panicked with a BAD FREE,
since the data being freed was no longer valid at that point in time.

RESOLUTION:
The code is modified to clear the DMP resources when a module load failure happens.

* 3826918 (Tracking ID: 3819670)

SYMPTOM:
When running smartmove with "vxevac", putting the command into the background
(by typing Ctrl-Z followed by the bg command) results in termination of the
data move.

DESCRIPTION:
When moving data from a user command like "vxevac", the data move is submitted
as a task in the kernel, and the select() primitive is used on the task file
descriptor to wait for task-finishing events.
However, when Ctrl-Z plus bg is typed, select() returns -1 with errno set to
EINTR, which the code logic interpreted as a user termination action. Hence the
data move was terminated. The correct behavior is to retry the select() and
keep waiting for task-finishing events.

RESOLUTION:
Code changes have been made so that when select() returns with errno EINTR, the
code checks whether the task has finished; if not, the select() is retried.
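
A minimal sketch of the retry loop in C follows; task_fd and task_done() are illustrative placeholders for the task descriptor and the completion check, not the actual VxVM interfaces.

    #include <errno.h>
    #include <sys/select.h>

    static int wait_for_task(int task_fd, int (*task_done)(void))
    {
        fd_set rfds;

        for (;;) {
            FD_ZERO(&rfds);
            FD_SET(task_fd, &rfds);
            if (select(task_fd + 1, &rfds, NULL, NULL, NULL) >= 0)
                return 0;                  /* task event arrived */
            if (errno != EINTR)
                return -1;                 /* real error: give up */
            if (task_done())
                return 0;                  /* interrupted, but work finished */
            /* EINTR (e.g. SIGTSTP from Ctrl-Z) and the task is still
             * running: retry the select() instead of aborting */
        }
    }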

Patch ID: VRTSaslapm-6.0.500.800

* 4019779 (Tracking ID: 4019778)

SYMPTOM:
VRTSaslapm support for Solaris 11.4

DESCRIPTION:
Solaris 11.4 is a new OS release; hence, ASLAPM needs to be compiled on it.

RESOLUTION:
Solaris 11.4 being a new OS release, the ASLAPM package is now compiled on it.

Patch ID: VRTSodm-6.0.500.4200

* 4003630 (Tracking ID: 3968788)

SYMPTOM:
ODM module failed to load on Solaris 11.4.

DESCRIPTION:
The ODM module failed to load on Solaris 11.4 release, due to the kernel level changes in 11.4.

RESOLUTION:
Added ODM support for Solaris 11.4 release.

* 4006755 (Tracking ID: 4006757)

SYMPTOM:
ODM module failed to load on Solaris 11.4 x86 arch.

DESCRIPTION:
The ODM module failed to load on Solaris 11.4 x86 arch release.

RESOLUTION:
Added ODM support for Solaris 11.4 x86 arch release.

* 4007647 (Tracking ID: 3616753)

SYMPTOM:
On Solaris 11, the ODM module does not load automatically after rebooting
the machine.

DESCRIPTION:
The ODM service is offline and /dev/odm is not mounted, due to which the
ODM module does not load automatically after rebooting the machine.

RESOLUTION:
Code is added to bring up the ODM services as soon as the package is
installed.

Patch ID: VRTSodm-6.0.500.400

* 3799901 (Tracking ID: 3451730)

SYMPTOM:
Installation of VRTSodm, VRTSvxfs in a zone fails when
running zoneadm -z Zoneattach -U

DESCRIPTION:
When you upgrade a zone using attach U option, the checkinstall
script is executed. There were certain zone-irrelevant commands (which should
not be executed during attach) in the checkinstall script which failed the
installation of VRTSodm, VRTSvxfs.

RESOLUTION:
Code is added in the postinstall script to fix the checkinstall script.

Patch ID: VRTSodm-6.0.500.100

* 3515842 (Tracking ID: 3481825)

SYMPTOM:
The panic stack is like this:
unix:panicsys
unix:vpanic_common
unix:panic
unix:die
unix:trap
unix:ktl0
zfs:dmu_objset_spa
zvol_ioctl
specfs:spec_ioctl
genunix:fop_ioctl
odm:odm_raw_check
odm:odm_identify
odm:odmioctl
genunix:fop_ioctl

DESCRIPTION:
The odm_raw_check() function calls fop_ioctl() without first opening the zvol folder.

RESOLUTION:
Symantec has disabled Oracle Disk Manager (ODM)for Solaris on non-VxVM raw devices.

* 3544831 (Tracking ID: 3525858)

SYMPTOM:
The system panics in the odmmount_getzid() function with the following stack: 

odm:odmmount_getzid ()
odm:odmroot_zone ()
odm:odmroot()
genunix:fsop_root(
genunix:lookuppnvp()
genunix:lookuppnat()
genunix:lookupnameat()
genunix:cstatat_getvp()
genunix:cstatat64_32()
unix:syscall_trap32()

DESCRIPTION:
Whenever /dev/odm is mounted inside a zone, ODM maintains information for that mount in a linked list of elements - each element representing a zone mount. This linked list gets corrupted if zones are unmounted simultaneously, due to a race condition. When such a list is referred to during subsequent zone mounts, it leads to panic.

RESOLUTION:
Symantec has fixed the race condition in odmmount_putentry() called from the  ODM zone umount code paths.

Patch ID: VRTSodm-6.0.500.000

* 3322294 (Tracking ID: 3323866)

SYMPTOM:
Some ODM operations may fail with the following error:
ODM ERROR V-41-4-1-328-22 Invalid argument

DESCRIPTION:
On systems with heavy database activity using ODM, some operations may fail with an error. This is a corner case that occurs when a new task enters ODM. To avoid deadlocks, ODM maintains two lists of tasks: a hold list and a deny list. All active tasks are maintained in the hold list, and tasks that are exiting ODM are stored in the deny list. The error is returned when the ODM PID structure gets reused for a PID that is still exiting ODM and is present in the deny list; in that case, ODM does not allow the task to enter and the above error is returned.

RESOLUTION:
The code is modified to add an extra check while adding a new task in ODM, to avoid returning the error in such scenarios.

Patch ID: VRTSodm-6.0.300.000

* 3018873 (Tracking ID: 3018869)

SYMPTOM:
The fsadm command shows that the mount point is not a VxFS file system.

DESCRIPTION:
Solaris 11 update 1 has some changes in the fstatvfs() function (VFS layer) which
break VxFS's previous assumptions. The statvfs.f_basetype field gets populated
with a garbage value instead of "vxfs". So, during fsadm, when the file system
type is checked, the check fails and the error is reported.

RESOLUTION:
Changes are made to fetch the correct fstype value using OS-provided APIs so that
the statvfs.f_basetype field gets the valid value, i.e. "vxfs".

Patch ID: VRTSvxfs-6.0.500.6200

* 3990430 (Tracking ID: 3911048)

SYMPTOM:
The LDH bucket validation failure message is logged and the system hangs.

DESCRIPTION:
When modifying a large directory, VxFS needs to find a new bucket in the LDH for
the directory, and once a bucket is full, it is split to obtain more buckets to
use. When a bucket has been split the maximum number of times, an overflow
bucket is allocated. Under some conditions, the available-bucket lookup on the
overflow bucket may get an incorrect result and overwrite an existing bucket
entry, thus corrupting the LDH file. Another problem is that when the bucket
invalidation fails, the bucket buffer is released without checking whether the
buffer is already part of a previous transaction; this may cause the transaction
flush thread to hang and finally stall the whole file system.

RESOLUTION:
The LDH bucket entry change code is corrected to avoid the corruption. The
bucket buffer is now released without throwing it out of memory, to avoid
blocking the transaction flush.

* 4003603 (Tracking ID: 3972641)

SYMPTOM:
Handle deprecated Solaris function calls in VxFS code

DESCRIPTION:
The page_numtopp(_nolock) and hat_getpfnum functions have been deprecated, so they can no longer be used in the VxFS code.

RESOLUTION:
Appropriate code changes are done in VxFS

* 4003629 (Tracking ID: 3968785)

SYMPTOM:
VxFS module failed to load on Solaris 11.4.

DESCRIPTION:
The VxFS module failed to load on Solaris 11.4 release, due to the kernel level changes in 11.4 kernel.

RESOLUTION:
Added VxFS support for Solaris 11.4 release.

* 4004573 (Tracking ID: 3560187)

SYMPTOM:
The kernel may panic when the buffer is freed in the vx_dexh_preadd_space() 
function with the message "Data Key Miss Fault in kernel mode". The following 
stack trace is observed:
kmem_arena_free()
vx_free()
vx_dexh_preadd_space()
vx_dopreamble()
vx_dircreate_tran()
vx_do_create()
vx_create1()
vx_create0()
vx_create()
vn_open()

DESCRIPTION:
The buffers in the extended-hash structure are allocated, zeroed, and freed
outside the transaction retry loop. For some error scenarios, the transaction is
re-executed from the beginning. Since the buffers are zeroed outside of the
transaction retry loop, during the retry the extended-hash structure may contain
stale buffers from the previous attempt. As a result, some stale parts of the
structure are freed incorrectly, which results in the panic.

RESOLUTION:
The code is modified to zero-out the extended-hash structure within the retry 
loop, so that the stale values are not used during retry.

* 4004574 (Tracking ID: 3779916)

SYMPTOM:
vxfsconvert fails to upgrade the layout version for a VxFS file system with a
large number of inodes. The error message shows an inode discrepancy.

DESCRIPTION:
vxfsconvert walks through the ilist and converts inodes. It stores
chunks of inodes in a buffer and processes them as a batch. The inode number
parameter for this inode buffer is of type unsigned integer. The offset of a
particular inode in the ilist is calculated by multiplying the inode number by
the size of the inode structure. For large inode numbers, the product
inode_number * inode_size can overflow the unsigned integer limit, giving a
wrong offset within the ilist file. vxfsconvert therefore reads the wrong inode
and eventually fails.

RESOLUTION:
The inode number parameter is defined as unsigned long to avoid the
overflow.
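
The arithmetic can be reproduced with a small C program on an LP64 system (where unsigned long is 64 bits); the 256-byte inode size and the specific inode number are assumptions for illustration only.

    #include <stdio.h>

    int main(void)
    {
        unsigned int  inum       = 20000000;   /* large inode number */
        unsigned int  inode_size = 256;        /* illustrative size */

        unsigned int  bad  = inum * inode_size;            /* wraps mod 2^32 */
        unsigned long good = (unsigned long)inum * inode_size;

        printf("wrong offset: %u\n", bad);     /* 825032704: wrapped value */
        printf("right offset: %lu\n", good);   /* 5120000000: true offset */
        return 0;
    }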

* 4004575 (Tracking ID: 3870832)

SYMPTOM:
The system panics due to a race between a forced unmount and an NFS lock manager
thread trying to get a vnode, with the stack as below:

vx_active_common_flush
vx_do_vget
vx_vget
fsop_vget
lm_nfs3_fhtovp
lm_get_vnode
lm_unlock
lm_nlm4_dispatch
svc_getreq
svc_run
svc_do_run
nfssys

DESCRIPTION:
When an NFS-mounted file system is unshared and force unmounted, a panic can
occur if a file on it was previously locked from the NFS client. In NFSv3 the
unshare does not clear the existing locks or clear/kill the lock manager
threads, so when the forced unmount wins the race, it frees the vx_fsext and
vx_vfs structures. Later, when the lock manager threads try to get the vnode of
this force-unmounted file system, the system panics on the freed vx_fsext
structure.

RESOLUTION:
The code is modified to mark the Solaris vfs with the VFS_UNMOUNTED flag during
a forced unmount. This flag is later checked in the vx_vget function when the
lock manager thread comes to get the vnode; if the flag is set, an error is
returned.

* 4004576 (Tracking ID: 3904794)

SYMPTOM:
Extending a Quick I/O file fails with an EINVAL error if the reservation block is not set.

DESCRIPTION:
When extending a file through qiomkfile, the extend size is calculated based on
the reserve blocks. If the reservation has been reset, extending the file fails
with an EINVAL error, because qiomkfile issues the setext ioctl with a size
smaller than the current file size.

RESOLUTION:
The code is modified to calculate the new extend size based on the maximum of
the reserved blocks and the blocks currently allocated to the file.
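
A minimal sketch of that calculation in C; the function and parameter names are illustrative, not the qiomkfile source.

    /* Sketch: base the setext request on whichever is larger, the
     * reservation or the blocks already allocated, so the requested
     * size can never fall below the current file size. */
    static unsigned long long
    qio_extend_size(unsigned long long reserve_blks,
                    unsigned long long alloc_blks,
                    unsigned long long grow_blks)
    {
        unsigned long long base =
            (reserve_blks > alloc_blks) ? reserve_blks : alloc_blks;
        return base + grow_blks;
    }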

* 4004578 (Tracking ID: 3457803)

SYMPTOM:
File System gets disabled with the following message in the system log:
WARNING: V-2-37: vx_metaioerr - vx_iasync_wait - /dev/vx/dsk/testdg/test  file system meta data write error in dev/block

DESCRIPTION:
The inode's incore information gets inconsistent as one of its field is getting modified without the locking protection.

RESOLUTION:
Protect the inode's field properly by taking the lock operation.

* 4005873 (Tracking ID: 3978615)

SYMPTOM:
The VxFS file system does not get mounted after an OS upgrade and the first reboot.

DESCRIPTION:
The vxfs-modload service is not getting called before the local_fs service. The vxfs-modload service is used to replace the appropriate kernel modules after an OS upgrade and reboot. The VxFS device files are also not configured properly after system boot on Solaris 11.4. Due to this, the file system does not get mounted after the OS upgrade and first reboot.

RESOLUTION:
Code changes are done in service file.

* 4005890 (Tracking ID: 3898565)

SYMPTOM:
System panicked with this stack:
- panicsys
- panic_common
- panic
- segshared_fault
- as_fault
- vx_memlock
- vx_dio_zero
- vx_dio_rdwri
- vx_dio_read
- vx_read_common_inline
- vx_read_common_noinline
- vx_read1
- vx_read
- fop_read

DESCRIPTION:
Solaris no longer supports F_SOFTLOCK. The vx_memlock() uses F_SOFTLOCK to fault
in the page.

RESOLUTION:
Change vxfs code to avoid using F_SOFTLOCK.

* 4006754 (Tracking ID: 4006756)

SYMPTOM:
VxFS module failed to load on Solaris 11.4 x86 arch.

DESCRIPTION:
The VxFS module failed to load on Solaris 11.4 x86 arch release.

RESOLUTION:
Added VxFS support for Solaris 11.4 x86 arch release.

Patch ID: VRTSvxfs-6.0.500.400

* 2927359 (Tracking ID: 2927357)

SYMPTOM:
Assert hit in internal testing.

DESCRIPTION:
The assert is hit in internal testing when the attribute inode being purged is
marked bad or the file system is disabled.

RESOLUTION:
Code is modified to return EIO in such cases.

* 2933300 (Tracking ID: 2933297)

SYMPTOM:
Compression support for the dedup ioctl.

DESCRIPTION:
Compression support for the dedup ioctl for NBU.

RESOLUTION:
Added limited support for compressed extents to the dedup ioctl for NBU.

* 2972674 (Tracking ID: 2244932)

SYMPTOM:
A deadlock is seen when trying to reuse the inode number in the presence of a
checkpoint.

DESCRIPTION:
In the presence of a checkpoint, an inode number can be assigned in such a way
that a parent->child relationship in one fileset becomes child->parent in a
different fileset, which leads to a deadlock scenario. The fix is, when a
checkpoint is mounted, not to take a blocking lock on the second inode (as is
done in the rename vnode operation), because that inode may have been reused in
the clone fileset and the lock may already have been taken; even when there is
no push set for the inode, the rwlock is taken on the whole clone chain, even if
the chain inodes are unrelated. Also, in the rename vnode operation, when a
hidden hash directory needs to be created, the parent directory must be
exclusively rwlocked (currently a blocking lock); if two renames involving the
same parent inode run simultaneously, this can also cause a deadlock.

RESOLUTION:
Code is modified to avoid this deadlock.

* 3040130 (Tracking ID: 3137886)

SYMPTOM:
Thin Provisioning Logging does not work for reclaim operations
triggered via fsadm command.

DESCRIPTION:
Thin Provisioning Logging does not work for reclaim operations
triggered via fsadm command.

RESOLUTION:
Code is added to log reclamation issued by the fsadm command, to create a backup
log file once the size of the reclaim log file exceeds 1 MB, and to save the
command string of the fsadm command.

* 3248031 (Tracking ID: 2628207)

SYMPTOM:
A full fsck operation on a file system with a large number of Access Control 
List (ACL) settings and checkpoints takes a long time (in some cases, more than 
a week) to complete.

DESCRIPTION:
The fsck operation is mainly blocked in pass1d phase. The process of pass1d is 
as follows:
1.	For each fileset, pass1d goes through all the inodes in the ilist of 
the fileset and then retrieves the inode information from the disk.
2.	It extracts the corresponding attribute inode number.
3.	Then, it reads the attribute inode and counts the number of rules 
inside it.
4.	Finally, it reads the rules into the buffer and performs the checking.

The attribute rules data can reside anywhere on the file system, and the
checkpoint link may have to be followed to locate the real inodes, which
consumes a significant amount of time.

RESOLUTION:
The code is modified to: 
1.	Add a read ahead mechanism for the attribute ilist which boosts the 
read for attribute inodes via buffering. 
2.	Add a bitmap to record those attribute inodes which have been already 
checked to avoid redundant checks. 
3.	Add an option to the fsck operation which enables it to check a 
specific fileset separately rather than checking the entire file system.

* 3682640 (Tracking ID: 3637636)

SYMPTOM:
Cluster File System (CFS) node initialization and protocol upgrade may hang
during rolling upgrade with the following stack trace:
vx_svar_sleep_unlock()
vx_event_wait()
vx_async_waitmsg()
vx_msg_broadcast()
vx_msg_send_join_version()
vx_msg_send_join()
vx_msg_gab_register()
vx_cfs_init()
vx_cfs_reg_fsckd()
vx_cfsaioctl()
vxportalunlockedkioctl()
vxportalunlockedioctl()

And

vx_delay()
vx_recv_protocol_upgrade_intent_msg()
vx_recv_protocol_upgrade()
vx_ctl_process_thread()
vx_kthread_init()

DESCRIPTION:
CFS node initialization waits for the protocol upgrade to complete. Protocol
upgrade waits for the flag related to the CFS initialization to be cleared. As
the result, the deadlock occurs.

RESOLUTION:
The code is modified so that the protocol upgrade process does not wait to clear
the CFS initialization flag.

* 3726112 (Tracking ID: 3704478)

SYMPTOM:
The library routine to get the mount point fails to return the mount point
of the root file system.

DESCRIPTION:
To get the mount point of a path, the input path name is scanned to find the
nearest path which represents a mount point. But when the file is in the root
file system ("/"), this function returns error code 1 and hence does not return
the mount point of that file. This is because of a bug in the path name parsing
logic, which neglects the root mount point while parsing.

RESOLUTION:
The path name parsing logic is fixed so that the mount point of the root
file system is returned.

* 3796626 (Tracking ID: 3762125)

SYMPTOM:
The directory size sometimes keeps increasing even though the number of files
inside it does not increase.

DESCRIPTION:
This happens only on CFS. A variable in the directory inode structure marks the
start of the directory free space, but when directory ownership changes, the
variable may become stale, which can cause this issue.

RESOLUTION:
The code is modified to reset this free-space-marking variable when there is an
ownership change. The space search then starts from the beginning of the
directory inode.

* 3796633 (Tracking ID: 3729158)

SYMPTOM:
The fuser and other commands hang on VxFS file systems.

DESCRIPTION:
The hang is seen while two threads contend for two locks, ILOCK and PLOCK: the writeadvise thread owns the ILOCK and is waiting for the PLOCK, while the dalloc thread owns the PLOCK and is waiting for the ILOCK.

RESOLUTION:
The code is modified to correct the order of locking: PLOCK is now taken before ILOCK.
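
The rule behind the fix is the classic fixed lock ordering. A minimal pthread sketch follows; the lock names mirror the description, but the code is an analogy, not the VxFS implementation.

    #include <pthread.h>

    static pthread_mutex_t plock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t ilock = PTHREAD_MUTEX_INITIALIZER;

    /* Every code path that needs both locks takes them in the same
     * order; mixed orders (one thread ILOCK->PLOCK, another
     * PLOCK->ILOCK) are exactly what produced the hang. */
    static void fixed_order_path(void)
    {
        pthread_mutex_lock(&plock);       /* PLOCK first ... */
        pthread_mutex_lock(&ilock);       /* ... then ILOCK  */
        /* ... work that needs both locks ... */
        pthread_mutex_unlock(&ilock);
        pthread_mutex_unlock(&plock);
    }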

* 3796644 (Tracking ID: 3269553)

SYMPTOM:
VxFS returns inappropriate message for read of hole via ODM.

DESCRIPTION:
Sometimes sparse files containing temporary or backup/restore files are created
outside the Oracle database, and Oracle can read these files only through the
ODM. In that case, ODM fails with an ENOTSUP error.

RESOLUTION:
The code is modified to return zeros instead of an error.

* 3796652 (Tracking ID: 3686438)

SYMPTOM:
System panicked with NMI during file system transaction flushing.

DESCRIPTION:
In vx_iflush_list, the icachelock is taken to traverse the icache list and flush
the dirty inodes on it to disk. In that context, vx_iunlock may sleep while
flushing while still holding the icachelock, which is a spinlock. The other
processors busy-waiting for the same icachelock spinlock have interrupts
disabled, and this results in the NMI panic.

RESOLUTION:
In vx_iflush_list, VX_iUNLOCK_NOFLUSH is used instead of vx_iunlock, which
avoids flushing and sleeping while holding the spinlock.
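
A user-space sketch of the underlying rule, with a pthread spinlock standing in for the icache lock (assume pthread_spin_init() was called at startup); all names are illustrative, not the VxFS code.

    #include <pthread.h>

    static pthread_spinlock_t icache_lock;   /* init with pthread_spin_init() */

    static void flush_one_inode_safely(void)
    {
        pthread_spin_lock(&icache_lock);
        /* ... list manipulation only: nothing that can sleep, since
         * other CPUs busy-wait on this lock ... */
        pthread_spin_unlock(&icache_lock);

        /* the blocking flush happens here, after the spinlock is
         * dropped; in-kernel, VX_iUNLOCK_NOFLUSH achieves the same
         * separation by deferring the flush */
    }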

* 3796684 (Tracking ID: 3601198)

SYMPTOM:
Replication copies the 64-bit external quota files ('quotas.64' and 
'quotas.grp.64') to the destination file system.

DESCRIPTION:
The external quota files hold the quota limits for users and
groups. While replicating the file system, the vfradmin(1M) command copies
these external quota files to the destination file system. But the quota limits
of the source file system may not be applicable to the destination file system,
so the external quota files should ideally be skipped during replication.

RESOLUTION:
Exclude the 64-bit external quota files in the replication process.

* 3796687 (Tracking ID: 3604071)

SYMPTOM:
With the thin reclaim feature turned on, you can observe high CPU usage on the
vxfs thread process.

DESCRIPTION:
In the routine to get the broadcast information of a node which holds maps of
the Allocation Units (AUs) for which the node holds delegations, the locking
mechanism is inefficient. Every time this routine is called, it performs a
series of down-up operations on a certain semaphore. This can result in a huge
CPU cost when many threads call the routine in parallel.

RESOLUTION:
The code is modified to optimize the locking mechanism in the routine to get the
broadcast information of a node which contains maps of Allocation Units (AUs)
for which node holds the delegations, so that it only does down-up operation on
the semaphore once.

* 3796727 (Tracking ID: 3617191)

SYMPTOM:
Checkpoint creation may take hours.

DESCRIPTION:
During checkpoint creation, with an inode marked for removal and
being overlaid, there may be a downstream clone, and VxFS starts pulling all the
data. With Oracle this is evident because of temporary file deletion during
checkpoint creation.

RESOLUTION:
The code is modified to selectively pull the data, only if a
downstream push inode exists for file.

* 3796731 (Tracking ID: 3558087)

SYMPTOM:
When the stat system call is executed on a VxFS file system with the delayed
allocation feature enabled, it may take a long time or cause high CPU
consumption.

DESCRIPTION:
When the delayed allocation (dalloc) feature is turned on, the
flushing process takes a long time. The process keeps the getpage lock held and
needs writers to keep the inode reader-writer lock held. The stat system call
may keep waiting for the inode reader-writer lock.

RESOLUTION:
The delayed allocation code is redesigned to keep the getpage lock
unlocked while flushing.

* 3796733 (Tracking ID: 3695367)

SYMPTOM:
Unable to remove volume from multi-volume VxFS using "fsvoladm" command. It fails with "Invalid argument" error.

DESCRIPTION:
Volumes are not being added in the in-core volume list structure correctly. Therefore while removing volume from multi-volume VxFS using "fsvoladm", command fails.

RESOLUTION:
The code is modified to add volumes in the in-core volume list structure correctly.

* 3796745 (Tracking ID: 3667824)

SYMPTOM:
Race between vx_dalloc_flush() and other threads turning off dalloc 
and reusing the inode results in a kernel panic.

DESCRIPTION:
Dalloc flushing can race with dalloc being disabled on the
inode. In vx_dalloc_flush the dalloc lock is dropped and the inode is no longer
held, so the inode can be reused while the dalloc flushing is still in
progress.

RESOLUTION:
The race is resolved by taking a hold on the inode before doing the
actual dalloc flushing. The hold prevents the inode from being reused and
hence prevents the kernel panic.

* 3796759 (Tracking ID: 3615043)

SYMPTOM:
At times, while writing to a file, data could be missed.

DESCRIPTION:
While writing to a file with delayed allocation on, Solaris
could dishonor the NON_CLUSTERING flag and cluster pages beyond the range for
which the flushing was issued, leading to data loss.

RESOLUTION:
The flag is now cleared and the exact range is flushed in the case of
dalloc.

* 3796763 (Tracking ID: 2560032)

SYMPTOM:
The system may panic while upgrading VRTSvxfs in the presence of a zone
mounted on VxFS.

DESCRIPTION:
When the upgrade happens from the base version to the target version, the
postinstall script unloads the base-level fdd module and loads the target-level
fdd module while the VxFS module is still at the base-version level. This leads
to an inconsistency between the file device driver (fdd) and VxFS modules.

RESOLUTION:
The postinstall script is modified to avoid the inconsistency.

* 3796766 (Tracking ID: 3451730)

SYMPTOM:
Installation of VRTSodm, VRTSvxfs in a zone fails when
running zoneadm -z Zoneattach -U

DESCRIPTION:
When you upgrade a zone using attach U option, the checkinstall
script is executed. There were certain zone-irrelevant commands (which should
not be executed during attach) in the checkinstall script which failed the
installation of VRTSodm, VRTSvxfs.

RESOLUTION:
Code is added in the postinstall script to fix the checkinstall script.

* 3799999 (Tracking ID: 3602322)

SYMPTOM:
System may panic while flushing the dirty pages of the inode.

DESCRIPTION:
The panic may occur due to a synchronization problem between one
thread that flushes the inode and another thread that frees the chunks
containing the inodes on the freelist.

The thread that frees the chunks of inodes on the freelist grabs an inode and
clears/de-references the inode pointer while deinitializing the inode. This may
result in a stale pointer de-reference if the flusher thread is working on the
same inode.

RESOLUTION:
The code is modified to resolve the race condition by taking proper
locks on the inode and the freelist whenever a pointer in the inode is
de-referenced.

If the inode pointer is already de-initialized to NULL, the flushing is
attempted on the next inode.

* 3821416 (Tracking ID: 3817734)

SYMPTOM:
If a file system with the full fsck flag set is mounted, a message containing
the direct command to clean the file system with full fsck is printed to the user.

DESCRIPTION:
When a file system with the full fsck flag set is mounted, the mount fails
and a message is printed asking the user to clean the file system with full
fsck. This message contains the direct command to run; if the command is run
without first collecting a file system metasave, evidence is lost. Also, since
fsck removes the file system inconsistencies, it may lead to undesired data
being lost.

RESOLUTION:
A more generic message is now given in the error message instead of the direct
command.

* 3821490 (Tracking ID: 3451730)

SYMPTOM:
Installation of VRTSodm, VRTSvxfs in a zone fails when
running zoneadm -z Zoneattach -U

DESCRIPTION:
When you upgrade a zone using attach U option, the checkinstall
script is executed. There were certain zone-irrelevant commands (which should
not be executed during attach) in the checkinstall script which failed the
installation of VRTSodm, VRTSvxfs.

RESOLUTION:
Code is added in the postinstall script to fix the checkinstall script.

* 3851967 (Tracking ID: 3852324)

SYMPTOM:
Assert failure during internal stress testing.

DESCRIPTION:
While reading the partition directory, the offset is right-shifted by 8, but on
retry the offset was not left-shifted back to its original value. This can lead
to an offset of 0, which results in the assert failure.

RESOLUTION:
Code is modified to left shift offset by 8 before retrying.

* 3852297 (Tracking ID: 3553328)

SYMPTOM:
During internal testing it was found that the per-node LCT file was
corrupted, due to which attribute inode reference counts were mismatched,
resulting in fsck failure.

DESCRIPTION:
During clone creation, the LCT from the 0th pindex is copied to the new
clone's LCT. Any update to this LCT file from a non-zero pindex can cause a
count mismatch in the new fileset.

RESOLUTION:
The code is modified to handle this issue.

* 3852512 (Tracking ID: 3846521)

SYMPTOM:
cp -p fails with EINVAL for files with a 10-digit modification time. The EINVAL
error is returned if the value in the tv_nsec field is outside the range 0 to
999,999,999. VxFS supports the update in microseconds, but when copying in user
space, the usec value is converted to nsec. In this case, the usec value has
crossed its upper boundary of 999,999.

DESCRIPTION:
In a cluster, its possible that time across nodes might 
differ.so 
when updating mtime, vxfs check if it's cluster inode and if nodes mtime is 
newer 
time than current node time, then accordingly increment the tv_usec instead of 
changing mtime to older time value. There might be chance that it,  tv_usec 
counter got overflowed here, which resulted in 10 digit mtime.tv_nsec.

RESOLUTION:
Code is modified to reset the usec counter for mtime/atime/ctime when the
upper boundary limit, i.e. 999999, is reached.
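
A minimal sketch of the boundary handling in C; the helper name is illustrative, and the sketch only shows the invariant the fix enforces, not the VxFS code itself.

    #include <sys/time.h>

    /* Sketch: keep tv_usec inside 0..999999 so that a later
     * usec-to-nsec conversion stays inside 0..999999999 and
     * interfaces validating tv_nsec do not return EINVAL. */
    static void bump_usec(struct timeval *tv)
    {
        if (tv->tv_usec >= 999999)
            tv->tv_usec = 0;      /* reset at the boundary, as the fix does */
        else
            tv->tv_usec++;        /* normal increment */
    }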

* 3862425 (Tracking ID: 3859032)

SYMPTOM:
System panics in vx_tflush_map() due to NULL pointer dereference.

DESCRIPTION:
When converting a file system to VxFS using vxconvert, new blocks are allocated
to structural files like the smap, and these blocks can contain garbage; the
expectation is that fsck will rebuild the correct smap. But fsck missed
distinguishing between an EAU that is fully EXPANDED and one that is merely
ALLOCATED. Because of this, if an allocation is done to a file whose last
allocation came from such an affected EAU, a sub-transaction is created on an
EAU that is still in the allocated state. The map buffers of such EAUs are not
initialized properly in the VxFS private buffer cache, so these buffers are
released back as stale during the transaction commit. Later, if any
file-system-wide sync tries to flush the metadata, it can reference these buffer
pointers and panic, as the buffers have already been released and reused.

RESOLUTION:
Code is modified in fsck to correctly set the state of the EAU on
disk. The involved code paths are also modified to avoid doing transactions on
unexpanded EAUs.

* 3862435 (Tracking ID: 3833816)

SYMPTOM:
In a CFS cluster, one node returns stale data.

DESCRIPTION:
In a 2-node CFS cluster, when node 1 opens the file and writes to
it, the locks are used with CFS_MASTERLESS flag set. But when node 2 tries to
open the file and write to it, the locks on node 1 are normalized as part of
HLOCK revoke. But after the Hlock revoke on node 1, when node 2 takes the PG
Lock grant to write, there is no PG lock revoke on node 1, so the dirty pages on
node 1 are not flushed and invalidated. The problem results in reads returning
stale data on node 1.

RESOLUTION:
The code is modified to cache the PG lock before normalizing it in
vx_hlock_putdata, so that after the normalizing, the cache grant is still with
node 1.When node 2 requests PG lock, there is a revoke on node 1 which flushes
and invalidates the pages.

* 3864751 (Tracking ID: 3867147)

SYMPTOM:
Assert failed in internal dedup testing.

DESCRIPTION:
If dedup is run on the same file twice simultaneously and an extent split
happens, the second extent split of the same file can cause the assert
failure. It is due to stale extent information being used for the second split
after the first one.

RESOLUTION:
Code is modified to do a fresh bmap lookup if the same inode is detected.

* 3866970 (Tracking ID: 3866962)

SYMPTOM:
Data corruption is seen when dalloc writes are going on a file and fsync is
simultaneously started on the same file.

DESCRIPTION:
If dalloc writes are going on a file and synchronous flushing is simultaneously
started on the same file, the synchronous flushing tries to flush all the dirty
pages of the file without considering the underlying allocation. In this case,
flushing can happen on unallocated blocks, which can result in data loss.

RESOLUTION:
Code is modified to flush data only up to the actual allocation in the case of
dalloc writes.

Patch ID: VRTSvxfs-6.0.500.100

* 3402643 (Tracking ID: 3413926)

SYMPTOM:
Internal testing hangs due to high memory consumption resulting in fork failure

DESCRIPTION:
The issue of high swap usage occurs with recent updates of Solaris 10 and Solaris 11, and is predominantly seen with internal stress/noise testing. The recent Solaris update releases increased ncpu. As a large number of buffer cache free lists in VxFS are spawned in proportion to ncpu, memory consumption is high, which results in fork failure.

RESOLUTION:
For systems with more than 16 CPUs, the number of buffer cache free lists is
adjusted according to the maximum number of CPUs supported.

* 3469683 (Tracking ID: 3469681)

SYMPTOM:
Free space defragmentation results in an EBUSY error and the file system is disabled.

DESCRIPTION:
While remounting the file system, the re-initialization gives an EBUSY error if the in-core and on-disk version numbers of an inode do not match. When pushing data blocks to the clone, the inode version of the immediate clone inode is bumped. But if there is another clone in the chain, the ILIST extent of this immediate clone inode is not pushed onto that clone. This is incorrect because the inode has been modified.

RESOLUTION:
The code is modified so that the ILIST extents of the immediate clone inode are pushed onto the next clone in the chain.

* 3498950 (Tracking ID: 3356947)

SYMPTOM:
VxFS does not perform as fast as expected when multi-threaded writes are
issued to a file, interspersed with fsync calls.

DESCRIPTION:
When multi-threaded writes are issued with fsync calls in between the writes, fsync can serialise the writes by taking the IRWLOCK on the inode and doing whole-file putpages. Therefore, out-of-the-box performance is relatively slow in terms of throughput.

RESOLUTION:
The code is fixed to remove fsync's serialisation with the IRWLOCK and make it conditional, applying only in some cases.

* 3498976 (Tracking ID: 3434811)

SYMPTOM:
In VxFS 6.1, the vxfsconvert(1M) command hangs within the vxfsl3_getext()
function with the following stack trace:

search_type()
bmap_typ()
vxfsl3_typext()
vxfsl3_getext()
ext_convert()
fset_convert()
convert()

DESCRIPTION:
There is a type casting problem for extent size. It may cause a non-zero value to overflow and turn into zero by mistake. This further leads to infinite looping inside the function.

RESOLUTION:
The code is modified to remove the intermediate variable and avoid type casting.
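
For illustration only, a small standalone C program showing how a narrowing
cast can turn a non-zero 64-bit extent size into zero, the condition that
stalls the loop described above (names are invented):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t extsize   = 0x100000000ULL;        /* non-zero: 2^32 */
        uint32_t truncated = (uint32_t)extsize;     /* becomes 0 */

        /* A scan loop advancing by 'truncated' never makes progress. */
        printf("extsize=%llu truncated=%u\n",
            (unsigned long long)extsize, truncated);
        return 0;
    }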

* 3498978 (Tracking ID: 3424564)

SYMPTOM:
fsppadm fails with ENODEV and "file is encrypted or is not a database" 
errors

DESCRIPTION:
The error handler for ENODEV was missing while processing only directory
inodes, and the database got corrupted, leading to the second error.

RESOLUTION:
An error handler is added to ignore ENODEV while processing directory inodes
only. For the database corruption, a log message is added to capture all the
db logs in order to understand why the corruption happened.

* 3499005 (Tracking ID: 3469644)

SYMPTOM:
System panics in the vx_logbuf_clean() function while traversing chain of transactions off the intent log buffer. The stack trace is as follows:


vx_logbuf_clean ()
vx_logadd ()
vx_log()
vx_trancommit()
vx_exh_hashinit ()
vx_dexh_create ()
vx_dexh_init ()
vx_pd_rename ()
vx_rename1_pd()
vx_do_rename ()
vx_rename1 ()
vx_rename ()
vx_rename_skey ()

DESCRIPTION:
The system panics as the vx_logbuf_clean() function tries to access an already freed transaction from the transaction chain to flush it to the log.

RESOLUTION:
The code is modified to make sure that transaction gets flushed to the log before it is freed.

* 3499008 (Tracking ID: 3484336)

SYMPTOM:
The fidtovp() system call can panic in the vx_itryhold_locked() function with the following stack trace:

vx_itryhold_locked
vx_iget
vx_common_vget
vx_do_vget
vx_vget_skey
vfs_vget
fidtovp
kernel_add_gate_cstack
nfs3_fhtovp
rfs3_getattr
rfs_dispatch
svc_getreq
threadentry
[kdb_read_mem]

DESCRIPTION:
Some VxFS operations like the vx_vget() function try to get a hold on an in-core inode using the vx_itryhold_locked() function, but they do not take the lock on the corresponding directory inode. This may lead to a race condition when this inode is present on the delicache list and is inactivated. This results in a panic when the vx_itryhold_locked() function tries to remove it from a free list. This is actually a known issue, but the last fix was not complete; it missed some functions which may also cause the race condition.

RESOLUTION:
The code is modified to take inode list lock inside the vx_inactive_tran(), vx_tranimdone() and vx_tranuninode() functions to avoid race condition.

* 3499011 (Tracking ID: 3486726)

SYMPTOM:
VFR logs too much data on the target node.

DESCRIPTION:
On the target node, VFR logs debug-level messages even if the debug mode is off. It also does not consider the debug mode specified at the time of job creation.

RESOLUTION:
The code is modified to not log the debug level messages on the target node if the specified debug mode is set off.

* 3499030 (Tracking ID: 3484353)

SYMPTOM:
A self-deadlock is caused by a missing unlock of DIRLOCK. A typical stack trace is as follows:
 
slpq_swtch_core()
real_sleep()
sleep_one()
vx_smp_lock()
vx_dirlock()
vx_do_rename()
vx_rename1()
vx_rename()
vn_rename()
rename()
syscall()

DESCRIPTION:
When a partitioned directory feature (PD) of Veritas File System (VxFS) is enabled, there is a possibility of self-deadlock when there are multiple renaming threads operating on the same target directory.
The issue is due to the fact that there is a missing unlock of DIRLOCK in the vx_int_rename() function.

RESOLUTION:
The code is modified to add the missing unlock of the directory lock in the vx_int_rename() function.

* 3514824 (Tracking ID: 3443430)

SYMPTOM:
Fsck allocates too much memory.

DESCRIPTION:
Since Storage Foundation 6.0, parallel inode list processing with multiple threads has been introduced to help reduce the fsck time. However, the parallel threads have to allocate redundant memory instead of reusing buffers in the buffer cache efficiently when the inode list has many holes.

RESOLUTION:
The code is fixed to make each thread maintain its own buffer cache from which it can reuse free memory.

* 3515559 (Tracking ID: 3498048)

SYMPTOM:
While the system is making a backup, the 'ls -l' command on the same file system may hang.

DESCRIPTION:
When the dalloc (delayed allocation) feature is turned on, flushing takes quite a lot of time while holding the getpage lock, which is needed by writers that hold the read-write lock on inodes. The 'ls -l' command needs ACLs (access control lists) to display information. But in Veritas File System (VxFS), ACLs are accessed only under protection of the inode read-write lock, which results in the hang.

RESOLUTION:
The code is modified to turn dalloc off and improve write throttling by restricting the kernel flusher from updating the internal counter for write page flushes.

* 3517702 (Tracking ID: 3517699)

SYMPTOM:
Return code 240 for command fsfreeze(1M) is not documented in man page for fsfreeze.

DESCRIPTION:
Return code 240 for command fsfreeze(1M) is not documented in man page for fsfreeze.

RESOLUTION:
The man page for fsfreeze(1M) is modified to document return code 240.

* 3579957 (Tracking ID: 3233315)

SYMPTOM:
"fsck" utility dumps core, while checking the RCT file.

DESCRIPTION:
"fsck" utility dumps core, while checking the RCT file. "bmap_search_typed()"
function is passed with wrong parameter, and results in the core dump with the
following stack trace:

bmap_get_typeparms ()
bmap_search_typed_raw()
bmap_search_typed()
rct_walk()
bmap_check_typed_raw()
rct_check()
main()

RESOLUTION:
The code is fixed to pass the correct parameters to the "bmap_search_typed()" function.

* 3581566 (Tracking ID: 3560968)

SYMPTOM:
The delicache_enable tunable is inconsistent in the CFS environment.

DESCRIPTION:
On the secondary nodes, the tunable values are exported from the primary mount, while the delicache_enable tunable value comes from the 'tunefstab' file. Therefore the tunable values are not persistent.

RESOLUTION:
The code is fixed to read the "tunefstab" file only for the delicache_enable tunable during mount and set the value accordingly.

* 3584297 (Tracking ID: 3583930)

SYMPTOM:
When external quota file is over-written or restored from backup, new settings which were added after the backup still remain.

DESCRIPTION:
The purpose of the quotaon operation is to copy the quota limits from external to internal quota file, because internal quota file is not always updated with correct limits. To complete the copy operation, the extent of external file is compared to the extent of internal file at the corresponding offset.
     Now, if external quota file is overwritten (or restored to its original copy) and the size of internal file is more than that of external, the quotaon operation does not clear the additional (stale) quota records in the internal file. Later, the sync operation (part of quotaon) copies these stale records from internal to external file. Hence, both internal and external files contain stale records.

RESOLUTION:
The code is modified to get rid of the stale records in the internal file at the time of quotaon.

* 3590573 (Tracking ID: 3331010)

SYMPTOM:
The fsck(1M) command dumps core with a segmentation fault.
The following stack trace is observed:

fakebmap()
rcq_apply_op()
rct_process_pending_tasklist()
process_device()
main()

DESCRIPTION:
While working on the device in the process_device() function, the fsck command
tries to access already freed device-related structures available in the
pending task list during the retry code path.

RESOLUTION:
The code is modified to free up the pending task list before retrying in the
process_device() function.

* 3593181 (Tracking ID: 3331105)

SYMPTOM:
The fsck command cannot handle the case wherein two reorg inodes point to the
same source inode.

DESCRIPTION:
The fsck command does not know how to handle the case where a disk corruption
results in two reorg inodes pointing to the same source inode. In this case,
fsck processes the first reorg inode and clears the VX_IEREORG flag on the
corresponding source inode. Later, at mount time, the mount command finds the
second reorg inode pointing to the same source inode which was already
processed by fsck. This situation causes mount to hit an internal assert,
which may result in a mount failure in a production environment.

RESOLUTION:
The fsck command now invalidates all the reorg inodes that point to the same
source inode.

* 3597560 (Tracking ID: 3597482)

SYMPTOM:
The pwrite(2) function fails with EOPNOTSUPP when the write range is in two indirect extents.

DESCRIPTION:
When the range of pwrite() falls in two indirect extents, one ZFOD extent belonging to DB2 pre-allocated files created with the setext( , VX_GROWFILE, ) ioctl and another DATA extent belonging to an adjacent INDIR, the write fails with EOPNOTSUPP.
The reason is that Veritas File System (VxFS) tries to coalesce extents which belong to different indirect address extents as part of this transaction; such a metadata change consumes more transaction resources than the VxFS transaction engine can support in the current implementation.

RESOLUTION:
The code is modified to retry the write transaction without combining the extents.

Patch ID: VRTSvxfs-6.0.500.000

* 2705336 (Tracking ID: 2059611)

SYMPTOM:
The system panics due to a NULL pointer dereference while flushing the
bitmaps to the disk and the following stack trace is displayed:
...
vx_unlockmap+0x10c
vx_tflush_map+0x51c
vx_fsq_flush+0x504
vx_fsflush_fsq+0x190
vx_workitem_process+0x1c
vx_worklist_process+0x2b0
vx_worklist_thread+0x78

DESCRIPTION:
The vx_unlockmap() function unlocks a map structure of the file
system. If the map is being used, the hold count is incremented. The
vx_unlockmap() function attempts to check whether this is an empty mlink doubly
linked list. The asynchronous vx_mapiodone routine can change the link at random
even though the hold count is zero.

RESOLUTION:
The code is modified to change the evaluation rule inside the
vx_unlockmap() function, so that further evaluation can be skipped over when map
hold count is zero.
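
A hedged C sketch of the changed evaluation rule, with illustrative structure
and field names (not the VxFS ones): when the hold count is zero, the mlink
list may be changed at any moment by the asynchronous vx_mapiodone() path, so
the emptiness check is skipped entirely.

    #include <stdbool.h>

    struct mlink  { struct mlink *next, *prev; };
    struct sk_map {
        int          hold_count;
        struct mlink mlink;
    };

    static bool map_mlink_stable_nonempty(struct sk_map *map)
    {
        if (map->hold_count == 0)
            return false;       /* links unstable; skip evaluation */
        return map->mlink.next != &map->mlink;  /* list non-empty */
    }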

* 2933290 (Tracking ID: 2756779)

SYMPTOM:
Write and read performance concerns on Cluster File System (CFS) when running 
applications that rely on POSIX file-record locking (fcntl).

DESCRIPTION:
The usage of fcntl on CFS leads to high messaging traffic across nodes thereby 
reducing the performance of readers and writers.

RESOLUTION:
The code is modified to cache the ranges that are being file-record locked on 
the node. This is tried whenever possible to avoid broadcasting of messages 
across the nodes in the cluster.

* 2933301 (Tracking ID: 2908391)

SYMPTOM:
Checkpoint removal takes too long if Veritas File System (VxFS) has a large 
number of files. The cfsumount(1M) command could hang if removal of multiple 
checkpoints is in progress for such a file system.

DESCRIPTION:
When removing a checkpoint, VxFS traverses every inode to determine if 
pull/push is needed for the upstream/downstream checkpoint in its chain. This 
is time consuming if the file system has a large number of files, and results 
in slow checkpoint removal.

The command "cfsumount -c fsname" forces the umounts operation on a VxFS file 
system if there is any asynchronous checkpoint removal job in progress by 
checking if the value of vxfs stat "vxi_clonerm_jobs" is larger than zero. 
However, the stat does not count in the jobs in the checkpoint removal working 
queue and the jobs are entered into the working queue.  The "force umount" 
operation does not happen even if there are pending checkpoint removal jobs 
because of the incorrect value of "vxi_clonerm_jobs" (zero).

RESOLUTION:
For the slow checkpoint removal issue: 
The code is modified to create multiple threads to work on different Inode 
Allocation Units (IAUs) in parallel and to reduce the inode push work by 
sorting the checkpoint removal jobs by creation time in ascending order and 
by enlarging the checkpoint push size.

For the cfsumount(1M) command hang issue: 
Code is modified to add the counts of jobs in the working queue in 
the "vxi_clonerm_jobs" stat.

* 2947029 (Tracking ID: 2926684)

SYMPTOM:
On systems with heavy transactions workload like creation, deletion of files 
and so on, the system may panic with the following stack trace:
...
vxfs:vx_traninit+0x10
vxfs:vx_dircreate_tran+0x420
vxfs:vx_pd_create+0x980
vxfs:vx_create1_pd+0x1d0
vxfs:vx_do_create+0x80
vxfs:vx_create1+0xd4
vxfs:vx_create+0x158
...

DESCRIPTION:
In case of a delayed log, a transaction commit can complete before completing 
the log write. The memory for the transaction is freed before the transaction 
is logged, which corrupts the transaction freelist and causes the system to 
panic.

RESOLUTION:
The code is modified such that the transaction is not freed until the log is 
written.

* 2959557 (Tracking ID: 2834192)

SYMPTOM:
The mount operation fails after full fsck(1M) utility is run and displays the 
following error message on the console:
'UX:vxfs mount.vxfs: ERROR: V-3-26881 : Cannot be mounted until it has been 
cleaned by fsck. Please run "fsck -t vxfs -y MNTPNT" before mounting'.

DESCRIPTION:
When a CFS is mounted, VxFS validates the per-node-cut entries (PNCUT) which 
are in-core against their counterparts on the disk. This validation failure 
makes the mount unsuccessful for the full fsck. Full fsck is in the fourth pass 
when it checks the free inode/extent maps and merges the dirty PNCUT files in-
core, and validates them with the corresponding on-disk values. However, if any 
PNCUT entry is corrupted, then the fsck(1M) utility simply ignores it. This 
results in the mount failure.

RESOLUTION:
The code is modified to enhance the fsck(1M) utility to handle any delinquent 
PNCUT entries and rebuild them as required.

* 2978234 (Tracking ID: 2972183)

SYMPTOM:
"fsppadm enforce"  takes longer than usual time force update the secondary 
nodes than it takes to force update the primary nodes.

DESCRIPTION:
The ilist is force updated on the secondary nodes. As a result, the 
performance on the secondary becomes low.

RESOLUTION:
The ilist file is now force updated on the secondary nodes only on an error condition.

* 2978236 (Tracking ID: 2977828)

SYMPTOM:
The file system is marked bad after an inode table overflow error with
the following error messages:

kernel: vxfs: msgcnt 7911 mesg 014: V-2-14: vx_iget - inode table overflow
kernel: vxfs: msgcnt 7912 mesg 063: V-2-63: vx_fset_markbad -
<devicename>  file system  fileset (index <filesystem index>) marked bad
kernel: V-2-96: vx_setfsflags - <devicename> file system fullfsck
flag set - vx_fset_markbad

DESCRIPTION:
To remove a checkpoint, the system truncates every file that is
consumed by the checkpoint. When the number of files is too large, the
inode cache may become full, leading to an ENFILE error (inode table full).
The ENFILE error then inappropriately sets the full fsck flag on the file
system.

RESOLUTION:
The code is modified to convert the ENFILE error to the ENOSPC error
to fix the issue.
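
A one-line sketch of the remapping, under the assumption that the conversion
happens in the checkpoint-removal error path (the function name is
illustrative):

    #include <errno.h>

    /* An inode-table-full condition during checkpoint removal should
     * not mark the file system bad, so report it as a space error. */
    static int ckpt_remove_errno(int err)
    {
        return (err == ENFILE) ? ENOSPC : err;
    }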

* 2982161 (Tracking ID: 2982157)

SYMPTOM:
During internal testing, the 'f:vx_trancommit:4' debug assert was hit when the available transaction space is less than the required space.

DESCRIPTION:
The 'f:vx_trancommit:4' assert is hit when the available transaction space is less than required. During file truncate operations, when VxFS calculates the transaction space, it does not consider the transaction space required in case the file has shared extents. As a result, the 'f:vx_trancommit:4' debug assert is hit.

RESOLUTION:
The code is modified to take into account the extra transaction buffer space required when the file being truncated has shared extents.

* 2983249 (Tracking ID: 2983248)

SYMPTOM:
The vxrepquota(1M) command dumps core on systems with more than 50
file systems mounted with the quota option. The stack trace is as follows:
/opt/VRTS/bin/vxrepquota
strlen+0x50() 
sprintf+0x40() 
..
..
main+0x6d4()

DESCRIPTION:
In vxrepquota(1M) and vxquotaon(1M), VxFS allocates an array of 50
pointers for the vfstab entries. Thus it can hold a maximum of 50 entries. If
there are more than 50 VxFS file system entries in the /etc/vfstab file, it
leads to a buffer overflow.

RESOLUTION:
The code is modified to extend the size of the array listbuf to
1024, so that the overflow occurs only if there are more than 1024 VxFS file
system entries in the /etc/vfstab file.
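
An illustrative reconstruction of the overflow and the fix; 'listbuf' follows
the name used above, while the bound check is an assumption about the
surrounding code:

    #define MAXLIST 1024            /* was 50 before the fix */

    static char *listbuf[MAXLIST];  /* one pointer per vfstab entry */

    static int add_vfstab_entry(int n, char *entry)
    {
        if (n < 0 || n >= MAXLIST)
            return -1;              /* refuse instead of overflowing */
        listbuf[n] = entry;
        return 0;
    }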

* 2999566 (Tracking ID: 2999560)

SYMPTOM:
While trying to clear the 'metadataok' flag on a volume of the volume set, the 'fsvoladm'(1M) command gives an error.

DESCRIPTION:
The 'fsvoladm'(1M) command sets and clears the 'dataonly' and 'metadataok' flags on a volume in a vset on which VxFS is mounted. 
The 'fsvoladm'(1M) command fails while clearing the 'metadataok' flag and reports an EINVAL (invalid argument) error for certain volumes. This failure occurs because, while clearing the flag, VxFS reinitializes the reorg structure for some volumes. During re-initialization, VxFS frees the existing FS structures but still refers to the stale device structure, resulting in an EINVAL error.

RESOLUTION:
The code is modified to let the in-core device structure point to the updated and correct data.

* 3027250 (Tracking ID: 3031901)

SYMPTOM:
The 'vxtunefs(1M)' command accepts a garbage value for the 
'max_buf_dat_size' tunable.

DESCRIPTION:
When a garbage value is specified for the 'max_buf_dat_size' tunable using 'vxtunefs(1M)', the tunable accepts the value and reports a successful update, but the value does not actually get reflected in the system. The error returned from parsing the command-line value of the 'max_buf_dat_size' tunable is not checked; hence the garbage value is accepted.

RESOLUTION:
The code is modified to handle the error returned from parsing the command line value of the 'max_buf_data_size' tunable.

* 3056103 (Tracking ID: 3197901)

SYMPTOM:
fset_get fails for the mentioned configuration.

DESCRIPTION:
There is a duplicate symbol fs_bmap in the VxFS libvxfspriv.a and vxfspriv.so libraries.

RESOLUTION:
The duplicate symbol fs_bmap in the VxFS libvxfspriv.a and vxfspriv.so 
libraries is fixed by renaming it to fs_bmap_priv in libvxfspriv.a.

* 3059000 (Tracking ID: 3046983)

SYMPTOM:
There is an invalid CFS node number (<inode number>) 
in ".__fsppadm_fclextract". This causes the Dynamic Storage Tiering (DST) 
policy enforcement to fail.

DESCRIPTION:
DST policy enforcement sometimes depends on the extraction of the File Change 
Log (FCL). When the FCL change log is processed, the FCL records are read from 
the change log into a buffer. If the buffer is not big enough to hold the 
records, some rollback is done and the needed buffer size is passed out. 
However, the rollback is incomplete, which results in the problem.

RESOLUTION:
The code is modified to add code to roll back the content of "fh_bp1->fb_addr" 
and "fh_bp2->fb_addr".

* 3108176 (Tracking ID: 2667658)

SYMPTOM:
An attempt to perform an fscdsconv endian conversion from the SPARC big-endian 
byte order to the x86 little-endian byte order fails because of a macro overflow.

DESCRIPTION:
Using the fscdsconv(1M) command to perform endian conversion from the SPARC 
big-endian (any SPARC architecture machine) byte order to the x86 little-endian 
(any x86 architecture machine) byte order fails. The write operation for the 
recovery file results in an overflow of the control data offset (a macro hard 
coded to 500MB).

RESOLUTION:
The code is modified to take an estimate of the control-data offset explicitly 
and dynamically while creating and writing the recovery file.
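
A hedged sketch of the idea: derive the control-data offset from what will
actually be written instead of a fixed macro. The names and the 10% headroom
are assumptions, not the shipped formula:

    #include <stdint.h>

    #define OLD_CTRL_OFFSET (500ULL << 20)  /* old hard-coded 500MB */

    static uint64_t ctrl_data_offset(uint64_t recovery_bytes)
    {
        uint64_t est = recovery_bytes + recovery_bytes / 10;
        return (est > OLD_CTRL_OFFSET) ? est : OLD_CTRL_OFFSET;
    }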

* 3131798 (Tracking ID: 2839871)

SYMPTOM:
On a system with DELICACHE enabled, several file system operations may hang 
with the following stack trace: 

vx_delicache_inactive 
vx_delicache_inactive_wp
vx_workitem_process 
vx_worklist_process
vx_worklist_thread
vx_kthread_init

DESCRIPTION:
The DELICACHE lock is used to synchronize the access to the DELICACHE list and 
it is held only while updating this list. However, in some cases it is held 
longer and is released only after the issued I/O is completed, causing other 
threads to hang.

RESOLUTION:
The code is modified to release the spinlock before issuing a blocking I/O 
request.

* 3131799 (Tracking ID: 2833450)

SYMPTOM:
The fstyp(1M) command displays a negative value for ninode on file 
systems larger than 2 terabytes (TB).

DESCRIPTION:
When the "fstyp" command is run along with '-v' option, it prints 
the information about the file system superblock. In Solaris, there is an 
overflow bug in the printf for the number of data blocks in the file system For 
example, if the dsize field has 64 bit assigned internally, when you run fstyp 
command, whatever value it should print into dsize field , it prints in ninode 
field. In case of 2TB file systems, it prints negative value.

RESOLUTION:
The code has been modified to use the appropriate format specifier 
to manage the overflow.
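
For illustration, a standalone C program showing how a width-incorrect
conversion misprints a 64-bit block count as a negative number, and the
width-correct alternative:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t dsize = 3ULL << 31;    /* block count past 32 bits */

        printf("wrong: %d\n", (int)dsize);      /* prints a negative */
        printf("right: %" PRIu64 "\n", dsize);  /* prints 6442450944 */
        return 0;
    }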

* 3131826 (Tracking ID: 2966277)

SYMPTOM:
Systems with high file-system activity like read/write/open/lookup may panic 
with the following stack trace due to a rare race condition:

 spinlock+0x21  ( )
 ->  vx_rwsleep_unlock()
  vx_ipunlock+0x40()
  vx_inactive_remove+0x530()
  vx_inactive_tran+0x450()
 vx_local_inactive_list+0x30()
  vx_inactive_list+0x420()
 ->  vx_workitem_process()
 ->  vx_worklist_process()
 vx_worklist_thread+0x2f0()
  kthread_daemon_startup+0x90()

DESCRIPTION:
The ILOCK is released before doing an IPUNLOCK, which causes a race condition. 
This results in a panic when an inode that has been set free is accessed.

RESOLUTION:
The code is modified so that the ILOCK is used to protect the inodes' memory 
from being set free, while the memory is being accessed.

* 3248029 (Tracking ID: 2439261)

SYMPTOM:
When the vx_fiostats_tunable is changed from zero to non-zero, the
system panics with the following stack trace:
vx_fiostats_do_update
vx_fiostats_update
vx_read1
vx_rdwr
vno_rw
rwuio
pread

DESCRIPTION:
When vx_fiostats_tunable is changed from zero to non-zero, all the
incore-inode fiostats attributes are set to NULL. When these attributes are
accessed, the system panics due to the NULL pointer dereference.

RESOLUTION:
The code has been modified to check that the file I/O stat attributes are
present before dereferencing the pointers.
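
A minimal sketch of the guard, assuming illustrative structure and field names
(the per-inode fiostat attributes are NULL right after the tunable flips from
zero to non-zero):

    #include <stddef.h>

    struct fiostats { unsigned long reads, writes; };
    struct sk_inode { struct fiostats *i_fiostats; };

    static void fiostats_update(struct sk_inode *ip, int is_write)
    {
        if (ip->i_fiostats == NULL)
            return;                     /* stats not allocated yet */
        if (is_write)
            ip->i_fiostats->writes++;
        else
            ip->i_fiostats->reads++;
    }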

* 3248042 (Tracking ID: 3072036)

SYMPTOM:
Reads from secondary node in CFS can sometimes fail with ENXIO (No such device 
or address).

DESCRIPTION:
The incore attribute ilist on secondary node is out of sync with that of the 
primary.

RESOLUTION:
The code is modified such that incore attribute ilist on secondary node is force
updated with data from primary node.

* 3248046 (Tracking ID: 3092114)

SYMPTOM:
The information output by the "df -i" command can often be inaccurate for 
cluster mounted file systems.

DESCRIPTION:
In the Cluster File System 5.0 release, the concept of delegating metadata to 
nodes in the cluster was introduced. This delegation of metadata allows CFS 
secondary nodes to update metadata without having to ask the CFS primary to do 
it. This provides greater node scalability. 
However, the "df -i" information is still collected by the CFS primary 
regardless of which node (primary or secondary) the "df -i" command is executed 
on.

For inodes the granularity of each delegation is an Inode Allocation Unit 
[IAU], thus IAUs can be delegated to nodes in the cluster.
When using a VxFS 1Kb file system block size each IAU will represent 8192 
inodes.
When using a VxFS 2Kb file system block size each IAU will represent 16384 
inodes.
When using a VxFS 4Kb file system block size each IAU will represent 32768 
inodes.
When using a VxFS 8Kb file system block size each IAU will represent 65536 
inodes.
Each IAU contains a bitmap that determines whether each inode it represents is 
either allocated or free, the IAU also contains a summary count of the number 
of inodes that are currently free in the IAU.
The ""df -i" information can be considered as a simple sum of all the IAU 
summary counts.
Using a 1Kb block size IAU-0 will represent inodes numbers      0 -  8191
Using a 1Kb block size IAU-1 will represent inodes numbers   8192 - 16383
Using a 1Kb block size IAU-2 will represent inodes numbers  16384 - 32768
etc.
The inaccurate "df -i" count occurs because the CFS primary has no visibility 
of the current IAU summary information for IAU that are delegated to Secondary 
nodes.
Therefore the number of allocated inodes within an IAU that is currently 
delegated to a CFS Secondary node is not known to the CFS Primary. As a 
result, the "df -i" count information for the currently delegated IAUs is 
collected from the Primary's copy of the IAU summaries. Since the Primary's 
copy of the IAU is stale, the "df -i" count is only accurate when no IAUs are 
currently delegated to CFS secondary nodes.
In other words, the IAUs currently delegated to CFS secondary nodes will cause 
the "df -i" count to be inaccurate.
Once an IAU is delegated to a node, it can "timeout" after 3 minutes of 
inactivity. However, not all IAU delegations will timeout. One IAU will always 
remain delegated to each node for performance reasons. Also, an IAU whose 
inodes are all allocated (so no free inodes remain in the IAU) will not 
timeout either.
The issue can be best summarized as:
The more IAUs that remain delegated to CFS secondary nodes, the greater the 
inaccuracy of the "df -i" count.

RESOLUTION:
Allow the delegations for IAUs whose inodes are all allocated (so no free 
inodes remain in the IAU) to "timeout" after 3 minutes of inactivity.
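
A worked example of the IAU arithmetic above, assuming the stated 8192 inodes
per IAU at a 1KB block size (scaling with block size); for inode 20000 at a
1KB block size this prints IAU-2 with the range 16384-24575:

    #include <stdio.h>

    int main(void)
    {
        unsigned long bsize_kb = 1;               /* FS block size in KB */
        unsigned long per_iau  = 8192 * bsize_kb; /* inodes per IAU */
        unsigned long ino      = 20000;
        unsigned long iau      = ino / per_iau;

        printf("inode %lu -> IAU-%lu (inodes %lu-%lu)\n",
            ino, iau, iau * per_iau, (iau + 1) * per_iau - 1);
        return 0;
    }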

* 3248051 (Tracking ID: 3121933)

SYMPTOM:
The pwrite()  function fails with EOPNOTSUPP when the write range is in two 
indirect extents.

DESCRIPTION:
When the range of pwrite() falls in two indirect extents (one ZFOD extent 
belonging to DB2 pre-allocated files created with setext( , VX_GROWFILE, ) 
ioctl and another DATA extent belonging to adjacent INDIR) write fails with 
EOPNOTSUPP. The reason is that VxFS is trying to coalesce extents which belong 
to different indirect address extents as part of this transaction - such a meta-
data change consumes more transaction resources which VxFS transaction engine 
is unable to support in the current implementation.

RESOLUTION:
The code is modified to retry the transaction without coalescing the extents, 
as the latter is an optimisation and should not fail the write.

* 3248054 (Tracking ID: 3153919)

SYMPTOM:
The fsadm(1M) command may hang when the structural file set re-organization is
in progress. The following stack trace is observed:
vx_event_wait
vx_icache_process
vx_switch_ilocks_list
vx_cfs_icache_process
vx_switch_ilocks
vx_fs_reinit
vx_reorg_dostruct
vx_extmap_reorg
vx_struct_reorg 
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
vx_compat_ioctl
compat_sys_ioctl

DESCRIPTION:
During the structural file set re-organization, due to some race condition, the
VX_CFS_IOWN_TRANSIT flag is set on the inode. At the final stage of the
structural file set re-organization, all the inodes are re-initialized. Since
the VX_CFS_IOWN_TRANSIT flag is set improperly, the re-initialization fails to
proceed. This causes the hang.

RESOLUTION:
The code is modified such that the VX_CFS_IOWN_TRANSIT flag is cleared.

* 3248089 (Tracking ID: 3003679)

SYMPTOM:
The file system hangs when doing fsppadm and removing a file with named stream 
attributes (nattr) at the same time. The following two typical threads are 
involved: 

T1:
COMMAND: "fsppadm"
schedule at
 vxg_svar_sleep_unlock
vxg_grant_sleep
 vxg_cmn_lock
 vxg_api_lock
 vx_glm_lock
 vx_ihlock
 vx_cfs_iread
 vx_iget
 vx_traverse_tree
vx_dir_lookup
vx_rev_namelookup
vx_aioctl_common
vx_ioctl
vx_compat_ioctl
compat_sys_ioctl
T2:
COMMAND: "vx_worklist_thr"
 schedule
 vxg_svar_sleep_unlock
 vxg_grant_sleep
 vxg_cmn_lock
 vxg_api_lock
 vx_glm_lock
 vx_genglm_lock
 vx_dirlock
 vx_do_remove
 vx_purge_nattr
vx_nattr_dirremove
vx_inactive_tran
vx_cfs_inactive_list
vx_inactive_list
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init
kernel_thread

DESCRIPTION:
The file system hangs due to the deadlock between the threads. T1 initiated by 
fsppadm calls vx_traverse_tree to obtain the path name for a given inode 
number. T2 removes the inode as well as its affiliated nattr inodes.
The reverse name lookup (T1) holds the global dirlock in vx_dir_lookup during 
the lookup process. It traverses the entire path from bottom to top to resolve 
the inode number inversely in vx_traverse_tree. During the lookup, VxFS needs 
to hold the hlock of each inode to read them, and drop it after reading.
The file removal (T2) is processed via vx_inactive_tran which will take 
the "hlock" of the inode being removed. After that, it will remove all its 
named attribute inodes invx_do_remove, where sometimes the global dirlock is 
needed. Eventually, each thread waits for the lock, which is held by the other 
thread and this result in the deadlock.

RESOLUTION:
The code is modified so that the dirlock is not acquired during reverse name 
lookup.

* 3248090 (Tracking ID: 2963763)

SYMPTOM:
When thin_friendly_alloc and deliache_enable parameters are enabled, Veritas 
File System (VxFS) may hit the deadlock. The thread involved in the deadlock 
can have the following stack trace:

vx_rwsleep_lock()
vx_tflush_inode()
vx_fsq_flush()
vx_tranflush()
vx_traninit()
vx_remove_tran()
vx_pd_remove()
vx_remove1_pd()
vx_do_remove()
vx_remove1()
vx_remove_vp()
vx_remove()
vfs_unlink()
do_unlinkat

The threads waiting in vx_traninit() for transaction space, displays following 
stack trace:

vx_delay2() 
vx_traninit()
vx_idelxwri_done()
vx_idelxwri_flush()
vx_common_inactive_tran()
vx_inactive_tran()
vx_local_inactive_list()
vx_inactive_list+0x530()
vx_worklist_process()
vx_worklist_thread()

DESCRIPTION:
In the extent allocation code paths, VxFS sets the IEXTALLOC flag on the 
inode without taking the ILOCK. With overlapping transactions picking up this 
same inode off the delicache list, the transaction-done code paths miss the 
IUNLOCK call.

RESOLUTION:
The code is modified to change the corresponding code paths to set the 
IEXTALLOC flag under proper protection.

* 3248094 (Tracking ID: 3192985)

SYMPTOM:
Checkpoints quota usage on CFS can be negative.
An example is as follows:
Filesystem     hardlimit     softlimit        usage         action_flag
/sofs1         51200         51200     18446744073709490176  << negative

DESCRIPTION:
In CFS, to manage the intent logs, and the other extra objects required for 
CFS, a holding object referred to as a per-node-object-location table (PNOLT) 
is created. In CFS, the quota usage is calculated by reading the per node cut 
(current usage table) files (member of PNOLT) and summing up the quota usage 
for each clone chain. However, when the quotaoff and quotaon operations are 
fired on a CFS checkpoint, the usage shows "0" after these two operations are 
executed. This happens because the quota usage calculation is skipped. 
Subsequently, if a delete operation is performed, the usage becomes negative 
since the blocks allocated for the deleted file are subtracted from zero.

RESOLUTION:
The code is modified such that when the quotaon operation is performed, the 
quota usage calculation is not skipped.

* 3248096 (Tracking ID: 3214816)

SYMPTOM:
When you create and delete the inodes of a user frequently with the DELICACHE 
feature enabled, the user quota file becomes corrupt.

DESCRIPTION:
The inode DELICACHE feature causes this issue. This feature optimizes the 
updates on the inode map during the file creation and deletion operations. It 
is enabled by default. You can disable this feature with the vxtunefs(1M) 
command.

When DELICACHE is enabled and the quota is set for Veritas File System (VxFS), 
VxFS updates the quota for the inodes before the inodes are on the DELICACHE 
list and after they are on the inactive list during the removal process. As a 
result, VxFS decrements the current number of user files twice. This causes the 
quota file corruption.

RESOLUTION:
The code is modified to identify the inodes moved to the inactive list from the 
DELICACHE list. This flag prevents the quota being decremented again during the 
removal process.
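
A hedged sketch of the double-decrement guard; the flag and names are
illustrative, not the actual VxFS identifiers:

    #define SK_FROM_DELICACHE 0x1   /* set when DELICACHE -> inactive */

    struct sk_inode2 { int flags; };

    /* Called when the inode moves off the DELICACHE list. */
    static void mark_from_delicache(struct sk_inode2 *ip)
    {
        ip->flags |= SK_FROM_DELICACHE;
    }

    /* Called during removal processing on the inactive list. */
    static void quota_file_dec(struct sk_inode2 *ip, long *curfiles)
    {
        if (ip->flags & SK_FROM_DELICACHE)
            return;                 /* already accounted once */
        (*curfiles)--;
    }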

* 3248099 (Tracking ID: 3189562)

SYMPTOM:
Oracle daemons hang in the vx_growfile() kernel function. You may
see a stack trace similar to the following:
vx_growfile+0004D4 ()        
vx_doreserve+000118 ()
vx_tran_extset+0005DC ()
vx_extset_msg+0006E8 ()
vx_cfs_extset+000040 ()
vx_extset+0002D4 ()
vx_setext+000190 () 
vx_uioctl+0004AC ()
vx_ioctl+0000D0 ()
vx_ioctl_skey+00004C ()
vnop_ioctl+000050 (??, ??, ??, ??, ??, ??)
kernel_add_gate_cstack+000030 () 
vx_vop_ioctl+00001C () 
vx_odm_resize@AF15_6+00015C ()
vx_odm_resize+000030 ()
odm_vx_resize+000040 ()
odm_resize+0000E8 ()
vxodmioctl+00018C ()
hkey_legacy_gate+00004C ()
vnop_ioctl+000050 (??, ??, ??, ??, ??, ??)
vno_ioctl+000178 (??, ??, ??, ??, ??)

DESCRIPTION:
The vx_growfile() kernel function may run into a loop on a highly fragmented
file system, which causes multiple processes to hang. The vx_growfile() routine
is invoked through the setext(1) command or its Application Programming
Interface (API). When the vx_growfile() function requires more extents than the
typed extent buffer can spare, a VX_EBMAPLOCK error may occur. To handle the
error, VxFS cancels the transaction and repeats the same operation again, which
creates the loop.

RESOLUTION:
The code is modified to make VxFS commit the available extents to
proceed with the growfile transaction, repeating as many times as needed until
the transaction is completed.

* 3284764 (Tracking ID: 3042485)

SYMPTOM:
During internal Stress testing, the f:vx_purge_nattr:1 assert fails.

DESCRIPTION:
In case of corruption, the file-system check utility is run, and the inodes to be checked or fixed are picked up serially.
However, in some cases the order in which these are processed changes, which causes inconsistent metadata, resulting in the assert failure.

RESOLUTION:
The code is modified to handle named attribute inodes in an earlier pass during full fsck operation.

* 3296988 (Tracking ID: 2977035)

SYMPTOM:
While running an internal noise test in a Cluster File System (CFS) environment, a debug assert issue was observed in the vx_dircompact() function.

DESCRIPTION:
Compacting of directory blocks is avoided if the inode has 'extop' (extended operation) flags set, such as deferred inode removal and pass-through truncation. The issue is caused when the inode has extended pass-through truncation set and is considered for compacting.

RESOLUTION:
The code is modified to avoid compacting the directory blocks of the inode if it has an extended operation of pass-through truncation set.

* 3299685 (Tracking ID: 2999493)

SYMPTOM:
During internal testing, the file system check validation fails after a successful full fsck operation and displays the following error message: 
run_fsck :
First full fsck pass failed, exiting

DESCRIPTION:
Even after a successful full fsck completion, the fsck validation fails due to incorrect entries in a structural file (IFRCT) which maintains the reference count of shared extents. While processing information for indirect extents, the modified data does not get flushed to the disk because the buffer is not marked dirty after its contents are modified.

RESOLUTION:
The code is modified to mark the buffer dirty when its contents are modified.

* 3306410 (Tracking ID: 2495673)

SYMPTOM:
During communication between the nodes in a cluster, the incore inode gets marked 'bad' and an internal test assertion fails.

DESCRIPTION:
In a Cluster File System (CFS) environment, when two nodes communicate for a grant on an inode, some data is also piggybacked to the initiating node. If there is any discrepancy in the data that is piggybacked between these two nodes within the cluster, the incore inode gets marked 'bad'. During communication, the file system gets disabled, causing stale concurrent I/O data to be transferred to the initiating node and resulting in a mismatch.

RESOLUTION:
The code is modified such that if the file system gets disabled, it invalidates its concurrent I/O count state from other nodes and does not delegate false information when asked for concurrent I/O count from other nodes.

* 3310758 (Tracking ID: 3310755)

SYMPTOM:
When the system processes an indirect extent, if it finds that the first record is a Zero Fill-On-Demand (ZFOD) extent (or the first n records are ZFOD records), it hits the assert.

DESCRIPTION:
In the case of indirect extents, the reference count mechanism (shared block
count) for files having shared ZFOD extents does not behave correctly.

RESOLUTION:
The code for the reference count queue (RCQ) handling of shared indirect ZFOD extents is modified, and the fsck(1M) issues with snapshots of files with ZFOD extents have been fixed.

* 3321730 (Tracking ID: 3214328)

SYMPTOM:
A mismatch is observed between the states for the Global Lock Manager (GLM) grant level and the Global Lock Manager (GLM) data
in a Cluster File System (CFS) inode.

DESCRIPTION:
When a file system is disabled during some error situation, and if any thread starts its execution before disabling the file system, then the execution is completed in spite of file system being disabled in between. The Global Lock Manager (GLM) state of an inode changes without updating other flags like inode->i_cflags, which causes a mismatch between the states.

RESOLUTION:
The code is modified to skip updating the Global Lock Manager (GLM) state when specific flag is set in inode->i_cflags and also when the file system is disabled.

* 3323912 (Tracking ID: 3259634)

SYMPTOM:
In CFS, each node in the cluster with the file system mounted has its own
intent log in the file system. A CFS with more than 4,294,967,296 file system
blocks can zero out an incorrect location resulting from an incorrect
typecast. For example, such a CFS can incorrectly zero out 65536 file system
blocks at the block offset of 1,537,474,560 (file system blocks) with an 8-KB
file system block size and an intent log with the size of 65536 file system
blocks. This issue can only occur if an intent log is located above an offset
of 4,294,967,296 file system blocks. This situation can occur when you add a
new node to the cluster and mount an additional CFS secondary for the first
time, which needs to create and zero a new intent log. This situation can also
happen if you resize a file system or intent log and clear an intent log.

The problem occurs only with the following file system size and the FS block
size combinations:

1KB block size and FS size > 4TB
2KB block size and FS size > 8TB
4KB block size and FS size > 16TB
8KB block size and FS size > 32TB

For example, the message log can contain the following messages:

The full fsck flag is set on a file system with the following type of messages:
 

2013 Apr 17 14:52:22 sfsys kernel: vxfs: msgcnt 5 mesg 096: V-2-96:
vx_setfsflags - /dev/vx/dsk/sfsdg/vol1 file system fullfsck flag set - vx_ierror

2013 Apr 17 14:52:22 sfsys kernel: vxfs: msgcnt 6 mesg 017: V-2-17:
vx_attr_iget - /dev/vx/dsk/sfsdg/vol1 file system inode 13675215 marked bad 
incore

2013 Jul 17 07:41:22 sfsys kernel: vxfs: msgcnt 47 mesg 096:  V-2-96:
vx_setfsflags - /dev/vx/dsk/sfsdg/vol1 file system fullfsck  flag set - 
vx_ierror 

2013 Jul 17 07:41:22 sfsys kernel: vxfs: msgcnt 48 mesg 017:  V-2-17:
vx_dirbread - /dev/vx/dsk/sfsdg/vol1 file system inode 55010476  marked bad 
incore

DESCRIPTION:
In CFS, each node in the cluster with the file system mounted has its own
intent log in the file system. When an additional node mounts the file system
as a CFS Secondary, the CFS creates an intent log. Note that intent logs are
never removed; they are reused.

When you clear an intent log, Veritas File System (VxFS) passes an incorrect
block number to the log clearing routine, which zeros out an incorrect location.
The incorrect location might point to the file data or file system metadata. Or,
the incorrect location might be part of the file system's available free space.
This is silent corruption. If the file system metadata corrupts, VxFS can detect
the corruption when it subsequently accesses the corrupt metadata and marks the
file system for full fsck.

RESOLUTION:
The code is modified so that VxFS can pass the correct block number
to the log clearing routine.
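
For illustration, a standalone C program reproducing the incorrect typecast
with the exact numbers from the example above; an intent log starting beyond
4,294,967,296 blocks loses its high bits through a 32-bit cast:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t log_start = 4294967296ULL + 1537474560ULL;
        uint32_t passed    = (uint32_t)log_start;   /* the bug */

        /* Prints: intended 5832441856, actually zeroed at 1537474560 */
        printf("intended %llu, actually zeroed at %u\n",
            (unsigned long long)log_start, passed);
        return 0;
    }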

* 3338024 (Tracking ID: 3297840)

SYMPTOM:
A metadata corruption is found during the file removal process with the inode block count getting negative.

DESCRIPTION:
When the user removes or truncates a file having the shared indirect blocks, there can be an instance where the block count can be updated to reflect the removal of the shared indirect blocks when the blocks are not removed from the file. The next iteration of the loop updates the block count again while removing these blocks. This will eventually lead to the block count being a negative value after all the blocks are removed from the file. The removal code expects the block count to be zero before updating the rest of the metadata.

RESOLUTION:
The code is modified to update the block count and other tracking metadata in the same transaction as the blocks are removed from the file.

* 3338026 (Tracking ID: 3331419)

SYMPTOM:
Machine panics with the following stack trace.

 #0 [ffff883ff8fdc110] machine_kexec at ffffffff81035c0b
 #1 [ffff883ff8fdc170] crash_kexec at ffffffff810c0dd2
 #2 [ffff883ff8fdc240] oops_end at ffffffff81511680
 #3 [ffff883ff8fdc270] no_context at ffffffff81046bfb
 #4 [ffff883ff8fdc2c0] __bad_area_nosemaphore at ffffffff81046e85
 #5 [ffff883ff8fdc310] bad_area at ffffffff81046fae
 #6 [ffff883ff8fdc340] __do_page_fault at ffffffff81047760
 #7 [ffff883ff8fdc460] do_page_fault at ffffffff815135ce
 #8 [ffff883ff8fdc490] page_fault at ffffffff81510985
    [exception RIP: print_context_stack+173]
    RIP: ffffffff8100f4dd  RSP: ffff883ff8fdc548  RFLAGS: 00010006
    RAX: 00000010ffffffff  RBX: ffff883ff8fdc6d0  RCX: 0000000000002755
    RDX: 0000000000000000  RSI: 0000000000000046  RDI: 0000000000000046
    RBP: ffff883ff8fdc5a8   R8: 000000000002072c   R9: 00000000fffffffb
    R10: 0000000000000001  R11: 000000000000000c  R12: ffff883ff8fdc648
    R13: ffff883ff8fdc000  R14: ffffffff81600460  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff883ff8fdc540] print_context_stack at ffffffff8100f4d1
#10 [ffff883ff8fdc5b0] dump_trace at ffffffff8100e4a0
#11 [ffff883ff8fdc650] show_trace_log_lvl at ffffffff8100f245
#12 [ffff883ff8fdc680] show_trace at ffffffff8100f275
#13 [ffff883ff8fdc690] dump_stack at ffffffff8150d3ca
#14 [ffff883ff8fdc6d0] warn_slowpath_common at ffffffff8106e2e7
#15 [ffff883ff8fdc710] warn_slowpath_null at ffffffff8106e33a
#16 [ffff883ff8fdc720] hrtick_start_fair at ffffffff810575eb
#17 [ffff883ff8fdc750] pick_next_task_fair at ffffffff81064a00
#18 [ffff883ff8fdc7a0] schedule at ffffffff8150d908
#19 [ffff883ff8fdc860] __cond_resched at ffffffff81064d6a
#20 [ffff883ff8fdc880] _cond_resched at ffffffff8150e550
#21 [ffff883ff8fdc890] vx_nalloc_getpage_lnx at ffffffffa041afd5 [vxfs]
#22 [ffff883ff8fdca80] vx_nalloc_getpage at ffffffffa03467a3 [vxfs]
#23 [ffff883ff8fdcbf0] vx_do_getpage at ffffffffa034816b [vxfs]
#24 [ffff883ff8fdcdd0] vx_do_read_ahead at ffffffffa03f705e [vxfs]
#25 [ffff883ff8fdceb0] vx_read_ahead at ffffffffa038ed8a [vxfs]
#26 [ffff883ff8fdcfc0] vx_do_getpage at ffffffffa0347732 [vxfs]
#27 [ffff883ff8fdd1a0] vx_getpage1 at ffffffffa034865d [vxfs]
#28 [ffff883ff8fdd2f0] vx_fault at ffffffffa03d4788 [vxfs]
#29 [ffff883ff8fdd400] __do_fault at ffffffff81143194
#30 [ffff883ff8fdd490] handle_pte_fault at ffffffff81143767
#31 [ffff883ff8fdd570] handle_mm_fault at ffffffff811443fa
#32 [ffff883ff8fdd5e0] __get_user_pages at ffffffff811445fa
#33 [ffff883ff8fdd670] get_user_pages at ffffffff81144999
#34 [ffff883ff8fdd690] vx_dio_physio at ffffffffa041d812 [vxfs]
#35 [ffff883ff8fdd800] vx_dio_rdwri at ffffffffa02ed08e [vxfs]
#36 [ffff883ff8fdda20] vx_write_direct at ffffffffa044f490 [vxfs]
#37 [ffff883ff8fddaf0] vx_write1 at ffffffffa04524bf [vxfs]
#38 [ffff883ff8fddc30] vx_write_common_slow at ffffffffa0453e4b [vxfs]
#39 [ffff883ff8fddd30] vx_write_common at ffffffffa0454ea8 [vxfs]
#40 [ffff883ff8fdde00] vx_write at ffffffffa03dc3ac [vxfs]
#41 [ffff883ff8fddef0] vfs_write at ffffffff81181078
#42 [ffff883ff8fddf30] sys_pwrite64 at ffffffff81181a32
#43 [ffff883ff8fddf80] system_call_fastpath at ffffffff8100b072

DESCRIPTION:
The panic is due to the kernel referring to a corrupted thread_info structure
from the scheduler; the thread_info structure got corrupted by a stack
overflow. While doing a direct I/O write, user-space pages need to be
pre-faulted using the __get_user_pages() code path. This code path is very
deep and can end up consuming a lot of stack space.

RESOLUTION:
The kernel stack consumption in this code path is reduced by ~400-500 bytes by
making various changes in the way pre-faulting is done.

* 3338030 (Tracking ID: 3335272)

SYMPTOM:
The mkfs (make file system) command dumps core when the log size 
provided is not aligned. The following stack trace is displayed:

(gdb) bt
#0  find_space ()
#1  place_extents ()
#2  fill_fset ()
#3  main ()
(gdb)

DESCRIPTION:
While creating the VxFS file system using the mkfs command, if the 
log size provided is not aligned properly, the placement calculation for the 
RCQ extents can go wrong and find no place for them. This leads to an illegal 
memory access of the AU bitmap and results in the core dump.

RESOLUTION:
The code is modified to place the RCQ extents in the same AU where 
log extents are allocated.

* 3338063 (Tracking ID: 3332902)

SYMPTOM:
The system running the fsclustadm(1M) command panics while shutting
down. The following stack trace is logged along with the panic:
machine_kexec
crash_kexec
oops_end
page_fault [exception RIP: vx_glm_unlock]
vx_cfs_frlpause_leave [vxfs]
vx_cfsaioctl [vxfs]
vxportalkioctl [vxportal]
vfs_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath

DESCRIPTION:
There exists a race-condition between "fsclustadm(1M) cfsdeinit"
and "fsclustadm(1M) frlpause_disable". The "fsclustadm(1M) cfsdeinit" command
fails after cleaning the Group Lock Manager (GLM), without downgrading the CFS
state. Under the false CFS state, the "fsclustadm(1M) frlpause_disable" command
enters and accesses the GLM lock, which "fsclustadm(1M) cfsdeinit" frees,
resulting in a panic.

There exists another race between the code in vx_cfs_deinit() and the code in
fsck: although fsck has a reservation held, this cannot prevent
vx_cfs_deinit() from freeing vx_cvmres_list because there is no check on
vx_cfs_keepcount.

RESOLUTION:
The code is modified to add appropriate checks in the
"fsclustadm(1M) cfsdeinit" and "fsclustadm(1M) frlpause_disable" to avoid the
race-condition.

* 3338762 (Tracking ID: 3096834)

SYMPTOM:
Intermittent vx_disable messages are displayed in system log.

DESCRIPTION:
VxFS displays intermittent vx_disable messages. The file system is
not corrupt and the fsck(1M) command does not indicate any problem with the file
system. However, the file system gets disabled.

RESOLUTION:
The code is modified to make the vx_disable message verbose with
stack trace information to facilitate further debugging.

* 3338776 (Tracking ID: 3224101)

SYMPTOM:
On a file system that is mounted by a cluster, the system panics after you
enable the lazy optimization for updating the i_size across the cluster nodes.
The stack trace may look as follows:
vxg_free()
vxg_cache_free4()
vxg_cache_free()
vxg_free_rreq()
vxg_range_unlock_body()
vxg_api_range_unlock()
vx_get_inodedata()
vx_getattr()
vx_linux_getattr()
vxg_range_unlock_body()
vxg_api_range_unlock()
vx_get_inodedata()
vx_getattr()
vx_linux_getattr()

DESCRIPTION:
On a file system that is mounted by a cluster with the -o cluster option, read
operations or write operations take a range lock to synchronize updates across
the different nodes. The lazy optimization incorrectly enables a node to
release a range lock which it has not acquired, panicking the node.

RESOLUTION:
The code has been modified to release only those range locks which are acquired.

* 3338779 (Tracking ID: 3252983)

SYMPTOM:
On a high-end system greater than or equal to 48 CPUs, some file-system
operations may hang with the following stack trace:
vx_ilock()
vx_tflush_inode()
vx_fsq_flush()
vx_tranflush()
vx_traninit()
vx_tran_iupdat()
vx_idelxwri_done()
vx_idelxwri_flush()
vx_delxwri_flush()
vx_workitem_process()
vx_worklist_process()
vx_worklist_thread()

DESCRIPTION:
The function to get an inode returns an incorrect error value if there are no
free inodes available in-core; this error value causes an inode to be
allocated on disk instead of in-core. As a result, the same function is called
again, resulting in a continuous loop.

RESOLUTION:
The code is modified to return the correct error code.

* 3338780 (Tracking ID: 3253210)

SYMPTOM:
When the file system reaches the space limitation, it hangs with the following stack trace:
vx_svar_sleep_unlock()
default_wake_function()
wake_up()
vx_event_wait()
vx_extentalloc_handoff()
vx_te_bmap_alloc()
vx_bmap_alloc_typed()
vx_bmap_alloc()
vx_bmap()
vx_exh_allocblk()
vx_exh_splitbucket()
vx_exh_split()
vx_dopreamble()
vx_rename_tran()
vx_pd_rename()

DESCRIPTION:
When the large directory hash is enabled through the vx_dexh_sz(5M) tunable, Veritas File System (VxFS) uses the large directory hash for directories.
When you rename a file, a new directory entry is inserted into the hash table, which results in a hash split. The hash split fails the current transaction and retries after some housekeeping jobs complete. These jobs include allocating more space for the hash table. However, VxFS does not check the return value of the preamble job. Thus, when VxFS runs out of space, the rename transaction is re-entered permanently without knowing whether more space was allocated by the preamble jobs.

RESOLUTION:
The code is modified to enable VxFS to exit looping when ENOSPC is returned from the preamble job.
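
A minimal sketch of the loop-exit fix; the stubs stand in for the real rename
transaction and preamble work, and EAGAIN as the retry signal is an assumption:

    #include <errno.h>
    #include <stdio.h>

    static int tries;
    static int rename_tran(void)       { return tries < 5 ? EAGAIN : 0; }
    static int run_preamble_jobs(void) { ++tries; return tries > 2 ? ENOSPC : 0; }

    static int do_rename(void)
    {
        int err;

        while ((err = rename_tran()) == EAGAIN) {
            int perr = run_preamble_jobs();
            if (perr == ENOSPC)
                return perr;        /* new: stop retrying forever */
        }
        return err;
    }

    int main(void)
    {
        printf("do_rename -> %d (ENOSPC is %d)\n", do_rename(), ENOSPC);
        return 0;
    }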

* 3338787 (Tracking ID: 3261462)

SYMPTOM:
File system with size greater than 16TB corrupts with vx_mapbad messages in the system log.

DESCRIPTION:
The corruption results from the combination of the following two conditions:
a.	Two or more threads race against each other to allocate around the same offset range. As a result, VxFS returns the buffer locked only in shared mode for all the threads which fail in allocating the extent.
b.	Since the allocated extent is from a region beyond 16TB, threads need to convert the buffer to a different type so as to accommodate the new extent's start value.
 
The buffer overrun happens because VxFS erroneously tries to unconditionally convert the buffer to the new type even though the buffer might not be able to accommodate the converted data.

RESOLUTION:
When the race condition is detected, VxFS returns proper retry errors to the caller, so that the whole operation is retried from the beginning. Also, the code is modified to ensure that VxFS does not try to convert the buffer to the new type when it cannot accommodate the new data. In case this check fails, VxFS performs the proper split logic, so that a buffer overrun does not happen when the operation is retried.

* 3338790 (Tracking ID: 3233284)

SYMPTOM:
FSCK binary hangs while checking Reference Count Table (RCT) with the following stack trace:
bmap_search_typed_raw()
bmap_check_typed_raw()
rct_check()
process_device()
main()

DESCRIPTION:
The FSCK binary hangs due to the looping in the bmap_search_typed_raw() function. This function searches for extent entry in the indirect buffer for a given offset. In this case, the given offset is less than the start offset of the first extent entry. This unhandled corner case causes the infinite loop.

RESOLUTION:
The code is modified to handle the following cases:
1. Searching in empty indirect block.
2. Searching for an offset, which is less than the start offset of the first entry in the indirect block.

* 3339230 (Tracking ID: 3308673)

SYMPTOM:
With the delayed allocation feature enabled for a locally mounted file
system having highly fragmented free space, the file system is
disabled with the following message seen in the system log:
WARNING: msgcnt 1 mesg 031: V-2-31: vx_disable - /dev/vx/dsk/testdg/testvol file
system disabled

DESCRIPTION:
A VxFS transaction performs multiple extent allocations to fulfill one
allocation request on a file system that has high free space fragmentation.
Thus, the allocation transaction becomes large and fails to commit. After
retrying the transaction a defined number of times, the file system is
disabled with the above mentioned error.

RESOLUTION:
The code is modified to commit the part of the transaction which is
committable and to retry the remaining part.

* 3339884 (Tracking ID: 1949445)

SYMPTOM:
System is unresponsive when files are created on large directory. The following stack is logged:

vxg_grant_sleep()                                             
vxg_cmn_lock()
vxg_api_lock()                                             
vx_glm_lock()
vx_get_ownership()                                                  
vx_exh_coverblk()  
vx_exh_split()                                                 
vx_dexh_setup() 
vx_dexh_create()                                              
vx_dexh_init() 
vx_do_create()

DESCRIPTION:
For large directories, the large directory hash (LDH) is enabled to improve the lookup. When a system takes ownership of the LDH inode twice in the same thread context (while building the hash for a directory), it becomes unresponsive.

RESOLUTION:
The code is modified to avoid taking ownership again if we already have the ownership of the LDH inode.

* 3340029 (Tracking ID: 3298041)

SYMPTOM:
While performing "delayed extent allocations" by writing to a file 
sequentially and extending the file's size, or performing a mixture of 
sequential write I/O and random write I/O which extend a file's size, the write 
I/O performance to the file can suddenly degrade significantly.

DESCRIPTION:
The 'dalloc' feature allows VxFS to allocate extents (file system 
blocks) to a file in a delayed fashion when extending a file size. Asynchronous 
writes that extend a file's size will create and dirty memory pages, new 
extents can therefore be allocated when the dirty pages are flushed to disk 
(via background processing) rather than allocating the extents in the same 
context as the write I/O. However, in some cases, with the delayed allocation 
on, the flushing of dirty pages may occur synchronously in the foreground in 
the same context as the write I/O, when triggered the foreground flushing can 
significantly slow the write I/O performance.

RESOLUTION:
The code is modified to avoid the foreground flushing of data in 
the same write context.

* 3351946 (Tracking ID: 3194635)

SYMPTOM:
An internal stress test on a locally mounted file system exited with an error message.

DESCRIPTION:
For a file having Zero Fill-On-Demand (ZFOD) extents, a write operation in the ZFOD extent area may lead to coalescing of an extent of type SHARED or COMPRESSED, or both, with a new extent of type DATA. The new DATA extent may be coalesced with the adjacent extent, if possible. If this happens without unsharing the shared extent or uncompressing the compressed extent, data or metadata corruption may occur.

RESOLUTION:
The code is modified such that adjacent shared, compressed or pseudo-compressed extent is not coalesced.

* 3351947 (Tracking ID: 3164418)

SYMPTOM:
An internal stress test on a locally mounted VxFS file system results in data corruption in a no-space-on-device scenario while doing a split on a Zero Fill-On-Demand (ZFOD) extent.

DESCRIPTION:
When the split operation on a Zero Fill-On-Demand (ZFOD) extent fails because of an ENOSPC (no space on device) error, it erroneously processes the original ZFOD extent and returns no error. This may result in data corruption.

RESOLUTION:
The code is modified to return the ZFOD extent to its original state if the ZFOD split operation fails due to the ENOSPC error.
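
A small C sketch of this restore-on-ENOSPC pattern, using a hypothetical
extent descriptor and a simulated allocator failure (not the actual VxFS
structures):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical extent descriptor. */
    struct extent { long off, len; int type; };
    #define EXT_ZFOD 1

    static int split_alloc(struct extent *e, long at)
    {
        (void)e; (void)at;
        return ENOSPC;                 /* simulate "no space on device" */
    }

    /* Split a ZFOD extent; on ENOSPC, restore the original descriptor
     * instead of continuing with a half-modified one. */
    static int split_zfod(struct extent *e, long at)
    {
        struct extent saved = *e;      /* snapshot before modifying */
        int err = split_alloc(e, at);

        if (err) {
            *e = saved;                /* return extent to its original state */
            return err;                /* and report the real error */
        }
        return 0;
    }

    int main(void)
    {
        struct extent e = { 0, 1024, EXT_ZFOD };
        int err = split_zfod(&e, 512);

        printf("split: %s, extent intact: %s\n",
               strerror(err), e.len == 1024 ? "yes" : "no");
        return 0;
    }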

* 3359278 (Tracking ID: 3364290)

SYMPTOM:
The kernel may panic in Veritas File System (VxFS) when it is internally working
on reference count queue(RCQ) record.

DESCRIPTION:
The work item that VxFS spawns in the kernel to process RCQ records when the
RCQ is full is passed a file system pointer as its argument. Because no active
level is held, this pointer is not guaranteed to remain valid by the time the
work item starts processing, which may result in the panic.

RESOLUTION:
The code is modified to pass the externally visible file system structure
instead. This structure is guaranteed to be valid because the creator of the
work item takes a reference on it, which is released only after the work item
exits.

* 3364285 (Tracking ID: 3364282)

SYMPTOM:
The fsck(1M) command fails to correct the inode list file.

DESCRIPTION:
The fsck(1M) command fails to write the metadata for the inode list file after an extent for the inode list file is successfully written to disk.

RESOLUTION:
The fsck(1M) command is modified to write the metadata for the inode list file after a successful write of an extent for the inode list file.

* 3364289 (Tracking ID: 3364287)

SYMPTOM:
Debug assert may be hit in the vx_real_unshare() function in the
cluster environment.

DESCRIPTION:
The vx_extend_unshare() function wrongly looks at the offset immediately after
the current unshare length boundary. Instead, it should look at the offset that
falls on the last byte of current unshare length. This may result in hitting
debug asserts in the vx_real_unshare() function.

RESOLUTION:
The code is modified for the shared compressed extent. When the
vx_extend_unshare() function tries to extend the unshared region, it doesn't
look at the first byte immediately after the unshared region. Instead, it
looks up the last byte unshared.

* 3364302 (Tracking ID: 3364301)

SYMPTOM:
An assert failure occurs because of improper handling of the inode lock while truncating a reorg inode.

DESCRIPTION:
While truncating the reorg extent, there may be a case where unlock is called
on the inode even though the lock on the inode was not taken. While truncating
a reorg inode, the locks held are released, and before acquiring them again,
the code checks whether the inode is a cluster inode. If it is, it tries to
take the delegation hold lock. If an error occurs while taking the delegation
hold lock, the error code path is entered. There the code checks whether there
was a transaction with a committable error; it commits the transaction and, on
success, calls unlock to release locks that were not held.

RESOLUTION:
The code is modified to check whether the lock is taken before unlocking.
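
The pattern is simply to record whether the lock was actually taken and
consult that record on the error path. A minimal C sketch with pthreads (the
real code uses VxFS's own inode locks):

    #include <pthread.h>
    #include <stdio.h>

    /* Track whether the lock was actually taken so the error path never
     * unlocks a mutex it does not hold (undefined behaviour in pthreads). */
    int main(void)
    {
        pthread_mutex_t ilock = PTHREAD_MUTEX_INITIALIZER;
        int locked = 0;
        int need_delegation = 1;      /* pretend: cluster inode path */

        if (!need_delegation) {
            pthread_mutex_lock(&ilock);
            locked = 1;
        }
        /* ... error path reached with or without the lock held ... */
        if (locked)                   /* the fix: check before unlocking */
            pthread_mutex_unlock(&ilock);
        puts("no unlock of an unheld lock");
        return 0;
    }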

* 3364307 (Tracking ID: 3364306)

SYMPTOM:
Stack overflow seen in extent allocation code path.

DESCRIPTION:
Stack overflow appears in the vx_extprevfind() code path.

RESOLUTION:
The code is modified to hand off the extent allocation to a worker thread when stack consumption reaches 4K.
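
A rough user-space C illustration of the hand-off threshold: measure roughly
how much stack the recursion has consumed and bail out past a limit. The
address-difference measurement trick and the 4K limit here are illustrative
only, not how the kernel code does it:

    #include <stdio.h>

    #define STACK_LIMIT (4 * 1024)    /* hand off past ~4K of stack use */

    static char *stack_base;

    /* Deeply recursive search; once the frames consume more than the
     * limit, stop and pretend to queue the work to a worker thread. */
    static int search(int depth)
    {
        char here;
        long used = (long)(stack_base - &here);  /* stack grows downward */

        if (used < 0)
            used = -used;             /* tolerate either growth direction */
        if (used > STACK_LIMIT) {
            printf("depth %d: handing off to worker thread\n", depth);
            return -1;
        }
        return depth >= 100000 ? 0 : search(depth + 1);
    }

    int main(void)
    {
        char base;

        stack_base = &base;
        search(0);
        return 0;
    }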

* 3364317 (Tracking ID: 3364312)

SYMPTOM:
The fsadm(1M) command is unresponsive while processing the VX_FSADM_REORGLK_MSG message. The following stack trace may be seen while processing VX_FSADM_REORGLK_MSG:

vx_tranundo()
vx_do_rct_gc()
vx_rct_setup_gc()
vx_reorg_complete_gc()
vx_reorg_complete()
vx_reorg_clear_rct()
vx_reorg_clear()
vx_reorg_clear()
vx_recv_fsadm_reorglk()
vx_recv_fsadm()
vx_msg_recvreq()
vx_msg_process_thread()
vx_thread_base()

DESCRIPTION:
In the vx_do_rct_gc() function, the in-directory cleanup flag is set for a shared indirect extent (SHR_IADDR_EXT). If the truncation fails, the vx_do_rct_gc() function does not clear the in-directory cleanup flag. As a result, the caller ends up calling the vx_do_rct_gc() function repeatedly, leading to a never-ending loop.

RESOLUTION:
The code is modified to reset the in-directory cleanup flag in case of a truncation error inside the vx_do_rct_gc() function.
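
A compact C sketch of why clearing the flag on the error path matters: the
caller's loop below terminates only because rct_gc() resets the flag when the
(simulated) truncation fails. All names and the flag value are hypothetical:

    #include <stdio.h>

    #define GC_INDIR 0x1                   /* hypothetical cleanup flag */

    static int truncate_shared(void) { return -1; }  /* simulated failure */

    /* One garbage-collection pass; the fix clears the flag on the error
     * path so that the caller's retry loop can terminate. */
    static int rct_gc(int *flags)
    {
        *flags |= GC_INDIR;
        if (truncate_shared() != 0) {
            *flags &= ~GC_INDIR;           /* without this, the loop never ends */
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        int flags = 0, tries = 0;

        do {
            tries++;
        } while (rct_gc(&flags) != 0 && (flags & GC_INDIR));
        printf("gc loop stopped after %d pass(es)\n", tries);
        return 0;
    }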

* 3364333 (Tracking ID: 3312897)

SYMPTOM:
In Cluster File System (CFS), the system can hang while trying to perform any administrative operation when the primary node is disabled.

DESCRIPTION:
In CFS, when node 1 tries to perform an administrative operation that freezes and thaws the file system (for example, turning the fcl on or off), a deadlock can occur between the thaw thread and the recovery thread (which started because the CFS primary was disabled). The thread on node 1 trying to thaw is blocked while waiting for node 2 to reply to the loadfs message. The thread processing the loadfs message is waiting for the recovery operation to complete. The recovery thread on node 2 is waiting for a lock on an extent map (emap) buffer. This lock is held on node 1 as part of a transaction that was committed during the freeze, which results in a deadlock.

RESOLUTION:
The code is modified to flush any transactions that were committed during a freeze before the thawing process starts.

* 3364335 (Tracking ID: 3331109)

SYMPTOM:
The full fsck does not repair corrupted reference count queue (RCQ) record.

DESCRIPTION:
When the RCQ record is corrupted due to an I/O error or log error, there is no code in full fsck which handles this corruption.
As a result, some further operations related to RCQ might fail.

RESOLUTION:
The code is modified to repair the corrupt RCQ entry during a full fsck.

* 3364338 (Tracking ID: 3331045)

SYMPTOM:
A kernel oops occurs in the map unlock code when a freed mlink is referenced, due to a race with the iodone routine for delayed writes.

DESCRIPTION:
After asynchronous I/O is issued on a map buffer, there is a possible race between the vx_unlockmap() function and the vx_mapiodone() function. Due to this race, the vx_unlockmap() function refers to an mlink after it is freed.

RESOLUTION:
The code is modified to handle this race condition.

* 3364349 (Tracking ID: 3359200)

SYMPTOM:
An internal test of the Veritas File System (VxFS) fsdedup(1M) feature in a
cluster file system environment results in a hang.

DESCRIPTION:
The thread that processes the fsdedup(1M) request takes the delegation lock on the extent map and then waits to acquire a lock on the cluster-wide reference count queue (RCQ) buffer. Meanwhile, another internal VxFS thread working on the RCQ takes the lock on the cluster-wide RCQ buffer and waits to acquire the delegation lock on the extent map, causing a deadlock.

RESOLUTION:
The code is modified to correct the lock hierarchy so that the delegation lock on the extent map is taken before the lock on the cluster-wide RCQ buffer.
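
A minimal C sketch of the corrected lock hierarchy: both paths take the two
locks in the same order, so the classic ABBA deadlock cannot occur. The
pthread mutexes merely stand in for the cluster-wide locks:

    #include <pthread.h>
    #include <stdio.h>

    /* Every path must take these in the same order: the delegation lock
     * on the extent map first, then the RCQ buffer lock. */
    static pthread_mutex_t emap_deleg = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t rcq_buf    = PTHREAD_MUTEX_INITIALIZER;

    static void dedup_path(void)
    {
        pthread_mutex_lock(&emap_deleg);   /* 1st: extent map delegation */
        pthread_mutex_lock(&rcq_buf);      /* 2nd: RCQ buffer            */
        puts("dedup: both locks held");
        pthread_mutex_unlock(&rcq_buf);
        pthread_mutex_unlock(&emap_deleg);
    }

    static void rcq_worker_path(void)
    {
        pthread_mutex_lock(&emap_deleg);   /* same order: no ABBA deadlock */
        pthread_mutex_lock(&rcq_buf);
        puts("rcq worker: both locks held");
        pthread_mutex_unlock(&rcq_buf);
        pthread_mutex_unlock(&emap_deleg);
    }

    int main(void)
    {
        dedup_path();
        rcq_worker_path();
        return 0;
    }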

* 3370650 (Tracking ID: 2735912)

SYMPTOM:
The performance of tier relocation for moving a large number of files is poor 
when the `fsppadm enforce' command is used.  When looking at the fsppadm(1M) 
command in the kernel, the following stack trace is observed:

vx_cfs_inofindau 
vx_findino
vx_ialloc
vx_reorg_ialloc
vx_reorg_isetup
vx_extmap_reorg
vx_reorg
vx_allocpolicy_enforce
vx_aioctl_allocpolicy
vx_aioctl_common
vx_ioctl
vx_compat_ioctl

DESCRIPTION:
When each file located in Tier 1 is relocated to Tier 2, Veritas File System
(VxFS) allocates a new reorg inode and all its extents in Tier 2. VxFS then
swaps the contents of the two files and deletes the original file. This new
inode allocation involves a lot of processing and can result in poor
performance when a large number of files are moved.

RESOLUTION:
The code is modified to maintain a pool (cache) of reorg inodes instead of
allocating a new one each time.

* 3372909 (Tracking ID: 3274592)

SYMPTOM:
An internal noise test on Cluster File System (CFS) is unresponsive while executing the fsadm(1M) command.

DESCRIPTION:
In CFS, the fsadm(1M) command hangs in the kernel while processing the fsadm-reorganization message on a secondary node. The hang results from a race with the thread that processes the fsadm-query message for mounting the primary fileset on the secondary node, where the fsadm-query thread wins the race.

RESOLUTION:
The code is modified to synchronize the processing of fsadm-query message and fsadm-reorganization message on the primary node. This synchronization ensures that they are processed in the order in which they were received.

* 3380905 (Tracking ID: 3291635)

SYMPTOM:
Internal testing found the "vx_freeze_block_threads_all:7c" debug assert on locally mounted file systems while processing preambles for transactions.

DESCRIPTION:
While processing preambles for transactions, if the reference count queue (RCQ) is full, VxFS may hamper the processing of the RCQ to free some records. This may result in hitting the debug assert.

RESOLUTION:
The code is modified to ignore the Reference count queue (RCQ) full errors when VxFS processes preambles for transactions.

* 3396539 (Tracking ID: 3331093)

SYMPTOM:
The MountAgent gets stuck during repeated switchovers because of the current
VxFS-AMF notification/unregistration design, with the following stack trace:

sleep_spinunlock+0x61 ()
vx_delay2+0x1f0 ()
vx_unreg_callback_funcs_impl+0xd0 ()
disable_vxfs_api+0x190 ()
text+0x280 ()
amf_event_release+0x230 ()
amf_fs_event_lookup_notify_multi+0x2f0 ()
amf_vxfs_mount_opt_change_callback+0x190 ()
vx_aioctl_unsetmntlock+0x390 ()
cold_vx_aioctl_common+0x7c0 ()
vx_aioctl+0x300 ()
vx_admin_ioctl+0x610 ()
vxportal_ioctl+0x690 ()
spec_ioctl+0xf0 () 
vno_ioctl+0x350 ()
ioctl+0x410 ()
syscall+0x5b0 ()

DESCRIPTION:
This issue is related to VxFS-AMF interface. VxFS provides
notifications to AMF for certain events like FS being disabled or mount options
change. While VxFS has called into AMF, AMF event handling mechanism can trigger
an unregistration of VxFS in the same context since VxFS's notification
triggered the last event notification registered with AMF.

Before VxFS calls into AMF, a variable vx_fsamf_busy is set to 1 and it is reset
when the callback returns. The unregistration loops if it finds that
vx_fsamf_busy is set to 1. Since unregistration was called from the same context
of the notification call back, the vx_fsamf_busy was never set to 0 and the loop
goes on endlessly causing the command that triggered the notification to hang.

RESOLUTION:
A delayed unregistration mechanism is employed. The fix addresses the issue
of receiving an unregistration request from AMF in the context of a callback
from VxFS to AMF. In such a scenario, the unregistration is marked for a
later time. When all the notifications return, if a delayed unregistration is
marked, the unregistration routine is explicitly called.
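
A simplified, single-threaded C sketch of the deferred-unregistration
pattern: a busy flag marks a callback in flight, and an unregistration
requested during the callback is honoured only after the callback returns.
The state variables and names are hypothetical:

    #include <stdio.h>

    static int busy;                  /* a notification is in flight        */
    static int unreg_pending;         /* unregistration deferred until idle */

    static void do_unregister(void) { puts("unregistered"); }

    /* Called (possibly from the callback's own context) to unregister.
     * Instead of spinning on 'busy', mark it for later. */
    static void request_unregister(void)
    {
        if (busy) {
            unreg_pending = 1;        /* defer: we are inside a callback */
            return;
        }
        do_unregister();
    }

    /* Notification wrapper: runs the callback, then honours a deferred
     * unregistration once the callback has returned. */
    static void notify(void (*callback)(void))
    {
        busy = 1;
        callback();                   /* may call request_unregister()     */
        busy = 0;
        if (unreg_pending) {
            unreg_pending = 0;
            do_unregister();          /* safe now: not in callback context */
        }
    }

    int main(void)
    {
        notify(request_unregister);   /* no endless loop; unregisters after */
        return 0;
    }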

* 3402484 (Tracking ID: 3394803)

SYMPTOM:
The vxupgrade(1M) command causes VxFS to panic with the following stack trace:
panic_save_regs_switchstack()
panic
bad_kern_reference()
$cold_pfault()
vm_hndlr()
bubbleup()
vx_fs_upgrade()
vx_upgrade()
$cold_vx_aioctl_common()
vx_aioctl()
vx_ioctl()
vno_ioctl()
ioctl()
syscall()

DESCRIPTION:
The panic is caused by dereferencing a NULL device pointer (one of the
devices in the DEVLIST is NULL).

RESOLUTION:
The code is modified to skip NULL devices when the devices in the DEVLIST are
processed.
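
The fix boils down to a NULL check while walking the device list, as in this
hypothetical C sketch:

    #include <stdio.h>

    struct vxdev { const char *name; };

    int main(void)
    {
        struct vxdev d0 = { "disk0" }, d2 = { "disk2" };
        struct vxdev *devlist[] = { &d0, NULL, &d2 };   /* NULL mid-list */
        size_t i;

        for (i = 0; i < sizeof devlist / sizeof devlist[0]; i++) {
            if (devlist[i] == NULL)
                continue;             /* the fix: skip NULL entries */
            printf("processing %s\n", devlist[i]->name);
        }
        return 0;
    }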

* 3405172 (Tracking ID: 3436699)

SYMPTOM:
An assert failure occurs because of a race condition between the clone mount thread and a directory removal thread while pushing data onto the clone.

DESCRIPTION:
There is a race condition between the clone mount thread and a directory removal thread (while pushing modified directory data onto the clone). On AIX, vnodes are added to the VFS vnode list (a linked list of vnodes). The first entry in this vnode list must be the root vnode, which is added during the mount process. While a clone is being mounted, the mount thread can be scheduled out before the root vnode is added to this list. During this time, the second thread takes the VFS lock on the same list and enters the directory's vnode into it. Because no root vnode was present at the start, the directory vnode is assumed to be the root vnode, and the assert fails when this is cross-checked against the VROOT flag.

RESOLUTION:
The code is modified to handle the race condition by attaching the root vnode to the VFS vnode list before setting the VFS pointer in the fileset.

* 3426534 (Tracking ID: 3426511)

SYMPTOM:
Unloading VxFS modules may fail on Solaris 11 even after successful uninstallation of the VxFS package.

DESCRIPTION:
The failure is caused by automatic loading of VxFS modules after the modunload operation during the uninstallation of the VxFS package.

RESOLUTION:
The code is modified so that the VxFS modules are not automatically loaded again after the modunload operation during the uninstallation of the VxFS package.

* 3430687 (Tracking ID: 3444775)

SYMPTOM:
Internal noise testing on Cluster File System (CFS) results in a kernel panic in the vx_fsadm_query() function, with the following error message: "Unable to handle kernel paging request".

DESCRIPTION:
The issue occurs due to simultaneous, unsynchronized access or modification of the inode list extent array by two threads. As a result, memory freed by one thread is accessed by the other thread, resulting in the panic.

RESOLUTION:
The code is modified to add relevant locks to synchronize access or modification of inode list extent array.

Patch ID: VRTSvxfs-6.0.300.000

* 2928921 (Tracking ID: 2843635)

SYMPTOM:
During VxFS internal testing, there are some failures during the reorg
operation of structural files.

DESCRIPTION:
While the reorg is in progress, the error value to be returned from a certain
ioctl is overwritten, which results in an incorrect error value and test
failures.

RESOLUTION:
The code is modified so that the error value is not overwritten.

* 2933290 (Tracking ID: 2756779)

SYMPTOM:
Write and read performance concerns on Cluster File System (CFS) when running 
applications that rely on POSIX file-record locking (fcntl).

DESCRIPTION:
The usage of fcntl on CFS leads to high messaging traffic across nodes thereby 
reducing the performance of readers and writers.

RESOLUTION:
The code is modified to cache the ranges that are being file-record locked on 
the node. This is tried whenever possible to avoid broadcasting of messages 
across the nodes in the cluster.

* 2933291 (Tracking ID: 2806466)

SYMPTOM:
A reclaim operation on a file system that is mounted on an LVM volume using
the fsadm(1M) command with the -R option may panic the system, and the
following stack trace is displayed:
vx_dev_strategy+0xc0() 
vx_dummy_fsvm_strategy+0x30() 
vx_ts_reclaim+0x2c0() 
vx_aioctl_common+0xfd0() 
vx_aioctl+0x2d0() 
vx_ioctl+0x180()

DESCRIPTION:
Thin reclamation is supported only for file systems mounted on a VxVM volume.

RESOLUTION:
The code is modified to return errors without panicking the system if the
underlying volume is LVM.

* 2933292 (Tracking ID: 2895743)

SYMPTOM:
It takes longer than usual for many Windows 7 clients to log off in parallel
if the user profiles are stored on a Cluster File System (CFS).

DESCRIPTION:
Veritas File System (VxFS) keeps the file creation time and full ACL
information for Samba clients in an extended attribute, which is implemented
via named streams. VxFS reads the named stream for each ACL object. Reading a
named stream is a costly operation, as it results in an open, an opendir, a
lookup, and another open to get the file descriptor. The VxFS function
vx_nattr_open() holds the exclusive rwlock to read an ACL object stored as an
extended attribute. This may cause heavy lock contention when many threads
want the same lock; they can remain blocked until one of the vx_nattr_open()
calls releases it, which takes time since vx_nattr_open() is very slow.

RESOLUTION:
The code is modified so that it takes the rwlock in shared mode instead of
exclusive mode.

* 2933294 (Tracking ID: 2750860)

SYMPTOM:
Performance of the write operation with small request size may degrade
on a large Veritas File System (VxFS) file system. Many threads may be found
sleeping with the following stack trace:

vx_sleep_lock
vx_lockmap
vx_getemap
vx_extfind
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_uplevel
vx_searchau+0x600
vx_extentalloc_device
vx_extentalloc
vx_te_bmap_alloc
vx_bmap_alloc_typed
vx_bmap_alloc
vx_write_alloc3
vx_recv_prealloc
vx_recv_rpc
vx_msg_recvreq
vx_msg_process_thread
kthread_daemon_startup

DESCRIPTION:
A VxFS allocation unit (AU) is composed of 32768 disk blocks. An AU can be
expanded when it is partially allocated, or non-expanded when it is fully
occupied or completely unused. The extent map for a large file system with a
1 KB block size is organized as a big tree; for example, a 4 TB file system
with a 1 KB file system block size can have up to 128K AUs. To find an
appropriate extent, the VxFS extent allocation algorithm first searches the
expanded AUs by traversing the free extent map tree, to avoid causing
free-space fragmentation. If that fails, it does the same with the
non-expanded AUs. When there are too many requests for small extents (less
than 32768 blocks) and all the small free extents are used up while a large
number of AU-size extents (32768 blocks) are still available, the file system
can run into this hang: because no small extents are available in the
expanded AUs, VxFS looks at larger non-expanded extents, namely AU-size
extents, which are not what it wants (an expanded AU is expected). As a
result, each request walks the big extent map tree for every AU-size extent
and ultimately fails. The requested extent is eventually obtained during the
second attempt over the non-expanded AUs, but the unnecessary work consumes a
lot of CPU resources.

RESOLUTION:
The code is modified to optimize the free-extent-search algorithm by
skipping certain AU-size extents, reducing the overall search time.
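
A toy C sketch of the skip optimization: for a small request, whole AU-size
free extents are skipped in the first pass instead of being examined one by
one. The flat free list here stands in for the real extent map tree, and all
names are hypothetical:

    #include <stdio.h>

    #define AU_BLOCKS 32768              /* blocks per allocation unit */

    /* First-pass search: for a small request, skip AU-size free
     * extents entirely rather than walking the tree for each one. */
    static long find_extent(const long *freelist, int n, long want)
    {
        int i;

        for (i = 0; i < n; i++) {
            if (want < AU_BLOCKS && freelist[i] == AU_BLOCKS)
                continue;                /* the optimization: skip */
            if (freelist[i] >= want)
                return freelist[i];
        }
        return -1;                       /* fall back to a second pass */
    }

    int main(void)
    {
        long freelist[] = { AU_BLOCKS, AU_BLOCKS, 4096 };

        printf("got extent of %ld blocks\n",
               find_extent(freelist, 3, 1024));
        return 0;
    }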

* 2933296 (Tracking ID: 2923105)

SYMPTOM:
Removing the Veritas File System (VxFS) module using rmmod(8) on a system 
having heavy buffer cache usage may hang.

DESCRIPTION:
When a large number of buffers are allocated from the buffer cache, at the time 
of removing VxFS module, the process of freeing the buffers takes a long time.

RESOLUTION:
The code is modified to use an improved algorithm which prevents it from 
traversing the free lists even if it has found the free chunk. Instead, it will 
break out from the search and free that buffer.

* 2933309 (Tracking ID: 2858683)

SYMPTOM:
The reserve-extent attributes are changed after the vxrestore(1M) operation
for files that are greater than 8192 bytes.

DESCRIPTION:
A local variable that contains the number of reserve bytes is reused during
the vxrestore(1M) operation for the subsequent VX_SETEXT ioctl call for files
that are greater than 8K. As a result, the attribute information is changed.

RESOLUTION:
The code is modified to preserve the original variable value till the end of 
the function.

* 2933313 (Tracking ID: 2841059)

SYMPTOM:
The file system gets marked for a full fsck operation and the following message 
is displayed in the system log:

V-2-96: vx_setfsflags 
<volume name> file system fullfsck flag set - vx_ierror

vx_setfsflags+0xee/0x120 
vx_ierror+0x64/0x1d0 [vxfs]
vx_iremove+0x14d/0xce0 
vx_attr_iremove+0x11f/0x3e0
vx_fset_pnlct_merge+0x482/0x930
vx_lct_merge_fs+0xd1/0x120
vx_lct_merge_fs+0x0/0x120 
vx_walk_fslist+0x11e/0x1d0
vx_lct_merge+0x24/0x30 
vx_workitem_process+0x18/0x30
vx_worklist_process+0x125/0x290
vx_worklist_thread+0x0/0xc0
vx_worklist_thread+0x6d/0xc0
vx_kthread_init+0x9b/0xb0 

 V-2-17: vx_iremove_2
<volume name>: file system inode 15 marked bad incore

DESCRIPTION:
Due to a race condition, the thread tries to remove an attribute inode that has 
already been removed by another thread. Hence, the file system is marked for a 
full fsck operation and the attribute inode is marked as 'bad ondisk'.

RESOLUTION:
The code is modified to check whether the attribute inode that a thread is
trying to remove has already been removed.

* 2933325 (Tracking ID: 2905820)

SYMPTOM:
If the file is being read via the NFSv4 client, then removing the same
file on the NFSv4 server may hang if the file system is VxFS.

The stack trace may look similar to the following:
rfs4_dbe_twait()
deleg_vnevent()
vhead_vnevent()
fop_vnevent()
vnevent_remove()
vx_pd_remove()
vx_remove1_pd()
vx_do_remove()
vx_remove1()
vx_remove_vp()
vx_remove()
fop_remove()
vn_removeat()
vn_remove()
unlink()
_syscall32_save()

DESCRIPTION:
The deleting thread holds the irwlock in the EXCL mode and waits
for the delegation from the client, while the client holds the delegation and
keeps waiting for the irwlock in the SH mode, hence the deadlock.

RESOLUTION:
The code is modified to inform the NFSv4 File Event Monitor (FEM)
about the deletion of the file before holding the irwlock in the EX mode to
avoid the deadlock.

* 2933326 (Tracking ID: 2827751)

SYMPTOM:
When Oracle Disk Manager (ODM) is used with non-VxVM devices, high
kernel memory allocation is observed with the following stack:
kmem_slab_alloc+0xac
kmem_cache_alloc+0x2dc
bp_mapin_common+0xdc
vdc_strategy+0x3c
vx_dio_physio+0x654
vx_dio_rdwri+0x4a0
fdd_write_end+0x504
fdd_rw+0x6ac
fdd_odm_rw+0x278
odm_vx_aio+0xb8
odm_vx_io+0x1c
odm_io_issue+0xf8
odm_io_start+0x1e4
odm_io_req+0xb3c
odm_request_io+0xdc

DESCRIPTION:
With non-VxVM devices, ODM should deallocate the kernel memory for self-owned
buffers by calling the bp_mapout() function.

RESOLUTION:
The code has been modified to fix high kernel memory allocation.

* 2933751 (Tracking ID: 2916691)

SYMPTOM:
The fsdedup(1M) operation enters an infinite loop with the following stack:

#5 [ffff88011a24b650] vx_dioread_compare at ffffffffa05416c4 
#6 [ffff88011a24b720] vx_read_compare at ffffffffa05437a2 
#7 [ffff88011a24b760] vx_dedup_extents at ffffffffa03e9e9b 
#11 [ffff88011a24bb90] vx_do_dedup at ffffffffa03f5a41 
#12 [ffff88011a24bc40] vx_aioctl_dedup at ffffffffa03b5163

DESCRIPTION:
vx_dedup_extents() does the following to dedup two files:

1.	Compare the data extents of the two files that need to be deduped.
2.	Split both files' bmaps to make them share the first file's common data
extent.
3.	Free the duplicate data extents of the second file.

In step 2, during the bmap split, vx_bmap_split() might need to allocate
space for the inode's bmap to add new bmap entries, which adds an emap to the
transaction. (This condition is more likely to be hit if the dedup is run on
two large files that have interleaved duplicate/distinct data extents; the
files' bmaps need to be split more in that case.)

In step 3, vx_extfree1() does not support a multi-AU extent free if there is
already an emap in the same transaction; in that case it returns
VX_ETRUNCMAX. (See incident e569695 for the history of this limitation.)

VX_ETRUNCMAX is a retriable error, so vx_dedup_extents() undoes everything in
the transaction and retries from the beginning, only to hit the same error
again. Thus, the infinite loop.

RESOLUTION:
The code is modified so that vx_te_bmap_split() always registers a
transaction preamble for the bmap split operation in dedup, and
vx_dedup_extents() performs the preamble in a separate transaction before it
retries the dedup operation.
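
The key point is that each retry must change state, or the loop never
terminates. A schematic C sketch of this retry structure, where the preamble
flag stands in for the separately committed bmap split (all names are
hypothetical):

    #include <stdio.h>

    static int preamble_done;

    /* Simulated dedup transaction: fails (as with VX_ETRUNCMAX) until
     * the bmap-split preamble has been committed separately. */
    static int dedup_txn(void)
    {
        if (!preamble_done)
            return -1;
        return 0;
    }

    static void run_preamble_txn(void)
    {
        preamble_done = 1;            /* bmap split committed on its own */
    }

    int main(void)
    {
        int tries = 0;

        while (dedup_txn() != 0) {
            run_preamble_txn();       /* the fix: state changes between retries */
            tries++;
            if (tries > 10) { puts("still looping"); return 1; }
        }
        printf("dedup succeeded after %d preamble pass(es)\n", tries);
        return 0;
    }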

* 2933822 (Tracking ID: 2624262)

SYMPTOM:
A panic occurs in the vx_bc_do_brelse() function while executing dedup
functionality, with the following backtrace:
vx_bc_do_brelse()
vx_mixread_compare()
vx_dedup_extents()
enqueue_entity()
__alloc_pages_slowpath()
__get_free_pages()
vx_getpages()
vx_do_dedup()
vx_aioctl_dedup()
vx_aioctl_common()
vx_rwunlock()
vx_aioctl()
vx_ioctl()
vfs_ioctl()
do_vfs_ioctl()
sys_ioctl()

DESCRIPTION:
While executing the vx_mixread_compare() function in the dedup code path, an
error is hit, due to which an allocated data structure remains uninitialized.
The panic occurs when this uninitialized data structure is written to in the
vx_mixread_compare() function.

RESOLUTION:
The code is modified to free the memory allocated to the data structure on
the error exit path.

* 2937367 (Tracking ID: 2923867)

SYMPTOM:
An assert is hit because VX_RCQ_PROCESS_MSG has a numerically lower priority
than VX_IUPDATE_MSG.

DESCRIPTION:
When the primary is about to send the VX_IUPDATE_MSG message to the owner of
an inode about an update to the inode's non-transactional fields, it compares
the current messaging priority (for VX_RCQ_PROCESS_MSG) with the priority of
the message being sent (VX_IUPDATE_MSG) to avoid a possible deadlock. In this
case, the VX_RCQ_PROCESS_MSG priority was numerically lower than that of
VX_IUPDATE_MSG, and the assert was hit.

RESOLUTION:
The VX_RCQ_PROCESS_MSG priority is changed to be numerically higher than that
of VX_IUPDATE_MSG, thus avoiding the assert.

* 2976664 (Tracking ID: 2906018)

SYMPTOM:
In the event of a system crash, the fsck intent log is not replayed and the
file system is marked clean. Subsequently, when the file system is mounted,
the extended operations are not completed.

DESCRIPTION:
Only file systems that contain PNOLTs and are mounted locally (mounted
without using 'mount -o cluster') are potentially exposed to this issue.

The reason why fsck silently skips the intent-log replay is that each PNOLT
has a flag to identify whether the intent log is dirty or not; in the event
of a system crash, this flag signifies whether intent-log replay is required.
If the system crashes while the file system is mounted locally, the PNOLTs
are not utilized. The fsck intent-log replay will still check the flags in
the PNOLTs; however, these are the wrong flags to check if the file system
was locally mounted. The fsck intent-log replay therefore assumes that the
intent logs are clean (because the PNOLTs are not marked dirty) and skips the
replay of the intent log altogether.

RESOLUTION:
The code is modified such that when PNOLTs exist in the file system, VxFS will 
set the dirty flag in the CFS primary PNOLT while mounting locally. With this 
change, in the event of system crash whilst a file system is locally mounted, 
the subsequent fsck intent-log replay will correctly utilize the PNOLT 
structures and successfully replay the intent log.

* 2978227 (Tracking ID: 2857751)

SYMPTOM:
Internal testing hits the "f:vx_cbdnlc_enter:1a" assert when an upgrade is in
progress.

DESCRIPTION:
The clone/fileset should be mounted when an entry is added to the dnlc. If
the clone/fileset is not mounted and an attempt is still made to add an entry
to the dnlc, the attempt is not valid.

RESOLUTION:
A fix is added to check whether the fileset is mounted before adding an entry to the dnlc.

* 2984589 (Tracking ID: 2977697)

SYMPTOM:
Deleting checkpoints of file systems that contain character special device
files (for example, /dev/null) using fsckptadm may panic the machine with the
following stack trace:
vx_idetach
vx_inode_deinit
vx_idrop
vx_inull_list
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init

DESCRIPTION:
During the checkpoint removal operation, the type of the inodes is converted
to 'pass-through inode'. During the conversion, the device reference for the
special file is accessed; this reference is invalid in the clone context,
leading to a panic.

RESOLUTION:
The code is modified to remove device reference of the special character files
during the clone removal operation thus preventing the panic.

* 2987373 (Tracking ID: 2881211)

SYMPTOM:
File ACLs are not properly preserved in checkpoints if the file has a hard
link. ACLs of files without hard links work fine.

DESCRIPTION:
This issue involves the attribute inode. When an ACL entry is added, if it is
in the immediate area, it is propagated to the clone. But if an attribute
inode is created, it is not propagated to the checkpoint. The push is missing
in the context of the attribute inode, which causes this issue.

RESOLUTION:
The code is modified to propagate the ACL entries (attribute inode case) to the clone.

* 3007184 (Tracking ID: 3018869)

SYMPTOM:
The fsadm command reports that the mount point is not a VxFS file system.

DESCRIPTION:
Solaris 11 Update 1 introduced changes in the fstatvfs() function [VFS layer]
that break VxFS's previous assumptions. The statvfs.f_basetype field gets
populated with a garbage value instead of "vxfs". During fsadm, the file
system type check therefore fails, and the error is reported.

RESOLUTION:
The code is modified to fetch the correct fstype value using OS-provided APIs
so that the statvfs.f_basetype field contains the valid "vxfs" value.
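
On Solaris, the statvfs structure carries the file system type name in its
f_basetype field, which is what such a check looks like from user space. A
small sketch (Solaris-specific; f_basetype does not exist on all platforms):

    #include <stdio.h>
    #include <string.h>
    #include <sys/statvfs.h>

    /* Solaris: struct statvfs exposes the file system type name in
     * f_basetype; the fix ensures it reliably reads "vxfs" for VxFS
     * mounts so that checks like this one work again. */
    int main(int argc, char **argv)
    {
        struct statvfs sv;
        const char *path = argc > 1 ? argv[1] : "/";

        if (statvfs(path, &sv) != 0) {
            perror("statvfs");
            return 1;
        }
        if (strcmp(sv.f_basetype, "vxfs") == 0)
            printf("%s is a vxfs file system\n", path);
        else
            printf("%s is %s, not vxfs\n", path, sv.f_basetype);
        return 0;
    }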

* 3021281 (Tracking ID: 3013950)

SYMPTOM:
During the VxFS internal testing, after installing the stack and rebooting the 
machine, the "f:vx_info_init:2" assert is observed.

DESCRIPTION:
The _ncpu value in the Solaris kernel was increased beyond 640: Solaris 10
Update 11 has a limit of 1024, and Solaris 11 Update 1 has 3072 as the _ncpu
value. In the current VxFS code, the macro VX_CMD_MAX_CPU has a value of 640.
There is a validation check [VX_CMD_MAX_CPU >= VX_MAX_CPU], which fails as of
Solaris 11 Update 1, and hence the panic occurs.

RESOLUTION:
The code is modified to increase the VX_CMD_MAX_CPU limit to 4096.

Patch ID: VRTSvxfs-6.0.100.200

* 2912412 (Tracking ID: 2857629)

SYMPTOM:
When a new node takes over as primary for the file system, it can process
stale shared extent records in a per-node queue. The primary detects the bad
record, sets the full fsck flag, and disables the file system to prevent
further corruption.

DESCRIPTION:
Every node in the cluster that adds or removes references to shared extents
adds the shared extent records to a per-node queue. The primary node in the
cluster processes the records in the per-node queues and maintains reference
counts in a global shared extent device. In certain cases, the primary node
might process bad or stale records in the per-node queue. Two situations
under which bad or stale records could be processed are:
    1. Clone creation initiated from a secondary node immediately after
primary migration to a different node.
    2. Queue wraparound on any node and takeover of primary by a new node
immediately afterwards.
Full fsck might not be able to rectify the file system corruption.

RESOLUTION:
The code is modified to update the per-node shared extent queue head and tail
pointers to correct values on the primary before it starts processing shared
extent records.

* 2912435 (Tracking ID: 2885592)

SYMPTOM:
vxdump of a file system which is compressed using vxcompress is aborted.

DESCRIPTION:
vxdump is aborted due to a malloc() failure. malloc() fails due to a memory
leak in the vxdump command code while handling compressed extents.

RESOLUTION:
Fixed the memory leak.

* 2923805 (Tracking ID: 2590918)

SYMPTOM:
When a new node in the cluster takes over as primary of the file system,
there might be a significant delay in freeing up unshared extents. This
problem can occur only when shared extent additions or deletions occur
immediately after the primary switches over to a different node in the
cluster.

DESCRIPTION:
When a new node in the cluster takes over as primary for the file 
system, a file system thread in the new primary performs a full scan of the 
shared extent device file to free up any shared extents that have become 
completely unshared.  If heavy shared extent related activity such as additional 
sharing or unsharing of extents were to occur anywhere in the cluster while the 
full scan was being performed, the full scan could get interrupted.  Due to a 
bug, the full scan is marked as completed, and further scheduled scans of the
shared extent device are only partial scans. This causes a substantial delay
in freeing up some of the unshared extents in the device file.

RESOLUTION:
The code is modified so that if the first full scan of the shared extent
device upon primary takeover is interrupted, the full scan is not marked as
complete.

Patch ID: VRTSamf-6.0.5.1100

* 4008106 (Tracking ID: 4008102)

SYMPTOM:
Veritas Cluster Server does not support Oracle Solaris 11.4.

DESCRIPTION:
Veritas Cluster Server does not support Oracle Solaris versions later than 11.3.

RESOLUTION:
Veritas Cluster Server now supports Oracle Solaris 11.4.

Patch ID: VRTSamf-6.0.5.100

* 3871503 (Tracking ID: 3871501)

SYMPTOM:
On Solaris 11.2 SRU 8 or above with Asynchronous Monitoring 
framework (AMF) enabled, VCS agent processes may not respond or may encounter AMF 
errors during registration.

DESCRIPTION:
For Solaris 11.2 from SRU 8 onward, kernel header is modified. 
AMF functionality is affected because of the changes in the kernel headers 
which can cause agent processes like Mount, Oracle, Netlistener and Process 
enabled with AMF functionality to hang or report error during AMF 
registration.

RESOLUTION:
The AMF code is modified to handle the changes in the kernel 
headers for Solaris 11.2 SRU 8 to address the issue during registration.

* 3875894 (Tracking ID: 3873866)

SYMPTOM:
Veritas Infoscale Availability does not support Oracle Solaris 11.3.

DESCRIPTION:
Veritas Infoscale Availability does not support Oracle Solaris versions later 
than 11.2.

RESOLUTION:
Veritas Infoscale Availability now supports Oracle Solaris 11.3.

Patch ID: VRTSvxfen-6.0.5.3100

* 3864474 (Tracking ID: 3864470)

SYMPTOM:
When the I/O fencing startup script starts, it tries to configure fencing. The
startup script exits if there is any error in configuring the VxFEN module.

DESCRIPTION:
Any error while trying to configure VxFEN causes the I/O fencing startup
script to stop retrying after a specific number of tries. If the errors are
resolved after the startup script has exited, you need to manually start the
I/O fencing program.

RESOLUTION:
When the I/O fencing program starts, the vxfen init script now waits
indefinitely for the vxfenconfig utility to configure the VxFEN module.

* 3914135 (Tracking ID: 3913303)

SYMPTOM:
Non-root users cannot read VxFEN log files.

DESCRIPTION:
Non-root users do not have read permissions for VxFEN log files.

RESOLUTION:
The VxFEN code is changed to allow non-root users to read VxFEN log files.

* 4005681 (Tracking ID: 3960112)

SYMPTOM:
Disk-based fencing configuration fails when vSAN LUNs are exported over iSCSI.

DESCRIPTION:
This issue occurs because the VxFen module cannot properly handle vSAN LUNs over iSCSI.

RESOLUTION:
This hotfix addresses the issue by updating the VxFen module to support the use of vSAN LUNs over iSCSI. A new environment variable, VXFEN_VSAN_EXTINQ, is added to provide this support. For fencing to work with vSAN over iSCSI, set "VXFEN_VSAN_EXTINQ=1" in the "/etc/sysconfig/vxfen" file.

* 4005702 (Tracking ID: 3935040)

SYMPTOM:
The Cluster Server component creates some required files in the /tmp 
and /var/tmp directories.

DESCRIPTION:
The Cluster Server component creates some required files in the 
/tmp and /var/tmp directories. Non-root users have access to these folders, 
and they may accidentally modify, move, or delete these files. Such actions 
may interfere with the normal functioning of Cluster Server.

RESOLUTION:
This hotfix addresses the issue by moving the required Cluster 
Server files to secure locations.

* 4008105 (Tracking ID: 4008102)

SYMPTOM:
Veritas Cluster Server does not support Oracle Solaris 11.4.

DESCRIPTION:
Veritas Cluster Server does not support Oracle Solaris versions later than 11.3.

RESOLUTION:
Veritas Cluster Server now supports Oracle Solaris 11.4.

Patch ID: VRTSvxfen-6.0.500.300

* 3864474 (Tracking ID: 3864470)

SYMPTOM:
When the I/O fencing startup script starts, it tries to configure fencing. The
startup script exits if there is any error in configuring the VxFEN module.

DESCRIPTION:
Any error while trying to configure VxFEN causes the I/O fencing startup
script to stop retrying after a specific number of tries. If the errors are
resolved after the startup script has exited, you need to manually start the
I/O fencing program.

RESOLUTION:
When the I/O fencing program starts, the vxfen init script now waits
indefinitely for the vxfenconfig utility to configure the VxFEN module.

* 3914135 (Tracking ID: 3913303)

SYMPTOM:
Non-root users cannot read VxFEN log files.

DESCRIPTION:
Non-root users do not have read permissions for VxFEN log files.

RESOLUTION:
The VxFEN code is changed to allow non-root users to read VxFEN log files.

Patch ID: VRTSgab-6.0.5.1100

* 4005400 (Tracking ID: 3984685)

SYMPTOM:
In a rare case, peer MAC/IP learning may fail in LLT, and the current debug capability is not enough to detect the problem.

DESCRIPTION:
When an LLT node comes up, it learns the MAC/IP address of its peers with the help of LLT_ARP requests or the acknowledgements that it receives from other nodes.
Peer address learning may fail due to several reasons, and the information in the current debug messages is not enough to identify the cause of a specific failure.

RESOLUTION:
This hotfix updates the LLT module to add several detailed debug messages in this context, which are disabled by default. If issues about peer MAC/IP learning failure are encountered, the messages can be turned on to identify the cause of a specific failure.

* 4008104 (Tracking ID: 4008102)

SYMPTOM:
Veritas Cluster Server does not support Oracle Solaris 11.4.

DESCRIPTION:
Veritas Cluster Server does not support Oracle Solaris versions later than 11.3.

RESOLUTION:
Veritas Cluster Server now supports Oracle Solaris 11.4.

Patch ID: VRTSllt-6.0.5.1100

* 4005400 (Tracking ID: 3984685)

SYMPTOM:
In a rare case, peer MAC/IP learning may fail in LLT, and the current debug capability is not enough to detect the problem.

DESCRIPTION:
When an LLT node comes up, it learns the MAC/IP address of its peers with the help of LLT_ARP requests or the acknowledgements that it receives from other nodes.
Peer address learning may fail due to several reasons, and the information in the current debug messages is not enough to identify the cause of a specific failure.

RESOLUTION:
This hotfix updates the LLT module to add several detailed debug messages in this context, which are disabled by default. If issues about peer MAC/IP learning failure are encountered, the messages can be turned on to identify the cause of a specific failure.

* 4005401 (Tracking ID: 3935040)

SYMPTOM:
The Cluster Server component creates some required files in the /tmp 
and /var/tmp directories.

DESCRIPTION:
The Cluster Server component creates some required files in the 
/tmp and /var/tmp directories. Non-root users have access to these folders, 
and they may accidentally modify, move, or delete these files. Such actions 
may interfere with the normal functioning of Cluster Server.

RESOLUTION:
This hotfix addresses the issue by moving the required Cluster 
Server files to secure locations.

* 4008103 (Tracking ID: 4008102)

SYMPTOM:
Veritas Cluster Server does not support Oracle Solaris 11.4.

DESCRIPTION:
Veritas Cluster Server does not support Oracle Solaris versions later than 11.3.

RESOLUTION:
Veritas Cluster Server now supports Oracle Solaris 11.4.



INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Please note that the installation of this P-Patch will cause downtime.

To install the patch, perform the following steps on at least one node in the cluster:
1. Copy the patch sfha-sol11_x64-Patch-6.0.5.3100.tar.gz to /tmp
2. Untar sfha-sol11_x64-Patch-6.0.5.3100.tar.gz to /tmp/hf
    # mkdir /tmp/hf
    # cd /tmp/hf
    # gunzip /tmp/sfha-sol11_x64-Patch-6.0.5.3100.tar.gz
    # tar xf /tmp/sfha-sol11_x64-Patch-6.0.5.3100.tar
3. Install the hotfix. (Please note that the installation of this P-Patch will cause downtime.)
    # pwd 
    /tmp/hf
    # ./installSFHA605P31 [<host1> <host2>...]

You can also install this patch together with the 6.0.1 GA release and the 6.0.5 patch release:
    # ./installSFHA605P31 -base_path [<601 path>] -mr_path [<605 path>] [<host1> <host2>...]
where -mr_path should point to the 6.0.5 image directory and -base_path to the 6.0.1 image directory.

Install the patch manually:
--------------------------
Manual installation is not recommended.


REMOVING THE PATCH
------------------
Manual uninstallation is not recommended.


SPECIAL INSTRUCTIONS
--------------------
1) Delete '.vxvm-configured'
    # rm  /etc/vx/reconfig.d/state.d/.vxvm-configured
2) Refresh vxvm-configure
    # svcadm refresh vxvm-configure
3) Delete  'install-db'
    # rm /etc/vx/reconfig.d/state.d/install-db
4) Reboot the system using shutdown command.

You need to use the shutdown command to reboot the system after patch installation or de-installation:

    shutdown -g0 -y -i6


OTHERS
------
NONE