                             * * * READ ME * * *
                * * * Veritas Storage Foundation HA 6.0.5 * * *
                         * * * Patch 6.0.5.600 * * *
                          Patch Date: 2018-09-11

This document provides the following information:

* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH


PATCH NAME
----------
Veritas Storage Foundation HA 6.0.5 Patch 6.0.5.600


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
RHEL6 x86-64


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSamf
VRTSaslapm
VRTSdbac
VRTSgab
VRTSglm
VRTSgms
VRTSllt
VRTSodm
VRTSvxfen
VRTSvxfs
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* Symantec VirtualStore 6.0.1
* Veritas Cluster Server 6.0.1
* Veritas Dynamic Multi-Pathing 6.0.1
* Veritas Storage Foundation 6.0.1
* Veritas Storage Foundation Cluster File System HA 6.0.1
* Veritas Storage Foundation for Oracle RAC 6.0.1
* Veritas Storage Foundation HA 6.0.1


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: VRTSvxfs 6.0.500.600

* 3957178 (3957852) VxFS support for RHEL6.10.
* 3952360 (3952357) VxFS support for RHEL6.x retpoline kernels.
* 2927359 (2927357) Assert hit in internal testing.
* 2933300 (2933297) Compression support for the dedup ioctl.
* 2972674 (2244932) Internal assert failure during testing.
* 3040130 (3137886) Thin Provisioning Logging does not work for reclaim operations triggered via the fsadm command.
* 3248031 (2628207) Full fsck operation takes a long time.
* 3682640 (3637636) Cluster File System (CFS) node initialization and protocol upgrade may hang during a rolling upgrade.
* 3690056 (3689354) Users having write permission on a file cannot open the file with O_TRUNC if the file has the setuid or setgid bit set.
* 3726112 (3704478) The library routine to get the mount point fails to return the mount point of the root file system.
* 3796626 (3762125) Directory size increases abnormally.
* 3796630 (3736398) NULL pointer dereference panic in lazy unmount.
* 3796633 (3729158) Deadlock occurs due to incorrect locking order between write advise and the dalloc flusher thread.
* 3796637 (3574404) Stack overflow during rename operation.
* 3796644 (3269553) VxFS returns an inappropriate message for a read of a hole via Oracle Disk Manager (ODM).
* 3796652 (3686438) NMI panic in the vx_fsq_flush function.
* 3796664 (3444154) Reading from a de-duped file system over NFS can result in data corruption seen on the NFS client.
* 3796671 (3596378) Copying a large number of small files is slower on VxFS than on ext4.
* 3796676 (3615850) Write system call hangs with an invalid buffer length.
* 3796684 (3601198) Replication makes copies of 64-bit external quota files too.
* 3796687 (3604071) High CPU usage by the vxfs thread process.
* 3796727 (3617191) Checkpoint creation takes a lot of time.
* 3796731 (3558087) The ls -l command and other commands that use the stat system call may take a long time to complete.
* 3796733 (3695367) Unable to remove a volume from a multi-volume VxFS file system using the "fsvoladm" command.
* 3796745 (3667824) System panicked during delayed allocation (dalloc) flushing.
* 3799999 (3602322) System panics while flushing the dirty pages of the inode.
* 3821416 (3817734) The direct command to run fsck with the -y|Y option was mentioned in the message displayed to the user when the file system mount fails.
* 3843470 (3867131) Kernel panic in internal testing.
* 3843734 (3812914) On the latest kernel patches of RHEL 6.5 and RHEL 6.4, the umount(8) system call hangs if an application watches for inode events using the inotify(7) APIs.
* 3848508 (3867128) Assert failed in internal native AIO testing.
* 3851967 (3852324) Assert failure during internal stress testing.
* 3852297 (3553328) During internal testing, full fsck failed to clean the file system.
* 3852512 (3846521) "cp -p" fails if the modification time in nanoseconds has 10 digits.
* 3861518 (3549057) The "relatime" mount option is shown in /proc/mounts but is not supported by VxFS.
* 3862350 (3853338) Files on VxFS are corrupted while running the sequential asynchronous write workload under high memory pressure.
* 3862425 (3859032) System panics in vx_tflush_map() due to a NULL pointer de-reference.
* 3862435 (3833816) Read returns stale data on one node of the CFS.
* 3864139 (3869174) Write system call deadlock on RHEL5 and SLES10.
* 3864751 (3867147) Assert failed in internal dedup testing.
* 3866970 (3866962) Data corruption is seen when dalloc writes are going on a file and fsync is simultaneously started on the same file.
* 3622423 (3250239) Panic in vx_naio_worker.
* 3673958 (3660422) On RHEL 6.6, the umount(8) system call hangs if an application is watching for inode events using the inotify(7) APIs.
* 3678626 (3613048) Support vectored AIO on Linux.
* 3409692 (3402618) The mmap read performance on VxFS is slow.
* 3426009 (3412667) The RHEL 6 system panics with a stack overflow.
* 3469683 (3469681) File system is disabled while free space defragmentation is going on.
* 3498950 (3356947) When there are multi-threaded writes with fsync calls between them, VxFS becomes slow.
* 3498954 (3352883) During the rename operation, lots of nfsd threads hang.
* 3498963 (3396959) RHEL 6.4 system panics with stack overflow errors due to memory pressure.
* 3498976 (3434811) The vxfsconvert(1M) command in VxFS 6.1 hangs.
* 3498978 (3424564) fsppadm fails with ENODEV and "file is encrypted or is not a database" errors.
* 3498998 (3466020) File system is corrupted with the error message "vx_direrr: vx_dexh_keycheck_1".
* 3499005 (3469644) System panics in the vx_logbuf_clean() function.
* 3499008 (3484336) The fidtovp() system call can panic in the vx_itryhold_locked() function.
* 3499011 (3486726) VFR logs too much data on the target node.
* 3499014 (3471245) The mongodb fails to insert any record.
* 3499030 (3484353) The file system may hang with the partitioned directory feature enabled.
* 3514824 (3443430) Fsck allocates too much memory.
* 3515559 (3498048) While the system is making a backup, the 'ls -l' command on the same file system may hang.
* 3515569 (3430461) The nested unmounts fail if the parent file system is disabled.
* 3515588 (3294074) The fsetxattr() system call is slower on Veritas File System (VxFS) than on the ext3 file system.
* 3515737 (3511232) Stack overflow causes kernel panic in the vx_write_alloc() function.
* 3515739 (3510796) System panics when VxFS cleans the inode chunks.
* 3517702 (3517699) Return code 240 for the fsfreeze(1M) command is not documented in the man page for fsfreeze.
* 3517707 (3093821) The system panics due to a reference to a freed super block after the vx_unmount() function errors out.
* 3557193 (3473390) Multiple stack overflows with VxFS on RHEL6 lead to panics/system crashes.
* 3567855 (3567854) On VxFS 6.0.5, the vxedquota(1M) and vxrepquota(1M) commands fail in certain scenarios.
* 3579957 (3233315) The "fsck" utility dumps core during a full scan.
* 3581566 (3560968) The delicache_enable tunable is not persistent in the Cluster File System (CFS) environment.
* 3584297 (3583930) While an external quota file is restored or over-written, old quota records are preserved.
* 3588236 (3604007) Stack overflow on SLES11.
* 3590573 (3331010) The fsck(1M) command dumped core with a segmentation fault.
* 3595894 (3595896) While creating the OracleRAC 12.1.0.2 database, the node panics.
* 3597454 (3602386) The vfradmin man page shows incorrect information about the default behavior of the -d option.
* 3597560 (3597482) The pwrite(2) function fails with the EOPNOTSUPP error.
* 2705336 (2059611) The system panics due to a NULL pointer dereference while flushing bitmaps to the disk.
* 2978234 (2972183) The fsppadm(1M) enforce command takes a long time on the secondary nodes compared to the primary nodes.
* 2982161 (2982157) During internal testing, the "f:vx_trancommit:4" debug assert was hit when the available transaction space is less than required.
* 2999566 (2999560) The 'fsvoladm'(1M) command fails to clear the 'metadataok' flag on a volume.
* 3027250 (3031901) The 'vxtunefs(1M)' command accepts a garbage value for the 'max_buf_data_size' tunable.
* 3056103 (3197901) Prevent duplicate symbols in the VxFS libvxfspriv.a and vxfspriv.so libraries.
* 3059000 (3046983) An invalid CFS node number in ".__fsppadm_fclextract" causes the DST policy enforcement failure.
* 3108176 (2667658) The 'fscdsconv endian' conversion operation fails because of a macro overflow.
* 3248029 (2439261) When the vx_fiostats_tunable value is changed from zero to non-zero, the system panics.
* 3248042 (3072036) Read operations from the secondary node in CFS can sometimes fail with the ENXIO error code.
* 3248046 (3092114) The information output displayed by the "df -i" command may be inaccurate for cluster-mounted file systems.
* 3248054 (3153919) The fsadm(1M) command may hang when the structural file set re-organization is in progress.
* 3296988 (2977035) A debug assert issue was encountered in the vx_dircompact() function while running an internal noise test in the Cluster File System (CFS) environment.
* 3310758 (3310755) Internal testing hits a debug assert "vx_rcq_badrecord:9:corruptfs".
* 3317118 (3317116) An internal command conformance test for the mount command on RHEL6 Update 4 hit a debug assert inside the vx_get_sb_impl() function.
* 3338024 (3297840) A metadata corruption is found during the file removal process.
* 3338026 (3331419) System panic because of kernel stack overflow.
* 3338030 (3335272) The mkfs (make file system) command dumps core when the log size provided is not aligned.
* 3338063 (3332902) While shutting down, the system running the fsclustadm(1M) command panics.
* 3338750 (2414266) The fallocate(2) system call fails on Veritas File System (VxFS) file systems in the Linux environment.
* 3338762 (3096834) Intermittent vx_disable messages are displayed in the system log.
* 3338776 (3224101) After you enable the optimization for updating the i_size across the cluster nodes lazily, the system panics.
* 3338779 (3252983) On a high-end system with greater than or equal to 48 CPUs, some file system operations may hang.
* 3338780 (3253210) File system hangs when it reaches the space limitation.
* 3338781 (3249958) When the /usr file system is mounted as a separate file system, Veritas File System (VxFS) fails to load.
* 3338787 (3261462) A file system with size greater than 16TB corrupts with vx_mapbad messages in the system log.
* 3338790 (3233284) FSCK binary hangs while checking the Reference Count Table (RCT).
* 3339230 (3308673) A fragmented file system is disabled when the delayed allocations feature is enabled.
* 3339884 (1949445) System is unresponsive when files are created in a large directory.
* 3339949 (3271892) Veritas File Replicator (VFR) jobs fail if the same Process ID (PID) is associated with multiple jobs working on different target file systems.
* 3339963 (3071622) On SLES10, bcopy(3) with overlapping addresses does not work.
* 3339964 (3313756) The file replication daemon exits unexpectedly and dumps core on the target side.
* 3340029 (3298041) With the delayed allocation feature enabled on a locally mounted file system, observable performance degradation might be experienced when writing to a file and extending the file size.
* 3340031 (3337806) The find(1) command may panic systems with Linux kernel versions greater than 3.0.
* 3348459 (3274048) VxFS hangs when it requests a cluster-wide grant on an inode while holding a lock on the inode.
* 3351939 (3351937) The vfradmin(1M) command may fail while promoting a job on a locally mounted VxFS file system due to the "relatime" mount option.
* 3351946 (3194635) The internal stress test on a locally mounted file system exited with an error message.
* 3351947 (3164418) Internal stress test on a locally mounted VxFS file system results in data corruption in a no-space-on-device scenario while doing a split on a Zero Fill-On-Demand (ZFOD) extent.
* 3359278 (3364290) The kernel may panic in Veritas File System (VxFS) when it is internally working on a reference count queue (RCQ) record.
* 3364285 (3364282) The fsck(1M) command fails to correct the inode list file.
* 3364289 (3364287) A debug assert may be hit in the vx_real_unshare() function in the cluster environment.
* 3364302 (3364301) Assert failure because of improper handling of the inode lock while truncating a reorg inode.
* 3364305 (3364303) Internal stress test on a locally mounted file system hits a debug assert in the VxFS File Device Driver (FDD).
* 3364307 (3364306) Stack overflow seen in the extent allocation code path.
* 3364317 (3364312) The fsadm(1M) command is unresponsive while processing the VX_FSADM_REORGLK_MSG message.
* 3364333 (3312897) System can hang when the Cluster File System (CFS) primary node is disabled.
* 3364335 (3331109) The full fsck does not repair the corrupted reference count queue (RCQ) record.
* 3364338 (3331045) Kernel oops in the unlock code of a map while referring to a freed mlink, due to a race with the iodone routine for delayed writes.
* 3364349 (3359200) Internal test of the Veritas File System (VxFS) fsdedup(1M) feature in a cluster file system environment results in a hang.
* 3364353 (3331047) Memory leak occurs in the vx_followlink() function in an error condition.
* 3364355 (3263336) Internal noise test on a cluster file system hits the "f:vx_cwfrz_wait:2" and "f:vx_osdep_msgprint:panic" debug asserts.
* 3369037 (3349651) VxFS modules fail to load on RHEL6.5.
* 3369039 (3350804) System panic on RHEL6 due to kernel stack overflow corruption.
* 3370650 (2735912) The performance of tier relocation using the fsppadm(1M) enforce command degrades while migrating a large number of files.
* 3372896 (3352059) High memory usage occurs when VxFS uses Veritas File Replicator (VFR) on the target even when no jobs are running.
* 3372909 (3274592) Internal noise test on a cluster file system is unresponsive while executing the fsadm(1M) command.
* 3380905 (3291635) Internal testing found the debug assert "vx_freeze_block_threads_all:7c" on locally mounted file systems while processing preambles for transactions.
* 3381928 (3444771) Internal noise test on a cluster file system hits a debug assert while creating a file.
* 3383150 (3383147) A "C" operator precedence error may occur while turning off delayed allocation.
* 3383271 (3433786) The vxedquota(1M) command fails to set quota limits for some users.
* 3396539 (3331093) Issue with the MountAgent process for VxFS: while doing repeated switchovers on HP-UX, MountAgent got stuck.
* 3402484 (3394803) A panic is observed in the VxFS routine vx_upgrade7() while running the vxupgrade(1M) command.
* 3405172 (3436699) An assert failure occurs because of a race condition between the clone mount thread and the directory removal thread while pushing data on a clone.
* 3411725 (3415639) The type of the fsdedupadm(1M) command always shows as MANUAL even if it is launched by the fsdedupschd daemon.
* 3429587 (3463464) Internal kernel functionality conformance test hits a kernel panic due to a null pointer dereference.
* 3430687 (3444775) Internal noise testing on a cluster file system results in a kernel panic in the vx_fsadm_query() function with an error message.
* 3436393 (3462694) The fsdedupadm(1M) command fails with error code 9 when it tries to mount checkpoints on a cluster.
* 3468413 (3465035) The VRTSvxfs and VRTSfsadv packages display an incorrect "Provides" list.
* 3384781 (3384775) Installing patch 6.0.3.200 on RHEL 6.4 or earlier RHEL 6.* versions fails with ERROR: No appropriate modules found.
* 3349652 (3349651) VxFS modules fail to load on RHEL6.5.
* 3356841 (2059611) The system panics due to a NULL pointer dereference while flushing bitmaps to the disk.
* 3356845 (3331419) System panic because of kernel stack overflow.
* 3356892 (3259634) A Cluster File System (CFS) with blocks larger than 4GB may become corrupt.
* 3356895 (3253210) File system hangs when it reaches the space limitation.
* 3356909 (3335272) The mkfs (make file system) command dumps core when the log size provided is not aligned.
* 3357264 (3350804) System panic on RHEL6 due to kernel stack overflow corruption.
* 3357278 (3340286) After a file system is resized, the tunable setting of dalloc_enable gets reset to a default value.
* 3100385 (3369020) The Veritas File System (VxFS) module fails to load in the RHEL 6 Update 4 environment.
* 2912412 (2857629) File system corruption can occur, requiring a full fsck(1M), after cluster reconfiguration.
* 2928921 (2843635) Internal testing encountered some failures.
* 2933290 (2756779) The code is modified to improve the fix for the read and write performance concerns on Cluster File System (CFS) when it runs applications that rely on POSIX file-record locking using fcntl.
* 2933291 (2806466) A reclaim operation on a file system that is mounted on a Logical Volume Manager (LVM) volume may panic the system.
* 2933292 (2895743) Accessing named attributes for some files stored in CFS seems to be slow.
* 2933294 (2750860) Performance of the write operation with a small request size may degrade on a large file system.
* 2933296 (2923105) Removal of the VxFS module from the kernel takes a long time.
* 2933309 (2858683) Reserve extent attributes changed after vxrestore for files greater than 8192 bytes.
* 2933313 (2841059) Full fsck fails to clear the corruption in attribute inode 15.
* 2933330 (2773383) The read and write operations on memory-mapped files are unresponsive.
* 2933333 (2893551) The file attribute value is replaced with question mark symbols when the Network File System (NFS) connections experience a high load.
* 2933335 (2641438) After a system is restarted, the modifications that are performed on the user namespace extended attributes are lost.
* 2933571 (2417858) VxFS quotas do not support 64-bit limits.
* 2933729 (2611279) A file system with shared extents may panic.
* 2933751 (2916691) Customers experience hangs when doing dedup operations.
* 2933822 (2624262) Filestore:Dedup:fsdedup.bin hit an oops at vx_bc_do_brelse.
* 2937367 (2923867) Internal test hits an assert "f:xted_set_msg_pri1:1".
* 2976664 (2906018) The vx_iread errors are displayed after successful log replay and mount of the file system.
* 2978227 (2857751) The internal testing hits the assert "f:vx_cbdnlc_enter:1a".
* 2983739 (2857731) Internal testing hits an assert "f:vx_mapdeinit:1".
* 2984589 (2977697) A core dump is generated while you are removing the clone.
* 2987373 (2881211) File ACLs are not preserved properly in checkpoints if the file has a hardlink.
* 2988749 (2821152) Internal stress test hit an assert "f:vx_dio_physio:4, 1" on a locally mounted file system.
* 3008450 (3004466) Installation of 5.1SP1RP3 fails on RHEL 6.3.

Patch ID: VRTSdbac-6.0.500.300

* 3952512 (3951435) Support for RHEL 6.10 and RHEL 6.x RETPOLINE kernels.
* 3952996 (3952988) Support for RHEL6.x RETPOLINE kernels.

Patch ID: VRTSamf-6.0.500.500

* 3952511 (3951435) Support for RHEL 6.10 and RHEL 6.x RETPOLINE kernels.
* 3952995 (3952988) Support for RHEL6.x RETPOLINE kernels.
* 3749188 (3749172) Lazy unmount of a VxFS file system causes a node panic.

Patch ID: VRTSvxfen-6.0.500.700

* 3952509 (3951435) Support for RHEL 6.10 and RHEL 6.x RETPOLINE kernels.
* 3952993 (3952988) Support for RHEL6.x RETPOLINE kernels.
* 3864474 (3864470) The I/O fencing configuration script exits after a limited number of retries.
* 3914135 (3913303) Non-root users should have read permissions for VxFEN log files.
Patch ID: VRTSgab-6.0.500.400

* 3952507 (3951435) Support for RHEL 6.10 and RHEL 6.x RETPOLINE kernels.
* 3952991 (3952988) Support for RHEL6.x RETPOLINE kernels.

Patch ID: VRTSllt-6.0.500.400

* 3952506 (3951435) Support for RHEL 6.10 and RHEL 6.x RETPOLINE kernels.
* 3952990 (3952988) Support for RHEL6.x RETPOLINE kernels.

Patch ID: VRTSvxvm 6.0.500.500

* 3952559 (3951290) Retpoline support for VxVM on RHEL6.10 and RHEL6.x retpoline kernels.
* 3889559 (3889541) While performing root-disk encapsulation, after splitting the mirrored root disk, the GRUB entry of the split mirror disk group disk has an incorrect hd number.
* 3496715 (3281004) For the DMP minimum-queue I/O policy with a large number of CPUs, a couple of issues are observed.
* 3501358 (3399323) The reconfiguration of the Dynamic Multipathing (DMP) database fails.
* 3502662 (3380481) When you select a removed disk during the "5 Replace a failed or removed disk" operation, the vxdiskadm(1M) command displays an error message.
* 3521727 (3521726) System panicked due to double freeing of an IOHINT.
* 3526501 (3526500) Disk I/O failures occur with DMP I/O timeout error messages when the DMP (Dynamic Multi-pathing) I/O statistics daemon is not running.
* 3531332 (3077582) A Veritas Volume Manager (VxVM) volume may become inaccessible, causing the read/write operations to fail.
* 3540777 (3539548) After adding a removed MPIO disk back, the 'vxdisk list' or 'vxdmpadm listctlr all' commands may show a duplicate entry for the DMP node in error state.
* 3552411 (3482026) The vxattachd(1M) daemon reattaches plexes of a manually detached site.
* 3600161 (3599977) During a replica connection, referencing a port that is already deleted in another thread causes a system panic.
* 3603811 (3594158) The spinlock and unspinlock are referenced to different objects when interleaving with a kernel transaction.
* 3612801 (3596330) The 'vxsnap refresh' operation fails with a "Transaction aborted waiting for IO drain" error.
* 3621240 (3621232) The vradmin ibc command cannot be started or executed on the Veritas Volume Replicator (VVR) secondary node.
* 3622069 (3513392) A reference to a replication port that is already deleted caused a panic.
* 3638039 (3625890) The vxdisk resize operation on CDS disks fails with an error message of "Invalid attribute specification".
* 3648603 (3564260) VVR commands are unresponsive when replication is paused and resumed in a loop.
* 3654163 (2916877) vxconfigd hangs on a node leaving the cluster.
* 3690795 (2573229) On RHEL6, the server panics when Dynamic Multi-Pathing (DMP) executes the PERSISTENT RESERVE IN command with the REPORT CAPABILITIES service action on a PowerPath-controlled device.
* 3713320 (3596282) Snap operations fail with the error "Failed to allocate a new map due to no free map available in DCO".
* 3737823 (3736502) Memory leakage is found when a transaction aborts.
* 3774137 (3565212) I/O failure is seen during controller giveback operations on NetApp arrays in ALUA mode.
* 3780978 (3762580) In Linux kernels at RHEL6.6 or later, such as RHEL7 and SLES11SP3, the vxfen module fails to register the SCSI-3 PR keys to EMC devices when PowerPath exists in coexistence with DMP (Dynamic Multipathing).
* 3788751 (3788644) Reuse the raw device number when checking for available raw devices.
* 3799809 (3665644) The system panics due to an invalid page pointer in the Linux bio structure.
* 3799822 (3573262) System panic during space-optimized snapshot operations on recent UltraSPARC architectures.
* 3800394 (3672759) The vxconfigd(1M) daemon may dump core when the DMP database is corrupted.
* 3800396 (3749557) System hangs because of high memory usage by vxvm.
* 3800449 (3726110) On systems with a high number of CPUs, Dynamic Multipathing (DMP) devices may perform considerably slower than OS device paths.
* 3800452 (3437852) The system panics when Symantec Replicator Option goes to PASSTHRU mode.
* 3800738 (3433503) Due to an incorrect memory access, the vxconfigd(1M) daemon dumps core with a stack trace.
* 3800788 (3648719) The server panics while adding or removing LUNs or HBAs.
* 3801225 (3662392) In the Cluster Volume Manager (CVM) environment, if I/Os are getting executed on the slave node, corruption can happen when the vxdisk resize(1M) command is executing on the master node.
* 3805243 (3523575) The VxDMP (Veritas Dynamic Multi-pathing) path restoration daemon could disable paths connected to an EMC CLARiiON array.
* 3805902 (3795622) With Dynamic Multipathing (DMP) Native Support enabled, the LVM global_filter is not updated properly in the lvm.conf file.
* 3805938 (3790136) File system hang observed due to I/Os in Dirty Region Logging (DRL).
* 3806808 (3645370) The vxevac command fails to evacuate disks with Dirty Region Log (DRL) plexes.
* 3807761 (3729078) A VVR (Veritas Volume Replication) secondary site panic occurs during patch installation because of a flag overlap issue.
* 3816233 (3686698) vxconfigd was getting hung due to a deadlock between two threads.
* 3825466 (3825467) SLES11-SP4 build fails.
* 3826918 (3819670) poll() returning -1 with errno EINTR should be handled correctly in vol_admintask_wait().
* 3829273 (3823283) While unencapsulating a boot disk in a SAN (Storage Area Network) environment, the Linux operating system gets stuck in GRUB after a reboot.
* 3837711 (3488071) The command "vxdmpadm settune dmp_native_support=on" fails to enable Dynamic Multipathing (DMP) Native Support.
* 3837712 (3776520) Filters are not updated properly in the lvm.conf file in the VxDMP initrd (initial ramdisk) while enabling Dynamic Multipathing (DMP) Native Support.
* 3837713 (3137543) Issue with Root Disk Encapsulation (RDE) due to changes in the Linux trusted boot GRUB.
* 3837715 (3581646) Logical volumes fail to migrate back from Dynamic Multipathing (DMP) to OS devices when DMP native support is disabled while root ("/") is mounted on LVM.
* 3841220 (2711312) A new symbolic link is created in the root directory when an FC channel is pulled out.
* 3796596 (3433503) Due to an incorrect memory access, the vxconfigd(1M) daemon dumps core with a stack trace.
* 3796666 (3573262) System panic during space-optimized snapshot operations on recent UltraSPARC architectures.
* 3832703 (3488071) The command "vxdmpadm settune dmp_native_support=on" fails to enable Dynamic Multipathing (DMP) Native Support.
* 3832705 (3776520) Filters are not updated properly in the lvm.conf file in the VxDMP initrd (initial ramdisk) while enabling Dynamic Multipathing (DMP) Native Support.
* 3835367 (3137543) Issue with Root Disk Encapsulation (RDE) due to changes in the Linux trusted boot GRUB.
* 3835662 (3581646) Logical volumes fail to migrate back from Dynamic Multipathing (DMP) to OS devices when DMP native support is disabled while root ("/") is mounted on LVM.
* 3836923 (3133322) When DMP native support is enabled, the "vxdmpadm native ls" command does not display root devices under the Linux Logical Volume Manager (LVM) in any volume group (VG).
* 3632969 (3631230) VRTSvxvm patch versions 6.0.5 and 6.1.1 or earlier do not work with the RHEL6.6 update.
* 2941224 (2921816) System panics while starting replication after disabling the DCM volumes.
* 2960654 (2932214) The "vxdisk resize" operation may cause the disk to go into the "online invalid" state.
* 2974602 (2986596) Disk groups imported with a mix of standard and clone logical unit numbers (LUNs) may lead to data corruption.
* 2999881 (2999871) The vxinstall(1M) command gets into a hung state when it is invoked through Secure Shell (SSH) remote execution.
* 3022349 (3052770) The vradmin syncrvg operation with a volume set fails to synchronize the secondary RVG with the primary RVG.
* 3032358 (2952403) A shared disk group fails to be destroyed if the master has lost storage.
* 3033904 (2308875) The vxddladm(1M) list command options (hbas, ports, targets) do not display the correct values for the state attribute.
* 3036949 (3045033) "vxdg init" should not create a disk group on a clone disk that was previously part of a disk group.
* 3043206 (3038684) The restore daemon enables the paths of Business Continuance Volumes-Not Ready (BCV-NR) devices.
* 3049356 (3060327) The vradmin repstatus(1M) command shows "dcm contains 0 kbytes" during the Smart Autosync.
* 3089749 (3088059) On Red Hat Enterprise Linux 6.x (RHEL6.x), the type of host bus adapter (HBA) is reported as SCSI instead of FC.
* 3094185 (3091916) The Small Computer System Interface (SCSI) I/O errors overflow the syslog.
* 3144764 (2398954) The system panics while performing I/O on a VxFS-mounted instant snapshot with the Oracle Disk Manager (ODM) SmartSync enabled.
* 3189041 (3130353) Continuous disable or enable path messages are seen on the console for EMC Not Ready (NR) devices.
* 3195695 (2954455) During Dynamic Reconfiguration operations in vxdiskadm, when a pattern is specified to match a range of LUNs for removal, the pattern is matched erroneously.
* 3197460 (3098559) Cluster File System (CFS) data is corrupted due to a cloned copy of logical unit numbers (LUNs) that is imported with volume asymmetry.
* 3254133 (3240858) The /etc/vx/vxesd/.udev_lock file may have different permissions at different instances.
* 3254199 (3015181) I/O hangs on both nodes of the cluster when the disk array is disabled.
* 3254201 (3121380) I/O of a replicated volume group (RVG) hangs after one data volume is disabled.
* 3254204 (2882312) If an SRL fault occurs in the middle of an I/O load, and you immediately issue a read operation on data written during the SRL fault, the system returns old data.
* 3254205 (3162418) The vxconfigd(1M) command dumps core due to a wrong check in the ddl_find_cdevno() function.
* 3254231 (3010191) Previously excluded paths are not excluded after an upgrade to VxVM 5.1SP1RP3.
* 3254233 (3012929) The vxconfigbackup(1M) command gives errors when disk names are changed.
* 3254301 (3199056) The Veritas Volume Replicator (VVR) primary system panics in the vol_cmn_err function due to the VVR corrupted queue.
* 3261607 (3261601) System panics when dmp_destroy_dmpnode() attempts to free an already freed virtual address.
* 3264166 (3254311) System panics when reattaching a site to a site-consistent diskgroup having a volume larger than 1.05 TB.
* 3271596 (3271595) Veritas Volume Manager (VxVM) should prevent the disk reclaim flag from getting turned off when there are pending reclaims on the disk.
* 3271764 (2857360) The vxconfigd(1M) command hangs when the vol_use_rq tunable of VxVM is changed from 1 to 0.
* 3306164 (2972513) In CVM, PGR keys from shared data disks are not removed after stopping VCS.
* 3309931 (2959733) Handle the device path reconfiguration in case the device paths are moved across LUNs or enclosures, to prevent a vxconfigd(1M) daemon core dump.
* 3312134 (3325022) Disks exported through the VirtIO-disk interface from an SLES11 SP2 or SLES11 SP3 host are not visible.
* 3312311 (3321269) vxunroot may hang during un-encapsulation of the root disk.
* 3321337 (3186149) On a Linux system with LVM version 2.02.85, enabling dmp_native_support causes LVM volume groups to disappear.
* 3344127 (2969844) The device discovery failure should not cause the DMP database to be destroyed completely.
* 3344128 (2643506) vxconfigd dumps core when LUNs from the same enclosure are presented as different types, say A/P and A/P-F.
* 3344129 (2910367) When the SRL on the secondary site is disabled, the secondary panics.
* 3344130 (2825102) CVM reconfiguration and VxVM transaction code paths can simultaneously access the volume device list, resulting in data corruption.
* 3344132 (2860230) In a Cluster Volume Manager (CVM) environment, the shared disk remains opaque after execution of the vxdiskunsetup(1M) command on a master node.
* 3344134 (3011405) Execution of the "vxtune -o export" command fails and displays an error message.
* 3344138 (3041014) Beautify error messages seen during the relayout operation.
* 3344140 (2966990) In a Veritas Volume Replicator (VVR) environment, the I/O hangs at the primary side after multiple cluster reconfigurations are triggered in parallel.
* 3344142 (3178029) When you synchronize a replicated volume group (RVG), the diff string is over 100%.
* 3344143 (3101419) In a CVR environment, I/Os to the data volumes in an RVG may experience a temporary hang during SRL overflow under heavy I/O load.
* 3344145 (3076093) The patch upgrade script "installrp" can panic the system while doing a patch upgrade.
* 3344148 (3111062) When diffsync is executed, vxrsync gets the following error in lossy networks: VxVM VVR vxrsync ERROR V-5-52-2074 Error opening socket between [HOST1] and [HOST2] -- [Connection timed out]
* 3344150 (2992667) When new disks are added to the SAN framework of the Virtual Intelligent System (VIS) appliance and the Fibre Channel (FC) switch is changed to the direct connection, the "vxdisk list" command does not show the newly added disks even after the "vxdisk scandisks" command is executed.
* 3344161 (2882412) The 'vxdisk destroy' command uninitializes a VxVM disk which belongs to a deported disk group.
* 3344167 (2979824) The vxdiskadm(1M) utility bug results in the exclusion of unintended paths.
* 3344175 (3114134) The Smart (sync) Autosync feature fails to work and instead replicates the entire volume size for larger volumes.
* 3344264 (3220929) Veritas Volume Manager (VxVM) fails to convert a Logical Volume Manager (LVM) volume to a Veritas Volume Manager (VxVM) volume.
* 3344268 (3091978) The lvm.conf variable preferred_names is set to use DMP even if the dmp_native_support tunable is 'off'. * 3344286 (2933688) When the 'Data corruption protection' check is activated by Dynamic Multi-Pathing (DMP), the device-discovery operation aborts, but the I/O to the affected devices continues, which results in data corruption. * 3347380 (3031796) Snapshot reattach operation fails if any other snapshot of the primary volume is not accessible. * 3349877 (2685230) In a Cluster Volume Replicator (CVR) environment, if the SRL is resized and the logowner is switched between the master node and the slave node, then there could be an SRL corruption that leads to an Rlink detach. * 3349917 (2952553) Refresh of a snapshot should not be allowed from a different source volume without the force option. * 3349937 (3273435) VxVM disk group creation or import with SCSI-3 Persistent Reservation (PR) fails. * 3349939 (3225660) The Dynamic Reconfiguration (DR) tool does not list thin provisioned LUNs during a LUN removal operation. * 3349985 (3065072) Data loss occurs during the import of a clone disk group, when some of the disks are missing and the import "useclonedev" and "updateid" options are specified. * 3349990 (2054606) During the DMP driver unload operation, the system panics. * 3350000 (3323548) In the Cluster Volume Replicator (CVR) environment, a cluster-wide vxconfigd hang occurs on the primary when you start the cache object. * 3350019 (2020017) Cluster node panics when mirrored volumes are configured in the cluster. * 3350027 (3239521) When you do the PowerPath pre-check, the Dynamic Reconfiguration (DR) tool displays the following error message: 'Unable to run command [/sbin/powermt display]' and exits. * 3350232 (2993667) Veritas Volume Manager (VxVM) allows setting the Cross-platform Data Sharing (CDS) attribute for a disk group even when a disk is missing, because it experienced I/O errors. 
* 3350235 (3084449) The shared flag gets set during the import of a private disk group, because the flag from a failed shared disk group import is not cleared due to a minor number conflict error during the import abort operation. * 3350241 (3067784) The grow and shrink operations by the vxresize(1M) utility may dump core in the vfprintf() function. * 3350265 (2898324) UMR errors reported by the Purify tool in the "vradmind migrate" command. * 3350288 (3120458) In cluster volume replication (CVR) in data change map (DCM) mode, a cluster-wide vxconfigd hang is seen when one of the nodes is stopped. * 3350293 (2962010) The replication hangs when the Storage Replicator Log (SRL) is resized. * 3350786 (3060697) The vxrootmir(1M) utility fails with the following error message: VxVM vxdisk ERROR V-5-1-5433 Device sdb: init failed. * 3350787 (2969335) A node that leaves the cluster while an instant operation is in progress hangs in the kernel and cannot rejoin the cluster unless it is rebooted. * 3350789 (2938710) The vxassist(1M) command dumps core during the relayout operation. * 3350979 (3261485) The vxcdsconvert(1M) utility failed with the error "Unable to initialize the disk as a CDS disk". * 3350989 (3152274) The dd command to an SRDF-R2 (write disable) device hangs, which causes vm commands to hang for a long time, while there are no issues with the Operating System (OS) devices. * 3351005 (2933476) The vxdisk(1M) resize command fails with a generic error message. Failure messages need to be more informative. * 3351035 (3144781) In the Veritas Volume Replicator (VVR) environment, execution of the vxrlink pause command causes a hang on the secondary node if the rlink disconnect is already in progress. * 3351075 (3271985) In Cluster Volume Replication (CVR), with synchronous replication, aborting a slave node from the Cluster Volume Manager (CVM) cluster makes the slave node panic. * 3351092 (2950624) vradmind fails to operate on the new master when a node leaves the cluster. 
* 3351125 (2812161) In a Veritas Volume Replicator (VVR) environment, after the Rlink is detached, the vxconfigd(1M) daemon on the secondary host may hang. * 3351922 (2866299) The NEEDSYNC flag set on volumes in a Replicated Volume Group (RVG) is not cleared after the vxrecover command is run. * 3352027 (3188154) The vxconfigd(1M) daemon does not come up after enabling the native support and rebooting the host. * 3352208 (3049633) In a Veritas Volume Replicator (VVR) environment, the VxVM configuration daemon vxconfigd(1M) hangs on the secondary node when all disk paths are disabled on the secondary node. * 3352226 (2893530) With no VVR configuration, the system panics when it is rebooted. * 3352282 (3102114) A system crash during the 'vxsnap restore' operation can cause the vxconfigd(1M) daemon to dump core after the system reboots. * 3352963 (2746907) The vxconfigd(1M) daemon can hang under the heavy I/O load on the master node during the reconfiguration. * 3353059 (2959333) The Cross-platform Data Sharing (CDS) flag is not listed for disabled CDS disk groups. * 3353064 (3006245) While executing a snapshot operation on a volume which has 'snappoints' configured, the system panics infrequently. * 3353131 (2874810) When you install DMP-only solutions using the installdmp command, the root support is not enabled. * 3353244 (2925746) In the cluster volume manager (CVM) environment, cluster-wide vxconfigd may hang during CVM reconfiguration. * 3353291 (3140835) When the reclaim operation is in progress using the TRIM interface, the path of one of the TRIM-capable disks is disabled, and this causes a panic in the system. * 3353953 (2996443) In a cluster volume replication (CVR) environment, a logowner name mismatch configuration error is seen on slave nodes after the master node is brought down. * 3353985 (3088907) A node in a Cluster Volume Manager can panic while destroying a shared disk group. 
* 3353990 (3178182) During a master takeover task, the shared disk group re-import operation fails due to false serial split brain (SSB) detection. * 3353995 (3146955) Remote disks (lfailed or lmissing disks) go into the "ONLINE INVALID LFAILED" or "ONLINE INVALID LMISSING" state after the disk loses global disk connectivity. * 3353997 (2845383) The site gets detached if the plex detach operation is performed with the site-consistency set to off. * 3354023 (2869514) In a clustered environment with a large Logical unit number (LUN) configuration, the node join process takes a long time. * 3354024 (2980955) Disk group (dg) goes into the disabled state if vxconfigd(1M) is restarted on the new master after a master switch. * 3354028 (3136272) The disk group import operation with the "-o noreonline" option takes additional import time. * 3355830 (3122828) The Dynamic Reconfiguration (DR) tool lists the disks which are tagged with Logical Volume Manager (LVM), for removal or replacement. * 3355856 (2909668) In case of multiple sets of the cloned disks of the same source disk group, the import operation on the second set of the clone disks fails, if the first set of the clone disks were imported with "updateid". * 3355878 (2735364) The "clone_disk" disk flag attribute is not cleared when a cloned disk group is removed by the "vxdg destroy " command. * 3355883 (3085519) Missing disks are permanently detached from the disk group because the -o updateid and tagname options are used to import partial disks. * 3355971 (3289202) Handle the KMSG_EPURGE error in CVM disk connectivity protocols. * 3355973 (3003991) The vxdg adddisk command hangs when paths for all the disks in the disk group are disabled. * 3356836 (3125631) Snapshot creation on volume sets may fail with the error: "vxsnap ERROR V-5-1-6433 Component volume has changed". * 3361977 (2236443) Disk group import failure should be made fencing aware, in place of the VxVM vxdmp V-5-0-0 i/o error message. 
* 3361998 (2957555) The vxconfigd(1M) daemon on the CVM master node hangs in the userland during the vxsnap(1M) restore operation. * 3362062 (3010830) During a script-based or web-based installation, the post check verification for VRTSvxvm and VRTSaslapm may fail due to changed user and group permissions of some files after installation. * 3362065 (2861011) The "vxdisk -g resize " command fails with an error for the Cross-platform Data Sharing (CDS) formatted disk. * 3362087 (2916911) The vxconfigd(1M) daemon sends a VOL_DIO_READ request before the device is open. This may result in a scenario where the open operation fails but the disk read or write operations proceed. * 3362138 (2223258) The vxdisksetup(1M) command initializes the disk which already has a Logical Volume Manager (LVM) or File System (FS) on it. * 3362144 (1942051) I/O hangs on a master node after disabling the secondary paths from a slave node and rebooting the slave node. * 3362923 (3258531) The vxcdsconvert(1M) command fails with the error "Plex column/offset is not 0/0 for new vol ". * 3362948 (2599887) The DMP device paths that are marked as "Disabled" cannot be excluded from VxVM control. * 3365295 (3053073) The Dynamic Reconfiguration (DR) Tool doesn't pick thin LUNs in the "online invalid" state for the disk remove operation. * 3365313 (3067452) If new LUNs are added in the cluster, and the naming scheme has the avid set option set to 'no', then the DR (Dynamic Reconfiguration) Tool changes the mapping between dmpnode and disk record. * 3365321 (3238397) The Dynamic Reconfiguration (DR) Tool's Remove LUNs option does not restart the vxattachd daemon. * 3365390 (3222707) The Dynamic Reconfiguration (DR) tool does not permit the removal of disks associated with a deported diskgroup (dg). * 3368953 (3368361) When site consistency is configured within a private disk group and CVM is up, the reattach operation of a detached site fails. * 3384633 (3142315) Disk is misidentified as a clone disk with the udid_mismatch flag. 
* 3384636 (3244217) Cannot reset the clone_disk flag during vxdg import. * 3384662 (3127543) Non-labeled disks go into udid_mismatch after a vxconfigd restart. * 3384697 (3052879) Auto import of the cloned disk group fails after reboot even when the source disk group is not present. * 3384986 (2996142) Data is corrupted or lost if the mapping from disk access (DA) to Data Module (DM) of a disk is incorrect. * 3386843 (3279932) The vxdisksetup and vxdiskunsetup utilities were failing on a disk which is part of a deported disk group (DG), even if the "-f" option is specified. * 3387847 (3326900) Warnings are observed during execution of the vxunroot command. * 3394692 (3421236) The vxautoconvert/vxvmconvert utility fails to convert LVM (Logical Volume Manager) to VxVM (Veritas Volume Manager), causing data loss. * 3395095 (3416098) The vxvmconvert utility throws an error during execution. * 3395499 (3373142) Updates to the vxassist and vxedit man pages for behavioral changes after 6.0. * 3401836 (2790864) For the OTHER_DISKS enclosure, the vxdmpadm config reset CLI fails while trying to reset the IO Policy value. * 3405318 (3259732) In a CVR environment, rebooting the primary slave followed by connect-disconnect in a loop causes the rlink to detach. * 3408321 (3408320) Thin reclamation fails for EMC 5875 arrays. * 3409473 (3409612) The value of reclaim_on_delete_start_time cannot be set to values outside the range 22:00-03:59. * 3413044 (3400504) Upon disabling the host-side Host Bus Adapter (HBA) port, extended attributes of some devices are not seen anymore. * 3414151 (3280830) Multiple vxresize operations on a layered volume fail with the error message "There are other recovery activities. Cannot grow volume". * 3414265 (2804326) In the Veritas Volume Replicator (VVR) environment, secondary logging is seen in effect even if a Storage Replicator Log (SRL) size mismatch is seen across the primary and secondary. 
* 3416320 (3074579) The "vxdmpadm config show" CLI does not display the configuration file name which is present under the root (/) directory. * 3416406 (3099796) The vxevac command fails on volumes having a Data Change Object (DCO) log. The error message "volume is not using the specified disk name" is displayed. * 3417081 (3417044) System becomes unresponsive while creating a VVR TCP connection. * 3417672 (3287880) In a clustered environment, if a node doesn't have storage connectivity to clone disks, then the vxconfigd on the node may dump core during the clone disk group import. * 3420074 (3287940) Logical unit numbers (LUNs) from an EMC CLARiiON array that are in the NR (Not Ready) state are shown in the online invalid state by Veritas Volume Manager (VxVM). * 3423613 (3399131) For the PowerPath (PP) enclosure, both the DA_TPD and DA_COEXIST_TPD flags are set. * 3423644 (3416622) The hot-relocation feature fails for a corrupted disk in the CVM environment. * 3424795 (3424798) Veritas Volume Manager (VxVM) mirror attach operations (e.g., plex attach, vxassist mirror, and third-mirror break-off snapshot resynchronization) may take a longer time under heavy application I/O load. * 3427124 (3435225) In a given CVR setup, rebooting the master node causes one of the slaves to panic. * 3427480 (3163549) vxconfigd(1M) hangs on the master node if a slave node joins the master having disks which are missing on the master. * 3429328 (3433931) The "vxvmconvert" utility fails to get the correct LVM version. * 3434079 (3385753) Replication to the Disaster Recovery (DR) site hangs even though Replication links (Rlinks) are in the connected state. * 3434080 (3415188) I/O hangs during replication in Veritas Volume Replicator (VVR). * 3434189 (3247040) vxdisk scandisks enables the PowerPath (PP) enclosure which was disabled previously. * 3435000 (3162987) The disk has a UDID_MISMATCH flag in the vxdisk list output. * 3435008 (2958983) Memory leak is observed during the reminor operations. 
* 3461200 (3163970) The "vxsnap -g syncstart " command is unresponsive on the Veritas Volume Replicator (VVR) DR site. * 3358310 (2743870) Continuous I/O errors are seen when I/O is retried. * 3358313 (3194358) Continuous messages are displayed in the syslog file with EMC not-ready (NR) LUNs. * 3358345 (2091520) The ability to move the configdb placement from one disk to another using the "vxdisk set keepmeta=[always|skip|default]" command. * 3358346 (3353211) A. After an EMC Symmetrix BCV (Business Continuance Volume) device switches to read-write mode, continuous vxdmp (Veritas Dynamic Multi Pathing) error messages flood syslog. B. DMP metanode/path under a DMP metanode gets disabled unexpectedly. * 3358348 (2665425) The vxdisk -px "attribute" list(1M) Command Line Interface (CLI) does not support some basic VxVM attributes. * 3358351 (3158320) The VxVM (Veritas Volume Manager) command "vxdisk -px REPLICATED list (disk)" displays wrong output. * 3358352 (3326964) VxVM hangs in Clustered Volume Manager (CVM) environments in the presence of FMR operations. * 3358354 (3332796) The message "VxVM vxisasm INFO V-5-1-0 seeking block #..." is displayed while initializing a disk that is not an ASM disk. * 3358367 (3230148) Clustered Volume Manager (CVM) hangs during split brain testing. * 3358368 (3249264) Veritas Volume Manager (VxVM) thin disk reclamation functionality causes disk label loss, private region corruption and data corruption. * 3358369 (3250369) Execution of the vxdisk scandisks command causes endless I/O error messages in syslog. * 3358370 (2921147) The udid_mismatch flag is absent on a clone disk when the source disk is unavailable. * 3358371 (3125711) When the secondary node is restarted while a reclaim operation is in progress on the primary node, the system panics. * 3358372 (3156295) When DMP native support is enabled for Oracle Automatic Storage Management (ASM) devices, the permission and ownership of /dev/raw/raw# devices become incorrect after reboot. 
* 3358373 (3218013) The Dynamic Reconfiguration (DR) Tool does not delete stale OS (Operating System) device handles. * 3358374 (3237503) System hangs after creating a space-optimized snapshot with a large size cache volume. * 3358377 (3199398) The output of the "vxdmpadm pgrrereg" command depends on the order of the DMP node list, where the terminal output depends on the last LUN (DMP node). * 3358379 (1783763) In a Veritas Volume Replicator (VVR) environment, the vxconfigd(1M) daemon may hang during a configuration change operation. * 3358380 (2152830) A diskgroup (DG) import fails with a non-descriptive error message when multiple copies (clones) of the same device exist and the original devices are either offline or not available. * 3358381 (2859470) The Symmetrix Remote Data Facility R2 (SRDF-R2) with the Extensible Firmware Interface (EFI) label is not recognized by Veritas Volume Manager (VxVM) and goes in an error state. * 3358382 (3086627) The "vxdisk -o thin, fssize list" command fails with the error message V-5-1-16282. * 3358404 (3021970) A secondary node panics due to a NULL pointer dereference when the system frees an interlock. * 3358414 (3139983) Failed I/Os from SCSI are retried only on very few paths to a LUN instead of utilizing all the available paths, and may result in DMP sending I/O failures to the application bounded by the recovery option tunable. * 3358416 (3312162) Data corruption may occur on the Secondary Symantec Volume Replicator (VVR) Disaster Recovery (DR) Site. * 3358417 (3325122) In a Clustered Volume Replicator (CVR) environment, when you create stripe-mirror volumes with logtype=dcm, creation may fail. * 3358418 (3283525) The vxconfigd(1M) daemon hangs due to Data Change Object (DCO) corruption after a volume resize. * 3358420 (3236773) Multiple error messages of the same format are displayed during setting or getting the failover mode for an EMC Asymmetric Logical Unit Access (ALUA) disk array. 
* 3358423 (3194305) In the Veritas Volume Replicator (VVR) environment, the replication status goes into a paused state. * 3358429 (3300418) VxVM volume operations on shared volumes cause unnecessary read I/Os. * 3358433 (3301470) All cluster volume replication (CVR) nodes panic repeatedly due to a null pointer dereference in vxio. * 3362234 (2994976) System panics during mirror break-off snapshot creation or a plex detach operation in the vol_mv_pldet_callback() function. * 3365296 (2824977) The Command Line Interface (CLI) "vxdmpadm setattr enclosure failovermode", which is meant for Asymmetric Logical Unit Access (ALUA) type of arrays, fails with an error on certain arrays without providing an appropriate reason for the failure. * 3366670 (2982834) The /etc/vx/bin/vxdmpraw command fails to create a raw device for a full device when enabling Dynamic Multi-Pathing (DMP) support for Automatic Storage Management (ASM), and also does not delete all the raw devices when the DMP support is disabled for ASM. * 3366688 (2957645) When the vxconfigd daemon/command is restarted, the terminal gets flooded with error messages. * 3366703 (3056311) For release < 5.1 SP1, allow disk initialization with CDS format using raw geometry. * 3368236 (3327842) In the Cluster Volume Replication (CVR) environment, with I/O load on the Primary and replication going on, if the user runs the vradmin resizevol(1M) command on the Primary, these operations often terminate with the error message "vradmin ERROR Lost connection to host". * 3371753 (3081410) The Dynamic Reconfiguration (DR) tool fails to pick up any disk for the LUNs removal operation. * 3373213 (3373208) DMP wrongly sends the SCSI PR OUT command with the APTPL bit value as "0" to arrays. * 3374166 (3325371) Panic occurs in the vol_multistepsio_read_source() function when snapshots are used. * 3374735 (3423316) The vxconfigd(1M) daemon observes a core dump while executing the vxdisk(1M) scandisks command. 
* 3375424 (3250450) In the presence of a linked volume, running the vxdisk(1M) command with the -o thin, fssize list option causes the system to panic. * 3375575 (3403172) The EMC Symmetrix Not-Ready (NR) device paths get disabled or enabled unexpectedly. * 3377209 (3377383) The vxconfigd crashes when a disk under Dynamic Multi-pathing (DMP) reports a device failure. * 3381922 (3235350) I/O on a grown region of a volume leads to a system panic if the volume has an instant snapshot. * 3387376 (2945658) If the disk label is modified for an Active/Passive LUN, then the current passive paths don't reflect this modification after a failover. * 3387405 (3019684) I/O hang is observed when the SRL is about to overflow after the logowner switches from slave to master. * 3387417 (3107741) The vxrvg snapdestroy command fails with the "Transaction aborted waiting for io drain" error message. * 2892702 (2567618) The VRTSexplorer dumps core in vxcheckhbaapi/print_target_map_entry. * 3090670 (3090667) The system panics or hangs while executing the "vxdisk -o thin, fssize list" command as part of Veritas Operations Manager (VOM) Storage Foundation (SF) discovery. * 3140411 (2959325) The vxconfigd(1M) daemon dumps core while performing the disk group move operation. * 3150893 (3119102) Support LDOM Live Migration with fencing enabled. * 3156719 (2857044) System crashes while resizing a volume with Data Change Object (DCO) version 30. * 3159096 (3146715) Rlinks do not connect with Network Address Translation (NAT) configurations on Little Endian Architecture. * 3209160 (2750782) The Veritas Volume Manager (VxVM) upgrade process fails because it incorrectly assumes that the root disk is encapsulated. * 3210759 (3177758) Performance degradation occurs after upgrade from SF 5.1SP1RP3 to SF 6.0.1 on Linux. * 3254132 (3186971) The DMP native support function doesn't set the logical volume manager (LVM) configuration file correctly after turning on DMP native support. 
As a result, the system is unbootable. * 3254227 (3182350) If there are more than 8192 paths in the system, the vxassist(1M) command hangs when you create a new VxVM volume or increase the existing volume's size. * 3254229 (3063378) Some VxVM commands run slowly when EMC PowerPath presents and manages "read only" devices such as EMC SRDF-WD or BCV-NR. * 3254427 (3182175) The vxdisk -o thin, fssize list command can report incorrect File System usage data. * 3280555 (2959733) Handling the device path reconfiguration in case the device paths are moved across LUNs or enclosures to prevent the vxconfigd(1M) daemon coredump. * 3283644 (2945658) If the disk label is modified for an Active/Passive LUN, then the current passive paths don't reflect this modification after a failover. * 3283668 (3250096) /dev/raw/raw# devices vanish on reboot when SELinux is enabled. * 3294641 (3107741) The vxrvg snapdestroy command fails with the "Transaction aborted waiting for io drain" error message. * 3294642 (3019684) I/O hang is observed when the SRL is about to overflow after the logowner switches from slave to master. * 2853712 (2815517) vxdg adddisk allows mixing of clone and non-clone disks in a disk group. * 2863672 (2834046) NFS migration failed due to device re-minoring. * 2892590 (2779580) The secondary node gives a configuration error (no Primary RVG) after reboot of the master node on the Primary site. * 2892682 (2837717) The "vxdisk(1M) resize" command fails if the 'da name' is specified. * 2892684 (1859018) "link detached from volume" warnings are displayed when a linked-breakoff snapshot is created. * 2892698 (2851085) DMP doesn't detect implicit LUN ownership changes for some of the dmpnodes. * 2892716 (2753954) When a cable is disconnected from one port of a dual-port FC HBA, the paths via another port are marked as SUSPECT PATH. * 2940447 (2940446) A full file system check (fsck) hangs on I/O in Veritas Volume Manager (VxVM) when the cache object size is very large. 
* 2941193 (1982965) The vxdg(1M) command import operation fails if the disk access (DA) name is based on a naming scheme which is different from the prevailing naming scheme on the host. * 2941226 (2915063) During the detachment of a plex of a volume in the Cluster Volume Manager (CVM) environment, the system panics. * 2941234 (2899173) The vxconfigd(1M) daemon hangs after executing the "vradmin stoprep" command. * 2941237 (2919318) The I/O fencing key values of data disks are different and abnormal in a VCS cluster with I/O fencing. * 2941252 (1973983) The vxunreloc(1M) command fails when the Data Change Object (DCO) plex is in the DISABLED state. * 2942259 (2839059) vxconfigd logged the warning "cannot open /dev/vx/rdmp/cciss/c0d device to check for ASM disk format". * 2942336 (1765916) VxVM socket files don't have proper write protection. * 2944708 (1725593) The 'vxdmpadm listctlr' command has to be enhanced to print the count of device paths seen through the controller. * 2944710 (2744004) vxconfigd is hung on the VVR (Veritas Volume Replicator) secondary node during VVR configuration. * 2944714 (2833498) vxconfigd hangs while a reclaim operation is in progress on volumes having instant snapshots. * 2944717 (2851403) System panic is seen while unloading the "vxio" module. This happens whenever VxVM uses the SmartMove feature and the "vxportal" module gets reloaded (for example, during a VxFS package upgrade). * 2944722 (2869594) Master node panics due to corruption if space optimized snapshots are refreshed and 'vxclustadm setmaster' is used to select the master. * 2944724 (2892983) vxvol dumps core if new links are added while the operation is in progress. * 2944725 (2910043) Avoid order 8 allocation by the vxconfigd(1M) daemon while the node is reconfigured. * 2944727 (2919720) The vxconfigd(1M) command dumps core in the rec_lock1_5() function. 
* 2944729 (2933138) Panic in voldco_update_itemq_chunk() due to accessing an invalid buffer. * 2944741 (2866059) The error messages displayed during the resize operation by using the vxdisk(1M) command need to be enhanced. * 2962257 (2898547) The 'vradmind' process dumps core on the Veritas Volume Replicator (VVR) secondary site in a Clustered Volume Replicator (CVR) environment, when the Logowner Service Group on the VVR Primary Site is shuffled across its CVM (Clustered Volume Manager) nodes. * 2965542 (2928764) SCSI3 PGR registrations fail when dmp_fast_recovery is disabled. * 2973659 (2943637) The DMP I/O statistics thread may cause an out-of-memory issue, so that the OOM (Out Of Memory) killer is invoked and causes a system panic. * 2974870 (2935771) In a Veritas Volume Replicator (VVR) environment, the 'rlinks' disconnect after switching the master node. * 2976946 (2919714) On a thin Logical Unit Number (LUN), the vxevac(1M) command returns 0 without migrating the unmounted-VxFS volumes. * 2978189 (2948172) Executing the "vxdisk -o thin, fssize list" command can result in a panic. * 2979767 (2798673) System panics in voldco_alloc_layout() while creating a volume with an instant DCO. * 2983679 (2970368) Enhance handling of SRDF-R2 Write-Disabled devices in DMP. * 2988017 (2971746) For a single-path device, the bdget() function is being called for each I/O, which causes high CPU usage and leads to I/O performance degradation. * 2988018 (2964169) In a multiple-CPU environment, I/O performance degradation is seen when I/O is done through VxFS and the VxVM-specific private interface. * 3004823 (2692012) When moving the subdisks by using the vxassist(1M) command or the vxevac(1M) command, if the disk tags are not the same for the source and the destination, the command fails with a generic error message. * 3004852 (2886333) The vxdg(1M) join operation should not allow mixing of clone and non-clone disks in a disk group. 
* 3005921 (1901838) After installation of a license key that enables multi-pathing, the state of the controller is shown as DISABLED in the command-line-interface (CLI) output for the vxdmpadm(1M) command. * 3006262 (2715129) Vxconfigd hangs during Master takeover in a CVM (Clustered Volume Manager) environment. * 3011391 (2965910) Volume creation with vxassist using "-o ordered alloc=" dumps core. * 3011444 (2398416) vxassist dumps core while creating a volume after adding the attribute "wantmirror=ctlr" in the default vxassist rulefile. * 3020087 (2619600) Live migration of a virtual machine having the SFHA/SFCFSHA stack with data disks fencing enabled, causes service groups configured on the virtual machine to fault. * 3025973 (3002770) While issuing a SCSI inquiry command, a NULL pointer dereference in DMP causes a system panic. * 3026288 (2962262) Uninstall of DMP fails in the presence of other multipathing solutions. * 3027482 (2273190) Incorrect setting of the UNDISCOVERED flag can lead to database inconsistency. * 2860207 (2859470) The Symmetrix Remote Data Facility R2 (SRDF-R2) with the Extensible Firmware Interface (EFI) label is not recognized by Veritas Volume Manager (VxVM) and goes in an error state. * 2876865 (2510928) The extended attributes reported by "vxdisk -e list" for the EMC SRDF LUNs are reported as "tdev mirror", instead of "tdev srdf-r1". * 2892499 (2149922) Record the diskgroup import and deport events in syslog. * 2892621 (1903700) Removing a mirror using vxassist does not work. * 2892643 (2801962) Growing a volume takes a significantly long time when the volume has a version 20 DCO attached to it. * 2892650 (2826125) The VxVM script daemon is terminated abnormally on its invocation. * 2892660 (2000585) vxrecover doesn't start the remaining volumes if one of the volumes is removed during the vxrecover command run. * 2892689 (2836798) In VxVM, resizing a simple EFI disk fails and causes a system panic/hang. 
* 2922798 (2878876) The vxconfigd daemon dumps core in vol_cbr_dolog() due to a race between two threads processing requests from the same client. * 2924117 (2911040) The restore operation from a cascaded snapshot leaves the volume in an unusable state if any cascaded snapshot is in the detached state. * 2924188 (2858853) After a master switch, vxconfigd dumps core on the old master. * 2924207 (2886402) When re-configuring devices, a vxconfigd hang is observed. * 2933468 (2916094) Enhancements have been made to the Dynamic Reconfiguration Tool (DR Tool) to create a separate log file every time the DR Tool is started, display a message if a command takes a longer time, and not to list the devices controlled by TPD (Third Party Driver) in the 'Remove Luns' option of the DR Tool. * 2933469 (2919627) The Dynamic Reconfiguration tool should be enhanced to remove LUNs feasibly in bulk. * 2934259 (2930569) The LUNs in the 'error' state in the output of 'vxdisk list' cannot be removed through the DR (Dynamic Reconfiguration) Tool. * 2942166 (2942609) The message displayed when a user quits from Dynamic Reconfiguration operations is shown as an error message. * 3952992 (3951938) Retpoline support for the ASLAPM rpm on RHEL6.10 and RHEL6.x retpoline kernels. Patch ID: VRTSgms 6.0.500.200 * 3957182 (3957857) GMS support for RHEL6.10. * 3952376 (3952375) GMS support for RHEL6.x retpoline kernels. Patch ID: VRTSglm 6.0.500.600 * 3957181 (3957855) GLM support for RHEL6.10. * 3952369 (3952368) GLM support for RHEL6.x retpoline kernels. * 3845273 (2850818) A GLM thread panics with a NULL pointer dereference. * 3364311 (3364309) An internal stress test on the cluster file system hit a debug assert in the Group Lock Manager (GLM). Patch ID: VRTSodm 6.0.500.300 * 3957180 (3957853) ODM support for RHEL6.10. * 3952366 (3952365) ODM support for RHEL6.x retpoline kernels. * 3559203 (3529371) The package verification of VRTSodm on Linux fails. 
* 3322294 (3323866) Some ODM operations may fail with "ODM ERROR V-41-4-1-328-22 Invalid argument". * 3369038 (3349649) The Oracle Disk Manager (ODM) module fails to load on RHEL6.5. * 3384781 (3384775) Installing patch 6.0.3.200 on RHEL 6.4 or earlier RHEL 6.* versions fails with ERROR: No appropriate modules found. * 3349650 (3349649) ODM modules fail to load on RHEL6.5. DETAILS OF INCIDENTS FIXED BY THE PATCH --------------------------------------- This patch fixes the following incidents: Patch ID: VRTSvxfs 6.0.500.600 * 3957178 (Tracking ID: 3957852) SYMPTOM: VxFS support for RHEL6.10. DESCRIPTION: RHEL6.10 is a new release with a retpoline kernel, so VxFS support is added for it. RESOLUTION: Added VxFS support for RHEL6.10. * 3952360 (Tracking ID: 3952357) SYMPTOM: VxFS support for RHEL6.x retpoline kernels. DESCRIPTION: Red Hat released retpoline kernels for older RHEL6.x releases. The VxFS module must be recompiled with a retpoline-aware GCC to support these kernels. RESOLUTION: Compiled VxFS with a retpoline-aware GCC. * 2927359 (Tracking ID: 2927357) SYMPTOM: Assert hit in internal testing. DESCRIPTION: The assert was hit in internal testing when the attribute inode being purged is marked bad or the file system is disabled. RESOLUTION: Code is modified to return EIO in such cases. * 2933300 (Tracking ID: 2933297) SYMPTOM: Compression support for the dedup ioctl. DESCRIPTION: Compression support for the dedup ioctl for NBU. RESOLUTION: Added limited support for compressed extents to the dedup ioctl for NBU. * 2972674 (Tracking ID: 2244932) SYMPTOM: Deadlock is seen when trying to reuse the inode number in the presence of a checkpoint. DESCRIPTION: In the presence of a checkpoint, there is a possibility that an inode number is assigned in such a way that a parent->child relationship can become child->parent in a different fset. This leads to a deadlock scenario. 
When a checkpoint is present and mounted, the fix is to not take a blocking lock on the second inode (as is done in the rename vnode operation), because this inode may have been reused in the clone fset and its lock may already be held; even if there is no push set for the inode, the rwlock is taken on the whole clone chain, even when the inodes in the chain are unrelated. Also, in the context of the rename vnode operation, when a hidden hash directory must be created, the parent directory needs to be exclusively rwlocked (currently a blocking lock); two renames running simultaneously with the same parent inode involved can therefore deadlock. RESOLUTION: Code is modified to avoid this deadlock. * 3040130 (Tracking ID: 3137886) SYMPTOM: Thin Provisioning Logging does not work for reclaim operations triggered via the fsadm command. DESCRIPTION: Thin Provisioning Logging does not work for reclaim operations triggered via the fsadm command. RESOLUTION: Code is added to log reclamations issued by the fsadm command, to create a backup log file once the size of the reclaim log file exceeds 1MB, and to save the command string of the fsadm command. * 3248031 (Tracking ID: 2628207) SYMPTOM: A full fsck operation on a file system with a large number of Access Control List (ACL) settings and checkpoints takes a long time (in some cases, more than a week) to complete. DESCRIPTION: The fsck operation is mainly blocked in the pass1d phase. The process of pass1d is as follows: 1. For each fileset, pass1d goes through all the inodes in the ilist of the fileset and then retrieves the inode information from the disk. 2. It extracts the corresponding attribute inode number. 3. Then, it reads the attribute inode and counts the number of rules inside it. 4. Finally, it reads the rules into the buffer and performs the checking.
The attribute rules data can reside anywhere on the file system, and the checkpoint link may have to be followed to locate the real inodes, which consumes a significant amount of time. RESOLUTION: The code is modified to: 1. Add a read-ahead mechanism for the attribute ilist which boosts the read for attribute inodes via buffering. 2. Add a bitmap to record those attribute inodes which have already been checked, to avoid redundant checks. 3. Add an option to the fsck operation which enables it to check a specific fileset separately rather than checking the entire file system. * 3682640 (Tracking ID: 3637636) SYMPTOM: Cluster File System (CFS) node initialization and protocol upgrade may hang during rolling upgrade with the following stack trace: vx_svar_sleep_unlock() vx_event_wait() vx_async_waitmsg() vx_msg_broadcast() vx_msg_send_join_version() vx_msg_send_join() vx_msg_gab_register() vx_cfs_init() vx_cfs_reg_fsckd() vx_cfsaioctl() vxportalunlockedkioctl() vxportalunlockedioctl() And vx_delay() vx_recv_protocol_upgrade_intent_msg() vx_recv_protocol_upgrade() vx_ctl_process_thread() vx_kthread_init() DESCRIPTION: CFS node initialization waits for the protocol upgrade to complete, and the protocol upgrade waits for the flag related to the CFS initialization to be cleared. As a result, a deadlock occurs. RESOLUTION: The code is modified so that the protocol upgrade process does not wait to clear the CFS initialization flag. * 3690056 (Tracking ID: 3689354) SYMPTOM: Users having write permission on a file cannot open the file with O_TRUNC if the file has the setuid or setgid bit set. DESCRIPTION: On Linux, the kernel triggers an explicit mode change as part of O_TRUNC processing to clear the setuid/setgid bit. Only the file owner or a privileged user is allowed to do a mode change operation. Hence, for a non-privileged user who is not the file owner, the mode change operation fails, causing the open() system call to return EPERM.
RESOLUTION: The mode change request to clear the setuid/setgid bit that comes as part of O_TRUNC processing is now allowed for other users. * 3726112 (Tracking ID: 3704478) SYMPTOM: The library routine to get the mount point fails to return the mount point of the root file system. DESCRIPTION: To get the mount point of a path, the input path name is scanned for the nearest path which represents a mount point. But when the file is in the root file system ("/"), this function returns an error code of 1 and hence does not return the mount point of that file. This is because of a bug in the logic of path name parsing, which neglects the root mount point while parsing. RESOLUTION: The logic in path name parsing is fixed so that the mount point of the root file system is returned. * 3796626 (Tracking ID: 3762125) SYMPTOM: Directory size sometimes keeps increasing even though the number of files inside it doesn't increase. DESCRIPTION: This only happens on CFS. A variable in the directory inode structure marks the start of directory free space. But when the directory ownership changes, the variable may become stale, which could cause this issue. RESOLUTION: The code is modified to reset this free-space-marking variable when there's an ownership change. Now the space search starts from the beginning of the directory inode. * 3796630 (Tracking ID: 3736398) SYMPTOM: Panic occurs in the lazy unmount path during deinit of the VxFS-VxVM API. DESCRIPTION: The panic occurs when an exiting thread drops the last reference to a lazy-unmounted VxFS file system which is the last VxFS mount on the system. The exiting thread does the unmount, which then makes a call into VxVM to de-initialize the private FS-VM API as it is the last VxFS mounted file system. The function to be called in VxVM is looked up via the files under /proc.
This requires a file to be opened, but the exit processing has removed the structures needed by the thread to open a file, because of which a panic is observed. RESOLUTION: The code is modified to pass the deinit work to a worker thread. * 3796633 (Tracking ID: 3729158) SYMPTOM: The fuser and other commands hang on VxFS file systems. DESCRIPTION: The hang is seen while two threads contend for two locks, the ILOCK and the PLOCK. The writeadvise thread owns the ILOCK but is waiting for the PLOCK, while the dalloc thread owns the PLOCK and is waiting for the ILOCK. RESOLUTION: The code is modified to correct the order of locking. Now PLOCK is followed by ILOCK. * 3796637 (Tracking ID: 3574404) SYMPTOM: System panics because of a stack overflow during the rename operation. DESCRIPTION: The stack is overflowed by 88 bytes in the rename code path. The thread_info structure is disrupted with VxFS page buffer head addresses. RESOLUTION: Local structures are now allocated dynamically. This saves 256 bytes and gives enough room. * 3796644 (Tracking ID: 3269553) SYMPTOM: VxFS returns an inappropriate message for a read of a hole via ODM. DESCRIPTION: Sparse files containing temp or backup/restore data are sometimes created outside the Oracle database, and Oracle can read these files only using ODM. Reading a hole in such files makes ODM fail with an ENOTSUP error. RESOLUTION: The code is modified to return zeros instead of an error. * 3796652 (Tracking ID: 3686438) SYMPTOM: System panicked with NMI during file system transaction flushing. DESCRIPTION: In vx_iflush_list we take the icachelock to traverse the icache list and flush the dirty inodes on the icache list to the disk. In that context, when we do a vx_iunlock we may sleep while flushing while holding the icachelock, which is a spinlock. The other processors which are busy-waiting for the same icachelock spinlock have interrupts disabled, and this results in the NMI panic.
RESOLUTION: In vx_iflush_list, use VX_iUNLOCK_NOFLUSH instead of vx_iunlock, which avoids flushing and sleeping while holding the spinlock. * 3796664 (Tracking ID: 3444154) SYMPTOM: Reading from a de-duped file system over NFS can result in data corruption seen on the NFS client. DESCRIPTION: This is not corruption on the file system, but comes from a combination of VxFS's shared page cache and Linux's TCP stack. RESOLUTION: To avoid the corruption, a temporary page is used to avoid sending the same physical page back-to-back down the TCP stack. * 3796671 (Tracking ID: 3596378) SYMPTOM: The copy of a large number of small files is slower on Veritas File System (VxFS) compared to EXT4. DESCRIPTION: VxFS implements the fsetxattr() system call in a synchronous way. Hence, before returning from the system call, VxFS takes some time to flush the data to the disk. In this way, VxFS guarantees file system consistency in case of a file system crash. However, this implementation has the side effect of serializing the whole processing, which takes more time. RESOLUTION: The code is modified to change the transaction to flush the data in a delayed way. * 3796676 (Tracking ID: 3615850) SYMPTOM: The write system call writes up to count bytes from the pointed buffer to the file referred to by the file descriptor field: ssize_t write(int fd, const void *buf, size_t count); When the count parameter is invalid, it can sometimes cause the write() to hang on a VxFS file system. For example, with a 10000-byte buffer but the count mistakenly set to 30000, you may encounter such a problem. DESCRIPTION: On recent Linux kernels, a page fault cannot be taken while holding a page lock, so as to avoid a deadlock. This means uiomove can copy less than requested, and any partially populated pages created in the routine which establishes a virtual mapping for the page are destroyed.
This can cause an infinite loop in the write code path when the given user buffer is not aligned with a page boundary and the length given to write() causes an EFAULT; uiomove() does a partial copy, segmap_release destroys the partially populated pages and unwinds the uio, and the operation is then repeated. RESOLUTION: The code is modified to move the pre-faulting to the buffered IO write loops; the system either shortens the length of the copy if all of the requested pages cannot be faulted, or fails with EFAULT if no pages are pre-faulted. This prevents the infinite loop. * 3796684 (Tracking ID: 3601198) SYMPTOM: Replication copies the 64-bit external quota files ('quotas.64' and 'quotas.grp.64') to the destination file system. DESCRIPTION: The external quota files hold the quota limits for users and groups. While replicating the file system, the vfradmin (1M) command copies these external quota files to the destination file system. But the quota limits of the source FS may not be applicable to the destination FS, so the external quota files should ideally be skipped during replication. RESOLUTION: Exclude the 64-bit external quota files from the replication process. * 3796687 (Tracking ID: 3604071) SYMPTOM: With the thin reclaim feature turned on, high CPU usage can be observed on the vxfs thread process. DESCRIPTION: In the routine to get the broadcast information of a node, which contains maps of Allocation Units (AUs) for which the node holds the delegations, the locking mechanism is inefficient. Every time this routine is called, it performs a series of down-up operations on a certain semaphore. This can result in a huge CPU cost when many threads call the routine in parallel. RESOLUTION: The code is modified to optimize the locking mechanism in this routine so that it performs the down-up operation on the semaphore only once.
* 3796727 (Tracking ID: 3617191) SYMPTOM: Checkpoint creation may take hours. DESCRIPTION: During checkpoint creation, with an inode marked for removal and being overlaid, there may be a downstream clone, and VxFS starts pulling all the data. With Oracle it is evident because of the deletion of temporary files during checkpoint creation. RESOLUTION: The code is modified to selectively pull the data, only if a downstream push inode exists for the file. * 3796731 (Tracking ID: 3558087) SYMPTOM: When the stat system call is executed on a VxFS file system with the delayed allocation feature enabled, it may take a long time or cause high CPU consumption. DESCRIPTION: When the delayed allocation (dalloc) feature is turned on, the flushing process takes much time. The process keeps the get page lock held, and needs writers to keep the inode reader-writer lock held. The stat system call may keep waiting for the inode reader-writer lock. RESOLUTION: The delayed allocation code is redesigned to keep the get page lock unlocked while flushing. * 3796733 (Tracking ID: 3695367) SYMPTOM: Unable to remove a volume from a multi-volume VxFS using the "fsvoladm" command. It fails with an "Invalid argument" error. DESCRIPTION: Volumes are not being added to the in-core volume list structure correctly. Therefore, while removing a volume from a multi-volume VxFS using "fsvoladm", the command fails. RESOLUTION: The code is modified to add volumes to the in-core volume list structure correctly. * 3796745 (Tracking ID: 3667824) SYMPTOM: A race between vx_dalloc_flush() and other threads turning off dalloc and reusing the inode results in a kernel panic. DESCRIPTION: Dalloc flushing can race with dalloc being disabled on the inode. In vx_dalloc_flush we drop the dalloc lock; the inode is no longer held, so the inode can be reused while the dalloc flushing is still in progress. RESOLUTION: The race is resolved by taking a hold on the inode before doing the actual dalloc flushing.
The hold prevents the inode from getting reused and hence prevents the kernel panic. * 3799999 (Tracking ID: 3602322) SYMPTOM: The system may panic while flushing the dirty pages of the inode. DESCRIPTION: The panic may occur due to a synchronization problem between one thread that flushes the inode and another thread that frees the chunks containing the inodes on the freelist. The thread that frees the chunks of inodes on the freelist grabs an inode and clears/de-references the inode pointer while deinitializing the inode. This may result in a pointer de-reference if the flusher thread is working on the same inode. RESOLUTION: The code is modified to resolve the race condition by taking proper locks on the inode and freelist whenever a pointer in the inode is de-referenced. If the inode pointer is already de-initialized to NULL, the flushing is attempted on the next inode. * 3821416 (Tracking ID: 3817734) SYMPTOM: If a file system with the full fsck flag set is mounted, a message containing the direct command to clean the file system with full fsck is printed to the user. DESCRIPTION: When mounting a file system with the full fsck flag set, the mount fails and a message is printed asking the user to clean the file system with full fsck. This message contains the direct command to run, which, if run without first collecting a file system metasave, results in evidence being lost. Also, since fsck removes the file system inconsistencies, it may lead to undesired data loss. RESOLUTION: A more generic error message is given instead of the direct command. * 3843470 (Tracking ID: 3867131) SYMPTOM: Kernel panic in internal testing. DESCRIPTION: In internal testing, vdl_fsnotify_sb was found to be NULL because it is not allocated and initialized in the initialization routine, vx_fill_super(). The vdl_fsnotify_sb would be initialized in vx_fill_super() only when the kernel's fsnotify feature is available, but the fsnotify feature is not available in the RHEL5/SLES10 kernels.
RESOLUTION: Code is added to check whether the fsnotify feature is available in the running kernel. * 3843734 (Tracking ID: 3812914) SYMPTOM: On the latest kernel patches of RHEL 6.5 and RHEL 6.4, the umount(8) system call hangs if an application watches for inode events using the inotify(7) APIs. DESCRIPTION: On the latest kernel patches of RHEL 6.5 and RHEL 6.4, additional OS counters were added in the super block to track inotify watches. These new counters were not implemented in VxFS for the RHEL6.5/RHEL6.4 kernels. Hence, while doing the umount, the operation hangs until the counter in the superblock drops to zero, which would never happen since the counters are not handled in VxFS. RESOLUTION: The code is modified to handle the additional counters added in the super block of the latest RHEL6.5/RHEL6.4 kernels. * 3848508 (Tracking ID: 3867128) SYMPTOM: Assert failed in internal native AIO testing. DESCRIPTION: On RHEL5/SLES10, in NAIO, the iovec comes from the kernel stack. When the work item is handed off to the worker thread, the work item points to an iovec structure in a stack frame which no longer exists, so the iovec's memory can be corrupted when it is reused for a new stack frame. RESOLUTION: Code is modified to allocate the iovec dynamically in the NAIO hand-off code and copy it into the work item before doing the handoff. * 3851967 (Tracking ID: 3852324) SYMPTOM: Assert failure during internal stress testing. DESCRIPTION: While reading the partition directory, the offset is right-shifted by 8, but while retrying, the offset wasn't left-shifted back to its original value. This can lead to an offset of 0, which results in the assert failure. RESOLUTION: Code is modified to left-shift the offset by 8 before retrying. * 3852297 (Tracking ID: 3553328) SYMPTOM: During internal testing it was found that the per-node LCT file was corrupted, due to which attribute inode reference counts were mismatching, resulting in fsck failure. DESCRIPTION: During clone creation, the LCT from the 0th pindex is copied to the new clone's LCT.
Any update to this LCT file from a non-zeroth pindex can cause a count mismatch in the new fileset. RESOLUTION: The code is modified to handle this issue. * 3852512 (Tracking ID: 3846521) SYMPTOM: cp -p fails with EINVAL for files with a 10-digit modification time. The EINVAL error is returned if the value in the tv_nsec field is outside the range 0 to 999,999,999. VxFS supports the update in usec, but when copying in user space the usec is converted to nsec; in this case the usec has crossed its upper boundary limit, i.e. 999,999. DESCRIPTION: In a cluster, it is possible that time differs across nodes. When updating mtime, VxFS checks if it is a cluster inode and, if the other node's mtime is newer than the current node's time, increments tv_usec instead of changing mtime to an older time value. There is a chance that the tv_usec counter overflows here, which resulted in a 10-digit mtime.tv_nsec. RESOLUTION: Code is modified to reset the usec counter for mtime/atime/ctime when the upper boundary limit, i.e. 999,999, is reached. * 3861518 (Tracking ID: 3549057) SYMPTOM: The "relatime" mount option is wrongly shown in /proc/mounts. DESCRIPTION: The "relatime" mount option is wrongly shown in /proc/mounts. VxFS does not understand the relatime mount option; it comes from the Linux kernel. RESOLUTION: Code is modified to handle the issue. * 3862350 (Tracking ID: 3853338) SYMPTOM: Files on VxFS are corrupted while running a sequential write workload under high memory pressure. DESCRIPTION: VxFS may sometimes miss writes under excessive write workload. The corruption occurs because of a race between the writer thread, which is doing sequential asynchronous writes, and the flusher thread, which flushes the in-core dirty pages. Due to an overlapping write, they are serialized over a page lock. Because of an optimization, this lock is released, leaving a small window where the waiting thread can race.
RESOLUTION: The code is modified to fix the race by reloading the inode write size after taking the page lock. * 3862425 (Tracking ID: 3859032) SYMPTOM: The system panics in vx_tflush_map() due to a NULL pointer dereference. DESCRIPTION: When converting to VxFS using vxconvert, new blocks are allocated to the structural files like the smap, and these can contain garbage. This is done with the expectation that fsck will rebuild the correct smap. But fsck missed distinguishing between an EAU that is fully EXPANDED and one that is only ALLOCATED. Because of this, if an allocation is done to a file whose last allocation came from such an affected EAU, a sub-transaction is created on an EAU which is still in the allocated state. Map buffers of such EAUs are not initialized properly in the VxFS private buffer cache; as a result, these buffers are released back as stale during the transaction commit. Later, if any file-system-wide sync tries to flush the metadata, it can refer to these buffer pointers and panic, as these buffers have already been released and reused. RESOLUTION: Code is modified in fsck to correctly set the state of the EAU on disk. The involved code paths are also modified so as to avoid doing transactions on unexpanded EAUs. * 3862435 (Tracking ID: 3833816) SYMPTOM: In a CFS cluster, one node returns stale data. DESCRIPTION: In a 2-node CFS cluster, when node 1 opens the file and writes to it, the locks are used with the CFS_MASTERLESS flag set. But when node 2 tries to open the file and write to it, the locks on node 1 are normalized as part of the HLOCK revoke. After the HLOCK revoke on node 1, when node 2 takes the PG lock grant to write, there is no PG lock revoke on node 1, so the dirty pages on node 1 are not flushed and invalidated. The problem results in reads returning stale data on node 1.
RESOLUTION: The code is modified to cache the PG lock before normalizing it in vx_hlock_putdata, so that after the normalization the cache grant is still with node 1. When node 2 requests the PG lock, there is a revoke on node 1 which flushes and invalidates the pages. * 3864139 (Tracking ID: 3869174) SYMPTOM: The write system call might get into a deadlock on RHEL5 and SLES10. DESCRIPTION: The issue exists due to page fault handling while holding a page lock. On RHEL5 and SLES10, a write may hold page locks; if a page fault then happens, the page fault handler waits on a lock which is already held, resulting in a deadlock. RESOLUTION: This behavior has been taken care of. The code now prefaults the user buffer so that the deadlock is avoided. * 3864751 (Tracking ID: 3867147) SYMPTOM: Assert failed in internal dedup testing. DESCRIPTION: If dedup is run on the same file twice simultaneously and an extent split happens, the second extent split of the same file can cause an assert failure. It is due to stale extent information being used for the second split after the first one. RESOLUTION: Code is modified to look up the new bmap if the same inode is detected. * 3866970 (Tracking ID: 3866962) SYMPTOM: Data corruption is seen when dalloc writes are going on a file and fsync is simultaneously started on the same file. DESCRIPTION: If dalloc writes are going on a file and synchronous flushing is simultaneously started on the same file, the synchronous flush tries to flush all the dirty pages of the file without considering the underlying allocation. In this case, flushing can happen on unallocated blocks, and this can result in data loss. RESOLUTION: Code is modified to flush data only up to the actual allocation in the case of dalloc writes.
* 3622423 (Tracking ID: 3250239) SYMPTOM: Panic in vx_naio_worker, with the following stack trace: crash_kexec() __die () do_page_fault() error_exit() vx_naio_worker() vx_kthread_init() kernel_thread() DESCRIPTION: If a thread submitting Linux native AIO has POSIX sibling threads and simply exits after the submit, the kernel will not wait for the AIO to finish in the exit processing. When the AIO completes, it may dereference the task_struct of that exited thread, and this causes the panic. RESOLUTION: A VxFS hook is installed in the task_struct, so that when the thread exits, it waits for its AIO to complete whether it is cloned or not. Set the module load-time tunable vx_naio_wait to a non-zero value to turn on this fix. * 3673958 (Tracking ID: 3660422) SYMPTOM: On RHEL 6.6, the umount(8) system call hangs if an application is watching for inode events using the inotify(7) APIs. DESCRIPTION: On RHEL 6.6, additional counters were added in the super block to track inotify watches; these new counters were not implemented in VxFS. Hence, while doing the umount, the operation hangs until the counter in the superblock drops to zero, which would never happen since the counters are not handled in VxFS. RESOLUTION: Code is modified to handle the additional counters added in RHEL6.6. * 3678626 (Tracking ID: 3613048) SYMPTOM: The system can panic with the following stack: machine_kexec crash_kexec oops_end die do_invalid_op invalid_op aio_complete vx_naio_worker vx_kthread_init DESCRIPTION: VxFS does not correctly support the vectored AIO commands IOCB_CMD_PREADV and IOCB_CMD_PWRITEV, which causes a BUG to fire in the kernel code (in fs/aio.c:__aio_put_req()). RESOLUTION: Added support for the vectored AIO commands and fixed the increment of ->ki_users so it is guarded by the required spinlock. * 3409692 (Tracking ID: 3402618) SYMPTOM: The mmap read performance on VxFS is slow. DESCRIPTION: The mmap read performance on VxFS is poor because the read-ahead operation is not triggered while the mmap reads are executed.
RESOLUTION: An enhancement has been made to the read-ahead operation. It helps improve the mmap read performance. * 3426009 (Tracking ID: 3412667) SYMPTOM: On RHEL 6, the inode update operation may create a deep stack and cause a system panic due to stack overflow. Below is the stack trace: dequeue_entity() dequeue_task_fair() dequeue_task() deactivate_task() thread_return() io_schedule() get_request_wait() blk_queue_bio() generic_make_request() submit_bio() vx_dev_strategy() vx_bc_bwrite() vx_bc_do_bawrite() vx_bc_bawrite() vx_bwrite() vx_async_iupdat() vx_iupdat_local() vx_iupdat_clustblks() vx_iupdat_local() vx_iupdat() vx_iupdat_tran() vx_tflush_inode() vx_fsq_flush() vx_tranflush() vx_traninit() vx_get_alloc() vx_tran_get_alloc() vx_alloc_getpage() vx_do_getpage() vx_internal_alloc() vx_write_alloc() vx_write1() vx_write_common_slow() vx_write_common() vx_vop_write() vx_writev() vx_naio_write_v2() do_sync_readv_writev() do_readv_writev() vfs_writev() nfsd_vfs_write() nfsd_write() nfsd3_proc_write() nfsd_dispatch() svc_process_common() svc_process() nfsd() kthread() kernel_thread() DESCRIPTION: Some VxFS operations may need an inode update. This may create a very deep stack and cause a system panic due to stack overflow. RESOLUTION: The code is modified to add a handoff point in the inode update function. If the stack usage reaches a threshold, a separate thread is started to do the work, to limit stack usage. * 3469683 (Tracking ID: 3469681) SYMPTOM: Free space defragmentation results in an EBUSY error and the file system is disabled. DESCRIPTION: While remounting the file system, the re-initialization gives an EBUSY error if the in-core and on-disk version numbers of an inode do not match. When pushing data blocks to the clone, the inode version of the immediate clone inode is bumped. But if there is another clone in the chain, then the ILIST extent of this immediate clone inode is not pushed onto that clone. This is not right, because the inode has been modified.
RESOLUTION: The code is modified so that the ILIST extents of the immediate clone inode are pushed onto the next clone in the chain. * 3498950 (Tracking ID: 3356947) SYMPTOM: VxFS doesn't work as fast as expected when multi-threaded writes are issued onto a file, interspersed with fsync calls. DESCRIPTION: When multi-threaded writes are issued with fsync calls in between the writes, fsync can serialize the writes by taking the IRWLOCK on the inode and doing whole-file putpages. Therefore, out-of-the-box performance is relatively slow in terms of throughput. RESOLUTION: The code is fixed to remove fsync's serialization with the IRWLOCK and make it conditional only for some cases. * 3498954 (Tracking ID: 3352883) SYMPTOM: During the rename operation, lots of nfsd threads waiting for a mutex operation hang with the following stack traces: vxg_svar_sleep_unlock vxg_get_block vxg_api_initlock vx_glm_init_blocklock vx_cbuf_lookup vx_getblk_clust vx_getblk_cmn vx_getblk vx_fshdchange vx_unlinkclones vx_clone_dispose vx_workitem_process vx_worklist_process vx_worklist_thread vx_kthread_init vxg_svar_sleep_unlock vxg_grant_sleep vxg_cmn_lock vxg_api_trylock vx_glm_trylock vx_glmref_trylock vx_mayfrzlock_try vx_walk_fslist vx_log_sync vx_workitem_process vx_worklist_process vx_worklist_thread vx_kthread_init DESCRIPTION: A race condition is observed between the NFS rename and the additional dentry alias created by the current vx_splice_alias() function. This race condition causes two different directory dentries to point to the same inode, which results in a mutex deadlock in the lock_rename() function. RESOLUTION: The code is modified to change the vx_splice_alias() function to prevent the creation of the additional dentry alias.
* 3498963 (Tracking ID: 3396959) SYMPTOM: On RHEL 6.4 with insufficient free memory, creating a file may panic the system with the following stack trace: shrink_mem_cgroup_zone() shrink_zone() do_try_to_free_pages() try_to_free_pages() __alloc_pages_nodemask() alloc_pages_current() __get_free_pages() vx_getpages() vx_alloc() vx_bc_getfreebufs() vx_bc_getblk() vx_getblk_bp() vx_getblk_cmn() vx_getblk() vx_iread() vx_local_iread() vx_iget() vx_ialloc() vx_dirmakeinode() vx_dircreate() vx_dircreate_tran() vx_pd_create() vx_create1_pd() vx_do_create() vx_create1() vx_create_vp() vx_create() vfs_create() do_filp_open() do_sys_open() sys_open() system_call_fastpath() DESCRIPTION: VxFS estimates the stack that is required to perform various kernel operations and creates hand-off threads if the estimated stack usage goes above the allowed kernel limit. However, the estimation may go wrong when the system is under heavy memory pressure, as some Linux kernel changes in RHEL 6.4 increase the depth of the stack. Additional functions may be called during getpage to alleviate the situation, which leads to increased stack usage. RESOLUTION: The code is modified to adjust the stack depth calculations so as to correctly estimate the stack usage under memory pressure conditions. * 3498976 (Tracking ID: 3434811) SYMPTOM: In VxFS 6.1, the vxfsconvert(1M) command hangs within the vxfsl3_getext() function with the following stack trace: search_type() bmap_typ() vxfsl3_typext() vxfsl3_getext() ext_convert() fset_convert() convert() DESCRIPTION: There is a type-casting problem for the extent size. It may cause a non-zero value to overflow and turn into zero by mistake. This further leads to infinite looping inside the function. RESOLUTION: The code is modified to remove the intermediate variable and avoid the type cast.
* 3498978 (Tracking ID: 3424564) SYMPTOM: fsppadm fails with ENODEV and "file is encrypted or is not a database" errors. DESCRIPTION: The error handler was missing for ENODEV while processing only the directory inodes, and the database got corrupted for the second error. RESOLUTION: An error handler is added to ignore ENODEV while processing directory inodes only; for the database corruption, a log message is added to capture all the db logs to understand why the corruption happened. * 3498998 (Tracking ID: 3466020) SYMPTOM: The file system is corrupted, with the following error messages in the log: WARNING: msgcnt 28 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile file system dir inode 3277090 dev/block 0/0 diren WARNING: msgcnt 27 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile file system dir inode 3277090 dev/block 0/0 diren WARNING: msgcnt 26 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile file system dir inode 3277090 dev/block 0/0 diren WARNING: msgcnt 25 mesg 096: V-2-96: vx_setfsflags - /dev/vx/dsk/a2fdc_cfs01/trace_lv01 file system fullfsck flag set - vx_direr WARNING: msgcnt 24 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile file system dir inode 3277090 dev/block 0/0 diren DESCRIPTION: In case an error is returned from the function vx_dirbread() via the function vx_dexh_keycheck1(), the FULLFSCK flag is set on the FS unconditionally. A corrupted LDH can lead to the reading of the wrong block, which results in the setting of the FULLFSCK flag. The system doesn't verify whether it is reading the wrong value due to a corrupted LDH, so the FULLFSCK flag is set unnecessarily, even though a corrupted LDH can be fixed online by recreating the hash. RESOLUTION: The code is modified such that when a corruption of the LDH is detected, the system removes the Large Directory Hash instead of setting FULLFSCK. The Large Directory Hash will then be recreated the next time the directory is modified.
* 3499005 (Tracking ID: 3469644)
SYMPTOM: The system panics in the vx_logbuf_clean() function while traversing the chain of transactions off the intent log buffer. The stack trace is as follows:
vx_logbuf_clean () vx_logadd () vx_log() vx_trancommit() vx_exh_hashinit () vx_dexh_create () vx_dexh_init () vx_pd_rename () vx_rename1_pd() vx_do_rename () vx_rename1 () vx_rename () vx_rename_skey ()
DESCRIPTION: The system panics because the vx_logbuf_clean() function tries to access an already freed transaction from the transaction chain while flushing it to the log.
RESOLUTION: The code is modified to make sure that a transaction is flushed to the log before it is freed.
* 3499008 (Tracking ID: 3484336)
SYMPTOM: The fidtovp() system call can panic in the vx_itryhold_locked() function with the following stack trace:
vx_itryhold_locked vx_iget vx_common_vget vx_do_vget vx_vget_skey vfs_vget fidtovp kernel_add_gate_cstack nfs3_fhtovp rfs3_getattr rfs_dispatch svc_getreq threadentry [kdb_read_mem]
DESCRIPTION: Some VxFS operations like the vx_vget() function try to get a hold on an in-core inode using the vx_itryhold_locked() function, but do not take the lock on the corresponding directory inode. This may lead to a race condition when this inode is present on the delicache list and is inactivated, which results in a panic when the vx_itryhold_locked() function tries to remove it from a free list. This is a known issue, but the previous fix was not complete; it missed some functions that can also cause the race condition.
RESOLUTION: The code is modified to take the inode list lock inside the vx_inactive_tran(), vx_tranimdone() and vx_tranuninode() functions to avoid the race condition.
* 3499011 (Tracking ID: 3486726)
SYMPTOM: VFR logs too much data on the target node.
DESCRIPTION: The target node logs debug-level messages even if the debug mode is off. It also does not consider the debug mode specified at the time of job creation.
RESOLUTION: The code is modified to not log debug-level messages on the target node if the specified debug mode is set to off.
* 3499014 (Tracking ID: 3471245)
SYMPTOM: MongoDB fails to insert any record because lseek fails to seek to the EOF.
DESCRIPTION: On Linux, fallocate does not update the inode's i_size, which leaves lseek unable to seek to the EOF.
RESOLUTION: Before returning from the vx_fallocate() function, the vx_getattr() function is called to update the Linux inode with the VxFS inode.
* 3499030 (Tracking ID: 3484353)
SYMPTOM: A self-deadlock is caused by a missing unlock of DIRLOCK. Its typical stack trace is like the following:
slpq_swtch_core() real_sleep() sleep_one() vx_smp_lock() vx_dirlock() vx_do_rename() vx_rename1() vx_rename() vn_rename() rename() syscall()
DESCRIPTION: When the partitioned directory (PD) feature of Veritas File System (VxFS) is enabled, there is a possibility of self-deadlock when multiple renaming threads operate on the same target directory. The issue is caused by a missing unlock of DIRLOCK in the vx_int_rename() function.
RESOLUTION: The code is modified by adding the missing unlock for the directory lock in the vx_int_rename() function.
* 3514824 (Tracking ID: 3443430)
SYMPTOM: Fsck allocates too much memory.
DESCRIPTION: Since Storage Foundation 6.0, parallel inode list processing with multiple threads was introduced to help reduce fsck time. However, when the inode list has many holes, the parallel threads allocate redundant memory instead of reusing buffers in the buffer cache efficiently.
RESOLUTION: The code is fixed to make each thread maintain its own buffer cache from which it can reuse free memory.
* 3515559 (Tracking ID: 3498048)
SYMPTOM: While the system is making a backup, the "ls -l" command on the same file system may hang.
DESCRIPTION: When the dalloc (delayed allocation) feature is turned on, flushing takes quite a lot of time while holding the getpage lock, which is needed by writers that hold the read-write lock on inodes. The "ls -l" command needs ACLs (access control lists) to display its information, but in Veritas File System (VxFS) ACLs are accessed only under protection of the inode read-write lock, which results in the hang.
RESOLUTION: The code is modified to turn dalloc off and improve write throttling by restricting the kernel flusher from updating the internal counter for write page flushes.
* 3515569 (Tracking ID: 3430461)
SYMPTOM: Nested unmounts as well as force unmounts fail if the parent file system is disabled, which further inhibits unmounting of the child file system.
DESCRIPTION: If a file system is mounted inside another VxFS mount and the parent file system gets disabled, then it is not possible to sanely unmount the child even with a force unmount. This is because a disabled file system does not allow directory lookup on it, and on Linux a file system can be unmounted only by providing the path of the mount point.
RESOLUTION: The code is modified to allow the exceptional path lookup for unmounts. These are read-only operations and hence safer, which makes it possible for the unmount of the child file system to proceed.
* 3515588 (Tracking ID: 3294074)
SYMPTOM: The fsetxattr() system call is slower on Veritas File System (VxFS) than on the ext3 file system.
DESCRIPTION: VxFS implements the fsetxattr() system call in a synchronous way: it flushes the data to disk before returning from the system call to guarantee file system consistency in case of a crash, which takes time.
RESOLUTION: The code is modified to allow the transaction to flush the data in a delayed way.
* 3515737 (Tracking ID: 3511232)
SYMPTOM: The kernel panics in the vx_write_alloc() function with the following stack trace:
__schedule_bug thread_return schedule_timeout wait_for_common wait_for_completion vx_getpages_handoff vx_getpages vx_alloc vx_mklbtran vx_te_bufalloc vx_te_bmap_split vx_te_bmap_alloc vx_bmap_alloc_typed vx_bmap_alloc vx_get_alloc vx_tran_get_alloc vx_alloc_getpage vx_do_getpage vx_internal_alloc vx_write_alloc vx_write1 vx_write_common_slow vx_write_common vx_write vx_naio_write vx_naio_write_v2 aio_rw_vect_retry aio_run_iocb do_io_submit sys_io_submit system_call_fastpath
DESCRIPTION: During a memory allocation, the stack overflows and corrupts the thread_info structure of the thread. When the system then sleeps for a handed-off result, the corruption caused by the overflow is detected and the system panics.
RESOLUTION: The code is fixed to detect thread_info corruptions, and to change some stack allocations to dynamic allocations to save stack usage.
* 3515739 (Tracking ID: 3510796)
SYMPTOM: The system panics on a Linux 3.0.x kernel when VxFS cleans inode chunks.
DESCRIPTION: On Linux, the kernel swap daemon (kswapd) takes a reference hold on a page but not on the owning inode. In vx_softcnt_flush(), when the inode's final softcnt drops, the kernel calls the vx_real_destroy_inode() function. On 3.x kernels, vx_real_destroy_inode() temporarily clears the address-space operations. This creates a window where kswapd works on a page without VxFS's address-space operations set, which results in a panic.
RESOLUTION: The code is fixed to flush all pages of an inode before it calls the vx_softcnt_flush() function.
* 3517702 (Tracking ID: 3517699)
SYMPTOM: Return code 240 of the fsfreeze(1M) command is not documented in the fsfreeze man page.
DESCRIPTION: Return code 240 of the fsfreeze(1M) command is not documented in the fsfreeze man page.
RESOLUTION: The fsfreeze(1M) man page is modified to document return code 240.
* 3517707 (Tracking ID: 3093821)
SYMPTOM: The system panics due to referring to a freed super block after a vx_unmount() error.
DESCRIPTION: In the file system unmount process, once the Linux VFS calls into the VxFS-specific unmount processing vx_unmount(), it does not expect an error from this call. So once the vx_unmount() function returns, Linux frees the file system's corresponding super_block object. If any error is observed during vx_unmount(), however, the file system's inodes may be left in the VxFS inode cache as-is (when there is no error, the inodes are processed and dropped by vx_unmount()). Inodes left in the VxFS inode cache still point to the freed super_block object, so when these cached inodes are later cleaned up for freeing or reuse, they may refer to the freed super block in certain cases, which can lead to a panic due to a NULL pointer dereference.
RESOLUTION: EIO or ENXIO errors are no longer returned from vx_detach_fset() when the file system is unmounted. Instead of returning an error, the inodes are processed and dropped from the inode cache.
* 3557193 (Tracking ID: 3473390)
SYMPTOM: In memory-pressure scenarios, panics/system crashes are seen due to stack overflows.
DESCRIPTION: Specifically on RHEL6, the memory allocation routines consume much more stack than on other distributions like SLES, or even RHEL5. Due to this, multiple overflows are reported on the RHEL6 platform. Most of these overflows occur when Veritas File System (VxFS) tries to allocate memory under memory pressure.
RESOLUTION: The code is modified to fix multiple overflows by adding handoff codepaths, adjusting handoff limits, removing on-stack structures and reducing the number of function frames on the stack wherever possible.
* 3567855 (Tracking ID: 3567854)
SYMPTOM: On Veritas File System (VxFS) 6.0.5:
* If the file system has no external quota file, the vxedquota(1M) command gives an error.
* If the first mounted VxFS file system in the mnttab file contains no quota file, the vxrepquota(1M) command fails.
DESCRIPTION:
* The vxedquota(1M) issue: When the vxedquota(1M) command is executed, it expects the 32-bit quota file to be present on the mount point, even if the mount point does not contain any external quota file. This results in an error message while editing the quotas. This issue is fixed by processing only the file systems on which quota files are present.
* The vxrepquota(1M) issue: When vxrepquota(1M) is executed, the command scans the mnttab file to look for VxFS file systems which have external quota files. But if the first VxFS file system does not contain any quota file, the command gives an error and does not report any quota information. This issue is fixed by looping through the mnttab file until a mount point in which quota files exist is found.
RESOLUTION:
* For the vxedquota(1M) issue, the code is modified to skip the file systems which do not contain quota files.
* For the vxrepquota(1M) issue, the code is modified to obtain a VxFS mount point (with quota files) from the mnttab file.
* 3579957 (Tracking ID: 3233315)
SYMPTOM: The "fsck" utility dumps core while checking the RCT file.
DESCRIPTION: The "bmap_search_typed()" function is passed a wrong parameter, which results in a core dump with the following stack trace:
bmap_get_typeparms () bmap_search_typed_raw() bmap_search_typed() rct_walk() bmap_check_typed_raw() rct_check() main()
RESOLUTION: The code is fixed to pass the correct parameters to the "bmap_search_typed()" function.
* 3581566 (Tracking ID: 3560968)
SYMPTOM: The delicache_enable tunable is inconsistent in the CFS environment.
DESCRIPTION: On the secondary nodes, the tunable values are exported from the primary mount, while the delicache_enable tunable value comes from the "tunefstab" file. Therefore the tunable values are not persistent.
RESOLUTION: The code is fixed to read the "tunefstab" file only for the delicache_enable tunable during mount and set the value accordingly.
* 3584297 (Tracking ID: 3583930)
SYMPTOM: When the external quota file is overwritten or restored from backup, new settings which were added after the backup still remain.
DESCRIPTION: The purpose of the quotaon operation is to copy the quota limits from the external to the internal quota file, because the internal quota file is not always updated with correct limits. To complete the copy operation, the extent of the external file is compared to the extent of the internal file at the corresponding offset. If the external quota file is overwritten (or restored to its original copy) and the size of the internal file is larger than that of the external file, the quotaon operation does not clear the additional (stale) quota records in the internal file. Later, the sync operation (part of quotaon) copies these stale records from the internal to the external file. Hence, both internal and external files contain stale records.
RESOLUTION: The code is modified to get rid of the stale records in the internal file at the time of quotaon.
* 3588236 (Tracking ID: 3604007)
SYMPTOM: A stack overflow was observed in the extent allocation code path.
DESCRIPTION: In the extent allocation code path during write, a stack overflow is seen on SLES 11. A hand-off point already exists in this code path, but the hand-off did not happen because the remaining stack was a few bytes above the threshold that triggers the hand-off.
RESOLUTION: The value of the trigger point is changed so that the hand-off takes place.
* 3590573 (Tracking ID: 3331010)
SYMPTOM: The fsck(1M) command dumped core with a segmentation fault. The following stack trace is observed:
fakebmap() rcq_apply_op() rct_process_pending_tasklist() process_device() main()
DESCRIPTION: While working on the device in the process_device() function, the fsck command tries to access already freed device-related structures available in the pending task list during the retry code path.
RESOLUTION: The code is modified to free up the pending task list before retrying in the process_device() function.
* 3595894 (Tracking ID: 3595896)
SYMPTOM: While creating the Oracle RAC 12.1.0.2 database, the node panics with the following stack:
aio_complete() vx_naio_do_work() vx_naio_worker() vx_kthread_init()
DESCRIPTION: For a zero-size request (with a correctly aligned buffer), Veritas File System (VxFS) erroneously queues the work internally and returns -EIOCBQUEUED. The kernel calls the aio_complete() function for this zero-size request. However, while VxFS is performing the queued work internally, the aio_complete() function gets called again. The double call of aio_complete() results in the panic.
RESOLUTION: The code is modified such that zero-size requests do not queue elements inside the VxFS work queue.
* 3597454 (Tracking ID: 3602386)
SYMPTOM: The vfradmin man page shows incorrect information about the default behavior of the -d option.
DESCRIPTION: When the vfradmin command is run without the -d option, debugging is ENABLED by default, but the man page indicates that debugging is DISABLED by default.
RESOLUTION: The vfradmin man page is changed to reflect the correct default behavior.
* 3597560 (Tracking ID: 3597482)
SYMPTOM: The pwrite(2) function fails with EOPNOTSUPP when the write range is in two indirect extents.
DESCRIPTION: When the range of pwrite() falls in two indirect extents, one a ZFOD extent belonging to DB2 pre-allocated files created with the setext( , VX_GROWFILE, ) ioctl and the other a DATA extent belonging to an adjacent INDIR, the write fails with EOPNOTSUPP.
The reason is that Veritas File System (VxFS) tries to coalesce extents which belong to different indirect address extents as part of this transaction; such a metadata change consumes more transaction resources than the VxFS transaction engine can support in the current implementation.
RESOLUTION: The code is modified to retry the write transaction without combining the extents.
* 2705336 (Tracking ID: 2059611)
SYMPTOM: The system panics due to a NULL pointer dereference while flushing bitmaps to the disk, and the following stack trace is displayed:
vx_unlockmap+0x10c vx_tflush_map+0x51c vx_fsq_flush+0x504 vx_fsflush_fsq+0x190 vx_workitem_process+0x1c vx_worklist_process+0x2b0 vx_worklist_thread+0x78
DESCRIPTION: The vx_unlockmap() function unlocks a map structure of the file system. If the map is being used, the hold count is incremented. The vx_unlockmap() function attempts to check whether this is an empty mlink doubly linked list. The asynchronous vx_mapiodone routine can change the link at random even though the hold count is zero.
RESOLUTION: The code is modified to change the evaluation rule inside the vx_unlockmap() function, so that further evaluation is skipped when the map hold count is zero.
* 2978234 (Tracking ID: 2972183)
SYMPTOM: "fsppadm enforce" takes longer to force update the secondary nodes than it takes to force update the primary nodes.
DESCRIPTION: The ilist is force updated on the secondary node. As a result, performance on the secondary becomes low.
RESOLUTION: The ilist file on secondary nodes is force updated only on an error condition.
* 2982161 (Tracking ID: 2982157)
SYMPTOM: During internal testing, the "f:vx_trancommit:4" debug assert was hit when the available transaction space is less than the required space.
DESCRIPTION: The "f:vx_trancommit:4" assert is hit when the available transaction space is less than required.
During file truncate operations, when VxFS calculates transaction space, it does not consider the transaction space required in case the file has shared extents. As a result, the "f:vx_trancommit:4" debug assert is hit.
RESOLUTION: The code is modified to take into account the extra transaction buffer space required when the file being truncated has shared extents.
* 2999566 (Tracking ID: 2999560)
SYMPTOM: While trying to clear the 'metadataok' flag on a volume of a volume set, the fsvoladm(1M) command gives an error.
DESCRIPTION: The fsvoladm(1M) command sets and clears the 'dataonly' and 'metadataok' flags on a volume in a vset on which VxFS is mounted. The fsvoladm(1M) command fails while clearing the 'metadataok' flag and reports an EINVAL (invalid argument) error for certain volumes. This failure occurs because while clearing the flag, VxFS reinitializes the reorg structure for some volumes. During re-initialization, VxFS frees the existing FS structures but still refers to the stale device structure, resulting in an EINVAL error.
RESOLUTION: The code is modified to let the in-core device structure point to the updated and correct data.
* 3027250 (Tracking ID: 3031901)
SYMPTOM: The vxtunefs(1M) command accepts garbage values for the 'max_buf_data_size' tunable.
DESCRIPTION: When a garbage value for the 'max_buf_data_size' tunable is specified using vxtunefs(1M), the tunable accepts the value and reports a successful update, but the value does not actually get reflected in the system. The error is not identified when the command-line value of the 'max_buf_data_size' tunable is parsed; hence the garbage value for this tunable is accepted.
RESOLUTION: The code is modified to handle the error returned from parsing the command-line value of the 'max_buf_data_size' tunable.
* 3056103 (Tracking ID: 3197901)
SYMPTOM: fset_get fails for the mentioned configuration.
DESCRIPTION: There is a duplicate symbol fs_bmap in the VxFS libvxfspriv.a and vxfspriv.so libraries.
RESOLUTION: The duplicate symbol fs_bmap in the VxFS libvxfspriv.a and vxfspriv.so libraries is fixed by renaming it to fs_bmap_priv in libvxfspriv.a.
* 3059000 (Tracking ID: 3046983)
SYMPTOM: There is an invalid CFS node number () in ".__fsppadm_fclextract". This causes the Dynamic Storage Tiering (DST) policy enforcement to fail.
DESCRIPTION: DST policy enforcement sometimes depends on the extraction of the File Change Log (FCL). When the FCL change log is processed, the FCL records are read from the change log into a buffer. If the buffer is not big enough to hold the records, a rollback is done and the needed buffer size is passed out. However, the rollback is not complete, which results in the problem.
RESOLUTION: The code is modified to also roll back the content of "fh_bp1->fb_addr" and "fh_bp2->fb_addr".
* 3108176 (Tracking ID: 2667658)
SYMPTOM: An attempt to perform an fscdsconv endian conversion from the SPARC big-endian byte order to the x86 little-endian byte order fails because of a macro overflow.
DESCRIPTION: Using the fscdsconv(1M) command to perform an endian conversion from the SPARC big-endian (any SPARC architecture machine) byte order to the x86 little-endian (any x86 architecture machine) byte order fails. The write operation for the recovery file results in an overflow of the control data offset (a macro hard-coded to 500MB).
RESOLUTION: The code is modified to estimate the control-data offset explicitly and dynamically while creating and writing the recovery file.
* 3248029 (Tracking ID: 2439261)
SYMPTOM: When the vx_fiostats_tunable is changed from zero to non-zero, the system panics with the following stack trace:
vx_fiostats_do_update vx_fiostats_update vx_read1 vx_rdwr vno_rw rwuio pread
DESCRIPTION: When vx_fiostats_tunable is changed from zero to non-zero, all the in-core inode fiostats attributes are set to NULL. When these attributes are accessed, the system panics due to the NULL pointer dereference.
RESOLUTION: The code is modified to check that the file I/O stat attributes are present before dereferencing the pointers.
* 3248042 (Tracking ID: 3072036)
SYMPTOM: Reads from a secondary node in CFS can sometimes fail with ENXIO (No such device or address).
DESCRIPTION: The in-core attribute ilist on the secondary node is out of sync with that of the primary.
RESOLUTION: The code is modified such that the in-core attribute ilist on the secondary node is force updated with data from the primary node.
* 3248046 (Tracking ID: 3092114)
SYMPTOM: The information output by the "df -i" command can often be inaccurate for cluster-mounted file systems.
DESCRIPTION: In the Cluster File System 5.0 release, a concept of delegating metadata to nodes in the cluster was introduced. This delegation of metadata allows CFS secondary nodes to update metadata without having to ask the CFS primary to do it, which provides greater node scalability. However, the "df -i" information is still collected by the CFS primary regardless of which node (primary or secondary) the "df -i" command is executed on. For inodes, the granularity of each delegation is an Inode Allocation Unit [IAU]; thus IAUs can be delegated to nodes in the cluster. When using a VxFS 1Kb file system block size, each IAU represents 8192 inodes; with a 2Kb block size, 16384 inodes; with a 4Kb block size, 32768 inodes; and with an 8Kb block size, 65536 inodes.
Each IAU contains a bitmap that determines whether each inode it represents is allocated or free; the IAU also contains a summary count of the number of inodes that are currently free in the IAU. The "df -i" information can be considered a simple sum of all the IAU summary counts. Using a 1Kb block size, IAU-0 represents inode numbers 0 - 8191, IAU-1 represents inode numbers 8192 - 16383, IAU-2 represents inode numbers 16384 - 24575, and so on. The inaccurate "df -i" count occurs because the CFS primary has no visibility of the current IAU summary information for IAUs that are delegated to secondary nodes. Therefore the number of allocated inodes within an IAU that is currently delegated to a CFS secondary node is not known to the CFS primary. As a result, the "df -i" count information for the currently delegated IAUs is collected from the primary's copy of the IAU summaries. Since the primary's copy of the IAU is stale, the "df -i" count is only accurate when no IAUs are currently delegated to CFS secondary nodes. In other words, the IAUs currently delegated to CFS secondary nodes cause the "df -i" count to be inaccurate. Once an IAU is delegated to a node, it can "timeout" after 3 minutes of inactivity. However, not all IAU delegations time out: one IAU always remains delegated to each node for performance reasons, and an IAU whose inodes are all allocated (so no free inodes remain in the IAU) does not time out either. The issue can be best summarized as: the more IAUs that remain delegated to CFS secondary nodes, the greater the inaccuracy of the "df -i" count.
RESOLUTION: The delegations for IAUs whose inodes are all allocated (so no free inodes remain in the IAU) are allowed to "timeout" after 3 minutes of inactivity.
* 3248054 (Tracking ID: 3153919)
SYMPTOM: The fsadm(1M) command may hang when the structural file set re-organization is in progress.
The following stack trace is observed:
vx_event_wait vx_icache_process vx_switch_ilocks_list vx_cfs_icache_process vx_switch_ilocks vx_fs_reinit vx_reorg_dostruct vx_extmap_reorg vx_struct_reorg vx_aioctl_full vx_aioctl_common vx_aioctl vx_ioctl vx_compat_ioctl compat_sys_ioctl
DESCRIPTION: During the structural file set re-organization, due to a race condition, the VX_CFS_IOWN_TRANSIT flag is set on the inode. At the final stage of the structural file set re-organization, all the inodes are re-initialized. Since the VX_CFS_IOWN_TRANSIT flag is set improperly, the re-initialization fails to proceed. This causes the hang.
RESOLUTION: The code is modified such that the VX_CFS_IOWN_TRANSIT flag is cleared.
* 3296988 (Tracking ID: 2977035)
SYMPTOM: While running an internal noise test in a Cluster File System (CFS) environment, a debug assert was hit in the vx_dircompact() function.
DESCRIPTION: Compacting directory blocks is avoided if the inode has "extop" (extended operation) flags set, such as deferred inode removal and pass-through truncation. The issue is caused when an inode that has extended pass-through truncation set is considered for compacting.
RESOLUTION: The code is modified to avoid compacting the directory blocks of the inode if it has the extended operation of pass-through truncation set.
* 3310758 (Tracking ID: 3310755)
SYMPTOM: When the system processes an indirect extent and finds the first record to be a Zero Fill-On-Demand (ZFOD) extent (or the first n records to be ZFOD records), it hits an assert.
DESCRIPTION: For indirect extents, the reference count mechanism (shared block count) does not behave correctly for files having shared ZFOD extents.
RESOLUTION: The code for the reference count queue (RCQ) handling of shared indirect ZFOD extents is modified, and the fsck(1M) issues with a snapshot of the file when there are ZFOD extents have been fixed.
* 3317118 (Tracking ID: 3317116)
SYMPTOM: An internal command conformance test for the mount command on RHEL6 Update 4 hit a debug assert inside the vx_get_sb_impl() function.
DESCRIPTION: In the RHEL6 Update 4 kernel security update 2.6.32-358.18.1, Red Hat changed the flag used to save the mount status of a dentry from d_flags to d_mounted. This resulted in a debug assert in the vx_get_sb_impl() function, as d_flags was used to check the mount status of a dentry on RHEL6.
RESOLUTION: The code is modified to use d_flags if the OS is RHEL6 Update 2, and d_mounted otherwise, to determine the mount status of a dentry.
* 3338024 (Tracking ID: 3297840)
SYMPTOM: A metadata corruption is found during the file removal process, with the inode block count becoming negative.
DESCRIPTION: When the user removes or truncates a file having shared indirect blocks, there can be an instance where the block count is updated to reflect the removal of the shared indirect blocks while the blocks are not yet removed from the file. The next iteration of the loop updates the block count again while removing these blocks. This eventually leads to the block count being a negative value after all the blocks are removed from the file. The removal code expects the block count to be zero before updating the rest of the metadata.
RESOLUTION: The code is modified to update the block count and other tracking metadata in the same transaction in which the blocks are removed from the file.
* 3338026 (Tracking ID: 3331419)
SYMPTOM: The machine panics with the following stack trace.
#0 [ffff883ff8fdc110] machine_kexec at ffffffff81035c0b #1 [ffff883ff8fdc170] crash_kexec at ffffffff810c0dd2 #2 [ffff883ff8fdc240] oops_end at ffffffff81511680 #3 [ffff883ff8fdc270] no_context at ffffffff81046bfb #4 [ffff883ff8fdc2c0] __bad_area_nosemaphore at ffffffff81046e85 #5 [ffff883ff8fdc310] bad_area at ffffffff81046fae #6 [ffff883ff8fdc340] __do_page_fault at ffffffff81047760 #7 [ffff883ff8fdc460] do_page_fault at ffffffff815135ce #8 [ffff883ff8fdc490] page_fault at ffffffff81510985 [exception RIP: print_context_stack+173] RIP: ffffffff8100f4dd RSP: ffff883ff8fdc548 RFLAGS: 00010006 RAX: 00000010ffffffff RBX: ffff883ff8fdc6d0 RCX: 0000000000002755 RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046 RBP: ffff883ff8fdc5a8 R8: 000000000002072c R9: 00000000fffffffb R10: 0000000000000001 R11: 000000000000000c R12: ffff883ff8fdc648 R13: ffff883ff8fdc000 R14: ffffffff81600460 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff883ff8fdc540] print_context_stack at ffffffff8100f4d1 #10 [ffff883ff8fdc5b0] dump_trace at ffffffff8100e4a0 #11 [ffff883ff8fdc650] show_trace_log_lvl at ffffffff8100f245 #12 [ffff883ff8fdc680] show_trace at ffffffff8100f275 #13 [ffff883ff8fdc690] dump_stack at ffffffff8150d3ca #14 [ffff883ff8fdc6d0] warn_slowpath_common at ffffffff8106e2e7 #15 [ffff883ff8fdc710] warn_slowpath_null at ffffffff8106e33a #16 [ffff883ff8fdc720] hrtick_start_fair at ffffffff810575eb #17 [ffff883ff8fdc750] pick_next_task_fair at ffffffff81064a00 #18 [ffff883ff8fdc7a0] schedule at ffffffff8150d908 #19 [ffff883ff8fdc860] __cond_resched at ffffffff81064d6a #20 [ffff883ff8fdc880] _cond_resched at ffffffff8150e550 #21 [ffff883ff8fdc890] vx_nalloc_getpage_lnx at ffffffffa041afd5 [vxfs] #22 [ffff883ff8fdca80] vx_nalloc_getpage at ffffffffa03467a3 [vxfs] #23 [ffff883ff8fdcbf0] vx_do_getpage at ffffffffa034816b [vxfs] #24 [ffff883ff8fdcdd0] vx_do_read_ahead at ffffffffa03f705e [vxfs] #25 [ffff883ff8fdceb0] vx_read_ahead at 
ffffffffa038ed8a [vxfs] #26 [ffff883ff8fdcfc0] vx_do_getpage at ffffffffa0347732 [vxfs] #27 [ffff883ff8fdd1a0] vx_getpage1 at ffffffffa034865d [vxfs] #28 [ffff883ff8fdd2f0] vx_fault at ffffffffa03d4788 [vxfs] #29 [ffff883ff8fdd400] __do_fault at ffffffff81143194 #30 [ffff883ff8fdd490] handle_pte_fault at ffffffff81143767 #31 [ffff883ff8fdd570] handle_mm_fault at ffffffff811443fa #32 [ffff883ff8fdd5e0] __get_user_pages at ffffffff811445fa #33 [ffff883ff8fdd670] get_user_pages at ffffffff81144999 #34 [ffff883ff8fdd690] vx_dio_physio at ffffffffa041d812 [vxfs] #35 [ffff883ff8fdd800] vx_dio_rdwri at ffffffffa02ed08e [vxfs] #36 [ffff883ff8fdda20] vx_write_direct at ffffffffa044f490 [vxfs] #37 [ffff883ff8fddaf0] vx_write1 at ffffffffa04524bf [vxfs] #38 [ffff883ff8fddc30] vx_write_common_slow at ffffffffa0453e4b [vxfs] #39 [ffff883ff8fddd30] vx_write_common at ffffffffa0454ea8 [vxfs] #40 [ffff883ff8fdde00] vx_write at ffffffffa03dc3ac [vxfs] #41 [ffff883ff8fddef0] vfs_write at ffffffff81181078 #42 [ffff883ff8fddf30] sys_pwrite64 at ffffffff81181a32 #43 [ffff883ff8fddf80] system_call_fastpath at ffffffff8100b072
DESCRIPTION: The panic is due to the kernel referring to a corrupted thread_info structure from the scheduler; thread_info was corrupted by a stack overflow. While doing a direct I/O write, user-space pages need to be pre-faulted using the __get_user_pages() code path. This code path is very deep and can end up consuming a lot of stack space.
RESOLUTION: The kernel stack consumption is reduced by ~400-500 bytes in this code path by making various changes in the way pre-faulting is done.
* 3338030 (Tracking ID: 3335272)
SYMPTOM: The mkfs (make file system) command dumps core when the log size provided is not aligned.
The following stack trace is displayed:
(gdb) bt #0 find_space () #1 place_extents () #2 fill_fset () #3 main () (gdb)
DESCRIPTION: While creating a VxFS file system using the mkfs command, if the log size provided is not aligned properly, the placement of the RCQ extents may be miscalculated and no place found for them. This leads to illegal memory access of the AU bitmap and results in a core dump.
RESOLUTION: The code is modified to place the RCQ extents in the same AU where the log extents are allocated.
* 3338063 (Tracking ID: 3332902)
SYMPTOM: The system running the fsclustadm(1M) command panics while shutting down. The following stack trace is logged along with the panic:
machine_kexec crash_kexec oops_end page_fault [exception RIP: vx_glm_unlock] vx_cfs_frlpause_leave [vxfs] vx_cfsaioctl [vxfs] vxportalkioctl [vxportal] vfs_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath
DESCRIPTION: There exists a race condition between "fsclustadm(1M) cfsdeinit" and "fsclustadm(1M) frlpause_disable". The "fsclustadm(1M) cfsdeinit" fails after cleaning the Group Lock Manager (GLM), without downgrading the CFS state. Under the false CFS state, "fsclustadm(1M) frlpause_disable" enters and accesses the GLM lock, which "fsclustadm(1M) cfsdeinit" has freed, resulting in a panic. Another race exists between the code in vx_cfs_deinit() and the code in fsck: although fsck holds a reservation, this cannot prevent vx_cfs_deinit() from freeing vx_cvmres_list, because there is no check for vx_cfs_keepcount.
RESOLUTION: The code is modified to add appropriate checks in "fsclustadm(1M) cfsdeinit" and "fsclustadm(1M) frlpause_disable" to avoid the race condition.
* 3338750 (Tracking ID: 2414266)
SYMPTOM: The fallocate(2) system call fails on VxFS file systems in the Linux environment.
DESCRIPTION: The fallocate(2) system call, which is used for pre-allocating file space on Linux, is not supported on VxFS.
RESOLUTION: The code is modified to support the fallocate(2) system call on VxFS in the Linux environment.

* 3338762 (Tracking ID: 3096834)
SYMPTOM: Intermittent vx_disable messages are displayed in the system log.
DESCRIPTION: VxFS displays intermittent vx_disable messages. The file system is not corrupt and the fsck(1M) command does not indicate any problem with it. However, the file system gets disabled.
RESOLUTION: The code is modified to make the vx_disable message verbose, with stack trace information, to facilitate further debugging.

* 3338776 (Tracking ID: 3224101)
SYMPTOM: On a file system that is mounted by a cluster, the system panics after you enable the lazy optimization for updating the i_size across the cluster nodes. The stack trace may look as follows:
vxg_free()
vxg_cache_free4()
vxg_cache_free()
vxg_free_rreq()
vxg_range_unlock_body()
vxg_api_range_unlock()
vx_get_inodedata()
vx_getattr()
vx_linux_getattr()
DESCRIPTION: On a file system that is mounted with the -o cluster option, read and write operations take a range lock to synchronize updates across the different nodes. The lazy optimization incorrectly allows a node to release a range lock that it has not acquired, which panics the node.
RESOLUTION: The code has been modified to release only those range locks which are acquired.
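As a quick illustration of the fallocate(2) support added in incident 3338750 above, a minimal Linux program can preallocate space and verify the resulting file size; the path and length here are arbitrary test values, not anything mandated by VxFS:

```c
/* Minimal sketch: preallocate space with fallocate(2) and report the
 * resulting file size.  Path and length are arbitrary test values. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the file size after preallocating 'len' bytes, or -1 on error. */
long preallocate(const char *path, off_t len)
{
    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return -1;
    if (fallocate(fd, 0, 0, len) != 0) {   /* mode 0: also extends file size */
        close(fd);
        return -1;
    }
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return (long)st.st_size;
}
```

On file systems without fallocate support the call fails with EOPNOTSUPP, which is exactly the failure this fix removes for VxFS.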
* 3338779 (Tracking ID: 3252983)
SYMPTOM: On a high-end system with 48 or more CPUs, some file-system operations may hang with the following stack trace:
vx_ilock()
vx_tflush_inode()
vx_fsq_flush()
vx_tranflush()
vx_traninit()
vx_tran_iupdat()
vx_idelxwri_done()
vx_idelxwri_flush()
vx_delxwri_flush()
vx_workitem_process()
vx_worklist_process()
vx_worklist_thread()
DESCRIPTION: The function that gets an inode returns an incorrect error value when no free inodes are available in core; this error value causes an inode to be allocated on disk instead of in core. As a result, the same function is called again, resulting in a continuous loop.
RESOLUTION: The code is modified to return the correct error code.

* 3338780 (Tracking ID: 3253210)
SYMPTOM: When the file system reaches its space limitation, it hangs with the following stack trace:
vx_svar_sleep_unlock()
default_wake_function()
wake_up()
vx_event_wait()
vx_extentalloc_handoff()
vx_te_bmap_alloc()
vx_bmap_alloc_typed()
vx_bmap_alloc()
vx_bmap()
vx_exh_allocblk()
vx_exh_splitbucket()
vx_exh_split()
vx_dopreamble()
vx_rename_tran()
vx_pd_rename()
DESCRIPTION: When the large directory hash is enabled through the vx_dexh_sz(5M) tunable, Veritas File System (VxFS) uses the large directory hash for directories. When you rename a file, a new directory entry is inserted into the hash table, which can result in a hash split. The hash split fails the current transaction, which is retried after some housekeeping jobs complete. These jobs include allocating more space for the hash table. However, VxFS does not check the return value of the preamble job. Thus, when VxFS runs out of space, the rename transaction is re-entered indefinitely without knowing whether the preamble jobs allocated more space.
RESOLUTION: The code is modified to enable VxFS to exit the loop when ENOSPC is returned from the preamble job.
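The ENOSPC fix above boils down to a retry loop that must inspect the return code of its housekeeping "preamble" step. A hedged, generic sketch of that pattern follows; all names (run_with_preamble, EAGAIN_RETRY, the demo callbacks) are illustrative, not VxFS internals:

```c
/* Hypothetical sketch of the fixed retry pattern: a transaction that is
 * retried after a "preamble" housekeeping step, but only while the
 * preamble itself succeeds.  ENOSPC from the preamble ends the loop
 * instead of looping forever.  All names are illustrative. */
#include <errno.h>

#define EAGAIN_RETRY 1  /* illustrative "retry the transaction" status */

static int run_with_preamble(int (*txn)(void), int (*preamble)(void))
{
    for (;;) {
        int rc = txn();
        if (rc != EAGAIN_RETRY)
            return rc;              /* success or a hard error */
        rc = preamble();
        if (rc == ENOSPC)
            return ENOSPC;          /* the fix: give up instead of retrying */
        if (rc != 0)
            return rc;
    }
}

/* Demo stand-ins: the transaction succeeds on its third attempt. */
static int txn_attempts;
static int demo_txn(void)          { return ++txn_attempts < 3 ? EAGAIN_RETRY : 0; }
static int demo_preamble_ok(void)  { return 0; }
static int demo_preamble_nospc(void) { return ENOSPC; }
```

With a failing preamble the loop now terminates with ENOSPC after the first retry; with a succeeding preamble it completes normally.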
* 3338781 (Tracking ID: 3249958)
SYMPTOM: When the /usr file system is mounted as a separate file system, Veritas File System (VxFS) fails to load.
DESCRIPTION: There is a dependency between the two file systems, as VxFS uses a non-vital system utility command found in the /usr file system at boot time.
RESOLUTION: The code is modified by changing the VxFS init script to load VxFS only after the /usr file system is up, to resolve the dependency.

* 3338787 (Tracking ID: 3261462)
SYMPTOM: A file system larger than 16 TB gets corrupted, with vx_mapbad messages in the system log.
DESCRIPTION: The corruption results from the combination of the following two conditions: a. Two or more threads race against each other to allocate around the same offset range. As a result, VxFS returns the buffer locked only in shared mode for all the threads that fail to allocate the extent. b. Since the allocated extent is from a region beyond 16 TB, the threads need to convert the buffer to a different type to accommodate the new extent's start value. A buffer overrun happens because VxFS erroneously tries to unconditionally convert the buffer to the new type, even though the buffer might not be able to accommodate the converted data.
RESOLUTION: When the race condition is detected, VxFS returns proper retry errors to the caller, so that the whole operation is retried from the beginning. The code is also modified to ensure that VxFS doesn't try to convert the buffer to the new type when it cannot accommodate the new data; if this check fails, VxFS performs the proper split logic, so that a buffer overrun doesn't happen when the operation is retried.

* 3338790 (Tracking ID: 3233284)
SYMPTOM: The FSCK binary hangs while checking the Reference Count Table (RCT), with the following stack trace:
bmap_search_typed_raw()
bmap_check_typed_raw()
rct_check()
process_device()
main()
DESCRIPTION: The FSCK binary hangs due to looping in the bmap_search_typed_raw() function.
This function searches for an extent entry in the indirect buffer for a given offset. In this case, the given offset is less than the start offset of the first extent entry. This unhandled corner case causes the infinite loop.
RESOLUTION: The code is modified to handle the following cases: 1. Searching in an empty indirect block. 2. Searching for an offset which is less than the start offset of the first entry in the indirect block.

* 3339230 (Tracking ID: 3308673)
SYMPTOM: With the delayed allocation feature enabled on a locally mounted file system whose available free space is highly fragmented, the file system is disabled with the following message in the system log:
WARNING: msgcnt 1 mesg 031: V-2-31: vx_disable - /dev/vx/dsk/testdg/testvol file system disabled
DESCRIPTION: A VxFS transaction provides multiple extent allocations to fulfill one allocation request on a file system that has high free-space fragmentation. The allocation transaction therefore becomes large and fails to commit. After retrying the transaction a defined number of times, the file system is disabled with the above-mentioned error.
RESOLUTION: The code is modified to commit the part of the transaction which is committable and to retry the remaining part.

* 3339884 (Tracking ID: 1949445)
SYMPTOM: The system is unresponsive when files are created in a large directory. The following stack is logged:
vxg_grant_sleep()
vxg_cmn_lock()
vxg_api_lock()
vx_glm_lock()
vx_get_ownership()
vx_exh_coverblk()
vx_exh_split()
vx_dexh_setup()
vx_dexh_create()
vx_dexh_init()
vx_do_create()
DESCRIPTION: For large directories, the large directory hash (LDH) is enabled to improve lookups. When the system takes ownership of the LDH inode twice in the same thread context (while building the hash for a directory), it becomes unresponsive.
RESOLUTION: The code is modified to avoid taking ownership again if the thread already has ownership of the LDH inode.
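The LDH fix above is an instance of a general pattern: an ownership primitive that must tolerate re-acquisition by the thread that already holds it. A hedged single-process sketch, not VxFS's actual GLM code, with all names hypothetical:

```c
/* Hypothetical sketch of the "don't take ownership twice" fix: an
 * ownership primitive that records the owning thread and turns a
 * repeated acquisition by the same thread into a no-op instead of a
 * self-deadlock.  This illustrates the pattern only. */
#include <pthread.h>
#include <stdbool.h>

struct ownership {
    pthread_mutex_t lock;
    pthread_t       owner;
    bool            held;
};

/* Returns true if ownership was newly taken, false if already held by us. */
static bool take_ownership(struct ownership *o)
{
    /* Only the owner can observe held==true with owner==self, so this
     * unlocked check is safe for the re-entry case it guards. */
    if (o->held && pthread_equal(o->owner, pthread_self()))
        return false;               /* already ours: skip re-acquisition */
    pthread_mutex_lock(&o->lock);
    o->owner = pthread_self();
    o->held  = true;
    return true;
}

static void drop_ownership(struct ownership *o)
{
    o->held = false;
    pthread_mutex_unlock(&o->lock);
}

/* Demo: a second acquisition by the same thread is a no-op. */
static int ownership_demo(void)
{
    struct ownership o = { PTHREAD_MUTEX_INITIALIZER };
    bool first  = take_ownership(&o);
    bool second = take_ownership(&o);
    drop_ownership(&o);
    return first && !second;
}
```

Without the guard, the second take_ownership() would block forever on a lock the thread itself holds, which is exactly the unresponsiveness described in incident 1949445.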
* 3339949 (Tracking ID: 3271892)
SYMPTOM: VFR jobs fail if the same process ID (PID) is associated with multiple jobs working on different target file systems.
DESCRIPTION: Single-threaded VFR jobs cause bottlenecks that result in scalability issues. They may also fail when multiple jobs replicating to different file systems are associated with a single process: if one of the jobs finishes, it may unregister all the other jobs and the replication operation may fail.
RESOLUTION: The code is modified to add multithreading, so that jobs can be served on a per-thread basis. This allows better scalability. The code is also modified to fix the VFR job unregister method.

* 3339963 (Tracking ID: 3071622)
SYMPTOM: On SLES10, bcopy(3) with overlapping addresses does not work.
DESCRIPTION: The bcopy() function is a deprecated API, and the implementation fails to handle overlapping addresses.
RESOLUTION: The code is modified to replace the bcopy() function with the memmove() function, which handles overlapping address requests.

* 3339964 (Tracking ID: 3313756)
SYMPTOM: The file replication daemon exits unexpectedly and dumps core on the target side, with the following stack trace:
rep_cleanup_session()
replnet_server_dropchan()
replnet_client_connstate()
replnet_conn_changestate()
replnet_conn_evalpoll()
vxev_loop()
main()
DESCRIPTION: The replication daemon tries to close a session which is already closed, and hence dumps core while accessing a NULL pointer.
RESOLUTION: The code is modified to check the state of the session before trying to close it.

* 3340029 (Tracking ID: 3298041)
SYMPTOM: While performing delayed extent allocations by writing to a file sequentially and extending the file's size, or performing a mixture of sequential and random write I/O that extends a file's size, the write I/O performance to the file can suddenly degrade significantly.
DESCRIPTION: The 'dalloc' feature allows VxFS to allocate extents (file system blocks) to a file in a delayed fashion when extending the file size. Asynchronous writes that extend a file's size create and dirty memory pages; new extents can therefore be allocated when the dirty pages are flushed to disk (via background processing) rather than in the same context as the write I/O. However, in some cases with delayed allocation on, the flushing of dirty pages may occur synchronously in the foreground, in the same context as the write I/O; when triggered, this foreground flushing can significantly slow the write I/O performance.
RESOLUTION: The code is modified to avoid the foreground flushing of data in the same write context.

* 3340031 (Tracking ID: 3337806)
SYMPTOM: On Linux kernels 3.0 and later, running the find(1) command may panic the kernel in the link_path_walk() function with the following stack trace:
do_page_fault
page_fault
link_path_walk
path_lookupat
do_path_lookup
user_path_at_empty
vfs_fstatat
sys_newfstatat
system_call_fastpath
DESCRIPTION: VxFS overloads a bit of the dentry flags at 0x1000 for internal usage. Linux did not use this bit before kernel version 3.0. It is therefore possible for both Linux and VxFS to contend for this bit, which panics the kernel.
RESOLUTION: The code is modified not to use the 0x1000 bit in the dentry flags.

* 3348459 (Tracking ID: 3274048)
SYMPTOM: VxFS hangs when it requests a cluster-wide grant on an inode while holding a lock on the inode.
DESCRIPTION: In the vx_switch_ilocks_list() function, VxFS requests the cluster-wide grant on an inode while it holds a lock on the inode, which may result in a deadlock.
RESOLUTION: The code is modified to release the lock on the inode before asking for the cluster-wide grant, and to recapture the lock after the cluster-wide grant is obtained.
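The bcopy(3)-to-memmove(3) replacement in incident 3339963 above hinges on overlap handling: memmove(3) is specified to copy correctly even when source and destination overlap. A small sketch makes the point with the classic overlapping case, shifting a buffer left by one byte:

```c
/* memmove(3) is specified to handle overlapping source and destination
 * regions, which is why it replaces bcopy(3) in the fix above.
 * Shifting a buffer's contents left by one byte is a classic
 * overlapping copy. */
#include <string.h>

/* Shift the first 'n' bytes of 'buf' left by one position,
 * NUL-terminating the result. */
static void shift_left(char *buf, size_t n)
{
    memmove(buf, buf + 1, n - 1);   /* src and dst overlap by n-2 bytes */
    buf[n - 1] = '\0';
}
```

Doing the same shift with memcpy(3) would be undefined behavior for the overlapping regions, which is the class of failure the fix addresses.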
* 3351939 (Tracking ID: 3351937)
SYMPTOM: The vfradmin(1M) command may fail when promoting a job on a locally mounted file system due to the "relatime" mount option.
DESCRIPTION: If /etc/mtab is a symlink to /proc/mounts, the vfradmin(1M) command fails because the mount command cannot handle the "relatime" option returned by newer Linux versions, and does not handle the fstype "vxclonefs" for checkpoints.
RESOLUTION: The code is modified to suppress the "relatime" mount option while remounting the file system, and to handle the fstype "vxclonefs" for checkpoints properly.

* 3351946 (Tracking ID: 3194635)
SYMPTOM: An internal stress test on a locally mounted file system exited with an error message.
DESCRIPTION: For a file with Zero-Fill-On-Demand (ZFOD) extents, a write operation in the ZFOD extent area may lead to coalescing an extent of type SHARED or COMPRESSED, or both, with a new extent of type DATA. The new DATA extent may then be coalesced with an adjacent extent, if possible. If this happens without unsharing (for a shared extent) or uncompressing (for a compressed extent), data or metadata corruption may occur.
RESOLUTION: The code is modified so that an adjacent shared, compressed, or pseudo-compressed extent is not coalesced.

* 3351947 (Tracking ID: 3164418)
SYMPTOM: An internal stress test on a locally mounted VxFS file system results in data corruption in a no-space-on-device scenario while doing a split on a Zero-Fill-On-Demand (ZFOD) extent.
DESCRIPTION: When the split operation on a ZFOD extent fails because of an ENOSPC (no space on device) error, the original ZFOD extent is erroneously processed and no error is returned. This may result in data corruption.
RESOLUTION: The code is modified to return the ZFOD extent to its original state if the ZFOD split operation fails due to an ENOSPC error.

* 3359278 (Tracking ID: 3364290)
SYMPTOM: The kernel may panic in Veritas File System (VxFS) when it is internally working on a reference count queue (RCQ) record.
DESCRIPTION: The work item spawned by VxFS in the kernel to process RCQ records during an RCQ-full situation is passed a file system pointer as its argument. Since no active level is held, this file system pointer is not guaranteed to be valid by the time the work item starts processing. This may result in the panic.
RESOLUTION: The code is modified to pass the externally visible file system structure instead. This structure is guaranteed to be valid, because the creator of the work item takes a reference on it which is released only after the work item exits.

* 3364285 (Tracking ID: 3364282)
SYMPTOM: The fsck(1M) command fails to correct the inode list file.
DESCRIPTION: The fsck(1M) command fails to write the metadata for the inode list file after an extent for the inode list file has been successfully written to disk.
RESOLUTION: The fsck(1M) command is modified to write the metadata for the inode list file after successful write operations of an extent for the inode list file.

* 3364289 (Tracking ID: 3364287)
SYMPTOM: A debug assert may be hit in the vx_real_unshare() function in the cluster environment.
DESCRIPTION: The vx_extend_unshare() function wrongly looks at the offset immediately after the current unshare-length boundary. Instead, it should look at the offset that falls on the last byte of the current unshare length. This may result in hitting debug asserts in the vx_real_unshare() function.
RESOLUTION: The code is modified for the shared compressed extent: when the vx_extend_unshare() function tries to extend the unshared region, it no longer looks at the first byte immediately after the unshared region; instead, it looks at the last byte unshared.

* 3364302 (Tracking ID: 3364301)
SYMPTOM: Assert failure occurs because of improper handling of the inode lock while truncating a reorg inode.
DESCRIPTION: While truncating a reorg inode, the held locks are released, and before reacquiring them VxFS checks whether the inode is a cluster inode; if so, it takes the delegation hold lock. If taking the delegation hold lock fails, the error code path is entered. There, if a transaction exists and has a committable error, the transaction is committed and, on success, unlock is called for locks that were never taken.
RESOLUTION: The code is modified to check whether the lock is taken before unlocking.

* 3364305 (Tracking ID: 3364303)
SYMPTOM: An internal stress test on a locally mounted file system hits a debug assert in the VxFS File Device Driver (FDD).
DESCRIPTION: In the VxFS File Device Driver (FDD), dentry operations are explicitly set using an assignment, which causes deletion-related flags to be incorrectly populated in the dentry. This prevents the dentry from being shrunk from the cache immediately after use. As a result, a debug assert is hit.
RESOLUTION: The code is modified to use the d_set_d_op() operating system function to initialize dentry operations. This function also takes care of the related flags inside the dentry.

* 3364307 (Tracking ID: 3364306)
SYMPTOM: Stack overflow is seen in the extent allocation code path.
DESCRIPTION: The stack overflow appears in the vx_extprevfind() code path.
RESOLUTION: The code is modified to hand off the extent allocation to a worker thread when stack consumption reaches 4k.

* 3364317 (Tracking ID: 3364312)
SYMPTOM: The fsadm(1M) command is unresponsive while processing the VX_FSADM_REORGLK_MSG message.
The following stack trace may be seen while processing VX_FSADM_REORGLK_MSG:
vx_tranundo()
vx_do_rct_gc()
vx_rct_setup_gc()
vx_reorg_complete_gc()
vx_reorg_complete()
vx_reorg_clear_rct()
vx_reorg_clear()
vx_reorg_clear()
vx_recv_fsadm_reorglk()
vx_recv_fsadm()
vx_msg_recvreq()
vx_msg_process_thread()
vx_thread_base()
DESCRIPTION: In the vx_do_rct_gc() function, the in-directory cleanup flag is set for a shared indirect extent (SHR_IADDR_EXT). If the truncation fails, vx_do_rct_gc() does not clear the in-directory cleanup flag. As a result, the caller ends up calling vx_do_rct_gc() repeatedly, leading to a never-ending loop.
RESOLUTION: The code is modified to reset the in-directory cleanup flag in case of a truncation error inside the vx_do_rct_gc() function.

* 3364333 (Tracking ID: 3312897)
SYMPTOM: In a Cluster File System (CFS), the system can hang while trying to perform any administrative operation when the primary node is disabled.
DESCRIPTION: In CFS, when node 1 tries to perform an administrative operation that freezes and thaws the file system (e.g. turning the FCL on or off), a deadlock can occur between the thaw thread and the recovery thread (started because the CFS primary was disabled). The thread on node 1 trying to thaw is blocked while waiting for node 2 to reply to the loadfs message. The thread processing the loadfs message is waiting for the recovery operation to complete. The recovery thread on node 2 is waiting for a lock on an extent map (emap) buffer. This lock is held on node 1, as part of a transaction committed during the freeze, which results in a deadlock.
RESOLUTION: The code is modified to flush any transactions committed during a freeze before starting the thaw.

* 3364335 (Tracking ID: 3331109)
SYMPTOM: The full fsck does not repair a corrupted reference count queue (RCQ) record.
DESCRIPTION: When an RCQ record is corrupted due to an I/O error or a log error, there is no code in full fsck which handles this corruption. As a result, some further operations related to the RCQ might fail.
RESOLUTION: The code is modified to repair the corrupt RCQ entry during a full fsck.

* 3364338 (Tracking ID: 3331045)
SYMPTOM: Kernel Oops in the unlock code of a map, caused by referring to a freed mlink due to a race with the iodone routine for delayed writes.
DESCRIPTION: After asynchronous I/O is issued on a map buffer, there is a possible race between the vx_unlockmap() function and the vx_mapiodone() function. Due to this race, vx_unlockmap() refers to an mlink after it is freed.
RESOLUTION: The code is modified to handle this race condition.

* 3364349 (Tracking ID: 3359200)
SYMPTOM: An internal test of the Veritas File System (VxFS) fsdedup(1M) feature in a cluster file system environment results in a hang.
DESCRIPTION: The thread that processes the fsdedup(1M) request takes the delegation lock on the extent map and then waits to acquire a lock on the cluster-wide reference count queue (RCQ) buffer, while another internal VxFS thread working on the RCQ takes the lock on the cluster-wide RCQ buffer and waits for the delegation lock on the extent map, causing a deadlock.
RESOLUTION: The code is modified to correct the lock hierarchy so that the delegation lock on the extent map is taken before the lock on the cluster-wide RCQ buffer.

* 3364353 (Tracking ID: 3331047)
SYMPTOM: A memory leak occurs in the vx_followlink() function in an error condition.
DESCRIPTION: The vx_followlink() function does not free an allocated buffer, which results in a memory leak.
RESOLUTION: The code is modified to free the buffer in the vx_followlink() function in the error condition.

* 3364355 (Tracking ID: 3263336)
SYMPTOM: An internal noise test on a cluster file system hits the "f:vx_cwfrz_wait:2" and "f:vx_osdep_msgprint:panic" debug asserts.
DESCRIPTION: VxFS hits the "f:vx_cwfrz_wait:2" and "f:vx_osdep_msgprint:panic" debug asserts due to a deadlock between the work list threads and a freeze. In processing delicache items (vx_delicache_process), all the worker threads loop on the file control log (FCL) items in work list 1 and never reach work list 0. Thus, the delicache items are never processed, the cluster freeze never finishes, and the FCL item never gets its active level.
RESOLUTION: The code is modified to remove the force-flag check in vx_delicache_process, so that the thread which enqueues a delicache work item can always help in processing its own item.

* 3369037 (Tracking ID: 3349651)
SYMPTOM: VxFS modules fail to load on RHEL6.5 and the following error messages are reported in the system log:
kernel: vxfs: disagrees about version of symbol putname
kernel: vxfs: disagrees about version of symbol getname
DESCRIPTION: In RHEL6.5, the kernel interfaces for getname and putname used by VxFS have changed.
RESOLUTION: The code is modified to use the latest definitions of the getname and putname kernel interfaces.

* 3369039 (Tracking ID: 3350804)
SYMPTOM: On RHEL6, VxFS can sometimes report a system panic with errors such as "Thread overruns stack" or stack corruption.
DESCRIPTION: On RHEL6, the low-stack memory allocations consume significant memory, especially when the system is under memory pressure and takes the page allocator route. This breaks the earlier assumptions in the stack depth calculations.
RESOLUTION: The code is modified to check the available stack size before doing low-stack allocations on RHEL6; VxFS also re-tunes the various stack depth calculations for each distribution separately to minimize performance penalties.

* 3370650 (Tracking ID: 2735912)
SYMPTOM: The performance of tier relocation when moving a large number of files is poor when the `fsppadm enforce' command is used.
When looking at the fsppadm(1M) command in the kernel, the following stack trace is observed:
vx_cfs_inofindau
vx_findino
vx_ialloc
vx_reorg_ialloc
vx_reorg_isetup
vx_extmap_reorg
vx_reorg
vx_allocpolicy_enforce
vx_aioctl_allocpolicy
vx_aioctl_common
vx_ioctl
vx_compat_ioctl
DESCRIPTION: For each file to be relocated from Tier 1 to Tier 2, Veritas File System (VxFS) allocates a new reorg inode and all of its extents in Tier 2. VxFS then swaps the contents of the two files and deletes the original file. This per-file inode allocation involves a lot of processing and can result in poor performance when a large number of files are moved.
RESOLUTION: The code is modified to use a pool or cache of reorg inodes instead of allocating one each time.

* 3372896 (Tracking ID: 3352059)
SYMPTOM: Due to a memory leak, high memory usage occurs in vxfsrepld on the target when no jobs are running.
DESCRIPTION: On the target side, high memory usage may occur even when no jobs are running, because the memory allocated for some structures is not freed on every job iteration.
RESOLUTION: The code is modified to resolve the memory leaks.

* 3372909 (Tracking ID: 3274592)
SYMPTOM: An internal noise test on Cluster File System (CFS) becomes unresponsive while executing the fsadm(1M) command.
DESCRIPTION: In CFS, the fsadm(1M) command hangs in the kernel while processing the fsadm-reorganization message on a secondary node. The hang results from a race with the thread processing the fsadm-query message for mounting the primary fileset on the secondary node, where the fsadm-query thread wins the race.
RESOLUTION: The code is modified to synchronize the processing of the fsadm-query and fsadm-reorganization messages on the primary node. This synchronization ensures that they are processed in the order in which they were received.
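Several of the deadlock fixes above (notably the fsdedup RCQ hang in incident 3364349) come down to enforcing a single acquisition order for a pair of locks. A generic, hedged sketch of that discipline follows; the lock names are illustrative only, not VxFS internals:

```c
/* Generic illustration of deadlock avoidance by lock ordering: every
 * thread takes 'delegation' before 'rcq_buf', never the reverse, so two
 * threads can never wait on each other's lock.  Names are illustrative. */
#include <pthread.h>

static pthread_mutex_t delegation = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rcq_buf    = PTHREAD_MUTEX_INITIALIZER;
static int work_done;

static void *worker(void *arg)
{
    (void)arg;
    /* Consistent hierarchy: delegation first, then rcq_buf. */
    pthread_mutex_lock(&delegation);
    pthread_mutex_lock(&rcq_buf);
    work_done++;
    pthread_mutex_unlock(&rcq_buf);
    pthread_mutex_unlock(&delegation);
    return 0;
}

/* Run two contending threads; with a consistent order both finish. */
static int run_two_workers(void)
{
    pthread_t a, b;
    work_done = 0;
    pthread_create(&a, 0, worker, 0);
    pthread_create(&b, 0, worker, 0);
    pthread_join(a, 0);
    pthread_join(b, 0);
    return work_done;
}
```

If one worker instead took rcq_buf first, the two threads could each hold one lock while waiting on the other, which is exactly the AB-BA deadlock the fix removes.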
* 3380905 (Tracking ID: 3291635)
SYMPTOM: Internal testing found the "vx_freeze_block_threads_all:7c" debug assert on locally mounted file systems while processing preambles for transactions.
DESCRIPTION: While processing preambles for transactions, if the reference count queue (RCQ) is full, VxFS may trigger processing of the RCQ to free some records. This may result in hitting the debug assert.
RESOLUTION: The code is modified to ignore reference count queue (RCQ) full errors when VxFS processes preambles for transactions.

* 3381928 (Tracking ID: 3444771)
SYMPTOM: An internal noise test on a cluster file system hits a debug assert while creating a file.
DESCRIPTION: While creating a file, if the inode is allotted from the delicache list, it has a null security field (used for SELinux).
RESOLUTION: The code is modified to allocate the security structure when reusing an inode from the delicache list.

* 3383150 (Tracking ID: 3383147)
SYMPTOM: A 'C' operator precedence error may occur while turning 'off' delayed allocation.
DESCRIPTION: Due to a C operator precedence issue, VxFS evaluates a condition wrongly.
RESOLUTION: The code is modified to evaluate the condition correctly.

* 3383271 (Tracking ID: 3433786)
SYMPTOM: The vxedquota(1M) command fails to set quota limits for some users.
DESCRIPTION: When vxedquota(1M) is invoked to set quota limits for users, it scans all mounted VxFS file systems that have quota enabled and gets the quota file paths. The buffer for the user quota file path held the group quota file name, so the records were set for groups instead of users. In addition, passing the device name instead of the mount point leads to an ioctl failure.
RESOLUTION: The code is modified to use the correct buffer for the user quota file name and to use the mount point for all ioctls related to quota.
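The 'C' operator precedence class of bug behind incident 3383150 is typically the `&` vs `==` binding: `==` binds tighter than `&`, so a flag test written without parentheses compares first and masks second. A small illustration (the flag name and value are made up; the source does not reveal the actual condition):

```c
/* Classic C precedence pitfall of the kind behind the bug above:
 * '==' binds tighter than '&', so "flags & DALLOC_OFF == DALLOC_OFF"
 * parses as "flags & (DALLOC_OFF == DALLOC_OFF)", i.e. "flags & 1".
 * The flag value is made up for illustration. */
#define DALLOC_OFF 0x4

/* Buggy test: effectively checks bit 0, not DALLOC_OFF. */
static int dalloc_off_buggy(unsigned flags)
{
    return flags & DALLOC_OFF == DALLOC_OFF;
}

/* Correct test: parenthesize the mask before comparing. */
static int dalloc_off_fixed(unsigned flags)
{
    return (flags & DALLOC_OFF) == DALLOC_OFF;
}
```

With flags = 0x4, the buggy form returns 0 while the fixed form returns 1; compilers flag the buggy form only when warnings such as -Wparentheses are enabled.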
* 3396539 (Tracking ID: 3331093)
SYMPTOM: MountAgent got stuck during repeated switchovers due to the current VxFS-AMF notification/unregistration design, with the following stack trace:
sleep_spinunlock+0x61 ()
vx_delay2+0x1f0 ()
vx_unreg_callback_funcs_impl+0xd0 ()
disable_vxfs_api+0x190 ()
text+0x280 ()
amf_event_release+0x230 ()
amf_fs_event_lookup_notify_multi+0x2f0 ()
amf_vxfs_mount_opt_change_callback+0x190 ()
vx_aioctl_unsetmntlock+0x390 ()
cold_vx_aioctl_common+0x7c0 ()
vx_aioctl+0x300 ()
vx_admin_ioctl+0x610 ()
vxportal_ioctl+0x690 ()
spec_ioctl+0xf0 ()
vno_ioctl+0x350 ()
ioctl+0x410 ()
syscall+0x5b0 ()
DESCRIPTION: This issue is related to the VxFS-AMF interface. VxFS provides notifications to AMF for certain events, such as a file system being disabled or a mount-option change. While VxFS is calling into AMF, the AMF event handling mechanism can trigger an unregistration of VxFS in the same context, if VxFS's notification triggered the last event notification registered with AMF. Before VxFS calls into AMF, the variable vx_fsamf_busy is set to 1, and it is reset when the callback returns. The unregistration loops while it finds vx_fsamf_busy set to 1. Since the unregistration was called from the context of the notification callback, vx_fsamf_busy was never set to 0 and the loop went on endlessly, causing the command that triggered the notification to hang.
RESOLUTION: A delayed unregistration mechanism is employed. The fix addresses unregistration requests that arrive from AMF in the context of a callback from VxFS to AMF. In that scenario, the unregistration is marked for a later time. When all notifications have returned, if a delayed unregistration is marked, the unregistration routine is called explicitly.
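The delayed-unregistration fix above can be sketched as a pending flag that defers teardown until the callback depth drops back to zero. This is a single-threaded illustration of the pattern; all names are hypothetical, not the actual VxFS/AMF interface:

```c
/* Single-threaded sketch of deferred unregistration: if unregister() is
 * requested while we are inside a notification callback, it is only
 * recorded; the last callback to return performs the real teardown.
 * All names are hypothetical, not the actual VxFS/AMF interface. */
#include <stdbool.h>

static int  callback_depth;     /* like vx_fsamf_busy, but a counter */
static bool unreg_pending;
static bool registered = true;

static void do_unregister(void) { registered = false; }

/* Called when the peer asks us to unregister. */
static void request_unregister(void)
{
    if (callback_depth > 0)
        unreg_pending = true;   /* defer: we are inside a callback */
    else
        do_unregister();
}

/* Wraps a notification that may itself trigger unregistration. */
static void notify(void (*handler)(void))
{
    callback_depth++;
    handler();                  /* may call request_unregister() */
    callback_depth--;
    if (callback_depth == 0 && unreg_pending) {
        unreg_pending = false;
        do_unregister();        /* run the deferred teardown now */
    }
}
```

Without the deferral, request_unregister() would spin waiting for callback_depth to reach zero from inside the very callback that keeps it nonzero, which is the hang described above.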
* 3402484 (Tracking ID: 3394803)
SYMPTOM: The vxupgrade(1M) command causes VxFS to panic with the following stack trace:
panic_save_regs_switchstack()
panic
bad_kern_reference()
$cold_pfault()
vm_hndlr()
bubbleup()
vx_fs_upgrade()
vx_upgrade()
$cold_vx_aioctl_common()
vx_aioctl()
vx_ioctl()
vno_ioctl()
ioctl()
syscall()
DESCRIPTION: The panic is caused by dereferencing a NULL device (one of the devices in the DEVLIST is showing as a NULL device).
RESOLUTION: The code is modified to skip NULL devices when the devices in the DEVLIST are processed.

* 3405172 (Tracking ID: 3436699)
SYMPTOM: Assert failure occurs because of a race condition between the clone mount thread and a directory removal thread while pushing data on the clone.
DESCRIPTION: There is a race condition between the clone mount thread and a directory removal thread (while pushing modified directory data on the clone). On AIX, vnodes are added to the VFS vnode list (a linked list of vnodes). The first entry in this vnode list must be the root's vnode, which is added during the mount process. While mounting a clone, the mount thread is scheduled out before adding the root's vnode into this list. During this time, the second thread takes the VFS lock on the same VFS list and tries to add the directory's vnode to the vnode list. As no root vnode is present at the head of the list, the directory vnode is assumed to be the root vnode, and the cross-check against the VROOT flag fails the assert.
RESOLUTION: The code is modified to handle the race condition by attaching the root vnode to the VFS vnode list before setting the VFS pointer in the file set.

* 3411725 (Tracking ID: 3415639)
SYMPTOM: The type of the fsdedupadm(1M) command always shows as MANUAL even when it is launched by the fsdedupschd daemon.
DESCRIPTION: The deduplication tasks scheduled by the scheduler do not show their type as "SCHEDULED"; instead, they show it as "MANUAL".
This is because the fsdedupschd daemon, while calling fsdedup, does not set the -d flag, which would set the correct status.
RESOLUTION: The code is modified so that the flag is set properly.

* 3429587 (Tracking ID: 3463464)
SYMPTOM: An internal kernel functionality conformance test hits a kernel panic due to a null pointer dereference.
DESCRIPTION: In the vx_fsadm_query() function, the error handling code path incorrectly sets the nodeid to null in the file system structure. As a result of clearing the nodeid, any subsequent access to this field results in a kernel panic.
RESOLUTION: The code is modified to improve the error handling code path.

* 3430687 (Tracking ID: 3444775)
SYMPTOM: Internal noise testing on Cluster File System (CFS) results in a kernel panic in the vx_fsadm_query() function with the error message "Unable to handle kernel paging request".
DESCRIPTION: The issue occurs due to simultaneous asynchronous access or modification of the inode list extent array by two threads. As a result, memory freed by one thread is accessed by the other thread, resulting in the panic.
RESOLUTION: The code is modified to add the relevant locks to synchronize access to and modification of the inode list extent array.

* 3436393 (Tracking ID: 3462694)
SYMPTOM: The fsdedupadm(1M) command fails with error code 9 when it tries to mount checkpoints on a cluster.
DESCRIPTION: While mounting checkpoints, the fsdedupadm(1M) command fails to parse the cluster mount option correctly, resulting in the mount failure.
RESOLUTION: The code is modified to parse cluster mount options correctly in the fsdedupadm(1M) operation.

* 3468413 (Tracking ID: 3465035)
SYMPTOM: The VRTSvxfs and VRTSfsadv packages display an incorrect "Provides" list.
DESCRIPTION: The VRTSvxfs and VRTSfsadv packages show incorrect Provides entries such as libexpat, libssh2, etc. These libraries are used internally by VRTSvxfs and VRTSfsadv, and since they are private copies they are not available to non-Veritas products.
RESOLUTION: The code is modified to disable the "Provides" list in the VRTSvxfs and VRTSfsadv packages.

* 3384781 (Tracking ID: 3384775)
SYMPTOM: Installing patch 6.0.3.200 on RHEL 6.4 or earlier RHEL 6.* versions fails with "ERROR: No appropriate modules found."
# /etc/init.d/vxfs start
ERROR: No appropriate modules found. Error in loading module "vxfs". See documentation.
Failed to create /dev/vxportal
ERROR: Module fdd does not exist in /proc/modules
ERROR: Module vxportal does not exist in /proc/modules
ERROR: Module vxfs does not exist in /proc/modules
DESCRIPTION: The VRTSvxfs and VRTSodm rpms ship four different sets of modules, for RHEL 6.1/6.2, RHEL 6.3, RHEL 6.4, and RHEL 6.5. However, the current patch only contains the RHEL 6.5 module, hence installation on earlier RHEL 6.* versions fails.
RESOLUTION: A superseding patch 6.0.3.300 will be released to include the modules for all RHEL 6.* versions; it will be available on SORT for download.

* 3349652 (Tracking ID: 3349651)
SYMPTOM: VxFS modules fail to load on RHEL6.5 and the following error messages are reported in the system log:
kernel: vxfs: disagrees about version of symbol putname
kernel: vxfs: disagrees about version of symbol getname
DESCRIPTION: In RHEL6.5, the kernel interfaces for getname and putname used by VxFS have changed.
RESOLUTION: The code is modified to use the latest definitions of the getname and putname kernel interfaces.

* 3356841 (Tracking ID: 2059611)
SYMPTOM: The system panics due to a NULL pointer dereference while flushing bitmaps to the disk, and the following stack trace is displayed:
vx_unlockmap+0x10c
vx_tflush_map+0x51c
vx_fsq_flush+0x504
vx_fsflush_fsq+0x190
vx_workitem_process+0x1c
vx_worklist_process+0x2b0
vx_worklist_thread+0x78
DESCRIPTION: The vx_unlockmap() function unlocks a map structure of the file system. If the map is being used, the hold count is incremented. The vx_unlockmap() function attempts to check whether the mlink doubly linked list is empty.
The asynchronous vx_mapiodone routine can change the link at random even though the hold count is zero. RESOLUTION: The code is modified to change the evaluation rule inside the vx_unlockmap() function, so that further evaluation can be skipped over when map hold count is zero. * 3356845 (Tracking ID: 3331419) SYMPTOM: Machine panics with the following stack trace. #0 [ffff883ff8fdc110] machine_kexec at ffffffff81035c0b #1 [ffff883ff8fdc170] crash_kexec at ffffffff810c0dd2 #2 [ffff883ff8fdc240] oops_end at ffffffff81511680 #3 [ffff883ff8fdc270] no_context at ffffffff81046bfb #4 [ffff883ff8fdc2c0] __bad_area_nosemaphore at ffffffff81046e85 #5 [ffff883ff8fdc310] bad_area at ffffffff81046fae #6 [ffff883ff8fdc340] __do_page_fault at ffffffff81047760 #7 [ffff883ff8fdc460] do_page_fault at ffffffff815135ce #8 [ffff883ff8fdc490] page_fault at ffffffff81510985 [exception RIP: print_context_stack+173] RIP: ffffffff8100f4dd RSP: ffff883ff8fdc548 RFLAGS: 00010006 RAX: 00000010ffffffff RBX: ffff883ff8fdc6d0 RCX: 0000000000002755 RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046 RBP: ffff883ff8fdc5a8 R8: 000000000002072c R9: 00000000fffffffb R10: 0000000000000001 R11: 000000000000000c R12: ffff883ff8fdc648 R13: ffff883ff8fdc000 R14: ffffffff81600460 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff883ff8fdc540] print_context_stack at ffffffff8100f4d1 #10 [ffff883ff8fdc5b0] dump_trace at ffffffff8100e4a0 #11 [ffff883ff8fdc650] show_trace_log_lvl at ffffffff8100f245 #12 [ffff883ff8fdc680] show_trace at ffffffff8100f275 #13 [ffff883ff8fdc690] dump_stack at ffffffff8150d3ca #14 [ffff883ff8fdc6d0] warn_slowpath_common at ffffffff8106e2e7 #15 [ffff883ff8fdc710] warn_slowpath_null at ffffffff8106e33a #16 [ffff883ff8fdc720] hrtick_start_fair at ffffffff810575eb #17 [ffff883ff8fdc750] pick_next_task_fair at ffffffff81064a00 #18 [ffff883ff8fdc7a0] schedule at ffffffff8150d908 #19 [ffff883ff8fdc860] __cond_resched at ffffffff81064d6a #20 
[ffff883ff8fdc880] _cond_resched at ffffffff8150e550
#21 [ffff883ff8fdc890] vx_nalloc_getpage_lnx at ffffffffa041afd5 [vxfs]
#22 [ffff883ff8fdca80] vx_nalloc_getpage at ffffffffa03467a3 [vxfs]
#23 [ffff883ff8fdcbf0] vx_do_getpage at ffffffffa034816b [vxfs]
#24 [ffff883ff8fdcdd0] vx_do_read_ahead at ffffffffa03f705e [vxfs]
#25 [ffff883ff8fdceb0] vx_read_ahead at ffffffffa038ed8a [vxfs]
#26 [ffff883ff8fdcfc0] vx_do_getpage at ffffffffa0347732 [vxfs]
#27 [ffff883ff8fdd1a0] vx_getpage1 at ffffffffa034865d [vxfs]
#28 [ffff883ff8fdd2f0] vx_fault at ffffffffa03d4788 [vxfs]
#29 [ffff883ff8fdd400] __do_fault at ffffffff81143194
#30 [ffff883ff8fdd490] handle_pte_fault at ffffffff81143767
#31 [ffff883ff8fdd570] handle_mm_fault at ffffffff811443fa
#32 [ffff883ff8fdd5e0] __get_user_pages at ffffffff811445fa
#33 [ffff883ff8fdd670] get_user_pages at ffffffff81144999
#34 [ffff883ff8fdd690] vx_dio_physio at ffffffffa041d812 [vxfs]
#35 [ffff883ff8fdd800] vx_dio_rdwri at ffffffffa02ed08e [vxfs]
#36 [ffff883ff8fdda20] vx_write_direct at ffffffffa044f490 [vxfs]
#37 [ffff883ff8fddaf0] vx_write1 at ffffffffa04524bf [vxfs]
#38 [ffff883ff8fddc30] vx_write_common_slow at ffffffffa0453e4b [vxfs]
#39 [ffff883ff8fddd30] vx_write_common at ffffffffa0454ea8 [vxfs]
#40 [ffff883ff8fdde00] vx_write at ffffffffa03dc3ac [vxfs]
#41 [ffff883ff8fddef0] vfs_write at ffffffff81181078
#42 [ffff883ff8fddf30] sys_pwrite64 at ffffffff81181a32
#43 [ffff883ff8fddf80] system_call_fastpath at ffffffff8100b072
DESCRIPTION: The panic occurs because the kernel refers to a corrupted thread_info structure from the scheduler; the thread_info structure was corrupted by a stack overflow. During a direct I/O write, user-space pages need to be pre-faulted through the __get_user_pages() code path. This code path is very deep and can consume a lot of stack space.
RESOLUTION: The kernel stack consumption in this code path is reduced by roughly 400-500 bytes through various changes in the way pre-faulting is done.
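The remedy above follows a general kernel pattern: a deep call path must keep its per-frame footprint small and bounded rather than proportional to the I/O size. The following is a minimal user-space sketch of that idea only; the function names, the batch size, and the helper are illustrative and are not VxFS code.

```c
#include <assert.h>

#define BATCH 16  /* illustrative batch size, not a VxFS constant */

/* Hypothetical helper standing in for pinning/pre-faulting one batch
 * of user pages; here it just reports how many pages it handled. */
static int pin_batch(unsigned long start, int npages)
{
    (void)start;
    return npages;
}

/* Pre-fault a range in small fixed-size batches. Only BATCH worth of
 * bookkeeping is ever live in this frame, so the stack cost stays
 * constant no matter how large the direct-I/O request is. */
int prefault_range(unsigned long start, int npages)
{
    int done = 0;
    while (done < npages) {
        int n = (npages - done > BATCH) ? BATCH : (npages - done);
        done += pin_batch(start + (unsigned long)done, n);
    }
    return done;
}
```

The same effect can also be achieved by moving large scratch arrays off the stack onto the heap; the batching form is shown here because it keeps the frame size independent of the request size.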
* 3356892 (Tracking ID: 3259634)
SYMPTOM: In CFS, each node that has the cluster file system mounted has its own intent log in the file system. A CFS with more than 4,294,967,296 file system blocks can zero out an incorrect location because of an incorrect typecast. For example, such a CFS can incorrectly zero out 65536 file system blocks at a block offset of 1,537,474,560 (file system blocks) with an 8 KB file system block size and an intent log of 65536 file system blocks. This issue can only occur if an intent log is located above an offset of 4,294,967,296 file system blocks. This situation can occur when you add a new node to the cluster and mount an additional CFS secondary for the first time, which needs to create and zero a new intent log. It can also happen if you resize a file system or intent log and clear an intent log. The problem occurs only with the following combinations of file system size and file system block size:
1 KB block size and FS size > 4 TB
2 KB block size and FS size > 8 TB
4 KB block size and FS size > 16 TB
8 KB block size and FS size > 32 TB
When this occurs, the full fsck flag is set on the file system, and the message log can contain messages such as the following:
2013 Apr 17 14:52:22 sfsys kernel: vxfs: msgcnt 5 mesg 096: V-2-96: vx_setfsflags - /dev/vx/dsk/sfsdg/vol1 file system fullfsck flag set - vx_ierror
2013 Apr 17 14:52:22 sfsys kernel: vxfs: msgcnt 6 mesg 017: V-2-17: vx_attr_iget - /dev/vx/dsk/sfsdg/vol1 file system inode 13675215 marked bad incore
2013 Jul 17 07:41:22 sfsys kernel: vxfs: msgcnt 47 mesg 096: V-2-96: vx_setfsflags - /dev/vx/dsk/sfsdg/vol1 file system fullfsck flag set - vx_ierror
2013 Jul 17 07:41:22 sfsys kernel: vxfs: msgcnt 48 mesg 017: V-2-17: vx_dirbread - /dev/vx/dsk/sfsdg/vol1 file system inode 55010476 marked bad incore
DESCRIPTION: In CFS, each node that has the cluster file system mounted has its own intent log in the file system.
When an additional node mounts the file system as a CFS secondary, CFS creates an intent log. Note that intent logs are never removed; they are reused. When you clear an intent log, Veritas File System (VxFS) passes an incorrect block number to the log clearing routine, which zeros out an incorrect location. The incorrect location might point to file data or file system metadata, or it might be part of the file system's available free space. This is silent corruption. If file system metadata is corrupted, VxFS can detect the corruption when it subsequently accesses the corrupt metadata, and it marks the file system for a full fsck.
RESOLUTION: The code is modified so that VxFS passes the correct block number to the log clearing routine.

* 3356895 (Tracking ID: 3253210)
SYMPTOM: When the file system reaches its space limitation, it hangs with the following stack trace:
vx_svar_sleep_unlock()
default_wake_function()
wake_up()
vx_event_wait()
vx_extentalloc_handoff()
vx_te_bmap_alloc()
vx_bmap_alloc_typed()
vx_bmap_alloc()
vx_bmap()
vx_exh_allocblk()
vx_exh_splitbucket()
vx_exh_split()
vx_dopreamble()
vx_rename_tran()
vx_pd_rename()
DESCRIPTION: When the large directory hash is enabled through the vx_dexh_sz(5M) tunable, Veritas File System (VxFS) uses the large directory hash for directories. When you rename a file, a new directory entry is inserted into the hash table, which can result in a hash split. The hash split fails the current transaction, which is retried after some housekeeping jobs complete. These jobs include allocating more space for the hash table. However, VxFS does not check the return value of the preamble job. Thus, when VxFS runs out of space, the rename transaction is re-entered indefinitely without knowing whether more space was allocated by the preamble jobs.
RESOLUTION: The code is modified so that VxFS exits the loop when ENOSPC is returned from the preamble job.
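The fix above follows a common transaction-retry pattern: the retry loop must propagate a failure from its housekeeping (preamble) step instead of looping forever. A minimal sketch of the pattern follows; the function names and helpers are illustrative, not the actual VxFS transaction code.

```c
#include <assert.h>
#include <errno.h>

typedef int (*step_fn)(void *ctx);

/* Retry a transaction; after each failed attempt, run the preamble
 * (e.g., allocate more hash-table space) before retrying. If the
 * preamble itself fails -- say with -ENOSPC -- give up and return
 * that error instead of re-entering the transaction forever. */
int retry_transaction(step_fn txn, step_fn preamble, void *ctx, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (txn(ctx) == 0)
            return 0;               /* transaction committed */
        int rc = preamble(ctx);
        if (rc != 0)
            return rc;              /* e.g. -ENOSPC: exit the loop */
    }
    return -EAGAIN;
}

/* Demonstration helpers (illustrative only). */
static int txn_fails(void *ctx)       { (void)ctx; return -1; }
static int preamble_enospc(void *ctx) { (void)ctx; return -ENOSPC; }
static int preamble_ok(void *ctx)     { (void)ctx; return 0; }
```

The bug described above corresponds to ignoring the preamble's return code: with that check removed, a permanently failing preamble makes the loop spin until max_tries (or, with no bound, forever).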
* 3356909 (Tracking ID: 3335272)
SYMPTOM: The mkfs (make file system) command dumps core when the log size provided is not aligned. The following stack trace is displayed:
(gdb) bt
#0 find_space ()
#1 place_extents ()
#2 fill_fset ()
#3 main ()
(gdb)
DESCRIPTION: While creating a VxFS file system using the mkfs command, if the log size provided is not aligned properly, the placement of the RCQ extents can be miscalculated so that no place is found for them. This leads to an illegal memory access of the AU bitmap and results in a core dump.
RESOLUTION: The code is modified to place the RCQ extents in the same AU where the log extents are allocated.

* 3357264 (Tracking ID: 3350804)
SYMPTOM: On RHEL6, VxFS can sometimes report a system panic with errors such as "Thread overruns stack" or stack corruption.
DESCRIPTION: On RHEL6, low-stack memory allocations consume significant stack space, especially when the system is under memory pressure and takes the page allocator route. This breaks the earlier assumptions in the stack depth calculations.
RESOLUTION: The code is modified to check the available stack size on RHEL6 before doing low-stack allocations, and VxFS re-tunes the various stack depth calculations for each distribution separately to minimize performance penalties.

* 3357278 (Tracking ID: 3340286)
SYMPTOM: The tunable setting of dalloc_enable gets reset to the default value after a file system is resized.
DESCRIPTION: The file system resize operation triggers the file system re-initialization process. During this process, the tunable value of dalloc_enable gets reset to the default value instead of retaining the old value.
RESOLUTION: The code is fixed so that the old tunable value of dalloc_enable is retained.

* 3100385 (Tracking ID: 3369020)
SYMPTOM: The Veritas File System (VxFS) module fails to load in RHEL 6 Update 4 environments.
DESCRIPTION: The module fails to load because of two kABI incompatibilities with the RHEL 6 Update 4 environment.
RESOLUTION: The code is modified to ensure that the VxFS module is supported in RHEL 6 Update 4 environments.

* 2912412 (Tracking ID: 2857629)
SYMPTOM: When a new node takes over as primary for the file system, it can process stale shared extent records in a per-node queue. The primary detects the bad record, sets the full fsck flag, and disables the file system to prevent further corruption.
DESCRIPTION: Every node in the cluster that adds or removes references to shared extents adds the shared extent records to a per-node queue. The primary node in the cluster processes the records in the per-node queues and maintains reference counts in a global shared extent device. In certain cases the primary node might process bad or stale records in the per-node queue. Two situations under which bad or stale records could be processed are: 1. Clone creation initiated from a secondary node immediately after primary migration to a different node. 2. Queue wraparound on any node followed immediately by a takeover of the primary by a new node. A full fsck might not be able to rectify the resulting file system corruption.
RESOLUTION: The per-node shared extent queue head and tail pointers are updated to the correct values on the primary before the processing of shared extent records starts.

* 2928921 (Tracking ID: 2843635)
SYMPTOM: In VxFS internal testing, there are some failures during the reorg operation of structural files.
DESCRIPTION: While the reorg is in progress, in a certain ioctl, the error value to be returned is overwritten, which results in an incorrect error value and test failures.
RESOLUTION: The code is changed so that the error value is not overwritten.

* 2933290 (Tracking ID: 2756779)
SYMPTOM: Write and read performance concerns on Cluster File System (CFS) when running applications that rely on POSIX file-record locking (fcntl).
DESCRIPTION: The use of fcntl on CFS leads to high messaging traffic across nodes, thereby reducing the performance of readers and writers.
RESOLUTION: The code is modified to cache the ranges that are file-record locked on the node. This is attempted whenever possible to avoid broadcasting messages across the nodes in the cluster.

* 2933291 (Tracking ID: 2806466)
SYMPTOM: A reclaim operation on a file system that is mounted on an LVM volume using the fsadm(1M) command with the -R option may panic the system, and the following stack trace is displayed:
vx_dev_strategy+0xc0()
vx_dummy_fsvm_strategy+0x30()
vx_ts_reclaim+0x2c0()
vx_aioctl_common+0xfd0()
vx_aioctl+0x2d0()
vx_ioctl+0x180()
DESCRIPTION: Thin reclamation supports only file systems mounted on a VxVM volume.
RESOLUTION: The code is modified to return errors without panicking the system if the underlying volume is LVM.

* 2933292 (Tracking ID: 2895743)
SYMPTOM: It takes longer than usual for many Windows 7 clients to log off in parallel if the user profile is stored in a Cluster File System (CFS).
DESCRIPTION: Veritas File System (VxFS) keeps file creation time and full ACL information for Samba clients in an extended attribute, which is implemented via named streams. VxFS reads the named stream for each of the ACL objects. Reading a named stream is a costly operation, as it results in an open, an opendir, a lookup, and another open to get the fd. The VxFS function vx_nattr_open() holds the exclusive rwlock to read an ACL object that is stored as an extended attribute. This can cause heavy lock contention when many threads want the same lock; they may be blocked until one of the nattr_open calls releases it, which takes time since nattr_open is very slow.
RESOLUTION: The code is modified to take the rwlock in shared mode instead of exclusive mode.

* 2933294 (Tracking ID: 2750860)
SYMPTOM: The performance of write operations with a small request size may degrade on a large Veritas File System (VxFS) file system.
Many threads may be found sleeping with the following stack trace:
vx_sleep_lock
vx_lockmap
vx_getemap
vx_extfind
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_uplevel
vx_searchau+0x600
vx_extentalloc_device
vx_extentalloc
vx_te_bmap_alloc
vx_bmap_alloc_typed
vx_bmap_alloc
vx_write_alloc3
vx_recv_prealloc
vx_recv_rpc
vx_msg_recvreq
vx_msg_process_thread
kthread_daemon_startup
DESCRIPTION: A VxFS allocation unit (AU) is composed of 32768 disk blocks. An AU can be expanded when it is partially allocated, or non-expanded when it is fully occupied or completely unused. The extent map for a large file system with a 1 KB block size is organized as a big tree; for example, a 4 TB file system with a 1 KB file system block size can have up to 128k AUs. To find an appropriate extent, the VxFS extent allocation algorithm first searches the expanded AUs by traversing the free extent map tree, to avoid causing free space fragmentation. If that fails, it does the same with the non-expanded AUs. When there are many requests for small extents (less than 32768 blocks), all the small free extents are used up, and a large number of AU-size extents (32768 blocks) are still available, the file system can run into this hang: because no small extents are available in the expanded AUs, VxFS encounters larger non-expanded extents, namely AU-size extents, which are not what it wants (an expanded AU is expected). As a result, each request walks the big extent map tree past every AU-size extent and ultimately fails. The requested extent is eventually obtained during the second attempt over the non-expanded AUs, but the unnecessary work consumes a lot of CPU resources.
RESOLUTION: The code is modified to optimize the free-extent-search algorithm by skipping certain AU-size extents to reduce the overall search time.
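The optimization described above can be sketched as follows. The structure and names are illustrative, not the actual VxFS allocator: the free extents are modeled as a flat array of sizes rather than a map tree, to show only the idea of skipping whole-AU-size free extents during the pass that wants a small extent from an expanded AU.

```c
#include <assert.h>
#include <stddef.h>

#define AU_BLKS 32768u  /* blocks per allocation unit, per the text above */

/* Return the index of the first free extent that satisfies a small
 * request, skipping AU-size extents (whole non-expanded AUs) so the
 * search does not repeatedly walk entries it cannot use on this pass.
 * Returns -1 if none is found; the caller then falls back to the
 * non-expanded AUs, as the allocator's second attempt does. */
int find_small_extent(const unsigned *free_blks, size_t n, unsigned want)
{
    for (size_t i = 0; i < n; i++) {
        if (free_blks[i] == AU_BLKS)
            continue;               /* skip: this pass wants expanded AUs */
        if (free_blks[i] >= want)
            return (int)i;
    }
    return -1;
}
```

Without the skip, every small-extent request would examine each AU-size entry (and, in the real tree, descend into it) only to reject it, which is the CPU burn described in the incident.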
* 2933296 (Tracking ID: 2923105)
SYMPTOM: Removing the Veritas File System (VxFS) module using rmmod(8) on a system with heavy buffer cache usage may hang.
DESCRIPTION: When a large number of buffers are allocated from the buffer cache, freeing the buffers at the time of removing the VxFS module takes a long time.
RESOLUTION: The code is modified to use an improved algorithm that does not keep traversing the free lists once it has found a free chunk; instead, it breaks out of the search and frees that buffer.

* 2933309 (Tracking ID: 2858683)
SYMPTOM: The reserve-extent attributes are changed after the vxrestore(1M) operation for files that are greater than 8192 bytes.
DESCRIPTION: A local variable is used to hold the number of reserve bytes that is reused during the vxrestore(1M) operation for a further VX_SETEXT ioctl call for files greater than 8k. As a result, the attribute information is changed.
RESOLUTION: The code is modified to preserve the original variable value until the end of the function.

* 2933313 (Tracking ID: 2841059)
SYMPTOM: The file system gets marked for a full fsck operation and the following message is displayed in the system log:
V-2-96: vx_setfsflags file system fullfsck flag set - vx_ierror
vx_setfsflags+0xee/0x120
vx_ierror+0x64/0x1d0 [vxfs]
vx_iremove+0x14d/0xce0
vx_attr_iremove+0x11f/0x3e0
vx_fset_pnlct_merge+0x482/0x930
vx_lct_merge_fs+0xd1/0x120
vx_lct_merge_fs+0x0/0x120
vx_walk_fslist+0x11e/0x1d0
vx_lct_merge+0x24/0x30
vx_workitem_process+0x18/0x30
vx_worklist_process+0x125/0x290
vx_worklist_thread+0x0/0xc0
vx_worklist_thread+0x6d/0xc0
vx_kthread_init+0x9b/0xb0
V-2-17: vx_iremove_2 : file system inode 15 marked bad incore
DESCRIPTION: Due to a race condition, a thread tries to remove an attribute inode that has already been removed by another thread. Hence, the file system is marked for a full fsck operation and the attribute inode is marked as 'bad ondisk'.
RESOLUTION: The code is modified to check whether the attribute inode that a thread is trying to remove has already been removed.

* 2933330 (Tracking ID: 2773383)
SYMPTOM: A deadlock involving two threads is observed: one holds the mmap semaphore and waits for the 'irwlock', and the other holds the 'irwlock' and waits for the 'mmap' semaphore. The following stack trace is displayed:
vx_rwlock
vx_naio_do_work
vx_naio_worker
vx_kthread_init
kernel_thread
DESCRIPTION: The hang in 'down_read' occurs while waiting for the mmap_sem. The thread holding the mmap_sem is waiting for the RWLOCK, which is held by one of the threads wanting the mmap_sem; hence the deadlock. An earlier enhancement avoided taking the mmap_sem for CIO and mmap, but it was incomplete and did not avoid taking the mmap_sem for native asynchronous I/O calls when the nommapcio option is used.
RESOLUTION: The code is modified to skip taking the mmap_sem for native I/O if the file has the CIO advisory set.

* 2933333 (Tracking ID: 2893551)
SYMPTOM: When Network File System (NFS) connections experience a high load, file attribute values are replaced with question mark symbols. This issue occurs because EACCES is returned for the ls -l command after the cached entries on the NFS server are deleted.
DESCRIPTION: The Veritas File System (VxFS) uses capabilities such as CAP_CHOWN to override the default inode permissions and allow users to search directories. VxFS allows users to perform the search operation even when the 'r' or 'x' permission bits are not set. When the nfsd file system uses these capabilities to perform a dentry reconnect to connect to the dentry tree, some of the Linux file systems use the inode_permission() function to verify whether a user is authorized to perform the operation. When the dentry connect is done on behalf of disconnected dentries, the nfsd file system enables all capabilities without setting the on-wire user ID to fsuid.
Hence, VxFS fails to honor these capabilities and reports an error stating that the user does not have permissions on the directory.
RESOLUTION: The code is modified so that the vx_iaccess() function checks whether Linux is processing capabilities before returning EACCES. This modification adds minimal capability support for the nfsd file system.

* 2933335 (Tracking ID: 2641438)
SYMPTOM: When a system is shut down unexpectedly, on restart you may lose modifications that were performed on user namespace extended attributes ("user").
DESCRIPTION: The modification of a user namespace extended attribute leads to an asynchronous write operation. As a result, these modifications can be lost during an unexpected system shutdown.
RESOLUTION: The code is modified so that modifications performed on user namespace extended attributes before the shutdown are made synchronously.

* 2933571 (Tracking ID: 2417858)
SYMPTOM: When the hard/soft quota limit is specified above 1 TB, the command fails and reports an error.
DESCRIPTION: The quota records corresponding to users are stored in the external and internal quota files. In the external quota file, the records are structures with 32-bit fields, so block limits can only be specified up to a 32-bit value (1 TB). This limit is insufficient in many cases.
RESOLUTION: 64-bit structures and 64-bit limit macros are now used to let users have usage/limits greater than 1 TB.

* 2933729 (Tracking ID: 2611279)
SYMPTOM: A file system with shared extents may panic with the following stack trace:
page_fault
vx_overlay_bmap
vx_bmap_lookup
vx_bmap
vx_local_get_sharedblkcnt
vx_get_sharedblkcnt
vx_aioctl_get_sharedblkcnt
vx_aioctl_common
mntput_no_expire
vx_aioctl
vx_ioctl
DESCRIPTION: The mechanism that manages shared extents uses a special file. A HOLE is never expected in this special file; when a HOLE is present, a panic may occur while working on this file.
RESOLUTION: The code is modified to check whether a HOLE is present in the special file. If a HOLE is found, processing is skipped and the panic is avoided.

* 2933751 (Tracking ID: 2916691)
SYMPTOM: fsdedup enters an infinite loop with the following stack:
#5 [ffff88011a24b650] vx_dioread_compare at ffffffffa05416c4
#6 [ffff88011a24b720] vx_read_compare at ffffffffa05437a2
#7 [ffff88011a24b760] vx_dedup_extents at ffffffffa03e9e9b
#11 [ffff88011a24bb90] vx_do_dedup at ffffffffa03f5a41
#12 [ffff88011a24bc40] vx_aioctl_dedup at ffffffffa03b5163
DESCRIPTION: vx_dedup_extents() does the following to dedup two files:
1. Compare the data extents of the two files that need to be deduped.
2. Split both files' bmaps to make them share the first file's common data extent.
3. Free the duplicate data extent of the second file.
In step 2, during the bmap split, vx_bmap_split() might need to allocate space for the inode's bmap to add new bmap entries, which adds an emap to this transaction. (This condition is more likely to be hit if the dedup is run on two large files that have interleaved duplicate/distinct data extents; the files' bmaps need to be split more in this case.) In step 3, vx_extfree1() does not support multi-AU extent free if there is already an emap in the same transaction; in this case it returns VX_ETRUNCMAX. (See incident e569695 for the history of this limitation.) VX_ETRUNCMAX is a retriable error, so vx_dedup_extents() undoes everything in the transaction and retries from the beginning, then hits the same error again. Thus, the infinite loop.
RESOLUTION: vx_te_bmap_split() is changed to always register a transaction preamble for the bmap split operation in dedup, and vx_dedup_extents() performs the preamble in a separate transaction before it retries the dedup operation.

* 2933822 (Tracking ID: 2624262)
SYMPTOM: A panic is hit in the vx_bc_do_brelse() function while executing dedup functionality, with the following backtrace:
vx_bc_do_brelse()
vx_mixread_compare()
vx_dedup_extents()
enqueue_entity()
__alloc_pages_slowpath()
__get_free_pages()
vx_getpages()
vx_do_dedup()
vx_aioctl_dedup()
vx_aioctl_common()
vx_rwunlock()
vx_aioctl()
vx_ioctl()
vfs_ioctl()
do_vfs_ioctl()
sys_ioctl()
DESCRIPTION: While executing the vx_mixread_compare() function in the dedup code path, an error was hit, due to which an allocated data structure remained uninitialized. The panic occurs when this uninitialized data structure is written to in vx_mixread_compare().
RESOLUTION: The code is changed to free the memory allocated to the data structure when exiting due to the error.

* 2937367 (Tracking ID: 2923867)
SYMPTOM: An assert is hit because VX_RCQ_PROCESS_MSG has a numerically lower priority than VX_IUPDATE_MSG.
DESCRIPTION: When the primary is about to send a VX_IUPDATE_MSG message to the owner of an inode about an update to the inode's non-transactional fields, it compares the current messaging priority (for VX_RCQ_PROCESS_MSG) with the priority of the message being sent (VX_IUPDATE_MSG) to avoid a possible deadlock. In this case the VX_RCQ_PROCESS_MSG priority was numerically lower than VX_IUPDATE_MSG, so the assert was hit.
RESOLUTION: The VX_RCQ_PROCESS_MSG priority is changed to be numerically higher than VX_IUPDATE_MSG, avoiding the assert.

* 2976664 (Tracking ID: 2906018)
SYMPTOM: In the event of a system crash, the fsck intent log is not replayed and the file system is marked clean. Subsequently, mounting the file system's extended operations is not completed.
DESCRIPTION: Only file systems that contain PNOLTs and are mounted locally (mounted without using 'mount -o cluster') are potentially exposed to this issue. The reason fsck silently skips the intent-log replay is that each PNOLT has a flag to identify whether the intent log is dirty or not; in the event of a system crash, this flag signifies whether intent-log replay is required.
In the event of a system crash while the file system is mounted locally, the PNOLTs are not utilized, but the fsck intent-log replay still checks the flags in the PNOLTs; these are the wrong flags to check if the file system was locally mounted. The fsck intent-log replay therefore assumes that the intent logs are clean (because the PNOLTs are not marked dirty) and skips the replay of the intent log altogether.
RESOLUTION: The code is modified so that when PNOLTs exist in the file system, VxFS sets the dirty flag in the CFS primary PNOLT while mounting locally. With this change, in the event of a system crash while a file system is locally mounted, the subsequent fsck intent-log replay correctly utilizes the PNOLT structures and successfully replays the intent log.

* 2978227 (Tracking ID: 2857751)
SYMPTOM: Internal testing hits the assert "f:vx_cbdnlc_enter:1a" when an upgrade is in progress.
DESCRIPTION: A clone/fileset should be mounted if there is an attempt to add an entry for it to the dnlc. If the clone/fileset is not mounted and there is still an attempt to add it to the dnlc, that is not valid.
RESOLUTION: A check is added to verify that the fileset is mounted before adding an entry to the dnlc.

* 2983739 (Tracking ID: 2857731)
SYMPTOM: Internal testing hits the assert "f:vx_mapdeinit:1" while the file system is frozen and not disabled.
DESCRIPTION: While performing deinit for a free inode map, the delegation state should not be set. This is a race between the freeze/release-delegation/fs-reinit sequence and the processing of extops during reconfiguration.
RESOLUTION: Appropriate locks are taken during the extop processing so that the fs structures remain quiesced during the switch.

* 2984589 (Tracking ID: 2977697)
SYMPTOM: Deleting checkpoints of file systems with character special device files, viz.
/dev/null, using fsckptadm may panic the machine with the following stack trace:
vx_idetach
vx_inode_deinit
vx_idrop
vx_inull_list
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init
DESCRIPTION: During the checkpoint removal operation, the type of the inodes is converted to 'pass through' inodes. During this conversion, the device reference for the special file is accessed, which is invalid in the clone context, leading to a panic.
RESOLUTION: The code is modified to remove the device reference of special character files during the clone removal operation, thus preventing the panic.

* 2987373 (Tracking ID: 2881211)
SYMPTOM: File ACLs are not preserved properly in checkpoints if the file has a hardlink. ACLs work fine for files that do not have hardlinks.
DESCRIPTION: This issue involves the attribute inode. When an ACL entry is added, if it is in the immediate area, it is propagated to the clone. However, if an attribute inode is created, it is not propagated to the checkpoint. The push in the context of the attribute inode was missing, which causes this issue.
RESOLUTION: The code is modified to propagate ACL entries (in the attribute inode case) to the clone.

* 2988749 (Tracking ID: 2821152)
SYMPTOM: An internal stress test hit the assert "f:vx_dio_physio:4, 1" on a locally mounted file system.
DESCRIPTION: Two pages were being allocated, because the block size of the file system is 8k. During direct I/O ("dio"), the allocated pages are tested for whether they are good for I/O (VX_GOOD_IO_PAGE), that is, whether the page's _count is greater than zero (page_count(pp) > 0), the page is compound (PageCompound(pp)), or the page is reserved (PageReserved(pp)). The first page passed the assert because its _count was greater than zero; the second page hit the assert because all three conditions failed for it.
The second page did not have a _count greater than 0 because a compound page was allocated for it, and in a compound allocation only the head page maintains the count.
RESOLUTION: The code is changed so that compound allocation is not done for pages greater than PAGESIZE; VX_BUF_KMALLOC() is used instead.

* 3008450 (Tracking ID: 3004466)
SYMPTOM: Installation of 5.1SP1RP3 fails on RHEL 6.3.
DESCRIPTION: Installation of 5.1SP1RP3 fails on RHEL 6.3.
RESOLUTION: The install script is updated to handle the installation failure.

Patch ID: VRTSdbac-6.0.500.300

* 3952512 (Tracking ID: 3951435)
SYMPTOM: RHEL6.x RETPOLINE kernels and RHEL 6.10 are not supported.
DESCRIPTION: Red Hat has released RHEL 6.10, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL6.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RHEL 6.10 and RETPOLINE kernels on RHEL6.x is now introduced.

* 3952996 (Tracking ID: 3952988)
SYMPTOM: RHEL6.x RETPOLINE kernels are not supported.
DESCRIPTION: Red Hat has released RETPOLINE kernels for older RHEL6.x releases. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RETPOLINE kernels on RHEL6.x is now introduced.

Patch ID: VRTSamf-6.0.500.500

* 3952511 (Tracking ID: 3951435)
SYMPTOM: RHEL6.x RETPOLINE kernels and RHEL 6.10 are not supported.
DESCRIPTION: Red Hat has released RHEL 6.10, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL6.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RHEL 6.10 and RETPOLINE kernels on RHEL6.x is now introduced.

* 3952995 (Tracking ID: 3952988)
SYMPTOM: RHEL6.x RETPOLINE kernels are not supported.
DESCRIPTION: Red Hat has released RETPOLINE kernels for older RHEL6.x releases. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RETPOLINE kernels on RHEL6.x is now introduced.

* 3749188 (Tracking ID: 3749172)
SYMPTOM: Lazy unmount of a VxFS file system causes a node panic.
DESCRIPTION: In the case of a lazy unmount, the call to lookup_bdev for the VxFS file system causes a node panic.
RESOLUTION: Code changes are made in the AMF module to fix this issue.

Patch ID: VRTSvxfen-6.0.500.700

* 3952509 (Tracking ID: 3951435)
SYMPTOM: RHEL6.x RETPOLINE kernels and RHEL 6.10 are not supported.
DESCRIPTION: Red Hat has released RHEL 6.10, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL6.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RHEL 6.10 and RETPOLINE kernels on RHEL6.x is now introduced.

* 3952993 (Tracking ID: 3952988)
SYMPTOM: RHEL6.x RETPOLINE kernels are not supported.
DESCRIPTION: Red Hat has released RETPOLINE kernels for older RHEL6.x releases. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RETPOLINE kernels on RHEL6.x is now introduced.

* 3864474 (Tracking ID: 3864470)
SYMPTOM: When the I/O fencing startup script starts, it tries to configure fencing. The startup script exits if there is any error in configuring the VxFEN module.
DESCRIPTION: Any error while trying to configure VxFEN causes the I/O fencing startup script to stop retrying after a specific number of tries. If the errors are resolved after the startup script has exited, you need to start the I/O fencing program manually.
RESOLUTION: When the I/O fencing program starts, the vxfen init script now waits indefinitely for the vxfenconfig utility to configure the VxFEN module.
* 3914135 (Tracking ID: 3913303)
SYMPTOM: Non-root users cannot read VxFEN log files.
DESCRIPTION: Non-root users do not have read permissions for VxFEN log files.
RESOLUTION: The VxFEN code is changed to allow non-root users to read VxFEN log files.

Patch ID: VRTSgab-6.0.500.400

* 3952507 (Tracking ID: 3951435)
SYMPTOM: RHEL6.x RETPOLINE kernels and RHEL 6.10 are not supported.
DESCRIPTION: Red Hat has released RHEL 6.10, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL6.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RHEL 6.10 and RETPOLINE kernels on RHEL6.x is now introduced.

* 3952991 (Tracking ID: 3952988)
SYMPTOM: RHEL6.x RETPOLINE kernels are not supported.
DESCRIPTION: Red Hat has released RETPOLINE kernels for older RHEL6.x releases. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RETPOLINE kernels on RHEL6.x is now introduced.

Patch ID: VRTSllt-6.0.500.400

* 3952506 (Tracking ID: 3951435)
SYMPTOM: RHEL6.x RETPOLINE kernels and RHEL 6.10 are not supported.
DESCRIPTION: Red Hat has released RHEL 6.10, which has a RETPOLINE kernel, and has also released RETPOLINE kernels for older RHEL6.x updates. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RHEL 6.10 and RETPOLINE kernels on RHEL6.x is now introduced.

* 3952990 (Tracking ID: 3952988)
SYMPTOM: RHEL6.x RETPOLINE kernels are not supported.
DESCRIPTION: Red Hat has released RETPOLINE kernels for older RHEL6.x releases. Veritas Cluster Server kernel modules need to be recompiled with a RETPOLINE-aware GCC to support RETPOLINE kernels.
RESOLUTION: Support for RETPOLINE kernels on RHEL6.x is now introduced.
Patch ID: VRTSvxvm 6.0.500.500

* 3952559 (Tracking ID: 3951290)
SYMPTOM: Retpoline support for VxVM on RHEL6.10 and RHEL6.x retpoline kernels.
DESCRIPTION: RHEL6.10 is a new release and has a retpoline kernel. Red Hat has also released retpoline kernels for older RHEL6.x releases. The VxVM module should be recompiled with a retpoline-aware GCC to support retpoline kernels.
RESOLUTION: VxVM is compiled with retpoline GCC.

* 3889559 (Tracking ID: 3889541)
SYMPTOM: After splitting the mirrored disk to a separate disk group using the vxrootadm utility, the grub entry for the split-dg has an incorrect hd number, as shown below:
..
#vxvm_root_splitdg_vx_mirror_dg_START (do not remove)
title vxvm_root_splitdg_vx_mirror_dg
root (hdhd1, 1) <<<<<<< the 'hd' prefix is extra here.
kernel /boot/vmlinuz-2.6.32-642.el6.x86_64 ro root=/dev/vx/dsk/bootdg/rootvol rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_NO_LVM rd_NO_DM rhgb quiet
initrd /boot/VxVM_initrd.img
#vxvm_root_splitdg_vx_mirror_dg_END (do not remove)
..
DESCRIPTION: When Root-Disk-Encapsulation is performed and, after encapsulation and mirroring, the root disk and the mirror disk are split, the hd device number of the mirror disk is prefixed with an extra 'hd' string, so it erroneously becomes 'hdhd1' in the grub.conf file. This grub entry is wrong, and boot gets stuck when booting from it.
RESOLUTION: Code changes have been made to update the grub.conf file appropriately.

* 3496715 (Tracking ID: 3281004)
SYMPTOM: For the DMP minimum queue I/O policy with a large number of CPUs, the following issues are observed since the VxVM 5.1 SP1 release:
1. CPU usage is high.
2. I/O throughput is low if there are many concurrent I/Os.
DESCRIPTION: The earlier minimum queue I/O policy used to consider the host controller I/O load to select the least loaded path.
For VxVM 5.1 SP1, an addition was made to also consider the I/O load of the underlying paths of the selected host-based controllers. However, this resulted in performance issues, as there were lock contentions between the I/O processing functions and the DMP statistics daemon.
RESOLUTION: The code is modified such that the I/O load of the host controller paths is not considered, to avoid the lock contention.

* 3501358 (Tracking ID: 3399323)
SYMPTOM: The reconfiguration of the Dynamic Multi-Pathing (DMP) database fails with the following error:
VxVM vxconfigd DEBUG V-5-1-0 dmp_do_reconfig: DMP_RECONFIGURE_DB failed: 2
DESCRIPTION: As part of the DMP database reconfiguration process, controller information is not removed from the DMP user-land database even though it is removed from the DMP kernel database. This creates an inconsistency between the user-land and kernel-land DMP databases. Because of this, subsequent DMP reconfigurations fail with the above error.
RESOLUTION: The code changes have been made to properly remove the controller information from the user-land DMP database.

* 3502662 (Tracking ID: 3380481)
SYMPTOM: When the vxdiskadm(1M) command selects a removed disk during the "5 Replace a failed or removed disk" operation, the vxdiskadm(1M) command displays the following error message:
"/usr/lib/vxvm/voladm.d/lib/vxadm_syslib.sh: line 2091: return: -1: invalid option".
DESCRIPTION: From bash version 4.0, bash doesn't accept negative return values. If VxVM scripts return negative values to bash, the error message is displayed.
RESOLUTION: The code is modified so that VxVM scripts don't return negative values to bash.
* 3521727 (Tracking ID: 3521726)
SYMPTOM: When using the Symantec Replication Option, a system panic happens while freeing memory, with the following stack trace on AIX:
pvthread+011500 STACK:
[0001BF60]abend_trap+000000 ()
[000C9F78]xmfree+000098 ()
[04FC2120]vol_tbmemfree+0000B0 ()
[04FC2214]vol_memfreesio_start+00001C ()
[04FCEC64]voliod_iohandle+000050 ()
[04FCF080]voliod_loop+0002D0 ()
[04FC629C]vol_kernel_thread_init+000024 ()
[0025783C]threadentry+00005C ()
DESCRIPTION: In certain scenarios, when a write I/O gets throttled or unwound in VVR, the memory related to one of the internal data structures is freed. When this I/O is restarted, the same memory gets illegally accessed and freed again even though it was already freed. This causes a system panic.
RESOLUTION: Code changes have been made to fix the illegal memory access issue.

* 3526501 (Tracking ID: 3526500)
SYMPTOM: Disk I/O failures occur with DMP I/O timeout error messages when the DMP (Dynamic Multi-Pathing) I/O statistics daemon is not running. The following are the timeout error messages:
VxVM vxdmp V-5-3-0 I/O failed on path 65/0x40 after 1 retries for disk 201/0x70
VxVM vxdmp V-5-3-0 Reached DMP Threshold IO TimeOut (100 secs) I/O with start 3e861909fa0 and end 3e86190a388 time
VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x206) on dmpnode 201/0x70
DESCRIPTION: When I/O is submitted to DMP, it sets the start time on the I/O buffer. The value of the start time depends on whether the DMP I/O statistics daemon is running or not. When the I/O is returned as an error from SCSI to DMP, instead of retrying the I/O on alternate paths, DMP failed that I/O with a 300-second timeout error, even though the I/O had spent only a few milliseconds in execution. The miscalculation of the DMP timeout happens only when the DMP I/O statistics daemon is not running.
RESOLUTION: The code is modified to calculate an appropriate DMP I/O timeout value when the DMP I/O statistics daemon is not running.
* 3531332 (Tracking ID: 3077582)
SYMPTOM: A Veritas Volume Manager (VxVM) volume may become inaccessible, causing read/write operations to fail with the following error:
# dd if=/dev/vx/dsk/<dg>/<volume> of=/dev/null count=10
dd read error: No such device
0+0 records in
0+0 records out
DESCRIPTION: If I/Os to the disks time out due to some hardware failure, like a weak Storage Area Network (SAN) cable link or Host Bus Adapter (HBA) failure, VxVM assumes that the disk is faulty or slow and sets the failio flag on the disk. Due to this flag, all the subsequent I/Os fail with the "No such device" error.
RESOLUTION: The code is modified such that vxdisk now provides a way to clear the failio flag. To check whether the failio flag is set on the disks, use the vxkprint(1M) utility (under /etc/vx/diag.d). To reset the failio flag, execute the "vxdisk set <disk_name> failio=off" command, or deport and import the disk group that holds these disks.

* 3540777 (Tracking ID: 3539548)
SYMPTOM: Adding an MPIO (Multi-Path I/O) disk that had been removed earlier may result in the following two issues:
1. The 'vxdisk list' command shows a duplicate entry for the DMP (Dynamic Multi-Pathing) node with error state.
2. The 'vxdmpadm listctlr all' command shows duplicate controller names.
DESCRIPTION: 1. Under certain circumstances, deleted MPIO disk record information is left in the /etc/vx/disk.info file with its device number as -1, but its DMP node name is reassigned to another MPIO disk. When the deleted disk is added back, it is assigned the same name without validating for a conflict in the name. 2. When some devices are removed and added back to the system, a new controller is added for each and every path that is discovered. This leads to duplicated controller entries in the DMP database.
RESOLUTION: 1. The code is modified to properly remove all stale information about any disk before updating MPIO disk names. 2. Code changes have been made to add the controller for selected paths only.
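The failio-recovery steps from the resolution of incident 3531332 can be sketched as the following command sequence. This is an illustrative sketch only: the disk name "mydisk01" and disk group name "mydg" are hypothetical placeholders, and the vxkprint output format varies by release.

```shell
# Check whether the failio flag is set on any disk
# (the vxkprint diagnostic utility ships under /etc/vx/diag.d)
/etc/vx/diag.d/vxkprint | grep -i failio

# Clear the flag on the affected disk (disk name is a placeholder)
vxdisk set mydisk01 failio=off

# Alternatively, deport and re-import the disk group holding the disk
vxdg deport mydg
vxdg import mydg
```

These commands require the VxVM packages from this patch level to be installed and must be run as root.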
* 3552411 (Tracking ID: 3482026)
SYMPTOM: The vxattachd(1M) daemon reattaches plexes of a manually detached site.
DESCRIPTION: The vxattachd daemon reattaches plexes for a manually detached site, that is, a site whose state is OFFLINE. There was no check to differentiate between a manually detached site and a site that was detached due to I/O failure. Hence, the vxattachd(1M) daemon brings the plexes online for the manually detached site also.
RESOLUTION: The code is modified to differentiate between a manually detached site and a site detached due to I/O failure.

* 3600161 (Tracking ID: 3599977)
SYMPTOM: During a replica connection, referencing a port that is already deleted in another thread causes a system panic with a stack trace similar to the one below:
.simple_lock()
soereceive()
soreceive()
.kernel_add_gate_cstack()
kmsg_sys_rcv()
nmcom_get_next_mblk()
nmcom_get_next_msg()
nmcom_wait_msg_tcp()
nmcom_server_proc_tcp()
nmcom_server_proc_enter()
vxvm_start_thread_enter()
DESCRIPTION: During a replica connection, a port is created before increasing the reference count. This is to protect the port from getting deleted. However, another thread deletes the port after the port is created but before the count is increased. When the replica connection thread proceeds, it refers to the port that is already deleted, which causes a NULL pointer dereference and a system panic.
RESOLUTION: The code is modified to prevent asynchronous access to the count that is associated with the port by means of locks.

* 3603811 (Tracking ID: 3594158)
SYMPTOM: The system panics on a VVR secondary node with the following stack trace:
.simple_lock()
soereceive()
soreceive()
.kernel_add_gate_cstack()
kmsg_sys_rcv()
nmcom_get_next_mblk()
nmcom_get_next_msg()
nmcom_wait_msg_tcp()
nmcom_server_proc_tcp()
nmcom_server_proc_enter()
vxvm_start_thread_enter()
DESCRIPTION: A spinlock or unspinlock may be issued to the replica to check whether to use a checksum in the received packet.
During the lock or unlock operation, if there is a transaction being processed with the replica, which rebuilds the replica object in the kernel, then there is a possibility that the replica referenced in the spinlock is different from the one referenced in the unspinlock (especially when the replica is referenced through several pointers). As a result, the system panics.
RESOLUTION: The code is modified to set a flag in the port attribute during port creation to indicate whether to use the checksum. Hence, for each packet that is received, only the flag in the port attribute needs to be checked, rather than referencing the replica object. As part of the change, the spinlock and unspinlock statements are also removed.

* 3612801 (Tracking ID: 3596330)
SYMPTOM: The 'vxsnap refresh' operation fails with the following indications.
Errors occur on the DR (Disaster Recovery) site of VVR (Veritas Volume Replicator):
o vxio: [ID 160489 kern.notice] NOTICE: VxVM vxio V-5-3-1576 commit: Timedout waiting for rvg [RVG] to quiesce, iocount [PENDING_COUNT] msg 0
o vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-8011 Internal transaction failed: Transaction aborted waiting for io drain
At the same time, the following errors occur on the primary site of VVR:
vxio: [ID 218356 kern.warning] WARNING: VxVM VVR vxio V-5-0-267 Rlink [RLINK] disconnecting due to ack timeout on update message
DESCRIPTION: VM (Volume Manager) transactions on the DR site get aborted, as pending I/Os could not be drained in the stipulated time, leading to failure of FMR (Fast-Mirror Resync) 'snap' operations. These I/Os could not be drained because of I/O throttling. A race condition, in conjunction with timing in VVR, causes the clearing of this throttling state to be missed.
RESOLUTION: Code changes have been made to fix the race condition, ensuring clearance of the throttling state at the appropriate time.
* 3621240 (Tracking ID: 3621232)
SYMPTOM: When the vradmin ibc command is executed to initiate the In-band Control (IBC) procedure, the vradmind (VVR daemon) on the VVR secondary node goes into the disconnected state. As a result, subsequent IBC procedures or vradmin ibc commands cannot be started or executed on VVR's secondary node, and a message similar to the following appears on VVR's primary node:
VxVM VVR vradmin ERROR V-5-52-532 Secondary is undergoing a state transition. Please re-try the command after some time.
VxVM VVR vradmin ERROR V-5-52-802 Cannot start command execution on Secondary.
DESCRIPTION: When the IBC procedure reaches the command's finish state, the vradmind on the VVR secondary node goes into the disconnected state, which the vradmind on the primary node fails to realize. In such a scenario, the vradmind on the primary refrains from sending a handshake request to the secondary node, which could change the secondary node's state from disconnected to running. As a result, the vradmind on the secondary node continues to be in the disconnected state, and the vradmin ibc command fails to run on the VVR secondary node despite being in the running state on the VVR primary node.
RESOLUTION: The code is modified to make sure the vradmind on the VVR primary node is notified when the vradmind on the VVR secondary node goes into the disconnected state. As a result, it can send out a handshake request to take the secondary node out of the disconnected state.
* 3622069 (Tracking ID: 3513392)
SYMPTOM: The secondary panics when rebooted while heavy I/Os are going on the primary:
PID: 18862 TASK: ffff8810275ff500 CPU: 0 COMMAND: "vxiod"
#0 [ffff880ff3de3960] machine_kexec at ffffffff81035b7b
#1 [ffff880ff3de39c0] crash_kexec at ffffffff810c0db2
#2 [ffff880ff3de3a90] oops_end at ffffffff815111d0
#3 [ffff880ff3de3ac0] no_context at ffffffff81046bfb
#4 [ffff880ff3de3b10] __bad_area_nosemaphore at ffffffff81046e85
#5 [ffff880ff3de3b60] bad_area_nosemaphore at ffffffff81046f53
#6 [ffff880ff3de3b70] __do_page_fault at ffffffff810476b1
#7 [ffff880ff3de3c90] do_page_fault at ffffffff8151311e
#8 [ffff880ff3de3cc0] page_fault at ffffffff815104d5
#9 [ffff880ff3de3d78] volrp_sendsio_start at ffffffffa0af07e3 [vxio]
#10 [ffff880ff3de3e08] voliod_iohandle at ffffffffa09991be [vxio]
#11 [ffff880ff3de3e38] voliod_loop at ffffffffa0999419 [vxio]
#12 [ffff880ff3de3f48] kernel_thread at ffffffff8100c0ca
DESCRIPTION: If the replication stage I/Os are started after serialization of the replica volume, the replication port could be deleted and set to NULL while handling the replica connection changes. This causes the panic, since it is not checked whether the replication port is still valid before referencing it.
RESOLUTION: Code changes have been made to abort the stage I/O if the replication port is NULL.

* 3638039 (Tracking ID: 3625890)
SYMPTOM: After running the vxdisk resize command, the following message is displayed:
"VxVM vxdisk ERROR V-5-1-8643 Device resize failed: Invalid attribute specification"
DESCRIPTION: Two cylinders are reserved for special usage on CDS (Cross-platform Data Sharing) VTOC (Volume Table of Contents) disks. In case of expanding a disk with a particular disk size on the storage side, VxVM (Veritas Volume Manager) may calculate the cylinder number as 2, which causes the vxdisk resize to fail with the error message "Invalid attribute specification".
RESOLUTION: The code is modified to avoid the failure of resizing a CDS VTOC disk.
* 3648603 (Tracking ID: 3564260)
SYMPTOM: VVR commands are unresponsive when replication is paused and resumed in a loop.
DESCRIPTION: While Veritas Volume Replicator (VVR) is in the process of sending updates, pausing replication is deferred until acknowledgements of the updates are received or until an error occurs. If, for some reason, the acknowledgements get delayed or the delivery fails, the pause operation continues to get deferred, resulting in unresponsiveness.
RESOLUTION: The code is modified to resolve the issue that caused the unresponsiveness.

* 3654163 (Tracking ID: 2916877)
SYMPTOM: vxconfigd hangs if a node leaves the cluster while I/O error handling is in progress. The observed stack is as follows:
volcvm_iodrain_dg
volcvmdg_abort_complete
volcvm_abort_sio_start
voliod_loop
vol_kernel_thread_init
DESCRIPTION: A bug in the DCO error handling code can lead to an infinite loop if a node leaves the cluster while I/O error handling is in progress. This causes vxconfigd to hang and stop responding to VxVM commands like vxprint, vxdisk, etc.
RESOLUTION: The DCO error handling code has been changed so that I/O errors are handled correctly. Hence, the hang is avoided.

* 3690795 (Tracking ID: 2573229)
SYMPTOM: On RHEL6, the server panics when Dynamic Multi-Pathing (DMP) executes the PERSISTENT RESERVE IN command with the REPORT CAPABILITIES service action on a PowerPath-controlled device.
The following stack trace is displayed: enqueue_entity at ffffffff81068f09 enqueue_task_fair at ffffffff81069384 enqueue_task at ffffffff81059216 activate_task at ffffffff81059253 pull_task at ffffffff81065401 load_balance_fair at ffffffff810657b7 thread_return at ffffffff81527d30 schedule_timeout at ffffffff815287b5 wait_for_common at ffffffff81528433 wait_for_completion at ffffffff8152854d blk_execute_rq at ffffffff8126d9dc emcp_scsi_cmd_ioctl at ffffffffa04920a2 [emcp] PowerPlatformBottomDispatch at ffffffffa0492eb8 [emcp] PowerSyncIoBottomDispatch at ffffffffa04930b8 [emcp] PowerBottomDispatchPirp at ffffffffa049348c [emcp] PowerDispatchX at ffffffffa049390d [emcp] MpxSendScsiCmd at ffffffffa061853e [emcpmpx] ClariionKLam_groupReserveRelease at ffffffffa061e495 [emcpmpx] MpxDefaultRegister at ffffffffa061df0a [emcpmpx] MpxTestPath at ffffffffa06227b5 [emcpmpx] MpxExtraTry at ffffffffa06234ab [emcpmpx] MpxTestDaemonCalloutGuts at ffffffffa062402f [emcpmpx] MpxIodone at ffffffffa0624621 [emcpmpx] MpxDispatchGuts at ffffffffa0625534 [emcpmpx] MpxDispatch at ffffffffa06256a8 [emcpmpx] PowerDispatchX at ffffffffa0493921 [emcp] GpxDispatch at ffffffffa0644775 [emcpgpx] PowerDispatchX at ffffffffa0493921 [emcp] GpxDispatchDown at ffffffffa06447ae [emcpgpx] VluDispatch at ffffffffa068b025 [emcpvlumd] GpxDispatch at ffffffffa0644752 [emcpgpx] PowerDispatchX at ffffffffa0493921 [emcp] GpxDispatchDown at ffffffffa06447ae [emcpgpx] XcryptDispatchGuts at ffffffffa0660b45 [emcpxcrypt] XcryptDispatch at ffffffffa0660c09 [emcpxcrypt] GpxDispatch at ffffffffa0644752 [emcpgpx] PowerDispatchX at ffffffffa0493921 [emcp] GpxDispatch at ffffffffa0644775 [emcpgpx] PowerDispatchX at ffffffffa0493921 [emcp] PowerSyncIoTopDispatch at ffffffffa04978b9 [emcp] emcp_send_pirp at ffffffffa04979b9 [emcp] emcp_pseudo_blk_ioctl at ffffffffa04982dc [emcp] __blkdev_driver_ioctl at ffffffff8126f627 blkdev_ioctl at ffffffff8126faad block_ioctl at ffffffff811c46cc dmp_ioctl_by_bdev at 
ffffffffa074767b [vxdmp] dmp_kernel_scsi_ioctl at ffffffffa0747982 [vxdmp] dmp_scsi_ioctl at ffffffffa0786d42 [vxdmp] dmp_send_scsireq at ffffffffa078770f [vxdmp] dmp_do_scsi_gen at ffffffffa077d46b [vxdmp] dmp_pr_check_aptpl at ffffffffa07834dd [vxdmp] dmp_make_mp_node at ffffffffa0782c89 [vxdmp] dmp_decode_add_disk at ffffffffa075164e [vxdmp] dmp_decipher_instructions at ffffffffa07521c7 [vxdmp] dmp_process_instruction_buffer at ffffffffa075244e [vxdmp] dmp_reconfigure_db at ffffffffa076f40e [vxdmp] gendmpioctl at ffffffffa0752a12 [vxdmp] dmpioctl at ffffffffa0754615 [vxdmp] dmp_ioctl at ffffffffa07784eb [vxdmp] dmp_compat_ioctl at ffffffffa0778566 [vxdmp] compat_blkdev_ioctl at ffffffff8128031d compat_sys_ioctl at ffffffff811e0bfd sysenter_dispatch at ffffffff81050c20
DESCRIPTION: Dynamic Multi-Pathing (DMP) uses the PERSISTENT RESERVE IN command with the REPORT CAPABILITIES service action to discover target capabilities. On RHEL6, the system panics unexpectedly when DMP executes this command on a PowerPath-controlled device coming from an EMC CLARiiON/VNX array. This bug has been reported to EMC PowerPath engineering.
RESOLUTION: The Dynamic Multi-Pathing (DMP) code is modified to execute the PERSISTENT RESERVE IN command with the REPORT CAPABILITIES service action only on devices that are not third-party controlled.

* 3713320 (Tracking ID: 3596282)
SYMPTOM: FMR (Fast Mirror Resync) operations fail with the error "Failed to allocate a new map due to no free map available in DCO":
"vxio: [ID 609550 kern.warning] WARNING: VxVM vxio V-5-3-1721 voldco_allocate_toc_entry: Failed to allocate a new map due to no free map available in DCO of [volume]"
This often leads to disabling of the snapshot.
DESCRIPTION: For instant space-optimized snapshots, stale maps are left behind for DCO (Data Change Object) objects at the time of creation of cache objects.
So, over time, if space-optimized snapshots are created that use a new cache object, stale maps accumulate and eventually consume all the available DCO space, resulting in the error.
RESOLUTION: Code changes have been made to ensure no stale entries are left behind.

* 3737823 (Tracking ID: 3736502)
SYMPTOM: When FMR is configured in a VVR environment, 'vxsnap refresh' fails with the following error message:
"VxVM VVR vxsnap ERROR V-5-1-10128 DCO experienced IO errors during the operation. Re-run the operation after ensuring that DCO is accessible".
Also, multiple messages about connection/disconnection of the replication link (rlink) are seen.
DESCRIPTION: Internally triggered rlink connections and disconnections cause transaction retries. During a transaction, memory is allocated for Data Change Object (DCO) maps and is not cleared on abortion of the transaction. This leads to a memory leak and eventually to exhaustion of maps.
RESOLUTION: A fix has been added to clear the allocated DCO maps when a transaction aborts.

* 3774137 (Tracking ID: 3565212)
SYMPTOM: While performing controller giveback operations on NetApp ALUA arrays, the following messages are observed in /etc/vx/dmpevents.log:
[Date]: I/O error occured on Path belonging to Dmpnode
[Date]: I/O analysis done as DMP_PATH_BUSY on Path belonging to Dmpnode
[Date]: I/O analysis done as DMP_IOTIMEOUT on Path belonging to Dmpnode
DESCRIPTION: During the asymmetric access state transition, DMP puts the buffer pointer in the delay queue based on the flags observed in the logs. This delay resulted in a timeout, and thereby the file system went into the disabled state.
RESOLUTION: The DMP code is modified to perform immediate retries instead of putting the buffer pointer in the delay queue for the transition-in-progress case.

* 3780978 (Tracking ID: 3762580)
SYMPTOM: The following logs were seen while setting up fencing for the cluster:
VXFEN: vxfen_reg_coord_pt: end ret = -1
vxfen_handle_local_config_done: Could not register with a majority of the coordination points.
DESCRIPTION: It is observed that in Linux kernels greater than or equal to RHEL6.6, RHEL7, and SLES11SP3, the interface used by DMP to send SCSI commands to block devices does not transfer the data to/from the device. Therefore, the SCSI-3 PR keys do not get registered.
RESOLUTION: A code change has been made to use the SCSI request_queue to send SCSI commands to the underlying block device. An additional patch is required from EMC to support processing SCSI commands via the request_queue mechanism on EMC PowerPath devices. Please contact EMC for patch details for a specific kernel version.

* 3788751 (Tracking ID: 3788644)
SYMPTOM: When DMP (Dynamic Multi-Pathing) native support is enabled for an Oracle ASM environment, constantly adding and removing DMP devices causes errors like:
/etc/vx/bin/vxdmpraw enable oracle dba 775 emc0_3f84
VxVM vxdmpraw INFO V-5-2-6157 Device enabled : emc0_3f84
Error setting raw device (Invalid argument)
DESCRIPTION: There is a limitation (8192) on the maximum raw device number N (exclusive) of /dev/raw/rawN. This limitation is defined in the boot configuration file. When binding a raw device to a dmpnode, /dev/raw/rawN is used to bind the dmpnode. The rawN number is calculated by a one-way incremental process, so even if the device is unbound later, the "released" rawN number is not reused in the next binding. When the rawN number increases beyond the maximum limitation, the error is reported.
RESOLUTION: The code has been changed to always use the smallest available rawN number instead of calculating it by a one-way incremental process.
* 3799809 (Tracking ID: 3665644)
SYMPTOM: The system panics with the following stack trace due to an invalid page pointer in the Linux bio structure:
crash_kexec()
die()
do_page_fault()
error_exit()
blk_recount_segments()
bio_phys_segments()
init_request_from_bio()
make_request()
generic_make_request()
gendmpstrategy()
generic_make_request()
dmp_indirect_io()
dmpioctl()
dmp_ioctl()
dmp_compat_ioctl()
DESCRIPTION: A falsified page pointer is returned when Dynamic Multi-Pathing (DMP) allocates memory by calling the Linux vmalloc() function and maps the allocated virtual address to the physical page in the back trace. The issue is observed because DMP does not call the appropriate Linux kernel API, leading to a system panic.
RESOLUTION: The code is modified to call the correct Linux kernel API when DMP maps the virtual address, which the vmalloc() function allocates, to the physical page.

* 3799822 (Tracking ID: 3573262)
SYMPTOM: On recent UltraSPARC-T4 architectures, a panic is observed with the topmost stack frame pointing to bcopy during snapshot operations involving space-optimized snapshots:
SPARC-T4:bcopy_more()
SPARC-T4:bcopy()
vxio:vol_cvol_bplus_delete()
vxio:vol_cvol_dshadow1_done()
vxio:voliod_iohandle()
vxio:voliod_loop()
DESCRIPTION: The bcopy kernel library routine on Solaris was optimized to take advantage of recent UltraSPARC-T4 architectures, but it has some known issues for large-size copies in some patch versions of Solaris 10. The use of bcopy was causing in-core corruption of cache object metadata. The corruption later led to a system panic.
RESOLUTION: The code is modified to use a word-by-word copy of the buffer instead of the bcopy kernel library routine.

* 3800394 (Tracking ID: 3672759)
SYMPTOM: When the DMP database is corrupted, the vxconfigd(1M) daemon may core dump with the following stack trace:
database is corrupted.
ddl_change_dmpnode_state ()
ddl_data_corruption_msgs ()
ddl_reconfigure_all ()
ddl_find_devices_in_system ()
find_devices_in_system ()
req_change_state ()
request_loop ()
main ()
DESCRIPTION: The issue is observed because the corrupted DMP database is not properly destroyed.
RESOLUTION: The code is modified to remove the corrupted DMP database.

* 3800396 (Tracking ID: 3749557)
SYMPTOM: The system hangs and becomes unresponsive because of heavy memory consumption by VxVM.
DESCRIPTION: In the Dirty Region Logging (DRL) update code path, an erroneous condition was present that led to an infinite loop which kept consuming memory. This consumption of large amounts of memory makes the system unresponsive.
RESOLUTION: The code has been fixed to avoid the infinite loop, hence preventing the hang that was caused by high memory usage.

* 3800449 (Tracking ID: 3726110)
SYMPTOM: On systems with a high number of CPUs, Dynamic Multi-Pathing (DMP) devices may perform considerably slower than OS device paths.
DESCRIPTION: In high-CPU configurations, the I/O statistics functionality in DMP takes more CPU time, as DMP statistics are collected on a per-CPU basis. This statistics collection happens in the DMP I/O code path, hence it reduces I/O performance. Because of this, DMP devices perform slower than OS device paths.
RESOLUTION: Code changes have been made to remove some of the statistics collection functionality from the DMP I/O code path. Along with this, the following tunables need to be turned off:
1. Turn off idle LUN probing.
# vxdmpadm settune dmp_probe_idle_lun=off
2. Turn off the statistics gathering functionality.
# vxdmpadm iostat stop
Notes: 1. Please apply this patch if the system configuration has a large number of CPUs and if DMP is performing considerably slower than OS device paths. For normal systems this issue is not applicable.

* 3800452 (Tracking ID: 3437852)
SYMPTOM: The system panics when the Symantec Replicator Option goes to PASSTHRU mode.
The panic stack trace might look like:
vol_rp_halt()
vol_rp_state_trans()
vol_rv_replica_reconfigure()
vol_rv_error_handle()
vol_rv_errorhandler_callback()
vol_klog_start()
voliod_iohandle()
voliod_loop()
DESCRIPTION: When the Storage Replicator Log (SRL) gets faulted for any reason, VVR goes into PASSTHRU mode. At this time, a few updates are erroneously freed. When these updates are accessed during subsequent processing, the access results in a panic, as the updates have already been freed.
RESOLUTION: The code changes have been made so that the updates are not freed erroneously.

* 3800738 (Tracking ID: 3433503)
SYMPTOM: The vxconfigd(1M) daemon dumps core with the following stack trace while bringing a disk online:
kernel_vsyscall()
raise()
abort()
libc_message()
int_free()
free()
get_geometry_info()
devintf_disk_geom_raw()
cds_get_geometry()
cds_check_for_cds_fmt()
auto_determine_format()
auto_sys_online()
auto_online()
da_online()
da_thread_online_disk()
vold_thread_exec()
start_thread()
clone()
DESCRIPTION: When a disk label, such as a DOS label magic, is corrupt and is written to a Sun label disk, the vxconfigd(1M) daemon reads the DOS partition table from the Sun label disk, causing a core dump. The core occurs due to an out-of-boundary memory access in the DOS partition read code.
RESOLUTION: The code is modified to avoid the incorrect memory access.

* 3800788 (Tracking ID: 3648719)
SYMPTOM: The server panics with the following stack trace while adding or removing LUNs or HBAs:
dmp_decode_add_path()
dmp_decipher_instructions()
dmp_process_instruction_buffer()
dmp_reconfigure_db()
gendmpioctl()
vxdmpioctl()
DESCRIPTION: While deleting a dmpnode, Dynamic Multi-Pathing (DMP) releases the memory associated with the dmpnode structure. In case the dmpnode doesn't get deleted for some reason, and any other task accesses the freed memory of this dmpnode, the server panics.
RESOLUTION: The code is modified to prevent tasks from accessing the memory that is freed when the dmpnode is deleted. The change also fixes a memory leak issue in the buffer allocation code path.

* 3801225 (Tracking ID: 3662392)
SYMPTOM: In a CVM environment, if I/Os are being executed on the slave node, corruption can happen while the vxdisk resize(1M) command is executing on the master node.
DESCRIPTION: During the first stage of the resize transaction, the master node re-adjusts the disk offsets and public/private partition device numbers. On a slave node, the public/private partition device numbers are not adjusted properly. Because of this, the partition starting offset is added twice, which causes the corruption. The window during which the public/private partition device numbers are adjusted is small; corruption is observed only if I/O occurs during this window. After the resize operation completes its execution, no further corruption will happen.
RESOLUTION: The code has been changed to add the partition starting offset properly to an I/O on the slave node during execution of a resize command.

* 3805243 (Tracking ID: 3523575)
SYMPTOM: The VxDMP (Veritas Dynamic Multipathing) path restoration daemon could disable paths connected to an EMC CLARiiON array. This can cause DMP nodes to get disabled, eventually leading to I/O errors on these DMP nodes. This issue was observed with the following kernel version on SLES 11 SP3, but could be present on other versions as well:
# uname -r
3.0.101-0.15-default
To confirm whether this issue is being hit, check if running "vxdisk scandisks" enables the paths temporarily (the next VxDMP restore daemon cycle will again disable the paths).
DESCRIPTION: It is observed that very recent Linux kernels have broken the SG_IO kernel-to-kernel ioctl interface. This can cause path health-check routines within the dmpCLARiiON.ko APM (Array Policy Module) to get incorrect data from a SCSI inquiry.
This can lead to the APM incorrectly marking the paths as failed. This APM is installed by the VRTSaslapm package. RESOLUTION: Changed APM code to use SCSI bypass to get the SG_IO information. * 3805902 (Tracking ID: 3795622) SYMPTOM: With Dynamic Multipathing (DMP) Native Support enabled, LVM global_filter is not updated properly in lvm.conf file to reject the newly added paths. DESCRIPTION: With Dynamic Multipathing (DMP) Native Support enabled, when new paths are added to existing LUNs, LVM global_filter is not updated properly in lvm.conf file to reject the newly added paths. This can lead to duplicate PV (physical volumes) found error reported by LVM commands. RESOLUTION: Code changes have been made to properly update global_filter field in lvm.conf file when new paths are added to existing disks. * 3805938 (Tracking ID: 3790136) SYMPTOM: File system hang can be observed sometimes due to IO's hung in DRL. DESCRIPTION: There might be some IO's hung in DRL of mirrored volume due to incorrect calculation of outstanding IO's on volume and number of active IO's which are currently in progress on DRL. The value of the outstanding IO on volume can get modified incorrectly leading to IO's on DRL not to progress further which in turns results in a hang kind of scenario. RESOLUTION: Code changes have been done to avoid incorrect modification of value of outstanding IO's on volume and prevent the hang. 
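The DRL hang above comes down to unsynchronized updates of an in-flight I/O counter. The following is a minimal illustrative sketch (in Python, not the actual VxVM kernel code) of why such a counter must only be updated under a lock:

```python
import threading

def run_counter(iterations=10000, workers=4, use_lock=True):
    """Illustrative sketch: an in-flight I/O counter updated by several
    threads. Guarded by the lock, the issue/complete increments and
    decrements always balance out to zero; without synchronization,
    updates can be lost -- the class of bug described for the DRL
    outstanding-I/O count."""
    count = 0
    lock = threading.Lock()

    def worker():
        nonlocal count
        for _ in range(iterations):
            if use_lock:
                with lock:
                    count += 1   # I/O issued
                    count -= 1   # I/O completed
            else:
                count += 1       # unguarded read-modify-write: racy
                count -= 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count
```

With `use_lock=True` the final count is always zero, matching the number of completed I/Os; the fix described above restores exactly this invariant for the volume's outstanding-I/O value.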
* 3806808 (Tracking ID: 3645370)

SYMPTOM:
After running the vxevac command, if the user tries to roll back or commit
the evacuation for a disk containing a DRL plex, the action fails with the
following errors:
/etc/vx/bin/vxevac -g testdg commit testdg02 testdg03
VxVM vxsd ERROR V-5-1-10127 deleting plex %1: Record is associated
VxVM vxassist ERROR V-5-1-324 fsgen/vxsd killed by signal 11, core dumped
VxVM vxassist ERROR V-5-1-12178 Could not commit subdisk testdg02-01 in volume testvol
VxVM vxevac ERROR V-5-2-3537 Aborting disk evacuation
/etc/vx/bin/vxevac -g testdg rollback testdg02 testdg03
VxVM vxsd ERROR V-5-1-10127 deleting plex %1: Record is associated
VxVM vxassist ERROR V-5-1-324 fsgen/vxsd killed by signal 11, core dumped
VxVM vxassist ERROR V-5-1-12178 Could not rollback subdisk testdg02-01 in volume testvol
VxVM vxevac ERROR V-5-2-3537 Aborting disk evacuation

DESCRIPTION:
When the user runs the vxevac command, new plexes are created on the
target disks. Later, during the commit or rollback operation, VxVM deletes
the plexes on the source or the target disks. To delete a plex, VxVM must
delete its subdisks first; otherwise the plex deletion fails with the
following error message:
VxVM vxsd ERROR V-5-1-10127 deleting plex %1: Record is associated
The error is displayed because the code does not correctly handle the
deletion of subdisks of plexes marked for DRL (dirty region logging).

RESOLUTION:
The code is modified to handle evacuation of disks with DRL plexes
correctly.

* 3807761 (Tracking ID: 3729078)

SYMPTOM:
In a VVR environment, a panic may occur after SF (Storage Foundation)
patch installation or uninstallation on the secondary site.

DESCRIPTION:
The VXIO kernel reset invoked by SF patch installation removes all disk
group objects that do not have the preserved flag set. Because the
preserve flag overlaps with the RVG (Replicated Volume Group) logging
flag, the RVG object is not removed, but its rlink object is removed,
resulting in a system panic when VVR starts.

RESOLUTION:
Code changes have been made to fix this issue.

* 3816233 (Tracking ID: 3686698)

SYMPTOM:
vxconfigd was getting hung due to a deadlock between two threads.

DESCRIPTION:
Two threads were waiting for the same lock, causing a deadlock between
them, which blocks all vx commands. The untimeout function does not return
until the pending callback (set through the timeout function) is
cancelled, or until the pending callback has completed its execution (if
it has already started). Therefore, locks acquired by the callback routine
must not be held across a call to the untimeout routine, or a deadlock may
result.
Thread 1:
untimeout_generic()
untimeout()
voldio()
volsioctl_real()
fop_ioctl()
ioctl()
syscall_trap32()
Thread 2:
mutex_vector_enter()
voldsio_timeout()
callout_list_expire()
callout_expire()
callout_execute()
taskq_thread()
thread_start()

RESOLUTION:
Code changes have been made to call untimeout outside the lock taken by
the callback handler.

* 3825466 (Tracking ID: 3825467)

SYMPTOM:
An error is reported about the symbol d_lock.

DESCRIPTION:
This symbol is already present in the kernel, so a duplicate was being
declared in the VxVM kernel code.

RESOLUTION:
The code is modified to remove the duplicate definition, which resolves
the issue.

* 3826918 (Tracking ID: 3819670)

SYMPTOM:
When running smartmove with "vxevac", the customer put it in the
background by typing Ctrl-Z followed by the bg command; however, this
resulted in termination of the data move.

DESCRIPTION:
When moving data from a user command like "vxevac", the data move is
submitted as a task in the kernel, and the select() primitive is used on
the task file descriptor to wait for task-completion events. However, when
Ctrl-Z plus bg is typed, select() returns -1 with errno EINTR, which the
code logic interprets as a user termination action, and the data move is
therefore terminated. The correct behavior is to retry the select() and
continue waiting for task-completion events.

RESOLUTION:
Code changes have been made so that when select() returns with errno
EINTR, the code checks whether the task is finished; if it is not, the
select() is retried.

* 3829273 (Tracking ID: 3823283)

SYMPTOM:
After a reboot, the OS gets stuck in GRUB. A manual kernel load is
required to make the operating system functional.

DESCRIPTION:
During unencapsulation of a boot disk in a SAN environment, multiple
entries corresponding to the root disk are found in the by-id device
directory. As a result, a parse command fails, leading to creation of an
improper menu file in GRUB. This menu file defines the device path from
which the kernel and other modules are loaded.

RESOLUTION:
The code is modified to handle the multiple entries for a SAN boot disk.

* 3837711 (Tracking ID: 3488071)

SYMPTOM:
The command "vxdmpadm settune dmp_native_support=on" fails to enable
Dynamic Multipathing (DMP) Native Support and fails with the following
error:
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more volume groups

DESCRIPTION:
From LVM version 105, global_filter was introduced as part of the lvm.conf
file. RHEL 6.6 onwards has LVM version 111 and hence supports
global_filter. The code changes to handle global_filter in the lvm.conf
file were not made when 6.1.1.100 was released.

RESOLUTION:
Code changes have been made to handle global_filter in the lvm.conf file
and allow DMP Native Support to work.

* 3837712 (Tracking ID: 3776520)

SYMPTOM:
Filters are not updated properly in the lvm.conf file in the VxDMP initrd
while enabling DMP Native Support, leading to the root Logical Volume (LV)
being mounted on the OS device upon reboot.

DESCRIPTION:
From LVM version 105, global_filter was introduced as part of the lvm.conf
file. VxDMP updates the initrd lvm.conf file with the filters required for
DMP Native Support to function. While updating the lvm.conf, VxDMP checks
the filter field, whereas in the latest LVM version it should check the
global_filter field. This leaves the lvm.conf file without the proper
filters, leading to the issue.

RESOLUTION:
Code changes have been made to properly update the global_filter field in
the lvm.conf file in the VxDMP initrd.

* 3837713 (Tracking ID: 3137543)

SYMPTOM:
After a reboot, the OS boots in maintenance mode as it fails to load the
initrd image and the kernel modules properly.

DESCRIPTION:
Due to changes in grub from the OS, the kernel module and initrd image
entries corresponding to the VxVM root are not populated properly in the
grub configuration file. Hence, the system fails to load the kernel
properly and boots in maintenance mode after the reboot.

RESOLUTION:
The code is modified to work with RDE.

* 3837715 (Tracking ID: 3581646)

SYMPTOM:
Logical Volumes may sometimes fail to migrate back to OS devices when
Dynamic Multipathing (DMP) Native Support is disabled and the root is
mounted on LVM.

DESCRIPTION:
lvmetad caches an open count on devices that are present in the accept
section of the filter in the lvm.conf file. When DMP Native Support is
enabled, all non-VxVM devices are put in the reject section of the filter
so that only "/dev/vx/dmp" devices remain in the accept section; lvmetad
therefore caches an open count on the "/dev/vx/dmp" devices. When DMP
Native Support is disabled, the "/dev/vx/dmp" devices are not put in the
reject section of the filter, leaving a stale open count in lvmetad. This
causes physical volumes to point to stale devices even after DMP Native
Support is disabled.

RESOLUTION:
Code changes have been made to add the "/dev/vx/dmp" devices to the reject
section of the filter in the lvm.conf file, so that lvmetad releases the
open count on these devices.
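As an illustration of the global_filter handling described in the entries above, an lvm.conf fragment of the kind DMP Native Support maintains might look like the following. The exact patterns are an assumption for illustration, not the literal filters VxDMP writes:

```
devices {
    # Accept only DMP device paths; reject everything else.
    # When Native Support is disabled, "/dev/vx/dmp" paths must in turn
    # be moved to the reject list so lvmetad drops its cached open count.
    global_filter = [ "a|^/dev/vx/dmp/|", "r|.*|" ]
}
```

The "a|…|" (accept) and "r|…|" (reject) pattern syntax is standard LVM filter notation; entries are evaluated in order and the first match wins.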
* 3841220 (Tracking ID: 2711312)

SYMPTOM:
After pulling the FC cable, a new symbolic link gets created for the null
path in the root directory.
# ls -l
lrwxrwxrwx. 1 root root c -> /dev/vx/.dmp/c

DESCRIPTION:
Whenever an FC cable is added or removed, an event is sent to the OS and
some udev rules are executed by VxVM. When the FC cable is pulled, the
path id is removed and the hardware path information becomes null.
Symbolic links were being generated without checking whether the hardware
path information had become null.

RESOLUTION:
Code changes are made to check whether the hardware path information is
null and create symbolic links accordingly.

* 3796596 (Tracking ID: 3433503)

SYMPTOM:
The vxconfigd(1M) daemon dumps core with the following stack trace while
bringing a disk online:
kernel_vsyscall()
raise()
abort()
libc_message()
int_free()
free()
get_geometry_info()
devintf_disk_geom_raw()
cds_get_geometry()
cds_check_for_cds_fmt()
auto_determine_format()
auto_sys_online()
auto_online()
da_online()
da_thread_online_disk()
vold_thread_exec()
start_thread()
clone()

DESCRIPTION:
When a corrupt disk label, such as a DOS label magic, is written to a Sun
label disk, the vxconfigd(1M) daemon reads the DOS partition table from
the Sun label disk and dumps core. The core dump occurs because of an
out-of-bounds memory access in the DOS partition read code.

RESOLUTION:
The code is modified to avoid the incorrect memory access.

* 3796666 (Tracking ID: 3573262)

SYMPTOM:
On recent UltraSPARC-T4 architectures, a panic is observed with the
topmost stack frame pointing to bcopy during snapshot operations involving
space-optimized snapshots.
SPARC-T4:bcopy_more()
SPARC-T4:bcopy()
vxio:vol_cvol_bplus_delete()
vxio:vol_cvol_dshadow1_done()
vxio:voliod_iohandle()
vxio:voliod_loop()

DESCRIPTION:
The bcopy kernel library routine on Solaris was optimized to take
advantage of recent UltraSPARC-T4 architectures, but it has known issues
with large-size copies in some patch versions of Solaris 10. The use of
bcopy was causing in-core corruption of cache object metadata, and the
corruption later led to a system panic.

RESOLUTION:
The code is modified to use a word-by-word copy of the buffer instead of
the bcopy kernel library routine.

* 3832703 (Tracking ID: 3488071)

SYMPTOM:
The command "vxdmpadm settune dmp_native_support=on" fails to enable
Dynamic Multipathing (DMP) Native Support and fails with the following
error:
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more volume groups

DESCRIPTION:
From LVM version 105, global_filter was introduced as part of the lvm.conf
file. RHEL 6.6 onwards has LVM version 111 and hence supports
global_filter. The code changes to handle global_filter in the lvm.conf
file were not made when 6.1.1.100 was released.

RESOLUTION:
Code changes have been made to handle global_filter in the lvm.conf file
and allow DMP Native Support to work.

* 3832705 (Tracking ID: 3776520)

SYMPTOM:
Filters are not updated properly in the lvm.conf file in the VxDMP initrd
while enabling DMP Native Support, leading to the root Logical Volume (LV)
being mounted on the OS device upon reboot.

DESCRIPTION:
From LVM version 105, global_filter was introduced as part of the lvm.conf
file. VxDMP updates the initrd lvm.conf file with the filters required for
DMP Native Support to function. While updating the lvm.conf, VxDMP checks
the filter field, whereas in the latest LVM version it should check the
global_filter field. This leaves the lvm.conf file without the proper
filters, leading to the issue.

RESOLUTION:
Code changes have been made to properly update the global_filter field in
the lvm.conf file in the VxDMP initrd.

* 3835367 (Tracking ID: 3137543)

SYMPTOM:
After a reboot, the OS boots in maintenance mode as it fails to load the
initrd image and the kernel modules properly.

DESCRIPTION:
Due to changes in grub from the OS, the kernel module and initrd image
entries corresponding to the VxVM root are not populated properly in the
grub configuration file. Hence, the system fails to load the kernel
properly and boots in maintenance mode after the reboot.

RESOLUTION:
The code is modified to work with RDE.

* 3835662 (Tracking ID: 3581646)

SYMPTOM:
Logical Volumes may sometimes fail to migrate back to OS devices when
Dynamic Multipathing (DMP) Native Support is disabled and the root is
mounted on LVM.

DESCRIPTION:
lvmetad caches an open count on devices that are present in the accept
section of the filter in the lvm.conf file. When DMP Native Support is
enabled, all non-VxVM devices are put in the reject section of the filter
so that only "/dev/vx/dmp" devices remain in the accept section; lvmetad
therefore caches an open count on the "/dev/vx/dmp" devices. When DMP
Native Support is disabled, the "/dev/vx/dmp" devices are not put in the
reject section of the filter, leaving a stale open count in lvmetad. This
causes physical volumes to point to stale devices even after DMP Native
Support is disabled.

RESOLUTION:
Code changes have been made to add the "/dev/vx/dmp" devices to the reject
section of the filter in the lvm.conf file, so that lvmetad releases the
open count on these devices.

* 3836923 (Tracking ID: 3133322)

SYMPTOM:
"vxdmpadm native ls" does not show the rootvg when DMP native support is
enabled.
vxdmpadm native ls
DMPNODENAME VOLUME GROUP
=============================================
sda -
sdb -

DESCRIPTION:
When DMP native support is enabled, there was a bug in the pattern match
of the root device name against the root device name corresponding to the
volume group. Because of it, the display of the volume group of root
devices under LVM was skipped.

RESOLUTION:
Appropriate code modifications are done to resolve the incorrect pattern
match of the root device name.

* 3632969 (Tracking ID: 3631230)

SYMPTOM:
VRTSvxvm patch versions 6.0.5 and 6.1.1 do not work with the RHEL6.6
update.
# rpm -ivh VRTSvxvm-6.1.1.000-GA_RHEL6.x86_64.rpm
Preparing... ########################################### [100%]
1:VRTSvxvm ########################################### [100%]
Installing file /etc/init.d/vxvm-boot
creating VxVM device nodes under /dev
WARNING: No modules found for 2.6.32-494.el6.x86_64, using compatible
modules for 2.6.32-71.el6.x86_64.
FATAL: Error inserting vxio
(/lib/modules/2.6.32-494.el6.x86_64/veritas/vxvm/vxio.ko): Unknown symbol
in module, or unknown parameter (see dmesg)
ERROR: modprobe error for vxio. See documentation.
warning: %post(VRTSvxvm-6.1.1.000-GA_RHEL6.x86_64) scriptlet failed, exit
status 1
#
Or, after the OS update, the system log file will have the following
messages logged:
vxio: disagrees about version of symbol poll_freewait
vxio: Unknown symbol poll_freewait
vxio: disagrees about version of symbol poll_initwait
vxio: Unknown symbol poll_initwait

DESCRIPTION:
Installation of VRTSvxvm patch versions 6.0.5 and 6.1.1 fails on RHEL6.6
due to changes in the poll_initwait() and poll_freewait() interfaces.

RESOLUTION:
The VxVM package has been re-compiled with the RHEL6.6 build environment.

* 2941224 (Tracking ID: 2921816)

SYMPTOM:
In a VVR environment, if there is a Storage Replicator Log (SRL) overflow,
the Data Change Map (DCM) logging mode is enabled. In such instances, if
there is an I/O failure on the DCM volume, the system panics with the
following stack trace:
vol_dcm_set_region()
vol_rvdcm_log_update()
vol_rv_mdship_srv_done()
volsync_wait()
voliod_loop()
...

DESCRIPTION:
There is a race condition in which the DCM information is accessed at the
same time as the DCM I/O failure is handled. This results in a panic.

RESOLUTION:
The code is modified to handle the race condition.

* 2960654 (Tracking ID: 2932214)

SYMPTOM:
After a "vxdisk resize" operation from less than 1 TB to greater than or
equal to 1 TB on a disk with the SIMPLE or SLICED format that has the Sun
Microsystems Incorporation (SMI) label, the disk enters the "online
invalid" state.

DESCRIPTION:
When a SIMPLE or SLICED disk with the SMI label is resized from less than
1 TB to greater than or equal to 1 TB by the "vxdisk resize" operation,
the disk shows the "online invalid" state.

RESOLUTION:
The code is modified to prevent the resize of SIMPLE or SLICED disks with
the SMI label from less than 1 TB to greater than or equal to 1 TB.

* 2974602 (Tracking ID: 2986596)

SYMPTOM:
Disk groups imported with a mix of standard and clone Logical Unit Numbers
(LUNs) may lead to data corruption.

DESCRIPTION:
The vxdg(1M) command import operation should not allow mixing of clone and
non-clone LUNs, since that may result in data corruption if the clone copy
is not up to date. The vxdg(1M) import code was proceeding with clone LUNs
when the corresponding standard LUNs were unavailable on the same host.

RESOLUTION:
The code is modified so that the vxdg(1M) command import operation does
not pick up the clone disks in the above case, preventing a mixed disk
group import. The import fails if a partial import is not allowed based on
other options specified during the import.

* 2999881 (Tracking ID: 2999871)

SYMPTOM:
The vxinstall(1M) command gets into a hung state when it is invoked
through Secure Shell (SSH) remote execution.

DESCRIPTION:
The vxconfigd process started from the vxinstall script fails to close the
inherited file descriptors, causing vxinstall to enter the hung state.

RESOLUTION:
The code is modified to handle the inherited file descriptors for the
vxconfigd process.

* 3022349 (Tracking ID: 3052770)

SYMPTOM:
On little-endian systems, the vradmin syncrvg operation failed if the RVG
includes a volume set.

DESCRIPTION:
The operation failed because little-endian machines use a different memory
read convention than big-endian machines.

RESOLUTION:
The operation is now handled correctly on little-endian machines as well.
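The endianness fix above comes down to decoding multi-byte on-disk fields with an explicit byte order rather than the host's native order. A minimal illustrative sketch (Python, not the actual VVR code):

```python
import struct

def read_be32(buf, off=0):
    # Decode a 32-bit big-endian on-disk field explicitly (">I"),
    # so the result is identical on little- and big-endian hosts.
    # Decoding with the host's native order instead would give a
    # different value on little-endian machines -- the class of bug
    # described above.
    (val,) = struct.unpack_from(">I", buf, off)
    return val
```

For example, `read_be32(b"\x00\x00\x01\x00")` yields 256 on any host, whereas interpreting the same four bytes little-endian (`struct.unpack("<I", ...)`) yields 65536.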
* 3032358 (Tracking ID: 2952403)

SYMPTOM:
In a four-node cluster, if all storage connected to a disk group is
removed from the master, the disk group destroy command fails.

DESCRIPTION:
During the initial phase of the destroy operation, the disks' association
with the disk group is removed. When an attempt is then made to clean up
the disk headers, I/O shipping does not happen because the disks no longer
belong to any disk group.

RESOLUTION:
The code is modified to save the shared disk group association until the
disk group destroy operation is completed. The information is used to
determine whether I/O shipping can be used to complete the disk header
updates during the disk group destroy operation.

* 3033904 (Tracking ID: 2308875)

SYMPTOM:
The vxddladm(1M) list command options (hbas, ports, targets) do not
display correct values for the state attribute.

DESCRIPTION:
In some cases, VxVM does not use device names with correct slice
information, which leads to the vxddladm(1M) list command options (hbas,
ports, targets) not displaying correct values for the state attribute.

RESOLUTION:
The code is modified to use the device name with the appropriate slice
information.

* 3036949 (Tracking ID: 3045033)

SYMPTOM:
"vxdg init" should not create a disk group on a clone disk that was
previously part of a disk group.

DESCRIPTION:
If a disk contains a copy of data from some other VxVM disk, an attempt to
add that disk to, or initialize it in, a new disk group (dg) should fail.
Before it is added to the new dg, the disk should be explicitly cleaned of
the clone flag and the udid flag by using the following command:
vxdisk -c updateudid

RESOLUTION:
The code is modified to fail the "vxdg init" command or add operation on a
clone disk that was previously part of a disk group.
* 3043206 (Tracking ID: 3038684)

SYMPTOM:
The restore daemon attempts to re-enable disabled paths of Business
Continuance Volume - Not Ready (BCV-NR) devices, logging many DMP messages
as follows:
VxVM vxdmp V-5-0-148 enabled path 255/0x140 belonging to the dmpnode 3/0x80
VxVM vxdmp V-5-0-112 disabled path 255/0x140 belonging to the dmpnode 3/0x80

DESCRIPTION:
The restore daemon tries to re-enable a disabled path of a BCV-NR device
when the probe passes. But the open() operation fails on such devices
because no I/O operations are permitted, and the path is disabled again.
There is a check to prevent enabling the path if the open() operation
fails, but because of a bug in that open check, the daemon incorrectly
tries to re-enable the path of the BCV-NR device.

RESOLUTION:
The code is modified to do the open check on the BCV-NR block device.

* 3049356 (Tracking ID: 3060327)

SYMPTOM:
As part of the initial synchronization using Smart Autosync, the vradmin
repstatus(1M) command shows incorrect status of the Data Change Map (DCM):
root@hostname# vradmin -g dg1 repstatus rvg
Replicated Data Set: rvg
Primary:
Host name: primary ip
RVG name: rvg
DG name: dg1
RVG state: enabled for I/O
Data volumes: 1
VSets: 0
SRL name: srl
SRL size: 1.00 G
Total secondaries: 1
Secondary:
Host name: primary ip
RVG name: rvg
DG name: dg1
Data status: inconsistent
Replication status: resync in progress (smartsync autosync)
Current mode: asynchronous
Logging to: DCM (contains 0 Kbytes) (autosync)
Timestamp Information: N/A
The issue is specific to configurations in which the primary data volumes
have a Veritas File System (VxFS) mounted.

DESCRIPTION:
The DCM status is not correctly retrieved and displayed when the Smartmove
utility is being used for Autosync.

RESOLUTION:
The Smartmove case is now handled for the vradmin repstatus(1M) command.

* 3089749 (Tracking ID: 3088059)

SYMPTOM:
On Red Hat Enterprise Linux 6.x (RHEL6.x), the type of a host bus adapter
(HBA) is reported as SCSI when it should be reported as FC.

DESCRIPTION:
The soft link which gives the sysfs device, /sys/block//device, has been
changed to /sys/block/ on RHEL6.x. When the device discovery layer (DDL)
claims a device, it reads information from the operating system (OS)
interface. Since the soft link has changed on RHEL6.x, the HBA type is
reported as the default value of SCSI.

RESOLUTION:
The code has been changed to handle the soft link change.

* 3094185 (Tracking ID: 3091916)

SYMPTOM:
In a VCS cluster environment, the syslog overflows with the following
Small Computer System Interface (SCSI) I/O error messages:
reservation conflict
Unhandled error code
Result: hostbyte=DID_OK driverbyte=DRIVER_OK
CDB: Write(10): 2a 00 00 00 00 90 00 00 08 00
reservation conflict
VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x5) on dmpnode 201/0x60
Buffer I/O error on device VxDMP7, logical block 18
lost page write due to I/O error on VxDMP7

DESCRIPTION:
In a VCS cluster environment, when a private disk group is flushed and
deported on one node, some I/Os on the disk are cached because the disk
writes are done asynchronously. Importing the disk group immediately
afterwards with PGR keys causes I/O errors on the previous node, because
the PGR keys are not reserved on that node.

RESOLUTION:
The code is modified to write the I/Os synchronously on the disk.

* 3144764 (Tracking ID: 2398954)

SYMPTOM:
The system panics while doing I/O on a Veritas File System (VxFS) mounted
instant snapshot with the Oracle Disk Manager (ODM) SmartSync enabled. The
following stack trace is observed:
panic: post_hndlr(): Unresolved kernel interruption
cold_vm_hndlr
bubbledown
as_ubcopy
privlbcopy
volkio_to_kio_copy
vol_multistepsio_overlay_data
vol_multistepsio_start
voliod_iohandle
voliod_loop
kthread_daemon_startup

DESCRIPTION:
Veritas Volume Manager (VxVM) uses the av_back and av_forw fields of the
io buf structure to store its private information. VxFS also uses these
fields to chain I/O buffers before passing I/O to VxVM. When an I/O is
received at the VxVM layer, VxVM always resets these fields. But if ODM
SmartSync is enabled, VxFS uses a special strategy routine to pass hints
to VxVM. Due to a bug in the special strategy routine, the av_back and
av_forw fields are not reset and still point to a valid buffer in the VxFS
I/O buffer chain. VxVM interprets these fields wrongly and modifies their
contents, which in turn corrupts the next buffer in the chain, leading to
the panic.

RESOLUTION:
The av_back and av_forw fields of the io buf structure are reset in the
special strategy routine.

* 3189041 (Tracking ID: 3130353)

SYMPTOM:
Disabled and enabled path messages are displayed continuously on the
console for the EMC NR (Not Ready) devices:
I/O error occurred on Path hdisk139 belonging to Dmpnode emc1_1f2d
Disabled Path hdisk139 belonging to Dmpnode emc1_1f2d due to path failure
Enabled Path hdisk139 belonging to Dmpnode emc1_1f2d
I/O error occurred on Path hdisk139 belonging to Dmpnode emc1_1f2d
Disabled Path hdisk139 belonging to Dmpnode emc1_1f2d due to path failure

DESCRIPTION:
As part of device discovery, DMP marks the paths belonging to the EMC NR
devices as disabled, so that they are not used for I/O. However, the
DMP-restore logic, which issues inquiry on the disabled path, brings the
NR device paths back to the enabled state. This cycle repeats, and as a
result the disabled and enabled path messages are seen continuously on the
console.

RESOLUTION:
The DMP code is modified to handle the EMC NR devices specially, so that
they are not disabled and enabled repeatedly. The messages are not merely
suppressed; the devices themselves are handled differently.

* 3195695 (Tracking ID: 2954455)

SYMPTOM:
When a pattern is specified to vxdiskadm to match a range of LUNs for
removal, the pattern is matched erroneously.

DESCRIPTION:
While using the Dynamic Reconfiguration operation to remove logical unit
numbers (LUNs) in vxdiskadm, if a pattern is specified to match a range of
disks, the pattern matching is erroneous and the operation subsequently
fails. For example, if the range specified is "emc0_0738--emc0_0740", the
matched pattern ignores the leading zero and matches emc0_738 instead of
emc0_0738:
VxVM vxdmpadm ERROR V-5-1-14053 Failed to get subpaths from emc0_739
..
VxVM vxdmpadm ERROR V-5-1-14053 Failed to get subpaths from emc0_740
VxVM vxdisk ERROR V-5-1-558 Disk emc0_740: Disk not in the configuration
VxVM vxdmpadm ERROR V-5-1-2268 emc0_740 is not a valid dmp node name

RESOLUTION:
The pattern matching logic is enhanced to account for the leading zeros.

* 3197460 (Tracking ID: 3098559)

SYMPTOM:
In a clustered environment, if there exist a standard disk group (dg) and
a clone dg, the slave will import the clone dg when a cluster-wide
standard dg import is triggered while the LUNs of the standard dg are
disconnected from the slave. This causes corruption.

DESCRIPTION:
If there are deported cloned and standard dgs in Clustered Volume Manager
(CVM), and the disks of the standard dg are not accessible from the slave,
then the import triggered for the original dg imports the cloned dg on
that slave and the original dg on the other nodes. CFS gets mounted with
data corruption.

RESOLUTION:
The code is modified to detect the proper disks for dg import.

* 3254133 (Tracking ID: 3240858)

SYMPTOM:
The file '/etc/vx/vxesd/.udev_lock' might have different permissions at
different instances.

DESCRIPTION:
ESD_UDEV_LOCKFILE is opened or created both by vxesd and by the
vxesd_post_event support utility called by the vxvm-udev rule. Since the
open permission mode is not set, the mode gets inherited from the calling
process. As a result, ESD_UDEV_LOCKFILE might have different permissions
in different situations, which is undesirable.

RESOLUTION:
Code changes have been made to use a uniform, defined set of permissions
at all instances.
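The .udev_lock fix above is an instance of a general rule: when a file may be created via O_CREAT, pass an explicit permission mode rather than inheriting whatever the calling process happens to supply. A minimal illustrative sketch (Python; the 0o644 mode is an assumed example, not the actual mode vxesd uses):

```python
import os

def create_lockfile(path, mode=0o644):
    # Pass an explicit mode with O_CREAT so the lock file ends up with
    # the same permissions no matter which process creates it first.
    # (0o644 is an illustrative choice, not the actual vxesd mode.)
    return os.open(path, os.O_CREAT | os.O_RDWR, mode)
```

Because both cooperating processes call this with the same fixed mode, the file's permissions no longer depend on which of them wins the race to create it (modulo the process umask, which such daemons typically also set explicitly).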
* 3254199 (Tracking ID: 3015181)

SYMPTOM:
I/O can hang on all the nodes of a cluster when a complete
non-Active/Active (A/A) class of storage is disconnected. The problem is
CVM specific.

DESCRIPTION:
The issue occurs because the CVM-DMP protocol does not progress any
further when the 'ioctls' on the corresponding DMP 'metanodes' fail. As a
result, all hosts hold the I/Os forever.

RESOLUTION:
The code is modified to complete the CVM-DMP protocol when any of the
'ioctls' on the DMP 'metanodes' fail.

* 3254201 (Tracking ID: 3121380)

SYMPTOM:
An I/O hang is observed on the primary after disabling paths for one data
volume in the RVG. The stack trace looks like:
biowait
default_physio
volrdwr
fop_write
write
syscall_trap32

DESCRIPTION:
If the path of a data volume is disabled when the SRL is about to
overflow, an internal data structure gets corrupted, which results in an
I/O hang.

RESOLUTION:
The code is modified to handle an I/O error caused by a disabled path at
the time of SRL overflow.

* 3254204 (Tracking ID: 2882312)

SYMPTOM:
The Storage Replicator Log (SRL) faults in the middle of the I/O load. An
immediate read on data that was written during the SRL fault may return
old data.

DESCRIPTION:
In case of an SRL fault, the Replicated Volume Group (RVG) goes into
passthrough mode, and read/write operations are issued directly on the
data volume. If the SRL faults while writing and a read is issued
immediately on the same region, the read may return old data. If a write
fails on the SRL, VVR acknowledges the write completion and places the RVG
in passthrough mode; the data-volume write is done asynchronously after
the write completion has been acknowledged. If the read arrives before the
data-volume write finishes, it can return old data, causing data
corruption. It is a race condition between write and read during the SRL
failure.

RESOLUTION:
The code is modified to restart the write in case of SRL failure, without
acknowledging the write completion first. When the write is restarted, the
RVG is in passthrough mode and the write is issued directly on the data
volume. Since the acknowledgement is sent only after the write completes,
any subsequent read gets the latest data.

* 3254205 (Tracking ID: 3162418)

SYMPTOM:
The vxconfigd(1M) command dumps core when VxVM (Veritas Volume Manager)
tries to find certain devices by their device numbers. The stack might
look as follows:
ddi_hash_devno()
ddl_find_cdevno()
ddl_find_path_cdevno()
req_daname_get()
vold_process_request()
start_thread()

DESCRIPTION:
When 'vxdisk scandisks' fails to discover devices, the device tree is
emptied. Incorrect validation in the device search procedure causes a NULL
value dereference.

RESOLUTION:
Code changes are made to correctly detect the NULL value.

* 3254231 (Tracking ID: 3010191)

SYMPTOM:
Previously excluded paths are not excluded after an upgrade to VxVM
5.1SP1RP3.

DESCRIPTION:
Older versions of VxVM maintain the logical path for excluded devices,
while the current version maintains the hardware path. So when device
paths were excluded from Volume Manager before the update, the
excluded-path entries are not recognized by Volume Manager afterwards,
which leads to inconsistency.

RESOLUTION:
To resolve this issue, the exclude logic is modified to resolve the
logical/hardware path from the exclude file.

* 3254233 (Tracking ID: 3012929)

SYMPTOM:
When a disk name is changed while a backup operation is in progress, the
vxconfigbackup(1M) command gives the following error:
VxVM vxdisk ERROR V-5-1-558 Disk : Disk not in the configuration
VxVM vxconfigbackup WARNING V-5-2-3718 Unable to backup Binary diskgroup
configuration for diskgroup .

DESCRIPTION:
If disk names change during the backup, the vxconfigbackup(1M) command
does not detect and refresh the changed names, and it tries to find the
configuration database information using the old disk name. Consequently,
the vxconfigbackup(1M) command displays an error message indicating that
the old disk is not found in the configuration, and it fails to back up
the disk group configuration from the disk.

RESOLUTION:
The code is modified to ensure that the new disk names are updated and
used to find and back up the configuration copy from the disk.

* 3254301 (Tracking ID: 3199056)

SYMPTOM:
The VVR (Veritas Volume Replicator) primary system panics with the
following stack:
panic_trap
kernel_add_gate
vol_cmn_err
.kernel_add_gate
skey_kmode
nmcom_deliver_ack
nmcom_ack_tcp
nmcom_server_proc_tcp
nmcom_server_proc_enter
vxvm_start_thread_enter

DESCRIPTION:
If the primary receives the data acknowledgement prior to the network
acknowledgement, VVR fabricates the network acknowledgement for the
message and keeps the acknowledgement in a queue. When the real network
acknowledgement arrives at the primary, VVR removes the acknowledgement
from the queue. Only one thread is supposed to access this queue. However,
because of improper locking, there is a race in which two threads can
simultaneously update the queue, corrupting it. A system panic happens
when the corrupted queue is accessed.

RESOLUTION:
The code is modified to take the proper lock before entering the critical
region.
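The acknowledgement-queue race above can be sketched as follows (Python, purely illustrative of the locking pattern, not the VVR implementation): every access to the queue, both fabricating an entry and removing it when the real acknowledgement arrives, happens under one lock, so two threads can no longer corrupt the structure.

```python
import threading
from collections import deque

class AckQueue:
    """Illustrative sketch of the fix: a queue of fabricated network
    acknowledgements where every enqueue/dequeue takes the same lock
    before entering the critical region."""

    def __init__(self):
        self._lock = threading.Lock()
        self._q = deque()

    def fabricate(self, msg_id):
        # Data ack arrived before the network ack: queue a fabricated one.
        with self._lock:
            self._q.append(msg_id)

    def real_ack_arrived(self, msg_id):
        # The real network ack arrived: remove the fabricated entry.
        # Returns True if a fabricated ack was found and removed.
        with self._lock:
            try:
                self._q.remove(msg_id)
                return True
            except ValueError:
                return False
```

Holding one lock across both operations serializes the two threads that previously raced on the queue, which is exactly the "take the proper lock before entering the critical region" change described in the resolution.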
* 3261607 (Tracking ID: 3261601)

SYMPTOM:
The system panics when dmp_destroy_dmpnode() attempts to free an already freed virtual address, and displays the following stack trace:
mt_pause_trigger+0x10 ()
cold_wait_for_lock+0xc0 ()
spinlock_usav+0xb0 ()
kmem_arena_free+0xd0 ()
vxdmp_hp_kmem_free+0x30 ()
dmp_destroy_dmpnode+0xaa0 ()
dmp_decode_destroy_dmpnode+0x430 ()
dmp_decipher_instructions+0x650 ()
dmp_process_instruction_buffer+0x350 ()
dmp_reconfigure_db+0xb0 ()
gendmpioctl+0x910 ()
dmpioctl+0xe0 ()

DESCRIPTION:
Due to a race condition in the code, dmp_destroy_dmpnode() attempts to free an already freed virtual address; the system panics because it accesses a stale memory address.

RESOLUTION:
The code is fixed to avoid the race condition.

* 3264166 (Tracking ID: 3254311)

SYMPTOM:
In a Campus Cluster environment, either a manual detach or a detach caused by loss of storage connectivity, followed by a site reattach, leads to a system panic. The stack trace might look like:
...
voldco_or_acmbuf_to_pvmbuf+0x134()
voldco_recover_detach_map+0x6e5()
volmv_recover_dcovol+0x1ce()
vol_mv_precommit+0x19f()
vol_commit_iolock_objects+0x100()
vol_ktrans_commit+0x23c()
volconfig_ioctl+0x436()
volsioctl_real+0x316()
volsioctl+0x14()
...

DESCRIPTION:
When a site is reattached, possibly after a split-brain, a site-consistent volume may have been updated on each site independently. In such a case, the tracking maps need to be recovered from each site to take care of updates done from both sites. These maps are stored in a Data Change Object (DCO). During the recovery of a DCO, a contiguous chunk of memory is used to read and update the DCO map. This chunk of memory can handle the DCO recovery as long as the volume size is less than 1.05 TB. When the volume is larger than 1.05 TB, the map grows larger than the statically allocated memory buffer. In that case it overruns the buffer, leading to the system panic.
RESOLUTION:
The code is modified to ensure that the buffer is accessed within its limits and, if required, another iteration of the DCO recovery is done.

* 3271596 (Tracking ID: 3271595)

SYMPTOM:
When a volume on a thin reclaimable disk is deleted, and the thin reclaim flag is then removed from the disk which hosted the volume, an attempt to remove the disk from the disk group displays the following error:
# vxdg -g rmdisk
VxVM vxdg ERROR V-5-1-0 Disk is used by one or more subdisks which are pending to be reclaimed.
Use "vxdisk reclaim " to reclaim space used by these subdisks, and retry "vxdg rmdisk" command.
Note: reclamation is irreversible.
An attempt to reclaim the disk using the following command fails with the following error:
# vxdisk reclaim
Disk : Failed.

DESCRIPTION:
When the volume on a thin reclaimable disk is deleted, the underlying disk is marked for reclamation. If the thin reclaim flag is removed manually, the reclamation cannot proceed and the pending subdisks associated with the disk cannot be removed.

RESOLUTION:
The code is modified so that any attempt to manually turn off the thin reclaim flag fails on disks which have subdisks pending reclamation.

* 3271764 (Tracking ID: 2857360)

SYMPTOM:
The vxconfigd daemon hangs when the vol_use_rq tunable of VxVM is changed from 1 to 0.

DESCRIPTION:
Setting or unsetting the vol_use_rq tunable changes the nature of the request queue of a block device. When the tunable is unset (i.e., its value is changed from 1 to 0), an unplug function is registered with the operating system. For kernels later than 2.6.39, unplug functions are deprecated, which causes vxconfigd to hang.

RESOLUTION:
The code has been changed to add support for kernels later than 2.6.39.

* 3306164 (Tracking ID: 2972513)

SYMPTOM:
In CVM, PGR keys from shared data disks are not removed after stopping VCS.

DESCRIPTION:
In a clustered environment with fencing enabled, PGR keys were registered improperly for the slave node.
As a result, when hastop was issued on the master and then on the slave, the slave was not able to clear the PGR keys on the disk. For example, the following PGR key mismatch is seen when the disk keys are read:
key[0]:
[Numeric Format]: 66, 80, 71, 82, 48, 48, 48, 0
[Character Format]: BPGR000
Use only the numeric format to perform operations. The key has null characters which are represented as spaces in the character format.
[Node Format]: Cluster ID: unknown Node ID: 1 Node Name: sles92219
key[1]:
[Numeric Format]: 65, 80, 71, 82, 48, 48, 48, 49
[Character Format]: APGR0001
[Node Format]: Cluster ID: unknown Node ID: 0 Node Name: sles92218
sles92219:~ #

RESOLUTION:
Code changes are done to correctly register PGR keys on the slave.

* 3309931 (Tracking ID: 2959733)

SYMPTOM:
When device paths are moved across LUNs or enclosures, the vxconfigd(1M) daemon can dump core, or data corruption can occur, due to internal data structure inconsistencies. The following stack trace is observed:
ddl_reconfig_partial ()
ddl_reconfigure_all ()
ddl_find_devices_in_system ()
find_devices_in_system ()
req_discover_disks ()
request_loop ()
main ()

DESCRIPTION:
When the device path configuration is changed after a planned or unplanned disconnection, by moving only a subset of the device paths across LUNs or other storage arrays (enclosures), DMP's internal data structure becomes inconsistent. This causes the vxconfigd(1M) daemon to dump core. In some instances, data corruption occurs due to incorrect LUN-to-path mappings.

RESOLUTION:
The vxconfigd(1M) code is modified to detect such situations gracefully and modify the internal data structures accordingly, to avoid a vxconfigd(1M) daemon core dump and the data corruption.

* 3312134 (Tracking ID: 3325022)

SYMPTOM:
Disks that are exported using the VirtIO-disk interface from an SLES11 SP2 or SP3 host are invisible to Veritas Volume Manager running inside Kernel-based Virtual Machine (KVM) guests.
DESCRIPTION:
The Array Support Library (ASL) claims devices during device discovery, but it does not support devices that are exported from a host running SLES11 SP2 or SLES11 SP3 to a guest using the VirtIO-disk interface. Therefore the devices are not visible to Veritas Volume Manager running inside a KVM guest. For example, if disks vda and vdb are the only disks exported to guest "guest1" using the VirtIO-disk interface, they are not visible in the "vxdisk list" output:
guest1:~ # vxdisk list
DEVICE TYPE DISK GROUP STATUS
because none of the ASLs claimed those devices:
guest1:~ # vxddladm list devices
DEVICE TARGET-ID STATE DDL-STATUS (ASL)
===============================================================
vdb - Online -
vda - Online -

RESOLUTION:
The code is fixed to make the ASL claim disks exported using the VirtIO-disk interface.

* 3312311 (Tracking ID: 3321269)

SYMPTOM:
vxunroot may hang during un-encapsulation of the root disk.

DESCRIPTION:
During un-encapsulation, vxunroot changes the UUID (universally unique identifier) of the root disk partition using the tune2fs tool, and verifies the changed UUID in a loop. Sometimes tune2fs fails to change the UUID. As a result, the verification condition fails and the loop runs infinitely.

RESOLUTION:
Code changes are done to break the loop if the UUID change attempt fails five times, and to inform the user with an appropriate error message.

* 3321337 (Tracking ID: 3186149)

SYMPTOM:
On Linux systems with LVM version 2.02.85, enabling dmp_native_support causes LVM volume groups to disappear.

DESCRIPTION:
From LVM version 2.02.85 onwards, the device list is obtained from UDEV by default if LVM2 is compiled with UDEV support. This setting is managed through the obtain_device_list_from_udev variable in /etc/lvm/lvm.conf. As DMP devices are not managed by UDEV, they are not used by LVM, so the LVM volumes are not migrated.
RESOLUTION:
Code changes are done so that, from LVM version 2.02.85 onwards, enabling DMP native support automatically sets the variable "obtain_device_list_from_udev" to 0 in /etc/lvm/lvm.conf, so that DMP devices are used by LVM.

* 3344127 (Tracking ID: 2969844)

SYMPTOM:
The DMP database gets destroyed if discovery fails for some reason. The ddl.log file shows numerous entries such as:
DESTROY_DMPNODE: 0x3000010 dmpnode is to be destroyed/freed
DESTROY_DMPNODE: 0x3000d30 dmpnode is to be destroyed/freed
Numerous vxio errors are seen in the syslog as all VxVM I/Os fail afterwards.

DESCRIPTION:
VxVM deletes the old device database before it builds the new one. If the discovery process fails for some reason, this results in a null DMP database.

RESOLUTION:
The code is modified to take a backup of the old device database before starting the new discovery. If the discovery fails, the old database is restored and an appropriate message is displayed on the console.

* 3344128 (Tracking ID: 2643506)

SYMPTOM:
vxconfigd dumps core when LUNs from the same enclosure are presented as different types, say A/P and A/P-F.

DESCRIPTION:
The VxVM configuration daemon, vxconfigd(1M), dumps core because Dynamic Multi-Pathing (DMP) does not support a setup in which LUNs from the same enclosure are configured as different types.

RESOLUTION:
The code is modified to ensure that the user receives a warning message when this situation arises. Example:
Enclosure with cabinet serial number CK200070800815 has LUNs of type CLR-ALUA and CLR-A/PF. Enclosures having more than one array type are not supported.
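The backup-and-restore approach described for the DMP database (Tracking ID 2969844) can be sketched generically. This is an illustrative model, not VxVM source; the function and its data model are invented for illustration.

```python
import copy

# Illustrative sketch of the fix for Tracking ID 2969844: snapshot the
# old device database before discovery, and restore it if discovery
# fails, so a failed discovery no longer leaves a null database.

def rediscover(database, discover_fn):
    backup = copy.deepcopy(database)    # keep the old database safe first
    database.clear()                    # the old behavior stopped here...
    try:
        database.update(discover_fn())  # ...so a failure left it empty
    except Exception:
        database.update(backup)         # discovery failed: restore old DB
        raise
    return database
```

The key ordering is that the deep copy is taken before the destructive clear, so there is always a consistent database to fall back to.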
* 3344129 (Tracking ID: 2910367)

SYMPTOM:
In a VVR environment, when the Storage Replicator Log (SRL) is inaccessible, or after the paths to the SRL volume of the secondary node are disabled, the secondary node panics with the following stack trace:
__bad_area_nosemaphore
vsnprintf
page_fault
vol_rv_service_message_start
thread_return
sigprocmask
voliod_iohandle
voliod_loop
kernel_thread

DESCRIPTION:
An SRL failure is handled differently on the primary node and the secondary node. On the secondary node, if there is no SRL, replication is not allowed and the Rlink is detached. The code region is common to both nodes, and at one place the flags are not set properly during the transaction phase. This creates the assumption that the SRL is still connected, and the code tries to access its structure, which leads to the panic.

RESOLUTION:
The code is modified to mark the necessary flag properly in the transaction phase.

* 3344130 (Tracking ID: 2825102)

SYMPTOM:
In a CVM environment, some or all VxVM volumes become inaccessible on the master node. VxVM commands on the master node as well as the slave node(s) hang. On the master node, vxiod and vxconfigd sleep and the following stack traces are observed:
"vxconfigd" on master:
sleep_one
vol_ktrans_iod_wakeup
vol_ktrans_commit
volconfig_ioctl
volsioctl_real
volsioctl
vols_ioctl
spec_ioctl
vno_ioctl
ioctl
syscall
"vxiod" on master:
sleep
vxvm_delay
cvm_await_mlocks
volmvcvm_cluster_reconfig_exit
volcvm_master
volcvm_vxreconfd_thread

DESCRIPTION:
VxVM maintains a list of all volume devices in the volume device list. This list can be corrupted by simultaneous access from the CVM reconfiguration code path and the VxVM transaction code path, which makes some or all of the volumes inaccessible.

RESOLUTION:
The code is modified to avoid simultaneous access to the volume device list from the CVM reconfiguration code path and the VxVM transaction code path.
* 3344132 (Tracking ID: 2860230)

SYMPTOM:
In a Cluster Volume Manager (CVM) environment, a shared disk remains opaque after execution of the vxdiskunsetup(1M) command on the master node.

DESCRIPTION:
In a CVM environment, if a disk group has opaque disks and the disk group is destroyed on the master node, followed by execution of the vxdiskunsetup(1M) command, the slave still views the disk as opaque.

RESOLUTION:
The code is modified to ensure that the slave receives a signal to remove the opaque disk.

* 3344134 (Tracking ID: 3011405)

SYMPTOM:
Execution of the "vxtune -o export" command fails with the following error message:
"VxVM vxtune ERROR Unable to rename temp file to .Cross-device link
VxVM vxtune ERROR Unable to export tunables".

DESCRIPTION:
During the export of component- or feature-specific tunables, all the VxVM tunables are first dumped into the file provided by the user. The component or feature tunables are then extracted into a temporary file, which is renamed to the file provided by the user. If the file provided by the user resides on a file system other than the one holding the temporary file (for example, not the root file system or /etc/), the renaming fails because the link would cross file systems.

RESOLUTION:
The code is modified to create the temporary file dynamically in the same directory as the user-specified file, instead of in a hard-coded location. In this way, the cross-device link error does not occur and the rename operation does not fail.

* 3344138 (Tracking ID: 3041014)

SYMPTOM:
Sometimes a relayout operation may fail with the following error messages, which do not provide much information:
1.
VxVM vxassist ERROR V-5-1-15309 Cannot allocate 4294838912 blocks of disk space required by the relayout operation for column expansion: Not enough HDD devices that meet specification.
VxVM vxassist ERROR V-5-1-4037 Relayout operation aborted. (7)
2.
VxVM vxassist ERROR V-5-1-15312 Cannot allocate 644225664 blocks of disk space required for the relayout operation for temp space: Not enough HDD devices that meet specification.
VxVM vxassist ERROR V-5-1-4037 Relayout operation aborted. (7)

DESCRIPTION:
In some executions of the vxrelayout(1M) command, the error messages do not provide sufficient information. For example, when enough space is not available, the vxrelayout(1M) command reports less disk space than is actually required, so the relayout operation can still fail even after the reported amount of disk space is added.

RESOLUTION:
The code is modified to display the correct space required for the relayout operation to complete successfully.

* 3344140 (Tracking ID: 2966990)

SYMPTOM:
In a VVR environment, I/O hangs on the primary side after multiple cluster reconfigurations are triggered in parallel. The stack trace is as follows:
delay
vol_rv_transaction_prepare
vol_commit_iolock_objects
vol_ktrans_commit
volconfig_ioctl
volsioctl_real
volsioctl
fop_ioctl
ioctl

DESCRIPTION:
With I/O running on the master node and the slave node, rebooting the slave node triggers a cluster reconfiguration, which in turn triggers an RVG recovery. Before the reconfiguration is complete, the slave node joins back again, interrupting the leave reconfiguration in the middle of the operation. The node-join reconfiguration does not trigger any RVG recovery, so the recovery is skipped. The regular I/Os wait for the recovery to complete, which leads to a hang.

RESOLUTION:
The code is modified so that the join reconfiguration performs the RVG recovery if there are any pending RVG recoveries.
* 3344142 (Tracking ID: 3178029)

SYMPTOM:
When you synchronize a replicated volume group (RVG), the diff value is over 100%, with output like the following:
[2013-03-12 15:33:48] [17784] 03:07:35 180.18.18.161 swmdb_data_ swmdb_data_ 1379M/3072M 860% 4%

DESCRIPTION:
The value of 'different blocks' is defined as the 'unsigned long long' type, but the statistics function defines it as the 'int' type. The value is therefore truncated, which causes the incorrect output.

RESOLUTION:
The code is modified so that the statistics function no longer treats the 'unsigned long long' value as an integer.

* 3344143 (Tracking ID: 3101419)

SYMPTOM:
In a CVR environment, I/Os to the data volumes in an RVG may temporarily appear to hang during SRL overflow under heavy I/O load.

DESCRIPTION:
The SRL flush occurs at a slower rate than the incoming I/Os from the master node and the slave nodes. I/Os initiated on the master node are starved for a long time, which appears to be an I/O hang. The hang disappears once the SRL flush is complete.

RESOLUTION:
The code is modified to provide a fair schedule for the I/Os initiated on the master node and the slave nodes.

* 3344145 (Tracking ID: 3076093)

SYMPTOM:
The patch upgrade script "installrp" can panic the system while doing a patch upgrade. The panic stack trace is observed as follows:
devcclose
spec_close
vnop_close
vno_close
closef
closefd
fs_exit
kexitx
kexit

DESCRIPTION:
When an upgrade is performed, the VxVM device drivers are not loaded, but the patch-upgrade process tries to start or stop the event source (vxesd) daemon. This can result in a system panic.

RESOLUTION:
The code is modified so that the event source (vxesd) daemon does not start unless the VxVM device drivers are loaded.
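The integer-truncation bug described above for Tracking ID 3178029 (an 'unsigned long long' block count stored in an 'int', producing diff values over 100%) can be reproduced in miniature. The helper names and sample block counts below are invented for illustration.

```python
import ctypes

# Illustrative sketch of the truncation behind the >100% diff output
# (Tracking ID 3178029): forcing a 64-bit block count into a signed
# 32-bit 'int' truncates it, so any percentage derived from it is
# meaningless.

def percent_diff_buggy(diff_blocks, total_blocks):
    # ctypes.c_int truncates the value to 32 bits, like the C 'int' did.
    truncated = ctypes.c_int(diff_blocks).value
    return 100 * truncated // total_blocks

def percent_diff_fixed(diff_blocks, total_blocks):
    # Keep the full 64-bit value, as the fix does: the result stays sane.
    return 100 * diff_blocks // total_blocks
```

With, say, 3,000,000,000 differing blocks out of 4,000,000,000, the fixed version reports 75 while the buggy one yields a negative (wrapped) percentage, illustrating how the printed value could fall outside 0-100%.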
* 3344148 (Tracking ID: 3111062)

SYMPTOM:
When diffsync is executed, vxrsync gets the following error on lossy networks:
VxVM VVR vxrsync ERROR V-5-52-2074 Error opening socket between [HOST1] and [HOST2] -- [Connection timed out]

DESCRIPTION:
The current socket connection mechanism gives up after a single try. When the single attempt to connect fails, the command fails as well.

RESOLUTION:
The code is modified to retry the connection up to 10 times.

* 3344150 (Tracking ID: 2992667)

SYMPTOM:
When the SAN framework of VIS is changed from an FC switch to a direct connection, new DMP disks cannot be retrieved by running the "vxdisk scandisks" command.

DESCRIPTION:
Initially, each DMP node had multiple paths. When the SAN framework of VIS is changed from the FC switch to a direct connection, the number of paths of each affected DMP node is reduced to 1. At the same time, some new disks are added to the SAN. The newly added disks reuse the device numbers of the removed devices (paths). As a result, the "vxdisk list" command does not show the newly added disks even after the "vxdisk scandisks" command is executed.

RESOLUTION:
The code is modified so that DMP handles the device-number reuse scenario properly.

* 3344161 (Tracking ID: 2882412)

SYMPTOM:
The 'vxdisk destroy' command uninitializes a disk which belongs to a deported disk group.

DESCRIPTION:
The 'vxdisk destroy' command does not check whether the Veritas Volume Manager (VxVM) disk belongs to a deported disk group. This can lead to accidental uninitialization of such a disk.

RESOLUTION:
The code is modified to prevent destruction of a disk which belongs to a deported disk group. The 'vxdisk destroy' command now warns you that the disk is already part of another disk group and fails the operation. However, you can still force-destroy the disk by using the -f option.
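The guard-then-force pattern described for 'vxdisk destroy' (Tracking ID 2882412) can be sketched generically: refuse a destructive operation when a safety check trips, unless the caller explicitly forces it. This is an illustrative model, not the vxdisk source; the data model and names are invented.

```python
# Illustrative sketch of the safety check for Tracking ID 2882412:
# refuse to destroy a disk that belongs to a deported disk group
# unless the caller explicitly forces the operation.

class DiskInUseError(Exception):
    pass

def destroy_disk(disk, force=False):
    """disk: dict with 'name', 'initialized', and optional 'deported_dg'."""
    dg = disk.get("deported_dg")
    if dg and not force:
        # Mirrors the new behavior: warn and fail the operation.
        raise DiskInUseError(
            f"{disk['name']} belongs to deported disk group '{dg}'; "
            "re-run with force=True to destroy it anyway")
    disk["initialized"] = False   # uninitialize the disk
    return True
```

The design point is that the destructive path is unchanged for free disks; only disks that trip the membership check require the explicit override, analogous to the -f option.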
* 3344167 (Tracking ID: 2979824)

SYMPTOM:
While excluding a controller using the vxdiskadm(1M) utility, unintended paths get excluded.

DESCRIPTION:
The issue occurs due to a logical error in the use of the grep command when the hardware path of the controller to be excluded is retrieved. In some cases, the vxdiskadm(1M) utility takes the wrong hardware path for the controller that is excluded, and hence excludes unintended paths. Suppose there are two controllers, c189 and c18, with c189 listed above c18 in the command output, and controller c18 is excluded. The hardware path of controller c189 is then passed to the function, so the wrong controller ends up being excluded.

RESOLUTION:
The script is modified so that the vxdiskadm(1M) utility takes the hardware path of the intended controller only, and unintended paths are not excluded.

* 3344175 (Tracking ID: 3114134)

SYMPTOM:
The Smart (sync) Autosync feature fails to work on large volumes (size > 1 TB) and instead replicates the entire volume.

DESCRIPTION:
Veritas File System (VxFS) reports data in-use for 1-MB chunks only, whereas VVR operates on a smaller block size. Thus, even if only an 8-KB block in a 1-MB chunk is dirty, VxFS reports the entire 1 MB as data-in-use, and VVR replicates the entire 1 MB.

RESOLUTION:
The code is modified so that the VVR-VxFS integration now handles chunks smaller than 1 MB.

* 3344264 (Tracking ID: 3220929)

SYMPTOM:
The vxvmconvert menu fails to convert a Logical Volume Manager (LVM) volume to a Veritas Volume Manager (VxVM) volume.

DESCRIPTION:
With the latest RHEL release, the LVM lvdisplay(1M) command has a new output format. In the previous RHEL release, the lvdisplay(1M) output contained the LV Name field, which provided the absolute path of the LVM volume. In the latest RHEL release, the LV Name field contains the base name, and a new LV Path field provides the absolute path.
The vxvmconvert menu does not recognize the field change in the lvdisplay(1M) command output and fails to convert the LVM volumes to VxVM volumes.

RESOLUTION:
The code is modified to enable VxVM to recognize the lvdisplay(1M) command's new output format.

* 3344268 (Tracking ID: 3091978)

SYMPTOM:
The lvm.conf variable preferred_names remains set to use DMP even when the dmp_native_support tunable is 'off'.

DESCRIPTION:
When the dmp_native_support tunable is turned on, the preferred_names variable in the lvm.conf file should be set so that the DMP devices are used. When the tunable is turned off, the variable should revert to the value it had before the tunable was turned on. However, irrespective of the value of preferred_names, the system uses Dynamic Multi-Pathing (DMP) devices while the tunable is on and automatically migrates back to the original devices when it is turned off.

RESOLUTION:
The code has been changed so that the preferred_names value is unaffected by the dmp_native_support tunable.

* 3344286 (Tracking ID: 2933688)

SYMPTOM:
When the 'Data corruption protection' check is activated by DMP, the device-discovery operation aborts, but I/O to the affected devices continues, which results in data corruption. The following message is displayed:
Data Corruption Protection Activated - User Corrective Action Needed:
To recover, first ensure that the OS device tree is up to date (requires OS specific commands).
Then, execute 'vxdisk rm' on the following devices before reinitiating device discovery using 'vxdisk scandisks'

DESCRIPTION:
When the 'Data corruption protection' check is activated by DMP, the device-discovery operation aborts after displaying a message. However, it does not stop I/Os from being issued on the DMP device for the affected devices, that is, all devices whose discovery information changed unexpectedly and is no longer valid.
RESOLUTION:
The code is modified so that DMP forcibly fails the I/Os on devices whose discovery information changed unexpectedly. This prevents any further damage to the data.

* 3347380 (Tracking ID: 3031796)

SYMPTOM:
When a snapshot is reattached using the "vxsnap reattach" command-line interface (CLI), the operation fails with the following error message:
"VxVM vxplex ERROR V-5-1-6616 Internal error in get object. Rec "

DESCRIPTION:
When a snapshot is reattached to the volume, Volume Manager checks consistency by locking all the related snapshots. If any related snapshot is not available, the operation fails.

RESOLUTION:
The code is modified to ignore any inaccessible snapshot. This prevents any inconsistency during the operation.

* 3349877 (Tracking ID: 2685230)

SYMPTOM:
In a Cluster Volume Replicator (CVR) environment, if the Storage Replicator Log (SRL) is resized with the logowner set on the CVM slave node, and this is followed by a CVM master node switch operation, SRL corruption can occur, leading to a Rlink detach.

DESCRIPTION:
In a CVR environment, if the SRL is resized when the logowner is set on the CVM slave node, and this is followed by a master node switch operation, the new master node does not have the correct mapping of the SRL volume. As a result, I/Os issued on the new master node corrupt the SRL-volume contents and detach the Rlink.

RESOLUTION:
The code is modified to correctly update the SRL mapping so that the SRL corruption does not occur.

* 3349917 (Tracking ID: 2952553)

SYMPTOM:
The vxsnap(1M) command allows refreshing a snapshot from a volume other than the source volume. An example is as follows:
# vxsnap refresh source=

DESCRIPTION:
The vxsnap(1M) command allows refreshing a snapshot from a volume other than the source volume. This can result in an unintended loss of the snapshot.
RESOLUTION:
The code is modified to print a message requesting the user to use the "-f" option. This prevents any accidental loss of the snapshot.

* 3349937 (Tracking ID: 3273435)

SYMPTOM:
VxVM disk group creation or import with SCSI-3 PR fails with the following error messages:
bash# vxdg -s import bdg
VxVM vxdg ERROR V-5-1-10978 Disk group bdg: import failed:
SCSI-3 PR operation failed

DESCRIPTION:
With kernel version 3.0.76-0.11-default, when a SCSI device fails an unsupported command, the SCSI error code returned by the OS SCSI stack in the host byte was not handled. This causes disk group create/import failures when there is a mix of devices, some of which do not support SCSI-3 Persistent Reservation (PR).

RESOLUTION:
The code is modified to handle the new error codes that are introduced in the SCSI command responses.

* 3349939 (Tracking ID: 3225660)

SYMPTOM:
The Dynamic Reconfiguration (DR) tool does not list thin provisioned logical unit numbers (LUNs) during a LUN removal operation.

DESCRIPTION:
Because of a change in the output pattern of a Dynamic Multi-Pathing (DMP) command, its output gets parsed incorrectly, and the thin provisioned LUNs are filtered out.

RESOLUTION:
The code is modified to parse the output correctly.

* 3349985 (Tracking ID: 3065072)

SYMPTOM:
Data loss occurs during the import of a clone disk group when some of the disks are missing and the "useclonedev" and "updateid" import options are specified. The following error message is displayed:
VxVM vxdg ERROR V-5-1-10978 Disk group pdg: import failed:
Disk for disk group not found

DESCRIPTION:
During a clone disk group import with the "updateid" and "useclonedev" options specified, unavailable disks cause permanent data loss. The disk group ID is updated on the available disks during the import operation. The missing disks still contain the old disk group ID and hence are not included in later attempts to import the disk group with the new disk group ID.
RESOLUTION:
The code is modified so that a partial import of a clone disk group with the "updateid" option is no longer allowed without the "-f" (force) option. If the user forces the partial import of the clone disk group using the "-f" option, the missing disks are not included in later attempts to import the clone disk group with the new disk group ID.

* 3349990 (Tracking ID: 2054606)

SYMPTOM:
During the DMP driver unload operation the system panics with the following stack trace:
kmem_free
dmp_remove_mp_node
dmp_destroy_global_db
dmp_unload
vxdmp`_fini
moduninstall
modunrload
modctl
syscall_trap

DESCRIPTION:
The system panics during the DMP driver unload operation, when its internal data structures are destroyed, because DMP attempts to free the memory associated with a DMP device that is already marked for deletion from DMP.

RESOLUTION:
The code is modified to check the DMP device state before any attempt is made to free the memory associated with it.

* 3350000 (Tracking ID: 3323548)

SYMPTOM:
In the Cluster Volume Replicator (CVR) environment, a cluster-wide vxconfigd hang occurs on the primary when you start the cache object.
Primary master vxconfigd stack:
Schedule()
volsync_wait()
volsiowait()
vol_cache_linkdone()
vol_commit_link_objects()
vol_ktrans_commit()
volconfig_ioctl()
volsioctl_real()
vols_ioctl()
vols_compat_ioctl()
compat_sys_ioctl()
sysenter_do_call()
Primary slave vxconfigd stack:
Schedule()
volsync_wait()
vol_kmsg_send_wait()
volktcvm_master_request()
volktcvm_iolock_wait()
vol_ktrans_commit()
volconfig_ioctl()
volsioctl_real()
vols_ioctl()
vols_compat_ioctl()
compat_sys_ioctl()
sysenter_do_call()

DESCRIPTION:
This is an I/O hang on the primary master when you start the cache object. The I/O code path is stuck due to incorrect initialization of the related flags.

RESOLUTION:
The code is modified to correctly initialize the flags during the cache object initialization.
* 3350019 (Tracking ID: 2020017)

SYMPTOM:
A cluster node panics with the following stack when mirrored volumes are configured in the cluster:
panic+0xb4 ()
bad_kern_reference+0xd4 ()
pfault+0x140 ()
trap+0x8a4 ()
thandler+0x96c ()
volmv_msg_dc+0xa8 () <--- Trap in Kernel mode
vol_mv_kmsg_request+0x930 ()
vol_kmsg_obj_request+0x3cc ()
vol_kmsg_request_receive+0x4c0 ()
vol_kmsg_ring_broadcast_receive+0x6c8 ()
vol_kmsg_receiver+0xa40 ()
kthread_daemon_startup+0x24 ()
kthread_daemon_startup+0x0 ()

DESCRIPTION:
When a mirrored volume is opened or closed on any node in the cluster, a message is sent to all the nodes in the cluster. While receiving the message, a 32-bit integer field is dereferenced as a long, and the cluster node panics.

RESOLUTION:
The code is modified to access the field appropriately, as a 32-bit integer.

* 3350027 (Tracking ID: 3239521)

SYMPTOM:
During the PowerPath pre-check, the Dynamic Reconfiguration (DR) tool displays the following error message and exits: 'Unable to run command [/sbin/powermt display]'. The message details can be as follows:
WARN: Please Do not Run any Device Discovery Operations outside the Tool during Reconfiguration operations
INFO: The logs of current operation can be found at location /var/adm/vx/dmpdr_20130626_1446.log
INFO: Collecting OS Version Info - done
INFO: Collecting Arch type Info - done
INFO: Collecting SF Product version Info - done.
INFO: Checking if Multipathing is PowerPath
Unable to run command [/sbin/powermt display 2>&1]

DESCRIPTION:
This error is seen when PowerPath is unable to display devices because PowerPath is not started on the system.

RESOLUTION:
The code is modified so that the powermt command is used only to warn you about devices that are under PowerPath control. If no devices are displayed, you can ignore the message.
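The field-width bug described above for Tracking ID 2020017 (a 32-bit message field dereferenced as a long) can be modeled with a toy message decoder. The message layout below is invented for illustration and is not the VxVM wire format.

```python
import struct

# Illustrative sketch of the decoding bug behind the cluster panic
# (Tracking ID 2020017): reading a 32-bit field as a 64-bit long
# consumes bytes past the field. Here the overread runs off the end
# of a short buffer, much as the kernel dereference ran off the field.

MSG = struct.pack("<iH", 0x1234, 7)   # a 32-bit field then a 16-bit field

def decode_buggy(buf):
    # Reads 8 bytes for a 4-byte field: fails on this 6-byte message.
    return struct.unpack_from("<q", buf, 0)[0]

def decode_fixed(buf):
    # Access the field as exactly 32 bits, as the fix does.
    return struct.unpack_from("<i", buf, 0)[0]
```

In the toy model the overread surfaces as a `struct.error`; in the kernel it surfaced as a bad reference and a panic.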
* 3350232 (Tracking ID: 2993667)

SYMPTOM:
VxVM allows setting the Cross-platform Data Sharing (CDS) attribute for a disk group even when a disk is missing because it experienced I/O errors. The following command succeeds even with an inaccessible disk:
vxdg -g set cds=on

DESCRIPTION:
When the CDS attribute is set for a disk group, VxVM does not fail the operation if a disk is not accessible. If a disk genuinely failed with an I/O error, VxVM should not allow the disk group to be set as CDS, because the state of the failed disk cannot be determined. If the failed disk has a non-CDS format while all the other disks in the disk group have the CDS format, the current behavior allows the disk group to be set as CDS. If the disk returns and the disk group is re-imported, a CDS disk group could then contain a non-CDS disk. This violates the basic definition of a CDS disk group and results in data corruption.

RESOLUTION:
The code is modified so that VxVM fails to set the CDS attribute for a disk group if it detects a disk that is inaccessible because of an I/O error. Hence, the operation fails with an error as follows:
# vxdg -g set cds=on
Cannot enable CDS because device corresponding to is in-accessible.

* 3350235 (Tracking ID: 3084449)

SYMPTOM:
The shared flag is set during the import of a private disk group, because the shared flag of a shared disk group failed to clear due to a minor-number conflict error during an import abort operation.

DESCRIPTION:
The shared flag that is set during the import operation fails to clear due to a minor-number conflict error during the import abort operation.

RESOLUTION:
The code is modified so that the shared flag is cleared during the import abort operation.
* 3350241 (Tracking ID: 3067784)

SYMPTOM:
The grow and shrink operations of the vxresize(1M) utility may dump core in the vfprintf() function. The following stack trace is observed:
vfprintf ()
volumivpfmt ()
volvpfmt ()
volpfmt ()
main ()

DESCRIPTION:
The vfprintf() function dumps core because the format specified to print the file system type is incorrect: the integer (hexadecimal) value is printed as a string, using %s.

RESOLUTION:
The code is modified to print the file system type as a hexadecimal value, using %x.

* 3350265 (Tracking ID: 2898324)

SYMPTOM:
A set of memory-leak issues in the user-land daemon "vradmind" is reported by the Purify tool.

DESCRIPTION:
The issues are reported due to improper or missing initialization of the allocated memory.

RESOLUTION:
The code is modified to ensure that the allocated memory is properly initialized.

* 3350288 (Tracking ID: 3120458)

SYMPTOM:
When the log overflow protection is set to "dcm", the vxconfigd daemon hangs with the following stack as one of the slaves leaves the cluster:
vol_rwsleep_wrlock()
vol_ktrans_commit()
volsioctl_real()
fop_ioctl()
ioctl()

DESCRIPTION:
The issue is due to a race between the reconfiguration triggered by a slave leaving the cluster and a Storage Replicator Log (SRL) overflow. The SRL overflow protection is set to "dcm", which means that if the SRL is about to overflow and the Rlink is in the connect state, the I/Os should be throttled until about 20 MB becomes available in the SRL or the SRL drains by 5%. The mechanism initiates throttling at slave nodes that are shipping metadata, and the throttling never gets reset because of the race mentioned above.

RESOLUTION:
The code is modified to throttle metadata shipping requests whenever a CVM reconfiguration is in progress, the SRL is about to overflow, and the overflow protection is "dcm".

* 3350293 (Tracking ID: 2962010)

SYMPTOM:
Replication hangs when the Storage Replicator Log (SRL) is resized.
For example:
# vradmin -g vvrdg -l repstatus rvg
...
Replication status: replicating (connected)
Current mode: asynchronous
Logging to: SRL (813061 Kbytes behind, 19% full)
Timestamp Information: behind by 0h 0m 8s

DESCRIPTION:
When an SRL is resized, its internal mapping gets changed and a new stream of data gets started. Generally, the old mapping is reverted to immediately, as soon as the conditions required for the resize are satisfied. However, if the SRL gets wrapped around, the conditions are not satisfied immediately. When all the required conditions are later satisfied, the old mapping is reverted to and the data is sent with the old mapping, but without starting a new stream. This causes a replication hang, as the secondary node continues to expect the data according to the new stream. Once the hang occurs, the replication status remains unchanged, even though the Rlink is connected.

RESOLUTION:
The code is modified to start a new stream of data whenever the old mapping is reverted to.

* 3350786 (Tracking ID: 3060697)

SYMPTOM:
The vxrootmir(1M) utility fails with the following error message:
VxVM vxdisk ERROR V-5-1-5433: init failed.

DESCRIPTION:
When you use the vxrootmir(1M) utility to mirror the root disk, the initial partition table is written from the root disk to the mirror disk. This partition table is then modified to expand the public and private partitions. The two operations are performed in succession. For root disks with many partitions, udev holds an open count on each disk partition in order to register the partitions with the OS. Because udev holds open counts on the disk partitions, the vxrootmir(1M) operation fails.

RESOLUTION:
The code has been changed to ensure that the udev queues are emptied before expanding the public and private regions.
* 3350787 (Tracking ID: 2969335)

SYMPTOM:
A node that leaves the cluster while an instant snapshot operation is in progress hangs in the kernel, and cannot rejoin the cluster unless it is rebooted. The following stack trace is displayed in the kernel on the node that leaves the cluster:
voldrl_clear_30()
vol_mv_unlink()
vol_objlist_free_objects()
voldg_delete_finish()
volcvmdg_abort_complete()
volcvm_abort_sio_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION:
In a clustered environment, during any instant snapshot operation that requires metadata modification, such as a snapshot refresh, restore, or reattach, the I/O activity on the volumes involved in the operation is temporarily blocked, and the I/Os are resumed once the metadata modification is complete. If a node leaves the cluster during this phase, it does not find itself in the I/O hold-off state, cannot properly complete the leave operation, and hangs. As an after-effect, the node is not able to rejoin the cluster.

RESOLUTION:
The code is modified to properly unblock I/Os on the node that leaves. This avoids the hang.

* 3350789 (Tracking ID: 2938710)

SYMPTOM:
The vxassist(1M) command dumps core with the following stack during the relayout operation:
relayout_build_unused_volume()
relayout_trans()
vxvmutil_trans()
trans()
transaction()
do_relayout()
main()

DESCRIPTION:
During the relayout operation, the vxassist(1M) command sends a request to the vxconfigd(1M) daemon to get the object record of the volume. If the request fails, the vxassist(1M) command tries to print the error message using the name of the object from the retrieved record. This causes a NULL pointer dereference and subsequently dumps core.

RESOLUTION:
The code is modified to print the error message using the name of the object from a known reference.
* 3350979 (Tracking ID: 3261485)

SYMPTOM:
The vxcdsconvert(1M) utility fails with the following error messages:
VxVM vxcdsconvert ERROR V-5-2-2777 : Unable to initialize the disk as a CDS disk
VxVM vxcdsconvert ERROR V-5-2-2780 : Unable to move volume off of the disk
VxVM vxcdsconvert ERROR V-5-2-3120 Conversion process aborted

DESCRIPTION:
As part of the conversion process, the vxcdsconvert(1M) utility moves all the volumes to some other disk before the disk is initialized with the CDS format. On VxVM-formatted disks other than the CDS format, the VxVM volume starts immediately in the PUBLIC partition. If an LVM or FS signature was stamped on the disk, that signature is not erased even after the data migration to another disk within the disk group. When the disk is destroyed as part of the vxcdsconvert operation, only the SLICED tags are erased, but the partition table still exists. The disk is then recognized to have a file system or LVM on the partition where the PUBLIC region existed earlier. The vxcdsconvert(1M) utility fails because vxdisksetup, which is invoked internally to initialize the disk with the CDS format, prevents the disk initialization for any foreign FS or LVM.

RESOLUTION:
The code is modified so that the vxcdsconvert(1M) utility forcefully invokes the vxdisksetup(1M) command to erase any foreign format.

* 3350989 (Tracking ID: 3152274)

SYMPTOM:
I/O operations hang with Not-Ready (NR) or Write-Disabled (WD) LUNs. The system log floods with I/O errors. The error messages are like:
VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x6) on dmpnode 201/0xb0
..
..

DESCRIPTION:
For performance reasons, Dynamic Multi-Pathing (DMP) immediately routes a failed I/O through an alternate available path, while performing asynchronous error analysis on the path on which the I/O failed. Not-Ready (NR) devices reject all kinds of I/O requests, and Write-Disabled (WD) devices reject write I/O requests, but both kinds of devices respond well to Small Computer System Interface (SCSI) probes like inquiry.
Due to a code problem, I/Os to such devices that are retried through DMP asynchronous error analysis on different paths are never terminated.

RESOLUTION:
The code is modified to better handle Not-Ready (NR) and Write-Disabled (WD) kinds of devices. The DMP asynchronous error analysis code is modified to handle such cases.

* 3351005 (Tracking ID: 2933476)

SYMPTOM:
The vxdisk(1M) command resize operation fails with the following generic error message that does not state the exact reason for the failure:
VxVM vxdisk ERROR V-5-1-8643 Device 3pardata0_3649: resize failed: Operation is not supported.

DESCRIPTION:
The disk resize operation fails in the following cases:
1. When a shared disk has the simple or nopriv format.
2. When a GPT (GUID Partition Table) labeled disk has the simple or sliced format.
3. When a Cross-platform Data Sharing (CDS) disk is part of a disk group whose version is less than 160 and the disk is resized to greater than 1 TB.

RESOLUTION:
The code is modified to enhance the disk resize failure messages.

* 3351035 (Tracking ID: 3144781)

SYMPTOM:
In a Veritas Volume Replicator (VVR) environment, execution of the vxrlink pause command causes a hang on the secondary node and displays the following stack trace:
schedule()
schedule_timeout()
rp_send_request()
vol_rp_secondary_cmd()
vol_rp_ioctl()
vol_objioctl()
vol_object_ioctl()
voliod_ioctl()
volsioctl_real()
vols_ioctl()
vols_compat_ioctl()

DESCRIPTION:
The execution of the vxrlink pause command causes a hang on the secondary node if an rlink disconnect is already in progress. This issue is observed due to a race condition between the two activities: rlink disconnect and pause.

RESOLUTION:
The code is modified to prevent the race condition between the rlink disconnect and pause operations.
* 3351075 (Tracking ID: 3271985)

SYMPTOM:
In Cluster Volume Replication (CVR) with synchronous replication, aborting a slave node from the Cluster Volume Manager (CVM) cluster makes the slave node panic with the following stack trace:
vol_spinlock()
vol_rv_wrship_done()
voliod_iohandle()
voliod_loop()
...

DESCRIPTION:
When the slave node is aborted, it is still processing a message from the log owner or master node. The cluster abort operation and the message processing contend for some common data. This results in the panic.

RESOLUTION:
The code is modified to make sure that the cluster abort operation does not contend with the processing of messages from the log owner.

* 3351092 (Tracking ID: 2950624)

SYMPTOM:
The following error message is displayed in the repstatus output on the Primary when a node leaves the cluster:
VxVM VVR vradmin ERROR V-5-52-488 RDS has configuration error related to the master and logowner.

DESCRIPTION:
When a slave node leaves the cluster, VxVM treats it as a critical configuration error.

RESOLUTION:
The code is modified to separate the status flags for Master or Logowner nodes and Slave nodes.

* 3351125 (Tracking ID: 2812161)

SYMPTOM:
In a VVR environment, after the Rlink is detached, the vxconfigd(1M) daemon on the secondary host may hang. The following stack trace is observed:
cv_wait
delay_common
delay
vol_rv_service_message_start
voliod_iohandle
voliod_loop
...

DESCRIPTION:
There is a race condition if there is a node crash on the primary site of VVR and any subsequent Rlink is detached. The vxconfigd(1M) daemon on the secondary site may hang, because it is unable to clear the I/Os received from the primary site.

RESOLUTION:
The code is modified to resolve the race condition.

* 3351922 (Tracking ID: 2866299)

SYMPTOM:
When layered volumes under an RVG are stopped forcefully, the vxprint output shows the NEEDSYNC flag on the layered volumes even after vxrecover is run.
DESCRIPTION:
vxrecover moves the top-level volumes of layered volumes under an RVG to the "ACTIVE" state, whereas the subvolumes remain in the "NEEDSYNC" state: when vxrecover moves the top-level volumes to "ACTIVE", it skips resynchronization for subvolumes that are under the RVG.

RESOLUTION:
The code is modified to recover subvolumes under the RVG correctly.

* 3352027 (Tracking ID: 3188154)

SYMPTOM:
The vxconfigd(1M) daemon does not come up after enabling the native support and rebooting the host.

DESCRIPTION:
The issue occurs because vxconfigd treats the migration of logical unit numbers (LUNs) from JBODs to array support libraries (ASLs) as a Data Corruption Protection Activated (DCPA) condition.

RESOLUTION:
The code is fixed so that the LUN migration from JBODs to ASLs is not treated as a DCPA condition.

* 3352208 (Tracking ID: 3049633)

SYMPTOM:
In a Veritas Volume Replicator (VVR) environment, the VxVM configuration daemon vxconfigd(1M) hangs on the secondary when all disk paths are disabled on the secondary, and displays the following stack trace:
vol_rv_transaction_prepare()
vol_commit_iolock_objects()
vol_ktrans_commit()
volconfig_ioctl()
volsioctl_real()
volsioctl()
vols_ioctl()

DESCRIPTION:
In response to the disabled disk paths, a transaction is triggered to perform a plex detach. However, failing I/Os, if restarted, may wait for past I/Os to complete.

RESOLUTION:
The code is modified so that if the SRL or a data volume fails, the failed I/Os are freed and the transaction proceeds, instead of the writes being restarted.

* 3352226 (Tracking ID: 2893530)

SYMPTOM:
When a system is rebooted and there are no VVR configurations, the system panics with the following stack trace:
nmcom_server_start()
vxvm_start_thread_enter()
...

DESCRIPTION:
The panic occurs because a memory segment is accessed after it is released. The access happens in the VVR module and can happen even if no VVR is configured on the system.

RESOLUTION:
The code is modified so that the memory segment is not accessed after it is released.
* 3352282 (Tracking ID: 3102114)

SYMPTOM:
A system crash during a 'vxsnap restore' operation can cause the vxconfigd(1M) daemon to dump core with the following stack on system start-up:
rinfolist_iter()
process_log_entry()
scan_disk_logs()
...
startup()
main()

DESCRIPTION:
To recover from an incomplete restore operation, an entry is made in the internal logs. If the volume corresponding to that entry is not accessible, accessing the non-existent record causes the vxconfigd(1M) daemon to dump core with the SIGSEGV signal.

RESOLUTION:
The code is modified to ignore such an entry in the internal logs if the corresponding volume does not exist.

* 3352963 (Tracking ID: 2746907)

SYMPTOM:
Under heavy I/O load, the vxconfigd(1M) daemon hangs on the master node during the reconfiguration. The vxconfigd stack is observed as follows:
schedule
volsync_wait
vol_rwsleep_rdlock
vol_get_disks
volconfig_ioctl
volsioctl_real
vols_ioctl
vols_compat_ioctl
compat_sys_ioctl
cstar_dispatch

DESCRIPTION:
When there is a reconfiguration, the vxconfigd(1M) daemon tries to acquire the volop_rwsleep write lock. The attempt fails because I/Os keep taking the read lock, so the I/O load starves out the vxconfigd(1M) daemon's attempts to get the write lock. This results in the hang.

RESOLUTION:
The code is modified to use a new API that blocks out new read locks while an attempt is being made to get the write lock. With this API used during the reconfiguration, the write-lock starvation is avoided, which resolves the hang.

* 3353059 (Tracking ID: 2959333)

SYMPTOM:
For a Cross-platform Data Sharing (CDS) disk group, the "vxdg list" command does not list the CDS flag when that disk group is disabled.

DESCRIPTION:
When a CDS disk group is disabled, the state of the record list may not be stable. Hence, whether the disabled disk group was CDS is not considered, and Veritas Volume Manager (VxVM) does not mark any such flag.
RESOLUTION:
The code is modified to display the CDS flag for disabled CDS disk groups.

* 3353064 (Tracking ID: 3006245)

SYMPTOM:
While executing a snapshot operation on a volume that has 'snappoints' configured, the system panics infrequently with the following stack trace:
...
voldco_copyout_pervolmap()
voldco_map_get()
volfmr_request_getmap()
...

DESCRIPTION:
When 'snappoints' are configured for a volume by using the vxsmptadm(1M) command, the relationship is maintained in the kernel using a field. This field is also used for maintaining the snapshot relationships. Sometimes, the 'snappoints' field may be wrongly identified with the snapshot field. This causes the system to panic.

RESOLUTION:
The code is modified to properly identify the fields that are used for the snapshots and the 'snappoints', and to handle the fields accordingly.

* 3353131 (Tracking ID: 2874810)

SYMPTOM:
When you install DMP-only solutions using the installdmp command, the root support is not enabled.

DESCRIPTION:
When you install DMP-only solutions, the root support should be enabled, but it is not enabled by default.

RESOLUTION:
The code is modified in the install scripts to enable the root support by default when you install DMP-only solutions.

* 3353244 (Tracking ID: 2925746)

SYMPTOM:
In a CVM environment, during CVM reconfiguration, vxconfigd hangs cluster-wide with the following stack trace:
pse_sleep_thread()
volktenter()
vol_get_disks()
volconfig_ioctl()
volsioctl_real()
volsioctl()
vols_ioctl()

DESCRIPTION:
When vxconfigd is heavily loaded compared to the kernel thread during a join reconfiguration, the cluster-wide reconfiguration hangs and causes vxconfigd to hang cluster-wide. In this case, the cluster reconfiguration sequence is a node leave followed by another node join.

RESOLUTION:
The code is modified to take care of the missing condition during successive reconfiguration processing.
* 3353291 (Tracking ID: 3140835)

SYMPTOM:
When a reclaim operation is in progress using the TRIM interface, disabling the path of one of the TRIM-capable disks causes a system panic with the following stack trace:
submit_bio()
blkdev_issue_discard()
dmp_trim()
dmp_reclaim_device()
dmp_reclaim_storage()
gendmpioctl()
dmpioctl()
vol_dmp_ktok_ioctl()
voldisk_reclaim_region()
security_capable()
conv_copyin()
vol_reclaim_disk()
vol_reclaim_storage()
dentry_open()
volconfig_ioctl()
security_capable()
volsioctl_real()
rb_reserve_next_event()
vols_ioctl()
vols_compat_ioctl()
compat_sys_ioctl()

DESCRIPTION:
During a reclaim operation, the block device path is opened only if it is not already open; otherwise, the existing block device path pointer is used. If a disable-path operation runs in parallel, the block device pointer is invalidated by a routine invoked in the disable-path context. When the reclaim operation is executed on such an invalidated block device pointer, the system panics.

RESOLUTION:
The code is modified to avoid use of a stale block device handle, by taking the appropriate open and close locks during the reclaim operation.

* 3353953 (Tracking ID: 2996443)

SYMPTOM:
In a CVR environment, on shutting down the Primary master, the "vradmin repstatus" and "vradmin printrvg" commands show the following configuration error:
"vradmind server on host not responding or hostname cannot be resolved"

DESCRIPTION:
On the non-logowner slave nodes, the logowner change does not get correctly reflected. This makes the slaves try to reach the old logowner, which is shut down.

RESOLUTION:
The code is modified to reflect the changes of master and logowner on the slave nodes.

* 3353985 (Tracking ID: 3088907)

SYMPTOM:
A node in a Cluster Volume Manager (CVM) cluster can panic while destroying a shared disk group.
The following stack trace is displayed:
volupd_disk_iocnt_locked()
volrdiskiostart()
vol_disk_tgt_write_start()
voliod_iohandle()
voliod_loop()
kernel_thread()

DESCRIPTION:
During a shared disk group destroy, the disks in the disk group are moved to a common pool that holds all disks that are not part of any disk group. When the disk group destroy is done, all the existing I/Os on the disks belonging to this disk group are completed. However, with I/O shipping enabled, I/Os can arrive from remote nodes even after all the local I/Os are cleaned up. These I/Os then access the freed-up resources, which causes the system panic.

RESOLUTION:
The code has been modified so that during the disk group destroy operation, appropriate locks are taken to synchronize the movement of disks from the shared disk group to the common pool.

* 3353990 (Tracking ID: 3178182)

SYMPTOM:
During a master takeover task, the shared disk group re-import operation fails due to false serial split brain (SSB) detection.

DESCRIPTION:
The disk private region contents are not updated under certain conditions during a node join. As a result, an SSB ID mismatch (due to a stale in-core value) is detected during the re-import operation that is part of the master takeover task, which causes the re-import operation to fail.

RESOLUTION:
The code is modified to update the in-memory disk header contents with the disk header contents of the joining node, to avoid false SSB detection during the master takeover operation.

* 3353995 (Tracking ID: 3146955)

SYMPTOM:
A remote disk (lfailed or lmissing disk) goes into the "ONLINE INVALID LFAILED" or "ONLINE INVALID LMISSING" state after the disk loses global disk connectivity. It becomes difficult to recover the remote disk from this state even when the connectivity is restored.

DESCRIPTION:
The INVALID state implies that the disk private region contents were read but found to be invalid.
In this case, however, the private region contents cannot be read at all, because no node in the cluster has connectivity to the disk (a global connectivity failure).

RESOLUTION:
The code is modified so that the remote disk is marked with the "Error" state in the case of a global connectivity failure. Setting the "Error" state on the remote disk helps the transition to the ONLINE state when the connectivity to the disk is restored on at least one node in the cluster.

* 3353997 (Tracking ID: 2845383)

SYMPTOM:
The site gets detached if the plex detach operation is performed with the site consistency set to off.

DESCRIPTION:
If the plex detach operation is performed on the last complete plex of a site, the site is detached to maintain the site consistency. The site should be detached only if the site consistency is set. However, the decision to detach the site was made based on the value of the 'allsites' flag, so the site got detached when the last complete plex was detached even if the site consistency was off.

RESOLUTION:
The code is modified to ensure that the site is detached when the last complete plex is detached only if the site consistency is set. If the site consistency is off and the 'allsites' flag is on, detaching the last complete plex leads to only the plex being detached.

* 3354023 (Tracking ID: 2869514)

SYMPTOM:
In a clustered environment with a large Logical Unit Number (LUN) configuration, the node join process takes a long time. It may cause the cvm_clus resource to time out, finally bringing the dependent groups into a partial state.

DESCRIPTION:
In a clustered environment with a large LUN configuration, if the node join process is triggered, it takes a long time to complete. The reason is that disk online is called for detecting whether each disk is connected cluster-wide.
The following vxconfigd(1M) stack is seen on the node that takes a long time to join:
ioctl()
ddl_indirect_ioctl()
do_read_capacity_35()
do_read_capacity()
do_spt_getcap()
do_readcap()
auto_info_get()
auto_sys_online()
auto_online()
da_online()
dasup_validate()
dapriv_validate()
auto_validate()
setup_remote_disks()
slave_response()
fillnextreq()
vold_getrequest()
request_loop()
main()

RESOLUTION:
Changes have been made to use the connectivity framework to detect whether storage is accessible cluster-wide.

* 3354024 (Tracking ID: 2980955)

SYMPTOM:
A disk group goes into the disabled state if the vxconfigd(1M) daemon is restarted on the new master after a master switch.

DESCRIPTION:
If the vxconfigd(1M) daemon restarts and the master imports a disk group by referring to stale tempdb copies (/etc/vx/tempdb), the disk group goes into the disabled state. The tempdb is created on the master node (instead of the slave nodes) at the time of disk group creation. On a master switch the tempdb is not cleared, so when the node becomes the master again (after another master switch), on vxconfigd restart it tries to import the disk group using the stale tempdb copies instead of creating a new tempdb for the disk group.

RESOLUTION:
The code is changed to prevent the disk group from going into the disabled state after a vxconfigd(1M) restart on the new master.

* 3354028 (Tracking ID: 3136272)

SYMPTOM:
In a CVM environment, the disk group import operation with the "-o noreonline" option takes additional import time.

DESCRIPTION:
On a slave node, when the clone disk group import is triggered by the master node, the "da re-online" takes place irrespective of the "-o noreonline" flag passed. This results in the additional import time.

RESOLUTION:
The code is modified to pass a hint to the slave node when the "-o noreonline" option is specified. Depending on the hint, the "da re-online" is either done or skipped. This avoids the additional import time.
* 3355830 (Tracking ID: 3122828)

SYMPTOM:
The Dynamic Reconfiguration (DR) tool lists disks that are tagged with the Logical Volume Manager (LVM) for removal or replacement.

DESCRIPTION:
When the DMP Native Support tunable is turned 'OFF', the Dynamic Reconfiguration (DR) tool should not list the Logical Volume Manager (LVM) disks for removal or replacement. When the tunable is turned 'ON', the DR tool should list the LVM disks for removal or replacement, provided there are no open counts in the Dynamic Multi-Pathing (DMP) layer.

RESOLUTION:
The code is modified to exclude the Logical Volume Manager (LVM) disks from the removal or replacement option in the Dynamic Reconfiguration (DR) tool.

* 3355856 (Tracking ID: 2909668)

SYMPTOM:
In the case of multiple sets of cloned disks of the same source disk group, the import operation on the second set of clone disks fails if the first set of clone disks was imported with "updateid". The import fails with the following error message:
VxVM vxdg ERROR V-5-1-10978 Disk group firstdg: import failed: No tagname disks for import

DESCRIPTION:
When multiple sets of clone disks exist for the same source disk group, each set needs to be identified with a separate tag. If one set of cloned disks with the same tag is imported using the "updateid" option, it replaces the disk group ID on the imported disks with a new disk group ID. The other sets of cloned disks with different tags still contain the old disk group ID. Because the disk group name maps to the most recently imported disk group ID, this leads to the import failure for the tagged import of every set except the first.

RESOLUTION:
The code is modified for the tagged disk group import of disk groups that have multiple sets of clone disks. During the disk group name to disk group ID mapping, the tag name is given higher priority than the latest update time of the disk group.
* 3355878 (Tracking ID: 2735364)

SYMPTOM:
The "clone_disk" disk flag attribute is not cleared when a cloned disk group is removed by the "vxdg destroy" command.

DESCRIPTION:
When a cloned disk group is removed by the "vxdg destroy" command, the Veritas Volume Manager (VxVM) "clone_disk" disk flag attribute is not cleared. The "clone_disk" disk flag attribute should be automatically turned off when the VxVM disk group is destroyed.

RESOLUTION:
The code is modified to turn off the "clone_disk" disk flag attribute when a cloned disk group is removed by the "vxdg destroy" command.

* 3355883 (Tracking ID: 3085519)

SYMPTOM:
Missing disks are permanently detached from the disk group because the -o updateid and tagname options are used to import partial disks.

DESCRIPTION:
If a user imports the partial disks in a disk group using -o updateid and tagname, the imported disk group gets a different dgid. The disks missing from the disk group configuration are then left out forever, which permanently detaches the missing disks from the disk group.

RESOLUTION:
The code has been modified so that -o updateid and tagname are not allowed in a partial disk group import. The import now fails with an error message similar to the following:
# vxdg -o tag=TAG2 -o useclonedev=on -o updateid import testdg2
VxVM vxdg ERROR V-5-1-10978 Disk group testdg2: import failed: Disk for disk group not found

* 3355971 (Tracking ID: 3289202)

SYMPTOM:
If a Cluster Volume Manager (CVM) node is stopped (stopnode/abortnode) while there are outstanding disk-connectivity related messages initiated by the same node, vxconfigd may hang with the following stack trace:
volsync_wait()
vol_syncwait()
volcvm_get_connectivity()
vol_test_connectivity()
volconfig_ioctl()
volsioctl_real()
vols_ioctl()
vols_compat_ioctl()
compat_sys_ioctl()
sysenter_dispatch()

DESCRIPTION:
When a CVM node is aborted, all the outstanding messages on that node are cleared or purged.
The relevant data structures for the given messages are supposed to be set to the proper values in this purge operation. However, no flag was being set for the disk-connectivity protocol message. As a result, the disk-connectivity protocol initiated by the vxconfigd thread hangs after the messaging layer clears the message.

RESOLUTION:
The code has been changed so that the appropriate flag is set when the disk-connectivity messages are purged in the node leave/abort case. The initiator thread (vxconfigd) can now detect the correct flag value, fail the internal disk-connectivity protocol gracefully, and proceed. This avoids the vxconfigd hang.

* 3355973 (Tracking ID: 3003991)

SYMPTOM:
The system fails to add a disk to a shared disk group if all the paths of the existing disks are disabled.

DESCRIPTION:
The operation fails because the internally generated I/O issued when the disk is added to the disk group is continuously retried. Due to an error in the internal I/O code path, the I/O completion count is not kept correctly, which causes this failure.

RESOLUTION:
The code has been changed to correctly keep the internal I/O completion count for the given disk in the shared disk group.

* 3356836 (Tracking ID: 3125631)

SYMPTOM:
The snapshot creation operation using the vxsnap make command on volume sets occasionally fails with the error:
"vxsnap ERROR V-5-1-6433 Component volume has changed"
It occurs primarily when the snapshot operation runs on the volume set after a fresh mount of the file system.

DESCRIPTION:
The snapshot creation proceeds in multiple atomic stages, which are called transactions. If some state changes outside the operation, the operation fails. In the release in question, the state of the volume changes to DIRTY after the first transaction, due to asynchronous I/Os issued after the file system is mounted, and this led to the mentioned failure in a later stage.
RESOLUTION:
The code is modified to expect such changes after the first transaction and not deem them a failure.

* 3361977 (Tracking ID: 2236443)

SYMPTOM:
In a VCS environment, the "vxdg import" command does not display an informative error message when a disk group cannot be imported because the fencing keys are registered to another host. The following error messages are displayed:
# vxdg import sharedg
VxVM vxdg ERROR V-5-1-10978 Disk group sharedg: import failed: No valid disk found containing disk group
The system log contains the following NOTICE messages:
Dec 18 09:32:37 htdb1 vxdmp: NOTICE: VxVM vxdmp V-5-0-0 i/o error occured (errno=0x5) on dmpnode 316/0x19b
Dec 18 09:32:37 htdb1 vxdmp: [ID 443116 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 i/o error occured (errno=0x5) on dmpnode 316/0x19b

DESCRIPTION:
The error messages that are displayed when a disk group cannot be imported because the fencing keys are registered to another host need to be more informative.

RESOLUTION:
Code has been added to the VxVM disk group import command to detect when a disk is reserved by another host, and to issue a SCSI3 PR reservation conflict error message.

* 3361998 (Tracking ID: 2957555)

SYMPTOM:
The vxconfigd(1M) daemon on the CVM master node hangs in userland during the vxsnap(1M) restore operation. The following stack trace is displayed:
rec_find_rid()
position_in_restore_chain()
kernel_set_object_vol()
kernel_set_object()
kernel_dg_commit_kernel_objects_20()
kernel_dg_commit()
commit()
dg_trans_commit()
slave_trans_commit()
slave_response()
fillnextreq()

DESCRIPTION:
During the snapshot restore operation, when a volume V1 gets restored from a source volume V2, and at the same time volume V2 gets restored from V1 or a child of V1, the vxconfigd(1M) daemon tries to find the position in the snapshot chain of the volume that gets restored.
In such instances, finding the position in the restore chain causes the vxconfigd(1M) daemon to enter an infinite loop and hang.

RESOLUTION:
The code is modified to remove the infinite loop condition when the restore position is found.

* 3362062 (Tracking ID: 3010830)

SYMPTOM:
On Linux, during the script-based or web-based installation, the post-check verification for VRTSvxvm and VRTSaslapm may fail with the following error message:
Veritas Storage Foundation and High Availability Postcheck did not complete successfully
You can find the following warnings on the system:
CPI WARNING V-9-30-1250 The following packages are not passed package verification on host:
...
VRTSvxvm VRTSaslapm
...
Detailed information can be found in the log files or summary file.
The log file contains the following text:
CPI CHECK 'rpm -V VRTSvxvm' on host return:
......G.. /etc/vx
......G.. /etc/vx/kernel
......G.. /opt/VRTS
.....UG.. /opt/VRTS/man
.....UG.. /opt/VRTS/man/man1
.....UG.. /opt/VRTS/man/man4
.....UG.. /opt/VRTS/man/man7
......G.. /opt/VRTSvxms
......G.. /opt/VRTSvxms/lib
......G.. /opt/VRTSvxms/lib/map
CPI CHECK 'rpm -V VRTSaslapm' on host return:
......G.. /etc/vx
......G.. /etc/vx/kernel

DESCRIPTION:
On Linux, the installer runs post-installation checks as a part of the script-based or web-based installation. Among the post-installation checks, the RedHat Package Manager (RPM) verification fails for the VRTSvxvm and VRTSaslapm packages.

RESOLUTION:
On Linux, the packaging scripts are modified to make the user and group permissions of the above directories the same for VxVM, VxFS, and VCS.

* 3362065 (Tracking ID: 2861011)

SYMPTOM:
The "vxdisk -g resize" command fails with an error for a Cross-platform Data Sharing (CDS) formatted disk.
The following error message is displayed: "VxVM vxdisk ERROR V-5-1-8643 Device : resize failed: One or more subdisks do not fit in pub reg" DESCRIPTION: During the resize operation, VxVM updates the VM disk's private region with the new public region size, which is evaluated based on the raw disk geometry. But for the CDS disks, the geometry information stored in the disk label is fabricated such that the cylinder size is aligned with 8KB. The resize failure occurs when there is a mismatch in the public region size obtained from the disk label, and that stored in the private region. RESOLUTION: The code is modified such that the new public region size is now evaluated based on the fabricated geometry considering the 8 KB alignment for the CDS disks, so that it is consistent with the size obtained from the disk label. * 3362087 (Tracking ID: 2916911) SYMPTOM: The vxconfigd(1M) daemon triggered a Data TLB Fault panic with the following stack trace: _vol_dev_strategy volsp_strategy vol_dev_strategy voldiosio_start volkcontext_process volsiowait voldio vol_voldio_read volconfig_ioctl volsioctl_real volsioctl vols_ioctl spec_ioctl vno_ioctl ioctl syscall DESCRIPTION: The kernel_force_open_disk() function checks if the disk device is open. The device is opened only if it was not opened earlier. When the device is opened, it calls the kernel_disk_load() function, which in turn calls the VOL_NEW_DISK ioctl() function. If the VOL_NEW_DISK ioctl fails, the error is not handled correctly as the return values are not checked. This may result in a scenario where the open operation fails but the disk read or write operation proceeds. RESOLUTION: The code is modified to handle VOL_NEW_DISK ioctl failures. If the ioctl fails during the open operation of a device that does not exist, then read or write operations are not allowed on the disk. 
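The CDS resize fix above turns on 8 KB alignment of the public region size. The arithmetic involved can be sketched as follows; this is an illustrative computation only, with assumed 512-byte sectors and a made-up raw length, not code taken from VxVM:

```shell
# Illustrative only: round a raw public-region length (in 512-byte
# sectors) down to an 8 KB boundary, as the fabricated CDS geometry
# requires. rawlen is a hypothetical sample value.
rawlen=2097215                      # made-up raw length in sectors
align=$((8192 / 512))               # sectors per 8 KB boundary: 16
publen=$(( rawlen / align * align ))
echo "$publen"                      # prints 2097200
```

Evaluating the size the same way on both the resize path and the label path keeps the two values consistent, which is the point of the fix.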
* 3362138 (Tracking ID: 2223258) SYMPTOM: The vxdisksetup command initializes a disk which already has Logical Volume Manager (LVM) or File System (FS) on it. DESCRIPTION: The vxdisksetup command checks for an LVM or FS signature only at the beginning of the disk. The command doesn't check for the cases where partitions have LVM or FS. Thus, vxdisksetup goes ahead and initializes the disk. RESOLUTION: The code has been modified so that vxdisksetup also checks partitions for an LVM or FS signature. It does not initialize the disk if it already has LVM or FS on it. * 3362144 (Tracking ID: 1942051) SYMPTOM: I/O hangs on the master node after the secondary paths are disabled from the slave node and the slave node is rebooted. DESCRIPTION: The I/O hang happens when the GAB receiver flow control gets enabled due to heavy I/O load. When GAB receiver flow control gets enabled, GAB asks other nodes not to send any more messages, so every node is blocked until it clears the receiver flow control. The receiver flow control is supposed to be cleared when the receiver queue reaches the "VOL_KMSG_RECV_Q_LOW" limit. RESOLUTION: The code is modified to clear the receiver flow control once the receive queue reaches the "VOL_KMSG_RECV_Q_LOW" limit. * 3362923 (Tracking ID: 3258531) SYMPTOM: On Linux, when you run the vxcdsconvert(1M) command, you may encounter the following error: # vxcdsconvert -g -o novolstop group relayout: ERROR! emc_clariion0_36-UR-001: Plex column/offset is not 0/0 for new vol DESCRIPTION: On Linux, when you run the vxcdsconvert(1M) command on RHEL 6.4 with disks in different formats, like simple, cds, and sliced, you may observe this issue. It happens because of the changed constructs of the sort(1) command on RHEL 6.4. RESOLUTION: On Linux, the code is modified to sort entries properly. * 3362948 (Tracking ID: 2599887) SYMPTOM: The DMP device paths that are marked as "Disabled" cannot be excluded from VxVM control. 
DESCRIPTION: The DMP device paths which encounter connectivity issues are marked as "DISABLED". These disabled DMP device paths cannot be excluded from VxVM control due to a check for the active state on the path. RESOLUTION: The offending check has been removed so that a path can be excluded from VxVM control irrespective of its state. * 3365295 (Tracking ID: 3053073) SYMPTOM: The DR (Dynamic Reconfiguration) Tool doesn't pick thin LUNs in the "online invalid" state for the disk remove operation. DESCRIPTION: The DR tool script parsed the list of disks incorrectly, skipping thin LUNs in the "online invalid" state. RESOLUTION: The DR scripts have been modified to parse the list of disks correctly so that such LUNs are selected properly. * 3365313 (Tracking ID: 3067452) SYMPTOM: If new LUNs are added to the cluster, and the naming scheme has the avid option set to 'no', then the DR Tool changes the mapping between dmpnode and disk record. Example: After adding 2 new disks, they get indexed at the beginning of the DMP (Dynamic MultiPathing) device list and the mapping of 'DEVICE' and 'DISK' changes: # vxdisk list DEVICE TYPE DISK GROUP STATUS xiv0_0 auto:cdsdisk - - online thinrclm --> xiv0_1 auto:cdsdisk - - online thinrclm --> xiv0_2 auto:cdsdisk xiv0_0 dg1 online thinrclm shared xiv0_3 auto:cdsdisk xiv0_1 dg1 online thinrclm shared xiv0_4 auto:cdsdisk xiv0_2 dg1 online thinrclm shared xiv0_5 auto:cdsdisk xiv0_3 dg1 online thinrclm shared xiv0_6 auto:cdsdisk xiv0_4 dg1 online thinrclm shared DESCRIPTION: Because reconfiguration events are improperly handled when device names are assigned, the DR tool changes the disk record mappings. RESOLUTION: The code has been modified so that the DR script will prompt the user to decide whether to run 'vxddladm assign names' when a Reconfiguration event is generated. 
Here is an example of the prompt: Do you want to run command [vxddladm assign names] and regenerate DMP device names : * 3365321 (Tracking ID: 3238397) SYMPTOM: The Dynamic Reconfiguration (DR) Tool's Remove LUNs option does not restart the vxattachd daemon. DESCRIPTION: Due to a wrong sequence of starting and stopping the vxattachd daemon in the DR tool, the Remove LUNs option does not start the vxattachd daemon. In such cases, the disks that go offline will not be detected automatically when they come back online, and the whole site can go offline. RESOLUTION: The code is modified to correct the sequence of start/stop operations in the DR scripts. * 3365390 (Tracking ID: 3222707) SYMPTOM: The Dynamic Reconfiguration (DR) tool does not permit the removal of disks associated with a deported disk group (dg). DESCRIPTION: The Dynamic Reconfiguration tool interface does not consider the disks associated with a deported disk group as valid candidates for the removal of disk(s). RESOLUTION: The code is modified to include the disks associated with a deported disk group for removal. * 3368953 (Tracking ID: 3368361) SYMPTOM: When site consistency is configured within a private disk group and Cluster Volume Manager (CVM) is up, the reattach operation of a detached site fails. DESCRIPTION: When you try to reattach the detached site configured in a private disk group with CVM up on that node, the reattach operation fails with the following error: "Disk (disk_name) do not have connectivity from one or more cluster nodes". The reattach operation fails because the disk connectivity check does not consider the shared attribute of the disk group, and so is wrongly applied to a private disk group. RESOLUTION: The code is modified to make the disk connectivity check explicit for a shared disk group by checking the shared attribute of the disk group. 
* 3384633 (Tracking ID: 3142315) SYMPTOM: Sometimes the udid_mismatch flag gets set on a disk due to an Array Support Library (ASL) upgrade; consequently, the disk is misidentified as a clone disk after import. DESCRIPTION: Sometimes the udid_mismatch flag gets set on a disk due to an Array Support Library (ASL) upgrade; consequently, the disk is misidentified as a clone disk after import. Resetting the flag is frequently required after an ASL upgrade when the ASL has new logic for udid formation. The following two separate commands are required to reset the clone_flag on a clone disk: vxdisk updateudid vxdisk set clone=off RESOLUTION: A new vxdisk option '-c' is introduced to reset the clone_disk flag and update the udid. * 3384636 (Tracking ID: 3244217) SYMPTOM: There is no way to reset the clone flag during the import of a disk group. DESCRIPTION: If the user wants to reset the clone flag on disks during the import of a disk group, one has to deport the disk group, reset the clone flag on each disk, and then import the disk group again. RESOLUTION: The code has been modified to provide a -c option during disk group import to reset the clone flag. * 3384662 (Tracking ID: 3127543) SYMPTOM: Non-labeled disks go into udid_mismatch after a vxconfigd restart. DESCRIPTION: Non-labeled disks go into udid_mismatch because the udid is not stamped on the disk. The udid provided by the Array Support Libraries (ASL) is compared with the invalid value, so the disk is marked as udid_mismatch. RESOLUTION: The code is modified so that the system does not compare the udid for non-labeled disks. * 3384697 (Tracking ID: 3052879) SYMPTOM: Auto import of the cloned disk group fails after reboot even when the source disk group is not present. DESCRIPTION: If the source disk group is not present on the host, then the imported clone disk group on this host is not automatically imported after reboot. This was not allowed earlier. RESOLUTION: The code is modified. 
Now the auto import of the clone disk group is allowed as long as the disk group was imported prior to reboot, no matter whether the source disk group is available on the host or not. * 3384986 (Tracking ID: 2996142) SYMPTOM: Data may be corrupted or lost due to an incorrect mapping from DA to DM of a disk. DESCRIPTION: Because of various hardware or operating system issues, some disks lose their VM configuration, label, or partition, so the disk becomes 'online invalid' or 'error'. In an attempt to import the disk group, those disks cannot be imported because the DM record is lost and the disk becomes a 'failed disk'. For example: -- # vxdisk -o alldgs list|grep sdg hdisk10 auto:cdsdisk hdisk10 sdg online shared - - hdisk9 sdg failed was:hdisk9 RESOLUTION: The unique disk identifier (UDID) provided by the device discovery layer (DDL) is added in the DM record when a DM record is associated with the disk in the disk group. It helps identify the failed disks correctly. * 3386843 (Tracking ID: 3279932) SYMPTOM: The vxdisksetup and vxdiskunsetup utilities were failing on a disk which is part of a deported disk group (DG), even if the "-f" option is specified. The vxdisksetup command fails with the following error: VxVM vxedpart ERROR V-5-1-10089 partition modification failed : Device or resource busy The vxdiskunsetup command fails with the following error: VxVM vxdisk ERROR ERROR V-5-1-0 Device appears to be owned by disk group . Use -f option to force destroy. VxVM vxdiskunsetup ERROR V-5-2-5052 : Disk destroy failed. DESCRIPTION: The vxdisksetup and vxdiskunsetup utilities internally call the "vxdisk" utility. Due to a defect in vxdisksetup and vxdiskunsetup, the vxdisk operation used to fail on a disk which is part of a deported DG, even if the "force" operation is requested by the user. RESOLUTION: Code changes are done to the vxdisksetup and vxdiskunsetup utilities so that when the "-f" option is specified the operation succeeds. 
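The vxdisksetup/vxdiskunsetup defect above comes down to a wrapper not honoring its force flag when invoking the underlying vxdisk call. A minimal, hypothetical sketch of the forwarding pattern involved (the function name, flag handling, and underlying_cmd are invented for illustration; this is not the actual script code):

```shell
# Hypothetical wrapper: parse a -f (force) option and forward it to
# the underlying command. Dropping this forwarding step is the kind
# of defect that makes "-f" appear to be ignored.
force_wrapper() {
    force=""
    OPTIND=1
    while getopts "f" opt; do
        case $opt in
            f) force="-f" ;;    # remember the force request
        esac
    done
    shift $((OPTIND - 1))
    # Forward the flag to the underlying command; echoed here
    # instead of executed, since the real utility is not present.
    echo "underlying_cmd ${force:+$force }$*"
}

force_wrapper -f disk1          # prints: underlying_cmd -f disk1
```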
* 3387847 (Tracking ID: 3326900) SYMPTOM: During execution of the vxunroot command, the following warning message is observed: "ERROR /opt/VRTS/bin/vxunroot: line 956: [: missing `]' VxVM vxunroot ERROR V-5-2-0 tune2fs failed to assign new uuids to rootdisk partitions" DESCRIPTION: Due to an incorrect logical and relational operator in the vxunroot script, the shell was not able to interpret the expression, which caused the warnings. RESOLUTION: The code is modified to use the correct logical and relational operator. * 3394692 (Tracking ID: 3421236) SYMPTOM: The vxautoconvert/vxvmconvert utility command fails to convert LVM (Logical Volume Manager) to VxVM (Veritas Volume Manager), causing data loss. DESCRIPTION: The following issues were observed with the vxautoconvert and vxvmconvert tools: 1) The disk's partition table is not in sync with the kernel's copy of the partition table on disk. This is because the pvcreate command writes the LVM stamp but destroys the partition table on disk. Therefore, the LVM entry used is stale. 2) vxautoconvert does not export the current LVM version; hence, during further operations, the comparison of the LVM version fails. 3) The LVM command "pvmove" fails with a multi-pathed disk. 4) If the conversion is cancelled, then the stopped LVM volumes are not reverted back to the active state. RESOLUTION: The code is modified to ensure that the disk partition table is in sync, and to hide the extra paths during the pvmove operation so that the operation is successful. The code is also modified so that vxautoconvert exports the current LVM version and, if the conversion is cancelled, the stopped LVM volumes are restored to the active state. * 3395095 (Tracking ID: 3416098) SYMPTOM: The "vxvmconvert" utility throws an error during execution. The following error message is displayed: "[: 100-RHEL6: integer expression expected". 
DESCRIPTION: In the vxvmconvert utility, the sed (stream editor) expression which is used to parse and extract the Logical Volume Manager (LVM) version is not appropriate and fails for particular RHEL (Red Hat Enterprise Linux) versions. RESOLUTION: Code changes are done so that the expression parses correctly. * 3395499 (Tracking ID: 3373142) SYMPTOM: The manual pages for vxedit and vxassist do not contain details about the updated behavior of these commands. DESCRIPTION: 1. vxedit manual page: This page explains that if the reserve flag is set for a disk, then vxassist does not allocate a data subdisk on that disk unless the disk is specified on the vxassist command line. However, Data Change Object (DCO) volume creation by the vxassist or vxsnap command does not honor the reserve flag. 2. vxassist manual page: The DCO allocation policy has been updated starting from 6.0. The allocation policy may not succeed if there is insufficient disk space. The vxassist command then uses available space on the remaining disks of the disk group. This may prevent certain disk groups from splitting or moving if the DCO plexes cannot accompany their parent data volume. RESOLUTION: The manual pages for both commands have been updated to reflect the new behavioral changes. * 3401836 (Tracking ID: 2790864) SYMPTOM: For the OTHER_DISKS enclosure, the vxdmpadm config reset CLI fails while trying to reset the IO Policy value. DESCRIPTION: The 'vxdmpadm config reset' CLI sets all DMP entity properties to default values. For the OTHER_DISKS enclosure, the default IO policy is shown as 'Vendor Defined', but this does not comply with a DMP IO policy. As a result, the config reset CLI fails while resetting the IO Policy value. RESOLUTION: The code is modified so that for the OTHER_DISKS enclosure, the default IO policy is set to SINGLE-ACTIVE when the 'vxdmpadm config reset' operation is performed. 
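The "[: 100-RHEL6: integer expression expected" failure reported for vxvmconvert above is a generic shell pitfall: a numeric test applied to a string that is not a plain integer. A hedged sketch of the failure mode and one safer pattern (the variable names are hypothetical, not taken from the utility):

```shell
# Hypothetical sketch; lvmver mimics the kind of string that broke
# the real script: a version number with a distro suffix.
lvmver="100-RHEL6"

# A numeric test on the raw string fails with
# "[: 100-RHEL6: integer expression expected":
#   [ "$lvmver" -ge 100 ]

# Stripping everything from the first non-digit avoids the error:
vernum=${lvmver%%[!0-9]*}           # keeps the leading digits: "100"
[ "$vernum" -ge 100 ] && echo "supported"
```

The same idea applies anywhere a version string is fed to `[ ... -ge ... ]` or a similar integer comparison.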
* 3405318 (Tracking ID: 3259732) SYMPTOM: In a Clustered Volume Replicator (CVR) environment, if the SRL size grows, and this is followed by a slave node leaving and then re-joining the cluster, the rlink is detached. DESCRIPTION: After the slave re-joins the cluster, it does not correctly receive and process the SRL resize information received from the master. This means that application writes initiated on this slave may corrupt the SRL, causing the rlink to detach. RESOLUTION: The code is modified to ensure that when a slave joins the cluster, the SRL resize related information is correctly received and processed by the slave. * 3408321 (Tracking ID: 3408320) SYMPTOM: Thin reclamation fails for EMC 5875 arrays with the following message: # vxdisk reclaim Reclaiming thin storage on: Disk : Reclaim Partially Done. Device Busy. DESCRIPTION: As a result of recent changes in EMC Microcode 5875, Thin Reclamation for EMC 5875 arrays fails because the reclaim request length exceeds the maximum "write_same" length supported by the array. RESOLUTION: The code has been modified to correctly set the maximum "write_same" length of the array. * 3409473 (Tracking ID: 3409612) SYMPTOM: Running "vxtune reclaim_on_delete_start_time " fails if the specified value is outside the range of 22:00-03:59 (e.g., setting it to 04:00 or 19:30 fails). DESCRIPTION: The tunable reclaim_on_delete_start_time can be set to any time value within 00:00 to 23:59. However, because of a wrong regular expression used to parse the time, it cannot be set to all values in 00:00-23:59. RESOLUTION: The regular expression has been updated to parse the time format correctly. Now all values in 00:00-23:59 can be set. * 3413044 (Tracking ID: 3400504) SYMPTOM: While disabling the host side HBA port, extended attributes of some devices are no longer present. This happens even when there is a redundant controller present on the host which is in the enabled state. 
An example output is shown below where the 'srdf' attribute of an EMC device (which has multiple paths through multiple controllers) gets affected. Before the port is disabled- # vxdisk -e list emc1_4028 emc1_4028 auto:cdsdisk emc1_4028 dg21 online c6t5000097208191154d112s2 srdf-r1 After the port is disabled- # vxdisk -e list emc1_4028 emc1_4028 auto:cdsdisk emc1_4028 dg21 online c6t5000097208191154d112s2 - DESCRIPTION: The code which prints the extended attributes is used to print the attributes of the first path in the list of all paths. If the first path belongs to the controller which is disabled, its attributes will be empty. RESOLUTION: The code is modified to look for a path in the enabled state among all the paths and then print the attributes of that path. If all the paths are in the disabled state, no attributes will be shown. * 3414151 (Tracking ID: 3280830) SYMPTOM: Multiple vxresize operations on a layered volume fail. The following error message is observed: "ERROR V-5-1-16092 Volume : There are other recovery activities. Cannot grow volume". DESCRIPTION: Veritas Volume Manager internally maintains a recovery offset for each volume, which indicates the length of the volume recovered so far. The shrinkto operation called on the volume sets an incorrect recovery offset. The subsequent growto operation called on the same volume treats the volume as being in the recovery phase due to the incorrect recovery offset set by the earlier shrinkto operation. RESOLUTION: The code is modified to correctly set the volume's recovery offset. * 3414265 (Tracking ID: 2804326) SYMPTOM: Secondary logging remains in effect even if a Storage Replicator Log (SRL) size mismatch is seen across the primary and secondary. DESCRIPTION: Secondary logging should be turned off if there is a mismatch in SRL size across the primary and secondary. Considering that the SRL size is the same before the start of replication, if the SRL size is increased on the primary after replication is turned on, secondary logging would get turned off. 
In the other case, when the SRL size is increased on the secondary, secondary logging is not turned off. RESOLUTION: The code has been modified to turn off secondary logging when the SRL size is changed on the secondary. * 3416320 (Tracking ID: 3074579) SYMPTOM: The "vxdmpadm config show" CLI does not display the configuration file name which is present under the root (/) directory. DESCRIPTION: The "vxdmpadm config show" CLI displays the configuration file name which is loaded using the "vxdmpadm config load file" CLI. If a file is located under the root (/) directory, the "vxdmpadm config show" CLI does not display the name of such files. RESOLUTION: The code has been modified to display any configuration files that are loaded. * 3416406 (Tracking ID: 3099796) SYMPTOM: When the vxevac command is invoked on a volume with a DCO log associated, it fails with the error message "volume volname_dcl is not using the specified disk name". The error is seen only for log volumes. No error is seen in the case of simple volumes. DESCRIPTION: The vxevac command evacuates all volumes from a disk. It moves sub-disks of all volumes off the specified VxVM disk to the destination disks or any non-volatile, non-reserved disks within the disk group. During the evacuation of data volumes, it also implicitly evacuates the log volumes associated with them. If log volumes are explicitly placed in the list of volumes to be evacuated, the above error is seen because those log volumes have already been evacuated off the disk through their corresponding data volumes. RESOLUTION: The code has been updated to avoid explicit evacuation of log volumes. * 3417081 (Tracking ID: 3417044) SYMPTOM: The system becomes unresponsive while creating a Veritas Volume Replicator (VVR) TCP connection. 
The vxiod kernel thread reports the following stack trace: mt_pause_trigger() wait_for_lock() spinlock_usav() kfree() t_kfree() kmsg_sys_free() nmcom_connect() vol_rp_connect() vol_rp_connect_start() voliod_iohandle() voliod_loop() DESCRIPTION: When multiple TCP connections are configured, some of these connections are still in the active state, and the connection request process function attempts to free a memory block. If this block is already freed by a previous connection, then the kernel thread may become unresponsive on an HPUX platform. RESOLUTION: The code is modified to resolve the issue of freeing a memory block which is already freed by another connection. * 3417672 (Tracking ID: 3287880) SYMPTOM: In a clustered environment, if a node doesn't have storage connectivity to clone disks, then the vxconfigd on the node may dump core during the clone disk group import. The stack trace is as follows: chosen_rlist_delete() dg_import_complete_clone_tagname_update() req_dg_import() vold_process_request() DESCRIPTION: In a clustered environment, if a node doesn't have storage connectivity to clone disks, then due to improper cleanup handling in the clone database, the vxconfigd on the node may dump core during the clone disk group import. RESOLUTION: The code has been modified to properly clean up the clone database. * 3420074 (Tracking ID: 3287940) SYMPTOM: Logical unit numbers (LUNs) from the EMC CLARiiON array that are in the NR (Not Ready) state are shown in the online invalid state by Veritas Volume Manager (VxVM). DESCRIPTION: LUNs from the EMC CLARiiON array that are in the NR state are shown in the online invalid state by Veritas Volume Manager (VxVM). The EMC CLARiiON array does not have a mechanism to communicate the NR state of a LUN, so VxVM cannot recognize it. However, the read operation on these LUNs fails. Due to a defect in the disk online operation, the read failure is ignored and causes the disk online to succeed. 
Thus, these LUNs are shown as online invalid. RESOLUTION: Changes have been made to recognize and propagate the disk read operation failure during the online operation. So the EMC CLARiiON disks with the NR state are shown in the error state. * 3423613 (Tracking ID: 3399131) SYMPTOM: The following command fails with an error for a path managed by a Third Party Driver (TPD) which co-exists with DMP. # vxdmpadm -f disable path= VxVM vxdmpadm ERROR V-5-1-11771 Operation not support DESCRIPTION: Third party drivers manage the devices with or without the co-existence of the Dynamic Multi-Pathing (DMP) driver. Disabling the paths managed by a third party driver which does not co-exist with DMP is not supported. But due to a bug in the code, disabling the paths managed by a third party driver which co-exists with DMP also fails. The same flags are set for all third party driver devices. RESOLUTION: The code has been modified to block this command only for the third party drivers which cannot co-exist with DMP. * 3423644 (Tracking ID: 3416622) SYMPTOM: The hot-relocation feature fails for a corrupted disk in the Cluster Volume Manager (CVM) environment due to the disk connectivity check. vxprint output after hot relocation failure for the corrupted disk ams_wms0_15: Disk group: testdg TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0 dg testdg testdg - - - - - - dm ams_wms0_14 ams_wms0_14 - 4043008 - - - - dm ams_wms0_15 - - - - NODEVICE - - dm ams_wms0_16 ams_wms0_16 - 4043008 - SPARE - - v vol1 fsgen ENABLED 4042752 - ACTIVE - - pl vol1-01 vol1 ENABLED 4042752 - ACTIVE - - sd ams_wms0_14-01 vol1-01 ENABLED 4042752 0 - - - pl vol1-02 vol1 DISABLED 4042752 - NODEVICE - - sd ams_wms0_15-01 vol1-02 DISABLED 4042752 0 NODEVICE - - No hot relocation is done even if a spare disk with sufficient space is available as a locally connected disk in the CVM environment. 
DESCRIPTION: The hot-relocation feature fails due to a connectivity check that wrongly assumes the disk being relocated is remotely connected. Additionally, the disk is not selected for relocation due to a mismatch on the Unique Disk ID (UDID) during the check. RESOLUTION: The code is modified to introduce an avoidance fix in the case of a corrupted source disk, irrespective of whether it is local or remote. However, the hot-relocation feature functions only on a locally connected target disk at the master node. * 3424795 (Tracking ID: 3424798) SYMPTOM: Veritas Volume Manager (VxVM) mirror attach operations (e.g., plex attach, vxassist mirror, and third-mirror break-off snapshot resynchronization) may take a longer time under heavy application I/O load. The vxtask list command shows tasks are in the 'auto-throttled (waiting)' state for a long time. DESCRIPTION: With the AdminIO de-prioritization feature, VxVM administrative I/Os (e.g., plex attach, vxassist mirror, and third-mirror break-off snapshot resynchronization) are de-prioritized under heavy application I/O load, but this can lead to very slow progress of these operations. RESOLUTION: The code is modified to disable the AdminIO de-prioritization feature. * 3427124 (Tracking ID: 3435225) SYMPTOM: In a given CVR setup, rebooting the master node causes one of the slaves to panic with the following stack: pse_sleep_thread vol_rwsleep_rdlock vol_kmsg_send_common vol_kmsg_send_prealloc cvm_obj_sendmsg_prealloc vol_rv_async_done volkcontext_process voldiskiodone DESCRIPTION: The issue is triggered by one of the code paths sleeping in interrupt context. RESOLUTION: The code is modified so that sleep is not invoked in interrupt context. 
* 3427480 (Tracking ID: 3163549) SYMPTOM: If a slave node tries to join the cluster while the master node is missing the same set of disks that the slave can see, the vxconfigd(1M) daemon may hang on the master with the following stack: kernel_vsyscall() ioctl() kernel_ioctl() kernel_write_disk() dasup_write() priv_update_header() priv_update_toc() priv_check() dasup_validate() dapriv_validate() auto_validate() dg_kernel_dm_changes() dg_kernel_changes() client_trans_start() dg_trans_start() dg_check_kernel() vold_check_signal() request_loop() main() DESCRIPTION: Since a local connection doesn't exist on the master, the private region I/O fails. The I/O is retried through I/O shipping, as other nodes have connectivity to this disk, and a signal to vold is generated. The remote I/O succeeds and the vold-level transaction goes through fine. Then vold picks up the signal and initiates an internal transaction that is successfully completed at the vold level. The operation results in initiating a transaction at the kernel level. The disk doesn't change the disk group, but vol_nulldg ends up being picked for this disk even though it is part of a shared dg. Consequently, the code ends up switching to the disk I/O policy even though it is not enabled. Thus the system keeps switching between the remote and local policy continuously. RESOLUTION: The code is changed to pick the appropriate dg. * 3429328 (Tracking ID: 3433931) SYMPTOM: The 'vxvmconvert' utility fails to get the correct Logical Volume Manager (LVM) version and reports the following message: "vxvmconvert is not supported if LVM version is lower than 2.02.61 and Disk having multiple path" DESCRIPTION: In the 'vxvmconvert' utility, an incorrect stream editor (sed) expression is used to parse and extract the LVM version, due to which the expression failed for build versions wider than two digits. RESOLUTION: The code is modified to parse the expression correctly. 
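The version-parsing fixes above concern LVM build numbers wider than two digits, such as 2.02.111 against the 2.02.61 threshold, which naive string handling orders wrongly. The following hypothetical helper, using GNU sort -V, illustrates a comparison that handles such versions; it is a sketch under that assumption, not the actual vxvmconvert logic:

```shell
# Hypothetical helper: true if version $1 >= version $2, using GNU
# sort -V (version sort) so that 2.02.111 compares higher than
# 2.02.61. A plain string comparison would order them the other way.
ver_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

ver_ge 2.02.111 2.02.61 && echo "supported"   # prints "supported"
```

Field-by-field numeric comparison in awk would work equally well on systems whose sort lacks -V.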
* 3434079 (Tracking ID: 3385753) SYMPTOM: Replication to the Disaster Recovery (DR) site hangs even though the Replication links (Rlinks) are in the connected state. DESCRIPTION: Based on network conditions under the User Datagram Protocol (UDP), the Symantec Replication Option (Veritas Volume Replicator, VVR) has its own flow control mechanism to control the flow/amount of data to be sent over the network. Under error-prone network conditions which cause timeouts, VVR's flow control values become invalid, resulting in a replication hang. RESOLUTION: The code is modified to ensure valid values for the flow control even under error-prone network conditions. * 3434080 (Tracking ID: 3415188) SYMPTOM: Filesystem/IO hangs during data replication with the Symantec Replication Option (VVR) with the following stack trace: schedule() volsync_wait() volopobjenter() vol_object_ioctl() voliod_ioctl() volsioctl_real() vols_ioctl() vols_compat_ioctl() compat_sys_ioctl() sysenter_dispatch() DESCRIPTION: One of the structures of the Symantec Replication Option that is associated with the Storage Replicator Log (SRL) can become invalid because of an improper locking mechanism in the code, which leads to an IO/file system hang. RESOLUTION: The code is changed to use appropriate locks to protect the code. * 3434189 (Tracking ID: 3247040) SYMPTOM: The execution of the "vxdisk scandisks" command enables the PP enclosure which was previously disabled using the "vxdmpadm disable enclosure=" command. DESCRIPTION: During device discovery, due to a wrong check for the PP enclosure, Dynamic Multi-pathing (DMP) destroys the old PP enclosure from the device discovery layer (DDL) database and adds it as a new enclosure. This process removes all the old flags that are set on the PP enclosure, and then DMP treats the enclosure as enabled due to the absence of the required flags. 
RESOLUTION: The code is modified to keep the PP enclosure in the DDL database during device discovery, and the existing flags on the paths of the PP enclosure will not be reset. * 3435000 (Tracking ID: 3162987) SYMPTOM: When the disk is disconnected from the node due to a cable pull or a zone-remove type of operation, the disk has a UDID_MISMATCH flag in the vxdisk list output. DESCRIPTION: To verify whether the disk's udid in the DDL disk entry matches the private region udid, both udids are retrieved and compared. If the disk does not have connectivity, the udid is set to the INVALID_UDID string prior to the udid mismatch check. RESOLUTION: The code is modified such that the udid mismatch condition check is not performed for the disks that don't have connectivity. * 3435008 (Tracking ID: 2958983) SYMPTOM: A memory leak is observed during the reminor operation in the vxconfigd binary. DESCRIPTION: The reminor code path has a memory leak issue, and this code path gets traversed when the auto reminor value is set to 'OFF'. RESOLUTION: The code is modified to free the memory when the auto reminor value is set to 'OFF'. * 3461200 (Tracking ID: 3163970) SYMPTOM: The "vxsnap -g syncstart " command hangs on the Veritas Volume Replicator (VVR) DR site with the following stack trace: cv_wait() delay_common() delay() vol_object_ioctl() voliod_ioctl() volsioctl_real() spec_ioctl() fop_ioctl() ioctl() syscall_trap32() DESCRIPTION: The vxsnap(1M) command internally calls the vxassist(1M) command, which waits for the open operation to be successful on a volume. When the operation fails because of a mismatch in the counter values, the vxsnap(1M) command becomes unresponsive. RESOLUTION: The code is modified such that the open operation does not fail on the volume. * 3358310 (Tracking ID: 2743870) SYMPTOM: You may see continuous I/O error messages of "Too many sg segments". 
The message may look like: lpfc_scsi_prep_dma_buf_s3: Too many sg segments from dma_map_sg. Config 64, seg_cnt 65 DESCRIPTION: Volume Manager breaks large I/O according to max_hw_segments of the device's request queue. When I/O fails with a retryable error, the system retries the original I/O instead of the split I/O (sub chain). If the original I/O is bigger than the hw_segment limit of the device's request queue, the device reports I/O errors. RESOLUTION: The code is modified to handle large request queues. * 3358313 (Tracking ID: 3194358) SYMPTOM: Continuous messages in syslog with EMC not-ready (NR) Logical units. Messages from syslog (/var/log/messages): .. .. Jun 30 19:33:32 d2950-rs2 kernel: [ 1566.120349] sd 6:0:0:9: [sdat] Device not ready Jun 30 19:33:32 d2950-rs2 kernel: [ 1566.120351] sd 6:0:0:9: [sdat] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jun 30 19:33:32 d2950-rs2 kernel: [ 1566.120355] sd 6:0:0:9: [sdat] Sense Key : Not Ready [current] Jun 30 19:33:32 d2950-rs2 kernel: [ 1566.120358] sd 6:0:0:9: [sdat] Add. Sense: Logical unit not ready, manual intervention required Jun 30 19:33:32 d2950-rs2 kernel: [ 1566.120361] sd 6:0:0:9: [sdat] CDB: Read(10): 28 00 00 00 00 00 00 00 80 00 Jun 30 19:33:32 d2950-rs2 kernel: [ 1566.120369] end_request: I/O error, dev sdat, sector 0 Jun 30 19:33:32 d2950-rs2 kernel: [ 1566.125885] VxVM vxdmp V-5-0-112 disabled path 66/0xd0 belonging to the dmpnode 201/0x140 due to path failure Jun 30 19:33:32 d2950sr2 kernel: VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x5) on dmpnode 201/0x1c0 .. .. DESCRIPTION: VxVM tries to online the EMC not-ready (NR) logical units. As part of the disk online process, it tries to read the disk label from the logical unit. Because the logical unit is NR the I/O fails. The failure messages are displayed in the syslog file. RESOLUTION: The code is modified to skip the disk online for the EMC NR LUNs. 
* 3358345 (Tracking ID: 2091520)

SYMPTOM:
Customers cannot selectively disable VxVM configuration copies on the disks associated with a disk group.

DESCRIPTION:
An enhancement is required to enable customers to selectively disable VxVM configuration copies on disks associated with a disk group.

RESOLUTION:
The code is modified to provide a "keepmeta=skip" option to the vxdisk(1M) set command to allow a customer to selectively disable VxVM configuration copies on disks that are a part of the disk group.

* 3358346 (Tracking ID: 3353211)

SYMPTOM:
A. After an EMC Symmetrix BCV (Business Continuance Volume) device switches to read-write mode, continuous vxdmp (Veritas Dynamic Multi-Pathing) error messages flood syslog as shown below:
NOTE VxVM vxdmp V-5-3-1061 dmp_restore_node: The path 18/0x2 has not yet aged - 299
NOTE VxVM vxdmp 0 dmp_tur_temp_pgr: open failed: error = 6 dev=0x24/0xD0
NOTE VxVM vxdmp V-5-3-1062 dmp_restore_node: Unstable path 18/0x230 will not be available for I/O until 300 seconds
NOTE VxVM vxdmp V-5-3-1061 dmp_restore_node: The path 18/0x2 has not yet aged - 299
NOTE VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x6) on dmpnode 36/0xD0
..
..
B. The DMP metanode, or a path under the DMP metanode, gets disabled unexpectedly.

DESCRIPTION:
A. DMP caches the last discovery NDELAY open for the BCV dmpnode paths. Switching a BCV device to read-write mode is an array-side operation. Typically in such cases, the system administrators are required to run the following command:
1. vxdisk rm
OR
In case of parallel backup jobs,
1. vxdisk offline
2. vxdisk online
This causes DMP to close the cached open, and during the next discovery, the device is opened in read-write mode. If the above steps are skipped, the DMP device goes into a state where one of the paths is in read-write mode and the others remain in NDELAY mode. If the upper layers request a NORMAL open, DMP has code to close the NDELAY cached open and reopen the device in NORMAL mode.
When the dmpnode is online, this happens only for one of the paths of the dmpnode.
B. DMP performs error analysis for paths on which I/O has failed. In some cases, SCSI probes are sent and fail with return values/sense codes that are not handled by DMP. This causes the paths to get disabled.

RESOLUTION:
A. The code of the DMP EMC ASL (Array Support Library) is modified to handle case A for EMC Symmetrix arrays.
B. The DMP code is modified to handle the SCSI conditions correctly for case B.

* 3358348 (Tracking ID: 2665425)

SYMPTOM:
The vxdisk -px "attribute" list(1M) Command Line Interface (CLI) does not support some basic VxVM attributes, nor does it allow the user to specify multiple attributes in a specific sequence. The display layout is not presented in a readable or parsable manner.

DESCRIPTION:
The vxdisk -px "attribute" list(1M) CLI, which is useful for customizing the command output, does not support some basic VxVM disk attributes. The display output is also not aligned by column or suitable for parsing by a utility. In addition, the CLI does not allow multiple attributes to be specified in a usable manner.

RESOLUTION:
Support for the following VxVM disk attributes has been added to the CLI:
SETTINGS
ALERTS
INFO
HOSTID
DISK_TYPE
FORMAT
DA_INFO
PRIV_OFF
PRIV_LEN
PUB_OFF
PUB_LEN
PRIV_UDID
DG_NAME
DGID
DG_STATE
DISKID
DISK_TIMESTAMP
STATE
The CLI has been enhanced to support multiple attributes separated by a comma, and to align the display output by column, separable by a comma for parsing. For example:
# vxdisk -px ENCLOSURE_NAME,DG_NAME,LUN_SIZE,SETTINGS,state list
DEVICE       ENCLOSURE_NAME  DG_NAME   LUN_SIZE   SETTINGS              STATE
sda          disk            -         143374650  -                     online
sdb          disk            -         143374650  -                     online
sdc          storwizev70000  fencedg   10485760   thinrclm, coordinator online

* 3358351 (Tracking ID: 3158320)

SYMPTOM:
The VxVM (Veritas Volume Manager) command "vxdisk -px REPLICATED list (disk)" displays wrong output.
DESCRIPTION:
When executed, the "vxdisk -px REPLICATED list disk" command shows the same output as "vxdisk -px REPLICATED_TYPE list disk" and does not work as designed to show the values "yes", "no" or "-". The command line parameter specified is parsed incorrectly, and hence the REPLICATED attribute is wrongly treated as REPLICATED_TYPE.

RESOLUTION:
The code is modified to display the "REPLICATED" attribute correctly.

* 3358352 (Tracking ID: 3326964)

SYMPTOM:
VxVM (Veritas Volume Manager) hangs in a CVM environment in the presence of Fast Mirror Resync (FMR)/Flashsnap operations, with the following stack trace:
voldco_cvm_serialize()
voldco_serialize()
voldco_handle_dco_error()
voldco_mapor_sio_done()
voliod_iohandle()
voliod_loop()
child_rip()

DESCRIPTION:
During split-brain testing in the presence of FMR activities, when errors occur on the Data Change Object (DCO), the DCO error handling code sets a flag due to which the same error gets set again in its handler. Consequently, the VxVM Staged I/O (SIO) loops around the same code and causes the hang.

RESOLUTION:
The code is changed to appropriately handle the scenario.

* 3358354 (Tracking ID: 3332796)

SYMPTOM:
The following message is seen while initializing any EFI disk, even though the disk was not previously used as an ASM disk:
"VxVM vxisasm INFO v-5-1-0 seeking block #... "

DESCRIPTION:
As a part of disk initialization, for every EFI disk VxVM checks whether the disk has an ASM label. The message "VxVM vxisasm INFO v-5-1-0 seeking block #..." is printed unconditionally, which is unreasonable.

RESOLUTION:
Code changes have been made to not display the message.
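The column-aligned output added under incident 3358348 above lends itself to scripted parsing. A minimal sketch, assuming the output resembles the example shown: the parser derives column offsets from the header line so that multi-word fields such as "thinrclm, coordinator" survive intact. SAMPLE is a hypothetical excerpt modeled on that example, not captured output.

```python
# Columns are aligned under the header, so each field's start offset
# can be derived from the header line and the data rows sliced
# accordingly. The parser itself makes no VxVM-specific assumptions.
SAMPLE = """\
DEVICE       ENCLOSURE_NAME  DG_NAME   LUN_SIZE   SETTINGS              STATE
sda          disk            -         143374650  -                     online
sdc          storwizev70000  fencedg   10485760   thinrclm, coordinator online"""

def parse_aligned(text):
    lines = text.splitlines()
    header = lines[0]
    names = header.split()
    starts = [header.index(n) for n in names]   # column start offsets
    ends = starts[1:] + [None]                  # each column ends where the next begins
    rows = []
    for line in lines[1:]:
        rows.append({n: line[s:e].strip() for n, s, e in zip(names, starts, ends)})
    return rows

rows = parse_aligned(SAMPLE)
print(rows[1]["SETTINGS"])  # thinrclm, coordinator
```

Slicing by header offsets rather than splitting on whitespace is what keeps the comma-separated SETTINGS values in one field.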
* 3358367 (Tracking ID: 3230148)

SYMPTOM:
CVM hangs during split-brain testing with the following stack trace:
volmv_cvm_serialize()
vol_mv_wrback_done()
voliod_iohandle()
voliod_loop()
kernel_thread_helper()

DESCRIPTION:
During split-brain testing in the presence of Fast Mirror Resync (FMR) activities, a read-writeback operation or Staged I/O (SIO) can be issued as part of a Data Change Object (DCO) chunk update. The SIO tries to read from plex1; when the read operation fails, it reads from the other available plex(es) and performs a write on all other plexes. As the other plex has already failed, the write operation also fails and is retried with IOSHIPPING, which also fails because the plex is unavailable from the other nodes as well (because of the split-brain testing). As the remote plex is unavailable, the write fails again and serialization is called again on the SIO, during which the system hangs due to a mismatch in the active and serial counts.

RESOLUTION:
The code is changed to take care of the active and serial counts when the SIOs are restarted with IOSHIPPING.

* 3358368 (Tracking ID: 3249264)

SYMPTOM:
The Veritas Volume Manager (VxVM) thin disk reclamation functionality causes disk label loss, private region corruption and data corruption.

DESCRIPTION:
The partition offset is not taken into consideration when VxVM calls the array-specific reclamation interface. Incorrect data blocks, which may contain the disk label and VxVM private/public region content, are reclaimed.

RESOLUTION:
Code changes have been made to take the partition offset into consideration when calling the array-specific reclamation interface.

* 3358369 (Tracking ID: 3250369)

SYMPTOM:
Execution of the vxdisk scandisks command causes endless I/O error messages in syslog.

DESCRIPTION:
Execution of the command triggers a re-online of all the disks, which involves reading the private region from all the disks. Failures of these read I/Os generate error events, which are notified to all the clients waiting on "vxnotify".
One such client is the "vxattachd" daemon. The daemon initiates a "vxdisk scandisks" when the number of events is more than 256. Therefore, vxattachd initiates another cycle of the above activity, resulting in endless events.

RESOLUTION:
The code is modified to change the count which triggers the vxattachd daemon from '256' to '1024'. Also, the DMP events are further sub-categorized as per the requirement of the vxattachd daemon.

* 3358370 (Tracking ID: 2921147)

SYMPTOM:
The udid_mismatch flag is absent on a clone disk when the source disk is unavailable. The 'vxdisk list' command does not show the udid_mismatch flag on a disk. This happens even when the 'vxdisk -o udid list' or 'vxdisk -v list diskname | grep udid' commands show different Device Discovery Layer (DDL) generated and private region unique identifiers (UDIDs) for the disk.

DESCRIPTION:
When the DDL-generated UDID and the private region UDID of a disk do not match, Veritas Volume Manager (VxVM) sets the udid_mismatch flag on the disk. This flag is used to detect a disk as a clone, which is then marked with the clone-disk flag. The vxdisk(1M) utility suppressed the display of the udid_mismatch flag if the source Logical Unit Number (LUN) is unavailable on the same host.

RESOLUTION:
The vxdisk(1M) utility is modified to display the udid_mismatch flag if it is set on the disk. Display of this flag is no longer suppressed, even when the source LUN is unavailable on the same host.

* 3358371 (Tracking ID: 3125711)

SYMPTOM:
When the secondary node is restarted while a reclaim operation is going on on the primary node, the system panics with the following stack:
do_page_fault()
page_fault()
dmp_reclaim_device()
dmp_reclaim_storage()
gendmpioctl()
dmpioctl()
vol_dmp_ktok_ioctl()
voldisk_reclaim_region()
vol_reclaim_disk()
vol_subdisksio_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION:
In the Veritas Volume Replicator (VVR) environment, there is a corner case with the reclaim operation on the secondary node.
The reclaim length is calculated incorrectly, which leads to a memory allocation failure and results in the system panic.

RESOLUTION:
The code is modified to calculate the reclaim length correctly.

* 3358372 (Tracking ID: 3156295)

SYMPTOM:
When Dynamic Multi-Pathing (DMP) native support is enabled for Oracle Automatic Storage Management (ASM) devices, the permission and ownership of the /dev/raw/raw# devices go wrong after reboot.

DESCRIPTION:
When VxVM binds the Dynamic Multi-Pathing (DMP) devices to raw devices during a restart, it invokes the 'raw' command asynchronously to create the raw devices, and then tries to set the permission and ownership of the raw devices immediately after invoking the 'raw' command. However, in some cases, the raw devices have not yet been created at the time VxVM tries to set the permission and ownership. In that case, VxVM eventually creates the raw devices without the correct permission and ownership.

RESOLUTION:
The code is modified to set the permission and ownership of the raw devices when DMP receives the OS event which indicates that the raw device is created. This ensures that VxVM sets up the permission and ownership of the raw devices correctly.

* 3358373 (Tracking ID: 3218013)

SYMPTOM:
The Dynamic Reconfiguration (DR) Tool does not delete stale OS (Operating System) device handles.

DESCRIPTION:
The DR Tool does not check for stale OS device handles during the Logical Unit Number (LUN) removal operation. As a result, stale OS device handles remain even after the LUNs are successfully removed.

RESOLUTION:
The code has been changed to check for and delete stale OS device handles.

* 3358374 (Tracking ID: 3237503)

SYMPTOM:
The system hangs after creating a space-optimized snapshot with a large cache volume.

DESCRIPTION:
For all the changes written to the cache volume after the snapshot volume is created, a translation map with a B+tree data structure is used to accelerate search/insert/delete operations.
During an attempt to insert a node into the tree, type casting of the page offset to 'unsigned int' truncates any offset beyond the maximum 32-bit integer. The value truncation corrupts the B+tree data structure, resulting in a SIO (VxVM Staged I/O) hang.

RESOLUTION:
The code is modified to remove all type casting to 'unsigned int' in the cache volume code.

* 3358377 (Tracking ID: 3199398)

SYMPTOM:
The output of the command "vxdmpadm pgrrereg" depends on the order of the DMP (Dynamic Multi-Pathing) node list; the terminal output depends on the last LUN (DMP node).
1. Terminal message when PGR (Persistent Group Reservation) re-registration succeeds on the last LUN:
# vxdmpadm pgrrereg
VxVM vxdmpadm INFO V-5-1-0 DMP PGR re-registration done for ALL PGR enabled dmpnodes.
2. Terminal message when PGR re-registration fails on the last LUN:
# vxdmpadm pgrrereg
vxdmpadm: Permission denied

DESCRIPTION:
The "vxdmpadm pgrrereg" command has been introduced to support moving a guest OS on one physical node to another node. In a Solaris LDOM environment, this feature is called "Live Migration". When a customer uses the I/O fencing feature and a guest OS is moved to another physical node, I/O does not succeed in the guest OS after the physical node migration, because each DMP node of the guest OS does not have a valid SCSI-3 PGR key, as the physical HBA has changed. This command helps re-register the valid PGR keys for the new physical nodes; however, its output depends on the last LUN (DMP node).

RESOLUTION:
Code changes are done to log the re-registration failures in the system log file. The terminal output now instructs the user to look into the system log when an error is seen on a LUN.

* 3358379 (Tracking ID: 1783763)

SYMPTOM:
In a VVR environment, the vxconfigd(1M) daemon may hang during a configuration change operation.
The following stack trace is observed:
delay
vol_rv_transaction_prepare
vol_commit_iolock_objects
vol_ktrans_commit
volconfig_ioctl
volsioctl_real
volsioctl
vols_ioctl
...

DESCRIPTION:
Incorrect serialization primitives are used, which causes the vxconfigd(1M) daemon to hang.

RESOLUTION:
The code is modified to use the correct serialization primitives.

* 3358380 (Tracking ID: 2152830)

SYMPTOM:
A disk group (DG) import fails with a non-descriptive error message when multiple copies (clones) of the same device exist and the original devices are either offline or unavailable. For example:
# vxdg import mydg
VxVM vxdg ERROR V-5-1-10978 Disk group mydg: import failed: No valid disk found containing disk group

DESCRIPTION:
If the original devices are offline or unavailable, the vxdg(1M) command picks up cloned disks for import. The DG import fails unless the clones are tagged and the tag is specified during the DG import. The import failure is expected, but the error message is non-descriptive and does not specify the corrective action to be taken by the user.

RESOLUTION:
The code is modified to give the correct error message when duplicate clones exist during import. Also, details of the duplicate clones are reported in the system log.

* 3358381 (Tracking ID: 2859470)

SYMPTOM:
An EMC SRDF-R2 disk may go into the error state when the Extensible Firmware Interface (EFI) label is created on the R1 disk. For example:
R1 site
# vxdisk -eo alldgs list | grep -i srdf
emc0_008c auto:cdsdisk emc0_008c SRDFdg online c1t5006048C5368E580d266 srdf-r1
R2 site
# vxdisk -eo alldgs list | grep -i srdf
emc1_0072 auto - - error c1t5006048C536979A0d65 srdf-r2

DESCRIPTION:
Since R2 disks are in write-protected mode, the default open() call made for read-write mode fails for the R2 disks, and the disk is marked as invalid.

RESOLUTION:
The code is modified so that Dynamic Multi-Pathing (DMP) is able to read the EFI label even on a write-protected SRDF-R2 disk.
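The 32-bit truncation behind incident 3358374 (Tracking ID 3237503) above is easy to reproduce in miniature. A hypothetical sketch of the bug class, not VxVM code:

```python
# Emulate a C-style cast of a 64-bit page offset to 'unsigned int'
# (32 bits), as the buggy B+tree insert path did for cache-volume
# page offsets.
def to_uint32(offset):
    return offset & 0xFFFFFFFF  # high 32 bits are discarded

off_small = 7 * 2**20          # well below 4 GiB: survives the cast
off_large = 5 * 2**32 + 4096   # beyond 32 bits: high bits are lost

print(to_uint32(off_small) == off_small)  # True
print(to_uint32(off_large))               # 4096, not 21474840576
```

Two distinct offsets can collapse onto the same 32-bit key, so a node can be inserted at the wrong position and the B+tree ordering invariant breaks, which is the corruption that led to the SIO hang.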
* 3358382 (Tracking ID: 3086627)

SYMPTOM:
The "vxdisk -o thin,fssize list" command fails with the following error:
VxVM vxdisk ERROR V-5-1-16282 Cannot retrieve stats: Bad address

DESCRIPTION:
This issue happens when the system has more than 200 Logical Unit Numbers (LUNs). VxVM reads the file system statistical information for each LUN to generate the file system size data. After reading the information for the first 200 LUNs, the buffer is not reset correctly, so subsequent accesses to the buffer address generate this error.

RESOLUTION:
The code has been changed to properly reset the buffer address.

* 3358404 (Tracking ID: 3021970)

SYMPTOM:
A secondary node panics due to a NULL pointer dereference when the system frees an interlock. The stack trace looks like the following:
page_fault
volsio_ilock_free
vol_rv_inactivate_wsio
vol_rv_restart_wsio
vol_rv_serialise_sec_logging
vol_rv_serialize
vol_rv_errorhandler_start
voliod_iohandle
voliod_loop
...

DESCRIPTION:
The panic occurs if there is a node crash or a node reconfiguration on the primary side. The secondary node does not correctly handle the updates for the period of the crash, which results in a panic.

RESOLUTION:
The code is modified to properly handle the freeing of an interlock for node crashes or reconfigurations on the primary side.

* 3358414 (Tracking ID: 3139983)

SYMPTOM:
Failed I/Os from SCSI are retried only on very few paths to a LUN instead of utilizing all the available paths. Sometimes this can cause multiple I/O retries without success, which causes DMP to send I/O failures to the application, bounded by the recoveryoption tunable. The following messages are displayed in the console log:
[..]
Mon Apr xx 04:18:01.885: I/O analysis done as DMP_PATH_OKAY on Path belonging to Dmpnode
Mon Apr xx 04:18:01.885: I/O error occurred (errno=0x0) on Dmpnode
[..]

DESCRIPTION:
When an I/O failure is returned to DMP with a retry error from SCSI, DMP retries that I/O on another path.
However, it fails to choose the path that has the higher probability of successfully handling the I/O.

RESOLUTION:
The code is modified to implement the intelligence of choosing appropriate paths that can successfully process the I/Os during retries.

* 3358416 (Tracking ID: 3312162)

SYMPTOM:
Data corruption may occur on a secondary Symantec Volume Replicator (VVR) Disaster Recovery (DR) site, with the following signs:
1) The vradmin verifydata command output reports data differences even though replication is up-to-date.
2) The secondary site may require a full fsck operation after the Migrate or Takeover operations.
3) Error messages may be displayed. For example:
msgcnt 21 mesg 017: V-2-17: vx_dirlook - /dev/vx/dsk// file system inode marked bad incore
4) Silent corruption may occur without any visible errors.

DESCRIPTION:
With Secondary Logging enabled, VVR writes the replicated data on the DR site to its Storage Replicator Log (SRL) first, and later applies it to the corresponding data volumes. When VVR flushes the write operations from the SRL onto the data volumes, data corruption may occur, provided all the following conditions occur together:
* Multiple write operations for the same data block occur in a short time, for example, when VVR flushes the given set of SRL writes onto its data volumes.
* Based on relative timing, VVR grants the locks to perform the write operations on the same data block out of order. As a result, VVR applies the write operations out of order.

RESOLUTION:
The code is modified to protect write-order fidelity by ensuring that VVR grants the locks in strict order.
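The write-order requirement in incident 3358416 above can be illustrated in a few lines. This is a toy model, not VVR code: two logged writes hit the same block, and applying them in lock-grant order rather than log order changes the final contents.

```python
# Illustrative only: two SRL writes, W1 then W2, target the same data
# block. Write-order fidelity requires applying them in log order;
# if the lock for W2 is granted first, the older data wins.
def apply_writes(block, writes, order):
    for i in order:                 # 'order' = sequence in which locks were granted
        offset, data = writes[i]
        block[offset:offset + len(data)] = data
    return bytes(block)

writes = [(0, b"AAAA"), (0, b"BBBB")]   # W1 then W2, in SRL order
in_order  = apply_writes(bytearray(4), writes, order=[0, 1])
reordered = apply_writes(bytearray(4), writes, order=[1, 0])
print(in_order)   # b'BBBB' - correct, the latest write wins
print(reordered)  # b'AAAA' - silent corruption on the secondary
```

Granting the locks strictly in SRL order, as the fix does, makes the reordered case impossible.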
* 3358417 (Tracking ID: 3325122)

SYMPTOM:
In a Clustered Volume Replicator (CVR) environment, when you create stripe-mirror volumes with logtype=dcm, the creation may fail with the following error message:
VxVM vxplex ERROR V-5-1-10128 Unexpected kernel error in configuration update

DESCRIPTION:
In layered volumes, the Data Change Map (DCM) plex is attached to the storage volumes, rather than to the top-level volume. The CVR configuration did not handle that case correctly.

RESOLUTION:
The code is modified to handle the DCM plex placement correctly in the case of layered volumes.

* 3358418 (Tracking ID: 3283525)

SYMPTOM:
A stop and start of the data volume (with an associated DCO volume) results in a VxVM configuration daemon vxconfigd(1M) hang with the following stack trace. The data volume has undergone vxresize earlier.
volsync_wait()
volsiowait()
volpvsiowait()
voldco_get_accumulator()
voldco_acm_pagein()
voldco_write_pervol_maps_instant()
voldco_write_pervol_maps()
volfmr_copymaps_instant()
vol_mv_precommit()
vol_commit_iolock_objects()
vol_ktrans_commit()
volconfig_ioctl()
volsioctl_real()
vols_ioctl()
vols_compat_ioctl()
compat_sys_ioctl()
sysenter_dispatch()

DESCRIPTION:
In the VxVM code, the Data Change Object (DCO) Table of Contents (TOC) entry is not marked with an appropriate flag, which prevents the in-core new map size from being flushed to the disk. This leads to corruption. A subsequent stop and start of the volume reads the incorrect TOC from disk, detects the corruption, and results in a vxconfigd(1M) daemon hang.

RESOLUTION:
The code is modified to mark the DCO TOC entry with an appropriate flag, which ensures that the in-core data is flushed to disk to prevent the corruption and the subsequent vxconfigd(1M) daemon hang. Also, a fix is made to ensure that the precommit fails if the paging module growth fails.
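The flush-on-commit idea in incident 3358418 above can be sketched as follows; the structures and names here are hypothetical stand-ins, not VxVM's DCO TOC format. An in-core entry is written back only if its dirty flag is set, so growing the map without flagging the TOC entry leaves stale data on disk.

```python
# Hypothetical sketch: an entry is flushed to disk only when marked
# dirty, so an unflagged in-core resize silently persists nothing.
class TocEntry:
    def __init__(self, map_size):
        self.map_size = map_size
        self.dirty = False

def resize_map(entry, new_size, set_flag):
    entry.map_size = new_size      # in-core update
    if set_flag:
        entry.dirty = True         # the fix: mark the entry for flushing

def flush(entry, on_disk):
    if entry.dirty:                # only dirty entries reach the disk
        on_disk["map_size"] = entry.map_size
        entry.dirty = False
    return on_disk

disk = {"map_size": 100}
e = TocEntry(100)
resize_map(e, 400, set_flag=False)
print(flush(e, dict(disk)))   # {'map_size': 100} - stale on-disk TOC
resize_map(e, 400, set_flag=True)
print(flush(e, dict(disk)))   # {'map_size': 400} - new size persisted
```

In the stale case, a later restart reads the old map size from disk while the maps were written at the new size, which is the mismatch the incident describes.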
* 3358420 (Tracking ID: 3236773)

SYMPTOM:
"vxdmpadm getattr enclosure failovermode" generates multiple "vxdmp V-5-3-0 dmp_indirect_ioctl: Ioctl Failed" error messages in the system log if the enclosure is configured as EMC ALUA.

DESCRIPTION:
An EMC disk array in ALUA mode only supports the "implicit" type of failover mode. Moreover, such disk arrays do not support the set or get failover-mode operations, so any set or get attempt for the failover-mode attribute generates "Ioctl Failed" error messages.

RESOLUTION:
The code is modified to not log such error messages while setting or getting the failover mode for EMC ALUA hardware configurations.

* 3358423 (Tracking ID: 3194305)

SYMPTOM:
In the Veritas Volume Replicator (VVR) environment, the replication status goes into a paused state because the vxstart_vvr command does not start the vxnetd daemon automatically on the secondary side.
# vradmin -g vvrdg repstatus vvrvg
Replicated Data Set: vvrvg
Primary:
  Host name:            Host IP
  RVG name:             vvrvg
  DG name:              vvrdg
  RVG state:            enabled for I/O
  Data volumes:         1
  VSets:                0
  SRL name:             srlvol
  SRL size:             5.00 G
  Total secondaries:    1
Secondary:
  Host name:            Host IP
  RVG name:             vvrvg
  DG name:              vvrdg
  Data status:          consistent, up-to-date
  Replication status:   paused due to network disconnection
  Current mode:         asynchronous
  Logging to:           SRL
  Timestamp Information: behind by 0h 0m 0s

DESCRIPTION:
The vxnetd daemon stops on the secondary side and, as a result, the replication status pauses on the primary side. The vxnetd daemon needs to start gracefully on the secondary for the replication to be in the proper state.

RESOLUTION:
The code is modified to implement an internal retryable mechanism for starting vxnetd.

* 3358429 (Tracking ID: 3300418)

SYMPTOM:
VxVM volume operations on shared volumes cause unnecessary read I/Os on the disks that have both the configuration copy and the log copy disabled on the slaves.
DESCRIPTION:
The unnecessary disk read I/Os are generated on the slaves when VxVM refreshes the private region information into memory during a VxVM transaction. In fact, there is no need to refresh the private region information if the configuration copy and the log copy are already disabled on the disk.

RESOLUTION:
The code has been changed to skip the refresh if both the configuration copy and the log copy are already disabled on the master and the slaves.

* 3358433 (Tracking ID: 3301470)

SYMPTOM:
In a CVR environment, a recovery on the primary side causes all the nodes to panic with the following stack:
trap
ktl0
search_vxvm_mem
voliomem_range_iter
vol_ru_alloc_buffer_start
voliod_iohandle
voliod_loop

DESCRIPTION:
The recovery tries to do a zero-size readback from the Storage Replicator Log (SRL), which results in a panic.

RESOLUTION:
The code is modified to handle the corner case which causes a zero-sized readback.

* 3362234 (Tracking ID: 2994976)

SYMPTOM:
The system panics during mirror break-off snapshot creation or a plex detach operation with the following stack trace:
vol_mv_pldet_callback()
vol_klog_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION:
VxVM performs some metadata update operations in-core as well as on the Data Change Object (DCO) of the volume while creating a mirror break-off snapshot or detaching a plex due to an I/O error. The panic occurs due to incorrect access to the in-core metadata fields. This issue is observed when the volume has a DCO configured and is mounted with the Veritas File System (VxFS).

RESOLUTION:
The code is modified so that the in-core metadata fields are accessed properly during plex detach and snapshot creation.

* 3365296 (Tracking ID: 2824977)

SYMPTOM:
The CLI "vxdmpadm setattr enclosure failovermode", which is meant for ALUA arrays, fails with an error on certain arrays without providing an appropriate reason for the failure.
DESCRIPTION:
The CLI "vxdmpadm setattr enclosure failovermode", which is meant for ALUA arrays, fails on certain arrays without an appropriate reason because the failover-mode attribute of an ALUA array can be set only if the array provides such a facility, which is indicated by the response to the SCSI mode sense command.

RESOLUTION:
The code is modified to check the SCSI mode sense command response to determine whether the array supports changing the failover-mode attribute, and to report an appropriate message if such a facility is not available for the array.

* 3366670 (Tracking ID: 2982834)

SYMPTOM:
The vxdmpraw enable command fails to create a raw device for a full device when Dynamic Multi-Pathing (DMP) support is enabled for Automatic Storage Management (ASM). Additionally, the vxdmpraw enable command does not delete the raw devices when the DMP support is disabled for ASM.

DESCRIPTION:
When you execute the vxdmpraw command to enable DMP support for ASM, the raw devices corresponding to the DMP devices are created. These raw devices are created only for the partitions of a DMP device and not for the DMP device that represents a whole or full disk. When you execute the vxdmpraw command to disable DMP support for ASM, some raw devices that do not belong to the input DMP device are deleted. This deletion occurs due to an error in the parsing of the input.

RESOLUTION:
The code of the /etc/vx/bin/vxdmpraw command is modified to fix this issue.

* 3366688 (Tracking ID: 2957645)

SYMPTOM:
When the vxconfigd daemon/command is restarted, the terminal gets flooded with error messages as shown:
VxVM INFO V-5-2-16543 connresp: new client ID allocation failed for cvm nodeid * with error *.

DESCRIPTION:
When the vxconfigd daemon is restarted, it fails to get a client ID. There is no need to print this error message at the default level, yet the terminal gets flooded with these messages.

RESOLUTION:
The code is modified to print the error messages only at the debug level.
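The fix for incident 3366688 above is a log-level demotion. A sketch using Python's logging module as a stand-in for vxconfigd's logger; the message text follows the symptom above, everything else is illustrative:

```python
# Python's logging module stands in for vxconfigd's logger here;
# this is not how vxconfigd actually logs.
import logging

logger = logging.getLogger("vxconfigd-sketch")
logger.setLevel(logging.INFO)   # default verbosity

# Before the fix the message was emitted at the default level on
# every failed allocation, flooding the terminal. After the fix it
# is emitted at debug level, so it is suppressed unless debugging
# is enabled.
logger.debug("connresp: new client ID allocation failed for "
             "cvm nodeid %s with error %s", "*", "*")

print(logger.isEnabledFor(logging.DEBUG))  # False
```

Raising the logger to DEBUG level would bring the message back for troubleshooting without code changes.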
* 3366703 (Tracking ID: 3056311)

SYMPTOM:
The following problems can be seen on disks initialized with the 5.1SP1 listener and used with older releases such as 4.1, 5.0 and 5.0.1:
1. Creation of a volume fails on a disk, indicating insufficient space available.
2. Data corruption is seen: the CDS backup label signature is seen within the PUBLIC region data.
3. Disks greater than 1TB in size appear "online invalid" on older releases.

DESCRIPTION:
The VxVM listener can be used to initialize boot disks and data disks for use with older VxVM releases. For example, the 5.1SP1 listener can be used to initialize disks for use with all previous VxVM releases such as 5.0.1, 5.0, 4.1, etc. From 5.1SP1 onwards, VxVM always uses fabricated geometry while initializing a disk with the CDS format, whereas older releases such as 4.1, 5.0 and 5.0.1 use raw geometry and do not honor LABEL geometry. Hence, if a disk is initialized through the 5.1SP1 listener, the disk is stamped with fabricated geometry. When such a disk is used with older VxVM releases such as 5.0.1, 5.0 or 4.1, there can be a mismatch between the stamped geometry (fabricated) and the in-memory geometry (raw). If the on-disk cylinder size is smaller than the in-memory cylinder size, data corruption issues may be encountered. To prevent any data corruption issues, disks for older releases need to be initialized through the listener with the older CDS format by using raw geometry. Also, if the disk size is greater than or equal to 1TB, 5.1SP1 VxVM initializes the disk with the CDS EFI format, which older releases such as 4.1, 5.0, 5.0.1, etc. do not understand.

RESOLUTION:
From release 5.1SP1 onwards, through the HP-UX listener, a disk to be used with older releases such as 4.1, 5.0 and 5.0.1 is initialized with raw geometry. Also, initialization through the HP-UX listener of a disk whose size is greater than 1TB will fail.
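The geometry mismatch in incident 3366703 above comes down to offset arithmetic. A worked example with hypothetical sectors-per-cylinder values (not actual VxVM geometries): the same cylinder-relative offset resolves to different byte addresses under the two geometries.

```python
# Illustrative arithmetic only. Region offsets are recorded in
# cylinders, so the same cylinder count maps to different byte
# ranges when on-disk and in-memory cylinder sizes disagree.
SECTOR = 512  # bytes per sector

def region_start_bytes(start_cyl, sectors_per_cyl):
    return start_cyl * sectors_per_cyl * SECTOR

fabricated_cyl = 4096   # hypothetical sectors/cylinder stamped on disk (5.1SP1)
raw_cyl        = 6144   # hypothetical sectors/cylinder assumed in memory (older release)

on_disk = region_start_bytes(10, fabricated_cyl)   # where the region really is
in_mem  = region_start_bytes(10, raw_cyl)          # where the older release looks
print(in_mem - on_disk)  # 10485760
```

With the in-memory cylinder larger than the on-disk one, every cylinder-addressed access lands past its true location, which is how label and private/public region data can be overwritten.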
* 3368236 (Tracking ID: 3327842)

SYMPTOM:
In a Cluster Volume Replication (CVR) environment, with an I/O load on the Primary and replication in progress, if the user runs the vradmin resizevol(1M) command on the Primary, the operation often terminates with the error message "vradmin ERROR Lost connection to host".

DESCRIPTION:
There is a race condition on the Secondary between the transaction and the messages delivered from the Primary to the Secondary. This results in repeated timeouts of transactions on the Secondary, which in turn results in session timeouts between the Primary and Secondary vradmind.

RESOLUTION:
The code is modified to resolve the race condition.

* 3371753 (Tracking ID: 3081410)

SYMPTOM:
When you remove LUNs, the DR tool reports "ERROR: No luns available for removal".

DESCRIPTION:
The DR tool does not check the correct condition for devices under the control of a third-party driver (TPD). Therefore, no device gets listed for removal in certain cases.

RESOLUTION:
The code is modified to correctly identify the TPD-controlled devices in the DR tool.

* 3373213 (Tracking ID: 3373208)

SYMPTOM:
Veritas Dynamic Multi-Pathing (DMP) wrongly sends the SCSI PR OUT command with the Activate Persist Through Power Loss (APTPL) bit set to '0' to arrays that support the APTPL capability.

DESCRIPTION:
DMP correctly recognizes the APTPL bit settings and stores them in its database, and it verifies this information before sending the SCSI PR OUT command so that the APTPL bit can be set appropriately in the command. However, due to an issue in the code, DMP was not handling the node's device number properly, because of which the APTPL bit was incorrectly set in the SCSI PR OUT command.

RESOLUTION:
The code is modified to handle the node's device number properly in the DMP SCSI command code path.

* 3374166 (Tracking ID: 3325371)

SYMPTOM:
A panic occurs in the vol_multistepsio_read_source() function when VxVM's FastResync feature is used.
The stack trace observed is as follows:
vol_multistepsio_read_source()
vol_multistepsio_start()
volkcontext_process()
vol_rv_write2_start()
voliod_iohandle()
voliod_loop()
kernel_thread()

DESCRIPTION:
When a volume is resized, the Data Change Object (DCO) also needs to be resized. However, the old accumulator contents are not copied into the new accumulator, so the respective regions are marked as invalid. A subsequent I/O on these regions triggers the panic.

RESOLUTION:
The code is modified to appropriately copy the accumulator contents during the resize operation.

* 3374735 (Tracking ID: 3423316)

SYMPTOM:
The vxconfigd(1M) daemon dumps core while executing the vxdisk(1M) scandisks command, with the following stack:
ncopy_tree_build()
dg_balance_copies_helper()
dg_balance_copies()
dg_update()
commit()
dg_trans_commit()
devintf_dm_reassoc_da()
devintf_add_autoconfig_main()
devintf_add_autoconfig()
req_set_naming_scheme()
request_loop()
main()

DESCRIPTION:
As part of the vxdisk scandisks operation, the device discovery process happens. During this process, unique device entries are populated in the device list. The core dump occurs due to improper freeing of a device entry in the device list.

RESOLUTION:
The code is modified to appropriately handle the device list.

* 3375424 (Tracking ID: 3250450)

SYMPTOM:
Running the vxdisk(1M) command with the -o thin,fssize list option in the presence of a linked volume causes a system panic with the following stack:
vol_mv_lvsio_ilock()
vol_mv_linkedvol_sio_start()
volkcontext_process()
volsiowait()
vol_objioctl()
vol_object_ioctl()
voliod_ioctl()
volsioctl_real()
volsioctl()

DESCRIPTION:
The vxdisk(1M) command with the -o thin,fssize list option creates reclaim I/Os. All the I/Os performed on linked volumes are stabilized. However, the reclaim I/Os should not be stabilized, since that leads to a NULL pointer dereference.
RESOLUTION: The code is modified to prevent stabilization of reclaim I/Os, which prevents the null pointer dereference from occurring.

* 3375575 (Tracking ID: 3403172)
SYMPTOM: In the syslog, the following messages are observed for paths belonging to EMC Symmetrix Not-Ready devices:
Nov 26 06:15:58 kernel: VxVM vxdmp V-5-0-0 [Warn] SCSI error opcode=0x28 returned rq_status=0xc cdb_status=0x1 key=0x2 asc=0x4 ascq=0x3 on path 65/0xc0
Nov 26 06:15:58 kernel: VxVM vxdmp V-5-0-0 [Info] i/o error analysis done (status = 1) on path 65/0xc0 belonging to dmpnode 201/0x190
Nov 26 06:15:58 kernel: VxVM vxdmp V-5-0-112 [Warn] disabled path 65/0xc0 belonging to the dmpnode 201/0x190 due to path failure
Nov 26 06:15:58 kernel: VxVM vxdmp V-5-3-1062 dmp_restore_node: Unstable path 65/0xc0 will not be available for I/O until 300 seconds
Nov 26 06:15:58 kernel: VxVM vxdmp V-5-0-148 [Warn] repeated failures detected on path 65/0xc0 belonging to dmpnode 201/0x190
DESCRIPTION: The SCSI command fails to identify the EMC Symmetrix Not-Ready device state because the inquiry response is truncated: the inquiry buffer length is not copied correctly into the SCSI command.
RESOLUTION: The code is modified to fix the SCSI command issue.

* 3377209 (Tracking ID: 3377383)
SYMPTOM: vxconfigd crashes when a disk under DMP reports a device failure. After this, the following error is seen when a VxVM command is executed:
"VxVM vxdisk ERROR V-5-1-684 IPC failure: Configuration daemon is not accessible"
DESCRIPTION: If a disk fails and reports a certain failure to DMP, vxconfigd crashes because that error is not handled properly.
RESOLUTION: The code is modified to properly handle device failures reported by a failed disk under DMP.

* 3381922 (Tracking ID: 3235350)
SYMPTOM: If an operational volume has a version 20 data change object (DCO) attached, operations that grow the volume, such as 'vxresize' and 'vxassist growby/growto', can cause a system panic.
The panic stack looks like the following:
volpage_gethash+000074()
volpage_getlist_internal+00075C()
volpage_getlist+000060()
voldco_get_regionstate+0001C8()
volfmr_get_regionstate+00003C()
voldco_checktargetsio_start+0000A8()
voliod_iohandle+000050()
DESCRIPTION: When any update is done on the grown region of a volume, VxVM verifies the state of the same region on the snapshot to avoid inconsistency. If the snapshot volume has not been grown, it tries to verify a non-existent region on the snapshot volume. The memory access goes beyond the allocation, which leads to the system panic.
RESOLUTION: The code is modified to identify conditions where the verification targets a non-existent region and to handle them correctly.

* 3387376 (Tracking ID: 2945658)
SYMPTOM: If you modify the disk label for an Active/Passive LUN on Linux platforms, the current passive paths don't reflect this modification after failover.
DESCRIPTION: Whenever failover happens for an Active/Passive LUN, the disk label information on the passive paths is not updated.
RESOLUTION: The BLKRRPART ioctl is issued on the passive paths during the failover in order to update the disk label or partition information.

* 3387405 (Tracking ID: 3019684)
SYMPTOM: An I/O hang is observed when the SRL is about to overflow after the logowner switches from slave to master. The stack trace looks like the following:
biowait
default_physio
volrdwr
fop_write
write
syscall_trap32
DESCRIPTION: The I/O hang occurs because the master has a stale flag set with an incorrect value related to the last SRL overflow.
RESOLUTION: The code is modified to reset the stale flag according to whether the logowner is master or slave.
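The non-existent-region lookup described for incident 3381922 can be sketched in miniature. This is an illustrative example only: the function and the region size are hypothetical stand-ins, not VxVM's actual DCO code. It shows the kind of bounds check the fix adds before indexing a snapshot's region-state array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical region size; real DCO region sizes are configurable. */
#define REGION_SIZE 65536LL

/* Return the recorded state for the region containing 'offset', or -1
 * when the snapshot was never grown and the region does not exist.
 * Without the 'idx >= nregions' check, the lookup reads past the end
 * of the allocation, which is the panic described above. */
static int region_state(const int *states, size_t nregions, long long offset)
{
    size_t idx = (size_t)(offset / REGION_SIZE);
    if (idx >= nregions)   /* grown region absent on the snapshot */
        return -1;         /* signal "no state" instead of overreading */
    return states[idx];
}
```

A caller receiving -1 can treat the region as needing full synchronization rather than dereferencing memory beyond the allocated array.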
* 3387417 (Tracking ID: 3107741)
SYMPTOM: The vxrvg snapdestroy command fails with the "Transaction aborted waiting for io drain" error message, and vxconfigd(1M) hangs with the following stack trace:
vol_commit_iowait_objects
vol_commit_iolock_objects
vol_ktrans_commit
volconfig_ioctl
volsioctl_real
vols_ioctl
vols_compat_ioctl
compat_sys_ioctl
...
DESCRIPTION: The SmartMove query of Veritas File System (VxFS) depends on some reads and writes. If a transaction in Veritas Volume Manager (VxVM) blocks these new reads and writes, the Application Programming Interface (API) hangs waiting for a response. This results in a deadlock-like situation: the SmartMove API waits for the transaction to complete, while the transaction waits for the SmartMove API.
RESOLUTION: The code is modified to disallow transactions while the SmartMove API is in use.

* 2892702 (Tracking ID: 2567618)
SYMPTOM: VRTSexplorer dumps core with a segmentation fault in checkhbaapi/print_target_map_entry. The stack trace is observed as follows:
print_target_map_entry()
check_hbaapi()
main()
_start()
DESCRIPTION: The checkhbaapi utility uses the HBA_GetFcpTargetMapping() API, which returns the current set of mappings between the OS and the Fibre Channel Protocol (FCP) devices for a given Host Bus Adapter (HBA) port. The maximum limit for mappings is set to 512, and only that much memory is allocated. When the number of mappings returned is greater than 512, the function that prints this information tries to access entries beyond that limit, which results in the core dump.
RESOLUTION: The code is modified to allocate enough memory for all the mappings returned by the HBA_GetFcpTargetMapping() API.

* 3090670 (Tracking ID: 3090667)
SYMPTOM: The "vxdisk -o thin,fssize list" command can cause the system to hang or panic due to kernel memory corruption. This command is also issued internally by Veritas Operations Manager (VOM) during Storage Foundation (SF) discovery.
The following stack trace is observed:
panic string: kernel heap corruption detected
vol_objioctl
vol_object_ioctl
voliod_ioctl - frame recycled
volsioctl_real
DESCRIPTION: Veritas Volume Manager (VxVM) allocates data structures and invokes thin Logical Unit Number (LUN) specific function handlers to determine the disk space that is actively used by the file system. One of the function handlers wrongly accesses system memory beyond the allocated data structure, which results in the kernel memory corruption.
RESOLUTION: The code is modified so that the problematic function handler accesses only the allocated memory.

* 3140411 (Tracking ID: 2959325)
SYMPTOM: The vxconfigd(1M) daemon dumps core while performing the disk group move operation, with the following stack trace:
dg_trans_start ()
dg_configure_size ()
config_enable_copy ()
da_enable_copy ()
ncopy_set_disk ()
ncopy_set_group ()
ncopy_policy_some ()
ncopy_set_copies ()
dg_balance_copies_helper ()
dg_transfer_copies ()
vold_dm_dis_da ()
dg_move_complete ()
req_dg_move ()
request_loop ()
main ()
DESCRIPTION: The core dump occurs when the disk group move operation tries to reduce the size of the configuration records in the disk group while the size is large and the operation needs more space for the new config record entries. Because the reduction of the configuration record size (compaction) and the configuration change made by the disk group move operation cannot coexist, the result is a core dump.
RESOLUTION: The code is modified to perform the compaction before the configuration change made by the disk group move operation.

* 3150893 (Tracking ID: 3119102)
SYMPTOM: Live migration of a virtual machine that runs the Storage Foundation stack with data disk fencing enabled causes service groups configured on the virtual machine to fault.
DESCRIPTION: After live migration of a virtual machine that runs the Storage Foundation stack with data disk fencing enabled, I/O operations fail on shared SAN devices with a reservation conflict, which causes service groups to fault. Live migration changes the SCSI initiator; hence, I/O operations coming from the migrated server to shared SAN storage fail with a reservation conflict.
RESOLUTION: The code is modified to check whether the host is fenced off from the cluster. If the host is not fenced off, the registration key is re-registered for the dmpnode through the migrated server and I/O is restarted. After live migration, the administrator needs to manually invoke the vxdmpadm pgrrereg command from the guest that was migrated.

* 3156719 (Tracking ID: 2857044)
SYMPTOM: The system crashes with the following stack when resizing a volume with a version 30 DCO:
PID: 43437 TASK: ffff88402a70aae0 CPU: 17 COMMAND: "vxconfigd"
#0 [ffff884055a47600] machine_kexec at ffffffff8103284b
#1 [ffff884055a47660] crash_kexec at ffffffff810ba972
#2 [ffff884055a47730] oops_end at ffffffff81501860
#3 [ffff884055a47760] no_context at ffffffff81043bfb
#4 [ffff884055a477b0] __bad_area_nosemaphore at ffffffff81043e85
#5 [ffff884055a47800] bad_area at ffffffff81043fae
#6 [ffff884055a47830] __do_page_fault at ffffffff81044760
#7 [ffff884055a47950] do_page_fault at ffffffff8150383e
#8 [ffff884055a47980] page_fault at ffffffff81500bf5
[exception RIP: voldco_getalloffset+38]
RIP: ffffffffa0bcc436 RSP: ffff884055a47a38 RFLAGS: 00010046
RAX: 0000000000000001 RBX: ffff883032f9eac0 RCX: 000000000000000f
RDX: ffff88205613d940 RSI: ffff8830392230c0 RDI: ffff883fd1f55800
RBP: ffff884055a47a38 R8: 0000000000000000 R9: 0000000000000000
R10: 000000000000000e R11: 000000000000000d R12: ffff882020e80cc0
R13: 0000000000000001 R14: ffff883fd1f55800 R15: ffff883fd1f559e8
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff884055a47a40] voldco_get_map_extents at ffffffffa0bd09ab [vxio]
#10 [ffff884055a47a90] voldco_update_extents_info
at ffffffffa0bd8494 [vxio]
#11 [ffff884055a47ab0] voldco_instant_resize_30 at ffffffffa0bd8758 [vxio]
#12 [ffff884055a47ba0] volfmr_instant_resize at ffffffffa0c03855 [vxio]
#13 [ffff884055a47bb0] voldco_process_instant_op at ffffffffa0bcae2f [vxio]
#14 [ffff884055a47c30] volfmr_process_instant_op at ffffffffa0c03a74 [vxio]
#15 [ffff884055a47c40] vol_mv_precommit at ffffffffa0c1ad02 [vxio]
#16 [ffff884055a47c90] vol_commit_iolock_objects at ffffffffa0c1244f [vxio]
#17 [ffff884055a47cf0] vol_ktrans_commit at ffffffffa0c131ce [vxio]
#18 [ffff884055a47d70] volconfig_ioctl at ffffffffa0c8451f [vxio]
#19 [ffff884055a47db0] volsioctl_real at ffffffffa0c8c9b8 [vxio]
#20 [ffff884055a47e90] vols_ioctl at ffffffffa0040126 [vxspec]
#21 [ffff884055a47eb0] vols_compat_ioctl at ffffffffa004034d [vxspec]
#22 [ffff884055a47ee0] compat_sys_ioctl at ffffffff811ce0ed
#23 [ffff884055a47f80] sysenter_dispatch at ffffffff8104a880
DESCRIPTION: While updating DCO TOC (Table Of Contents) entries into the in-core TOC, a TOC entry is wrongly freed and zeroed out. As a result, traversing the TOC entries leads to a NULL pointer dereference, causing the panic.
RESOLUTION: The code is modified to update the TOC entries appropriately.

* 3159096 (Tracking ID: 3146715)
SYMPTOM: The 'rlinks' do not connect with Network Address Translation (NAT) configurations on Little Endian Architectures (LEA).
DESCRIPTION: On LEAs, the Internet Protocol (IP) address configured with the NAT mechanism is not converted from host-byte order to network-byte order. As a result, the address used by the rlink connection mechanism is distorted and the rlinks fail to connect.
RESOLUTION: The code is modified to convert the IP address to network-byte order before it is used.

* 3209160 (Tracking ID: 2750782)
SYMPTOM: The Veritas Volume Manager (VxVM) upgrade process fails because it incorrectly assumes that the root disk is encapsulated.
The VxVM upgrade package fails with the following error:
"Please un-encapsulate the root disk and then upgrade.
error: %pre(VRTSvxvm) scriptlet failed, exit status 1
error: install: %pre scriptlet failed (2), skipping VRTSvxvm"
DESCRIPTION: The VxVM upgrade process fails when the root disk under the Logical Volume Manager (LVM) has a customized name, such as rootvg-rootvol, for the root volume. The customized name causes VxVM's preinstall script to incorrectly identify the root disk as encapsulated, and the script also assumes that VxVM controls the encapsulated disk.
RESOLUTION: The code is modified so that VxVM now handles all customized LVM names.

* 3210759 (Tracking ID: 3177758)
SYMPTOM: Performance degradation occurs after an upgrade from SF 5.1SP1RP3 to SF 6.0.1 on Linux.
DESCRIPTION: The degradation occurs because the operating system (OS) unplugs the I/O before delivering it to the lower layers in the I/O path. The OS unplugs the I/Os after a default time of 3 milliseconds, which adds overhead to I/O completion.
RESOLUTION: The code is modified to explicitly unplug the I/Os before VxVM sends them to the lower layer.

* 3254132 (Tracking ID: 3186971)
SYMPTOM: The system becomes unbootable after turning on DMP native support if the root file system, except /boot, is under LVM.
DESCRIPTION: Because the DMP native support function sets the filter in the LVM configuration file incorrectly, the LVM root Volume Group (VG) cannot find its underlying Physical Volumes (PVs) during boot, which makes the system unbootable.
RESOLUTION: The code is modified to set the filter in the LVM configuration file correctly.

* 3254227 (Tracking ID: 3182350)
SYMPTOM: If there are more than 8192 paths in the system, the vxassist(1M) command hangs while creating a new VxVM volume or increasing the size of an existing volume.
DESCRIPTION: The vxassist(1M) command creates a hash table with a maximum of 8192 entries. Hence, with more than 8192 paths, multiple paths hash to overlapping buckets in this table.
In such cases, multiple paths that hash to the same bucket are linked in a chain. To find a particular path in a given bucket, the vxassist(1M) command needs to search through the entire linked chain. However, vxassist(1M) searches only the first element, and hangs.
RESOLUTION: The code is modified to search through the entire linked chain.

* 3254229 (Tracking ID: 3063378)
SYMPTOM: Some VxVM (Volume Manager) commands run slowly when "read only" devices (e.g. EMC SRDF-WD, BCV-NR) are presented and managed by EMC PowerPath.
DESCRIPTION: When an I/O write is performed on a "read only" device, the I/O fails and is retried if it is on a TPD (Third Party Driver) device whose path status is okay. Owing to the retry, the I/O does not return until the timeout is reached, which gives the perception that VxVM commands run slowly.
RESOLUTION: The code is modified to return the I/O immediately with a disk media failure if the I/O fails on any TPD device whose path status is okay.

* 3254427 (Tracking ID: 3182175)
SYMPTOM: The "vxdisk -o thin,fssize list" command can report incorrect file system usage data.
DESCRIPTION: An integer overflow in an internal calculation can cause this command to report incorrect per-disk FS usage.
RESOLUTION: The code is modified so that the command reports the correct file system usage data.

* 3280555 (Tracking ID: 2959733)
SYMPTOM: When device paths are moved across LUNs or enclosures, the vxconfigd(1M) daemon can dump core, or data corruption can occur due to internal data structure inconsistencies. The following stack trace is observed:
ddl_reconfig_partial ()
ddl_reconfigure_all ()
ddl_find_devices_in_system ()
find_devices_in_system ()
req_discover_disks ()
request_loop ()
main ()
DESCRIPTION: When the device path configuration is changed after a planned or unplanned disconnection by moving only a subset of the device paths across LUNs or other storage arrays (enclosures), DMP's internal data structure becomes inconsistent.
This causes the vxconfigd(1M) daemon to dump core. In some instances, data corruption also occurs due to incorrect LUN-to-path mappings.
RESOLUTION: The vxconfigd(1M) code is modified to detect such situations gracefully and adjust the internal data structures accordingly, avoiding the vxconfigd(1M) core dump and the data corruption.

* 3283644 (Tracking ID: 2945658)
SYMPTOM: If you modify the disk label for an Active/Passive LUN on Linux platforms, the current passive paths don't reflect this modification after failover.
DESCRIPTION: Whenever failover happens for an Active/Passive LUN, the disk label information on the passive paths is not updated.
RESOLUTION: The BLKRRPART ioctl is issued on the passive paths during the failover in order to update the disk label or partition information.

* 3283668 (Tracking ID: 3250096)
SYMPTOM: When using DMP-ASM (dmp_native_support enabled) with SELinux enabled, the '/dev/raw/raw#' devices vanish on reboot.
DESCRIPTION: At system boot with DMP-ASM (dmp_native_support enabled), DMP (Dynamic Multi-Pathing) creates dmp devices under '/dev/vx/dmp' before vxconfigd starts. At this point, if SELinux is disabled, the system re-mounts '/dev/vx' on 'tmpfs' and cleans up the information under '/dev/vx'; if SELinux is enabled, it does not mount, so the information under '/dev/vx/' continues to exist. DMP does not bind these dmp devices to raw devices if the information already exists under '/dev/vx/' at vxconfigd startup. As a result, no devices appear under '/dev/raw/' when SELinux is enabled.
RESOLUTION: The code is modified to bind the dmp devices to raw devices during vxconfigd startup, as per the records in '/etc/vx/.vxdmprawdev', irrespective of whether the dmp devices are present under '/dev/vx/dmp'.
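The hash-chain bug described for incident 3254227 can be sketched with a minimal bucket chain. This is an illustrative example only, with hypothetical names, not the actual vxassist source; it shows the fix of walking the whole chain instead of comparing only the bucket's first element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Paths that hash to the same bucket are linked in a chain. */
struct path_ent {
    const char *name;
    struct path_ent *next;
};

/* Walk the entire chain looking for 'name'. Before the fix, only the
 * first element was examined, so a lookup for a path further down the
 * chain could never succeed or terminate correctly. */
static struct path_ent *find_path(struct path_ent *bucket, const char *name)
{
    for (struct path_ent *p = bucket; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;        /* found anywhere in the chain */
    return NULL;             /* genuinely absent from this bucket */
}
```

With more than 8192 paths, chains of length greater than one are guaranteed by the pigeonhole principle, so the full-chain walk is the only correct lookup.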
* 3294641 (Tracking ID: 3107741)
SYMPTOM: The vxrvg snapdestroy command fails with the "Transaction aborted waiting for io drain" error message, and vxconfigd(1M) hangs with the following stack trace:
vol_commit_iowait_objects
vol_commit_iolock_objects
vol_ktrans_commit
volconfig_ioctl
volsioctl_real
vols_ioctl
vols_compat_ioctl
compat_sys_ioctl
...
DESCRIPTION: The SmartMove query of Veritas File System (VxFS) depends on some reads and writes. If a transaction in Veritas Volume Manager (VxVM) blocks these new reads and writes, the Application Programming Interface (API) hangs waiting for a response. This results in a deadlock-like situation: the SmartMove API waits for the transaction to complete, while the transaction waits for the SmartMove API.
RESOLUTION: The code is modified to disallow transactions while the SmartMove API is in use.

* 3294642 (Tracking ID: 3019684)
SYMPTOM: An I/O hang is observed when the SRL is about to overflow after the logowner switches from slave to master. The stack trace looks like the following:
biowait
default_physio
volrdwr
fop_write
write
syscall_trap32
DESCRIPTION: The I/O hang occurs because the master has a stale flag set with an incorrect value related to the last SRL overflow.
RESOLUTION: The code is modified to reset the stale flag according to whether the logowner is master or slave.

* 2853712 (Tracking ID: 2815517)
SYMPTOM: vxdg adddisk succeeds in adding a clone disk to a non-clone disk group, or a non-clone disk to a clone disk group, resulting in a mixed disk group.
DESCRIPTION: vxdg import fails for a disk group that has a mix of clone and non-clone disks, so vxdg adddisk should not allow creation of a mixed disk group.
RESOLUTION: The vxdg adddisk code is modified to return an error on an attempt to add a clone disk to a non-clone disk group, or a non-clone disk to a clone disk group, thus preventing the creation of a mixed disk group.
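The validation added for incident 2853712 amounts to a simple compatibility check at adddisk time. The sketch below is illustrative only, with hypothetical names, not the vxdg source: a disk group's clone status is taken from its existing disks, and a new disk whose clone flag differs is rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical disk record: only the clone flag matters here. */
struct disk { int is_clone; };

/* Return 1 when adding 'newdisk' keeps the disk group homogeneous,
 * 0 when it would create a mixed clone/non-clone disk group.
 * An empty disk group accepts either kind; its first disk then
 * defines the group's clone status. */
static int adddisk_allowed(const struct disk *dg, size_t ndisks,
                           const struct disk *newdisk)
{
    if (ndisks == 0)
        return 1;
    return dg[0].is_clone == newdisk->is_clone;
}
```

Rejecting the mix at adddisk time is what keeps the later vxdg import (which cannot handle mixed groups) from ever encountering one.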
* 2863672 (Tracking ID: 2834046)
SYMPTOM: Beginning with the 5.1 release, VxVM automatically re-minors the disk group and its objects in certain cases, and displays the following error message:
vxvm:vxconfigd: V-5-1-14563 Disk group mydg: base minor was in private pool, will be change to shared pool.
NFS clients that attempt to reconnect to a file system on the disk group's volume fail because the file handle becomes stale. The NFS client needs to re-mount the file system, and probably needs a reboot, to clear this. The issue is observed in the following situations:
* When a private disk group is imported as shared, or a shared disk group is imported as private.
* After upgrading from a version of VxVM prior to 5.1.
DESCRIPTION: Since the 5.1 SP1 release, the minor-number space is divided into two pools, one for private disk groups and the other for shared disk groups. During the disk group import operation, the disk group base-minor numbers are adjusted automatically if they are not in the correct pool; the volumes in the disk group are adjusted in a similar manner. This behavior reduces minor-number conflicts during the disk group import operation. However, in an NFS environment, it makes all the file handles on the client side stale, and customers had to unmount file systems and restart applications.
RESOLUTION: The code is modified to add a new tunable, "autoreminor", whose default value is "on". Use "vxdefault set autoreminor off" to turn it off for NFS server environments. If the NFS server is in a CVM (Cluster Volume Manager) cluster, make the same change on all the nodes.

* 2892590 (Tracking ID: 2779580)
SYMPTOM: In a Veritas Volume Replicator (VVR) environment, the secondary node gives the configuration error 'no Primary RVG' when the primary master node (the default logowner) is rebooted and a slave node becomes the new master node.
DESCRIPTION: After the primary master node is rebooted, the new master node sends a 'handshake' request for vradmind communication to the secondary node. As part of the 'handshake' request, the secondary node deletes the old configuration, including the 'Primary RVG'. During this phase, the secondary node receives a configuration update message from the primary node for the old configuration. The secondary node cannot find the old 'Primary RVG' configuration needed to process this message, so it cannot proceed with the pending 'handshake' request. This results in the 'no Primary RVG' configuration error.
RESOLUTION: The code is modified so that during the 'handshake' request phase, configuration messages for the old 'Primary RVG' are discarded.

* 2892682 (Tracking ID: 2837717)
SYMPTOM: The "vxdisk resize" command fails if the 'da name' is specified. An example follows:
# vxdisk list eva4k6k1_46 | grep ^pub
pubpaths: block=/dev/vx/dmp/eva4k6k1_46 char=/dev/vx/rdmp/eva4k6k1_46
public: slice=0 offset=32896 len=680736 disk_offset=0
# vxdisk resize eva4k6k1_46 length=813632
# vxdisk list eva4k6k1_46 | grep ^pub
pubpaths: block=/dev/vx/dmp/eva4k6k1_46 char=/dev/vx/rdmp/eva4k6k1_46
public: slice=0 offset=32896 len=680736 disk_offset=0
After the resize operation, len=680736 is unchanged.
DESCRIPTION: The 'da name' scenario is not handled in the resize code path. As a result, the "vxdisk resize" command fails if the 'da name' is specified.
RESOLUTION: The code is modified so that the 'da name' specific operation is performed only if a 'dm name' is not specified for the resize.

* 2892684 (Tracking ID: 1859018)
SYMPTOM: "Link link detached from volume" warnings are displayed when a linked-breakoff snapshot is created.
DESCRIPTION: The purpose of these messages is to let users and administrators know about the detach of a link due to I/O errors. These messages were displayed unnecessarily whenever a linked-breakoff snapshot was created.
RESOLUTION: The code is modified to display these messages only when a link is detached due to I/O errors on the volumes involved in the link relationship.

* 2892698 (Tracking ID: 2851085)
SYMPTOM: DMP doesn't detect implicit LUN ownership changes.
DESCRIPTION: DMP monitors ownership for ALUA arrays to detect implicit LUN ownership changes, which helps DMP always use the Active/Optimized path for sending down I/O. This feature is controlled by the dmp_monitor_ownership tunable and is enabled by default. In case of a partial discovery triggered through the event source daemon (vxesd), the ALUA information kept in the kernel data structure for ownership monitoring was getting wiped, which caused ownership monitoring to stop working for those dmpnodes.
RESOLUTION: The code is modified to handle this case.

* 2892716 (Tracking ID: 2753954)
SYMPTOM: When a cable is disconnected from one port of a dual-port FC HBA, only the paths going through that port should be marked as SUSPECT, but paths going through the other port are also marked as SUSPECT.
DESCRIPTION: Disconnecting a cable from an HBA port generates an FC event. When the event is generated, the paths of all ports of the corresponding HBA are marked as SUSPECT.
RESOLUTION: The code is modified to mark only the paths going through the port on which the FC event is generated.

* 2940447 (Tracking ID: 2940446)
SYMPTOM: I/O may hang on a volume with a space-optimized snapshot if the underlying cache object is of a very large size (e.g. 30 TB). It can also lead to data corruption in the cache object. The following stack is observed:
pvthread()
et_wait()
uphyswait()
uphysio()
vxvm_physio()
volrdwr()
volwrite()
vxio_write()
rdevwrite()
cdev_rdwr()
spec_erdwr()
spec_rdwr()
vnop_rdwr()
vno_rw()
rwuio()
rdwr()
kpwrite()
ovlya_addr_sc_flih_main()
DESCRIPTION: The cache volume maintains a B+ tree for mapping each offset to its actual location in the cache object.
Copy-on-write I/O generated on snapshot volumes needs to determine the offset of the particular I/O in the cache object. Due to incorrect type-casting, the value calculated for a large offset overflows and truncates to a smaller value, leading to data corruption.
RESOLUTION: The code is modified to avoid overflow during the offset calculation in the cache object. It is advised to create multiple cache objects of around 10 TB each, rather than a single cache object of a very large size.

* 2941193 (Tracking ID: 1982965)
SYMPTOM: The file system mount operation fails when the volume is resized and the volume has a link to the volume. The following error messages are displayed:
# mount -V vxfs /dev/vx/dsk//
UX:vxfs fsck: ERROR: V-3-26248: could not read from block offset devid/blknum. Device containing meta data may be missing in vset or device too big to be read on a 32 bit system.
UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate file system check failure, aborting ...
UX:vxfs mount: ERROR: V-3-26883: fsck log replay exits with 1
UX:vxfs mount: ERROR: V-3-26881: Cannot be mounted until it has been cleaned by fsck. Please run "fsck -V vxfs -y /dev/vx/dsk//" before mounting
DESCRIPTION: The vxconfigd(1M) daemon stores disk access records based on Dynamic Multi-Pathing (DMP) names. If the vxdg(1M) command passes a name other than the DMP name for the device, the vxconfigd(1M) daemon cannot map it to a disk access record. Because the vxconfigd(1M) daemon cannot locate a disk access record corresponding to the input name passed from the vxdg(1M) command, it fails the import operation.
RESOLUTION: The code is modified so that the vxdg(1M) command converts the input name to the DMP name before passing it to the vxconfigd(1M) daemon for further processing.
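The type-casting overflow described for incident 2940447 is the classic 64-to-32-bit truncation. The sketch below is illustrative only and mimics the class of bug, not the actual vxio code: narrowing a 64-bit cache-object offset to 32 bits silently drops the high bits for very large cache objects, while keeping the arithmetic 64-bit wide preserves the full offset.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy variant: the intermediate cast to uint32_t wraps the offset
 * modulo 4 GiB, so two distant offsets can map to the same location
 * in the cache object's B+ tree, corrupting its contents. */
static uint64_t cache_offset_buggy(uint64_t vol_off)
{
    return (uint32_t)vol_off;   /* high 32 bits silently dropped */
}

/* Fixed variant: no narrowing cast, full-width offset preserved. */
static uint64_t cache_offset_fixed(uint64_t vol_off)
{
    return vol_off;
}
```

For a cache object in the tens-of-terabytes range, almost every copy-on-write offset exceeds 4 GiB, which is why the document recommends multiple ~10 TB cache objects as a precaution even with the fix.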
* 2941226 (Tracking ID: 2915063)
SYMPTOM: During the detachment of a plex of a volume in the Cluster Volume Manager (CVM) environment, the master node panics with the following stack trace:
vol_klog_findent()
vol_klog_detach()
vol_mvcvm_cdetsio_callback()
vol_klog_start()
voliod_iohandle()
voliod_loop()
DESCRIPTION: During the plex-detach operation, VxVM searches for the plex object to be detached in the kernel. If any transaction is in progress on any disk group in the system, an incorrect plex object may be selected, which results in dereferencing invalid addresses and causes the system to panic.
RESOLUTION: The code is modified to make sure that the correct plex object is selected.

* 2941234 (Tracking ID: 2899173)
SYMPTOM: In a Clustered Volume Replicator (CVR) environment, a Storage Replicator Log (SRL) failure may cause the vxconfigd(1M) daemon to hang. This eventually causes the 'vradmin stoprep' command to hang.
DESCRIPTION: The 'vradmin stoprep' command hangs because the vxconfigd(1M) daemon waits indefinitely during a transaction. The transaction waits for I/O completion on the SRL. An error handler is generated to handle the I/O failure on the SRL, but if there is an ongoing transaction, the error is not handled properly, which causes the transaction to hang.
RESOLUTION: The code is modified so that when an SRL failure is encountered, the transaction itself handles the I/O error on the SRL.

* 2941237 (Tracking ID: 2919318)
SYMPTOM: In a CVM environment with fencing enabled, wrong fencing keys are registered for opaque disks during node join or disk group import operations.
DESCRIPTION: During the CVM node join and shared disk group import code paths, when opaque disk registration happens, the fencing keys in the internal disk group records are not in sync with the actual keys generated. This caused wrong fencing keys to be registered for opaque disks. For the remaining disks, fencing key registration happens correctly.
RESOLUTION: The code is modified to copy the correctly generated key to the internal disk group record for the current disk group import or node join scenario, and to use it for disk registration.

* 2941252 (Tracking ID: 1973983)
SYMPTOM: Relocation fails with the following error when the Data Change Object (DCO) plex is in a disabled state:
VxVM vxrelocd ERROR V-5-2-600 Failure recovering in disk group
DESCRIPTION: When a mirror plex is added to a volume using the "vxassist snapstart" command, the attached DCO plex can be in the DISABLED or DCOSNP state. If the enclosure is disabled while recovering such DCO plexes, the plex can get into the DETACHED/DCOSNP state, which can result in relocation failures.
RESOLUTION: The code is modified to handle DCO plexes in the disabled state during relocation.

* 2942259 (Tracking ID: 2839059)
SYMPTOM: When a system connected to HP Smart Array "CCISS" disks boots, it logs the following warning messages on the console:
VxVM vxconfigd WARNING V-5-1-16737 cannot open /dev/vx/rdmp/cciss/c0d to check for ASM disk format
VxVM vxconfigd WARNING V-5-1-16737 cannot open /dev/vx/rdmp/cciss/c0d to check for ASM disk format
VxVM vxconfigd WARNING V-5-1-16737 cannot open /dev/vx/rdmp/cciss/c0d to check for ASM disk format
VxVM vxconfigd WARNING V-5-1-16737 cannot open /dev/vx/rdmp/cciss/c0d to check for ASM disk format
DESCRIPTION: HP Smart Array CCISS disk names end with a digit, which is not parsed correctly: the last digit of each disk name is truncated, leading to invalid disk names. This leads to the warnings, because the device cannot be opened to check for the ASM format due to the invalid device path.
RESOLUTION: The code is modified to handle the parsing correctly and obtain valid device names.

* 2942336 (Tracking ID: 1765916)
SYMPTOM: VxVM socket files don't have write protection from other users.
The following files are writable by all users:
srwxrwxrwx root root /etc/vx/vold_diag/socket
srwxrwxrwx root root /etc/vx/vold_inquiry/socket
srwxrwxrwx root root /etc/vx/vold_request/socket
DESCRIPTION: These sockets are used by the admin and support commands to communicate with vxconfigd, and are created by vxconfigd during its startup process.
RESOLUTION: Proper write permissions are given to the VxVM socket files. Only the vold_inquiry/socket file remains writable by all users, because it is used by many VxVM commands, such as vxprint, which are used by all users.

* 2944708 (Tracking ID: 1725593)
SYMPTOM: The 'vxdmpadm listctlr' command does not show the count of device paths seen through each controller.
DESCRIPTION: The 'vxdmpadm listctlr' command currently does not show the number of device paths seen through each controller. The CLI option has been enhanced to provide this information as an additional column at the end of each line of the output.
RESOLUTION: The number of paths under each controller is counted, and the value is displayed as the last column in the 'vxdmpadm listctlr' CLI output.

* 2944710 (Tracking ID: 2744004)
SYMPTOM: When VVR (Veritas Volume Replicator) is configured, vxconfigd on the secondary hangs, and any VxVM commands issued during this time do not complete.
DESCRIPTION: vxconfigd waits for I/Os to drain before allowing a configuration change command to proceed. The I/Os never drain completely, resulting in the hang: there is a deadlock in which pending I/Os are unable to start while vxconfigd keeps waiting for their completion.
RESOLUTION: The code is modified so that this deadlock does not arise and the I/Os can be started and completed properly, allowing vxconfigd to function normally.

* 2944714 (Tracking ID: 2833498)
SYMPTOM: The vxconfigd daemon hangs in vol_ktrans_commit() while a reclaim operation is in progress on volumes having instant snapshots.
The stack trace is given below:
vol_ktrans_commit
volconfig_ioctl

DESCRIPTION:
Storage reclaim leads to the generation of special I/Os (termed Reclaim I/Os), which can be very large in size (>4G) and, unlike application I/Os, are not broken into smaller I/Os. Reclaim I/Os need to be tracked in snapshot maps if the volume has full snapshots configured. The mechanism to track reclaim I/Os is not capable of handling such large I/Os, causing the hang.

RESOLUTION:
Code changes are made to use the alternative mechanism in Volume Manager to track the reclaim I/Os.

* 2944717 (Tracking ID: 2851403)

SYMPTOM:
The system panics while unloading the 'vxio' module when the VxVM SmartMove feature is used and the "vxportal" module gets reloaded (for example, during a VxFS package upgrade). The stack trace looks like:
vxportalclose()
vxfs_close_portal()
vol_sr_unload()
vol_unload()

DESCRIPTION:
During a smart-move operation like plex attach, VxVM opens the 'vxportal' module to read in-use file system map information. This file descriptor is closed only when the 'vxio' module is unloaded. If the 'vxportal' module is unloaded and reloaded before 'vxio', the file descriptor held by 'vxio' becomes invalid and results in a panic.

RESOLUTION:
Code changes are made to close the file descriptor for 'vxportal' after reading the free/invalid file system map information. This ensures that stale file descriptors don't get used for 'vxportal'.

* 2944722 (Tracking ID: 2869594)

SYMPTOM:
The master node panics with the following stack after a space optimized snapshot is refreshed or deleted and the master node is selected using 'vxclustadm setmaster':
volilock_rm_from_ils
vol_cvol_unilock
vol_cvol_bplus_walk
vol_cvol_rw_start
voliod_iohandle
voliod_loop
thread_start
In addition, all space optimized snapshots on the corresponding cache object may be corrupted.

DESCRIPTION:
In CVM, the master node owns the responsibility of maintaining the cache object indexing structure that provides the space optimized functionality.
When a space optimized snapshot is refreshed or deleted, the indexing structure is rebuilt in the background after the operation returns. When the master node is switched using 'vxclustadm setmaster' before the index rebuild is complete, both the old master and the new master rebuild the index in parallel, which results in index corruption. Since the index is corrupted, the data stored on space optimized snapshots should not be trusted. I/Os issued on the corrupted index lead to a panic.

RESOLUTION:
When the master role is switched using 'vxclustadm setmaster', the index rebuild on the old master node is safely aborted. Only the new master node is allowed to rebuild the index.

* 2944724 (Tracking ID: 2892983)

SYMPTOM:
The vxvol command dumps core with the following stack trace if executed in parallel with the vxsnap addmir command:
strcmp()
do_link_recovery
trans_resync_phase1()
vxvmutil_trans()
trans()
common_start_resync()
do_noderecover()
main()

DESCRIPTION:
If vxrecover is triggered during the creation of a link between two volumes, the vxvol command may not have information about the newly created links. This leads to a NULL pointer dereference and a core dump.

RESOLUTION:
The code has been modified to check whether the link information is properly present with the vxvol command, and to fail the operation with an appropriate error message if it is not.

* 2944725 (Tracking ID: 2910043)

SYMPTOM:
When VxVM operations like plex attach, snapshot resync or reattach are performed, frequent swap-in and swap-out activity is observed due to excessive memory allocation by the vxiod daemons.

DESCRIPTION:
When VxVM operations such as plex attach, snapshot resync, or reattach are performed, the default I/O size of the operation is 1 MB, and VxVM allocates this memory from the OS. Such huge memory allocations can result in swap-in and swap-out of pages and are not very efficient. When many operations are performed, the system may not work very efficiently.
RESOLUTION:
The code is modified to make use of VxVM's internal I/O memory pool instead of directly allocating the memory from the OS.

* 2944727 (Tracking ID: 2919720)

SYMPTOM:
The vxconfigd(1M) daemon dumps core in the rec_lock1_5() function. The following stack trace is observed:
rec_lock1_5()
rec_lock1()
rec_lock()
client_trans_start()
req_vol_trans()
request_loop()
main()

DESCRIPTION:
During any configuration change in VxVM, the vxconfigd(1M) daemon locks all the objects involved in the operation to avoid any unexpected modification. Some objects which do not belong to the current transaction are not handled properly, resulting in a core dump. This case is particularly observed during snapshot operations on cross disk group linked-volume snapshots.

RESOLUTION:
The code is modified to avoid locking records which are not yet part of the committed VxVM configuration.

* 2944729 (Tracking ID: 2933138)

SYMPTOM:
The system panics with the stack trace given below:
voldco_update_itemq_chunk()
voldco_chunk_updatesio_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION:
While tracking I/Os in snapshot maps, information is stored in in-memory pages. For large I/Os (such as reclaim I/Os), this information can span multiple pages. Sometimes the pages are not properly referenced in the map update for larger I/Os, which leads to a panic because of invalid page addresses.

RESOLUTION:
The code is modified to properly reference pages during the map update for large I/Os.

* 2944741 (Tracking ID: 2866059)

SYMPTOM:
When a disk-resize operation fails, the error messages displayed do not give the exact details of the failure. The following error messages are displayed:
1. "VxVM vxdisk ERROR V-5-1-8643 Device : resize failed: One or more subdisks do not fit in pub reg"
2.
"VxVM vxdisk ERROR V-5-1-8643 Device : resize failed: Cannot remove last disk in disk group"

DESCRIPTION:
When a disk-resize operation fails, both error messages need to be enhanced to display the exact details of the failure.

RESOLUTION:
The code is modified to improve the error messages. Error message (1) is modified to:
VxVM vxdisk ERROR V-5-1-8643 Device emc_clariion0_338: resize failed: One or more subdisks do not fit in pub region.
vxconfigd log:
01/16 02:23:23: VxVM vxconfigd DEBUG V-5-1-0 dasup_resize_check: SD emc_clariion0_338-01 not contained in disk sd: start=0 end=819200, public: offset=0 len=786560
Error message (2) is modified to:
VxVM vxdisk ERROR V-5-1-0 Device emc_clariion0_338: resize failed: Cannot remove last disk in disk group. Resizing this device can result in data loss. Use -f option to force resize.

* 2962257 (Tracking ID: 2898547)

SYMPTOM:
The 'vradmind' process dumps core on the Veritas Volume Replicator (VVR) secondary site in a Clustered Volume Replicator (CVR) environment. The stack trace looks like:
__kernel_vsyscall
raise
abort
fmemopen
malloc_consolidate
delete
delete[]
IpmHandle::~IpmHandle
IpmHandle::events
main

DESCRIPTION:
When the log owner service group is moved across the nodes on the primary site, the IpmHandle of the old log owner node is deleted as the IpmHandle of the new log owner node is created. During the destruction of the IpmHandle object, the pointer '_cur_rbufp' is not set to NULL, which can lead to freeing memory that has already been freed. This causes 'vradmind' to dump core.

RESOLUTION:
The code is modified so that the destructor of IpmHandle sets the pointer to NULL after it is deleted.

* 2965542 (Tracking ID: 2928764)

SYMPTOM:
If the tunable dmp_fast_recovery is set to off, PGR (Persistent Group Reservation) key registration fails except for the first path, i.e. the PGR key gets registered only for the first path. Consider registering keys as follows:
# vxdmpadm settune dmp_fast_recovery=off
# vxdmpadm settune dmp_log_level=9
# vxdmppr read -t REG /dev/vx/rdmp/hitachi_r7000_00d9
Node: /dev/vx/rdmp/hitachi_r7000_00d9
ASCII-KEY HEX-VALUE
-----------------------------
# vxdmppr register -s BPGR0000 /dev/vx/rdmp/hitachi_r7000_00d9
# vxdmppr read -t REG /dev/vx/rdmp/hitachi_r7000_00d9
Node: /dev/vx/rdmp/hitachi_r7000_00d9
ASCII-KEY HEX-VALUE
-----------------------------
BPGR0000 0x4250475230303030
This being a multipathed disk, only the first path gets the PGR key registered through it. You will see log messages similar to the following:
Sep 6 11:29:41 clabcctlx04 kernel: VxVM vxdmp V-5-0-0 SCSI error opcode=0x5f returned rq_status=0x12 cdb_status=0x1 key=0x6 asc=0x2a ascq=0x3 on path 8/0x90
Sep 6 11:29:41 clabcctlx04 kernel: VxVM vxdmp V-5-3-0 dmp_scsi_ioctl: SCSI ioctl completed host_byte = 0x0 rq_status = 0x8
Sep 6 11:29:41 clabcctlx04 kernel: sd 4:0:0:4: reservation conflict
Sep 6 11:29:41 clabcctlx04 kernel: VxVM vxdmp V-5-3-0 dmp_scsi_ioctl: SCSI ioctl completed host_byte = 0x11 rq_status = 0x17
Sep 6 11:29:41 clabcctlx04 kernel: VxVM vxdmp V-5-0-0 SCSI error opcode=0x5f returned rq_status=0x17 cdb_status=0x0 key=0x0 asc=0x0 ascq=0x0 on path 8/0xb0
Sep 6 11:29:41 clabcctlx04 kernel: VxVM vxdmp V-5-3-0 dmp_pr_send_cmd failed with transport error: uscsi_rqstatus = 23 ret = -1 status = 0 on dev 8/0xb0
Sep 6 11:29:41 clabcctlx04 kernel: VxVM vxdmp V-5-3-0 dmp_scsi_ioctl: SCSI ioctl completed host_byte = 0x0 rq_status = 0x8

DESCRIPTION:
After the key for the first path is registered successfully, the second path gets a reservation conflict, which is expected. But in synchronous mode, i.e. when dmp_fast_recovery is off, the proper reservation flag is not set, due to which the registration command fails with a transport error and the PGR keys on the other paths do not get registered. In asynchronous mode the flag is set correctly, so the issue is not seen there.
RESOLUTION:
Set the proper reservation flag so that the key can be registered for the other paths as well.

* 2973659 (Tracking ID: 2943637)

SYMPTOM:
The system panicked during the process of expanding the DMP I/O stats queue. The following call stack can be observed in syslog before the panic:
oom_kill_process
select_bad_process
out_of_memory
__alloc_pages_nodemask
alloc_pages_current
__vmalloc_area_node
dmp_alloc
__vmalloc_node
dmp_alloc
vmalloc_32
dmp_alloc
dmp_zalloc
dmp_iostatq_add
dmp_iostatq_op
dmp_process_stats
dmp_daemons_loop

DESCRIPTION:
During the process of expanding the DMP queue used to collect I/O stats, memory is allocated using blocking calls. While satisfying such a request, if the kernel finds the system critically low on available memory, it can invoke the OOM killer. A system panic can result if the OOM killer happens to kill a critical process (e.g. HAD, which would cause VCS heartbeating to stop).

RESOLUTION:
Code changes were made to allocate memory using non-blocking calls. This causes the Linux kernel to fail the allocation request immediately if it cannot be satisfied, and prevents DMP from further straining a system already low on memory.

* 2974870 (Tracking ID: 2935771)

SYMPTOM:
In a Veritas Volume Replicator (VVR) environment, the 'rlinks' disconnect after switching the master node.

DESCRIPTION:
Sometimes switching the master node on the primary site can cause the 'rlinks' to disconnect. The "vradmin repstatus" command displays "paused due to network disconnection" as the replication status. VVR uses a connection to check whether the secondary node is alive; the secondary node responds to these requests by replying back, indicating that it is alive. On a master node switch, the old master node fails to close this connection with the secondary node. Thus, after the master node switch, both the old master node and the new master node send requests to the secondary node.
This causes a mismatch of connection numbers on the secondary node, and the secondary node does not reply to the requests of the new master node, causing the 'rlinks' to disconnect.

RESOLUTION:
The code is modified to close the old master node's connection with the secondary node, so that it does not send connection requests to the secondary node.

* 2976946 (Tracking ID: 2919714)

SYMPTOM:
On a thin Logical Unit Number (LUN), the vxevac(1M) command returns 0 without migrating the unmounted VxFS volumes. The following error messages are displayed when the unmounted VxFS volumes are processed:
VxVM vxsd ERROR V-5-1-14671 Volume v2 is configured on THIN luns and not mounted. Use 'force' option, to bypass smartmove. To take advantage of smartmove for supporting thin luns, retry this operation after mounting the volume.
VxVM vxsd ERROR V-5-1-407 Attempting to clean up after failure ...

DESCRIPTION:
On a thin LUN, VxVM does not move or copy data on unmounted VxFS volumes unless 'smartmove' is bypassed. The vxevac(1M) error messages need to be enhanced to detect unmounted VxFS volumes on thin LUNs, and a 'force' option is needed to allow the user to bypass 'smartmove'.

RESOLUTION:
The vxevac script is modified to check for unmounted VxFS volumes on thin LUNs before the migration is performed. If an unmounted VxFS volume is detected, the vxevac(1M) command fails with a non-zero return code, and a message is displayed that notifies the user to mount the volumes or bypass 'smartmove' by specifying the 'force' option. The rectified error message is displayed as follows:
VxVM vxevac ERROR V-5-2-0 The following VxFS volume(s) are configured on THIN luns and not mounted: v2
To take advantage of smartmove support on thin luns, retry this operation after mounting the volume(s). Otherwise, bypass smartmove by specifying the '-f' force option.
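The pre-migration validation added to the vxevac script boils down to a simple decision: inspect each candidate volume, block the move if any unmounted VxFS volume sits on a thin LUN, and proceed only when the user forces the operation. A minimal sketch of that decision logic in Python (the volume records and helper name are hypothetical, not the actual vxevac internals):

```python
# Illustrative sketch of the check described above: an unmounted VxFS
# volume on a thin LUN blocks the evacuation unless 'force' is given.
# The dict fields ("thin", "fstype", "mounted") are hypothetical.

def check_evac_volumes(volumes, force=False):
    """Return (ok, blocked): 'blocked' lists volumes that stop the move."""
    blocked = [v["name"] for v in volumes
               if v.get("thin") and v.get("fstype") == "vxfs"
               and not v.get("mounted")]
    if blocked and not force:
        return False, blocked   # caller reports the error, exits non-zero
    return True, blocked        # nothing blocked, or force given: proceed

vols = [
    {"name": "v1", "thin": True, "fstype": "vxfs", "mounted": True},
    {"name": "v2", "thin": True, "fstype": "vxfs", "mounted": False},
]
ok, blocked = check_evac_volumes(vols)
print(ok, blocked)   # v2 blocks the move: mount it or pass force=True
```

The point of the fix is the non-zero failure path: before the change, the command silently returned 0 in this situation.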
* 2978189 (Tracking ID: 2948172)

SYMPTOM:
Execution of the command "vxdisk -o thin,fssize list" can cause a hang or panic. The hang stack trace might look like:
pse_block_thread
pse_sleep_thread
.hkey_legacy_gate
volsiowait
vol_objioctl
vol_object_ioctl
voliod_ioctl
volsioctl_real
volsioctl
The panic stack trace might look like:
voldco_breakup_write_extents
volfmr_breakup_extents
vol_mv_indirect_write_start
volkcontext_process
volsiowait
vol_objioctl
vol_object_ioctl
voliod_ioctl
volsioctl_real
vols_ioctl
vols_compat_ioctl
compat_sys_ioctl
sysenter_dispatch

DESCRIPTION:
The command "vxdisk -o thin,fssize list" triggers reclaim I/Os to get the file system usage from Veritas File System on VxVM mounted volumes. Reclamation is currently not supported on volumes with space optimized (SO) snapshots, but because of a bug, reclaim I/Os continue to execute for volumes with SO snapshots, leading to a system panic/hang.

RESOLUTION:
Code changes are made to prevent reclamation I/Os from proceeding on volumes with SO snapshots.

* 2979767 (Tracking ID: 2798673)

SYMPTOM:
A system panic is observed with the stack trace given below:
voldco_alloc_layout
voldco_toc_updatesio_done
voliod_iohandle
voliod_loop

DESCRIPTION:
The DCO (data change object) contains the metadata information required to start the DCO volume and decode further information from it. This information is stored in the first block of the DCO volume. If this metadata information is incorrect or corrupted, further processing of the volume start results in a panic due to a divide-by-zero error in the kernel.

RESOLUTION:
Code changes are made to verify the correctness of the DCO volume's metadata information during startup. If the information read is incorrect, the volume start operation fails.

* 2983679 (Tracking ID: 2970368)

SYMPTOM:
The SRDF-R2 WD (write-disabled) devices are shown in an error state, and many path enable and disable messages are generated in the "/etc/vx/dmpevents.log" file.
DESCRIPTION:
The DMP driver disables the paths of write-protected devices; therefore, these devices are shown in an error state. The vxattachd(1M) daemon tries to online these devices and executes partial-device discovery for them. As part of the partial-device discovery, enabling and disabling the paths of such write-protected devices generates many path enable and disable messages in the "/etc/vx/dmpevents.log" file.

RESOLUTION:
The code is modified so that the paths of write-protected devices in DMP are not disabled.

* 2988017 (Tracking ID: 2971746)

SYMPTOM:
For a single-path device, the bdget() function is called for each I/O, which causes high CPU usage and leads to I/O performance degradation.

DESCRIPTION:
For each I/O on a single-path DMP device, the OS function bdget() is called while swapping the DMP whole device with its subpath OS whole device; this consumes CPU cycles doing a whole-device lookup in the block device database.

RESOLUTION:
Code changes are done to cache the OS whole block device pointer during device open, to avoid calling the bdget() function for each I/O operation.

* 2988018 (Tracking ID: 2964169)

SYMPTOM:
In a multiple-CPU environment, I/O performance degradation is seen when I/O is done through the VxFS and VxVM specific private interface.

DESCRIPTION:
I/O performance degradation is seen when I/O is done through the VxFS and VxVM specific private interface because the 'bi_comp_cpu' (CPU ID) field of 'struct bio', which is set by the VxFS module, is not honoured within the VxVM module and hence is not propagated to the underlying device driver. As a result, the iodone routines are called mostly on CPU 0, causing performance degradation. The VxFS and VxVM private interface is used for smartmove, Oracle Resilvering, smartsync, etc.
RESOLUTION:
Code changes are done to propagate the 'bi_comp_cpu' (CPU ID) field of struct bio, which is set by VxFS (the struct bio owner), within VxVM so that VxVM can pass the CPU ID down to the underlying module and iodone is called on the specified CPU.

* 3004823 (Tracking ID: 2692012)

SYMPTOM:
When moving subdisks by using the vxassist(1M) command or the vxevac(1M) command, if the disk tags are not the same for the source and the destination, the command fails with a generic error message that does not provide the reason for the failure of the operation. The following error message is displayed:
VxVM vxassist ERROR V-5-1-438 Cannot allocate space to replace subdisks

DESCRIPTION:
When moving subdisks using the "vxassist move" command, if no target disk is specified, the command uses the available disks from the disk group for the move. If the disks have the site tag set and the values of the site-tag attribute are not the same, the subsequent move operation using the vxassist(1M) command is expected to fail. However, it fails with a generic message that does not provide the reason for the failure.

RESOLUTION:
The code is modified to introduce a new error message specifying that the failure is due to the mismatch in the site-tag attribute. The enhanced error message is as follows:
VxVM vxassist ERROR V-5-1-0 Source and/or target disk belongs to site, can not move over sites

* 3004852 (Tracking ID: 2886333)

SYMPTOM:
The vxdg(1M) join operation allows mixing of clone and non-clone disks in a disk group. The subsequent import of the newly joined disk group fails. The following error message is displayed:
"VxVM vxdg ERROR V-5-1-17090 Source disk group tdg and destination disk group tdg2 are not homogeneous, trying to Mix of cloned diskgroup to standard disk group or vice versa is not allowed. Please follow the vxdg (1M) man page."

DESCRIPTION:
Mixing of clone and non-clone disk groups is not allowed.
The part of the code where the join operation is performed executes the operation without validating the mix of clone and non-clone disk groups. This results in the newly joined disk group having a mix of clone and non-clone disks, and the subsequent import of that disk group fails.

RESOLUTION:
The code is modified so that during the disk group join operation, both disk groups are checked. If a mix of clone and non-clone disk groups is found, the join operation is aborted.

* 3005921 (Tracking ID: 1901838)

SYMPTOM:
After installation of a license key that enables multi-pathing, the state of the controller is shown as DISABLED in the command-line interface (CLI) output of the vxdmpadm(1M) command.

DESCRIPTION:
When the multi-pathing license key is installed, the state of the active paths of the Logical Unit Number (LUN) is changed to ENABLED. However, the state of the controller is not updated. As a result, the state of the controller is shown as DISABLED in the CLI output of the vxdmpadm(1M) command.

RESOLUTION:
The code is modified so that the states of both the controller and the active LUN paths are updated when the multi-pathing license key is installed.

* 3006262 (Tracking ID: 2715129)

SYMPTOM:
vxconfigd hangs during master takeover in a CVM (Clustered Volume Manager) environment. This results in a VxVM command hang.

DESCRIPTION:
During master takeover, the VxVM (Veritas Volume Manager) kernel signals vxconfigd with the information of the new master. vxconfigd then proceeds with a vxconfigd-level handshake with the nodes across the cluster. The hang occurred because the vxconfigd handshake mechanism started before the kernel could deliver the signal to vxconfigd.

RESOLUTION:
Code changes are done to ensure that the vxconfigd handshake starts only upon receipt of the signal from the kernel.
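The takeover fix above is an ordering constraint: the handshake must not begin until the kernel's new-master notification has actually arrived. The same pattern can be sketched with an event object that the handshake thread blocks on (the names here are hypothetical; the real synchronization lives inside vxconfigd's kernel/daemon interface):

```python
import threading

# Hypothetical sketch of the ordering fix: the handshake worker blocks
# on an event that stands in for the kernel's new-master signal, so the
# handshake can never start before the notification is delivered.
master_signal = threading.Event()
result = []

def handshake_worker():
    # Wait for the new-master signal before doing any handshake work.
    master_signal.wait()
    result.append("handshake-started-after-signal")

t = threading.Thread(target=handshake_worker)
t.start()
# ... kernel-side takeover work happens here, then the signal fires:
master_signal.set()
t.join()
print(result[0])
```

Without the wait, the worker could race ahead of the notification, which is exactly the hang the incident describes.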
* 3011391 (Tracking ID: 2965910)

SYMPTOM:
vxassist dumps core with the following stack:
setup_disk_order()
volume_alloc_basic_setup()
fill_volume()
setup_new_volume()
make_trans()
vxvmutil_trans()
trans()
transaction()
do_make()
main()

DESCRIPTION:
When -o ordered is used, vxassist handles non-disk parameters in a different way. This scenario may result in an invalid comparison, leading to a core dump.

RESOLUTION:
Code changes are made to handle the parameter comparison logic properly.

* 3011444 (Tracking ID: 2398416)

SYMPTOM:
vxassist dumps core with the following stack:
merge_attributes()
get_attributes()
do_make()
main()
_start()

DESCRIPTION:
vxassist dumps core while creating a volume when the attribute 'wantmirror=ctlr' is added to the '/etc/default/vxassist' file. vxassist reads this defaults file initially and uses the attributes specified there to allocate storage during volume creation. However, while merging the attributes specified in the defaults file, it accesses a NULL attribute structure, causing the core dump.

RESOLUTION:
Necessary code changes have been done to check the attribute structure pointer before accessing it.

* 3020087 (Tracking ID: 2619600)

SYMPTOM:
Live migration of a virtual machine running the SFHA/SFCFSHA stack with data disk fencing enabled causes the service groups configured on the virtual machine to fault.

DESCRIPTION:
After live migration of a virtual machine running the SFHA/SFCFSHA stack with data disk fencing enabled, I/O fails on the shared SAN devices with a reservation conflict and causes the service groups to fault. Live migration causes a SCSI initiator change; hence, I/O coming from the migrated server to the shared SAN storage fails with a reservation conflict.

RESOLUTION:
Code changes are added to check whether the host has been fenced off from the cluster. If the host is not fenced off, the registration key is re-registered for the dmpnode through the migrated server and the I/O is restarted.
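The recovery path in the live-migration fix can be summarized as: on a reservation conflict, first ask the fencing layer whether this host was genuinely evicted; only if it was not, re-register the key and retry the I/O. A schematic version of that decision (all names hypothetical, standing in for the DMP/fencing internals):

```python
# Illustrative sketch of the post-migration recovery decision described
# above. 'host_fenced_off', 'register_key' and 'retry_io' stand in for
# the real fencing query, PGR key registration, and I/O restart.
def handle_reservation_conflict(host_fenced_off, register_key, retry_io):
    """On a SCSI reservation conflict: if the host is still a cluster
    member, re-register the PGR key for the dmpnode and restart the
    I/O; if the host really was fenced off, fail the I/O."""
    if host_fenced_off:
        return "io-failed"   # eviction was real: do not retry
    register_key()           # new SCSI initiator: register the key again
    return retry_io()

calls = []
out = handle_reservation_conflict(
    host_fenced_off=False,
    register_key=lambda: calls.append("register"),
    retry_io=lambda: "io-restarted",
)
print(calls, out)
```

The key design point is that the conflict alone is not treated as proof of eviction: after a migration the initiator has changed, so the conflict is expected and recoverable.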
* 3025973 (Tracking ID: 3002770)

SYMPTOM:
When a SCSI-inquiry command is executed, a NULL-pointer dereference in Dynamic Multi-Pathing (DMP) causes the system to panic with the following stack trace:
dmp_aa_recv_inquiry()
dmp_process_scsireq()
dmp_daemons_loop()

DESCRIPTION:
The panic occurs when the SCSI response for the SCSI-inquiry command is handled. In order to determine whether the path on which the SCSI-inquiry command was issued is read-only, DMP needs to check the error buffer. However, the error buffer is not always prepared, so DMP should examine whether the error buffer is valid before any further checking. Without this examination, the system panics on a NULL pointer.

RESOLUTION:
The code is modified to verify that the error buffer is valid.

* 3026288 (Tracking ID: 2962262)

SYMPTOM:
When DMP Native Stack support is enabled and some devices are being managed by a multipathing solution other than DMP, uninstalling DMP fails with an error for not being able to turn off DMP Native Stack support:
Performing DMP prestop tasks ...................................... Done
The following errors were discovered on the systems:
CPI ERROR V-9-40-3436 Failed to turn off dmp_native_support tunable on pilotaix216. Refer to Dynamic Multi-Pathing Administrator's guide to determine the reason for the failure and take corrective action.
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more volume groups
The CLI 'vxdmpadm settune dmp_native_support=off' also fails with the following error:
# vxdmpadm settune dmp_native_support=off
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more volume groups

DESCRIPTION:
With DMP Native Stack support, it is expected that devices which are being used by LVM are multipathed by DMP. Coexistence with other multipath solutions is not supported in such cases; having another multipath solution results in this error.
RESOLUTION:
Code changes have been made so as not to error out while turning off DMP Native Support if a device is not managed by DMP.

* 3027482 (Tracking ID: 2273190)

SYMPTOM:
The device discovery commands 'vxdisk scandisks' or 'vxdctl enable', issued just after license key installation, may fail and abort.

DESCRIPTION:
After addition of a license key that enables multi-pathing, the state of the paths maintained at the user level is incorrect.

RESOLUTION:
As a fix, whenever a multi-pathing license key is installed, the operation updates the state of the paths at both the user level and the kernel level.

* 2860207 (Tracking ID: 2859470)

SYMPTOM:
The EMC SRDF-R2 disk may go into an error state when an Extensible Firmware Interface (EFI) label is created on the R1 disk. For example:
R1 site
# vxdisk -eo alldgs list | grep -i srdf
emc0_008c auto:cdsdisk emc0_008c SRDFdg online c1t5006048C5368E580d266 srdf-r1
R2 site
# vxdisk -eo alldgs list | grep -i srdf
emc1_0072 auto - - error c1t5006048C536979A0d65 srdf-r2

DESCRIPTION:
Since R2 disks are in write-protected mode, the default open() call made for read-write mode fails for the R2 disks, and the disk is marked as invalid.

RESOLUTION:
The code is modified to change Dynamic Multi-Pathing (DMP) so that it can read the EFI label even on a write-protected SRDF-R2 disk.

* 2876865 (Tracking ID: 2510928)

SYMPTOM:
The extended attributes reported by "vxdisk -e list" for EMC SRDF LUNs are reported as "tdev mirror" instead of "tdev srdf-r1". For example:
# vxdisk -e list
DEVICE TYPE DISK GROUP STATUS OS_NATIVE_NAME ATTR
emc0_028b auto:cdsdisk - - online thin c3t5006048AD5F0E40Ed190s2 tdev mirror

DESCRIPTION:
The extraction of the attributes of EMC SRDF LUNs was not done properly; hence, EMC SRDF LUNs are erroneously reported as "tdev mirror" instead of "tdev srdf-r1".

RESOLUTION:
Code changes have been made to extract the correct values.

* 2892499 (Tracking ID: 2149922)

SYMPTOM:
Record the diskgroup import and deport events in the /var/log/messages file.
The following type of message can be logged in syslog:
vxvm: vxconfigd: V-5-1-16254 Disk group import of succeeded.

DESCRIPTION:
With a diskgroup import or deport, the appropriate success message, or failure message with the cause of the failure, should be logged.

RESOLUTION:
Code changes are made to log diskgroup import and deport events in syslog.

* 2892621 (Tracking ID: 1903700)

SYMPTOM:
vxassist remove mirror does not work if nmirror and alloc are specified, giving the error "Cannot remove enough mirrors".

DESCRIPTION:
During the remove mirror operation, VxVM does not perform a correct analysis of the plexes; hence the issue.

RESOLUTION:
Necessary code changes have been done so that vxassist works properly.

* 2892643 (Tracking ID: 2801962)

SYMPTOM:
Operations that grow a volume, including 'vxresize' and 'vxassist growby/growto', take significantly longer if the volume has a version 20 DCO (Data Change Object) attached to it, in comparison to a volume without a DCO.

DESCRIPTION:
When a volume with a DCO is grown, the existing map in the DCO needs to be copied and updated to track the grown regions. The algorithm was such that, for each region in the map, it would search for the page that contains that region in order to update the map. The number of regions, and the number of pages containing them, are proportional to the volume size, so the search complexity is amplified; this is observed primarily when the volume size is of the order of terabytes. In the reported instance, it took more than 12 minutes to grow a 2.7TB volume by 50G.

RESOLUTION:
The code has been enhanced to find the regions contained within a page and avoid looking up the page separately for each of those regions.

* 2892650 (Tracking ID: 2826125)

SYMPTOM:
VxVM script daemons are not up after they are invoked with the vxvm-recover script.

DESCRIPTION:
When a VxVM script daemon is starting, it terminates any stale instance if one exists.
When the script daemon is invoked with exactly the same process ID as its previous invocation, the daemon abnormally terminates itself by killing its own process through a false-positive detection.

RESOLUTION:
Code changes are made to handle the same-process-ID situation correctly.

* 2892660 (Tracking ID: 2000585)

SYMPTOM:
If 'vxrecover -sn' is run and one volume is removed at the same time, vxrecover exits with the error 'Cannot refetch volume'; the exit status code is zero, but no volumes are started.

DESCRIPTION:
vxrecover assumes that the volume is missing because the diskgroup must have been deported while vxrecover was in progress; hence, it exits without starting the remaining volumes. vxrecover should be able to start the other volumes if the diskgroup is not deported.

RESOLUTION:
Modified the source to skip the missing volume and proceed with the remaining volumes.

* 2892689 (Tracking ID: 2836798)

SYMPTOM:
'vxdisk resize' fails with the following error on a simple format EFI (Extensible Firmware Interface) disk expanded from the array side, and the system may panic/hang after a few minutes:
# vxdisk resize disk_10
VxVM vxdisk ERROR V-5-1-8643 Device disk_10: resize failed: Configuration daemon error -1

DESCRIPTION:
As VxVM doesn't support Dynamic LUN Expansion on simple/sliced EFI disks, the last usable LBA (Logical Block Address) in the EFI header is not updated while expanding the LUN. Since the header is not updated, the partition end entry is regarded as illegal and cleared as part of the partition range check. This inconsistent partition information between the kernel and the disk causes the system panic/hang.

RESOLUTION:
Added checks in the VxVM code to prevent DLE on simple/sliced EFI disks.

* 2922798 (Tracking ID: 2878876)

SYMPTOM:
The vxconfigd(1M) daemon dumps core with the following stack trace:
vol_cbr_dolog ()
vol_cbr_translog ()
vold_preprocess_request ()
request_loop ()
main ()

DESCRIPTION:
This core is the result of a race between two threads which are processing requests from the same client.
While one thread completes processing a request and is in the phase of releasing the memory used, the other thread processes a "DISCONNECT" request from the same client. Due to the race condition, the second thread attempts to access the released memory and dumps core.

RESOLUTION:
The code is modified to protect the common data of the client with a mutex.

* 2924117 (Tracking ID: 2911040)

SYMPTOM:
The restore operation from a cascaded snapshot succeeds even when one of the sources is inaccessible. Subsequently, if the primary volume is made accessible for the restore operation, the I/O operation may fail on the volume, as the source of the volume is inaccessible. Any deletion of the snapshot also fails due to the dependency of the primary volume on the snapshots. When the user tries to remove the snapshot using the "vxedit rm" command, the following error message is displayed:
"VxVM vxedit ERROR V-5-1-XXXX Volume YYYYYY has dependent volumes"

DESCRIPTION:
When a snapshot is restored from any snapshot, that snapshot becomes the source of data for the regions on the primary volume that differ between the two volumes. If the snapshot itself depends on some other volume and that volume is not accessible, the primary volume effectively becomes inaccessible after the restore operation. In such instances, the snapshot cannot be deleted, as the primary volume depends on it.

RESOLUTION:
The code is modified so that if a snapshot or any later cascaded snapshot is inaccessible, the restore operation from that snapshot is prevented.

* 2924188 (Tracking ID: 2858853)

SYMPTOM:
In a CVM (Cluster Volume Manager) environment, after a master switch, vxconfigd dumps core on the slave node (the old master) when a disk is removed from the disk group.
dbf_fmt_tbl()
voldbf_fmt_tbl()
voldbsup_format_record()
voldb_format_record()
format_write()
ddb_update()
dg_set_copy_state()
dg_offline_copy()
dasup_dg_unjoin()
dapriv_apply()
auto_apply()
da_client_commit()
client_apply()
commit()
dg_trans_commit()
slave_trans_commit()
slave_response()
fillnextreq()
vold_getrequest()
request_loop()
main()
DESCRIPTION:
During a master switch, the disk group configuration copy related flags are not cleared on the old master. When a disk is then removed from a disk group, vxconfigd dumps core.
RESOLUTION:
Code changes have been made to clear the configuration copy related flags during a master switch.

* 2924207 (Tracking ID: 2886402)
SYMPTOM:
When DMP devices are re-configured, typically with the 'vxdisk scandisks' command, a vxconfigd hang is observed. While it hangs, no VxVM (Veritas Volume Manager) commands are able to respond. The following vxconfigd process stack is observed:
dmp_unregister_disk
dmp_decode_destroy_dmpnode
dmp_decipher_instructions
dmp_process_instruction_buffer
dmp_reconfigure_db
gendmpioctl
dmpioctl
dmp_ioctl
dmp_compat_ioctl
compat_blkdev_ioctl
compat_sys_ioctl
cstar_dispatch
DESCRIPTION:
When a DMP (Dynamic Multi-Pathing) node is about to be destroyed, a flag is set to hold any I/O (read/write) on it. I/Os that arrive between the setting of the flag and the actual destruction of the DMP node are placed in the DMP queue and are never served, so the hang is observed.
RESOLUTION:
The appropriate flag is set on the node to be destroyed so that any I/O issued after the flag is marked is rejected, avoiding the hang condition.

* 2933468 (Tracking ID: 2916094)
SYMPTOM:
Enhancements are made for the following issues:
1. All DR operation logs accumulate in the single log file 'dmpdr.log', and this file grows very large.
2. If a command takes a long time, the user may think the DR operation is stuck.
3. Devices controlled by TPD are shown in the list of LUNs that can be removed in the 'Remove Luns' operation.
DESCRIPTION:
1.
All DR operation logs accumulate in one large log file, which makes it difficult for the user to find the logs of the current DR operation.
2. If a command takes time, the user has no way to know whether the command is stuck.
3. Devices controlled by TPD are visible to the user, suggesting that they can be removed without first removing them from TPD control.
RESOLUTION:
1. Every time the user opens the DR Tool, a new log file of the form dmpdr_yyyymmdd_HHMM.log is generated.
2. A message is displayed to inform the user when a command takes longer than expected.
3. Devices controlled by TPD are no longer visible during DR operations.

* 2933469 (Tracking ID: 2919627)
SYMPTOM:
During the 'Remove Luns' operation of the Dynamic Reconfiguration Tool, there is no feasible way to remove a large number of LUNs, since the only way to do so is to enter all LUN names separated by commas.
DESCRIPTION:
When removing LUNs in bulk with the 'Remove Luns' option of the Dynamic Reconfiguration Tool, it is not feasible to enter all the LUN names separated by commas.
RESOLUTION:
The Dynamic Reconfiguration scripts are changed to accept a file containing the LUNs to be removed as input.

* 2934259 (Tracking ID: 2930569)
SYMPTOM:
LUNs in the 'error' state in the output of 'vxdisk list' cannot be removed through the DR (Dynamic Reconfiguration) Tool.
DESCRIPTION:
LUNs seen in the 'error' state in the VM (Volume Manager) tree are not listed by the DR Tool during the 'Remove LUNs' operation.
RESOLUTION:
Changes have been made to display LUNs in the error state during the 'Remove LUNs' operation of the DR Tool.

* 2942166 (Tracking ID: 2942609)
SYMPTOM:
The following message is displayed as an error when quitting the Dynamic Reconfiguration Tool:
"FATAL: Exiting the removal operation."
DESCRIPTION:
When the user quits an operation, the Dynamic Reconfiguration Tool reports that it is quitting as an error message.
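The fix for this class of issue is to classify a user-initiated exit at informational rather than error severity. A minimal Python sketch of the pattern (the function and logger names are hypothetical, not from the actual DR Tool):

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("drtool")  # hypothetical logger name

def report_exit(user_initiated: bool) -> str:
    """Report the end of a removal operation at the appropriate severity."""
    msg = "Exiting the removal operation."
    if user_initiated:
        # A deliberate quit is ordinary control flow: log it as Info.
        log.info(msg)
        return "INFO"
    # Only an unexpected abort deserves an error-level message.
    log.error(msg)
    return "ERROR"
```

The message text stays the same; only the severity tag changes, so users no longer mistake a clean quit for a failure.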
RESOLUTION:
The message is now displayed as informational.

* 3952992 (Tracking ID: 3951938)
SYMPTOM:
Retpoline support for ASLAPM on RHEL6.10 and RHEL6.x retpoline kernels.
DESCRIPTION:
RHEL6.10 is a new release with a retpoline kernel, and Red Hat has also released retpoline kernels for older RHEL6.x releases. The APM module must be recompiled with a retpoline-aware GCC to support retpoline kernels.
RESOLUTION:
The APM module is compiled with a retpoline-aware GCC.

Patch ID: VRTSgms 6.0.500.200

* 3957182 (Tracking ID: 3957857)
SYMPTOM:
GMS support for RHEL6.10.
DESCRIPTION:
RHEL6.10 is a new release with a retpoline kernel; GMS support is added for it.
RESOLUTION:
Added GMS support for RHEL6.10.

* 3952376 (Tracking ID: 3952375)
SYMPTOM:
GMS support for RHEL6.x retpoline kernels.
DESCRIPTION:
Red Hat released retpoline kernels for older RHEL6.x releases. The GMS module must be recompiled with a retpoline-aware GCC to support retpoline kernels.
RESOLUTION:
The GMS module is compiled with a retpoline-aware GCC.

Patch ID: VRTSglm 6.0.500.600

* 3957181 (Tracking ID: 3957855)
SYMPTOM:
GLM support for RHEL6.10.
DESCRIPTION:
RHEL6.10 is a new release with a retpoline kernel; GLM support is added for it.
RESOLUTION:
Added GLM support for RHEL6.10.

* 3952369 (Tracking ID: 3952368)
SYMPTOM:
GLM support for RHEL6.x retpoline kernels.
DESCRIPTION:
Red Hat released retpoline kernels for older RHEL6.x releases. The GLM module must be recompiled with a retpoline-aware GCC to support retpoline kernels.
RESOLUTION:
The GLM module is compiled with a retpoline-aware GCC.

* 3845273 (Tracking ID: 2850818)
SYMPTOM:
A GLM thread may panic if the cache pointer is null.
DESCRIPTION:
The thread panics when the GLM cache pointer is dereferenced. Memory may never have been allocated for the cache, leaving the pointer null; the code missed the null-pointer check and later dereferenced the pointer, resulting in the panic.
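The missing guard described above is the usual lazy-allocation hazard: the lookup path dereferences the cache pointer without first checking that the cache was ever allocated. A minimal Python sketch of the guarded pattern (the class and method names are hypothetical, not the actual GLM code):

```python
class GlmCacheSketch:
    """Toy model of a lazily allocated lock cache."""

    def __init__(self):
        self.cache = None  # allocation may never have happened

    def lookup(self, key):
        # The fix: check for an unallocated cache before dereferencing,
        # instead of assuming the allocation always succeeded.
        if self.cache is None:
            return None  # treat as a cache miss rather than crashing
        return self.cache.get(key)

    def insert(self, key, value):
        if self.cache is None:
            self.cache = {}  # allocate on first use
        self.cache[key] = value
```

In the kernel the equivalent guard returns an error or falls back to the slow path when the pointer is NULL, rather than panicking on the dereference.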
RESOLUTION:
The code is modified to handle the situation where memory is not allocated for the cache.

* 3364311 (Tracking ID: 3364309)
SYMPTOM:
An internal stress test on the cluster file system hit a debug assert in the Group Lock Manager (GLM).
DESCRIPTION:
In GLM, the code that handles the last revoke for a lock may cause a deadlock. This deadlock is caught upfront by the debug assert.
RESOLUTION:
The code is modified to avoid the deadlock when the last revoke for the lock is in progress.

Patch ID: VRTSodm 6.0.500.300

* 3957180 (Tracking ID: 3957853)
SYMPTOM:
ODM support for RHEL6.10.
DESCRIPTION:
RHEL6.10 is a new release with a retpoline kernel; ODM support is added for it.
RESOLUTION:
Added ODM support for RHEL6.10.

* 3952366 (Tracking ID: 3952365)
SYMPTOM:
ODM support for RHEL6.x retpoline kernels.
DESCRIPTION:
Red Hat released retpoline kernels for older RHEL6.x releases. The ODM module must be recompiled with a retpoline-aware GCC to support retpoline kernels.
RESOLUTION:
The ODM module is compiled with a retpoline-aware GCC.

* 3559203 (Tracking ID: 3529371)
SYMPTOM:
Package verification of VRTSodm on Linux using the rpm(1M) command with the "-V" option fails.
DESCRIPTION:
The package verification fails because the permission mode of the /dev/odm directory, after the odm device is mounted, differs from the entry in the rpm database.
RESOLUTION:
The creation-time permission mode for the /dev/odm directory is corrected.

* 3322294 (Tracking ID: 3323866)
SYMPTOM:
Some ODM operations may fail with the following error:
ODM ERROR V-41-4-1-328-22 Invalid argument
DESCRIPTION:
On systems with heavy database activity using ODM, some operations may fail with this error. This is a corner case that occurs when a new task enters ODM. To avoid deadlocks, ODM maintains two lists of tasks: a hold list and a deny list. All active tasks are kept in the hold list, and tasks that are exiting ODM are kept in the deny list.
The error is returned when the ODM PID structure is re-used for a PID that is still exiting ODM and is therefore still in the deny list; in that case ODM does not allow the task to enter, and the above error is returned.
RESOLUTION:
The code is modified to add an extra check when a new task is added to ODM, so that the error is not returned in this scenario.

* 3369038 (Tracking ID: 3349649)
SYMPTOM:
The Oracle Disk Manager (ODM) module fails to load on RHEL6.5 with the following system log error messages:
kernel: vxodm: disagrees about version of symbol putname
kernel: vxodm: disagrees about version of symbol getname
DESCRIPTION:
In RHEL6.5, the kernel interfaces for "getname" and "putname" used by VxFS have changed.
RESOLUTION:
The code is modified to use the latest definitions of the "getname" and "putname" kernel interfaces.

* 3384781 (Tracking ID: 3384775)
SYMPTOM:
Installing patch 6.0.3.200 on RHEL 6.4 or earlier RHEL 6.* versions fails with "ERROR: No appropriate modules found."
# /etc/init.d/vxfs start
ERROR: No appropriate modules found. Error in loading module "vxfs". See documentation.
Failed to create /dev/vxportal
ERROR: Module fdd does not exist in /proc/modules
ERROR: Module vxportal does not exist in /proc/modules
ERROR: Module vxfs does not exist in /proc/modules
DESCRIPTION:
The VRTSvxfs and VRTSodm rpms ship four different sets of modules: one for RHEL 6.1 and RHEL 6.2, and one each for RHEL 6.3, RHEL 6.4, and RHEL 6.5. However, this patch contains only the RHEL 6.5 module, so installation on earlier RHEL 6.* versions fails.
RESOLUTION:
A superseding patch 6.0.3.300 will be released that includes the modules for all RHEL 6.* versions; it will be available for download on SORT.

* 3349650 (Tracking ID: 3349649)
SYMPTOM:
ODM modules fail to load on RHEL6.5, and the following error messages are reported in the system log:
kernel: vxodm: disagrees about version of symbol putname
kernel: vxodm: disagrees about version of symbol getname
DESCRIPTION:
In RHEL6.5, the kernel interfaces for getname and putname used by VxFS have changed.
RESOLUTION:
The code is modified to use the latest definitions of the getname and putname kernel interfaces.

INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
Please note that the installation of this P-Patch will cause downtime.
To install the patch, perform the following steps on at least one node in the cluster:
1. Copy the patch sfha-rhel6_x86_64-Patch-6.0.5.600.tar.gz to /tmp
2. Untar sfha-rhel6_x86_64-Patch-6.0.5.600.tar.gz to /tmp/hf
    # mkdir /tmp/hf
    # cd /tmp/hf
    # gunzip /tmp/sfha-rhel6_x86_64-Patch-6.0.5.600.tar.gz
    # tar xf /tmp/sfha-rhel6_x86_64-Patch-6.0.5.600.tar
3. Install the hotfix (note again that installing this P-Patch will cause downtime):
    # pwd
    /tmp/hf
    # ./installSFHA605P6 [ ...]
You can also install this patch together with the 6.0.1 GA release and the 6.0.5 Patch release:
    # ./installSFHA605P6 -base_path [<601 path>] -mr_path [<605 path>] [ ...]
where -mr_path should point to the 6.0.5 image directory and -base_path to the 6.0.1 image.

Install the patch manually:
--------------------------
Manual installation is not supported.

REMOVING THE PATCH
------------------
Manual uninstallation is not supported.

SPECIAL INSTRUCTIONS
--------------------
NONE

OTHERS
------
NONE