* * * READ ME * * *
* * * Symantec Storage Foundation HA 6.1.1 * * *
* * * Patch 6.1.1.100 * * *
Patch Date: 2015-10-14


This document provides the following information:

   * PATCH NAME
   * OPERATING SYSTEMS SUPPORTED BY THE PATCH
   * PACKAGES AFFECTED BY THE PATCH
   * BASE PRODUCT VERSIONS FOR THE PATCH
   * SUMMARY OF INCIDENTS FIXED BY THE PATCH
   * DETAILS OF INCIDENTS FIXED BY THE PATCH
   * INSTALLATION PRE-REQUISITES
   * INSTALLING THE PATCH
   * REMOVING THE PATCH


PATCH NAME
----------
Symantec Storage Foundation HA 6.1.1 Patch 6.1.1.100
(Add SLES11 SP4 support)


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
SLES11 x86-64


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSamf
VRTSaslapm
VRTSdbac
VRTSgab
VRTSglm
VRTSgms
VRTSllt
VRTSodm
VRTSvxfen
VRTSvxfs
VRTSvxvm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
   * Symantec Cluster Server 6.1
   * Symantec Dynamic Multi-Pathing 6.1
   * Symantec File System 6.1
   * Symantec Storage Foundation 6.1
   * Symantec Storage Foundation Cluster File System HA 6.1
   * Symantec Storage Foundation for Oracle RAC 6.1
   * Symantec Storage Foundation HA 6.1
   * Symantec Volume Manager 6.1


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: VRTSvxvm-6.1.1.200-SLES11
* 3607232 (3596330) The 'vxsnap refresh' operation fails with a 'Transaction aborted waiting for IO drain' error.
* 3662397 (3662392) In the Cluster Volume Manager (CVM) environment, if I/Os are executing on a slave node, corruption can happen while the vxdisk resize(1M) command is executing on the master node.
* 3682022 (3648719) The server panics while adding or removing LUNs or HBAs.
* 3729172 (3726110) On systems with a high number of CPUs, Dynamic Multipathing (DMP) devices may perform considerably slower than OS device paths.
* 3736352 (3729078) A VVR (Veritas Volume Replicator) secondary site panic occurs during patch installation because of a flag overlap issue.
* 3778391 (3565212) I/O failure is seen during controller giveback operations on NetApp arrays in ALUA mode.
* 3790099 (3581646) Logical Volumes fail to migrate back to OS devices when Dynamic Multipathing (DMP) Native Support is disabled while root ("/") is mounted on LVM.
* 3790106 (3488071) The command "vxdmpadm settune dmp_native_support=on" fails to enable Dynamic Multipathing (DMP) Native Support.
* 3790115 (3623617) The command "vxdmpadm settune dmp_native_support=on" may fail with a perl error.
* 3790117 (3776520) Filters are not updated properly in the lvm.conf file in the VxDMP initrd (initial ramdisk) while Dynamic Multipathing (DMP) Native Support is being enabled.
* 3795623 (3795622) With Dynamic Multipathing (DMP) Native Support enabled, the LVM global_filter is not updated properly in the lvm.conf file.
* 3800388 (3581264) VxVM package uninstallation succeeds even if DMP Native Support is enabled and I/Os are in progress on an LV.
* 3800421 (3762580) In Linux kernels at or above the RHEL6.6 level, such as RHEL7 and SLES11SP3, the vxfen module fails to register the SCSI-3 PR keys to EMC devices when PowerPath coexists with DMP (Dynamic Multipathing).
* 3808593 (3795788) Performance degradation is seen when many application sessions open the same data files on a VxVM (Veritas Volume Manager) volume.
* 3812272 (3811946) When the "vxsnap make" command is invoked with the cachesize option to create a space-optimized snapshot, the command succeeds but a plex I/O error message is displayed in syslog.
* 3852806 (3823283) While unencapsulating a boot disk in a SAN (Storage Area Network) environment, the Linux operating system gets stuck in grub after reboot.
* 3852809 (3825467) SLES11-SP4 build fails.
* 3852831 (3806909) Due to a modification in licensing, the DMP keyless license did not work for STANDALONE DMP.

Patch ID: VRTSvxfs-6.1.1.300-SLES11
* 3851511 (3821686) VxFS module failed to load on SLES11 SP4.
* 3852733 (3729158) Deadlock due to incorrect locking order between the write advise thread and the dalloc flusher thread.
* 3852736 (3457801) Kernel panics in block_invalidatepage().

Patch ID: VRTSvxfs-6.1.1.100-SLES11
* 3520113 (3451284) Internal testing hits an assert "vx_sum_upd_efree1".
* 3521945 (3530435) Panic in internal test with SSD cache enabled.
* 3529243 (3616722) System panics because of a race between the writeback cache offline thread and the writeback data flush thread.
* 3536233 (3457803) File system gets disabled intermittently with a metadata I/O error.
* 3583963 (3583930) When the external quota file is restored or over-written, the old quota records are preserved.
* 3617774 (3475194) Veritas File System (VxFS) fscdsconv(1M) command fails with metadata overflow.
* 3617776 (3473390) Multiple stack overflows with Veritas File System (VxFS) on RHEL6 lead to panics or system crashes.
* 3617781 (3557009) After the fallocate() function reserves allocation space, it results in the wrong file size.
* 3617788 (3604071) High CPU usage consumed by the vxfs thread process.
* 3617790 (3574404) Stack overflow during rename operation.
* 3617793 (3564076) The MongoDB noSQL db creation fails with an ENOTSUP error.
* 3617877 (3615850) Write system call hangs with invalid buffer length.
* 3620279 (3558087) The ls -l command hangs when the system takes backup.
* 3620284 (3596378) The copy of a large number of small files is slower on VxFS compared to ext4.
* 3620288 (3469644) The system panics in the vx_logbuf_clean() function.
* 3621420 (3621423) The VxVM caching should not be disabled while mounting a file system in a situation where the VxFS cache area is not present.
* 3628867 (3595896) While creating OracleRAC 12.1.0.2 database, the node panics.
* 3636210 (3633067) While converting from ext3 file system to VxFS using vxfsconvert, it is observed that many inodes are missing.
* 3644006 (3451686) During internal stress testing on cluster file system (CFS), debug assert is hit due to invalid cache generation count on incore inode.
* 3645825 (3622326) Filesystem is marked with fullfsck flag as an inode is marked bad during checkpoint promote.

Patch ID: VRTSamf-6.1.1.100-SLES11
* 3794260 (3794203) Veritas Cluster Server (VCS) does not support SuSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4).

Patch ID: VRTSdbac-6.1.1.100-SLES11
* 3831491 (3831489) 6.1.1 vcsmm module does not load with SLES11SP4 (3.0.101-63-default kernel).

Patch ID: VRTSgab-6.1.0.200-Linux_SLES11
* 3794260 (3794203) Veritas Cluster Server (VCS) does not support SuSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4).

Patch ID: VRTSgab-6.1.0.100-GA_SLES11
* 3728108 (3728106) On Linux, the value corresponding to 15 minute CPU load average as shown in /proc/loadavg file wrongly increases to about 4.

Patch ID: VRTSglm-6.1.0.100-SLES11
* 3851513 (3821698) GLM module failed to load on SLES11 SP4.

Patch ID: VRTSgms-6.1.0.100-SLES11
* 3851514 (3821700) GMS module failed to load on SLES11 SP4.

Patch ID: VRTSllt-6.1.1.200-SLES11
* 3794260 (3794203) Veritas Cluster Server (VCS) does not support SuSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4).

Patch ID: VRTSodm-6.1.1.100-SLES11
* 3851512 (3821693) ODM module failed to load on SLES11 SP4.

Patch ID: VRTSvxfen-6.1.1.100-SLES11
* 3794260 (3794203) Veritas Cluster Server (VCS) does not support SuSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4).


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following Symantec incidents:

Patch ID: VRTSvxvm-6.1.1.200-SLES11

* 3607232 (Tracking ID: 3596330)

SYMPTOM:
The 'vxsnap refresh' operation fails with the following indications.

Errors occur on the DR (Disaster Recovery) site of VVR (Veritas Volume Replicator):
o vxio: [ID 160489 kern.notice] NOTICE: VxVM vxio V-5-3-1576 commit: Timedout waiting for rvg [RVG] to quiesce, iocount [PENDING_COUNT] msg 0
o vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-8011 Internal transaction failed: Transaction aborted waiting for io drain

At the same time, the following error occurs on the Primary site of VVR:
vxio: [ID 218356 kern.warning] WARNING: VxVM VVR vxio V-5-0-267 Rlink [RLINK] disconnecting due to ack timeout on update message

DESCRIPTION:
VM (Volume Manager) transactions on the DR site are aborted because pending I/Os
cannot be drained within the stipulated time, which causes FMR (Fast-Mirror
Resync) 'snap' operations to fail. The I/Os cannot be drained because of I/O
throttling. A timing-dependent race in VVR causes the throttling state to be
left set when it should have been cleared.

RESOLUTION:
Code changes have been made to fix the race condition, which ensures that the
throttling state is cleared at the appropriate time.

* 3662397 (Tracking ID: 3662392)

SYMPTOM:
In the CVM environment, if I/Os are executing on a slave node, corruption can
happen while the vxdisk resize(1M) command is executing on the master node.

DESCRIPTION:
During the first stage of the resize transaction, the master node re-adjusts the
disk offsets and public/private partition device numbers. On a slave node, the
public/private partition device numbers are not adjusted properly. Because of
this, the partition starting offset is added twice, which causes the corruption.
The window during which the public/private partition device numbers are
adjusted is small. Corruption is observed only if I/O occurs during this window.
After the resize operation completes, no further corruption happens.

RESOLUTION:
The code has been changed to add the partition starting offset properly to an
I/O on a slave node during execution of a resize command.

* 3682022 (Tracking ID: 3648719)

SYMPTOM:
The server panics with the following stack trace while adding or removing LUNs
or HBAs:
dmp_decode_add_path()
dmp_decipher_instructions()
dmp_process_instruction_buffer()
dmp_reconfigure_db()
gendmpioctl()
vxdmpioctl()

DESCRIPTION:
While deleting a dmpnode, Dynamic Multi-Pathing (DMP) releases the memory
associated with the dmpnode structure. If the dmpnode does not get deleted for
some reason and any other task accesses the freed memory of this dmpnode, the
server panics.

RESOLUTION:
The code is modified to prevent tasks from accessing the memory that is freed
for a deleted dmpnode. The change also fixes a memory leak in the buffer
allocation code path.

* 3729172 (Tracking ID: 3726110)

SYMPTOM:
On systems with a high number of CPUs, Dynamic Multipathing (DMP) devices may
perform considerably slower than OS device paths.

DESCRIPTION:
In high-CPU configurations, the I/O statistics functionality in DMP takes more
CPU time because DMP statistics are collected on a per-CPU basis. This
collection happens in the DMP I/O code path, so it reduces I/O performance.
Because of this, DMP devices perform slower than OS device paths.

RESOLUTION:
Code changes are made to remove some of the statistics collection functionality
from the DMP I/O code path. In addition, the following tunables need to be
turned off:
1. Turn off idle LUN probing.
   # vxdmpadm settune dmp_probe_idle_lun=off
2. Turn off the statistics gathering functionality.
   # vxdmpadm iostat stop

Notes:
1. Apply this patch if the system configuration has a large number of CPUs and
   DMP is performing considerably slower than OS device paths. This issue does
   not apply to normal systems.

* 3736352 (Tracking ID: 3729078)

SYMPTOM:
In a VVR environment, a panic may occur on the secondary site after SF (Storage
Foundation) patch installation or uninstallation.

DESCRIPTION:
The VXIO kernel reset invoked by SF patch installation removes all disk group
objects that do not have the preserve flag set. Because the preserve flag
overlaps with the RVG (Replicated Volume Group) logging flag, the RVG object is
not removed but its rlink object is, which results in a system panic when VVR
starts.

RESOLUTION:
Code changes have been made to fix this issue.

* 3778391 (Tracking ID: 3565212)

SYMPTOM:
While performing controller giveback operations on NetApp ALUA arrays, the
following messages are observed in /etc/vx/dmpevents.log:
[Date]: I/O error occured on Path belonging to Dmpnode
[Date]: I/O analysis done as DMP_PATH_BUSY on Path belonging to Dmpnode
[Date]: I/O analysis done as DMP_IOTIMEOUT on Path belonging to Dmpnode

DESCRIPTION:
During the asymmetric access state transition, DMP puts the buffer pointer in
the delay queue based on the flags observed in the logs. This delay resulted in
a timeout, and the file system consequently went into the disabled state.

RESOLUTION:
The DMP code is modified to perform immediate retries instead of putting the
buffer pointer in the delay queue when a transition is in progress.

* 3790099 (Tracking ID: 3581646)

SYMPTOM:
Logical Volumes may sometimes fail to migrate back to OS devices when Dynamic
Multipathing (DMP) Native Support is disabled while the root is mounted on LVM.

DESCRIPTION:
lvmetad caches the open count on devices that are present in the accept section
of the filter in the lvm.conf file. When DMP Native Support is enabled, all
non-VxVM devices are put in the reject section of the filter so that only
"/dev/vx/dmp" devices remain in the accept section. As a result, lvmetad caches
the open count on "/dev/vx/dmp" devices.
When DMP Native Support is disabled, "/dev/vx/dmp" devices are not put in the
reject section of the filter. This leaves a stale open count in lvmetad, which
causes physical volumes to point to stale devices even though DMP Native
Support is disabled.

RESOLUTION:
Code changes have been made to add "/dev/vx/dmp" devices to the reject section
of the filter in the lvm.conf file so that lvmetad releases the open count on
these devices.

* 3790106 (Tracking ID: 3488071)

SYMPTOM:
The command "vxdmpadm settune dmp_native_support=on" fails to enable Dynamic
Multipathing (DMP) Native Support and reports the following error:
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more volume groups

DESCRIPTION:
From LVM version 105, global_filter was introduced as part of the lvm.conf
file. RHEL 6.6 and later releases have LVM version 111 and hence support
global_filter. The code did not handle global_filter in the lvm.conf file when
6.1.1.100 was released.

RESOLUTION:
Code changes have been made to handle global_filter in the lvm.conf file and
allow DMP Native Support to work.

* 3790115 (Tracking ID: 3623617)

SYMPTOM:
The command "vxdmpadm settune dmp_native_support=on" may fail with the
following perl error:
Can't locate Sys/Syslog.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/lib/vxvm/bin/vxupdatelvm line 73.
BEGIN failed--compilation aborted at /usr/lib/vxvm/bin/vxupdatelvm line 73.

DESCRIPTION:
One of the VxVM scripts used for enabling Dynamic Multipathing (DMP) Native
Support uses the perl syslog module. A minimal package installation of the OS
might not contain the perl syslog module, which leads to this error.

RESOLUTION:
Code changes have been made to use an internally developed perl module and
avoid the dependency.

* 3790117 (Tracking ID: 3776520)

SYMPTOM:
Filters are not updated properly in the lvm.conf file in the VxDMP initrd while
DMP Native Support is being enabled. As a result, the root Logical Volume (LV)
is mounted on an OS device upon reboot.

DESCRIPTION:
From LVM version 105, global_filter was introduced as part of the lvm.conf
file. VxDMP updates the initrd lvm.conf file with the filters required for DMP
Native Support to function. While updating lvm.conf, VxDMP checks for the
filter field to be updated, but with the latest LVM version it should check for
the global_filter field instead. As a result, the lvm.conf file is not updated
with the proper filters.

RESOLUTION:
The code is modified to properly update the global_filter field in the lvm.conf
file in the VxDMP initrd.

* 3795623 (Tracking ID: 3795622)

SYMPTOM:
With Dynamic Multipathing (DMP) Native Support enabled, the LVM global_filter
is not updated properly in the lvm.conf file to reject newly added paths.

DESCRIPTION:
With Dynamic Multipathing (DMP) Native Support enabled, when new paths are
added to existing LUNs, the LVM global_filter is not updated properly in the
lvm.conf file to reject the newly added paths. This can lead to a "duplicate PV
(physical volumes) found" error reported by LVM commands.

RESOLUTION:
Code changes have been made to properly update the global_filter field in the
lvm.conf file when new paths are added to existing disks.

* 3800388 (Tracking ID: 3581264)

SYMPTOM:
VxVM package uninstallation succeeds even when DMP (Dynamic Multipathing)
Native Support is on and I/Os are in progress on an LV (Logical Volume).

DESCRIPTION:
While uninstalling the VxVM package, no precaution is taken to log an error
message if DMP Native Support cannot be successfully disabled. As a result, the
uninstallation proceeds without any error message.

RESOLUTION:
Code changes have been made to display an error message if DMP Native Support
cannot be disabled during uninstallation.
Additionally, if the rootvg is enabled, the uninstallation proceeds with an
error message; otherwise the uninstallation fails.

* 3800421 (Tracking ID: 3762580)

SYMPTOM:
The following logs are seen while setting up fencing for the cluster:
VXFEN: vxfen_reg_coord_pt: end ret = -1
vxfen_handle_local_config_done: Could not register with a majority of the coordination points.

DESCRIPTION:
In Linux kernels at or above the RHEL6.6 level, such as RHEL7 and SLES11SP3,
the interface used by DMP to send SCSI commands to block devices does not
transfer the data to or from the device. Therefore the SCSI-3 PR keys do not
get registered.

RESOLUTION:
The code is changed to use the SCSI request_queue to send the SCSI commands to
the underlying block device. An additional patch is required from EMC to
support processing SCSI commands via the request_queue mechanism on EMC
PowerPath devices. Please contact EMC for patch details for a specific kernel
version.

* 3808593 (Tracking ID: 3795788)

SYMPTOM:
Performance degradation is seen when many application sessions open the same
data files on a VxVM (Veritas Volume Manager) volume.

DESCRIPTION:
This issue occurs because of lock contention while opening the volume. While
opening the volume, an exclusive lock is taken on all CPUs. If the system has
many CPUs, this can be quite time consuming, which leads to the performance
degradation seen during the initial start of applications.

RESOLUTION:
Code changes have been made to use a shared lock instead of an exclusive lock
while opening the volume.

* 3812272 (Tracking ID: 3811946)

SYMPTOM:
When the "vxsnap make" command is invoked with the cachesize option to create a
space-optimized snapshot, the command succeeds but the following error messages
are displayed in syslog:
kernel: VxVM vxio V-5-0-603 I/O failed. Subcache object does not have a valid sdid allocated by cache object .
kernel: VxVM vxio V-5-0-1276 error on Plex while writing volume offset 0 length 2048

DESCRIPTION:
When a space-optimized snapshot is created using the "vxsnap make" command with
the cachesize option, the cache and subcache objects are created by the same
command. During snapshot creation, I/O from the volumes can be pushed onto a
subcache even though the subcache ID has not yet been allocated. Because the
subcache ID is not allocated, the I/O fails.

RESOLUTION:
Code changes have been made to ensure that I/Os are pushed onto the subcache
only after the subcache ID has been allocated.

* 3852806 (Tracking ID: 3823283)

SYMPTOM:
The Linux operating system gets stuck in grub after reboot. A manual kernel
load is required to make the operating system functional.

DESCRIPTION:
During unencapsulation of a boot disk in a SAN environment, multiple entries
corresponding to the root disk are found in the by-id device directory. As a
result, a parse command fails, leading to the creation of an improper menu file
in the grub directory. This menu file defines the device path used to load the
kernel and other modules.

RESOLUTION:
The code is modified to handle multiple entries for a SAN boot disk.

* 3852809 (Tracking ID: 3825467)

SYMPTOM:
The SLES11 SP4 build fails with an error about the symbol d_lock.

DESCRIPTION:
This symbol is already present in this kernel, so the VxVM kernel code declares
a duplicate of it.

RESOLUTION:
The code is modified to remove the duplicate definition.

* 3852831 (Tracking ID: 3806909)

SYMPTOM:
During installation of Volume Manager using CPI in keyless mode, the following
logs are observed:
VxVM vxconfigd DEBUG V-5-1-5736 No BASIC license
VxVM vxconfigd ERROR V-5-1-1589 enable failed: License has expired or is not available for operation
        transactions are disabled.

DESCRIPTION:
While using CPI for STANDALONE DMP installation in keyless mode, the Volume
Manager daemon (vxconfigd) cannot be started. A modification in the DMP NATIVE
license string that is used for license verification causes the verification to
fail.

RESOLUTION:
Code changes are incorporated so that the DMP keyless license works with
STANDALONE DMP.

Patch ID: VRTSvxfs-6.1.1.300-SLES11

* 3851511 (Tracking ID: 3821686)

SYMPTOM:
VxFS module might not get loaded on SLES11 SP4.

DESCRIPTION:
SLES11 SP4 is a new release, so the VxFS module failed to load on it.

RESOLUTION:
Added VxFS support for SLES11 SP4.

* 3852733 (Tracking ID: 3729158)

SYMPTOM:
fuser and other commands hang on VxFS file systems.

DESCRIPTION:
The hang is seen when two threads contend for two locks, ILOCK and PLOCK. The
writeadvise thread owns the ILOCK but is waiting for the PLOCK. The dalloc
thread owns the PLOCK and is waiting for the ILOCK.

RESOLUTION:
The locking order is corrected to PLOCK followed by ILOCK.

* 3852736 (Tracking ID: 3457801)

SYMPTOM:
Kernel panics in block_invalidatepage().

DESCRIPTION:
The address-space struct of a page has "a_ops" set to "vx_empty_aops". This is
an empty structure, so do_invalidatepage() calls block_invalidatepage() - but
these pages have VxFS's page buffer-heads attached, not kernel buffer-heads.
So, block_invalidatepage() panics.

RESOLUTION:
The code is modified to fix this by flushing pages before vx_softcnt_flush.

Patch ID: VRTSvxfs-6.1.1.100-SLES11

* 3520113 (Tracking ID: 3451284)

SYMPTOM:
While allocating an extent during a write operation, if the summary and bitmap
data for a file system allocation unit are mismatched, the assert hits.

DESCRIPTION:
This occurs if an extent was allocated using SMAP on the deleted inode and part
of the AU space is moved from the deleted inode to the new inode. At this point
the SMAP state is set to VX_EAU_ALLOCATED and EMAP is not initialized.
When more space is needed for the new inode, it tries to allocate from the same
AU using EMAP and can hit the "f:vx_sum_upd_efree1:2a" assert, because EMAP is
not initialized.

RESOLUTION:
The code has been modified to expand the AU while moving partial AU space from
one inode to another inode.

* 3521945 (Tracking ID: 3530435)

SYMPTOM:
Panic in internal test with SSD cache enabled.

DESCRIPTION:
The record end of the writeback log record was wrongly modified while adding a
skip list node in the punch hole case where the expunge flag is set and
insertion of the new node is skipped.

RESOLUTION:
The code is modified to skip modification of the writeback log record when the
expunge flag is set and the left end of the record is smaller than or equal to
the end offset of the next punch hole request.

* 3529243 (Tracking ID: 3616722)

SYMPTOM:
A race between the writeback cache offline thread and the writeback data flush
thread causes a null pointer dereference, resulting in a system panic.

DESCRIPTION:
While disabling writeback, the writeback cache information is deinitialized
from each inode, which removes the writeback bmap lock pointer. If writeback
flushing is still in progress in another thread that holds the writeback bmap
lock, that thread hits a null pointer dereference when it releases the lock,
because the pointer was already removed by the first thread.

RESOLUTION:
The code is modified to handle such race conditions.

* 3536233 (Tracking ID: 3457803)

SYMPTOM:
File system gets disabled with the following message in the system log:
WARNING: V-2-37: vx_metaioerr - vx_iasync_wait - /dev/vx/dsk/testdg/test file system meta data write error in dev/block

DESCRIPTION:
The inode's incore information becomes inconsistent because one of its fields
is modified without locking protection.

RESOLUTION:
The code is modified to protect the inode's field by taking the lock before
modifying it.

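The resolution for incident 3536233 (modify a shared in-core field only while holding its lock) can be illustrated with a minimal Python sketch. This is not VxFS code: the class, the `pending_io` field, and the helper function are hypothetical names chosen for illustration only.

```python
import threading


class InCoreInode:
    """Hypothetical stand-in for an in-core inode with one shared field."""

    def __init__(self) -> None:
        self.pending_io = 0              # shared field (illustrative name)
        self._lock = threading.Lock()    # lock that protects pending_io

    def bump_pending_io(self) -> None:
        # The read-modify-write happens entirely under the lock, so
        # concurrent updaters cannot overwrite each other's result.
        with self._lock:
            self.pending_io += 1


def run_concurrent_updates(inode: InCoreInode, threads: int,
                           updates_per_thread: int) -> None:
    """Update the same inode field from several threads at once."""
    def worker() -> None:
        for _ in range(updates_per_thread):
            inode.bump_pending_io()

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()


if __name__ == "__main__":
    inode = InCoreInode()
    run_concurrent_updates(inode, threads=4, updates_per_thread=10000)
    print(inode.pending_io)
```

Without the lock, two threads could both read the old value and one update would be lost, which is the kind of inconsistency the description refers to.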
* 3583963 (Tracking ID: 3583930)

SYMPTOM:
When the external quota file is over-written or restored from backup, new
settings that were added after the backup still remain.

DESCRIPTION:
The internal quota file is not always updated with the correct limits, so the
quotaon operation copies the quota limits from the external quota file to the
internal quota file. To complete the copy operation, the extent of the external
file is compared to the extent of the internal file at the corresponding
offset. If the external quota file is overwritten (or restored to its original
copy) and the internal file is larger than the external file, the quotaon
operation does not clear the additional (stale) quota records in the internal
file. Later, the sync operation (part of quotaon) copies these stale records
from the internal file to the external file. Hence, both the internal and
external files contain stale records.

RESOLUTION:
The code has been modified to remove the stale records in the internal file at
the time of quotaon.

* 3617774 (Tracking ID: 3475194)

SYMPTOM:
Veritas File System (VxFS) fscdsconv(1M) command fails with the following error
message:
...
UX:vxfs fscdsconv: INFO: V-3-26130: There are no files violating the CDS limits for this target.
UX:vxfs fscdsconv: INFO: V-3-26047: Byteswapping in progress ...
UX:vxfs fscdsconv: ERROR: V-3-25656: Overflow detected
UX:vxfs fscdsconv: ERROR: V-3-24418: fscdsconv: error processing primary inode list for fset 999
UX:vxfs fscdsconv: ERROR: V-3-24430: fscdsconv: failed to copy metadata
UX:vxfs fscdsconv: ERROR: V-3-24426: fscdsconv: Failed to migrate.

DESCRIPTION:
The fscdsconv(1M) command takes a filename argument for a recovery file, which
is used to restore the original file system if a failure occurs while the file
system conversion is in progress. This file has two parts: a control part and a
data part. The control part is used to store information about all the
metadata, such as inodes and extents.
In this instance, the length of the control part is underestimated for some
file systems where there are few inodes but the average number of extents per
file is very large (this can be seen in the fsadm -E report).

RESOLUTION:
The recovery file is made sparse and the data part starts after the 1TB offset,
so the control part can do allocating writes to the hole from the beginning of
the file.

* 3617776 (Tracking ID: 3473390)

SYMPTOM:
In memory pressure scenarios, you see panics or system crashes due to stack
overflows.

DESCRIPTION:
Specifically on RHEL6, the memory allocation routines consume much more memory
than on other distributions like SLES, or even RHEL5. Due to this, multiple
overflows are reported for the RHEL6 platform. Most of these overflows occur
when VxFS tries to allocate memory under memory pressure.

RESOLUTION:
The code is modified to fix multiple overflows by adding handoff code paths,
adjusting handoff limits, removing on-stack structures, and reducing the number
of function frames on the stack wherever possible.

* 3617781 (Tracking ID: 3557009)

SYMPTOM:
Run the fallocate command with the -l option to specify the length of the
reserve allocation. The resulting file size is not the requested length but a
multiple of the file system block size. For example, if block size = 8K:
# fallocate -l 8860 testfile1
# ls -l
total 16
drwxr-xr-x. 2 root root    96 Jul  1 11:40 lost+found/
-rw-r--r--. 1 root root 16384 Jul  1 11:41 testfile1
The file size should be 8860, but it is 16384 (which is 2*8192).

DESCRIPTION:
The vx_fallocate() function on Veritas File System (VxFS) creates a larger file
than specified because it allocates the extent in blocks. So the reserved file
size is a multiple of the block size instead of what the fallocate command
specifies.

RESOLUTION:
The code is modified so that the vx_fallocate() function on VxFS sets the
reserved file size to the length that the fallocate command specifies, instead
of a multiple of the block size.

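The size mismatch in incident 3557009 comes down to rounding the requested length up to whole blocks. The following minimal Python sketch reproduces that arithmetic for the 8K block size used in the example; the function names are hypothetical and do not correspond to VxFS code.

```python
def blocks_needed(length: int, block_size: int) -> int:
    """Whole blocks required to hold `length` bytes (ceiling division)."""
    return -(-length // block_size)


def size_before_fix(length: int, block_size: int) -> int:
    """File size seen before the fix: the whole allocated extent."""
    return blocks_needed(length, block_size) * block_size


def size_after_fix(length: int, block_size: int) -> int:
    """File size seen after the fix: exactly the requested length.

    The extent is still allocated in whole blocks; only the reported
    file size changes.
    """
    return length


if __name__ == "__main__":
    requested, bsize = 8860, 8192
    print(size_before_fix(requested, bsize))
    print(size_after_fix(requested, bsize))
```

For the values in the example, the first function gives 16384 (two 8192-byte blocks) and the second gives 8860.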
* 3617788 (Tracking ID: 3604071)

SYMPTOM:
With the thin reclaim feature turned on, you can observe high CPU usage on the
vxfs thread process. The backtrace of such threads usually looks like this:
- vx_dalist_getau
- vx_recv_bcastgetemapmsg
- vx_recvdele
- vx_msg_recvreq
- vx_msg_process_thread
- vx_kthread_init

DESCRIPTION:
The locking mechanism is inefficient in the routine that gets the broadcast
information of a node, which contains maps of the Allocation Units (AUs) for
which the node holds the delegations. Every time this routine is called, it
performs a series of down-up operations on a certain semaphore. This can result
in a huge CPU cost when many threads call the routine in parallel.

RESOLUTION:
The code is modified to optimize the locking mechanism in this routine so that
it performs the down-up operation on the semaphore only once.

* 3617790 (Tracking ID: 3574404)

SYMPTOM:
System panics because of a stack overflow during rename operation.
The following stack trace can be seen during the panic:
machine_kexec
crash_kexec
oops_end
no_context
__bad_area_nosemaphore
bad_area_nosemaphore
__do_page_fault
do_page_fault
page_fault
task_tick_fair
scheduler_tick
update_process_times
tick_sched_timer
__run_hrtimer
hrtimer_interrupt
local_apic_timer_interrupt
smp_apic_timer_interrupt
apic_timer_interrupt
--- <IRQ stack> ---
apic_timer_interrupt
mempool_free_slab
mempool_free
vx_pgbh_free
vx_pgbh_detach
vx_releasepage
try_to_release_page
shrink_page_list.clone.3
shrink_inactive_list
shrink_mem_cgroup_zone
shrink_zone
zone_reclaim
get_page_from_freelist
__alloc_pages_nodemask
alloc_pages_current
__get_free_pages
vx_getpages
vx_alloc
vx_bc_getfreebufs
vx_bc_getblk
vx_getblk_bp
vx_getblk_cmn
vx_getblk
vx_getmap
vx_getemap
vx_extfind
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_uplevel
vx_searchau
vx_extentalloc_device
vx_extentalloc
vx_bmap_ext4
vx_bmap_alloc_ext4
vx_bmap_alloc
vx_write_alloc3
vx_tran_write_alloc
vx_idalloc_off1
vx_idalloc_off
vx_int_rename
vx_do_rename
vx_rename1
vx_rename
vfs_rename
sys_renameat
sys_rename
system_call_fastpath

DESCRIPTION:
The stack is overflown by 88 bytes in the rename code path. The thread_info
structure is disrupted with VxFS page buffer head addresses.

RESOLUTION:
Local structures in vx_write_alloc3 and vx_int_rename are now allocated
dynamically. This saves 256 bytes and gives enough room.

* 3617793 (Tracking ID: 3564076)

SYMPTOM:
The MongoDB noSQL db creation fails with an ENOTSUP error. MongoDB uses
posix_fallocate to create a file first. When it writes at an offset that is not
aligned with the file system block boundary, an ENOTSUP error comes up.

DESCRIPTION:
On a file system with 8k bsize and 4k page size, the application creates a file
using posix_fallocate, and then writes at some offset that is not aligned with
the fs block boundary.
In this case, the pre-allocated extent is split at the unaligned offset into
two parts for the write. However, the alignment requirement of the split fails
the operation.

RESOLUTION:
The extent is now split down to the block boundary.

* 3617877 (Tracking ID: 3615850)

SYMPTOM:
The write system call writes up to count bytes from the pointed buffer to the
file referred to by the file descriptor field:

ssize_t write(int fd, const void *buf, size_t count);

When the count parameter is invalid, it can sometimes cause write() to hang on
a VxFS file system. For example, with a 10000-byte buffer and count set to
30000 by mistake, you may encounter this problem.

DESCRIPTION:
On recent Linux kernels, you cannot take a page fault while holding a page
locked, so as to avoid a deadlock. This means uiomove can copy less than
requested, and any partially populated pages created in the routine that
establishes a virtual mapping for the page are destroyed. This can cause an
infinite loop in the write code path when the given user buffer is not aligned
with a page boundary and the length given to write() causes an EFAULT: uiomove()
does a partial copy, segmap_release destroys the partially populated pages and
unwinds the uio, and the operation is then repeated.

RESOLUTION:
The code is modified to move the pre-faulting to the buffered I/O write loops.
The system either shortens the length of the copy if not all of the requested
pages can be faulted, or fails with EFAULT if no pages are pre-faulted. This
prevents the infinite loop.

* 3620279 (Tracking ID: 3558087)

SYMPTOM:
Run simultaneous dd threads on a mount point and start the ls -l command on the
same mount point. Then the system hangs.

DESCRIPTION:
When the delayed allocation (dalloc) feature is turned on, the flushing process
takes a long time. The process keeps the glock held, and needs writers to keep
the irwlock held. The ls -l command starts stat internally and keeps waiting
for the irwlock to read ACLs.

RESOLUTION:
Redesign dalloc to keep the glock unlocked while flushing.

* 3620284 (Tracking ID: 3596378)

SYMPTOM:
The copy of a large number of small files is slower on Veritas File System (VxFS) compared to EXT4.

DESCRIPTION:
VxFS implements the fsetxattr() system call synchronously. Hence, before the system call returns, VxFS takes some time to flush the data to the disk. In this way, VxFS guarantees file system consistency if the file system crashes. However, this implementation has the side effect of serializing the whole processing, which takes more time.

RESOLUTION:
The code is modified to change the transaction to flush the data in a delayed way.

* 3620288 (Tracking ID: 3469644)

SYMPTOM:
The system panics in the vx_logbuf_clean() function when it traverses the chain of transactions off the intent log buffer. The stack trace is as follows:
vx_logbuf_clean()
vx_logadd()
vx_log()
vx_trancommit()
vx_exh_hashinit()
vx_dexh_create()
vx_dexh_init()
vx_pd_rename()
vx_rename1_pd()
vx_do_rename()
vx_rename1()
vx_rename()
vx_rename_skey()

DESCRIPTION:
The system panics because the vx_logbuf_clean() function tries to access an already freed transaction from the transaction chain in order to flush it to the log.

RESOLUTION:
The code has been modified to make sure that the transaction gets flushed to the log before it is freed.

* 3621420 (Tracking ID: 3621423)

SYMPTOM:
The Veritas Volume Manager (VxVM) caching is disabled or stopped after mounting a file system in a situation where the Veritas File System (VxFS) cache area is not present.

DESCRIPTION:
When the VxFS cache area is not present and the VxVM cache area is present and in the ENABLED state, mounting a file system on any of the volumes stops VxVM caching for that volume, which is not the expected behavior.

RESOLUTION:
The code is modified not to disable VxVM caching for any mounted file system if the VxFS cache area is not present.
* 3628867 (Tracking ID: 3595896)

SYMPTOM:
While creating an Oracle RAC 12.1.0.2 database, the node panics with the following stack:
aio_complete()
vx_naio_do_work()
vx_naio_worker()
vx_kthread_init()

DESCRIPTION:
For a zero size request (with a correctly aligned buffer), Veritas File System (VxFS) wrongly queues the work internally and returns -EIOCBQUEUED. The kernel calls the aio_complete() function for this zero size request. However, while VxFS is performing the queued work internally, the aio_complete() function gets called again. The double call of the aio_complete() function results in the panic.

RESOLUTION:
The code is modified so that zero size requests do not queue elements inside the VxFS work queue.

* 3636210 (Tracking ID: 3633067)

SYMPTOM:
While converting from an ext3 file system to VxFS using vxfsconvert, it is observed that many inodes are missing.

DESCRIPTION:
When vxfsconvert(1M) is run on an ext3 file system, it misses an entire block group of inodes. This happens because of an incorrect calculation of the block group number of a given inode in a border case. The last inode of a given block group is calculated to have the correct inode offset, but is calculated to be in the next block group. This causes the entire next block group to be skipped when the code attempts to find the next consecutive inode.

RESOLUTION:
The code is modified to correct the calculation of the block group number.

* 3644006 (Tracking ID: 3451686)

SYMPTOM:
During internal stress testing on a cluster file system (CFS), a debug assert is hit due to an invalid cache generation count on the incore inode.

DESCRIPTION:
The reset of the cache generation count in the incore inode used in Disk Layout Version (DLV) 10 was missed during inode reuse, causing the debug assert.

RESOLUTION:
The code is modified to reset the cache generation count in the incore inode during inode reuse.
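The border case described for incident 3636210 (Tracking ID: 3633067) above can be illustrated with a small sketch. It relies on the fact that ext2/ext3 inode numbers start at 1, so the block group and the index within the group are both derived from (inode number - 1). The function names and the off-by-one variant are hypothetical and only show the kind of error the description mentions; they are not taken from the vxfsconvert source.

```shell
#!/bin/sh
# Illustration only: hypothetical helpers, not vxfsconvert code.
# Arguments: $1 = inode number (1-based), $2 = inodes per block group.

# Block group that holds the inode.
inode_group() {
    echo $(( ($1 - 1) / $2 ))
}

# Index of the inode inside its block group.
inode_index() {
    echo $(( ($1 - 1) % $2 ))
}

# Assumed faulty variant: ignores the 1-based numbering, so the last
# inode of a group is attributed to the following group.
inode_group_off_by_one() {
    echo $(( $1 / $2 ))
}

ipg=8192        # example value for inodes per group
last=$ipg       # last inode of block group 0
echo "inode $last: group $(inode_group "$last" "$ipg"), index $(inode_index "$last" "$ipg")"
echo "off-by-one group: $(inode_group_off_by_one "$last" "$ipg")"
```

With 8192 inodes per group, inode 8192 belongs to group 0 at index 8191, while the faulty variant reports group 1, which matches the symptom of a block group being skipped.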
* 3645825 (Tracking ID: 3622326)

SYMPTOM:
The file system is marked with the fullfsck flag because an inode is marked bad during checkpoint promote.

DESCRIPTION:
VxFS incorrectly skipped pushing data to the clone inode. As a result, the inode was marked bad during checkpoint promote, which in turn resulted in the file system being marked with the fullfsck flag.

RESOLUTION:
The code is modified to push the proper data to the clone inode.

Patch ID: VRTSamf-6.1.1.100-SLES11

* 3794260 (Tracking ID: 3794203)

SYMPTOM:
Veritas Cluster Server (VCS) does not support SuSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4).

DESCRIPTION:
VCS did not support SLES versions released after SLES 11 SP3.

RESOLUTION:
VCS support for SLES 11 SP4 is now introduced.

Patch ID: VRTSdbac-6.1.1.100-SLES11

* 3831491 (Tracking ID: 3831489)

SYMPTOM:
The VRTSdbac patch version does not work with SLES11SP4 (3.0.101-63-default kernel) and is unable to load the vcsmm module on SLES11SP4.

DESCRIPTION:
Installation of VRTSdbac patch version 6.1.1 fails on SLES11SP4 because the VCSMM module is not available for the SLES11SP4 kernel 3.0.101-63-default. The system log file contains the following messages:
Starting VCSMM:
ERROR: No appropriate modules found.
Error in loading module "vcsmm". See documentation.
Error : VCSMM driver could not be loaded.
Error : VCSMM could not be started.
Error : VCSMM could not be started.

RESOLUTION:
The VRTSdbac package is re-compiled with the SLES11SP4 kernel in the build environment to mitigate the failure.

Patch ID: VRTSgab-6.1.0.200-Linux_SLES11

* 3794260 (Tracking ID: 3794203)

SYMPTOM:
Veritas Cluster Server (VCS) does not support SuSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4).

DESCRIPTION:
VCS did not support SLES versions released after SLES 11 SP3.

RESOLUTION:
VCS support for SLES 11 SP4 is now introduced.
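For the VRTSdbac incident 3831491 above, a quick look at the running kernel release and at whether the vcsmm module is loaded can help tell whether a node is affected. The following is a minimal sketch under stated assumptions: the helper names kernel_is and module_loaded are hypothetical and are not part of VRTSdbac, and the check relies only on the standard uname command and the /proc/modules file on Linux.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of VRTSdbac.

# Succeeds when the kernel release in $1 equals the expected release in $2.
kernel_is() {
    [ "$1" = "$2" ]
}

# Succeeds when module $1 is listed in a /proc/modules-style file.
# $2 is optional and defaults to /proc/modules; each line of that file
# starts with the module name followed by a space.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

expected="3.0.101-63-default"   # kernel release named in the incident
if kernel_is "$(uname -r)" "$expected"; then
    echo "running the $expected kernel"
else
    echo "running $(uname -r), not $expected"
fi

if module_loaded vcsmm; then
    echo "vcsmm is loaded"
else
    echo "vcsmm is not loaded"
fi
```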
Patch ID: VRTSgab-6.1.0.100-GA_SLES11

* 3728108 (Tracking ID: 3728106)

SYMPTOM:
On Linux, the value corresponding to the 15-minute CPU load average increases to about 4 even when clients of GAB are not running and the CPU usage is relatively low.

DESCRIPTION:
GAB reads the count of currently online CPUs to correctly adapt the client heartbeat timeout to the system load. Due to an issue, it accidentally overwrites the kernel's load average value. As a result, even though the actual CPU usage does not increase, the value observed in /proc/loadavg for the 15-minute CPU load average is increased.

RESOLUTION:
The code is modified so that the GAB module does not overwrite the kernel's load average value.

Patch ID: VRTSglm-6.1.0.100-SLES11

* 3851513 (Tracking ID: 3821698)

SYMPTOM:
The GLM module fails to load on SLES11 SP4.

DESCRIPTION:
SLES11 SP4 is a new release, so the GLM module failed to load on it.

RESOLUTION:
Added GLM support for SLES11 SP4.

Patch ID: VRTSgms-6.1.0.100-SLES11

* 3851514 (Tracking ID: 3821700)

SYMPTOM:
The GMS module fails to load on SLES11 SP4.

DESCRIPTION:
SLES11 SP4 is a new release, so the GMS module failed to load on it.

RESOLUTION:
Added GMS support for SLES11 SP4.

Patch ID: VRTSllt-6.1.1.200-SLES11

* 3794260 (Tracking ID: 3794203)

SYMPTOM:
Veritas Cluster Server (VCS) does not support SuSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4).

DESCRIPTION:
VCS did not support SLES versions released after SLES 11 SP3.

RESOLUTION:
VCS support for SLES 11 SP4 is now introduced.

Patch ID: VRTSodm-6.1.1.100-SLES11

* 3851512 (Tracking ID: 3821693)

SYMPTOM:
The ODM module may fail to load on SLES11 SP4.

DESCRIPTION:
SLES11 SP4 is a new release, so the ODM module was not getting loaded on it.

RESOLUTION:
Added ODM support for SLES11 SP4.

Patch ID: VRTSvxfen-6.1.1.100-SLES11

* 3794260 (Tracking ID: 3794203)

SYMPTOM:
Veritas Cluster Server (VCS) does not support SuSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4).
DESCRIPTION:
VCS did not support SLES versions released after SLES 11 SP3.

RESOLUTION:
VCS support for SLES 11 SP4 is now introduced.


INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
To install the patch, perform the following steps on at least one node in the cluster:
1. Copy the hot-fix sfha-sles11sp4_x86_64-Patch-6.1.1.100.tar.gz to /tmp
2. Untar sfha-sles11sp4_x86_64-Patch-6.1.1.100.tar.gz to /tmp/hf
    # mkdir /tmp/hf
    # cd /tmp/hf
    # gunzip /tmp/sfha-sles11sp4_x86_64-Patch-6.1.1.100.tar.gz
    # tar xf /tmp/sfha-sles11sp4_x86_64-Patch-6.1.1.100.tar
3. Install the hotfix
    # pwd
    /tmp/hf
    # ./installSFHA611P100 [ ...]

You can also install this patch together with the 6.1 GA release and the 6.1.1 Patch release using Install Bundles:
1. Download Storage Foundation and High Availability Solutions 6.1
2. Extract the tar ball into the /tmp/sfha6.1/ directory
3. Download SFHA Solutions 6.1.1 from https://sort.veritas.com/patches
4. Extract it to the /tmp/sfha6.1.1 directory
5. Change to the /tmp/sfha6.1.1 directory by entering:
    # cd /tmp/sfha6.1.1
6. Invoke the installmr script with the -base_path and -hotfix_path options, where -base_path should point to the 6.1 image directory and -hotfix_path to the 6.1.1.100 directory.
    # ./installmr -base_path [<61 path>] -hotfix_path [] [ ...]

Install the patch manually:
--------------------------
o Before the upgrade:
  (a) Stop I/Os to all the VxVM volumes.
  (b) Unmount any file systems on VxVM volumes.
  (c) Stop applications using any VxVM volumes.
o Select the appropriate RPMs for your system, and upgrade to the new patch.
    # rpm -Uhv


REMOVING THE PATCH
------------------
    # rpm -e


SPECIAL INSTRUCTIONS
--------------------
NONE
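The extraction in step 2 of the installer procedure above can also be wrapped in a small helper. This is a sketch only: stage_hotfix is a hypothetical function, not part of the Symantec installer, and unlike the gunzip step it decompresses to standard output so that the original .tar.gz file is left in place.

```shell
#!/bin/sh
# Sketch: unpack a hot-fix tarball into a staging directory.
# stage_hotfix is a hypothetical helper, not part of the product installer.
# Arguments: $1 = path to the .tar.gz file, $2 = staging directory.
stage_hotfix() {
    tarball=$1
    dest=$2
    # Fail early if the tarball has not been copied yet.
    [ -f "$tarball" ] || { echo "missing tarball: $tarball" >&2; return 1; }
    # Create the staging directory if it does not exist.
    mkdir -p "$dest" || return 1
    # Decompress to stdout and extract inside the staging directory.
    gzip -dc "$tarball" | (cd "$dest" && tar xf -)
}
```

With the file names used in this document, the call would be `stage_hotfix /tmp/sfha-sles11sp4_x86_64-Patch-6.1.1.100.tar.gz /tmp/hf`, followed by running the installer script from /tmp/hf as described in step 3.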