* * * READ ME * * *
* * * Symantec Storage Foundation HA 6.1.1 * * *
* * * Patch 6.1.1.100 * * *
Patch Date: 2016-07-25


This document provides the following information:

* PATCH NAME
* OPERATING SYSTEMS SUPPORTED BY THE PATCH
* PACKAGES AFFECTED BY THE PATCH
* BASE PRODUCT VERSIONS FOR THE PATCH
* SUMMARY OF INCIDENTS FIXED BY THE PATCH
* DETAILS OF INCIDENTS FIXED BY THE PATCH
* INSTALLATION PRE-REQUISITES
* INSTALLING THE PATCH
* REMOVING THE PATCH


PATCH NAME
----------
Symantec Storage Foundation HA 6.1.1 Patch 6.1.1.100


OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
Solaris 10 SPARC


PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxvm
VRTSvxfs
VRTSfsadv
VRTSodm
VRTSglm


BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* Symantec Dynamic Multi-Pathing 6.1
* Symantec File System 6.1
* Symantec Storage Foundation 6.1
* Symantec Storage Foundation Cluster File System HA 6.1
* Symantec Storage Foundation for Oracle RAC 6.1
* Symantec Storage Foundation for Sybase ASE CE 6.1
* Symantec Storage Foundation HA 6.1
* Symantec Volume Manager 6.1


SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 150717-06

* 3607232 (3596330) The 'vxsnap refresh' operation fails with a `Transaction aborted waiting for IO drain` error.
* 3640641 (3622068) After mirroring an encapsulated root disk, rootdg fails to import if any disk in the disk group becomes unavailable.
* 3644360 (3644291) Some disk groups composed of 2TB LUNs fail to be auto-imported after a system restart.
* 3674567 (3636089) A VVR (Veritas Volume Replicator) secondary panicked due to a bad mutex produced by vxconfigd while committing transactions.
* 3682022 (3648719) The server panics while adding or removing LUNs or HBAs.
* 3690253 (3606480) Enabling dmp_native_support fails because the devid and phys_path are not populated.
* 3729172 (3726110) On systems with a high number of CPUs, Dynamic Multi-Pathing (DMP) devices may perform considerably slower than OS device paths.
* 3736352 (3729078) A VVR (Veritas Volume Replication) secondary site panic occurs during patch installation because of a flag overlap issue.
* 3771073 (3755209) The Veritas Dynamic Multi-Pathing (VxDMP) device configured in a Solaris Logical Domains (LDOM) guest is disabled when an active controller of an ALUA array fails.
* 3778391 (3565212) I/O failure is seen during controller giveback operations on NetApp arrays in ALUA mode.
* 3808593 (3795788) Performance degrades when many application sessions open the same data file on a VxVM volume.
* 3812272 (3811946) When the "vxsnap make" command is invoked with the cachesize option to create a space-optimized snapshot, the command succeeds but a plex I/O error message is displayed in syslog.
* 3835566 (3804214) A VxDMP (Dynamic Multi-Pathing) path enable operation fails after the disk label is changed from the guest LDOM. Open fails with error 5 on the path being enabled.
* 3852147 (3852146) A shared disk group (DG) fails to import when the "-c" and "-o noreonline" options are specified together.
* 3852455 (3603792) The first boot after a live upgrade to a new version of Solaris 11 and VxVM (Veritas Volume Manager) takes a long time.
* 3852800 (3819670) When smartmove with the 'vxevac' command is run in the background by hitting the 'ctrl-z' key and the 'bg' command, the execution of 'vxevac' is terminated abruptly.
* 3853086 (3783356) After the Dynamic Multi-Pathing (DMP) module fails to load, dmp_idle_vector is not NULL.
* 3854390 (3853049) The display of stats is delayed beyond the set interval for vxstat, and multiple sessions of vxstat impact the I/O performance.
* 3854526 (3544020) Volume Manager (VxVM) tunables are reset to default after a patch upgrade.
* 3856384 (3525490) A system panic occurs when partial data received by VVR is handled with incorrect type casting.
* 3857121 (3857120) An Oracle database process hangs when being stopped.
* 3864112 (3860503) Poor performance of vxassist mirroring is observed on some high-end servers.
* 3864625 (3599977) During a replica connection, referencing a port that is already deleted in another thread causes a system panic.
* 3864627 (3677359) VxDMP (Veritas Dynamic MultiPathing) causes a system panic after a shutdown or reboot.
* 3864628 (3686698) vxconfigd hangs due to a deadlock between two threads.
* 3864980 (3594158) The spinlock and unspinlock are referenced to different objects when interleaving with a kernel transaction.
* 3864983 (3614182) The first system reboot after migration from Solaris Multi-Pathing (MPXIO) to Symantec Dynamic Multi-Pathing (DMP) native support takes an extremely long time.
* 3864986 (3621232) The vradmin ibc command cannot be started or executed on the Veritas Volume Replicator (VVR) secondary node.
* 3864987 (3513392) A reference to a replication port that is already deleted caused a panic.
* 3864988 (3625890) The vxdisk resize operation on CDS disks fails with an error message of "Invalid attribute specification".
* 3864989 (3521726) The system panicked due to double freeing of an IOHINT.
* 3865415 (3736502) Memory leakage is found when a transaction aborts.
* 3865631 (3564260) VVR commands are unresponsive when replication is paused and resumed in a loop.
* 3865633 (3721565) A vxconfigd hang is seen.
* 3865638 (3727939) Data corruption may occur due to stale device entries in the /dev/vx/[r]dmp directories.
* 3865640 (3749557) The system hangs because of high memory usage by VxVM.
* 3865645 (3841242) Use of deprecated APIs provided by Oracle may result in a system hang.
* 3865646 (3628743) A new BE takes too much time to start up during a live upgrade on Solaris 11.2.
* 3865649 (3848351) vxglm fails to stop because the vxglm driver could not be unloaded.
* 3865653 (3795739) In a split-brain scenario, cluster formation takes a very long time.
* 3866241 (3790136) A file system hang is observed due to I/Os hung in Dirty Region Logging (DRL).
* 3866269 (3736156) A Solaris 11 update 1 system fails to come up after enabling dmp_native_support and rebooting.
* 3866593 (3861544) For EFI disks, in some cases DMP disables the disks for I/Os when one of their controllers (switches) fails.
* 3866624 (3769927) The "vxdmpadm settune dmp_native_support=off" command fails on Solaris.
* 3866643 (3539548) After adding back a removed MPIO disk, the 'vxdisk list' or 'vxdmpadm listctlr all' commands may show a duplicate entry for the DMP node with an error state.
* 3866651 (3596282) Snap operations fail with the error "Failed to allocate a new map due to no free map available in DCO".
* 3866679 (3856146) On the latest Solaris SPARC 11.2 SRUs and Solaris SPARC 11.3, the system panics during reboot and fails to come up after turning off dmp_native_support.
* 3867134 (3486861) The primary node panics when storage is removed while replication is going on with heavy I/Os.
* 3867135 (3658079) If the size of the backup slice of a volume is larger than 64GB, Veritas Volume Manager (VxVM) wraps it around and exports it to the LDOM (Logical Domains) guest domain.
* 3867137 (3674614) Restarting the vxconfigd(1M) daemon on the slave (joiner) node during a node-join operation may cause the vxconfigd(1M) daemon to become unresponsive on the master and the joiner node.
* 3867308 (3442024) The vxdiskadm(1M) option #1 (adding a disk to a disk group and initialization) does not support Extensible Firmware Interface (EFI) disks on the Solaris operating system.
* 3867315 (3672759) The vxconfigd(1M) daemon may core dump when the DMP database is corrupted.
* 3867357 (3544980) vxconfigd reports error messages like "vxconfigd V-5-1-7920 di_init() failed" after a SAN tape online event.
* 3867706 (3645370) The vxevac command fails to evacuate disks with Dirty Region Log (DRL) plexes.
* 3867708 (3495548) The vxdisk rm command fails for devices controlled by EMC PowerPath when the Operating System Naming (OSN) scheme is used.
* 3867709 (3767531) In a layered volume layout with an FSS configuration, when a few of the FSS hosts are rebooted, a full resync happens for non-affected disks on the master.
* 3867710 (3788644) Reuse the raw device number when checking for available raw devices.
* 3867711 (3802750) The VxVM (Veritas Volume Manager) volume I/O-shipping functionality is not disabled even after the user issues the correct command to disable it.
* 3867712 (3807879) User data is corrupted because of the writing of the backup EFI GPT disk label during the VxVM disk-group flush operation.
* 3867714 (3819832) No syslog message is seen when DMP detects that a controller is disabled or enabled.
* 3867881 (3867145) When the VVR SRL occupancy exceeds 90%, the SRL occupancy is reported only in 10-percent increments.
* 3867928 (3764326) VxDMP (Veritas Dynamic Multi-Pathing) repeatedly reports "failed to get devid".
* 3869659 (3868444) The disk header timestamp is updated even if the disk group (DG) import fails.
* 3870440 (3873489) pkgchk fails with the patch on Solaris 10.
* 3874736 (3874387) Disk header information is sometimes not logged to the syslog even if the disk is missing and the DG import fails.
* 3874961 (3871750) Parallel VxVM vxstat commands report abnormal disk I/O statistic data.
* 3875230 (3554608) Mirroring a volume on 6.1 creates a larger plex than the original.
* 3875564 (3875563) While dumping the disk header information, the human-readable timestamp was not converted correctly from the corresponding epoch time.

Patch ID: 150717-05

* 3372831 (2573229) On RHEL6, the server panics when Dynamic Multi-Pathing (DMP) executes the PERSISTENT RESERVE IN command with the REPORT CAPABILITIES service action on a PowerPath-controlled device.
* 3440232 (3408320) Thin reclamation fails for EMC 5875 arrays.
* 3455533 (3524376) Removed the Patch ID from the VxVM Solaris 10 patch PSTAMP.
* 3457363 (3462171) When SCSI-3 persistent reservation command 'ioctls' are issued on non-SCSI devices, the dmpnode gets disabled.
* 3470255 (2847520) The resize operation on a shared linked-volume can cause data corruption on the target volume.
* 3470257 (3372724) When the user installs VxVM, the system panics with a warning.
* 3470260 (3415188) I/O hangs during replication in Veritas Volume Replicator (VVR).
* 3470262 (3077582) A Veritas Volume Manager (VxVM) volume may become inaccessible, causing the read/write operations to fail.
* 3470265 (3326964) VxVM hangs in Clustered Volume Manager (CVM) environments in the presence of FMR operations.
* 3470270 (3403390) After a crash, the linked-to volume goes into the NEEDSYNC state.
* 3470272 (3385753) Replication to the Disaster Recovery (DR) site hangs even though the Replication links (Rlinks) are in the connected state.
* 3470274 (3373208) DMP wrongly sends the SCSI PR OUT command with an APTPL bit value of '0' to arrays.
* 3470275 (3417044) The system becomes unresponsive while creating a VVR TCP connection.
* 3470279 (3300418) VxVM volume operations on shared volumes cause unnecessary read I/Os.
* 3470282 (3374200) A system panic or exceptional I/O delays are observed while executing snapshot operations, such as refresh.
* 3470287 (3271315) The vxdiskunsetup command with the shred option fails to shred sliced or simple disks on the Solaris x86 platform.
* 3470290 (2999871) The vxinstall(1M) command gets into a hung state when it is invoked through Secure Shell (SSH) remote execution.
* 3470300 (3340923) For Asymmetric Logical Unit Access (ALUA) array type Logical Unit Numbers (LUNs), Dynamic Multi-Pathing (DMP) disables and enables the unavailable asymmetric access state paths on I/O load.
* 3470301 (2812161) In a Veritas Volume Replicator (VVR) environment, after the Rlink is detached, the vxconfigd(1M) daemon on the secondary host may hang.
* 3470303 (3314647) The vxcdsconvert(1M) command fails with the error: Plex column offset is not strictly increasing for column/plex.
* 3470322 (3399323) The reconfiguration of the Dynamic Multipathing (DMP) database fails.
* 3470345 (3281004) For the DMP minimum queue I/O policy with a large number of CPUs, a couple of issues are observed.
* 3470347 (3444765) In Cluster Volume Manager (CVM), shared volume recovery may take a long time for large configurations.
* 3470350 (3437852) The system panics when the Symantec Replicator Option goes into PASSTHRU mode.
* 3470352 (3450758) The slave node was not able to join the CVM cluster, resulting in a panic.
* 3470353 (3236772) Heavy I/O loads on primary sites result in transaction/session timeouts between the primary and secondary sites.
* 3470354 (3446415) A pool may get added to the file system when the file system shrink operation is performed on FileStore.
* 3470382 (3368361) When site consistency is configured within a private disk group and CVM is up, the reattach operation of a detached site fails.
* 3470383 (3455460) The vxfmrshowmap and verify_dco_header utilities fail with an error.
* 3470384 (3440790) The vxassist(1M) command with the mirror parameter and the vxplex(1M) command with the att parameter hang.
* 3470385 (3373142) Updates to the vxassist and vxedit man pages for behavioral changes after 6.0.
* 3475525 (3475521) During a system reboot, the following error message is displayed on the console: es_rcm.pl: scripting protocol error
* 3490147 (3485907) A panic occurs in the I/O code path.
* 3506675 (3441356) The pre-check of the upgrade_start.sh script fails on Solaris.
* 3506676 (3435475) The vxcdsconvert(1M) conversion process gets aborted for a thin LUN formatted as a simple disk with the Extensible Firmware Interface (EFI) format.
* 3506679 (3435225) In a given CVR setup, rebooting the master node causes one of the slaves to panic.
* 3506707 (3400504) Upon disabling the host-side Host Bus Adapter (HBA) port, extended attributes of some devices are no longer seen.
* 3506709 (3259732) In a CVR environment, rebooting the primary slave followed by connect-disconnect in a loop causes the rlink to detach.
* 3531906 (3526500) Disk I/O failures occur with DMP I/O timeout error messages when the DMP (Dynamic Multi-pathing) I/O statistics daemon is not running.
* 3536289 (3492062) Dynamic Multi-Pathing (DMP) fails to get the page 0x83 LUN identifier for EMC Symmetrix LUNs and continuously logs error messages.
* 3540122 (3482026) The vxattachd(1M) daemon reattaches plexes of a manually detached site.
* 3543944 (3520991) The vxconfigd(1M) daemon dumps core due to memory corruption.
* 3547093 (2422535) Changes to the Veritas Volume Manager (VxVM) recovery operations are not retained after the patch or package upgrade.

Patch ID: 150717-01

* 3424815 (3424704) The vxbootsetup command fails with localized strings.
* 3440980 (3107699) VxDMP (Veritas Dynamic MultiPathing) causes a system panic after a shutdown/reboot.
* 3444900 (3399131) For a PowerPath (PP) enclosure, both the DA_TPD and DA_COEXIST_TPD flags are set.
* 3445233 (3339195) While running vxdiskadm, an error is observed.
* 3445234 (3358904) A system with ALUA enclosures sometimes panics during path fault scenarios.
* 3445249 (3344796) A shared disk group import fails on an LDOM guest with SCSI-3 Persistent Reservation enabled.
* 3445268 (3422504) Paths get unexpectedly disabled/enabled in the LDOM guest.
* 3445991 (3421326) DMP keeps logging 'copyin failure' messages in the system log repeatedly.
* 3445992 (3421322) The LDOM guest experiences intermittent I/O failures.
* 3446010 (3421330) vxfentsthdw utility tests fail in the LDOM guest for DMP-backed virtual devices.
* 3446126 (3338208) Writes from a fenced-out node on an Active-Passive (AP/F) shared storage device fail with an unexpected error.
* 3447306 (3424798) Veritas Volume Manager (VxVM) mirror attach operations (e.g., plex attach, vxassist mirror, and third-mirror break-off snapshot resynchronization) may take a longer time under heavy application I/O load.
* 3447894 (3353211) A. After an EMC Symmetrix BCV (Business Continuance Volume) device switches to read-write mode, continuous vxdmp (Veritas Dynamic Multi Pathing) error messages flood syslog. B. The DMP metanode/path under the DMP metanode gets disabled unexpectedly.
* 3449714 (3417185) Rebooting the host after the exclusion of a dmpnode, while I/O is in progress on it, leads to a vxconfigd core dump.
* 3452709 (3317430) The vxdiskunsetup utility throws an error after an upgrade from 5.1SP1RP4.
* 3452727 (3279932) The vxdisksetup and vxdiskunsetup utilities fail on a disk that is part of a deported disk group (DG), even if the "-f" option is specified.
* 3452811 (3445120) Change the tunable VOL_MIN_LOWMEM_SZ value to trigger early readback.
* 3455455 (3409612) The value of reclaim_on_delete_start_time cannot be set to values outside the range 22:00-03:59.
* 3456729 (3428025) A system running the Symantec Replication Option (VVR) and configured as the VVR primary crashes when a heavy parallel I/O load is issued.
* 3458036 (3418830) Node boot-up hangs while starting vxconfigd.
* 3458799 (3197987) vxconfigd dumps core when 'vxddladm assign names file=' is executed and the file has one or more invalid values for the enclosure vendor ID or product ID.
* 3470346 (3377383) vxconfigd crashes when a disk under Dynamic Multi-pathing (DMP) reports a device failure.
* 3498923 (3087893) EMC TPD emcpower names change on every reboot with VxVM.

Patch ID: 152134-01

* 3655626 (3616753) In Solaris 11, the ODM module does not load automatically after rebooting the machine.
* 3864145 (3451730) Installation of VRTSodm, VRTSvxfs in a zone fails when running zoneadm -z Zone attach -U.
* 3864248 (3757609) CPU usage goes high because of contention on ODM_IO_LOCK.
* 3864254 (3832329) A stat() on /dev/odm/ctl in Solaris 11.2 results in a system panic.

Patch ID: 152144-01

* 3876345 (3865248) Allow asynchronous and synchronous lock calls for the same lock level.
Patch ID: 150736-03

* 3652109 (3553328) During internal testing, full fsck failed to clean the file system.
* 3690067 (3615043) Data loss when writing to a file while dalloc is on.
* 3729811 (3719523) 'vxupgrade' retains the superblock replica of old layout versions.
* 3852733 (3729158) A deadlock occurs due to an incorrect locking order between the write advise and dalloc flusher threads.
* 3859806 (3451730) Installation of VRTSodm, VRTSvxfs in a zone fails when running zoneadm -z Zone attach -U.
* 3864007 (3558087) The ls -l and other commands which use the stat system call may take a long time to complete.
* 3864010 (3269553) VxFS returns an inappropriate message for a read of a hole via Oracle Disk Manager (ODM).
* 3864013 (3811849) The system panics while executing lookup() in a directory with a large directory hash (LDH).
* 3864035 (3790721) High CPU usage caused by vx_send_bcastgetemapmsg_remaus.
* 3864036 (3233276) With a large file system, primary to secondary migration takes a long time.
* 3864037 (3616907) The system is unresponsive, causing the NMI watchdog service to stall.
* 3864040 (3633683) A vxfs thread consumes high CPU while running an application that makes excessive sync() calls.
* 3864042 (3466020) The file system is corrupted with an error message "vx_direrr: vx_dexh_keycheck_1".
* 3864141 (3647749) On Solaris, an obsolete v_path is created for the VxFS vnode.
* 3864148 (3695367) Unable to remove a volume from a multi-volume VxFS using the "fsvoladm" command.
* 3864150 (3602322) The system panics while flushing the dirty pages of the inode.
* 3864155 (3707662) A race between reorg processing and the fsadm timer thread (alarm expiry) leads to a panic in vx_reorg_emap.
* 3864156 (3662284) A File Change Log (FCL) read may return ENXIO.
* 3864158 (2560032) The system panics after SFHA is upgraded from 5.1SP1 to 5.1SP1RP2 or from 6.0.1 to 6.0.5.
* 3864160 (3691633) Remove RCQ Full messages.
* 3864161 (3708836) fallocate causes data corruption.
* 3864164 (3762125) Directory size increases abnormally.
* 3864165 (3751049) The umountall operation fails on Solaris.
* 3864167 (3735697) vxrepquota reports an error.
* 3864170 (3743572) The file system may hang when reaching the 1 billion inode limit.
* 3864173 (3779916) vxfsconvert fails to upgrade the layout version for a VxFS file system with a large number of inodes.
* 3864175 (3804400) /opt/VRTS/bin/cp does not return any error when the quota hard limit is reached and a partial write is encountered.
* 3864177 (3808033) When using 6.2.1 ODM on RHEL7, the Oracle resource cannot be killed after a forced umount via VCS.
* 3864178 (1428611) 'vxcompress' can spew many GLM block lock messages over the LLT network.
* 3864184 (3857444) The default permission of the /etc/vx/vxfssystem file is incorrect.
* 3864185 (3859032) The system panics in vx_tflush_map() due to a NULL pointer dereference.
* 3864186 (3855726) Panic in vx_prot_unregister_all().
* 3864246 (3657482) A stress test on the cluster file system fails due to data corruption.
* 3864247 (3861713) High %sys CPU is seen on large CPU/memory configurations.
* 3864250 (3833816) A read returns stale data on one node of the CFS.
* 3864255 (3827491) Data relocation is not executed correctly if the IOTEMP policy is set to AVERAGE.
* 3864256 (3830300) Degraded CPU performance during backup of Oracle archive logs on CFS versus a local file system.
* 3864257 (3844820) Removing/adding a vCPU on Solaris could trigger a system panic.
* 3864259 (3856363) File system inodes have incorrect blocks.
* 3864260 (3846521) "cp -p" fails if the modification time in nanoseconds has 10 digits.
* 3866968 (3866962) Data corruption is seen when dalloc writes are going on a file and fsync is simultaneously started on the same file.
* 3874662 (3871489) A performance issue is observed when the number of HBAs is increased on high-end servers.
* 3877070 (3880121) Internal assert failure when coalescing the extents on a clone.
* 3877339 (3880113) Internal assert failure when pushing a zfod extent on a clone.

Patch ID: 150736-02

* 3520113 (3451284) Internal testing hits an assert "vx_sum_upd_efree1".
* 3536233 (3457803) The file system gets disabled intermittently with a metadata I/O error.
* 3583963 (3583930) When the external quota file is restored or over-written, the old quota records are preserved.
* 3617774 (3475194) The Veritas File System (VxFS) fscdsconv(1M) command fails with metadata overflow.
* 3617788 (3604071) High CPU usage consumed by the vxfs thread process.
* 3617793 (3564076) The MongoDB noSQL db creation fails with an ENOTSUP error.
* 3620279 (3558087) The ls -l command hangs when the system takes a backup.
* 3620288 (3469644) The system panics in the vx_logbuf_clean() function.
* 3645825 (3622326) The file system is marked with the fullfsck flag as an inode is marked bad during a checkpoint promote.

Patch ID: 150736-01

* 3383149 (3383147) An operator precedence error may occur while turning off delayed allocation.
* 3422580 (1949445) The system is unresponsive when files are created in a large directory.
* 3422584 (2059611) The system panics due to a NULL pointer dereference while flushing bitmaps to the disk.
* 3422586 (2439261) When the vx_fiostats_tunable value is changed from zero to non-zero, the system panics.
* 3422604 (3092114) The information output displayed by the "df -i" command may be inaccurate for cluster-mounted file systems.
* 3422614 (3297840) A metadata corruption is found during the file removal process.
* 3422626 (3332902) While shutting down, the system running the fsclustadm(1M) command panics.
* 3422629 (3335272) The mkfs (make file system) command dumps core when the log size provided is not aligned.
* 3422636 (3340286) After a file system is resized, the tunable setting of dalloc_enable gets reset to a default value.
* 3422649 (3394803) A panic is observed in the VxFS routine vx_upgrade7() while running the vxupgrade(1M) command.
* 3436431 (3434811) The vxfsconvert(1M) in VxFS 6.1 hangs.
* 3448503 (3448492) On Solaris SPARC, the Vnode Page Mapping (VPM) interface is introduced.
* 3496391 (3499886) The patch ID from the VxFS in the Solaris 10 patch is displayed in PSTAMP.
* 3501832 (3413926) Internal testing hangs due to high memory consumption resulting in fork failure.
* 3504362 (3472551) The attribute validation (pass 1d) of full fsck takes too much time to complete.
* 3507608 (3478017) An internal test hits an assert in voprwunlock.
* 3512292 (3348520) In a Cluster File System (CFS) cluster having a multi-volume file system of a smaller size, execution of the fsadm command causes a system hang if the free space in the file system is low.
* 3518943 (3534779) Internal stress testing on Cluster File System (CFS) hits a debug assert.
* 3519809 (3463464) An internal kernel functionality conformance test hits a kernel panic due to a null pointer dereference.
* 3528770 (3449152) Failed to set the 'thin_friendly_alloc' tunable in case of a cluster file system (CFS).
* 3529852 (3463717) Information that Cluster File System (CFS) does not support the 'thin_friendly_alloc' tunable is not updated in the vxtunefs(1M) command man page.
* 3529862 (3529860) The package verification using the 'pkg verify' command fails for the VRTSglm, VRTSgms, and VRTSvxfs packages on Solaris 11.
* 3530038 (3417321) The vxtunefs(1M) tunable man page gives an incorrect
* 3541125 (3541083) The vxupgrade(1M) command for layout version 10 creates 64-bit quota files with inappropriate permission configurations.

Patch ID: 151226-02

* 3864144 (3451730) Installation of VRTSodm, VRTSvxfs in a zone fails when running zoneadm -z Zone attach -U.
* 3864174 (2905552) The fsdedupadm schedule was executed even if there was no schedule.
* 3864176 (3801320) A core dump is generated when deduplication is running.

Patch ID: 151226-01

* 3620250 (3621205) OpenSSL Common Vulnerabilities and Exposures (CVE): POODLE and Heartbleed.


DETAILS OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
This patch fixes the following Symantec incidents:

Patch ID: 150717-06

* 3607232 (Tracking ID: 3596330)

SYMPTOM:
The 'vxsnap refresh' operation fails with the following indications.

Errors occur on the DR (Disaster Recovery) site of VVR (Veritas Volume Replicator):
o vxio: [ID 160489 kern.notice] NOTICE: VxVM vxio V-5-3-1576 commit: Timedout waiting for rvg [RVG] to quiesce, iocount [PENDING_COUNT] msg 0
o vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-8011 Internal transaction failed: Transaction aborted waiting for io drain

At the same time, the following errors occur on the primary site of VVR:
vxio: [ID 218356 kern.warning] WARNING: VxVM VVR vxio V-5-0-267 Rlink [RLINK] disconnecting due to ack timeout on update message

DESCRIPTION:
VM (Volume Manager) transactions on the DR site get aborted because pending I/Os could not be drained in the stipulated time, leading to failure of FMR (Fast-Mirror Resync) 'snap' operations. These I/Os could not be drained because of I/O throttling. A bug/race, in conjunction with timing in VVR, causes the clearing of this throttling condition/state to be missed.

RESOLUTION:
Code changes have been done to fix the race condition, which ensures clearance of the throttling state at the appropriate time.

* 3640641 (Tracking ID: 3622068)

SYMPTOM:
After mirroring an encapsulated root disk, rootdg fails to import if any disk in the disk group becomes unavailable. This causes the system to become unbootable if /usr is mounted on a separate disk that is also encapsulated.

DESCRIPTION:
Starting from VxVM (Veritas Volume Manager) 6.1, by default all non-detached disks within a disk group must be accessible to the system so that the disk group can be successfully imported. This impacts system boot with an alternative encapsulated boot disk when the original encapsulated boot disk is unavailable.

RESOLUTION:
Code changes have been made to skip this check when trying to import rootdg.

* 3644360 (Tracking ID: 3644291)

SYMPTOM:
Some of the disk groups that are composed of 2TB LUNs fail to be auto-imported after a system restart.

DESCRIPTION:
During an early boot stage, the system calls the efi_alloc_and_read() function, which sometimes fails due to operating system issues. As a result, the disks cannot be brought online and the disk groups cannot be auto-imported.

RESOLUTION:
The code is modified to remove the efi_alloc_and_read() function.
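As an illustrative recovery step (not part of the fix itself), a disk group that failed to auto-import can be listed and then imported manually once the disks are visible again; the disk group name 'datadg' below is a hypothetical example:

# vxdisk -o alldgs list
# vxdg import datadg

The first command shows all disks along with the disk groups they belong to, including groups that are not yet imported; the second imports the affected group by name.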
* 3674567 (Tracking ID: 3636089)

SYMPTOM:
The VVR secondary panicked while committing transactions, with the following stack:
vpanic()
vol_rv_sec_write_childdone+0x18()
vol_rv_transaction_prepare+0x63c()
vol_commit_iolock_objects+0xbc()
vol_ktrans_commit+0x2d8()
volsioctl_real+0x2ac()

DESCRIPTION:
While unwinding updates in vol_rv_transaction_prepare(), a double SIO (staged I/O) done happens, resulting in the panic.

RESOLUTION:
Code changes were made to fix the issue.

* 3682022 (Tracking ID: 3648719)

SYMPTOM:
The server panics with the following stack trace while adding or removing LUNs or HBAs:
dmp_decode_add_path()
dmp_decipher_instructions()
dmp_process_instruction_buffer()
dmp_reconfigure_db()
gendmpioctl()
vxdmpioctl()

DESCRIPTION:
While deleting a dmpnode, Dynamic Multi-Pathing (DMP) releases the memory associated with the dmpnode structure. In case the dmpnode doesn't get deleted for some reason, and if any other tasks access the freed memory of this dmpnode, then the server panics.

RESOLUTION:
The code is modified to prevent tasks from accessing the memory that is freed when the dmpnode is deleted. The change also fixed a memory leak issue in the buffer allocation code path.

* 3690253 (Tracking ID: 3606480)

SYMPTOM:
Issue 1: Enabling the dmp_native_support tunable fails on Solaris with the following error:
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more zpools
VxVM vxdmpadm ERROR V-5-1-15686 The following zpool(s) could not be migrated as failed to obtain root pool information -

Issue 2: After enabling dmp_native_support, ZFS can panic due to an I/O error, since the phys_path for the zpool points to the OS device path and not the DMP (Dynamic Multipathing) device.

DESCRIPTION:
Issue 1: Enabling the dmp_native_support tunable fails because the devid (device ID) on the Dynamic Multipathing (DMP) device is not available. The devid is not updated by DMP while populating the devid on the root disk under DMP control. This causes the dmp_native_support enable command to fail.
Issue 2: Ideally, the phys_path should have been set to NULL when the zpool is migrated from the OS device to the DMP device. But phys_path was getting set to the OS device when the zpool is migrated and dmp_native_support is set to on.

RESOLUTION:
Issue 1: Code changes are done such that the devid on the DMP devices gets updated for the root disk.
Issue 2: Code changes are done so that the phys_path corresponding to the DMP device is populated for ZFS pools when the migration happens.

* 3729172 (Tracking ID: 3726110)

SYMPTOM:
On systems with a high number of CPUs, DMP devices may perform considerably slower than OS device paths.

DESCRIPTION:
In high-CPU configurations, the I/O statistics functionality in DMP takes more CPU time because DMP statistics are collected on a per-CPU basis. This stat collection happens in the DMP I/O code path, hence it reduces the I/O performance. Because of this, DMP devices perform slower than OS device paths.

RESOLUTION:
The code is modified to remove some of the stats collection functionality from the DMP I/O code path. Along with this, the following need to be turned off:
1. Turn off idle LUN probing.
#vxdmpadm settune dmp_probe_idle_lun=off
2. Turn off the statistics gathering functionality.
#vxdmpadm iostat stop

Notes:
1. Please apply this patch if the system configuration has a large number of CPUs and if DMP performs considerably slower than OS device paths. For normal systems this issue is not applicable.
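For reference, one possible sequence for applying and confirming the workaround above is shown below. The settune and iostat commands are the ones named in this entry; vxdmpadm gettune is added here only as an illustrative way to verify the tunable value:

# vxdmpadm settune dmp_probe_idle_lun=off
# vxdmpadm gettune dmp_probe_idle_lun
# vxdmpadm iostat stop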
* 3736352 (Tracking ID: 3729078)

SYMPTOM:
In a VVR environment, a panic may occur after SF (Storage Foundation) patch installation or uninstallation on the secondary site.

DESCRIPTION:
The VxIO kernel reset invoked by SF patch installation removes all disk group objects that do not have the preserve flag set. Because the preserve flag overlaps with the RVG (Replicated Volume Group) logging flag, the RVG object is not removed, but its rlink object is removed, resulting in a system panic when VVR is started.

RESOLUTION:
Code changes have been made to fix this issue.

* 3771073 (Tracking ID: 3755209)

SYMPTOM:
The Dynamic Multi-pathing (DMP) device configured in a Solaris LDOM guest is disabled when an active controller of an ALUA array fails.

DESCRIPTION:
DMP in a guest environment monitors the cached target port IDs of virtual paths in the LDOM. If a controller of an ALUA array fails for some reason, the active/primary target port ID of the ALUA array is changed in the I/O domain, resulting in a stale entry in the guest. DMP in the guest wrongly interprets this target port change and marks the path as unavailable. This causes I/O on the path to fail. As a result, the DMP device is disabled in the LDOM.

RESOLUTION:
The code is modified to not use the cached target port IDs for LDOM virtual disks.

* 3778391 (Tracking ID: 3565212)

SYMPTOM:
While performing controller giveback operations on NetApp ALUA arrays, the below messages are observed in /etc/vx/dmpevents.log:
[Date]: I/O error occurred on Path belonging to Dmpnode
[Date]: I/O analysis done as DMP_PATH_BUSY on Path belonging to Dmpnode
[Date]: I/O analysis done as DMP_IOTIMEOUT on Path belonging to Dmpnode

DESCRIPTION:
During the asymmetric access state transition, DMP puts the buffer pointer in the delay queue based on the flags observed in the logs. This delay resulted in a timeout, and thereby the file system went into a disabled state.

RESOLUTION:
The DMP code is modified to perform immediate retries instead of putting the buffer pointer in the delay queue for the transition-in-progress case.

* 3808593 (Tracking ID: 3795788)

SYMPTOM:
Performance degradation is seen when many application sessions open the same data file on a Veritas Volume Manager (VxVM) volume.

DESCRIPTION:
This issue occurs because of lock contention. When many application sessions open the same data file on the VxVM volume, the exclusive lock is occupied on all CPUs. If there are a lot of CPUs in the system, this process can be quite time-consuming, which leads to performance degradation at the initial start of applications.

RESOLUTION:
The code is modified to change the exclusive lock to a shared lock when the data file on the volume is opened.

* 3812272 (Tracking ID: 3811946)

SYMPTOM:
When the "vxsnap make" command is invoked with the cachesize option to create a space-optimized snapshot, the command succeeds but the following error message is displayed in syslog:
kernel: VxVM vxio V-5-0-603 I/O failed. Subcache object <subcache-name> does not have a valid sdid allocated by cache object <cache-name>.
kernel: VxVM vxio V-5-0-1276 error on Plex <plex-name> while writing volume <volume-name> offset 0 length 2048

DESCRIPTION:
When a space-optimized snapshot is created using the "vxsnap make" command along with the cachesize option, the cache and subcache objects are created by the same command. During the creation of the snapshot, I/Os from the volumes may be pushed onto a subcache even though the subcache ID has not yet been allocated. As a result, the I/O fails.

RESOLUTION:
The code is modified to make sure that I/Os on the subcache are pushed only after the subcache ID has been allocated.
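For illustration, a space-optimized snapshot with an implicitly created cache object is typically requested with a command of the following form; the disk group, volume, snapshot name, and cache size are hypothetical examples:

# vxsnap -g datadg make source=datavol/newvol=snapvol/cachesize=500m

With the cachesize attribute, the cache and subcache objects are created by this single command, which is the code path affected by this incident.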
* 3835566 (Tracking ID: 3804214)

SYMPTOM:
A VxDMP (Dynamic Multi-Pathing) path enable operation fails after the disk label is changed from the guest LDOM. Open fails with error 5 (EIO) on the path being enabled. The following error messages can be seen in /var/adm/messages:
vxdmp: [ID 808364 kern.notice] NOTICE: VxVM vxdmp V-5-3-0 dmp_open_path: Open failed with 5 for path 237/0x30
vxdmp: [ID 382146 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 [Warn] disabled path 237/0x30 belonging to the dmpnode 307/0x38 due to open failure

DESCRIPTION:
While a disk is exported to the Solaris LDOM, the Solaris OS in the control/IO domain holds a NORMAL mode open on the existing partitions of the DMP node. If the disk partitions/label is changed from the LDOM such that some of the older partitions are removed, the Solaris OS in the control/IO domain does not know about this change and continues to hold a NORMAL mode open on those deleted partitions. If a disabled DMP path is enabled in this scenario, the NORMAL mode open on the path fails and the path enable operation errors out. This can be worked around by detaching and reattaching the disk to the LDOM. Due to a problem in the DMP code, the stale NORMAL mode open flag was not being reset even when the DMP disk was detached from the LDOM. This was preventing the DMP path from being enabled even after the DMP disk was detached from the LDOM.

RESOLUTION:
The code was fixed to reset the NORMAL mode open when the DMP disk is detached from the LDOM. With this fix, the DMP disk has to be reattached to the LDOM only once after the disk label changes. When the disk is reattached, it gets the correct open mode (NORMAL/NDELAY) on the partitions that exist after the label change.

* 3852147 (Tracking ID: 3852146)

SYMPTOM:
A shared disk group fails to import when "-c" and "-o noreonline" are specified together, with the below error:
VxVM vxdg ERROR V-5-1-10978 Disk group : import failed: Disk for disk group not found

DESCRIPTION:
When the "-c" option is specified, the DISKID and DGID of the disks in the DG are updated. When the information about the disks in the DG is passed to the slave node, the slave node does not have the latest information, since the re-online of the disks does not happen because "-o noreonline" is specified. Because the slave node does not have the latest information, it is not able to identify the proper disks belonging to the DG, which leads to the DG import failing with "Disk for disk group not found".

RESOLUTION:
Code changes have been done to handle the combination of "-c" and "-o noreonline".
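For reference, the failing option combination described in this entry corresponds to a shared import of roughly the following form; the option letters are taken from the entry itself, and the disk group name is a hypothetical example:

# vxdg -s -c -o noreonline import datadg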
When "ctlr-z" and bg is used to run vxevac in background, the select() returns -1 with errno EINTR. VxVM wrongly interprets it as user termination action and hence vxevac is terminated. Instead of terminating vxevac, the select() should be retried untill task completes. RESOLUTION: The code is modified so that when select() returns with errno EINTR, it checks whether vxevac task is finished. If not, the select() is retried. * 3853086 (Tracking ID: 3783356) SYMPTOM: After DMP module fails to load, dmp_idle_vector is not NULL. DESCRIPTION: After DMP module load failure, DMP resources are not cleared off from the system memory, so some of the resources are in NON-NULL value. When system retries to load, it frees invalid data, leading to system panic with error message BAD FREE, because the data being freed is not valid at that point. RESOLUTION: The code is modified to clear up the DMP resources when module failure happens. * 3854390 (Tracking ID: 3853049) SYMPTOM: On a server with more number CPUs, the stats output of vxstat is delayed beyond the set interval. Also multiple sessions of vxstat is impacting the IO performance. DESCRIPTION: The vxstat acquires an exclusive lock on each CPU in order to gather the stats. This would affect the consolidation and display of stats in an environment with huge number of CPUs and disks. The output of stats for interval of 1 second can get delayed beyond the set interval. Also the acquisition of lock happens in the IO path which would affect the IO performance due contention of these locks. RESOLUTION: The code modified to remove the exclusive spin lock. * 3854526 (Tracking ID: 3544020) SYMPTOM: Volume Manager tunables are getting reset to default after patch upgade DESCRIPTION: The file "/kernel/drv/vxio.conf" which is used to store all the tunable values is getting replaced with the default template during patch upgrade. This leads to values of all the tunables getting reset to default after patch upgrade. RESOLUTION: Changes are done to preserve the file "/kernel/drv/vxio.conf" as part of patch upgrade. * 3856384 (Tracking ID: 3525490) SYMPTOM: In VVR (Veritas Volume Replication) environment, system panic with the following stack, #0 crash_kexec at ffffffff800b1509 #1 __die at ffffffff80065137 #2 do_page_fault at ffffffff80067430 #3 error_exit at ffffffff8005ddf9 [exception RIP: nmcom_wait_msg+449] #4 nmcom_server_proc at ffffffff889f04b6 [vxio] DESCRIPTION: While VVR sending messages with UDP, it uses different data structure with TCP. In case of receiving partial data, the message is explicitly typecast to the same structure for both TCP and UDP. This will cause some pointers in UDP message header become invalid and panic the system while copying data. RESOLUTION: Code changes have been made to fix the problem. * 3857121 (Tracking ID: 3857120) SYMPTOM: After stopping Oracle database process, this process does not run Oracle database again but it hangs there. DESCRIPTION: Two threads from VxVM(Veritas Volume Manager) trying to asynchronously manipulate the per-cpu data structures for I/O counting, which causes to a race condition that might make I/O count leaking, hence the volume cant be closed and results in the process hang. RESOLUTION: VxVM code has been changed to avoid the race condition happening. * 3864112 (Tracking ID: 3860503) SYMPTOM: Poor performance of vxassist mirroring is observed compared to using raw dd utility to do mirroring . 
DESCRIPTION:
There is huge lock contention on high-end servers with a large number of CPUs, because doing the copy on each region acquires some unnecessary CPU locks.

RESOLUTION:
The VxVM code has been changed to decrease the lock contention.

* 3864625 (Tracking ID: 3599977)

SYMPTOM:
During a replica connection, referencing a port that is already deleted in another thread causes a system panic with a stack trace similar to the below:
.simple_lock()
soereceive()
soreceive()
.kernel_add_gate_cstack()
kmsg_sys_rcv()
nmcom_get_next_mblk()
nmcom_get_next_msg()
nmcom_wait_msg_tcp()
nmcom_server_proc_tcp()
nmcom_server_proc_enter()
vxvm_start_thread_enter()

DESCRIPTION:
During a replica connection, a port is created before increasing the count. This is to protect the port from getting deleted. However, another thread deletes the port after the port is created and before the count is increased. As the replica connection thread proceeds, it refers to the port that is already deleted, which causes a NULL pointer reference and a system panic.

RESOLUTION:
The code is modified to prevent asynchronous access to the count that is associated with the port by means of locks.

* 3864627 (Tracking ID: 3677359)

SYMPTOM:
VxDMP causes a system panic after a shutdown or reboot with the following stack trace:
mutex_enter()
volinfo_ioct()
volsioctl_real()
cdev_ioctl()
dmp_signal_vold()
dmp_throttle_paths()
dmp_process_stats()
dmp_daemons_loop()
thread_start()
or
panicsys()
vpanic_common()
panic+0x1c()
mutex_enter()
cdev_ioctl()
dmp_signal_vold()
dmp_check_path_state()
dmp_restore_callback()
dmp_process_scsireq()
dmp_daemons()
thread_start()

DESCRIPTION:
In a special scenario of system shutdown or reboot, the DMP (Dynamic MultiPathing) I/O statistics daemon tries to call the ioctl functions in the VxIO module, which is being unloaded. As a result, the system panics.

RESOLUTION:
The code is modified to stop the DMP I/O statistics daemon and the DMP restore daemon before system shutdown or reboot. Also, the code is modified to avoid other probes to VxIO devices during shutdown.

* 3864628 (Tracking ID: 3686698)

SYMPTOM:
vxconfigd hangs due to a deadlock between two threads.

DESCRIPTION:
Two threads were waiting for the same lock, causing a deadlock between them, which blocks all vx commands. The untimeout function does not return until the pending callback is cancelled (which is set through the timeout function) or the pending callback has completed its execution (if it has already started). Therefore, locks acquired by the callback routine should not be held across a call to the untimeout routine, or a deadlock may result.
Thread 1:
untimeout_generic()
untimeout()
voldio()
volsioctl_real()
fop_ioctl()
ioctl()
syscall_trap32()
Thread 2:
mutex_vector_enter()
voldsio_timeout()
callout_list_expire()
callout_expire()
callout_execute()
taskq_thread()
thread_start()

RESOLUTION:
Code changes have been made to call untimeout outside the lock taken by the callback handler.

* 3864980 (Tracking ID: 3594158)

SYMPTOM:
The system panics on a VVR secondary node with the following stack trace:
.simple_lock()
soereceive()
soreceive()
.kernel_add_gate_cstack()
kmsg_sys_rcv()
nmcom_get_next_mblk()
nmcom_get_next_msg()
nmcom_wait_msg_tcp()
nmcom_server_proc_tcp()
nmcom_server_proc_enter()
vxvm_start_thread_enter()

DESCRIPTION:
A spinlock or unspinlock may be issued to the replica to check whether to use a checksum in the received packet.
During the lock or unlock operation, if there is a transaction being processed with the replica, which rebuilds the replica object in the kernel, then there is a possibility that the replica referenced in the spinlock is different from the one referenced in the unspinlock (especially when the replica is referenced through several pointers). As a result, the system panics.

RESOLUTION:
The code is modified to set a flag in the port attributes during port creation to indicate whether to use the checksum. Hence, for each packet that is received, only the flag in the port attributes needs to be checked, rather than referencing the replica object. As part of the change, the spinlock and unspinlock statements are also removed.

* 3864983 (Tracking ID: 3614182)

SYMPTOM:
The first system reboot after migration from Solaris Multi-Pathing (MPXIO) to Symantec Dynamic Multi-Pathing (DMP) native support takes an extremely long time. The node may appear hung. This happens when there is a large number of zpools to be migrated. In some cases, the system may take 20+ hours to come up after the first reboot.

DESCRIPTION:
This issue is specific to a configuration where the zpools (and the LUNs hosting the zpools) are shared between multiple systems, and the zpools are already imported on a different system than the one where the MPXIO to DMP migration is being performed. An example of this would be multiple nodes of a Symantec VCS (Veritas Cluster Server) configuration. Such zpools should be skipped during DMP migration. Instead, the migration logic was unnecessarily running time-consuming ZFS commands on each of these zpools. These commands were failing and contributing to extremely long boot-up times.

RESOLUTION:
The DMP migration code was changed to detect the above-mentioned zpools and skip running ZFS commands on them.

* 3864986 (Tracking ID: 3621232)

SYMPTOM:
When the vradmin ibc command is executed to initiate the In-band Control (IBC) procedure, the vradmind (VVR daemon) on the VVR secondary node goes into the disconnected state. Because of this, subsequent IBC procedures or vradmin ibc commands cannot be started or executed on the VVR secondary node, and a message similar to the following appears on the VVR primary node:
VxVM VVR vradmin ERROR V-5-52-532 Secondary is undergoing a state transition. Please re-try the command after some time.
VxVM VVR vradmin ERROR V-5-52-802 Cannot start command execution on Secondary.

DESCRIPTION:
When the IBC procedure reaches the command-finish state, the vradmind on the VVR secondary node goes into a disconnected state, which the vradmind on the primary node fails to realize. In such a scenario, the vradmind on the primary refrains from sending a handshake request to the secondary node, which could change the secondary node's state from disconnected to running. As a result, the vradmind on the secondary node continues to be in the disconnected state, and the vradmin ibc command fails to run on the VVR secondary node despite being in the running state on the VVR primary node.

RESOLUTION:
The code is modified to make sure the vradmind on the VVR primary node is notified when the vradmind on the VVR secondary node goes into the disconnected state. As a result, it can send out a handshake request to take the secondary node out of the disconnected state.
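As an illustrative check (not part of the fix), the replication and connection state of the secondary can be inspected from the primary with the repstatus command before re-running the IBC procedure; the disk group and RVG names below are hypothetical:

# vradmin -g hrdg repstatus hr_rvg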
* 3864987 (Tracking ID: 3513392)

SYMPTOM:
The secondary panics when rebooted while heavy I/Os are going on the primary:
PID: 18862 TASK: ffff8810275ff500 CPU: 0 COMMAND: "vxiod"
#0 [ffff880ff3de3960] machine_kexec at ffffffff81035b7b
#1 [ffff880ff3de39c0] crash_kexec at ffffffff810c0db2
#2 [ffff880ff3de3a90] oops_end at ffffffff815111d0
#3 [ffff880ff3de3ac0] no_context at ffffffff81046bfb
#4 [ffff880ff3de3b10] __bad_area_nosemaphore at ffffffff81046e85
#5 [ffff880ff3de3b60] bad_area_nosemaphore at ffffffff81046f53
#6 [ffff880ff3de3b70] __do_page_fault at ffffffff810476b1
#7 [ffff880ff3de3c90] do_page_fault at ffffffff8151311e
#8 [ffff880ff3de3cc0] page_fault at ffffffff815104d5
#9 [ffff880ff3de3d78] volrp_sendsio_start at ffffffffa0af07e3 [vxio]
#10 [ffff880ff3de3e08] voliod_iohandle at ffffffffa09991be [vxio]
#11 [ffff880ff3de3e38] voliod_loop at ffffffffa0999419 [vxio]
#12 [ffff880ff3de3f48] kernel_thread at ffffffff8100c0ca

DESCRIPTION:
If the replication stage I/Os are started after serialization of the replica volume, the replication port can be deleted and set to NULL while handling replica connection changes. This causes the panic, since it is not checked whether the replication port is still valid before referencing it.

RESOLUTION:
Code changes have been done to abort the stage I/O if the replication port is NULL.

* 3864988 (Tracking ID: 3625890)

SYMPTOM:
After running the vxdisk resize command, the following message is displayed:
"VxVM vxdisk ERROR V-5-1-8643 Device resize failed: Invalid attribute specification"

DESCRIPTION:
Two cylinders are reserved for special usage on CDS (Cross-platform Data Sharing) VTOC (Volume Table of Contents) disks. In the case of expanding a disk to a particular disk size on the storage side, VxVM (Veritas Volume Manager) may calculate the cylinder number as 2, which causes the vxdisk resize to fail with the error message "Invalid attribute specification".

RESOLUTION:
The code is modified to avoid the failure of resizing a CDS VTOC disk.

* 3864989 (Tracking ID: 3521726)

SYMPTOM:
When using the Symantec Replication Option, a system panic happens while freeing memory, with the following stack trace on AIX:
pvthread+011500 STACK:
[0001BF60]abend_trap+000000 ()
[000C9F78]xmfree+000098 ()
[04FC2120]vol_tbmemfree+0000B0 ()
[04FC2214]vol_memfreesio_start+00001C ()
[04FCEC64]voliod_iohandle+000050 ()
[04FCF080]voliod_loop+0002D0 ()
[04FC629C]vol_kernel_thread_init+000024 ()
[0025783C]threadentry+00005C ()

DESCRIPTION:
In certain scenarios, when a write I/O gets throttled or unwound in VVR, the memory related to one of the internal data structures is freed. When this I/O is restarted, the same memory gets illegally accessed and freed again, even though it was already freed. This causes a system panic.

RESOLUTION:
Code changes have been done to fix the illegal memory access issue.

* 3865415 (Tracking ID: 3736502)

SYMPTOM:
When FMR is configured in a VVR environment, 'vxsnap refresh' fails with the below error message:
"VxVM VVR vxsnap ERROR V-5-1-10128 DCO experienced IO errors during the operation. Re-run the operation after ensuring that DCO is accessible".
Also, multiple messages of connection/disconnection of the replication link (rlink) are seen.

DESCRIPTION:
Inherently triggered rlink connection/disconnection causes transaction retries. During a transaction, memory is allocated for Data Change Object (DCO) maps and is not cleared on abortion of the transaction. This leads to a memory leak and eventually to exhaustion of maps.

RESOLUTION:
The fix has been added to clear the allocated DCO maps when a transaction aborts.
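For reference, the operation that hits this error is a snapshot refresh of the following form; the disk group, snapshot, and source volume names are hypothetical examples:

# vxsnap -g datadg refresh snapvol source=datavol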
* 3865631 (Tracking ID: 3564260)

SYMPTOM:
VVR commands are unresponsive when replication is paused and resumed in a loop.

DESCRIPTION:
While Veritas Volume Replicator (VVR) is in the process of sending updates, pausing replication is deferred until acknowledgements of the updates are received or until an error occurs. If, for some reason, the acknowledgements get delayed or the delivery fails, the pause operation continues to get deferred, resulting in unresponsiveness.

RESOLUTION:
The code is modified to resolve the issue that caused the unresponsiveness.

* 3865633 (Tracking ID: 3721565)

SYMPTOM:
A vxconfigd hang is seen with the below stack:
genunix:cv_wait_sig_swap_core
genunix:cv_wait_sig_swap
genunix:pause
unix:syscall_trap32

DESCRIPTION:
In an FMR environment, a write is done on a source volume having a space-optimized (SO) snapshot. Memory is acquired first, and then ILOCKs are acquired on the individual SO volumes for pushed writes. On the other hand, a user write on the SO snapshot will first acquire the ILOCK and then acquire memory. This causes a deadlock.

RESOLUTION:
The code is modified to resolve the deadlock.

* 3865638 (Tracking ID: 3727939)

SYMPTOM:
Data corruption or VTOC label corruption may occur due to stale device entries present in the /dev/vx/[r]dmp directories.

DESCRIPTION:
The /dev/vx/[r]dmp directories, where the DMP device entries are created, are mounted as tmpfs (swap) during a boot cycle. These directories can be unmounted while the vxconfigd(1M) daemon is running and the DMP devices are in use. In such situations, the DMP device entries are recreated on the disk instead of the swap device. The directories are re-mounted on tmpfs during the next boot cycle, so the device entries created on disk eventually become stale. In such scenarios, the subsequent unmount of these directories results in reading those stale entries. Thus, data corruption or VTOC label corruption occurs.

RESOLUTION:
The code is modified to clear the stale entries in the /dev/vx/[r]dmp directories before starting the vxconfigd(1M) daemon.

* 3865640 (Tracking ID: 3749557)

SYMPTOM:
The system hangs and becomes unresponsive because of heavy memory consumption by VxVM.

DESCRIPTION:
In the Dirty Region Logging (DRL) update code path, an erroneous condition was present that led to an infinite loop which keeps on consuming memory. This leads to consumption of large amounts of memory, making the system unresponsive.

RESOLUTION:
The code has been fixed to avoid the infinite loop, hence preventing the hang that was caused by high memory usage.

* 3865645 (Tracking ID: 3841242)

SYMPTOM:
Threads hang, and the stack contains any of the following functions:
ddi_pathname_to_dev_t()
ddi_find_devinfo()
ddi_install_driver()
devinfo_tree_lock
e_ddi_get_dev_info()

The stack may look like the below:
void genunix:cv_wait
void genunix:ndi_devi_enter
int genunix:devi_config_one
int genunix:ndi_devi_config_one
int genunix:resolve_pathname_noalias
int genunix:resolve_pathname
dev_t genunix:ddi_pathname_to_dev_t
void vxdmp:dmp_setbootdev
int vxdmp:_init
int genunix:modinstall

DESCRIPTION:
Oracle has deprecated some APIs (ddi_pathname_to_dev_t(), ddi_find_devinfo(), ddi_install_driver(), devinfo_tree_lock, e_ddi_get_dev_info()) that were used by VxVM (Veritas Volume Manager) and that were not thread safe. If VxVM modules are loaded in parallel with other OS modules while making use of these APIs, it may result in a deadlock, and a hang can be observed.
RESOLUTION:
The deprecated ddi_x() API calls have been replaced with ldi_x() calls, which are thread safe.

* 3865646 (Tracking ID: 3628743)

SYMPTOM:
On Solaris 11.2, a new boot environment takes a long time to start up during a live upgrade. A deadlock is seen in ndi_devi_enter() when loading the VxDMP driver, caused by the VxVM drivers' use of the Solaris ddi_pathname_to_dev_t or e_ddi_hold_devi_by_path private interfaces.

DESCRIPTION:
The deadlocks are caused by the VxVM drivers using the Solaris ddi_pathname_to_dev_t or e_ddi_hold_devi_by_path private interfaces. These routines are for Solaris internal use only and are not multi-thread safe. Normally this is not a problem, as the various VxVM drivers don't unload or detach; however, there are certain conditions under which the _init routines might be called that can expose this deadlock condition.

RESOLUTION:
The code is modified to resolve the deadlock.

* 3865649 (Tracking ID: 3848351)

SYMPTOM:
While uninstalling the Veritas Storage Foundation stack, vxglm fails to stop, as the vxglm driver could not be unloaded:
Veritas InfoScale Enterprise Shutdown did not complete successfully. vxglm failed to stop on

DESCRIPTION:
During GAB initialization, the 'e_ddi_get_dev_info' API calls 'ndi_hold_driver', which increases the driver reference count. During de-initialization, the driver hold count remains non-zero, and the vxglm driver fails to unload.

RESOLUTION:
During GAB de-initialization, the vxglm driver count is decreased using 'ndi_rele_driver'.

* 3865653 (Tracking ID: 3795739)

SYMPTOM:
In a split-brain scenario, cluster formation takes a very long time.

DESCRIPTION:
In a split-brain scenario, the surviving nodes in the cluster try to preempt the keys of the nodes leaving the cluster. If the keys have already been preempted by one of the surviving nodes, the other surviving nodes receive a Unit Attention. DMP (Dynamic Multipathing) then retries the preempt command after a delay of 1 second if it receives a Unit Attention. Cluster formation cannot complete until the PGR keys of all the leaving nodes are removed from all the disks. If the number of disks is very large, the preemption of keys takes a lot of time, leading to a very long time for cluster formation.

RESOLUTION:
The code is modified to avoid adding a delay for the first couple of retries when reading PGR keys. This allows faster cluster formation with arrays that clear the Unit Attention condition sooner.

* 3866241 (Tracking ID: 3790136)

SYMPTOM:
A file system hang can be observed sometimes due to I/Os hung in DRL.

DESCRIPTION:
There might be some I/Os hung in the DRL of a mirrored volume due to an incorrect calculation of the outstanding I/Os on the volume and the number of active I/Os currently in progress on the DRL. The value of the outstanding I/Os on the volume can get modified incorrectly, causing I/Os on the DRL to stop progressing, which in turn results in a hang-like scenario.

RESOLUTION:
Code changes have been done to avoid incorrect modification of the value of outstanding I/Os on the volume and prevent the hang.

* 3866269 (Tracking ID: 3736156)

SYMPTOM:
A Solaris 11 update 1 system fails to come up after enabling dmp_native_support and rebooting.

DESCRIPTION:
Veritas Dynamic Multi-pathing (DMP) doesn't handle EFI-labeled non-SCSI disks correctly. Hence, during startup, the vxconfigd process generates instructions to mark EFI-labeled non-SCSI disks as failed. If the failed disk happens to be the boot disk controlled by Veritas Dynamic Multi-pathing (DMP), the system is stuck during startup.

RESOLUTION:
The Veritas Dynamic Multi-pathing (DMP) code is modified to correctly handle EFI-labeled non-SCSI disks.
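For reference, native support is toggled with the dmp_native_support tunable named in these entries and can be verified as below; this is a sketch of standard usage, and a reboot is required for the change to take effect:

# vxdmpadm settune dmp_native_support=on
# vxdmpadm gettune dmp_native_support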
RESOLUTION:
The Veritas Dynamic Multi-Pathing (DMP) code is modified to correctly handle EFI-labeled non-SCSI disks.

* 3866593 (Tracking ID: 3861544)

SYMPTOM:
For Extensible Firmware Interface (EFI) labeled disks, in some cases Dynamic Multi-Pathing (DMP) disables a disk for I/Os when one of its controllers (switches) fails.

DESCRIPTION:
When an open is issued on a disk, DMP tries to perform the open operation through all of the disk's paths. When one of the controllers (switches) fails, the paths going through that controller can still be in the active state for a short time. For an EFI-labeled disk, when an open is issued on the failed path (the path failed because the controller is down), DMP returns an error even though the open through the other (available) paths of the disk might succeed. Hence the disk is seen as faulted and is unavailable for I/Os.

RESOLUTION:
The code is changed so that the available paths are used to perform the open operation when some of the paths are unavailable.

* 3866624 (Tracking ID: 3769927)

SYMPTOM:
Turning off the dmp_native_support tunable fails with the following errors:
VxVM vxdmpadm ERROR V-5-1-15690 Operation failed for one or more zpools
VxVM vxdmpadm ERROR V-5-1-15686 The following zpool(s) could not be migrated as they are not healthy - <zpool_name>

DESCRIPTION:
Turning off the dmp_native_support tunable fails even if the zpools are healthy. The vxnative script does not allow turning off dmp_native_support if it detects that a zpool is unhealthy, meaning that the zpool state is ONLINE but some action is required to be taken on the zpool. "upgrade zpool" was considered one of the actions indicating an unhealthy zpool state, which is not correct.

RESOLUTION:
The code is modified to treat the "upgrade zpool" action as expected. Turning off the dmp_native_support tunable is supported when the pending action is "upgrade zpool".

* 3866643 (Tracking ID: 3539548)

SYMPTOM:
Adding back an MPIO (Multi Path I/O) disk that had been removed earlier may result in the following two issues:
1. The 'vxdisk list' command shows a duplicate entry for the DMP (Dynamic Multi-Pathing) node in the error state.
2. The 'vxdmpadm listctlr all' command shows duplicate controller names.

DESCRIPTION:
1. Under certain circumstances, the record of a deleted MPIO disk is left in the /etc/vx/disk.info file with its device number set to -1, while its DMP node name is reassigned to another MPIO disk. When the deleted disk is added back, it is assigned the same name without validating for a conflict in the name.
2. When some devices are removed and added back to the system, a new controller is added for each and every discovered path. This leads to duplicate controller entries in the DMP database.

RESOLUTION:
1. The code is modified to properly remove all stale information about any disk before updating MPIO disk names.
2. Code changes have been made to add the controller for selected paths only.

* 3866651 (Tracking ID: 3596282)

SYMPTOM:
FMR (Fast Mirror Resync) operations fail with the error "Failed to allocate a new map due to no free map available in DCO":
vxio: [ID 609550 kern.warning] WARNING: VxVM vxio V-5-3-1721 voldco_allocate_toc_entry: Failed to allocate a new map due to no free map available in DCO of [volume]
It often leads to disabling of the snapshot.

DESCRIPTION:
For instant space-optimized snapshots, stale maps are left behind for DCO (Data Change Object) objects at the time of creation of cache objects.
Over time, as space-optimized snapshots are created that use a new cache object, stale maps accumulate and eventually consume all the available DCO space, resulting in the error.

RESOLUTION:
Code changes have been done to ensure that no stale entries are left behind.

* 3866679 (Tracking ID: 3856146)

SYMPTOM:
Two issues are hit on the latest SRUs of Solaris 11.2.8 and greater and on Solaris SPARC 11.3 when dmp_native_support is on:
1. Turning dmp_native_support on or off requires a reboot. The system panics during the reboot that is performed as part of setting dmp_native_support to off.
2. Sometimes the system comes up after the reboot when dmp_native_support is set to off. In that case, a panic is observed when the system is rebooted after uninstallation of SF, and the system fails to boot up.
The panic string is the same for both issues:
panic[cpu0]/thread=20012000: read_binding_file: /etc/name_to_major file not found

DESCRIPTION:
The issue happened because of the /etc/system and /etc/name_to_major files. As per the discussion with Oracle through SR 3-11640878941, removal of these two files from the boot archive causes the panic: /etc/name_to_major and /etc/system are included in the SPARC boot_archive of Solaris 11.2.8.4.0 (and greater versions) and should not be removed. The system fails to come up if they are removed.

RESOLUTION:
The code has been modified to avoid the panic while setting dmp_native_support to off.

* 3867134 (Tracking ID: 3486861)

SYMPTOM:
The primary node panics with the following stack when storage is removed while replication is going on with heavy I/Os:
oops_end
no_context
page_fault
vol_rv_async_done
vol_rv_flush_loghdr_done
voliod_iohandle
voliod_loop

DESCRIPTION:
In a VVR environment, when a write to the data volume fails on the primary node, error handling is initiated. As a part of it, the SRL header is flushed. Because the primary storage has been removed, the flush fails, and a panic is hit when invalid values are accessed while logging the error message.

RESOLUTION:
The code is modified to resolve the issue.

* 3867135 (Tracking ID: 3658079)

SYMPTOM:
If the size of the backup slice of a volume is larger than 64 GB, VxVM wraps the size around and exports the wrapped value to the LDOM (Logical Domains) guest domain.

DESCRIPTION:
The data types of the variables declared to store the volume's sectors/track and tracks/cylinder values are not large enough to handle the values when the volume size goes beyond 64 GB. Due to this, the size of the cylinder gets wrapped around to 2048 sectors.

RESOLUTION:
The code has been changed to support volumes of up to 1.4 TB in size.

* 3867137 (Tracking ID: 3674614)

SYMPTOM:
Restarting the vxconfigd(1M) daemon on the slave (joiner) node during a node-join operation may cause the vxconfigd(1M) daemon to become unresponsive on the master and the joiner node.

DESCRIPTION:
When the vxconfigd(1M) daemon is restarted in the middle of the node join process, it wrongly assumes that this is a slave rejoin case and sets the rejoin flag. Since the rejoin flag is wrongly set, the import operation of the disk groups on the slave node fails, and the join process is not terminated smoothly. As a result, the vxconfigd(1M) daemon becomes unresponsive on the master and the slave node.

RESOLUTION:
The code is modified to differentiate between the rejoin scenario and the vxconfigd(1M) daemon restart scenario.

* 3867308 (Tracking ID: 3442024)

SYMPTOM:
The adddisk <disk_name> command fails to add a Cross-platform Data Sharing (CDS) EFI disk to a CDS disk group.
The failure occurs with the following messages:
Creating a new disk group named <disk group> containing the disk device <disk name> with the name <disk group>.
VxVM ERROR V-5-2-121 Creating disk group <disk group> with disk device <disk name> failed.
VxVM vxdg ERROR V-5-1-6478 Device <disk name> cannot be added to a CDS disk Group

DESCRIPTION:
The vxdiskadm(1M) option #1 (adding a disk to a disk group with initialization) does not provide the CDS format option for an EFI-labeled disk, which causes the EFI-labeled disk to be formatted as sliced. Subsequently, the adddisk operation fails because a disk with the sliced format cannot be added to a CDS disk group.

RESOLUTION:
The vxdiskadm utility code is modified to allow an EFI-labeled disk to be added to an existing CDS disk group, and such a disk can now be initialized with the CDS format.

* 3867315 (Tracking ID: 3672759)

SYMPTOM:
When the DMP database is corrupted, the vxconfigd(1M) daemon may core dump with the following stack trace:
database is corrupted.
ddl_change_dmpnode_state ()
ddl_data_corruption_msgs ()
ddl_reconfigure_all ()
ddl_find_devices_in_system ()
find_devices_in_system ()
req_change_state ()
request_loop ()
main ()

DESCRIPTION:
The issue is observed because the corrupted DMP database is not properly destroyed.

RESOLUTION:
The code is modified to remove the corrupted DMP database.

* 3867357 (Tracking ID: 3544980)

SYMPTOM:
vxconfigd reports the following error message when scanning disks:
"vxconfigd V-5-1-7920 di_init() failed"

DESCRIPTION:
In the Solaris discovery code path, when the di_init() system call fails, walknode() is called to retrieve the device information. If any error occurs during discovery, all disks disappear after the disk scan. The OS vendor suggested retrying di_init() with a delay when it fails, instead of calling walknode().

RESOLUTION:
Code changes have been made to deprecate the walknode() call and to retry di_init() 3 times with a delay if it fails.

* 3867706 (Tracking ID: 3645370)

SYMPTOM:
After running the vxevac command, if the user tries to roll back or commit the evacuation for a disk containing a DRL plex, the action fails with the following errors:
/etc/vx/bin/vxevac -g testdg commit testdg02 testdg03
VxVM vxsd ERROR V-5-1-10127 deleting plex %1: Record is associated
VxVM vxassist ERROR V-5-1-324 fsgen/vxsd killed by signal 11, core dumped
VxVM vxassist ERROR V-5-1-12178 Could not commit subdisk testdg02-01 in volume testvol
VxVM vxevac ERROR V-5-2-3537 Aborting disk evacuation
/etc/vx/bin/vxevac -g testdg rollback testdg02 testdg03
VxVM vxsd ERROR V-5-1-10127 deleting plex %1: Record is associated
VxVM vxassist ERROR V-5-1-324 fsgen/vxsd killed by signal 11, core dumped
VxVM vxassist ERROR V-5-1-12178 Could not rollback subdisk testdg02-01 in volume testvol
VxVM vxevac ERROR V-5-2-3537 Aborting disk evacuation

DESCRIPTION:
When the user uses the vxevac command, new plexes are created on the target disks. Later, during the commit or rollback operation, VxVM deletes the plexes on the source or the target disks. To delete a plex, VxVM should delete its subdisks first; otherwise, the plex deletion fails with the following error message:
VxVM vxsd ERROR V-5-1-10127 deleting plex %1: Record is associated
The error is displayed because the code does not correctly handle the deletion of subdisks of plexes marked for DRL (dirty region logging).

RESOLUTION:
The code is modified to handle the evacuation of disks with DRL plexes correctly.
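For context, a sketch of the evacuate-and-commit sequence that the messages above refer to (testdg, testdg02, and testdg03 are the illustrative disk group and disk names used in those messages):

# /etc/vx/bin/vxevac -g testdg testdg02 testdg03          (evacuate subdisks from testdg02 onto testdg03)
# /etc/vx/bin/vxevac -g testdg commit testdg02 testdg03   (make the evacuation permanent)

With this fix, the commit (or rollback) step also succeeds when the evacuated disk contains a DRL plex.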
* 3867708 (Tracking ID: 3495548)

SYMPTOM:
For EMC PowerPath-controlled devices under Dynamic Multi-Pathing (DMP) with the Operating System Naming (OSN) scheme, the vxdisk(1M) rm command fails with the following error message:
VxVM vxdisk ERROR V-5-1-639 Failed to obtain locks: :: no such object in the configuration

DESCRIPTION:
The failure occurs because the slice information is incorrectly added to the disk name specified on the command line when the full disk name is generated. Veritas Volume Manager (VxVM) then fails to find this incorrect disk name in its database and reports an error.

RESOLUTION:
The code is modified to properly generate the whole disk name for PowerPath-controlled devices with the OSN naming scheme.

* 3867709 (Tracking ID: 3767531)

SYMPTOM:
In a layered volume layout with an FSS configuration, when some of the FSS hosts are rebooted, a full resync happens on the master for disks that were not affected.

DESCRIPTION:
In a configuration with multiple FSS hosts and a layered volume created across the hosts, when the slave nodes are rebooted, some of the subvolumes on non-affected disks are fully resynced on the master.

RESOLUTION:
Code changes have been made to synchronize only the needed part of the subvolume.

* 3867710 (Tracking ID: 3788644)

SYMPTOM:
When DMP (Dynamic Multi-Pathing) native support is enabled for an Oracle ASM environment, constantly adding and removing DMP devices causes errors like the following:
/etc/vx/bin/vxdmpraw enable oracle dba 775 emc0_3f84
VxVM vxdmpraw INFO V-5-2-6157 Device enabled : emc0_3f84
Error setting raw device (Invalid argument)

DESCRIPTION:
There is a limit (8192, exclusive) on the maximum raw device number N of /dev/raw/rawN. This limit is defined in the boot configuration file. When a raw device is bound to a dmpnode, /dev/raw/rawN is used for the binding. The rawN number was calculated by a one-way incremental process, so even if a device was unbound later on, the "released" rawN number was not reused in the next binding. Once the rawN number exceeded the maximum limit, the error was reported.

RESOLUTION:
The code has been changed to always use the smallest available rawN number instead of calculating it by a one-way incremental process.

* 3867711 (Tracking ID: 3802750)

SYMPTOM:
Once the VxVM (Veritas Volume Manager) volume I/O-shipping functionality is turned on, it is not disabled even after the user issues the correct command to disable it.

DESCRIPTION:
VxVM (Veritas Volume Manager) volume I/O-shipping functionality is turned off by default. The following two commands can be used to turn it on and off:
vxdg -g <dgname> set ioship=on
vxdg -g <dgname> set ioship=off
The command to turn off I/O-shipping was not working as intended because the I/O-shipping flags were not reset properly.

RESOLUTION:
The code is modified to correctly reset the I/O-shipping flags when the user issues the CLI command.

* 3867712 (Tracking ID: 3807879)

SYMPTOM:
Writing the backup EFI GPT disk label during a disk-group flush operation may cause data corruption on volumes in the disk group. The backup label could incorrectly get flushed to the disk public region and overwrite the user data with the backup disk label.

DESCRIPTION:
For EFI disks initialized under VxVM (Veritas Volume Manager), it is observed that during a disk-group flush operation, vxconfigd (the Veritas configuration daemon) could end up writing the EFI GPT backup label into the volume public region, thereby causing user data corruption.
When this issue happens, the real user data is replaced with the backup EFI disk label.

RESOLUTION:
The code is modified to prevent the writing of the EFI GPT backup label during the VxVM disk-group flush operation.

* 3867714 (Tracking ID: 3819832)

SYMPTOM:
No syslog message is seen when DMP detects that a controller has been disabled or enabled.

DESCRIPTION:
Whenever storage is added or removed, DMP detects the failure events or the addition notifications, and as part of discovery it disables or enables the corresponding controller. The message for this event was not getting logged in the syslog file.

RESOLUTION:
Code changes have been made to log a syslog message for controller disable/enable events.

* 3867881 (Tracking ID: 3867145)

SYMPTOM:
When the VVR SRL occupancy is more than 90%, the SRL occupancy is reported only at 10-percent granularity.

DESCRIPTION:
This is an enhancement. Previously, when the SRL occupancy exceeded 90%, it was logged with a 10-percent gap. The enhancement is to log it with 1-percent granularity.

RESOLUTION:
Changes are done to log the syslog messages with 1-percent granularity when the SRL is more than 90% full.

* 3867928 (Tracking ID: 3764326)

SYMPTOM:
VxDMP repeatedly reports warning messages in the system log:
WARNING: VxVM vxdmp V-5-0-2046 : Failed to get devid for device 0x70259720
WARNING: VxVM vxdmp V-5-3-2065 dmp_devno_to_devidstr ldi_get_devid failed for devno 0x13800000a60

DESCRIPTION:
Due to a VxDMP code issue, the device path name was inconsistent between creation and deletion, which left stale device files under /devices. Because some devices do not support Solaris devid operations, the devid-related functions fail for such devices. VxDMP did not skip such devices when creating or removing minor nodes.

RESOLUTION:
The code is modified to address the device path name inconsistency and to skip devid manipulation for third-party devices.

* 3869659 (Tracking ID: 3868444)

SYMPTOM:
The disk header timestamp is updated even if the disk group import fails.

DESCRIPTION:
During a disk group import operation, the disk header timestamps are updated as part of the join operation. If the import fails, this makes it difficult for support to determine which disk has the latest configuration copy when deciding whether a forced disk group import would be safe.

RESOLUTION:
The old disk header timestamp and sequence number are now dumped to the syslog, where they can be referred to when deciding whether a forced disk group import would be safe.

* 3870440 (Tracking ID: 3873489)

SYMPTOM:
"pkgchk -n VRTSvxvm" fails, reporting checksum or modification-time ERRORs for VxVM configuration files.

DESCRIPTION:
On Solaris, configuration files that a delivered patch may change need to have the volatile ("v") file type in the pkgmap file. The file type was not volatile for the VxVM configuration files; hence the pkgchk command reported errors for a few configuration files because of checksum and modification-time changes.

RESOLUTION:
The code has been modified to change the VxVM configuration files' attribute to volatile.

* 3874736 (Tracking ID: 3874387)

SYMPTOM:
The disk header information is sometimes not logged to the syslog even when a disk is missing and the disk group import fails.

DESCRIPTION:
In scenarios where the disk has a configuration copy enabled and an active disk record is found, the disk header information was not logged even though the disk was missing and the disk group import subsequently failed.

RESOLUTION:
The disk header information is now dumped even if the disk record is active and attached to the disk group.
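As a sketch of how the logged header information can be used (the disk group name mydg is a placeholder): after a failed import, compare the disk header timestamps and sequence numbers reported in the syslog across the disks, and only then decide whether a forced import is safe:

# vxdg import mydg       (fails; the disk header details are logged to the syslog)
# vxdg -f import mydg    (forced import, only after verifying the configuration copies)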
* 3874961 (Tracking ID: 3871750)

SYMPTOM:
When run in parallel, VxVM (Veritas Volume Manager) vxstat commands report abnormal disk I/O statistics, like the following:
# /usr/sbin/vxstat -g -u k -dv -i 1 -S
......
dm emc0_2480 4294967210 4294962421 -382676k 4294967.38 4294972.17
......

DESCRIPTION:
After the VxVM I/O statistics code was optimized for systems with large numbers of CPUs and disks, a race condition exists when multiple vxstat commands run to collect disk I/O statistics. It can cause a disk's latest I/O statistic value to become smaller than the previous one; VxVM then treats the value as having overflowed, so an abnormally large I/O statistic value is printed.

RESOLUTION:
Code changes are done to eliminate the race condition.

* 3875230 (Tracking ID: 3554608)

SYMPTOM:
Mirroring a volume on 6.1 creates a larger plex than the original.

DESCRIPTION:
When mirroring a volume, VxVM should ignore the disk alignment and use the DG (disk group) alignment. After VxVM obtained all the configuration from the user command, it did not check whether the disk alignment would be used. Since the disk alignment was not ignored, the length of the mirror was based on the disk alignment. Hence the issue.

RESOLUTION:
Code changes have been made to use the DG alignment instead of the disk alignment when mirroring a volume.

* 3875564 (Tracking ID: 3875563)

SYMPTOM:
While dumping the disk header information, the human-readable timestamp was not converted correctly from the corresponding epoch time.

DESCRIPTION:
When a disk group import fails because one of the disks is missing, the disk header information is dumped to the syslog. However, the human-readable timestamp was not getting converted correctly from the corresponding epoch time.

RESOLUTION:
Code changes were done to dump the disk header information correctly.

Patch ID: 150717-05

* 3372831 (Tracking ID: 2573229)

SYMPTOM:
On RHEL6, the server panics when Dynamic Multi-Pathing (DMP) executes the PERSISTENT RESERVE IN command with the REPORT CAPABILITIES service action on a PowerPath-controlled device.
The following stack trace is displayed:
enqueue_entity at ffffffff81068f09
enqueue_task_fair at ffffffff81069384
enqueue_task at ffffffff81059216
activate_task at ffffffff81059253
pull_task at ffffffff81065401
load_balance_fair at ffffffff810657b7
thread_return at ffffffff81527d30
schedule_timeout at ffffffff815287b5
wait_for_common at ffffffff81528433
wait_for_completion at ffffffff8152854d
blk_execute_rq at ffffffff8126d9dc
emcp_scsi_cmd_ioctl at ffffffffa04920a2 [emcp]
PowerPlatformBottomDispatch at ffffffffa0492eb8 [emcp]
PowerSyncIoBottomDispatch at ffffffffa04930b8 [emcp]
PowerBottomDispatchPirp at ffffffffa049348c [emcp]
PowerDispatchX at ffffffffa049390d [emcp]
MpxSendScsiCmd at ffffffffa061853e [emcpmpx]
ClariionKLam_groupReserveRelease at ffffffffa061e495 [emcpmpx]
MpxDefaultRegister at ffffffffa061df0a [emcpmpx]
MpxTestPath at ffffffffa06227b5 [emcpmpx]
MpxExtraTry at ffffffffa06234ab [emcpmpx]
MpxTestDaemonCalloutGuts at ffffffffa062402f [emcpmpx]
MpxIodone at ffffffffa0624621 [emcpmpx]
MpxDispatchGuts at ffffffffa0625534 [emcpmpx]
MpxDispatch at ffffffffa06256a8 [emcpmpx]
PowerDispatchX at ffffffffa0493921 [emcp]
GpxDispatch at ffffffffa0644775 [emcpgpx]
PowerDispatchX at ffffffffa0493921 [emcp]
GpxDispatchDown at ffffffffa06447ae [emcpgpx]
VluDispatch at ffffffffa068b025 [emcpvlumd]
GpxDispatch at ffffffffa0644752 [emcpgpx]
PowerDispatchX at ffffffffa0493921 [emcp]
GpxDispatchDown at ffffffffa06447ae [emcpgpx]
XcryptDispatchGuts at ffffffffa0660b45 [emcpxcrypt]
XcryptDispatch at ffffffffa0660c09 [emcpxcrypt]
GpxDispatch at ffffffffa0644752 [emcpgpx]
PowerDispatchX at ffffffffa0493921 [emcp]
GpxDispatch at ffffffffa0644775 [emcpgpx]
PowerDispatchX at ffffffffa0493921 [emcp]
PowerSyncIoTopDispatch at ffffffffa04978b9 [emcp]
emcp_send_pirp at ffffffffa04979b9 [emcp]
emcp_pseudo_blk_ioctl at ffffffffa04982dc [emcp]
__blkdev_driver_ioctl at ffffffff8126f627
blkdev_ioctl at ffffffff8126faad
block_ioctl at ffffffff811c46cc
dmp_ioctl_by_bdev at ffffffffa074767b [vxdmp]
dmp_kernel_scsi_ioctl at ffffffffa0747982 [vxdmp]
dmp_scsi_ioctl at ffffffffa0786d42 [vxdmp]
dmp_send_scsireq at ffffffffa078770f [vxdmp]
dmp_do_scsi_gen at ffffffffa077d46b [vxdmp]
dmp_pr_check_aptpl at ffffffffa07834dd [vxdmp]
dmp_make_mp_node at ffffffffa0782c89 [vxdmp]
dmp_decode_add_disk at ffffffffa075164e [vxdmp]
dmp_decipher_instructions at ffffffffa07521c7 [vxdmp]
dmp_process_instruction_buffer at ffffffffa075244e [vxdmp]
dmp_reconfigure_db at ffffffffa076f40e [vxdmp]
gendmpioctl at ffffffffa0752a12 [vxdmp]
dmpioctl at ffffffffa0754615 [vxdmp]
dmp_ioctl at ffffffffa07784eb [vxdmp]
dmp_compat_ioctl at ffffffffa0778566 [vxdmp]
compat_blkdev_ioctl at ffffffff8128031d
compat_sys_ioctl at ffffffff811e0bfd
sysenter_dispatch at ffffffff81050c20

DESCRIPTION:
Dynamic Multi-Pathing (DMP) uses the PERSISTENT RESERVE IN command with the REPORT CAPABILITIES service action to discover target capabilities. On RHEL6, the system panics unexpectedly when DMP executes this command on a PowerPath-controlled device coming from an EMC CLARiiON/VNX array. This bug has been reported to EMC PowerPath engineering.

RESOLUTION:
The Dynamic Multi-Pathing (DMP) code is modified to execute the PERSISTENT RESERVE IN command with the REPORT CAPABILITIES service action only on devices that are not third-party controlled.
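For reference, the SCSI service action involved can be issued manually on Linux with the sg_persist utility from the sg3_utils package (sg_persist is not part of this patch, and the device name below is illustrative):

# sg_persist --in --report-capabilities /dev/sdc

With the fix, DMP no longer sends this request to third-party controlled devices such as PowerPath pseudo devices.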
* 3440232 (Tracking ID: 3408320)

SYMPTOM:
Thin reclamation fails for EMC 5875 arrays with the following message:
# vxdisk reclaim
Reclaiming thin storage on:
Disk : Reclaim Partially Done. Device Busy.

DESCRIPTION:
As a result of recent changes in EMC Microcode 5875, thin reclamation for EMC 5875 arrays fails because the reclaim request length exceeds the maximum "write_same" length supported by the array.

RESOLUTION:
The code has been modified to correctly set the maximum "write_same" length of the array.

* 3455533 (Tracking ID: 3524376)

SYMPTOM:
The Patch ID has been removed from the VxVM Solaris 10 patch PSTAMP.
Old PSTAMP: "6.1.1.000-2014-03-30-150717-05"

DESCRIPTION:
The Patch ID is used only on the Solaris 10 platform. To make the PSTAMP consistent across platforms, Symantec has formulated a unique PSTAMP format for all stack products, namely <patch-version>-<date>-<timestamp>. For example: 6.1.1.000-2014-03-30-19.00.01

RESOLUTION:
To display the patch ID installed on the machine, use the command "showrev -p | grep <pkgname>".

* 3457363 (Tracking ID: 3462171)

SYMPTOM:
When SCSI-3 Persistent Reservation command ioctls are issued on non-SCSI devices, the dmpnode gets disabled with the following messages in the system log:
[..]
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-3-0 dmp_scsi_ioctl: devno=0x11700000002 ret=0x19
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-0-0 [Warn] SCSI error opcode=0x5e returned rq_status=0x7 cdb_status=0x0 key=0x0 asc=0x0 ascq=0x0 on path 279/0x2
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-3-1476 dmp_notify_events: Total number of events = 1
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-3-0 dmp_pr_send_cmd failed with transport error: uscsi_rqstatus = 7ret = -1 status = 0 on dev 279/0x2
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-0-112 [Warn] disabled path 279/0x0 belonging to the dmpnode 302/0x0 due to path failure
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-3-1476 dmp_notify_events: Total number of events = 2
Mar 10 22:44:19 s40sb1 vxdmp: NOTICE: VxVM vxdmp V-5-0-111 [Warn] disabled dmpnode 302/0x0
[..]

DESCRIPTION:
Non-SCSI devices do not support SCSI persistent reservation commands, so SCSI-3 persistent reservation command ioctls on a non-SCSI device fail with unsupported error codes. As a result, Dynamic Multi-Pathing (DMP) ends up treating the ioctl failure as a path error, causing the dmpnode to get disabled.

RESOLUTION:
The DMP code is modified such that SCSI-3 persistent reservation commands are not sent to non-SCSI devices.

* 3470255 (Tracking ID: 2847520)

SYMPTOM:
Users can create a linked volume using the 'vxsnap addmir ... mirdg= mirvol=' CLI. The target volume can then be used to create a snapshot using the 'vxsnap make source= snap= snapdg=' CLI. When such a linked volume is created in a clustered environment and a resize operation is executed on the volume, the target volume can be corrupted. Consequently, if such a target volume is used to create a snapshot, the snapshot is corrupted as well. If the linked volume had a VxFS file system created on it and the user tries to mount a snapshot created using such a corrupted target volume, the mount might fail with the following error messages:
UX:vxfs mount: ERROR: V-3-26883: fsck log replay exits with 12
UX:vxfs mount: ERROR: V-3-26881: Cannot be mounted until it has been cleaned by fsck. Please run "fsck -V vxfs -y /dev/vx/dsk/snapdg/snapvol" before mounting.
DESCRIPTION:
When a linked volume is resized, maps are updated to keep track of the regions that are inconsistent between the source volume and the target volume for the grown/shrunk region. Such a map update should ideally happen from only one node/machine in the cluster. Due to a defect, the map was getting updated concurrently from two different nodes/machines, causing inconsistent maps. When such a map was used to synchronize the target volume, the data on the target volume did not get synchronized correctly, which led to a corrupted target volume.

RESOLUTION:
The code is modified to make sure that the map is updated from only one node if the volume is shared.

* 3470257 (Tracking ID: 3372724)

SYMPTOM:
When the user installs VxVM, the system panics with the following warnings:
vxdmp: WARNING: VxVM vxdmp V-5-0-216 mod_install returned 6
vxspec V-5-0-0 vxspec: vxio not loaded. Aborting vxspec load

DESCRIPTION:
When the user installs VxVM, if the DMP module fails to load, the cleanup procedure fails to reset the statistics timer (which is set during loading). As a result, the timer dereferences a function pointer that has already been unloaded, and the system panics.

RESOLUTION:
The code is modified to perform a complete cleanup when DMP fails to load.

* 3470260 (Tracking ID: 3415188)

SYMPTOM:
The file system or I/O hangs during data replication with the Symantec Replication Option (VVR), with the following stack trace:
schedule()
volsync_wait()
volopobjenter()
vol_object_ioctl()
voliod_ioctl()
volsioctl_real()
vols_ioctl()
vols_compat_ioctl()
compat_sys_ioctl()
sysenter_dispatch()

DESCRIPTION:
One of the Symantec Replication Option structures associated with the Storage Replicator Log (SRL) can become invalid because of an improper locking mechanism in the code, which leads to the I/O or file system hang.

RESOLUTION:
The code is changed to take the appropriate locks to protect the structure.

* 3470262 (Tracking ID: 3077582)

SYMPTOM:
A Veritas Volume Manager (VxVM) volume may become inaccessible, causing read/write operations to fail with the following error:
# dd if=/dev/vx/dsk/<dg>/<volume> of=/dev/null count=10
dd read error: No such device
0+0 records in
0+0 records out

DESCRIPTION:
If I/Os to the disks time out due to hardware failures such as a weak Storage Area Network (SAN) cable link or a Host Bus Adapter (HBA) failure, VxVM assumes that the disk is faulty or slow and sets the failio flag on the disk. Due to this flag, all subsequent I/Os fail with the "No such device" error.

RESOLUTION:
The code is modified such that vxdisk now provides a way to clear the failio flag. To check whether the failio flag is set on the disks, use the vxkprint(1M) utility (under /etc/vx/diag.d). To reset the failio flag, execute the "vxdisk set <disk_name> failio=off" command, or deport and import the disk group that holds these disks.

* 3470265 (Tracking ID: 3326964)

SYMPTOM:
VxVM hangs in a CVM environment in the presence of Fast Mirror Resync (FMR)/FlashSnap operations, with the following stack trace:
voldco_cvm_serialize()
voldco_serialize()
voldco_handle_dco_error()
voldco_mapor_sio_done()
voliod_iohandle()
voliod_loop()
child_rip()
voliod_loop()
child_rip()

DESCRIPTION:
During split-brain testing in the presence of FMR activities, when errors occur on the Data Change Object (DCO), the DCO error-handling code sets up a flag due to which the same error gets set again in its handler. Consequently, the VxVM staged I/O (SIO) loops around the same code and causes the hang.
RESOLUTION:
The code is changed to appropriately handle the scenario.

* 3470270 (Tracking ID: 3403390)

SYMPTOM:
The linked-to volume goes into the NEEDSYNC state if the system crashes while I/Os are ongoing on the linked-from volume, or while the linked-from volume is open by any application.

DESCRIPTION:
In this case, when the system comes up, volume recovery is performed on the linked-from volume, which also recovers the linked-to volumes associated with it. Nevertheless, the linked-to volumes were shown in the NEEDSYNC state even though no recovery was required.

RESOLUTION:
The code is modified to prevent the linked-to volume from going into the NEEDSYNC state in the above-mentioned scenarios, so no recovery is required for the linked-to volume.

* 3470272 (Tracking ID: 3385753)

SYMPTOM:
Replication to the Disaster Recovery (DR) site hangs even though the Replication links (Rlinks) are in the connected state.

DESCRIPTION:
The Symantec Replication Option (Veritas Volume Replicator, VVR) has its own flow-control mechanism to control the amount of data sent over the network under the User Datagram Protocol (UDP), based on network conditions. Under error-prone network conditions that cause timeouts, VVR's flow-control values become invalid, resulting in a replication hang.

RESOLUTION:
The code is modified to ensure valid values for the flow control even under error-prone network conditions.

* 3470274 (Tracking ID: 3373208)

SYMPTOM:
Veritas Dynamic Multi-Pathing (DMP) wrongly sends the SCSI PR OUT command with the Activate Persist Through Power Loss (APTPL) bit set to '0' to arrays that support the APTPL capabilities.

DESCRIPTION:
DMP correctly recognizes the APTPL bit settings and stores them in its database. DMP verifies this information before sending the SCSI PR OUT command so that the APTPL bit can be set appropriately in the command. However, due to a code issue, DMP did not handle the node's device number properly, so the APTPL bit was getting set incorrectly in the SCSI PR OUT command.

RESOLUTION:
The code is modified to handle the node's device number properly in the DMP SCSI command code path.

* 3470275 (Tracking ID: 3417044)

SYMPTOM:
The system becomes unresponsive while creating a Veritas Volume Replicator (VVR) TCP connection. The vxiod kernel thread reports the following stack trace:
mt_pause_trigger()
wait_for_lock()
spinlock_usav()
kfree()
t_kfree()
kmsg_sys_free()
nmcom_connect()
vol_rp_connect()
vol_rp_connect_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION:
When multiple TCP connections are configured and some of these connections are still in the active state, the connection request process function attempts to free a memory block. If this block has already been freed by a previous connection, the kernel thread may become unresponsive on the HP-UX platform.

RESOLUTION:
The code is modified to resolve the issue of freeing a memory block that has already been freed by another connection.

* 3470279 (Tracking ID: 3300418)

SYMPTOM:
VxVM volume operations on shared volumes cause unnecessary read I/Os on disks that have both the configuration copy and the log copy disabled on slaves.

DESCRIPTION:
The unnecessary disk read I/Os are generated on slaves when VxVM refreshes the private region information into memory during a VxVM transaction. In fact, there is no need to refresh the private region information if the configuration copy and log copy are already disabled on the disk.
RESOLUTION:
The code has been changed to skip the refresh when both the configuration copy and the log copy are already disabled on the master and the slaves.

* 3470282 (Tracking ID: 3374200)

SYMPTOM:
A Linux-based system panic or exceptional I/O delays are observed on a volume when a snapshot operation is executed. In case of a panic, the following stack trace is reported:
spin_lock_irqsave()
volpage_getlist_internal()
volpage_getlist()
voldco_needupdate_instant()
volfmr_needupdate_instant()

DESCRIPTION:
Volume Manager uses a system-wide pool of memory to manage I/O and snapshot operations (the paging module). The default size is 6 MB on Linux, which suffices for a 1 TB volume. For snapshot operations, the default size is increased dynamically without considering the I/O performed by other volumes. Thus, during a snapshot operation, contention on the paging module occurs, leading to delays in I/O handling or sometimes panics.

RESOLUTION:
The code is modified such that the default paging module size on Linux is increased to 64 MB, and, to avoid contention during snapshot operations, the paging module size is now increased taking the other online volumes in the system into account. If you still face the issue, you can manually increase the paging module size using the volpagemod_max_memsz tunable.

* 3470287 (Tracking ID: 3271315)

SYMPTOM:
The vxdiskunsetup command with the shred option fails to shred sliced or simple disks on the Solaris x86 platform. Errors of the following format can be seen:
VxVM vxdisk ERROR V-5-1-16576 disk_shred: Shred failed one or more writes
VxVM vxdisk ERROR V-5-1-16658 disk_shred: Shred wrote 1 pages, of which 1 encountered errors

DESCRIPTION:
The error occurs because the 'disk size' check in the vxdiskunsetup command returns an incorrect value for sliced or simple disks on Solaris x86.

RESOLUTION:
The code is modified to correctly check the disk size of simple and sliced disks.

* 3470290 (Tracking ID: 2999871)

SYMPTOM:
The vxinstall(1M) command gets into a hung state when it is invoked through Secure Shell (SSH) remote execution.

DESCRIPTION:
The vxconfigd process started from the vxinstall script fails to close the inherited file descriptors, causing vxinstall to enter the hung state.

RESOLUTION:
The code is modified to handle the inherited file descriptors for the vxconfigd process.

* 3470300 (Tracking ID: 3340923)

SYMPTOM:
Messages like the following are seen in the system log for paths in the unavailable asymmetric access state:
..
kernel: [859680.551729] end_request: I/O error, dev sdc, sector 44515143296
kernel: [859680.552219] VxVM vxdmp V-5-0-112 disabled path 8/0x20 belonging to the dmpnode 201/0x40 due to path failure
kernel: [859690.554947] VxVM vxdmp V-5-0-1652 dmp_alua_update_alua_info: change in aas detected for node=8/32 old_aas: 0 new_aas: 3
kernel: [859690.554980] VxVM vxdmp V-5-0-148 enabled path 8/0x20 belonging to the dmpnode 201/0x40
..

DESCRIPTION:
Paths in the unavailable asymmetric access state do not service I/O. Hence, by design, DMP tries to return a path failure for such paths, which avoids such paths getting selected for I/Os. But due to an issue, DMP returned path okay, causing the path to be enabled back.

RESOLUTION:
The code is modified to fix this issue.

* 3470301 (Tracking ID: 2812161)

SYMPTOM:
In a VVR environment, after the Rlink is detached, the vxconfigd(1M) daemon on the secondary host may hang.
The following stack trace is observed:
cv_wait
delay_common
delay
vol_rv_service_message_start
voliod_iohandle
voliod_loop
...

DESCRIPTION:
There is a race condition if a node crashes on the primary site of VVR and any Rlink is subsequently detached. The vxconfigd(1M) daemon on the secondary site may hang because it is unable to clear the I/Os received from the primary site.

RESOLUTION:
The code is modified to resolve the race condition.

* 3470303 (Tracking ID: 3314647)

SYMPTOM:
The vxcdsconvert(1M) command fails with the following options when the disk group (DG) contains multiple volumes:
/etc/vx/bin/vxcdsconvert -o novolstop -g group move_subdisks_ok=yes evac_subdisks_ok=yes
relayout: ERROR! Plex column offset is not strictly increasing for column/plex 0/

DESCRIPTION:
When the vxcdsconvert(1M) command is invoked with the "group" option, it converts non-CDS disks to Cross-Platform Data Sharing (CDS) formatted disks. The command aligns and resizes subdisks within the DG to make sure that the alignment of all volumes is 8K, and finally sets the CDS flag on the DG. The command may need to relocate subdisks when it formats the disks with the CDS format. To do this, it uses some global variables that indicate the subdisks of the volume that need to be re-analyzed. The same global variables are used to align all volumes at 8K boundaries, and since these global variables were already set during disk initialization, stale values were used during volume alignment, which causes the error.

RESOLUTION:
The code is modified so that before volumes are considered for 8K alignment, the global variables, which indicate whether some subdisks need to be re-analyzed for alignment, are reset.

* 3470322 (Tracking ID: 3399323)

SYMPTOM:
The reconfiguration of the Dynamic Multi-Pathing (DMP) database fails with the following error:
VxVM vxconfigd DEBUG V-5-1-0 dmp_do_reconfig: DMP_RECONFIGURE_DB failed: 2

DESCRIPTION:
As part of the DMP database reconfiguration process, controller information is not removed from the DMP user-land database even though it is removed from the DMP kernel database. This creates an inconsistency between the user-land and kernel-land DMP databases, and the subsequent DMP reconfiguration fails with the above error.

RESOLUTION:
The code changes have been made to properly remove the controller information from the user-land DMP database.

* 3470345 (Tracking ID: 3281004)

SYMPTOM:
For the DMP minimum queue I/O policy with a large number of CPUs, the following issues are observed since the VxVM 5.1 SP1 release:
1. CPU usage is high.
2. I/O throughput is down if there are many concurrent I/Os.

DESCRIPTION:
The earlier minimum queue I/O policy considered the host controller I/O load to select the least loaded path. In VxVM 5.1 SP1, an addition was made to also consider the I/O load of the underlying paths of the selected host-based controllers. However, this resulted in the performance issues, as there was lock contention between the I/O processing functions and the DMP statistics daemon.

RESOLUTION:
The code is modified such that the I/O load of the host controller paths is not considered, in order to avoid the lock contention.

* 3470347 (Tracking ID: 3444765)

SYMPTOM:
In Cluster Volume Manager (CVM), shared volume recovery may take a long time for large configurations.

DESCRIPTION:
In the CVM environment, the volume recovery operation involves the following tasks:
1. Unconditional Data Change Object (DCO) volume recovery.
2. Recovery using the vxvol noderecover(1M) command.
However, the DCO volume recovery is done serially, and the noderecover(1M) command is executed separately for each volume that needs recovery. Therefore, the complete recovery operation can take a long time.

RESOLUTION:
The code changes are done to recover the DCO volume only if required, and the recovery of multiple DCO volumes is now done in parallel. Similarly, a single vxvol noderecover(1M) command is issued for multiple volumes.

* 3470350 (Tracking ID: 3437852)

SYMPTOM:
The system panics when the Symantec Replicator Option goes to PASSTHRU mode. The panic stack trace might look like:
vol_rp_halt()
vol_rp_state_trans()
vol_rv_replica_reconfigure()
vol_rv_error_handle()
vol_rv_errorhandler_callback()
vol_klog_start()
voliod_iohandle()
voliod_loop()

DESCRIPTION:
When the Storage Replicator Log (SRL) gets faulted for any reason, VVR goes into the PASSTHRU mode. At this time, a few updates are erroneously freed. When these updates are later accessed during the regular processing, the access results in a panic because the updates have already been freed.

RESOLUTION:
The code changes have been made so that the updates are not freed erroneously.

* 3470352 (Tracking ID: 3450758)

SYMPTOM:
The slave node panics when it tries to join the cluster, with the following stack:
bad_area_nosemaphore()
page_fault()
vol_plex_iogen()
volobject_iogen()
vol_cvol_volobject_iogen()
vol_cvol_init_iogen()
vol_cache_linkdone()
cvm_msg_cfg_end()
vol_kmsg_request_receive()
vol_kmsg_receiver()
kernel_thread()

DESCRIPTION:
The panic happens when the node generates a staged I/O (SIO) for the plex object in the process of validating the cache object (parent) and plex object (child) associations. Some fields of the plex object that have not been populated are accessed as part of the SIO generation. This access to NULL fields leads to the panic.

RESOLUTION:
The code changes have been made to avoid accessing those NULL fields of the plex object while the slave node joins the cluster.

* 3470353 (Tracking ID: 3236772)

SYMPTOM:
Replication with heavy I/O loads on primary sites results in the following errors on the secondary site:
1. "Transaction aborted waiting for io drain" and/or
2. "vradmin ERROR Lost connection to host"

DESCRIPTION:
A deadlock between a 'transaction' and the messages delivered to the secondary site results in repeated timeouts of transactions on the secondary site. These repeated transaction timeouts cause transaction failures and/or session timeouts between the primary and secondary sites.

RESOLUTION:
The code is modified to resolve the deadlock condition.

* 3470354 (Tracking ID: 3446415)

SYMPTOM:
When a file system shrink operation is performed on FileStore, a different pool may get added to the file system. Example:
# fs create mirrored 8g 2 pool01 protection=disk
# fs list
FS STATUS SIZE LAYOUT MIRRORS COLUMNS USE% NFS SHARED CIFS SHARED SECONDARY TIER POOL LIST
========================= ====== ==== ====== ======= ======= ==== ========== =========== ============== =========
online 8.00G mirrored 2 - 1% no no no pool01
# fs shrinkto primary 4g
# fs list
FS STATUS SIZE LAYOUT MIRRORS COLUMNS USE% NFS SHARED CIFS SHARED SECONDARY TIER POOL LIST
========================= ====== ==== ====== ======= ======= ==== ========== =========== ============== =============
online 4.00G mirrored 2 - 2% no no no pool01, pool02

DESCRIPTION:
While the shrink operation is performed on a volume, the associated DCO volume gets recreated. During this operation, the site tags specified on the command line are not taken into consideration.
Thus, new DCO volumes may get created on a disk that has a different site tag. Due to this, different pools get added to the file system on FileStore.

RESOLUTION:
The code is modified to properly allocate the DCO volume within the specified pool of the disk during the volume shrink operation.

* 3470382 (Tracking ID: 3368361)

SYMPTOM:
When site consistency is configured within a private disk group and Cluster Volume Manager (CVM) is up, the reattach operation of a detached site fails.

DESCRIPTION:
When you try to reattach a detached site configured in a private disk group with CVM up on that node, the reattach operation fails with the following error: "Disk (disk_name) do not have connectivity from one or more cluster nodes". The reattach operation fails because the shared attribute of the disk group is not checked when the disk connectivity check is applied to a private disk group.

RESOLUTION:
The code is modified to make the disk connectivity check explicit for a shared disk group, by checking the shared attribute of the disk group.

* 3470383 (Tracking ID: 3455460)

SYMPTOM:
The vxfmrshowmap and verify_dco_header utilities fail with the following error messages:
vxfmrshowmap: VxVM ERROR V-5-1-15443 seek to for FAILED:Error 0
verify_dco_header: Cannot lseek to offset:

DESCRIPTION:
The issue occurs because large offsets are not handled properly when seeking using 'lseek' as a part of the vxfmrshowmap and verify_dco_header utilities.

RESOLUTION:
The code is modified to properly handle large offsets as a part of the vxfmrshowmap and verify_dco_header utilities.

* 3470384 (Tracking ID: 3440790)

SYMPTOM:
Sometimes the vxassist(1M) command with the mirror parameter and the vxplex(1M) command with the att parameter hang. The hang is observed in the vxtask list output, which shows no progress of the tasks.

DESCRIPTION:
These operations require kernel memory to copy data from one plex to another, and this memory is allocated from a dedicated Veritas Volume Manager (VxVM) memory pool. Due to problems in calculating the free memory in the pool, the task waits forever for memory even though enough memory is available in the pool.

RESOLUTION:
The code is modified to properly compute the free memory available in the VxVM memory pool used for these operations.

* 3470385 (Tracking ID: 3373142)

SYMPTOM:
The manual pages for vxedit and vxassist do not contain details about the updated behavior of these commands.

DESCRIPTION:
1. vxedit manual page: the page explains that if the reserve flag is set for a disk, vxassist does not allocate a data subdisk on that disk unless the disk is specified on the vxassist command line. However, Data Change Object (DCO) volume creation by the vxassist or vxsnap command does not honor the reserve flag.
2. vxassist manual page: the DCO allocation policy has been updated starting from 6.0. The allocation policy may not succeed if there is insufficient disk space; the vxassist command then uses the available space on the remaining disks of the disk group. This may prevent certain disk groups from splitting or moving if the DCO plexes cannot accompany their parent data volume.

RESOLUTION:
The manual pages for both commands have been updated to reflect the new behavioral changes.
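As a sketch of the reserve-flag behavior described above (the disk group, disk, and volume names are placeholders):

# vxedit -g mydg set reserve=on mydg01
# vxassist -g mydg make vol01 1g          (allocation skips the reserved disk mydg01)
# vxassist -g mydg make vol02 1g mydg01   (uses mydg01, because the disk is named explicitly)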
* 3475525 (Tracking ID: 3475521)

SYMPTOM:
During a system reboot, the following error message is displayed on the console:
es_rcm.pl:scripting protocol error

DESCRIPTION:
To print a message, a Solaris RCM script is required to follow the 'name=value' pair syntax. However, the print message in the RCM script did not follow this syntax. As a result, when the script was invoked, it returned an error message.

RESOLUTION:
The code is modified to rectify the syntax of the print message in the RCM script.

* 3490147 (Tracking ID: 3485907)

SYMPTOM:
A panic occurs in the I/O code path. The following stack trace is observed:
...
volkcontext_process()
volkiostart()
vxiostrategy()
...
or
...
voliod_iohandle()
voliod_loop()
...

DESCRIPTION:
When a snapshot reattach operation is in progress on a volume, the metadata of the snapshot gets updated. If any parallel I/O during this operation sees an incorrect state of the metadata, I/Os of zero size are created, which leads to a system panic.

RESOLUTION:
The code is modified to avoid the generation of zero-length I/Os on volumes which are undergoing snapshot operations.

* 3506675 (Tracking ID: 3441356)

SYMPTOM:
The pre-check of the upgrade_start.sh script fails with the following error:
ERROR "VxVM vxprint ERROR V-5-1-15324 Specify a disk group with -g "

DESCRIPTION:
The current logic in upgrade_start.sh expects "RESERVED_DG_BOOT" to be set to "nodg". This is valid only for releases older than 5.x. In the case of newer releases, "RESERVED_DG_BOOT" is set to either "bootdg" or .

RESOLUTION:
The code is modified to change the logic in the upgrade_start.sh script.

* 3506676 (Tracking ID: 3435475)

SYMPTOM:
The vxcdsconvert(1M) conversion process gets aborted for a thin LUN formatted as a simple disk with the Extensible Firmware Interface (EFI) format, with the following error:
VxVM vxcdsconvert ERROR V-5-2-2767 : Unable to add the disk back to the disk group

DESCRIPTION:
The vxcdsconvert(1M) command evacuates subdisks to other disks within the DG before initializing non-CDS disks with the CDS format. Subdisks residing on thin reclaimable disks are marked pending reclamation after having been evacuated to other disks. When such a disk is removed from the DG, the subdisk records still point to the same disk because its subdisks are pending reclamation. For EFI-formatted disks, the public region length of a non-CDS disk is greater than that of a CDS disk. When such a disk is converted from the non-CDS format to the CDS format and is added back to the DG, vxconfigd considers the subdisk residing on the disk to lie beyond the public region space of the converted CDS disk, and it fails the conversion.

RESOLUTION:
The code is modified to not check subdisk boundaries for subdisks pending RECLAMATION, as these records are stale and will be deleted eventually.

* 3506679 (Tracking ID: 3435225)

SYMPTOM:
In a given CVR setup, rebooting the master node causes one of the slaves to panic with the following stack:
pse_sleep_thread
vol_rwsleep_rdlock
vol_kmsg_send_common
vol_kmsg_send_prealloc
cvm_obj_sendmsg_prealloc
vol_rv_async_done
volkcontext_process
voldiskiodone

DESCRIPTION:
The issue is triggered by one of the code paths sleeping in interrupt context.

RESOLUTION:
The code is modified so that sleep is not invoked in interrupt context.

* 3506707 (Tracking ID: 3400504)

SYMPTOM:
After the host-side HBA port is disabled, the extended attributes of some devices are no longer present. This happens even when there is a redundant controller on the host which is in the enabled state.
An example output is shown below, where the 'srdf' attribute of an EMC device (which has multiple paths through multiple controllers) gets affected.
Before the port is disabled:
# vxdisk -e list emc1_4028
emc1_4028 auto:cdsdisk emc1_4028 dg21 online c6t5000097208191154d112s2 srdf-r1
After the port is disabled:
# vxdisk -e list emc1_4028
emc1_4028 auto:cdsdisk emc1_4028 dg21 online c6t5000097208191154d112s2 -

DESCRIPTION:
The code that prints the extended attributes used to print the attributes of the first path in the list of all paths. If the first path belongs to the controller that is disabled, its attributes are empty.

RESOLUTION:
The code is modified to look for a path in the enabled state among all the paths and to print the attributes of that path.

* 3506709 (Tracking ID: 3259732)

SYMPTOM:
In a Clustered Volume Replicator (CVR) environment, if the SRL size grows and this is followed by a slave node leaving and then rejoining the cluster, the rlink is detached.

DESCRIPTION:
After the slave rejoins the cluster, it does not correctly receive and process the SRL resize information received from the master. This means that application writes initiated on this slave may corrupt the SRL, causing the rlink to detach.

RESOLUTION:
The code is modified so that when a slave joins the cluster, the SRL resize related information is correctly received and processed by the slave.

* 3531906 (Tracking ID: 3526500)

SYMPTOM:
Disk I/O failures occur with DMP I/O timeout error messages when the DMP (Dynamic Multi-Pathing) I/O statistics daemon is not running. The following are the timeout error messages:
VxVM vxdmp V-5-3-0 I/O failed on path 65/0x40 after 1 retries for disk 201/0x70
VxVM vxdmp V-5-3-0 Reached DMP Threshold IO TimeOut (100 secs) I/O with start 3e861909fa0 and end 3e86190a388 time
VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x206) on dmpnode 201/0x70

DESCRIPTION:
When I/O is submitted to DMP, DMP sets the start time on the I/O buffer. The value of the start time depends on whether the DMP I/O statistics daemon is running or not. When an I/O was returned as an error from SCSI to DMP, instead of retrying the I/O on alternate paths, DMP failed that I/O with a 300-second timeout error, even though the I/O had spent only a few milliseconds in its execution. This miscalculation of the DMP timeout happens only when the DMP I/O statistics daemon is not running.

RESOLUTION:
The code is modified to calculate an appropriate DMP I/O timeout value when the DMP I/O statistics daemon is not running.

* 3536289 (Tracking ID: 3492062)

SYMPTOM:
Dynamic Multi-Pathing (DMP) fails to get the page 0x83 LUN identifier for EMC Symmetrix LUNs and continuously logs the following error message:
"VxVM vxdmp V-5-3-1984 dmp_restore_callback: devid could not be extracted from pg 0x83 on path"

DESCRIPTION:
If DMP fails to get the page 0x83 LUN identifier for EMC Symmetrix LUNs during discovery, DMP should set a device-identifier-unsupported flag on the corresponding DMP node. There was no code to set this flag; hence the restore daemon identified such devices as a device identifier mismatch case and logged error messages.

RESOLUTION:
The code is modified to mark the device-identifier-unsupported flag if the identifier could not be extracted during the discovery process, so that the restore daemon does not pick the device up as a device identifier mismatch case. Also, the message has been converted from a NOTE message to a LOG message type.

* 3540122 (Tracking ID: 3482026)

SYMPTOM:
The vxattachd(1M) daemon reattaches plexes of a manually detached site.
DESCRIPTION:
The vxattachd daemon reattaches plexes for a manually detached site, that is, a site whose state is OFFLINE. There was no check to differentiate between a manually detached site and a site detached due to I/O failure; hence the vxattachd(1M) daemon brings the plexes online for a manually detached site as well.

RESOLUTION:
The code is modified to differentiate between a manually detached site and a site detached due to I/O failure.

* 3543944 (Tracking ID: 3520991)

SYMPTOM:
The vxconfigd(1M) daemon dumps core due to memory corruption and displays one of the following stack traces:
1.
malloc_y()
malloc_common
misc.xalloc()
dll_iter_next()
ddl_claim_single_disk()
ddl_thread_claim_disk()
ddl_task_start()
volddl_thread_start()
OR
2.
free_y()
free_common()
ddl_name_value_api.nv_free
ddl_vendor_claim_device()
ddl_claim_single_disk()
ddl_thread_claim_disk()
ddl_task_start()
volddl_thread_start()

DESCRIPTION:
The Device Discovery Layer (DDL) maintains a bitmap data structure for the Array Support Libraries (ASLs) present on a host. If the number of ASLs is greater than or equal to 64, then while setting the 64th bit in the bitmap data structure associated with the ASLs, DDL performs an out-of-bounds write operation. This memory corruption causes the vxconfigd daemon to dump core or leads to other unexpected issues.

RESOLUTION:
The Device Discovery Layer (DDL) code is modified to fix this issue.

* 3547093 (Tracking ID: 2422535)

SYMPTOM:
On Solaris, the vxrelocd parameters are hard coded in the vxvm-recover binary, and after installing VxVM patches or the latest packages, the customized options are lost.

DESCRIPTION:
On Solaris, you can modify the vxrelocd parameters in the vxvm-recover binary. But after a Veritas Volume Manager (VxVM) patch or the latest packages are installed, the new vxvm-recover binary overwrites the modified one. As a result, the customized parameters are lost.

RESOLUTION:
On Solaris, a configuration file (/etc/vx/vxvm-recover.conf) is added to hold the modified parameters.

Patch ID: 150717-01

* 3424815 (Tracking ID: 3424704)

SYMPTOM:
The vxbootsetup command fails when localized messages are in use, with the following error:
VxVM vxbootsetup ERROR V-5-2-5651

DESCRIPTION:
The issue occurs because the output of the 'vxdctl mode' command is compared without being converted to English. The output of the command comes in localized form, which leads to the failure of vxbootsetup.

RESOLUTION:
Code changes have been made to convert the output of the 'vxdctl mode' command to English before comparing it.

* 3440980 (Tracking ID: 3107699)

SYMPTOM:
VxDMP causes a system panic after a shutdown or reboot and displays one of the following stack traces:
mutex_enter()
volinfo_ioct()
volsioctl_real()
cdev_ioctl()
dmp_signal_vold()
dmp_throttle_paths()
dmp_process_stats()
dmp_daemons_loop()
thread_start()
OR
panicsys()
vpanic_common()
panic+0x1c()
mutex_enter()
cdev_ioctl()
dmp_signal_vold()
dmp_check_path_state()
dmp_restore_callback()
dmp_process_scsireq()
dmp_daemons()
thread_start()

DESCRIPTION:
In a particular system shutdown/reboot scenario, the DMP (Dynamic Multi-Pathing) I/O statistics daemon tries to call the ioctl functions in the VXIO module while it is being unloaded, and this causes the system panic.

RESOLUTION:
The code is modified to stop the DMP I/O statistics daemon before system shutdown/reboot. A code change was also made to avoid other probes to vxio devices during shutdown.
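For reference, the DMP I/O statistics daemon mentioned above can be inspected and controlled manually with vxdmpadm; a typical sequence looks like the following (illustrative only):

# vxdmpadm iostat show all   (display the statistics gathered by the daemon)
# vxdmpadm iostat stop       (stop the statistics daemon)
# vxdmpadm iostat start      (start it again)

With this fix, the daemon is stopped automatically before shutdown/reboot, so no manual intervention is needed.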
* 3547093 (Tracking ID: 2422535)

SYMPTOM:
On Solaris, the vxrelocd progress parameter is hard-coded in the vxvm-recover binary, and after installing VxVM patches or the latest packages, the specific option is lost.

DESCRIPTION:
On Solaris, you can modify the vxrelocd parameters in the vxvm-recover binary. But after a Veritas Volume Manager (VxVM) patch or the latest packages are installed, the new vxvm-recover binary overwrites the modified one. As a result, the specific parameter is lost.

RESOLUTION:
On Solaris, a configuration file (/etc/vx/vxvm-recover.conf) is added to hold the modified parameters.

Patch ID: 150717-01

* 3424815 (Tracking ID: 3424704)

SYMPTOM:
The vxbootsetup command fails when localized messages are in use, with the following error:

VxVM vxbootsetup ERROR V-5-2-5651

DESCRIPTION:
The issue occurs because the output of the 'vxdctl mode' command is compared without being converted to English. The output of the command comes in localized form, which leads to the failure of vxbootsetup.

RESOLUTION:
Code changes have been made to convert the output of the 'vxdctl mode' command to English before comparing it.

* 3440980 (Tracking ID: 3107699)

SYMPTOM:
VxDMP causes a system panic after a shutdown or reboot and displays one of the following stack traces:

mutex_enter()
volinfo_ioct()
volsioctl_real()
cdev_ioctl()
dmp_signal_vold()
dmp_throttle_paths()
dmp_process_stats()
dmp_daemons_loop()
thread_start()

OR

panicsys()
vpanic_common()
panic+0x1c()
mutex_enter()
cdev_ioctl()
dmp_signal_vold()
dmp_check_path_state()
dmp_restore_callback()
dmp_process_scsireq()
dmp_daemons()
thread_start()

DESCRIPTION:
In a particular system shutdown/reboot scenario, the DMP (Dynamic MultiPathing) I/O statistics daemon tries to call ioctl functions in the VXIO module while it is being unloaded, and this causes the system panic.

RESOLUTION:
The code is modified to stop the DMP I/O statistics daemon before system shutdown/reboot. A code change is also added to avoid other probes to vxio devices during shutdown.

* 3444900 (Tracking ID: 3399131)

SYMPTOM:
The following command fails with an error for a path managed by a Third Party Driver (TPD) that co-exists with DMP:

# vxdmpadm -f disable path=<path_name>
VxVM vxdmpadm ERROR V-5-1-11771 Operation not supported

DESCRIPTION:
Third-party drivers manage devices with or without the co-existence of the Dynamic Multi-Pathing driver. Disabling paths managed by a third-party driver that does not co-exist with DMP is not supported, but due to a bug in the code, disabling paths managed by a third-party driver that co-exists with DMP also failed, because the same flags were set for all third-party driver devices.

RESOLUTION:
The code has been modified to block this command only for third-party drivers that cannot co-exist with DMP.

* 3445233 (Tracking ID: 3339195)

SYMPTOM:
While running vxdiskadm, the following error message is observed:

"/usr/lib/vxvm/voladm.d/bin/exclude.do: syntax error at line 103: `CMD=$' unexpected"

Volume Manager Device Operations
Menu: VolumeManager/Disk/ExcludeDevices

 1  Suppress all paths through a controller from VxVM's view
 2  Suppress a path from VxVM's view
 3  Suppress disks from VxVM's view by specifying a VID:PID combination
 4  Suppress all paths to a disk
 5  Prevent multipathing of all disks on a controller by VxVM
 6  Prevent multipathing of a disk by VxVM
 7  Prevent multipathing of disks by specifying a VID:PID combination
 8  List currently suppressed devices
 ?  Display help about menu
 ?? Display help about the menuing system
 q  Exit from menus

Select an operation to perform: 4
/usr/lib/vxvm/voladm.d/bin/exclude.do: syntax error at line 103: `CMD=$' unexpected

DESCRIPTION:
The script fails because the legacy backtick syntax (`CMD`) for storing a command's output in a variable is not used correctly.

RESOLUTION:
Code changes are made to use the correct syntax by putting in the required backticks.

* 3445234 (Tracking ID: 3358904)

SYMPTOM:
The system panics with the following stack:

dmp_alua_get_owner_state()
dmp_alua_get_path_state()
dmp_get_path_state()
dmp_get_enabled_ctlrs()
dmp_info_ioctl()
gendmpioctl()
dmpioctl()
vol_dmp_ktok_ioctl()
dmp_get_enabled_cntrls()
vx_dmp_config_ioctl()
quiescesio_start()
voliod_iohandle()
voliod_loop()
kernel_thread()

DESCRIPTION:
A system running with Asymmetric Logical Unit Access (ALUA) LUNs sometimes panics during path fault scenarios. This happens due to a possible NULL pointer access in some cases, caused by a bug in the code.

RESOLUTION:
Code changes have been made to fix the bug.

* 3445249 (Tracking ID: 3344796)

SYMPTOM:
A shared disk group import fails. I/O failure messages with a reservation conflict are seen in the system log:

Tue Oct 22 03:40:54.071: I/O error occurred (errno=0x5) on Dmpnode ams_23000_60
Tue Oct 22 03:40:54.160: I/O error occurred on Path c3t50060E801025BEE0d12s2 belonging to Dmpnode ams_23000_61
Tue Oct 22 03:40:54.161: Unmarked as ioerr Path c3t50060E801025BEE0d12s2 belonging to Dmpnode ams_23000_61
Tue Oct 22 03:40:54.161: I/O analysis done as DMP_PATH_OKAY on Path c3t50060E801025BEE0d12s2 belonging to Dmpnode ams_23000_61
Tue Oct 22 03:40:54.161: I/O retry(1) on Path c3t50060E801025BEE0d12s2 belonging to Dmpnode ams_23000_61
Tue Oct 22 03:40:54.461: SCSI error occurred on Path c3t50060E801025BEE0d12s2: opcode=0x2a reported reservation conflict (status=0x18, key=0x0, asc=0x0, ascq=0x0)
Tue Oct 22 03:40:54.461: I/O analysis done as DMP_RSV_CONFLICT_ERR on Path c3t50060E801025BEE0d12s2 belonging to Dmpnode ams_23000_61

DESCRIPTION:
From the I/O domain, the DMP metanode is exported to the LDOM guest as a virtual device.
When any SCSI-3 Persistent Reservation (PR) command is issued on this virtual device, DMP at the I/O domain layer intercepts it and takes the appropriate action. This was not working as expected, causing keys to be missing from some of the paths at the I/O domain layer. Hence, I/O routed through an unregistered path failed with a reservation conflict.

RESOLUTION:
Code changes are made to fix the issue.

* 3445268 (Tracking ID: 3422504)

SYMPTOM:
Path disable/enable messages are seen in the system log of the LDOM guest:

Sat Jan 25 11:20:29.054: Disabled Path c2d2s2 belonging to Dmpnode hitachi_vsp0_1986 due to path failure
Sat Jan 25 11:20:33.053: SCSI error occurred on Path c2d1s2: opcode=0x5f reported unit attention (status=0x2, key=0x6, asc=0x2a, ascq=0x4) reservations released
Sat Jan 25 11:20:33.536: Enabled Path c2d2s2 belonging to Dmpnode hitachi_vsp0_1986

DESCRIPTION:
In the case of LDOMs, DMP (Veritas Dynamic Multipathing) routes SCSI commands to the underlying SCSI device using the ioctl interface. In a specific scenario, this interface made an incorrect decision that caused a path to get disabled; the path was subsequently enabled again.

RESOLUTION:
Code changes were made in DMP to fix the issue.

* 3445991 (Tracking ID: 3421326)

SYMPTOM:
DMP (Veritas Dynamic Multipathing) keeps logging 'copyin failure' messages in the system log repeatedly:

Jan 24 00:53:34 s5120sb0 vxdmp: NOTICE: VxVM vxdmp V-5-3-0 dmp_handler_pgr_out: uscsi_cdbcopyin failed err = 14

DESCRIPTION:
The corresponding log message was kept at the default log level, although in some cases a 'copyin' failure is expected.

RESOLUTION:
The log level of the message has been bumped.

* 3445992 (Tracking ID: 3421322)

SYMPTOM:
1. I/O failure messages with a reservation conflict are logged in the system log of the LDOM guest:

Tue Oct 22 03:40:54.161: I/O analysis done as DMP_PATH_OKAY on Path c3t50060E801025BEE0d12s2 belonging to Dmpnode 3pardata0_940
Tue Oct 22 03:40:54.461: SCSI error occurred on Path c3t50060E801025BEE0d12s2: opcode=0x2a reported reservation conflict (status=0x18, key=0x0, asc=0x0, ascq=0x0)
Tue Oct 22 03:40:54.461: I/O analysis done as DMP_RSV_CONFLICT_ERR on Path c3t50060E801025BEE0d12s2 belonging to Dmpnode 3pardata0_940
Tue Oct 22 03:40:54.461: I/O error occurred (errno=0x5) on Dmpnode ams_23000_60

2. The 'failing' tag is seen against disks in the 'vxdisk list' output of the LDOM guest:

bash-3.2# vxdisk list
3pardata0_940 auto:cdsdisk 3pardata0_940 dg1 online failing

DESCRIPTION:
With disk-based I/O fencing, multipathing software such as Dynamic Multipathing (DMP) performs SCSI-3 Persistent Reservation (SCSI-3 PR) on a virtual storage device (vSCSI), and DMP in the I/O domain intercepts those commands. While processing them, it might generate multiple SCSI commands, which may be retried on retryable errors from the end storage device. The retry passed, but a failure was returned to the LDOM guest due to a code bug.

RESOLUTION:
Code changes were made in DMP to fix the issue.

* 3446010 (Tracking ID: 3421330)

SYMPTOM:
[root@vcst42b-v9 ~]# vxfentsthdw -n
Veritas vxfentsthdw version 6.1.0 Solaris
..
..
Remove key KeyB on node vcst42b-v9 ..................................... Failed
Removing test keys and temporary files, if any...

DESCRIPTION:
The vxfentsthdw compliance utility is provided to test disks for support of SCSI-3 persistent reservations. It verifies, by simulating various scenarios, that the shared storage device can be used for I/O fencing. Some scenarios were failing with DMP (Veritas Dynamic Multipathing) backed virtual devices due to unhandled cases in DMP in the I/O domains.
RESOLUTION:
Code changes were made to fix the unhandled cases. All SCSI-3 scenarios from the vxfentsthdw utility tests now pass for DMP-backed virtual devices in the LDOM guest.

* 3446126 (Tracking ID: 3338208)

SYMPTOM:
Writes from a fenced-out LDOM guest node on an Active-Passive (AP/F) shared storage device fail. I/O failure messages are seen in the system log:

Mon Oct 14 06:03:39.411: I/O retry(6) on Path c0d8s2 belonging to Dmpnode emc_clariion0_48
Mon Oct 14 06:03:39.951: SCSI error occurred on Path c0d8s2: opcode=0x2a reported device not ready (status=0x2, key=0x2, asc=0x4, ascq=0x3) LUN not ready, manual intervention required
Mon Oct 14 06:03:39.951: I/O analysis done as DMP_PATH_FAILURE on Path c0d8s2 belonging to Dmpnode emc_clariion0_48
Mon Oct 14 06:03:40.311: Marked as failing Path c0d1s2 belonging to Dmpnode emc_clariion0_48
Mon Oct 14 06:03:40.671: Disabled Path c0d8s2 belonging to Dmpnode emc_clariion0_48 due to path failure

DESCRIPTION:
Write SCSI commands from a fenced-out host should fail with a reservation conflict from the shared device, and this error code needs to be propagated to the upper layers for appropriate action. In the DMP ioctl interface, DMP first sends the command through the available active paths. If the command failed, it was then unnecessarily retried on passive paths, causing the command to fail with a 'not ready' error code that was propagated to the upper layer instead of the reservation conflict.

RESOLUTION:
Code changes were added so that SCSI I/O commands are not retried on passive paths in the case of an Active-Passive (AP/F) shared storage device.

* 3447306 (Tracking ID: 3424798)

SYMPTOM:
Veritas Volume Manager (VxVM) mirror attach operations (e.g., plex attach, vxassist mirror, and third-mirror break-off snapshot resynchronization) may take a long time under heavy application I/O load. The 'vxtask list' command shows tasks in the 'auto-throttled (waiting)' state for a long time.

DESCRIPTION:
With the AdminIO de-prioritization feature, VxVM administrative I/Os (e.g., plex attach, vxassist mirror, and third-mirror break-off snapshot resynchronization) are de-prioritized under heavy application I/O load, but this can lead to very slow progress of these operations.

RESOLUTION:
The code is modified to disable the AdminIO de-prioritization feature.

* 3447894 (Tracking ID: 3353211)

SYMPTOM:
A. After an EMC Symmetrix BCV (Business Continuance Volume) device switches to read-write mode, continuous vxdmp (Veritas Dynamic Multi Pathing) error messages flood syslog as shown below:

NOTE VxVM vxdmp V-5-3-1061 dmp_restore_node: The path 18/0x2 has not yet aged - 299
NOTE VxVM vxdmp 0 dmp_tur_temp_pgr: open failed: error = 6 dev=0x24/0xD0
NOTE VxVM vxdmp V-5-3-1062 dmp_restore_node: Unstable path 18/0x230 will not be available for I/O until 300 seconds
NOTE VxVM vxdmp V-5-3-1061 dmp_restore_node: The path 18/0x2 has not yet aged - 299
NOTE VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x6) on dmpnode 36/0xD0

B. The DMP metanode, or a path under the DMP metanode, gets disabled unexpectedly.

DESCRIPTION:
A. DMP caches the last discovery NDELAY open for the BCV dmpnode paths. Switching a BCV device to read-write mode is an array-side operation, and in such cases the system administrator is typically required to run either:

1. vxdisk rm

or, in the case of parallel backup jobs:

1. vxdisk offline
2. vxdisk online

This causes DMP to close the cached open, and during the next discovery the device is opened in read-write mode.
If the above steps are skipped, the DMP device goes into a state where one of the paths is in read-write mode and the others remain in NDELAY mode. If the upper layers request a NORMAL open, DMP has code to close the NDELAY cached open and reopen the device in NORMAL mode; when the dmpnode is online, this happens only for one of the paths of the dmpnode.

B. DMP performs error analysis for paths on which I/O has failed. In some cases, the SCSI probes that are sent fail with return values/sense codes that are not handled by DMP. This causes the paths to get disabled.

RESOLUTION:
A. The code is modified in the DMP EMC ASL (Array Support Library) to handle case A for EMC Symmetrix arrays.
B. The DMP code is modified to handle the SCSI conditions correctly for case B.

* 3449714 (Tracking ID: 3417185)

SYMPTOM:
Rebooting the host after the exclusion of a dmpnode, while I/O is in progress on it, leads to a vxconfigd core dump.

DESCRIPTION:
The function that deletes the path after exclusion does not update the corresponding data structures properly. Consequently, rebooting the host after the exclusion of a dmpnode, while I/O is in progress on it, leads to a vxconfigd core dump with the following stack:

ddl_find_devno_in_table ()
ddl_get_disk_policy ()
devintf_add_autoconfig_main ()
devintf_add_autoconfig ()
mode_set ()
req_vold_enable ()
request_loop ()
main ()

RESOLUTION:
Code changes have been made to update the data structures properly.

* 3452709 (Tracking ID: 3317430)

SYMPTOM:
The "vxdiskunsetup" utility throws an error during execution. The following error message is observed:

"Device unexport failed: Operation is not supported"

DESCRIPTION:
In vxdiskunsetup, vxunexport is called without checking whether the disk is exported or checking the CVM protocol version. When the bits are upgraded from 5.1SP1RP4, the CVM protocol version is not updated; hence the error.

RESOLUTION:
Code changes were done so that vxunexport is called only after doing the proper checks.

* 3452727 (Tracking ID: 3279932)

SYMPTOM:
The vxdisksetup and vxdiskunsetup utilities fail on a disk that is part of a deported disk group (DG), even if the '-f' option is specified. The vxdisksetup command fails with the following error:

VxVM vxedpart ERROR V-5-1-10089 partition modification failed : Device or resource busy

The vxdiskunsetup command fails with the following error:

VxVM vxdisk ERROR V-5-1-0 Device appears to be owned by disk group . Use -f option to force destroy.
VxVM vxdiskunsetup ERROR V-5-2-5052 : Disk destroy failed.

DESCRIPTION:
The vxdisksetup and vxdiskunsetup utilities internally call the 'vxdisk' utility. Due to a defect in vxdisksetup and vxdiskunsetup, the vxdisk operation failed on a disk that is part of a deported DG, even when the '-f' option was requested by the user.

RESOLUTION:
Code changes are made to the vxdisksetup and vxdiskunsetup utilities so that the operation succeeds when the "-f" option is specified.

* 3452811 (Tracking ID: 3445120)

SYMPTOM:
The default value of the tunable 'vol_min_lowmem_sz' is not consistent across all platforms.

DESCRIPTION:
The default value of the tunable 'vol_min_lowmem_sz' was set differently on different platforms.

RESOLUTION:
The default value of the tunable 'vol_min_lowmem_sz' is set to 32MB on all platforms.

* 3455455 (Tracking ID: 3409612)

SYMPTOM:
Running "vxtune reclaim_on_delete_start_time <time>" fails if the specified value is outside the range 22:00-03:59 (e.g., setting it to 04:00 or 19:30 fails).

DESCRIPTION:
The tunable reclaim_on_delete_start_time can be set to any time value within 00:00 to 23:59. But because of a wrong regular expression used to parse the time, it could not be set to all values in 00:00-23:59.

RESOLUTION:
The regular expression has been updated to parse the time format correctly. All values in 00:00-23:59 can now be set.
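For illustration, a POSIX extended regular expression that accepts exactly the values 00:00 through 23:59, which is the shape of fix described above, could look like the sketch below. The pattern and the helper function are assumptions for illustration, not the actual vxtune source.

    #include <regex.h>
    #include <stddef.h>

    /* Accepts every time of day from 00:00 through 23:59:
     * hours 00-19 via [01][0-9], hours 20-23 via 2[0-3],
     * minutes 00-59 via [0-5][0-9]. */
    static const char *time_re = "^([01][0-9]|2[0-3]):[0-5][0-9]$";

    /* Returns 1 if 's' is a valid HH:MM start time, else 0. */
    static int valid_start_time(const char *s)
    {
        regex_t re;
        int ok;

        if (regcomp(&re, time_re, REG_EXTENDED | REG_NOSUB) != 0)
            return 0;
        ok = (regexec(&re, s, 0, NULL, 0) == 0);
        regfree(&re);
        return ok;
    }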
* 3456729 (Tracking ID: 3428025)

SYMPTOM:
When a heavy parallel I/O load is issued, a system running Symantec Replication Option (VVR) and configured as the VVR primary panics with the stack below:

schedule
vol_alloc
vol_zalloc
volsio_alloc_kc_types
vol_subdisk_iogen_base
volobject_iogen
vol_rv_iogen
volobject_iogen
vol_rv_batch_write_start
volkcontext_process
vol_rv_start_next_batch
vol_rv_batch_write_done
[...]
vol_rv_batch_write_done
volkcontext_process
vol_rv_start_next_batch
vol_rv_batch_write_done
volkcontext_process
vol_rv_start_next_batch
vol_rv_batch_kio
volkiostart
vol_linux_kio_start
vol_fsvm_strategy

DESCRIPTION:
A heavy parallel I/O load leads to I/O throttling in Symantec Replication Option (VVR). Improper throttle handling leads to a kernel stack overflow.

RESOLUTION:
The I/O throttle is now handled correctly, which avoids the stack overflow and the subsequent panic.

* 3458036 (Tracking ID: 3418830)

SYMPTOM:
Node boot-up hangs while starting vxconfigd if the '/etc/VRTSvcs/conf/sysname' file or the '/etc/vx/vxvm-hostprefix' file is not present. The following messages are seen on the console, after which the boot hangs:

# Starting up VxVM ...
# VxVM general startup...

DESCRIPTION:
While generating a unique host prefix, scanf was called instead of sscanf to fetch the prefix. As a result, while starting vxconfigd, the boot waited for user input because of scanf, which resulted in the hang while booting up the node.

RESOLUTION:
Code changes are done to address this issue.
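The distinction at the heart of this fix is that scanf(3) reads from stdin, and therefore blocks when no input arrives, while sscanf(3) parses a buffer already in memory and returns immediately. A minimal illustrative sketch; the function and buffer are hypothetical:

    #include <stdio.h>

    /* Parse a host prefix out of a line already read from a file. */
    static void parse_prefix(const char *line)
    {
        char prefix[64];

        /* Wrong in a boot-time daemon: scanf("%63s", prefix) reads
         * from stdin and would wait forever for input that never
         * comes, hanging startup. */

        /* Right: parse the in-memory buffer; this returns at once
         * whether or not the conversion succeeds. */
        if (sscanf(line, "%63s", prefix) == 1)
            printf("host prefix: %s\n", prefix);
    }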
* 3458799 (Tracking ID: 3197987)

SYMPTOM:
vxconfigd dumps core when 'vxddladm assign names file=<file>' is executed and the file has one or more invalid values for the enclosure vendor ID or product ID.

DESCRIPTION:
When the input file provided to 'vxddladm assign names file=<file>' has an invalid vendor ID or product ID, the vxconfigd daemon is unable to find the corresponding enclosure being referred to and makes an invalid memory reference. The following stack trace can be seen:

strncasecmp () from /lib/libc.so.6
ddl_load_namefile ()
req_ddl_set_names ()
request_loop ()
main ()

RESOLUTION:
As a fix, the vxconfigd daemon verifies the validity of the input vendor ID and product ID before making a memory reference to the corresponding enclosure in its internal data structures.

* 3470346 (Tracking ID: 3377383)

SYMPTOM:
vxconfigd crashes when a disk under DMP reports a device failure. After this, the following error is seen when a VxVM (Veritas Volume Manager) command is executed:

"VxVM vxdisk ERROR V-5-1-684 IPC failure: Configuration daemon is not accessible"

DESCRIPTION:
If a disk fails and reports a certain failure to DMP (Veritas Dynamic Multipathing), vxconfigd crashes because that error is not handled properly.

RESOLUTION:
The code is modified to properly handle the device failures reported by a failed disk under DMP.

* 3498923 (Tracking ID: 3087893)

SYMPTOM:
EMC PowerPath pseudo device mappings change with each reboot with VxVM (Veritas Volume Manager).

DESCRIPTION:
VxVM invokes the PowerPath command 'powermt display unmanaged' to discover PowerPath unmanaged devices. This command destroys PowerPath device mappings during the early boot stage, when PowerPath is not fully up.

RESOLUTION:
EMC fixed the issue by introducing the environment variable MPAPI_EARLY_BOOT for the powermt command. The VxVM startup script sets the variable to TRUE before calling the powermt command; powermt recognizes the early boot phase and behaves accordingly. The variable is unset by VxVM after device discovery.

Patch ID: 152134-01

* 3655626 (Tracking ID: 3616753)

SYMPTOM:
On Solaris 11, the odm module does not load automatically after rebooting the machine.

DESCRIPTION:
The ODM service is offline and /dev/odm is not mounted, due to which the ODM module does not load automatically after rebooting the machine.

RESOLUTION:
Code is added to bring up the ODM services as soon as the package is installed.

* 3864145 (Tracking ID: 3451730)

SYMPTOM:
Installation of VRTSodm and VRTSvxfs in a zone fails when running 'zoneadm -z <zone> attach -U'.

DESCRIPTION:
When you upgrade a zone using the attach -U option, the checkinstall script is executed. Certain zone-irrelevant commands in the checkinstall script (which should not be executed during attach) failed the installation of VRTSodm and VRTSvxfs.

RESOLUTION:
Code is added in the postinstall script to fix the checkinstall script.

* 3864248 (Tracking ID: 3757609)

SYMPTOM:
High CPU usage because of contention over ODM_IO_LOCK.

DESCRIPTION:
While performing ODM IO, ODM_IO_LOCK is taken to update some of the ODM counters, which leads to contention when multiple iodones try to update these counters at the same time. This results in high CPU usage.

RESOLUTION:
The code is modified to remove the lock contention.

* 3864254 (Tracking ID: 3832329)

SYMPTOM:
On Solaris 11.2, a system panic occurs because fop_getattr() finds a vnode with a NULL v_vfsp.

DESCRIPTION:
ODM is unmounted in the global zone while it is mounted in the local zone. When the file /dev/odm/ctl is accessed from the global zone, its vnode has a NULL v_vfsp. The stack is as follows:

- die
- trap
- ktl0
- fop_getattr
- cstat
- cstatat
- syscall_trap

RESOLUTION:
The code is modified to make sure that whenever ODM is mounted in a local zone, it has already been mounted in the global zone.

Patch ID: 152144-01

* 3876345 (Tracking ID: 3865248)

SYMPTOM:
Allow asynchronous and synchronous lock calls for the same lock level.

DESCRIPTION:
GLM previously disallowed mixing asynchronous and synchronous lock calls on the same lock for the same level throughout the lifetime of the lock. This patch relaxes that restriction so that clients can make both synchronous and asynchronous lock calls.

RESOLUTION:
Code is modified to allow asynchronous and synchronous lock calls for the same lock level.

Patch ID: 150736-03

* 3652109 (Tracking ID: 3553328)

SYMPTOM:
During internal testing it was found that the per-node LCT file was corrupted, due to which attribute inode reference counts were mismatched, resulting in fsck failure.

DESCRIPTION:
During clone creation, the LCT from the 0th pindex is copied to the new clone's LCT. Any update to this LCT file from a non-zero pindex can cause a count mismatch in the new fileset.

RESOLUTION:
The code is modified to handle this issue.

* 3690067 (Tracking ID: 3615043)

SYMPTOM:
At times, data could be missed while writing to a file.

DESCRIPTION:
While writing to a file when delayed allocation is on, Solaris could dishonor the NON_CLUSTERING flag and cluster pages beyond the range for which the flushing was issued, leading to data loss.

RESOLUTION:
The flag is now cleared and the exact range is flushed in the dalloc case.

* 3729811 (Tracking ID: 3719523)

SYMPTOM:
'vxupgrade' does not clear the superblock replica of old layout versions.
DESCRIPTION:
While upgrading the file system to a new layout version, a new superblock inode is allocated and an extent is allocated for the replica superblock. After writing the new superblock (primary + replica), VxFS frees the extent of the old superblock replica. Now, if the primary superblock gets corrupted, the full fsck searches for a replica to repair the file system. If it finds the replica of the old superblock, it restores the file system to the old layout instead of the new one, which is wrong behavior. To take the file system to the new version, the replica of the old superblock should be cleared as part of vxupgrade, so that the full fsck does not detect it later.

RESOLUTION:
The replica of the old superblock is cleared as part of vxupgrade.

* 3852733 (Tracking ID: 3729158)

SYMPTOM:
The fuser and other commands hang on VxFS file systems.

DESCRIPTION:
The hang is seen while two threads contend for two locks, ILOCK and PLOCK: the writeadvise thread owns the ILOCK and is waiting for the PLOCK, while the dalloc thread owns the PLOCK and is waiting for the ILOCK.

RESOLUTION:
The code is modified to correct the order of locking: PLOCK is now taken before ILOCK.
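The deadlock pattern behind this hang, and the fix of imposing one global acquisition order, can be illustrated with two pthread mutexes standing in for PLOCK and ILOCK. This is a sketch, not the VxFS locking code:

    #include <pthread.h>

    static pthread_mutex_t plock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t ilock = PTHREAD_MUTEX_INITIALIZER;

    /* Deadlock-prone interleaving: thread A takes ILOCK and waits
     * on PLOCK while thread B takes PLOCK and waits on ILOCK - the
     * writeadvise/dalloc cycle described above.
     *
     * The fix is a rule, not a new lock: every code path acquires
     * PLOCK first and ILOCK second, so a circular wait cannot form. */
    static void locked_op(void (*work)(void))
    {
        pthread_mutex_lock(&plock);   /* always first  */
        pthread_mutex_lock(&ilock);   /* always second */
        work();
        pthread_mutex_unlock(&ilock);
        pthread_mutex_unlock(&plock);
    }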
* 3859806 (Tracking ID: 3451730)

SYMPTOM:
Installation of VRTSodm and VRTSvxfs in a zone fails when running 'zoneadm -z <zone> attach -U'.

DESCRIPTION:
When you upgrade a zone using the attach -U option, the checkinstall script is executed. Certain zone-irrelevant commands in the checkinstall script (which should not be executed during attach) failed the installation of VRTSodm and VRTSvxfs.

RESOLUTION:
Code is added in the postinstall script to fix the checkinstall script.

* 3864007 (Tracking ID: 3558087)

SYMPTOM:
When the stat system call is executed on a VxFS file system with the delayed allocation feature enabled, it may take a long time or cause high CPU consumption.

DESCRIPTION:
When the delayed allocation (dalloc) feature is turned on, the flushing process takes much time. The process keeps the getpage lock held and needs writers to keep the inode reader-writer lock held, so the stat system call may keep waiting for the inode reader-writer lock.

RESOLUTION:
The delayed allocation code is redesigned to keep the getpage lock unlocked while flushing.

* 3864010 (Tracking ID: 3269553)

SYMPTOM:
VxFS returns an inappropriate message for a read of a hole via ODM.

DESCRIPTION:
Sometimes sparse files containing temp or backup/restore files are created outside the Oracle database, and Oracle can read these files only using ODM. As a result, ODM fails with an ENOTSUP error.

RESOLUTION:
The code is modified to return zeros instead of an error.

* 3864013 (Tracking ID: 3811849)

SYMPTOM:
The system panics due to a size mismatch in the cluster-wide buffers containing hash bucket data. The offending stack looks like this:

$cold_vm_hndlr
bubbledown
as_ubcopy
vx_populate_bpdata
vx_getblk_clust
$cold_vx_getblk
vx_exh_getblk
vx_exh_get_bucket
vx_exh_lookup
vx_dexh_lookup
vx_dirscan
vx_dirlook
vx_pd_lookup
vx_lookup_pd
vx_lookup
lookupname
lstat
syscall

On some platforms, LDH corruption can be reported instead of a panic. A full fsck can report metadata inconsistencies that look like the sample messages below:

fileset 999 primary-ilist inode 263 has invalid alternate directory index (fileset 999 attribute-ilist inode 8193), clear index? (ynq)y
fileset 999 primary-ilist inode 29879 has invalid alternate directory index (fileset 999 attribute-ilist inode 8194), clear index? (ynq)y
fileset 999 primary-ilist inode 1070691 has invalid alternate directory index (fileset 999 attribute-ilist inode 24582), clear index? (ynq)y
fileset 999 primary-ilist inode 1262102 has invalid alternate directory index (fileset 999 attribute-ilist inode 8198), clear index? (ynq)y

DESCRIPTION:
On a very fragmented file system with a block size of 1K, 2K, or 4K, any segment of the hash inode (i.e., a bucket/CDF/directory segment with a fixed size of 8K) can spread across multiple extents. Instead of initializing the buffers on the final bmap after all allocations are finished, the LDH code allocates the buffer-cache buffers as the allocations come along. As a result, small allocations can be merged in the final bmap; for example, two CFS nodes can end up having buffers representing the same metadata with different sizes. This leads to panics when the buffers are passed around the cluster, or the corruption reaches the LDH portions on the disk.

RESOLUTION:
The code is modified to separate the allocation and buffer initialization in the LDH code paths.

* 3864035 (Tracking ID: 3790721)

SYMPTOM:
High CPU usage on vxfs threads. The backtrace of such threads usually looks like this:

schedule
schedule_timeout
__down
down
vx_send_bcastgetemapmsg_remaus
vx_send_bcastgetemapmsg
vx_recv_getemapmsg
vx_recvdele
vx_msg_recvreq
vx_msg_process_thread
vx_kthread_init
kernel_thread

DESCRIPTION:
The locking mechanism in vx_send_bcastgetemapmsg_process() is inefficient: every time the function is called, it performs a series of down-up operations on a certain semaphore. This can result in a huge CPU cost when multiple threads contend on this semaphore.

RESOLUTION:
The locking mechanism in vx_send_bcastgetemapmsg_process() is optimized so that it does the down-up operation on the semaphore only once.

* 3864036 (Tracking ID: 3233276)

SYMPTOM:
On a 40 TB file system, the 'fsclustadm setprimary' command takes more than 2 minutes to execute, and an unmount operation that causes a primary migration also takes more time.

DESCRIPTION:
The old primary needs to process the delegated allocation units while migrating from primary to secondary. The inefficient implementation of the allocation unit list consumes more time while removing an element from the list. As the file system size increases, the allocation unit list grows as well, resulting in additional migration time.

RESOLUTION:
The code is modified to process the allocation unit list efficiently, as sketched below. With this modification, the primary migration completes in 1 second on the 40 TB file system.
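The cost being removed is the linear scan that a naive list needs before it can unlink an element; with a doubly linked node, the unlink is constant time given a pointer to the node. A minimal sketch (the structure and function names are hypothetical, not the VxFS source):

    #include <stddef.h>

    /* Doubly linked allocation-unit record. */
    struct au_node {
        struct au_node *prev, *next;
    };

    /* O(1) unlink: no walk from the head of the list, so releasing
     * every delegated AU stays fast even as the list grows with the
     * file system size. */
    static void au_unlink(struct au_node *n)
    {
        if (n->prev)
            n->prev->next = n->next;
        if (n->next)
            n->next->prev = n->prev;
        n->prev = n->next = NULL;
    }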
* 3864037 (Tracking ID: 3616907)

SYMPTOM:
While performing the garbage collection operation, VxFS causes the non-maskable interrupt (NMI) service to stall.

DESCRIPTION:
With a highly fragmented Reference Count Table (RCT), the CPU can be held for a long duration while a garbage collection operation is performed. The CPU can stay busy even if no entry that could be freed is identified.

RESOLUTION:
The code is modified such that the CPU is released after a specified time interval.

* 3864040 (Tracking ID: 3633683)

SYMPTOM:
"top" command output shows a vxfs thread consuming high CPU while running an application that makes excessive sync() calls.

DESCRIPTION:
To process a sync() system call, vxfs scans through the inode cache, which is a costly operation. If a user application issues excessive sync() calls while vxfs file systems are mounted, this can make the vxfs sync processing thread consume high CPU.

RESOLUTION:
All sync() requests issued within the last 60 seconds are combined into a single request.

* 3864042 (Tracking ID: 3466020)

SYMPTOM:
The file system is corrupted, with the following error messages in the log:

WARNING: msgcnt 28 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile file system dir inode 3277090 dev/block 0/0 diren
WARNING: msgcnt 27 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile file system dir inode 3277090 dev/block 0/0 diren
WARNING: msgcnt 26 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile file system dir inode 3277090 dev/block 0/0 diren
WARNING: msgcnt 25 mesg 096: V-2-96: vx_setfsflags - /dev/vx/dsk/a2fdc_cfs01/trace_lv01 file system fullfsck flag set - vx_direr
WARNING: msgcnt 24 mesg 008: V-2-8: vx_direrr: vx_dexh_keycheck_1 - /TraceFile file system dir inode 3277090 dev/block 0/0 diren

DESCRIPTION:
If an error is returned from the vx_dirbread() function via the vx_dexh_keycheck1() function, the FULLFSCK flag is set on the file system unconditionally. A corrupted Large Directory Hash (LDH) can lead to the incorrect block being read, which results in the FULLFSCK flag being set. The system does not verify whether it read an incorrect value due to a corrupted LDH, so the FULLFSCK flag is set unnecessarily, because a corrupted LDH can be fixed online by recreating the hash.

RESOLUTION:
The code is modified such that when LDH corruption is detected, the system removes the LDH instead of setting FULLFSCK. The LDH is recreated the next time the directory is modified.

* 3864141 (Tracking ID: 3647749)

SYMPTOM:
An obsolete v_path is created for a VxFS vnode when the following steps are performed:
1) Create a file (file1).
2) Delete the file (file1).
3) Create a new file (file2, which has the same inode number as file1).
4) The vnode of file2 has an obsolete v_path; it still shows file1.

DESCRIPTION:
When VxFS reuses an inode, it performs clear or reset operations to clean up the obsolete information. However, the corresponding Solaris vnode may not be properly handled, which leads to the obsolete v_path.

RESOLUTION:
The code is modified to call the vn_recycle() function in the VxFS inode clear routine to reset the corresponding Solaris vnode.

* 3864148 (Tracking ID: 3695367)

SYMPTOM:
A volume cannot be removed from a multi-volume VxFS file system using the "fsvoladm" command; the command fails with an "Invalid argument" error.

DESCRIPTION:
Volumes were not being added to the in-core volume list structure correctly. Therefore, removing a volume from a multi-volume VxFS file system using "fsvoladm" failed.

RESOLUTION:
The code is modified to add volumes to the in-core volume list structure correctly.

* 3864150 (Tracking ID: 3602322)

SYMPTOM:
The system may panic while flushing the dirty pages of an inode.

DESCRIPTION:
The panic may occur due to a synchronization problem between one thread that flushes the inode and another thread that frees the chunks containing the inodes on the freelist. The thread that frees the chunks of inodes on the freelist grabs an inode and clears/de-references the inode pointer while deinitializing the inode. This may result in a bad pointer de-reference if the flusher thread is working on the same inode.

RESOLUTION:
The code is modified to resolve the race condition by taking proper locks on the inode and freelist whenever a pointer in the inode is de-referenced. If the inode pointer is already de-initialized to NULL, the flushing is attempted on the next inode.
* 3864155 (Tracking ID: 3707662)

SYMPTOM:
A race between reorg processing and the fsadm timer thread (alarm expiry) leads to a panic in vx_reorg_emap with the following stack:

vx_iunlock
vx_reorg_iunlock_rct_reorg
vx_reorg_emap
vx_extmap_reorg
vx_reorg
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
fop_ioctl
ioctl

DESCRIPTION:
When the timer expires (fsadm with the -t option), vx_do_close() calls vx_reorg_clear() on the local mount, which performs cleanup on the reorg RCT inode. Another thread currently active in vx_reorg_emap() then panics due to a NULL pointer dereference.

RESOLUTION:
When fop_close is called in alarm handler context, the cleanup is deferred until the kernel thread performing the reorg completes its operation.

* 3864156 (Tracking ID: 3662284)

SYMPTOM:
A File Change Log (FCL) read may return ENXIO as follows:

# file changelog
changelog: ERROR: cannot read `changelog' (No such device or address)

DESCRIPTION:
VxFS reads the FCL file and returns ENXIO when there is a HOLE in the file.

RESOLUTION:
The code is modified to zero out the user buffer when hitting a hole, if the FCL read is from user space.

* 3864158 (Tracking ID: 2560032)

SYMPTOM:
The system may panic while upgrading VRTSvxfs in the presence of a zone mounted on VxFS.

DESCRIPTION:
When the upgrade happens from the base version to the target version, the postinstall script unloads the base-level fdd module and loads the target-level fdd module while the VxFS module is still at the base-version level. This leads to an inconsistency between the file device driver (fdd) and VxFS modules.

RESOLUTION:
The postinstall script is modified so as to avoid the inconsistency.

* 3864160 (Tracking ID: 3691633)

SYMPTOM:
Unnecessary RCQ Full messages appear in the system log.

DESCRIPTION:
Too many unnecessary RCQ Full messages were being logged in the system log.

RESOLUTION:
The RCQ Full messages are removed from the code.

* 3864161 (Tracking ID: 3708836)

SYMPTOM:
When fallocate is used together with a delayed extending write, data corruption may happen.

DESCRIPTION:
When doing fallocate after EOF, VxFS grows the file by splitting the last extent of the file into two parts, then converts the part after EOF into a ZFOD extent. During this procedure, a stale file size is used to calculate the start offset of the newly zeroed extent. This may overwrite blocks containing unflushed data generated by the extending write and cause data corruption.

RESOLUTION:
The code is modified to use the up-to-date file size instead of the stale file size, to make sure the new ZFOD extent is created correctly.

* 3864164 (Tracking ID: 3762125)

SYMPTOM:
The directory size sometimes keeps increasing even though the number of files inside it does not increase.

DESCRIPTION:
This only happens on CFS. A variable in the directory inode structure marks the start of the directory free space, but when the directory ownership changes, the variable may become stale, which causes this issue.

RESOLUTION:
The code is modified to reset this free-space-marking variable when there is an ownership change. The space search then goes from the beginning of the directory inode.

* 3864165 (Tracking ID: 3751049)

SYMPTOM:
The umountall operation fails on Solaris with the error "V-3-20358: cannot open mnttab".

DESCRIPTION:
On Solaris, fopen() normally returns an EMFILE error for 32-bit applications if it attempts to associate a stream with a file accessed by a file descriptor with a value greater than 255. When umountall is used to unmount more than 256 file systems, the command forks child processes and opens more than 256 file descriptors at the same time. This crosses the 256 file descriptor limit and causes the operation to fail.

RESOLUTION:
The "F" mode is used in the fopen call to avoid the 256 file descriptor limitation.
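Solaris fopen(3C) documents the "F" mode character for exactly this situation in 32-bit processes. A one-line illustration of the fix described above; the wrapper function is hypothetical:

    #include <stdio.h>

    /* In a 32-bit Solaris process, a plain fopen("/etc/mnttab", "r")
     * fails with EMFILE once the process already holds 256 open
     * descriptors, because legacy stdio stored the fd in one byte.
     * The "F" mode character lifts that limit. */
    FILE *open_mnttab(void)
    {
        return fopen("/etc/mnttab", "rF");
    }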
* 3864167 (Tracking ID: 3735697)

SYMPTOM:
vxrepquota reports errors like:

# vxrepquota -u /vx/fs1
UX:vxfs vxrepquota: ERROR: V-3-20002: Cannot access /dev/vx/dsk/sfsdg/fs1:ckpt1: No such file or directory
UX:vxfs vxrepquota: ERROR: V-3-24996: Unable to get disk layout version

DESCRIPTION:
vxrepquota checks each mount point entry in the mounted file system table. If any checkpoint mount point entry is present before the mount point specified in the vxrepquota command, vxrepquota reports errors, although the command can still succeed.

RESOLUTION:
Checkpoint mount points are now skipped in the mounted file system table.

* 3864170 (Tracking ID: 3743572)

SYMPTOM:
The file system may hang when reaching the 1 billion inode limit. The hung stack is as follows:

vx_svar_sleep_unlock
vx_event_wait
vx_async_waitmsg
vx_msg_send
llt_msgalloc
vx_cfs_getias
vx_update_ilist
vx_find_partial_au
vx_cfs_noinode
vx_noinode
vx_dircreate_tran
vx_pd_create
vx_dirlook
vx_create1_pd
vx_create1
vx_create_vp
vx_create

DESCRIPTION:
The maximum number of inodes supported by VxFS is 1 billion. When the file system is running out of inodes and the maximum inode allocation unit (IAU) limit is reached, VxFS can still create two extra IAUs if there is a hole in the last IAU. Because of the hole, when a secondary requests more inodes, the primary still thinks a hole is available and notifies the secondary to retry. However, the secondary fails to find a slot since the 1 billion limit is hit, and goes back to the primary to request free inodes again. This loops infinitely, hence the hang.

RESOLUTION:
When the maximum IAU number is reached, the primary is prevented from creating the extra IAUs.

* 3864173 (Tracking ID: 3779916)

SYMPTOM:
vxfsconvert fails to upgrade the layout version for a VxFS file system with a large number of inodes. The error message shows an inode discrepancy.

DESCRIPTION:
vxfsconvert walks through the ilist and converts inodes. It stores chunks of inodes in a buffer and processes them as a batch. The inode number parameter for this inode buffer is of type unsigned integer. The offset of a particular inode in the ilist is calculated by multiplying the inode number by the size of the inode structure. For large inode numbers, this product inode_number * inode_size can overflow the unsigned integer limit, giving a wrong offset within the ilist file. vxfsconvert therefore reads the wrong inode and eventually fails.

RESOLUTION:
The inode number parameter is defined as unsigned long to avoid the overflow.
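The overflow is the familiar pattern of doing the multiplication in 32-bit arithmetic and only then widening the result. An illustrative sketch; the inode size is a made-up constant, not the VxFS on-disk value:

    #include <stdint.h>

    #define INODE_SIZE 256u   /* illustrative inode structure size */

    /* Broken: the multiply happens in 32 bits, so for a large inum
     * it wraps before being widened to 64 bits, yielding a bogus
     * ilist offset and hence the wrong inode. */
    static uint64_t ilist_offset_bad(unsigned int inum)
    {
        return inum * INODE_SIZE;
    }

    /* Fixed: widen the operand first so the multiply is 64-bit. */
    static uint64_t ilist_offset(uint64_t inum)
    {
        return inum * INODE_SIZE;
    }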
* 3864175 (Tracking ID: 3804400)

SYMPTOM:
/opt/VRTS/bin/cp does not return any error when the quota hard limit is reached and a partial write is encountered.

DESCRIPTION:
When the quota hard limit is reached, /opt/VRTS/bin/cp encounters a partial write, but it does not return any error to the upper-layer application in this situation.

RESOLUTION:
The code is modified so that /opt/VRTS/bin/cp detects the partial write caused by the quota limit and returns a proper error to the upper-layer application.

* 3864177 (Tracking ID: 3808033)

SYMPTOM:
After a service group is set offline via VOM or VCS, an Oracle process is left in an unkillable state.

DESCRIPTION:
Whenever ODM issues an async request to FDD, FDD is required to do iodone processing on it, regardless of how far the request gets. A forced unmount causes FDD to take one of the early error branches, which misses the iodone routine for that particular async request. From ODM's perspective the request is submitted, but iodone will never be called. This has several bad consequences, one of which is a user thread blocked uninterruptibly forever if it waits for the request.

RESOLUTION:
The code is modified to add the iodone routine in the error handling code.

* 3864178 (Tracking ID: 1428611)

SYMPTOM:
The 'vxcompress' command can cause many GLM block lock messages to be sent over the network. This can be observed in the 'glmstat -m' output under the section "proxy recv", as shown in the example below:

bash-3.2# glmstat -m
        message     all      rw       g      pg       h     buf     oth    loop
master send:
          GRANT     194       0       0       0       2       0     192      98
         REVOKE     192       0       0       0       0       0     192      96
       subtotal     386       0       0       0       2       0     384     194
master recv:
           LOCK     193       0       0       0       2       0     191      98
        RELEASE     192       0       0       0       0       0     192      96
       subtotal     385       0       0       0       2       0     383     194
   master total     771       0       0       0       4       0     767     388
proxy send:
           LOCK      98       0       0       0       2       0      96      98
        RELEASE      96       0       0       0       0       0      96      96
     BLOCK_LOCK    2560       0       0       0       0    2560       0       0
  BLOCK_RELEASE    2560       0       0       0       0    2560       0       0
       subtotal    5314       0       0       0       2    5120     192     194

DESCRIPTION:
'vxcompress' creates placeholder inodes (called IFEMR inodes) to hold the compressed data of files. After the compression is finished, the IFEMR inodes exchange their bmap with the original files and are later given to inactive processing. Inactive processing truncates the IFEMR extents (the original extents of the regular file, which is now compressed) by sending cluster-wide buffer invalidation requests. These invalidations need the GLM block lock. Regular file data does not need to be invalidated across the cluster, making these GLM block lock requests unnecessary.

RESOLUTION:
The pertinent code has been modified to skip the invalidation for the IFEMR inodes created during compression.

* 3864184 (Tracking ID: 3857444)

SYMPTOM:
The default permission of the /etc/vx/vxfssystem file is incorrect.

DESCRIPTION:
When the file "/etc/vx/vxfssystem" is created, no permission is passed, which results in the file having permission 000.

RESOLUTION:
The code is modified to create the file "/etc/vx/vxfssystem" with a default permission of "600".

* 3864185 (Tracking ID: 3859032)

SYMPTOM:
The system panics in vx_tflush_map() due to a NULL pointer dereference.

DESCRIPTION:
When converting a file system using vxconvert, new blocks are allocated to the structural files, such as the smap, and these blocks can contain garbage; this is done with the expectation that fsck will rebuild the correct smap. However, fsck missed distinguishing between an EAU that is fully EXPANDED and one that is merely ALLOCATED. Because of this, if an allocation is done to a file whose last allocation came from such an affected EAU, a sub-transaction is created on an EAU that is in the allocated state. Map buffers of such EAUs are not initialized properly in the VxFS private buffer cache; as a result, these buffers are released back as stale during the transaction commit. Later, if any file-system-wide sync tries to flush the metadata, it can refer to these buffer pointers and panic, as the buffers have already been released and reused.

RESOLUTION:
The code is modified in fsck to correctly set the state of the EAU on disk. The involved code paths are also modified to avoid doing transactions on unexpanded EAUs.

* 3864186 (Tracking ID: 3855726)

SYMPTOM:
A panic happens in vx_prot_unregister_all().
The stack looks like this:

- vx_prot_unregister_all
- vxportalclose
- __fput
- fput
- filp_close
- sys_close
- system_call_fastpath

DESCRIPTION:
The panic is caused by a NULL fileset pointer, which is due to referencing the fileset before it is loaded, plus a race on the fileset identity array.

RESOLUTION:
The fileset is skipped if it is not loaded yet, and an identity array lock is added to prevent the possible race.

* 3864246 (Tracking ID: 3657482)

SYMPTOM:
A stress test on a cluster file system fails due to data corruption.

DESCRIPTION:
In the direct I/O write code path, there is an optimization that avoids invalidation of any in-core pages in the range; instead, the in-core pages are updated with the new data together with the disk write. This optimization comes into play when cached qio is enabled on the file. When an in-core page is modified this way, it is not marked dirty; if the page was not already dirty, the in-core changes may be lost if the page is reused. This can cause corruption if the page is read again before the disk update completes.

RESOLUTION:
In the case of cached qio/ODM, the page overwrite optimization is disabled.

* 3864247 (Tracking ID: 3861713)

SYMPTOM:
Contention is observed on the vx_sched_lk and vx_worklist_lk spinlocks when profiled using lockstat.

DESCRIPTION:
Internal worker threads take a lock to sleep on a condition variable while waiting for work. This lock is global; if there are large numbers of CPUs and large numbers of worker threads, contention can be seen on vx_sched_lk and vx_worklist_lk using lockstat, as well as increased %sys CPU.

RESOLUTION:
The lock is made more scalable in large CPU configurations.

* 3864250 (Tracking ID: 3833816)

SYMPTOM:
In a CFS cluster, one node returns stale data.

DESCRIPTION:
In a 2-node CFS cluster, when node 1 opens a file and writes to it, the locks are used with the CFS_MASTERLESS flag set. When node 2 tries to open the file and write to it, the locks on node 1 are normalized as part of the HLOCK revoke. But after the HLOCK revoke on node 1, when node 2 takes the PG lock grant to write, there is no PG lock revoke on node 1, so the dirty pages on node 1 are not flushed and invalidated. The problem results in reads returning stale data on node 1.

RESOLUTION:
The code is modified to cache the PG lock before normalizing it in vx_hlock_putdata, so that after normalizing, the cache grant is still with node 1. When node 2 requests the PG lock, there is a revoke on node 1 that flushes and invalidates the pages.

* 3864255 (Tracking ID: 3827491)

SYMPTOM:
Data relocation is not executed correctly if the IOTEMP policy is set to AVERAGE.

DESCRIPTION:
The database table is not created correctly, which results in an error on the database query. This affects the relocation policy of data, and the files are not relocated properly.

RESOLUTION:
The code is modified to fix the database table creation issue. The relocation-policy-based calculations are now done correctly.

* 3864256 (Tracking ID: 3830300)

SYMPTOM:
Heavy CPU usage is seen while Oracle archive processes are running on a clustered file system.

DESCRIPTION:
The cause of the poor read performance in this case was fragmentation, which mainly happens when multiple archivers run on the same node. The allocation pattern of the Oracle archiver processes is:

1. write the header with O_SYNC
2. ftruncate-up the file to its final size (a few GBs typically)
3. do lio_listio with 1MB iocbs

The problem occurs because all the allocations done in this manner go through internal allocations, that is,
allocations below the file size instead of allocations past the file size. Internal allocations are done at most 8 pages at a time, so if multiple processes are doing this, they all get these 8 pages alternately and the file system becomes very fragmented.

RESOLUTION:
A tunable is added which allocates ZFOD extents when ftruncate tries to increase the size of the file, instead of creating a hole. This eliminates the allocations internal to the file size and thus the fragmentation. The earlier implementation of the same fix, which ran into locking issues, is corrected, and a performance issue while writing from the secondary node is also fixed.

* 3864257 (Tracking ID: 3844820)

SYMPTOM:
A system panic is triggered by a stress test that removes/adds vCPUs to the guest domain while VxFS I/O continues. The stack looks like this:

- panicsys
- vpanic_common
- panic
- die
- trap
- ktl0

DESCRIPTION:
Adding/removing vCPUs can change the address of the Solaris global array cpu[]. VxFS saved the addresses of cpu[].cpu_stat at initialization, so an update through the stale address triggered the panic.

RESOLUTION:
The addresses of cpu[].cpu_stat are updated before vx_sar_cpu_update().

* 3864259 (Tracking ID: 3856363)

SYMPTOM:
vxfs reports mapbad errors in the syslog as below:

vxfs: msgcnt 15 mesg 003: V-2-3: vx_mapbad - vx_extfind - /dev/vx/dsk/vgems01/lvems01 file system free extent bitmap in au 0 marked bad

And a full fsck reports the following metadata inconsistencies:

fileset 999 primary-ilist inode 6 has invalid number of blocks (18446744073709551583)
fileset 999 primary-ilist inode 6 failed validation clear? (ynq)n
pass2 - checking directory linkage
fileset 999 directory 8192 block devid/blknum 0/393216 offset 68 references free inode ino 6 remove entry? (ynq)n
fileset 999 primary-ilist inode 8192 contains invalid directory blocks clear? (ynq)n
pass3 - checking reference counts
fileset 999 primary-ilist inode 5 unreferenced file, reconnect? (ynq)n
fileset 999 primary-ilist inode 5 clear? (ynq)n
fileset 999 primary-ilist inode 8194 unreferenced file, reconnect? (ynq)n
fileset 999 primary-ilist inode 8194 clear? (ynq)n
fileset 999 primary-ilist inode 8195 unreferenced file, reconnect? (ynq)n
fileset 999 primary-ilist inode 8195 clear? (ynq)n
pass4 - checking resource maps

DESCRIPTION:
While processing the VX_IEZEROEXT extop, VxFS frees the extent without setting the VX_TLOGDELFREE flag. Similarly, there are other cases where the VX_TLOGDELFREE flag is not set in the case of a delayed extent free; this can result in mapbad errors and invalid block counts.

RESOLUTION:
Since the VX_TLOGDELFREE flag needs to be set on every extent free, the code is modified to discard this flag and treat every extent free as a delayed extent free implicitly.

* 3864260 (Tracking ID: 3846521)

SYMPTOM:
'cp -p' fails with EINVAL for files with a 10-digit modification time. EINVAL is returned if the value in the tv_nsec field is outside the range 0 to 999,999,999. VxFS supports the update in usec, but when copying in user space, the usec value is converted to nsec; in this case, the usec value had crossed its upper boundary of 999,999.

DESCRIPTION:
In a cluster, time across nodes might differ. When updating the mtime, VxFS checks whether the inode is a cluster inode, and if the node's stored mtime is newer than the current node's time, it increments tv_usec instead of changing the mtime to an older value. The tv_usec counter could overflow here, which resulted in a 10-digit mtime.tv_nsec.

RESOLUTION:
The code is modified to reset the usec counter for mtime/atime/ctime when the upper boundary limit of 999,999 is reached.
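The boundary condition is simply that tv_usec must stay within 0..999999 before it is ever scaled to nanoseconds. A hedged sketch of the kind of guard the resolution describes; the helper is hypothetical:

    #include <sys/time.h>

    #define USEC_MAX 999999L

    /* Nudge a timestamp forward by one microsecond without letting
     * tv_usec run past its legal range; an unguarded tv_usec++ is
     * what produced the 10-digit tv_nsec that cp -p rejected. */
    static void bump_usec(struct timeval *tv)
    {
        if (tv->tv_usec >= USEC_MAX)
            tv->tv_usec = 0;      /* reset at the boundary, per the fix */
        else
            tv->tv_usec++;
    }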
* 3866968 (Tracking ID: 3866962)

SYMPTOM:
Data corruption is seen when dalloc writes are going on a file and an fsync is started on the same file simultaneously.

DESCRIPTION:
If dalloc writes are in progress on a file and synchronous flushing is started on the same file at the same time, the synchronous flush tries to flush all the dirty pages of the file without considering the underlying allocation. In this case, the flushing can happen on unallocated blocks, which can result in data loss.

RESOLUTION:
The code is modified to flush data only up to the actual allocation in the case of dalloc writes.

* 3874662 (Tracking ID: 3871489)

SYMPTOM:
IO service times increase with an IO-intensive workload on a high-end server.

DESCRIPTION:
VxFS has worklist threads that sleep on a single condition variable. While the worker threads are being woken up, contention can be seen on the OS sleep/dispatch locks, and the IO service time can increase due to this contention.

RESOLUTION:
The number of condition variables is scaled to reduce contention, and padding is added to the condition variable structure to avoid cache allocation problems. The code also makes sure to wake up exactly the number of threads required.
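The two ingredients named in the resolution, multiple condition variables and structure padding, can be sketched in pthread terms as below. The slot count, the hashing by hint, and all names are assumptions for illustration; the VxFS internals are not public.

    #include <pthread.h>

    #define CV_SLOTS   16    /* would scale with CPU count */
    #define CACHE_LINE 64

    /* One padded slot per group of workers: the padding keeps two
     * slots from sharing a cache line, so signalling one slot does
     * not bounce the line under every other slot's lock. */
    struct cv_slot {
        pthread_mutex_t lock;
        pthread_cond_t  cv;
        char            pad[CACHE_LINE];
    };

    static struct cv_slot slots[CV_SLOTS];

    static void slots_init(void)
    {
        for (int i = 0; i < CV_SLOTS; i++) {
            pthread_mutex_init(&slots[i].lock, NULL);
            pthread_cond_init(&slots[i].cv, NULL);
        }
    }

    /* Wake exactly one waiter on one slot instead of broadcasting
     * on a single global condition variable. */
    static void kick_worker(unsigned hint)
    {
        struct cv_slot *s = &slots[hint % CV_SLOTS];

        pthread_mutex_lock(&s->lock);
        pthread_cond_signal(&s->cv);
        pthread_mutex_unlock(&s->lock);
    }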
* 3877070 (Tracking ID: 3880121)

SYMPTOM:
An internal assert fails when coalescing the extents on a clone.

DESCRIPTION:
When coalescing extents on a clone, resolving an overlay extent is not supported, but the code still tried to resolve these overlay extents, resulting in the internal assert failure.

RESOLUTION:
The code is modified to not resolve these overlay extents when coalescing.

* 3877339 (Tracking ID: 3880113)

SYMPTOM:
An internal assert fails when pushing a ZFOD extent onto a clone.

DESCRIPTION:
If filesnaps are created for a file that has ZFOD extents, and a clone is created containing these snapped files, then writing to the cloned files can end up with the same extent being allocated to multiple files, resulting in emap corruption.

RESOLUTION:
The code is modified to not push ZFOD extents onto a clone.

Patch ID: 150736-02

* 3520113 (Tracking ID: 3451284)

SYMPTOM:
While allocating an extent during a write operation, if the summary and bitmap data for the file system allocation unit are mismatched, an assert is hit.

DESCRIPTION:
If an extent was allocated using the SMAP on a deleted inode, part of the AU space is moved from the deleted inode to the new inode. At this point, the SMAP state is set to VX_EAU_ALLOCATED and the EMAP is not initialized. When more space is needed for the new inode, the allocator tries to allocate from the same AU using the EMAP and can hit the "f:vx_sum_upd_efree1:2a" assert, as the EMAP is not initialized.

RESOLUTION:
The code has been modified to expand the AU while moving partial AU space from one inode to the other.

* 3536233 (Tracking ID: 3457803)

SYMPTOM:
The file system gets disabled with the following message in the system log:

WARNING: V-2-37: vx_metaioerr - vx_iasync_wait - /dev/vx/dsk/testdg/test file system meta data write error in dev/block

DESCRIPTION:
The inode's in-core information becomes inconsistent because one of its fields is modified without locking protection.

RESOLUTION:
The inode's field is now properly protected by taking the lock.

* 3583963 (Tracking ID: 3583930)

SYMPTOM:
When the external quota file is overwritten or restored from backup, new settings which were added after the backup still remain.

DESCRIPTION:
The internal quota file is not always updated with the correct limits, so the quotaon operation copies the quota limits from the external to the internal quota file. To complete the copy operation, the extent of the external file is compared to the extent of the internal file at the corresponding offset. If the external quota file is overwritten (or restored to its original copy) and the size of the internal file is larger than that of the external file, the quotaon operation does not clear the additional (stale) quota records in the internal file. Later, the sync operation (part of quotaon) copies these stale records from the internal to the external file. Hence, both the internal and external files contain stale records.

RESOLUTION:
The code has been modified to remove the stale records in the internal file at the time of quotaon.

* 3617774 (Tracking ID: 3475194)

SYMPTOM:
The Veritas File System (VxFS) fscdsconv(1M) command fails with the following error messages:

...
UX:vxfs fscdsconv: INFO: V-3-26130: There are no files violating the CDS limits for this target.
UX:vxfs fscdsconv: INFO: V-3-26047: Byteswapping in progress ...
UX:vxfs fscdsconv: ERROR: V-3-25656: Overflow detected
UX:vxfs fscdsconv: ERROR: V-3-24418: fscdsconv: error processing primary inode list for fset 999
UX:vxfs fscdsconv: ERROR: V-3-24430: fscdsconv: failed to copy metadata
UX:vxfs fscdsconv: ERROR: V-3-24426: fscdsconv: Failed to migrate.

DESCRIPTION:
The fscdsconv(1M) command takes a file name argument for a recovery file, to be used to restore the original file system in case of failure while the file system conversion is in progress. This file has two parts: a control part and a data part. The control part stores information about all the metadata, such as inodes and extents. In this instance, the length of the control part was underestimated for some file systems where there are few inodes but the average number of extents per file is very large (this can be seen in the fsadm -E report).

RESOLUTION:
The recovery file is made sparse: the data part starts after a 1TB offset, and the control part can then do allocating writes to the hole from the beginning of the file.

* 3617788 (Tracking ID: 3604071)

SYMPTOM:
With the thin reclaim feature turned on, high CPU usage can be observed on the vxfs thread process. The backtrace of such threads usually looks like this:

- vx_dalist_getau
- vx_recv_bcastgetemapmsg
- vx_recvdele
- vx_msg_recvreq
- vx_msg_process_thread
- vx_kthread_init

DESCRIPTION:
In the routine that gets the broadcast information of a node containing maps of the allocation units (AUs) for which the node holds the delegations, the locking mechanism is inefficient. Every time this routine is called, it performs a series of down-up operations on a certain semaphore. This can result in a huge CPU cost when many threads call the routine in parallel.

RESOLUTION:
The code is modified to optimize the locking mechanism so that the routine does the down-up operation on the semaphore only once.

* 3617793 (Tracking ID: 3564076)

SYMPTOM:
MongoDB NoSQL database creation fails with an ENOTSUP error. MongoDB uses posix_fallocate to create a file first; when it then writes at an offset which is not aligned with the file system block boundary, an ENOTSUP error comes up.
DESCRIPTION:
On a file system with an 8K block size and a 4K page size, the application creates a file using posix_fallocate and then writes at an offset that is not aligned with the file system block boundary. In this case, the pre-allocated extent is split at the unaligned offset into two parts for the write. However, the alignment requirement of the split fails the operation.

RESOLUTION:
The extent is now split down to the block boundary.

* 3620279 (Tracking ID: 3558087)

SYMPTOM:
Run simultaneous dd threads on a mount point and start the 'ls -l' command on the same mount point; the system then hangs.

DESCRIPTION:
When the delayed allocation (dalloc) feature is turned on, the flushing process takes much time. The process keeps the glock held and needs writers to keep the irwlock held. The 'ls -l' command starts stat internally and keeps waiting on the irwlock to read the ACLs.

RESOLUTION:
Dalloc is redesigned to keep the glock unlocked while flushing.

* 3620288 (Tracking ID: 3469644)

SYMPTOM:
The system panics in the vx_logbuf_clean() function when it traverses the chain of transactions off the intent log buffer. The stack trace is as follows:

vx_logbuf_clean ()
vx_logadd ()
vx_log()
vx_trancommit()
vx_exh_hashinit ()
vx_dexh_create ()
vx_dexh_init ()
vx_pd_rename ()
vx_rename1_pd()
vx_do_rename ()
vx_rename1 ()
vx_rename ()
vx_rename_skey ()

DESCRIPTION:
The system panics as the vx_logbuf_clean() function tries to access an already freed transaction from the transaction chain to flush it to the log.

RESOLUTION:
The code has been modified to make sure that the transaction is flushed to the log before it is freed.

* 3645825 (Tracking ID: 3622326)

SYMPTOM:
The file system is marked with the fullfsck flag because an inode is marked bad during a checkpoint promote.

DESCRIPTION:
VxFS incorrectly skipped pushing data to the clone inode, due to which the inode is marked bad during the checkpoint promote, which in turn results in the file system being marked with the fullfsck flag.

RESOLUTION:
The code is modified to push the proper data to the clone inode.

Patch ID: 150736-01

* 3383149 (Tracking ID: 3383147)

SYMPTOM:
A C operator precedence error may occur while turning 'off' delayed allocation.

DESCRIPTION:
Due to a C operator precedence issue, VxFS evaluates a condition wrongly.

RESOLUTION:
The code is modified to evaluate the condition correctly.
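In C, == binds more tightly than &, which is the usual shape of this class of bug. A self-contained illustration; the flag name is made up, not the VxFS source:

    #include <stdio.h>

    #define DALLOC_OFF 0x04   /* hypothetical flag bit */

    int main(void)
    {
        unsigned flags = 0x05;     /* DALLOC_OFF happens to be set */

        /* Buggy: parsed as flags & (DALLOC_OFF == 0), i.e. flags & 0,
         * so this branch can never be taken, whatever the flag is. */
        if (flags & DALLOC_OFF == 0)
            puts("never printed");

        /* Fixed: parenthesize the mask before comparing. */
        if ((flags & DALLOC_OFF) != 0)
            puts("delayed allocation is being turned off");
        return 0;
    }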
* 3422580 (Tracking ID: 1949445)

SYMPTOM: The system is unresponsive when files are created in a large directory. The following stack is logged:
vxg_grant_sleep()
vxg_cmn_lock()
vxg_api_lock()
vx_glm_lock()
vx_get_ownership()
vx_exh_coverblk()
vx_exh_split()
vx_dexh_setup()
vx_dexh_create()
vx_dexh_init()
vx_do_create()

DESCRIPTION: For large directories, the large directory hash (LDH) is enabled to improve lookup performance. When a system takes ownership of the LDH inode twice in the same thread context (while building the hash for a directory), it becomes unresponsive.

RESOLUTION: The code is modified to avoid taking ownership again if the thread already holds the ownership of the LDH inode.

* 3422584 (Tracking ID: 2059611)

SYMPTOM: The system panics due to a NULL pointer dereference while flushing bitmaps to the disk, and the following stack trace is displayed:
vx_unlockmap+0x10c
vx_tflush_map+0x51c
vx_fsq_flush+0x504
vx_fsflush_fsq+0x190
vx_workitem_process+0x1c
vx_worklist_process+0x2b0
vx_worklist_thread+0x78

DESCRIPTION: The vx_unlockmap() function unlocks a map structure of the file system. If the map is being used, the hold count is incremented. The vx_unlockmap() function attempts to check whether the mlink doubly linked list is empty, but the asynchronous vx_mapiodone routine can change the link at random when the hold count is zero.

RESOLUTION: The code is modified to change the evaluation rule inside the vx_unlockmap() function, so that further evaluation is skipped when the map hold count is zero.

* 3422586 (Tracking ID: 2439261)

SYMPTOM: When vx_fiostats_tunable is changed from zero to non-zero, the system panics with the following stack trace:
vx_fiostats_do_update
vx_fiostats_update
vx_read1
vx_rdwr
vno_rw
rwuio
pread

DESCRIPTION: When vx_fiostats_tunable is changed from zero to non-zero, all the in-core inode fiostats attributes are set to NULL. When these attributes are accessed, the system panics due to the NULL pointer dereference.

RESOLUTION: The code has been modified to check that the file I/O statistics attributes are present before dereferencing the pointers.
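A minimal sketch of the defensive check this resolution describes, using hypothetical stand-in types (the real attributes live in VxFS in-core inodes):

#include <stddef.h>
#include <stdio.h>

struct fiostats { unsigned long reads, writes; };  /* stand-in */
struct inode    { struct fiostats *i_fiostats; };  /* stand-in */

static void fiostats_update(struct inode *ip, int is_read)
{
    struct fiostats *fs = ip->i_fiostats;

    if (fs == NULL)          /* attribute not allocated yet: skip update */
        return;
    if (is_read)
        fs->reads++;
    else
        fs->writes++;
}

int main(void)
{
    struct inode ip = { NULL };      /* stats not yet allocated */
    fiostats_update(&ip, 1);         /* safely ignored, no NULL dereference */
    printf("update skipped while stats are NULL\n");
    return 0;
}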
* 3422604 (Tracking ID: 3092114)

SYMPTOM: The information output by the "df -i" command can often be inaccurate for cluster mounted file systems.

DESCRIPTION: The Cluster File System 5.0 release introduced the concept of delegating metadata to nodes in the cluster. This delegation of metadata allows CFS secondary nodes to update metadata without having to ask the CFS primary to do it, which provides greater node scalability. However, the "df -i" information is still collected by the CFS primary regardless of which node (primary or secondary) the "df -i" command is executed on.

For inodes, the granularity of each delegation is an Inode Allocation Unit [IAU]; thus IAUs can be delegated to nodes in the cluster.
With a VxFS 1Kb file system block size, each IAU represents 8192 inodes.
With a 2Kb block size, each IAU represents 16384 inodes.
With a 4Kb block size, each IAU represents 32768 inodes.
With an 8Kb block size, each IAU represents 65536 inodes.
Each IAU contains a bitmap that determines whether each inode it represents is allocated or free; the IAU also contains a summary count of the number of inodes that are currently free in the IAU. The "df -i" information can be considered a simple sum of all the IAU summary counts. Using a 1Kb block size, IAU-0 represents inode numbers 0 - 8191, IAU-1 represents inode numbers 8192 - 16383, IAU-2 represents inode numbers 16384 - 24575, and so on.

The inaccurate "df -i" count occurs because the CFS primary has no visibility of the current IAU summary information for IAUs that are delegated to secondary nodes. Therefore, the number of allocated inodes within an IAU that is currently delegated to a CFS secondary node is not known to the CFS primary. As a result, the "df -i" count information for the currently delegated IAUs is collected from the primary's copy of the IAU summaries. Since the primary's copy of the IAU is stale, the "df -i" count is only accurate when no IAUs are currently delegated to CFS secondary nodes. In other words, the IAUs currently delegated to CFS secondary nodes cause the "df -i" count to be inaccurate. Once an IAU is delegated to a node, it can "timeout" after 3 minutes of inactivity. However, not all IAU delegations will timeout. One IAU always remains delegated to each node for performance reasons. Also, an IAU whose inodes are all allocated (so no free inodes remain in the IAU) does not timeout either. The issue can be best summarized as: the more IAUs that remain delegated to CFS secondary nodes, the greater the inaccuracy of the "df -i" count.

RESOLUTION: Allow the delegations for IAUs whose inodes are all allocated (so no free inodes remain in the IAU) to "timeout" after 3 minutes of inactivity.
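The IAU arithmetic above can be checked with a short stand-alone program; the 8192-inodes-per-IAU base value and its scaling with block size are taken from the description (note that IAU-2 at a 1Kb block size therefore covers inodes 16384 - 24575):

#include <stdio.h>

int main(void)
{
    int bsizes[] = { 1024, 2048, 4096, 8192 };
    int i, iau;

    for (i = 0; i < 4; i++) {
        /* 8192 inodes per IAU at 1Kb, doubling with the block size */
        long per_iau = 8192L * (bsizes[i] / 1024);
        printf("bsize %5d: %ld inodes per IAU\n", bsizes[i], per_iau);
    }

    /* First three IAU inode ranges for a 1Kb block size */
    for (iau = 0; iau < 3; iau++)
        printf("1Kb IAU-%d: inodes %d - %d\n",
               iau, iau * 8192, (iau + 1) * 8192 - 1);
    return 0;
}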
* 3422614 (Tracking ID: 3297840)

SYMPTOM: A metadata corruption is found during the file removal process, with the inode block count becoming negative.

DESCRIPTION: When the user removes or truncates a file having shared indirect blocks, there can be an instance where the block count is updated to reflect the removal of the shared indirect blocks even though the blocks are not removed from the file. The next iteration of the loop updates the block count again while removing these blocks. This eventually leads to the block count being a negative value after all the blocks are removed from the file. The removal code expects the block count to be zero before updating the rest of the metadata.

RESOLUTION: The code is modified to update the block count and other tracking metadata in the same transaction as the blocks are removed from the file.

* 3422626 (Tracking ID: 3332902)

SYMPTOM: The system running the fsclustadm(1M) command panics while shutting down. The following stack trace is logged along with the panic:
machine_kexec
crash_kexec
oops_end
page_fault
[exception RIP: vx_glm_unlock]
vx_cfs_frlpause_leave [vxfs]
vx_cfsaioctl [vxfs]
vxportalkioctl [vxportal]
vfs_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath

DESCRIPTION: There exists a race condition between "fsclustadm(1M) cfsdeinit" and "fsclustadm(1M) frlpause_disable". The "fsclustadm(1M) cfsdeinit" fails after cleaning the Group Lock Manager (GLM), without downgrading the CFS state. Under the false CFS state, the "fsclustadm(1M) frlpause_disable" command enters and accesses the GLM lock, which "fsclustadm(1M) cfsdeinit" frees, resulting in a panic. Another race condition exists between the code in vx_cfs_deinit() and the code in fsck, leading to a situation where, although fsck holds a reservation, this cannot prevent vx_cfs_deinit() from freeing vx_cvmres_list because there is no check for vx_cfs_keepcount.

RESOLUTION: The code is modified to add appropriate checks in the "fsclustadm(1M) cfsdeinit" and "fsclustadm(1M) frlpause_disable" to avoid the race condition.

* 3422629 (Tracking ID: 3335272)

SYMPTOM: The mkfs (make file system) command dumps core when the log size provided is not aligned. The following stack trace is displayed:
(gdb) bt
#0 find_space ()
#1 place_extents ()
#2 fill_fset ()
#3 main ()
(gdb)

DESCRIPTION: While creating the VxFS file system using the mkfs command, if the log size provided is not aligned properly, the calculations for placing the RCQ extents can go wrong and find no place for them. This leads to an illegal memory access of the AU bitmap and results in a core dump.

RESOLUTION: The code is modified to place the RCQ extents in the same AU where the log extents are allocated.

* 3422636 (Tracking ID: 3340286)

SYMPTOM: The tunable setting of dalloc_enable gets reset to a default value after a file system is resized.

DESCRIPTION: The file system resize operation triggers the file system re-initialization process. During this process, the tunable value of dalloc_enable gets reset to the default value instead of retaining the old tunable value.

RESOLUTION: The code is fixed such that the old tunable value of dalloc_enable is retained.

* 3422649 (Tracking ID: 3394803)

SYMPTOM: The vxupgrade(1M) command causes VxFS to panic with the following stack trace:
panic_save_regs_switchstack()
panic
bad_kern_reference()
$cold_pfault()
vm_hndlr()
bubbleup()
vx_fs_upgrade()
vx_upgrade()
$cold_vx_aioctl_common()
vx_aioctl()
vx_ioctl()
vno_ioctl()
ioctl()
syscall()

DESCRIPTION: The panic is caused by dereferencing a NULL device; one of the devices in the DEVLIST is a NULL device.

RESOLUTION: The code is modified to skip NULL devices when the devices in the DEVLIST are processed.

* 3436431 (Tracking ID: 3434811)

SYMPTOM: In VxFS 6.1, the vxfsconvert(1M) command hangs within the vxfsl3_getext() function with the following stack trace:
search_type()
bmap_typ()
vxfsl3_typext()
vxfsl3_getext()
ext_convert()
fset_convert()
convert()

DESCRIPTION: There is a type casting problem for the extent size. It may cause a non-zero value to overflow and turn into zero by mistake. This further leads to infinite looping inside the function.

RESOLUTION: The code is modified to remove the intermediate variable and avoid the type casting.
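A stand-alone illustration of this narrowing-cast bug class (the actual vxfsconvert variable is not shown in this document): a 64-bit extent size whose low 32 bits are zero becomes 0 when cast to 32 bits, and a loop that advances by that size stops making progress.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t ext_size  = 1ULL << 32;          /* non-zero 64-bit extent size  */
    uint32_t truncated = (uint32_t)ext_size;  /* keeps only the low 32 bits: 0 */

    printf("64-bit extent size: %llu\n", (unsigned long long)ext_size);
    printf("after 32-bit cast:  %u (loop step becomes 0 -> no progress)\n",
           truncated);
    return 0;
}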
* 3448503 (Tracking ID: 3448492)

SYMPTOM: On Solaris SPARC, the Vnode Page Mapping (VPM) interface is introduced in place of the legacy segmap interface.

DESCRIPTION: For improved performance, the VPM interface is introduced, which employs the kernel page mapping (KPM).

RESOLUTION: The code is modified to support the VPM interface.

* 3496391 (Tracking ID: 3499886)

SYMPTOM: The patch ID from Veritas File System (VxFS) in the Solaris 10 patch is displayed in PSTAMP.

DESCRIPTION: The patch ID in PSTAMP is used only on the Solaris 10 platform. To make the PSTAMP consistent across platforms, Symantec has formulated a unique PSTAMP format for all stack products, for example: 6.1.1.000-2014-03-30-19.00.01.

RESOLUTION: Removed the patch ID from the patch PSTAMP. To display the patch ID installed on the machine, use the command "showrev -p | grep ".

* 3501832 (Tracking ID: 3413926)

SYMPTOM: Internal testing hangs due to high memory consumption resulting in fork failure.

DESCRIPTION: The issue of high swap usage occurs with recent updates of Solaris 10 and Solaris 11. This issue is predominantly seen with internal stress/noise testing. The recent Solaris update release increased ncpu. As a large number of buffer cache free lists in VxFS are spawned with respect to ncpu, there is high memory consumption, which results in fork failure.

RESOLUTION: For systems with more than 16 CPUs, the number of buffer cache free lists is adjusted according to the maximum number of CPUs supported.

* 3504362 (Tracking ID: 3472551)

SYMPTOM: The attribute validation (pass 1d) of full fsck takes too much time to complete.

DESCRIPTION: The current implementation of full fsck pass 1d (attribute inode validation) is single threaded. This causes slow full fsck performance on large file systems, especially the ones having a large number of attribute inodes.

RESOLUTION: Pass 1d is modified to work in parallel using multiple threads, which enables full fsck to process the attribute inode validation faster.

* 3507608 (Tracking ID: 3478017)

SYMPTOM: An internal test hits an assert in voprwunlock.

DESCRIPTION: In the slow path write routine, the inode is not locked with the Vnode Operation (VOP) read-write lock before returning to the Operating System (OS).

RESOLUTION: The code is modified to take the VOP read-write lock in shared mode on the inode before returning to the OS.

* 3512292 (Tracking ID: 3348520)

SYMPTOM: In a Cluster File System (CFS) cluster having a multi-volume file system of a smaller size, execution of the fsadm command causes a system hang if the free space in the file system is low. The following stack traces are displayed:
vx_svar_sleep_unlock()
vx_extentalloc_device()
vx_extentalloc()
vx_reorg_emap()
vx_extmap_reorg()
vx_reorg()
vx_aioctl_full()
vx_aioctl_common()
vx_aioctl()
vx_unlocked_ioctl()
vfs_ioctl()
do_vfs_ioctl()
sys_ioctl()
tracesys()
And
vxg_svar_sleep_unlock()
vxg_grant_sleep()
vxg_api_lock()
vx_glm_lock()
vx_cbuf_lock()
vx_getblk_clust()
vx_getblk_cmn()
vx_getblk()
vx_getmap()
vx_getemap()
vx_extfind()
vx_searchau_downlevel()
vx_searchau_uplevel()
vx_searchau()
vx_extentalloc_device()
vx_extentalloc()
vx_reorg_emap()
vx_extmap_reorg()
vx_reorg()
vx_aioctl_full()
vx_aioctl_common()
vx_aioctl()
vx_unlocked_ioctl()
vfs_ioctl()
do_vfs_ioctl()
sys_ioctl()
tracesys()

DESCRIPTION: While performing the fsadm operation, the secondary node in the CFS cluster is unable to allocate space from the EAU (Extent Allocation Unit) delegation given by the primary node, so it requests the primary node for another delegation. While giving such delegations, the primary node does not verify whether the EAU has exclusion zones set on it; it only verifies whether the EAU has enough free space. On the secondary node, the extent allocation cannot be done from an EAU that has an exclusion zone set, resulting in a loop.

RESOLUTION: The code is modified such that the primary node does not delegate an EAU that has an exclusion zone set on it to the secondary node.

* 3518943 (Tracking ID: 3534779)

SYMPTOM: Internal stress testing on Cluster File System (CFS) hits a debug assert.

DESCRIPTION: The assert was hit while refreshing the in-core reference count queue (rcq) values from the disk in response to a loadfs message. As a result, a race occurs with an rcq processing thread that has already advanced the in-core rcq indexes on a primary node in CFS.

RESOLUTION: The code is modified to avoid selective updates in the in-core rcq.

* 3519809 (Tracking ID: 3463464)

SYMPTOM: An internal kernel functionality conformance test hits a kernel panic due to a NULL pointer dereference.

DESCRIPTION: In the vx_fsadm_query() function, the error handling code path incorrectly sets the nodeid to NULL in the file system structure. As a result of clearing the nodeid, any subsequent access to this field results in the kernel panic.

RESOLUTION: The code is modified to improve the error handling code path.

* 3528770 (Tracking ID: 3449152)

SYMPTOM: The vxtunefs(1M) command fails to set the thin_friendly_alloc tunable in CFS.

DESCRIPTION: The thin_friendly_alloc tunable is not supported on CFS. But when the vxtunefs(1M) command is used to set it in CFS, a false success message is displayed.

RESOLUTION: The code is modified to report an error for the attempt to set the thin_friendly_alloc tunable in CFS.

* 3529852 (Tracking ID: 3463717)

SYMPTOM: CFS does not support the 'thin_friendly_alloc' tunable, and the vxtunefs(1M) command man page is not updated with this information.

DESCRIPTION: Since the man page does not explicitly mention that the 'thin_friendly_alloc' tunable is not supported, it is assumed that CFS supports this feature.

RESOLUTION: The man page pertinent to the vxtunefs(1M) command is updated to denote that CFS does not support the 'thin_friendly_alloc' tunable.
* 3529862 (Tracking ID: 3529860)

SYMPTOM: The package verification using the "pkg verify" command fails for the VRTSglm, VRTSgms, and VRTSvxfs packages on Solaris 11.

DESCRIPTION: The package verification using the "pkg verify" command fails for the Group Lock Manager (GLM), Group Messaging Services (GMS), and VxFS packages on Solaris 11 due to a missing minor node permission '* 0640 root sys' in the /etc/minor_perm file.

RESOLUTION: The code is modified to update the entry in the /etc/minor_perm file.

* 3530038 (Tracking ID: 3417321)

SYMPTOM: The vxtunefs(1M) man page gives an incorrect description of the 'delicache_enable' tunable.

DESCRIPTION: According to the current design, the 'delicache_enable' tunable is enabled by default both in case of a local mount and a cluster mount. But the man page is not updated accordingly; it still specifies that this tunable is enabled by default only in case of a local mount. The man page needs to be updated to correct the description.

RESOLUTION: The man page of the vxtunefs(1M) command is updated to display the correct contents for the 'delicache_enable' tunable. Additional information is provided with respect to the performance benefits, which in the case of CFS are limited as compared to the local mount. Also, in case of CFS, unlike the other CFS tunable parameters, this tunable needs to be explicitly turned on or off on each node.

* 3541125 (Tracking ID: 3541083)

SYMPTOM: The vxupgrade(1M) command for layout version 10 creates 64-bit quota files with inappropriate permission configurations.

DESCRIPTION: Layout version 10 supports the 64-bit quota feature. Thus, while upgrading to version 10, 32-bit external quota files are converted to 64-bit. During this conversion process, the 64-bit files are created without specifying any permissions. Hence, random permissions are assigned to the 64-bit file, which creates an impression that the conversion process was not successful as expected.

RESOLUTION: The code is modified such that appropriate permissions are provided while creating 64-bit quota files.

Patch ID: 151226-02

* 3864144 (Tracking ID: 3451730)

SYMPTOM: Installation of VRTSodm and VRTSvxfs in a zone fails when running "zoneadm -z <zone> attach -U".

DESCRIPTION: When you upgrade a zone using the attach -U option, the checkinstall script is executed. There were certain zone-irrelevant commands (which should not be executed during attach) in the checkinstall script, which failed the installation of VRTSodm and VRTSvxfs.

RESOLUTION: Code is added in the postinstall script to fix the checkinstall script.

* 3864174 (Tracking ID: 2905552)

SYMPTOM: The fsdedupadm schedule was executed even if there was no schedule.

DESCRIPTION: The fsdedupschd daemon checks the configuration for changes every 30 minutes. When fsdedupschd notices a new configuration, it updates it from the configuration file into memory. When it finds that the schedule is changed to NONE for one file system, it calls the cleanup routine to remove the schedule. In the cleanup_schedule() function, the file system name is used to judge whether to delete that schedule entry from the schedule list. Unfortunately, cleanup_schedule() used the device name (/dev/vx/dsk/dupdg/yamavol) as the file system name to check for a match, while the schedule entries use the mount point (/yamavol) as the file system name. Therefore, there is never a match, and the schedule entries are never removed.

RESOLUTION: The code is modified so that the cleanup function removes schedule entries according to the device name.
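A minimal sketch of this key mismatch, using the device and mount-point names from the description (the schedule-entry structure and find_entry() helper are hypothetical stand-ins for the fsdedupschd internals):

#include <stdio.h>
#include <string.h>

struct sched_entry { const char *key; };   /* stand-in schedule entry */

static int find_entry(struct sched_entry *tab, int n, const char *name)
{
    int i;
    for (i = 0; i < n; i++)
        if (strcmp(tab[i].key, name) == 0)
            return i;          /* found: safe to remove this entry      */
    return -1;                 /* no match: the stale entry stays behind */
}

int main(void)
{
    struct sched_entry tab[] = { { "/yamavol" } };  /* keyed by mount point */

    /* Buggy lookup: a device name never matches a mount-point key. */
    printf("by device name: %d\n",
           find_entry(tab, 1, "/dev/vx/dsk/dupdg/yamavol"));
    /* Consistent lookup: both sides use the same naming scheme. */
    printf("by mount point: %d\n", find_entry(tab, 1, "/yamavol"));
    return 0;
}

The fix amounts to making both sides of the comparison use the same naming scheme (per the resolution, the device name).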
* 3864176 (Tracking ID: 3801320)

SYMPTOM: A core dump is generated when deduplication is running.

DESCRIPTION: Two different cores are generated for the following two reasons:
1. Dedup maintains a list of jobs in the running queue. The jobs can be in three states: not running, failed, or running. Depending on the state, dedup either removes the job (not running), moves the job to the schedule queue (failed), or keeps the job in the running queue (running). In some cases, when removing a job in check_running(), dedup incorrectly assigns the next job a garbage value, which can corrupt the running queue.
2. Dedup populates the duplicated block list in its offset tree. If the allocation fails for one of the nodes, dedup incorrectly arranges the nodes.

RESOLUTION: The code is modified to assign the correct job when removing the previous job, and to populate the offset tree correctly if allocation fails for one of the nodes.
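A user-space sketch of the safe list-removal pattern this fix implies (the job structure and the shape of check_running() are hypothetical stand-ins for the dedup internals): unlinking a node before freeing it keeps the traversal from reading freed memory.

#include <stdio.h>
#include <stdlib.h>

struct job { int id; int running; struct job *next; };  /* stand-in */

static struct job *check_running(struct job *head)
{
    struct job **pp = &head, *jp;

    while ((jp = *pp) != NULL) {
        if (!jp->running) {
            *pp = jp->next;   /* unlink first, so the chain stays intact   */
            free(jp);         /* then free; never read jp->next after this */
        } else {
            pp = &jp->next;   /* advance only when the node is kept        */
        }
    }
    return head;
}

int main(void)
{
    struct job *head = NULL;
    int i;

    for (i = 0; i < 4; i++) {                 /* build a small queue */
        struct job *j = malloc(sizeof(*j));
        j->id = i;
        j->running = i % 2;
        j->next = head;
        head = j;
    }
    head = check_running(head);               /* drops the finished jobs */
    for (struct job *j = head; j != NULL; j = j->next)
        printf("still running: job %d\n", j->id);
    return 0;
}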
Patch ID: 151226-01

* 3620250 (Tracking ID: 3621205)

SYMPTOM: OpenSSL common vulnerabilities and exposures (CVE): POODLE and Heartbleed.

DESCRIPTION: The VRTSfsadv package uses old versions of OpenSSL which are vulnerable to POODLE (CVE-2014-3566) and Heartbleed (CVE-2014-0160). By upgrading to OpenSSL 0.9.8zc, many security vulnerabilities have been fixed.

RESOLUTION: The VRTSfsadv package is built with OpenSSL 0.9.8zc.

INSTALLING THE PATCH
--------------------
Run the Installer script to automatically install the patch:
-----------------------------------------------------------
To install the patch, perform the following steps on at least one node in the cluster:
1. Copy the patch sfha-sol10_sparc-Patch-6.1.1.100.tar.gz to /tmp
2. Untar sfha-sol10_sparc-Patch-6.1.1.100.tar.gz to /tmp/hf
   # mkdir /tmp/hf
   # cd /tmp/hf
   # gunzip /tmp/sfha-sol10_sparc-Patch-6.1.1.100.tar.gz
   # tar xf /tmp/sfha-sol10_sparc-Patch-6.1.1.100.tar
3. Install the hotfix
   # pwd
   /tmp/hf
   # ./installSFHA611P100 [ ...]

You can also install this patch together with the 6.1.1 maintenance release using Install Bundles:
1. Download this patch and extract it to a directory
2. Change to the Veritas InfoScale 6.1.1 directory and invoke the installmr script with the -patch_path option, where -patch_path should point to the patch directory
   # ./installmr -patch_path [] [ ...]

Install the patch manually:
--------------------------
You can also install this patch together with the 6.1 GA release using Install Bundles:
1. Download this patch and extract it to a directory
2. Change to the Veritas InfoScale 6.1 directory and invoke the installer script with the -patch_path option, where -patch_path should point to the patch directory.
   # ./installer -patch_path [patch path] [host1 host2...]

For the Solaris 10 release, refer to the online manual pages for instructions on using the 'patchadd' and 'patchrm' scripts provided with Solaris. Any other special or non-generic installation instructions are described below as special instructions. The following example installs a patch on a standalone machine:
   example# patchadd /var/spool/patch/152134-01

REMOVING THE PATCH
------------------
Run the Uninstaller script to automatically remove the patch:
------------------------------------------------------------
To uninstall the patch, perform the following step on at least one node in the cluster:
   # /opt/VRTS/install/uninstallSFHA611P100 [ ...]

Remove the patch manually:
-------------------------
The following example removes a patch from a standalone system:
   example# patchrm 152134-01
For additional examples, please see the appropriate manual pages.

SPECIAL INSTRUCTIONS
--------------------
You need to use the shutdown command to reboot the system after patch installation or de-installation:
   shutdown -g0 -y -i6

A Solaris 10 issue may prevent this patch from installing completely. Before installing this VM patch, install the Solaris patch 119254-70 (or a later revision). This Solaris patch fixes the packaging, installation, and patch utilities. [Sun Bug ID 6337009] Download Solaris 10 patch 119254-70 (or later) from Sun at http://sunsolve.sun.com

OTHERS
------
NONE