README VERSION : 1.1
README CREATION DATE : 2012-09-21
PATCH-ID : 5.1.133.000
PATCH NAME : VRTSvxvm 5.1SP1RP3
BASE PACKAGE NAME : VRTSvxvm
BASE PACKAGE VERSION : 5.1.100.000
SUPERSEDED PATCHES : 5.1.132.000
REQUIRED PATCHES : NONE
INCOMPATIBLE PATCHES : NONE
SUPPORTED PADV : rhel5_x86_64,rhel6_x86_64,sles10_x86_64,sles11_x86_64
                 (P-PLATFORM , A-ARCHITECTURE , D-DISTRIBUTION , V-VERSION)
PATCH CATEGORY : CORE , CORRUPTION , HANG , MEMORYLEAK , PANIC , PERFORMANCE
PATCH CRITICALITY : OPTIONAL
HAS KERNEL COMPONENT : YES
ID : NONE
REBOOT REQUIRED : YES

PATCH INSTALLATION INSTRUCTIONS:
--------------------------------
Please refer to the install/upgrade/rollback section in the Release Notes.

PATCH UNINSTALLATION INSTRUCTIONS:
----------------------------------
Please refer to the install/upgrade/rollback section in the Release Notes.

SPECIAL INSTALL INSTRUCTIONS:
-----------------------------
NONE

SUMMARY OF FIXED ISSUES:
-----------------------------------------
2070079 (1903700) Removing a mirror using vxassist does not work.
2205574 (1291519) After multiple VVR migrate operations, vrstat fails to output statistics.
2427560 (2425259) vxdg join operation fails with VE_DDL_PROPERTY: Property not found in the list.
2442751 (2104887) vxdg import error message needs improvement for cloned diskgroup import failure.
2442827 (2149922) Record the diskgroup import and deport events in syslog.
2492568 (2441937) vxconfigrestore precommit fails with awk errors.
2515137 (2513101) User data corrupted with disk label information.
2531224 (2526623) Memory leak detected in CVM code.
2560539 (2252680) vxtask abort does not clean up tasks properly.
2567623 (2567618) VRTSexplorer coredumps in checkhbaapi/print_target_map_entry.
2570988 (2560835) I/Os and vxconfigd hung on master node after a slave is rebooted under heavy I/O load.
2576605 (2576602) 'vxdg listtag' should give an error message and display correct usage when executed with wrong syntax.
2613584 (2606695) Machine panics in a CVR (Clustered Volume Replicator) environment while performing I/O operations.
2613596 (2606709) I/O hang is seen when the SRL overflows and one of the nodes reboots.
2616006 (2575172) I/Os are hung on the master node after rebooting the slave node.
2618339 (2486301) LxRT 6.0 SFHA SLES10 SP4, IBM DS APF, CPI, "VXFS" package installation failed.
2622029 (2620556) I/O hung after SRL overflow.
2622032 (2620555) I/O hang due to SRL overflow & CVM reconfig.
2626742 (2626741) Using the vxassist -o ordered and mediatype:hdd options together does not work as expected.
2626745 (2626199) "vxdmpadm list dmpnode" printing incorrect path-type.
2626899 (2612301) Upgrading the kernel on an encapsulated root disk does not work properly.
2627000 (2578336) Failed to online the cdsdisk.
2627004 (2413763) Uninitialized memory read results in a vxconfigd coredump.
2627021 (2561012) The offsets of the private (and/or public) regions of disks are shown incorrectly in the vxdisk list output, which could lead to DG import problems as well as I/O errors and system hangs reported by VxFS or other applications.
2641932 (2348199) vxconfigd dumps core while importing a Disk Group.
2646417 (2556781) In a cluster environment, an import attempt of an already-imported disk group may return a wrong error.
2652161 (2647975) Customer ran hastop -local and the shared dg had split brain.
2663673 (2656803) Race between vxnetd start and stop operations causes panic.
2690959 (2688308) When re-import of a disk group fails during master takeover, other shared disk groups should not be disabled.
2695226 (2648176) Performance difference on Master vs Slave during recovery via DCO.
2695231 (2689845) Data disk can go into the error state when data at the end of the first sector of the disk is the same as the MBR signature.
2703035 (925653) Node join fails for higher CVMTimeout values.
2705101 (2216951) vxconfigd dumps core because chosen_rlist_delete() hits a NULL pointer in the linked list of clone disks.
2706010 (1533134) Warnings regarding a deprecated SCSI ioctl appear in syslog.
2706024 (2664825) DiskGroup import fails when a disk contains no valid UDID tag on the config copy and the config copy is disabled.
2706027 (2657797) Starting a 32TB RAID5 volume fails with V-5-1-10128 Unexpected kernel error in configuration update.
2706038 (2516584) Startup scripts use 'exit' instead of 'quit', causing empty directories in /tmp.
2726161 (2726148) System becomes unbootable after the dmp_native_support tunable is set on and the system is rebooted.
2730149 (2515369) vxconfigd(1M) can hang in the presence of EMC BCV devices.
2737373 (2556467) DMP-ASM: disabling all paths and rebooting the host causes loss of /etc/vx/.vxdmprawdev records.
2737374 (2735951) Uncorrectable write error is seen on a subdisk when a SCSI device/bus reset occurs.
2747340 (2739709) Disk group rebuild fails as the links between volumes and vsets were missing from 'vxprint -D -' output.
2750455 (2560843) In a VVR (Veritas Volume Replicator) setup, I/Os can hang on slave nodes after one of the slave nodes is rebooted.
2754704 (2333540) "vxdisk resize" may incorrectly reduce the size of a VxVM disk.
2756069 (2756059) System may panic when a large cross-dg mirrored volume is started at boot.
2759895 (2753954) When a cable is disconnected from one port of a dual-port FC HBA, the paths via the other port are marked as SUSPECT PATH.
2763211 (2763206) "vxdisk rm" command dumps core when a disk name of very large length is given.
2768492 (2277558) vxassist outputs a misleading error message during snapshot-related operations.
2800774 (2566174) NULL pointer dereference in volcvm_msg_rel_gslock() results in panic.
2804911 (2637217) Document new storage allocation attribute support in the vradmin man page for resizevol/resizesrl.
2821137 (2774406) System may panic while accessing the data change map volume.
2821143 (1431223) "vradmin syncvol" and "vradmin syncrvg" commands do not work if the remote diskgroup and vset names are specified when synchronizing vsets.
2821176 (2711312) A new symbolic link is created in the root directory when an FC channel is pulled out.
2821452 (2495332) vxcdsconvert fails if the private region of the disk to be converted is less than 1 MB.
2821519 (1765916) VxVM socket files don't have proper write protection.
2821678 (2389554) vxdg listssbinfo output is incorrect.
2821695 (2599526) I/O hang seen when DCM is zero.
2826129 (2826125) VxVM script daemon is terminated abnormally on its invocation.
2826607 (1675482) "vxdg list " command shows configuration copy in new failed state.
2827791 (2760181) Panic hit on secondary slave during logowner operation.
2827794 (2775960) In a secondary CVR case, I/O hang seen on a DG during SRL disable activity on another DG.
2827939 (2088426) Re-onlining of disks in a DG during DG deport/destroy.
2836910 (2818840) Enhance the vxdmpasm utility so that various permissions and "root:non-system" ownership can be set persistently.
2845984 (2739601) vradmin repstatus output occasionally reports an abnormal timestamp.
2852270 (2715129) Vxconfigd hangs during Master takeover in a CVM (Clustered Volume Manager) environment.
2858859 (2858853) After master switch, vxconfigd dumps core on the old master.
2859390 (2000585) vxrecover doesn't start the remaining volumes if one of the volumes is removed during the vxrecover command run.
2860281 (2838059) VVR Secondary panic in vol_rv_update_expected_pos.
2860445 (2627126) I/O hang seen due to I/Os stuck at the DMP level.
2860449 (2836798) In VxVM, resizing a simple EFI disk fails and causes system panic/hang.
2860451 (2815517) vxdg adddisk allows mixing of clone & non-clone disks in a DiskGroup.
2860812 (2801962) Growing a volume takes a significantly long time when the volume has a version 20 DCO attached to it.
2862024 (2680343) Manual disable/enable of paths to an enclosure leads to system panic.
2863673 (2783293) After upgrade to RHEL5.8 (2.6.18-308), all paths get disabled when deport/import operations are invoked on shared dgs with SCSI-3 mode.
2867483 (2886402) When re-configuring devices, a vxconfigd hang is observed.
2871980 (2868790) In RHEL 6, there are some changes in the sysfs tree layout preventing vxesd from collecting the HBA topology information through sysfs.
2876116 (2729911) I/O errors seen during controller reboot or array port disable/enable.
2880411 (2483265) VxVM vxdmp V-5-0-0 i/o error occurred (errno=0x205).
2882488 (2754819) Diskgroup rebuild through 'vxmake -d' loops infinitely if the diskgroup configuration has multiple objects on a single cache object.
2884231 (2606978) Private region I/O errors due to DMP_PATH_FAILED do not trigger path failover in Linux vxio.
2886083 (2257850) vxdiskadm leaks memory while performing operations related to enclosures.
2903216 (2558261) VxVM unable to set up/un-set up powerpath devices.
2907643 (2924440) Message "Syntax error: "fi" unexpected" seen while booting from a root encapsulated machine in RHEL6.
2911010 (2627056) 'vxmake -g DGNAME -d desc-file' fails with a very large configuration due to memory leaks.

SUMMARY OF KNOWN ISSUES:
-----------------------------------------
2223250 (2165829) Node is not able to join the cluster when recovery is in progress.
2917139 (2917137) Machine panics or vxconfigd hangs if a VRTSaslapm version older than 5.1.100.500 is installed with VxVM 5.1SP1RP3 (5.1.133.000) on RHEL6.
2920831 (2920815) Upgrading the kernel on an encapsulated bootdisk does not work as documented on SLES11.

KNOWN ISSUES:
--------------
* INCIDENT NO::2223250 TRACKING ID ::2165829
SYMPTOM::
Node join fails if the recovery for the leaving node is not completed.
WORKAROUND::
Retry the node join after the recovery is completed.

* INCIDENT NO::2917139 TRACKING ID ::2917137
SYMPTOM::
The machine will panic, or vxconfigd and all subsequent VM commands will hang, if a VRTSaslapm version older than 5.1.100.500 is installed on top of VxVM 5.1SP1RP3 (5.1.133.000) on RHEL6.
WORKAROUND::
- Use a VRTSaslapm package with version greater than or equal to 5.1.100.500.
- While upgrading to VxVM 5.1SP1RP3 from older VxVM releases, first uninstall the VRTSaslapm package. Then upgrade to VxVM 5.1SP1RP3 and install a VRTSaslapm package with version greater than or equal to 5.1.100.500.

* INCIDENT NO::2920831 TRACKING ID ::2920815
SYMPTOM::
Upgrading the kernel on an encapsulated bootdisk does not work as documented on SLES11.
WORKAROUND::
Perform the following steps on a machine with an encapsulated root disk to upgrade the kernel (see the consolidated sketch after these steps):
1. Unroot the encapsulated root disk. Use command:
   #/etc/vx/bin/vxunroot
2. Upgrade the kernel, e.g.
   #rpm -Uvh Kernel-
3. Reboot (so that the system boots into the upgraded kernel).
4. Re-encapsulate the root disk. Use command:
   #/etc/vx/bin/vxencap -c -g rootdisk=
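A minimal consolidated sketch of the same workaround, assuming <kernel-rpm>, <dg> and <disk> are hypothetical placeholders for the site-specific kernel package, disk group and root disk name (they are not filled in above, so they are not filled in here either):

/etc/vx/bin/vxunroot                              # 1. unroot the encapsulated root disk
rpm -Uvh <kernel-rpm>                             # 2. upgrade the kernel
shutdown -r now                                   # 3. reboot into the upgraded kernel
/etc/vx/bin/vxencap -c -g <dg> rootdisk=<disk>    # 4. after the reboot: re-encapsulate the root disk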
FIXED INCIDENTS:
----------------
PATCH ID:5.1.133.000

* INCIDENT NO:2070079 TRACKING ID:1903700
SYMPTOM:
vxassist remove mirror does not work if nmirror and alloc are specified, giving the error "Cannot remove enough mirrors".
DESCRIPTION:
During the remove mirror operation, VxVM does not correctly analyze the plexes, which causes the failure.
RESOLUTION:
Necessary code changes have been done so that vxassist works properly.

* INCIDENT NO:2205574 TRACKING ID:1291519
SYMPTOM:
After two VVR migrate operations, the vrstat command does not output any statistics.
DESCRIPTION:
A migrate operation results in the RDS (Replicated Data Set) information getting updated in vradmind on both the primary and secondary sides. After multiple migrate operations, a stale handle to the older RDS is used by vrstat to retrieve statistics, resulting in the failure.
RESOLUTION:
Necessary code changes have been made to ensure that the correct, updated RDS handle is used by vrstat to retrieve statistics.

* INCIDENT NO:2427560 TRACKING ID:2425259
SYMPTOM:
vxdg join operation fails, throwing the error "join failed : Invalid attribute specification".
DESCRIPTION:
For disk names containing the "/" character, e.g. cciss/c0d1, the join operation fails to parse the disk names and hence returns the error.
RESOLUTION:
Code changes are made to handle special characters in disk names.

* INCIDENT NO:2442751 TRACKING ID:2104887
SYMPTOM:
vxdg import fails with the following ERROR message for a cloned device import, when the original diskgroup is already imported with the same DGID.
# vxdg -Cfn clonedg -o useclonedev=on -o tag=tag1 import testdg
VxVM vxdg ERROR V-5-1-10978 Disk group testdg: import failed: Disk group exists and is imported
DESCRIPTION:
In case of a clone device import, vxdg import without the "-o updateid" option fails if the original DG is already imported. The error message returned may be interpreted as meaning that a diskgroup with the same name is imported, while actually it is the dgid that is duplicated, not the dgname.
RESOLUTION:
The vxdg utility is modified to return a better error message for cloned DG import. It directs you to get details from the system log, where details of the conflicting dgid and a suggestion to use "-o updateid" are recorded.

* INCIDENT NO:2442827 TRACKING ID:2149922
SYMPTOM:
Record the diskgroup import and deport events in the /var/log/messages file. The following type of message can be logged in syslog:
vxvm: vxconfigd: V-5-1-16254 Disk group import of succeeded.
DESCRIPTION:
With a diskgroup import or deport, an appropriate success message, or a failure message with the cause of the failure, should be logged.
RESOLUTION:
Code changes are made to log diskgroup import and deport events in syslog.

* INCIDENT NO:2492568 TRACKING ID:2441937
SYMPTOM:
The vxconfigrestore(1M) command fails with the following error:
"The source line number is 1.
awk: Input line 22 | cannot be longer than 3,000 bytes."
DESCRIPTION:
In the function where the disk attributes are read from the backup, the disk attributes are stored in the variable "$disk_attr". The value of this variable can be a line longer than 3,000 bytes. Later this variable is parsed with the awk(1) command, which hits the awk(1) limitation of 3,000 bytes per input line.
RESOLUTION:
The code is modified to replace the awk(1) command with the cut(1) command, which does not have this limitation.
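A minimal sketch of the kind of substitution involved, with an illustrative $disk_attr value (in the real script this line can exceed 3,000 bytes):

disk_attr="disk_0 attr1=v1 attr2=v2"   # illustrative value only

# Old approach: awk(1) on some platforms rejects input lines longer than 3,000 bytes.
echo "$disk_attr" | awk '{print $1}'

# New approach: cut(1) extracts the same field with no line-length limitation.
echo "$disk_attr" | cut -d' ' -f1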
* INCIDENT NO:2515137 TRACKING ID:2513101
SYMPTOM:
When VxVM is upgraded from 4.1MP4RP2 to 5.1SP1RP1, the data on a CDS disk gets corrupted.
DESCRIPTION:
When CDS disks are initialized with VxVM version 4.1MP4RP2, the number of cylinders is calculated based on the raw disk geometry. If the calculated number of cylinders exceeds the Solaris VTOC limit (65535), an unsigned integer overflow causes a truncated value of the number of cylinders to be written in the CDS label. After VxVM is upgraded to 5.1SP1RP1, the CDS label gets wrongly written in the public region, leading to the data corruption.
RESOLUTION:
The code changes are made to suitably adjust the number of tracks and heads so that the calculated number of cylinders stays within the Solaris VTOC limit.

* INCIDENT NO:2531224 TRACKING ID:2526623
SYMPTOM:
A memory leak is detected in the CVM DMP messaging phase, with a message like:
NOTICE: VxVM vxio V-5-3-3938 vol_unload(): not all memory has been freed (volkmem=424)
DESCRIPTION:
During CVM-DMP messaging, memory was not getting freed in a specific scenario.
RESOLUTION:
Necessary code changes have been done to take care of the memory deallocation.

* INCIDENT NO:2560539 TRACKING ID:2252680
SYMPTOM:
When a paused VxVM (Veritas Volume Manager) task is aborted using the 'vxtask abort' command, it does not get aborted appropriately. It continues to show up in the output of the 'vxtask list' command and the corresponding process does not get killed.
DESCRIPTION:
As the appropriate signal is not sent to the paused VxVM task being aborted, it fails to abort and continues to show up in the output of the 'vxtask list' command. Also, its corresponding process does not get killed.
RESOLUTION:
Code changes are done to send an appropriate signal to paused tasks to abort them.

* INCIDENT NO:2567623 TRACKING ID:2567618
SYMPTOM:
VRTSexplorer coredumps in checkhbaapi/print_target_map_entry, with a stack which looks like:
print_target_map_entry()
check_hbaapi()
main()
_start()
DESCRIPTION:
The checkhbaapi utility uses the HBA_GetFcpTargetMapping() API, which returns the current set of mappings between operating system and fibre channel protocol (FCP) devices for a given HBA port. The maximum limit for mappings was set to 512 and only that much memory was allocated. When the number of mappings returned was greater than 512, the function that prints this information tried to access entries beyond that limit, which resulted in core dumps.
RESOLUTION:
The code has been changed to allocate enough memory for all the mappings returned by HBA_GetFcpTargetMapping().

* INCIDENT NO:2570988 TRACKING ID:2560835
SYMPTOM:
On the master, I/Os and vxconfigd hang when a slave is rebooted under heavy I/O load.
DESCRIPTION:
When a slave leaves the cluster without sending the DATA ack message to the master, the slave's I/Os get stuck on the master because their logend processing cannot be completed. At the same time, cluster reconfiguration takes place as the slave has left the cluster. In the CVM (Cluster Volume Manager) reconfiguration code path these I/Os are aborted in order to proceed with the reconfiguration and recovery. But if local I/Os on the master go to the logend queue after the logendq is aborted, these local I/Os get stuck forever in the logend queue, leading to a permanent I/O hang.
RESOLUTION:
During CVM reconfiguration, and the RVG (Replicated Volume Group) recovery that follows, no I/Os will be put into the logendq.

* INCIDENT NO:2576605 TRACKING ID:2576602
SYMPTOM:
The listtag option of the vxdg command gives results even when executed with the wrong syntax.
DESCRIPTION:
The correct syntax, as per the vxdg help, is "vxdg listtag [diskgroup ...]". However, when executed with the wrong syntax, "vxdg [-g diskgroup] listtag", it still gives results.
RESOLUTION:
Use the correct syntax as given in the vxdg command help. The command has been modified from the 6.0 release onwards to display an error and usage message when the wrong syntax is used.

* INCIDENT NO:2613584 TRACKING ID:2606695
SYMPTOM:
Panic in a CVR (Clustered Volume Replicator) environment while performing I/O operations. Panic stack traces might look like:
1) vol_rv_add_wrswaitq
   vol_get_timespec_latest
   vol_kmsg_obj_request
   vol_kmsg_request_receive
   vol_kmsg_receiver
   kernel_thread
2) vol_rv_mdship_callback
   vol_kmsg_receiver
   kernel_thread
DESCRIPTION:
In CVR, the logclient requests METADATA information from the logowner node to perform write operations. The logowner node looks for any duplicate messages before adding the requests to the queue for processing. When a duplicate request arrives, the logowner tries to copy the data from the original I/O request and responds to the logclient with the METADATA information. During this process, a panic can occur i) while copying the data, as the code handling the copy is not properly locked, or ii) if the logclient receives inappropriate METADATA information because of an improper copy.
RESOLUTION:
Code changes are done to apply appropriate conditions and locks while copying the data from the original I/O requests for the duplicates.

* INCIDENT NO:2613596 TRACKING ID:2606709
SYMPTOM:
SRL overflow and CVR reconfiguration lead to a reconfiguration hang.
DESCRIPTION:
In the reported problem there are 6 RVGs, each with 16 data volumes, though the problem can happen with more than one RVG configured. Both master and slave nodes are performing I/O. The slave node is rebooted, which triggers a reconfiguration. All 6 RVGs doing I/O have fully utilized the RVIOMEM pool (the memory pool used for RVG I/Os). Due to the node leave, the I/Os on all the RVGs come to a halt, waiting for the recovery flag to be set by the reconfiguration code path. Some pending I/Os in all the RVGs are still kept in the queue, due to holes in the SRL caused by the node leave. The RVIOMEM pool is completely used by 3 of the RVGs (600+ I/Os) which are still doing I/O. In the reconfiguration code, one RVG is picked to abort all the pending I/Os in its queue, and the code then waits for the active I/Os to complete. But some I/Os are still waiting for RVIOMEM pool memory, and the other active RVGs are not releasing any; their I/O is simply queued or waiting for memory. Until all the pending I/Os are serviced, the code will not move forward to abort the I/Os, so the reconfiguration never completes.
RESOLUTION:
Instead of going RVG by RVG to abort I/Os and start the recovery, the logic is changed to abort the I/Os in all the RVGs first, and then send the recovery message for all the RVGs after the I/O count drains to 0. This avoids a hang situation caused by some RVGs holding the memory.

* INCIDENT NO:2616006 TRACKING ID:2575172
SYMPTOM:
The reconfigd thread is hung waiting for the I/O to drain.
DESCRIPTION:
While doing a CVR (Clustered Volume Replicator) reconfiguration, RVG (Replicated Volume Group) recovery is started. The recovery can get stuck in a DCM (Data Change Map) read while flushing the SRL. The flush operation creates a large number (1000+) of threads. When system memory is very low and a memory allocation fails, failing to reduce the child I/O count leads to the hang.
RESOLUTION:
Reset the number_of_children count to 0 whenever the I/O creation fails due to a memory allocation failure.

* INCIDENT NO:2618339 TRACKING ID:2486301
SYMPTOM:
During SFHA CPI installation, "vxfs" package installation fails on systems having a large number of LUNs coming from an A/P-F array.
DESCRIPTION:
The VxVM package post-installation scripts invoke udevtrigger to generate hwpath information. udevtrigger is an asynchronous command and causes a lot of udev events to be generated. End I/O error messages may also be seen on the console or in /var/log/messages during this time for secondary paths of A/P-F LUNs. The VxFS package installation then goes ahead and waits for the /dev/vxportal device file creation. This file gets generated on the corresponding udev event; but since udev is busy with the previous task, this can sometimes take long enough to cause the VxFS post-install scripts to time out and fail.
RESOLUTION:
udevtrigger is removed from the VxVM post-installation scripts and replaced with equivalent code to generate the hwpath information. SFHA CPI installation works fine and the user should not see any I/O error messages generated via the udev daemon during CPI installation.

* INCIDENT NO:2622029 TRACKING ID:2620556
SYMPTOM:
I/O hangs on the primary after SRL overflow, during SRL flush and rlink connect/disconnect.
DESCRIPTION:
As part of an rlink connect or disconnect, the RVG is serialized to complete the connection or disconnection. I/O throttling is normal during the SRL flush, due to memory pool pressure or reaching the maximum throttle limit. During the serialization, the I/O is throttled to complete the DCM flush, and the remote I/Os are kept in the throttleq while throttling is in effect. Due to the I/O serialization, the throttled I/O never gets flushed, and because of that the I/O never completes.
RESOLUTION:
If the serialization is successful, the throttleq is flushed immediately. This makes sure the remote I/Os get retried in the serialization code path.

* INCIDENT NO:2622032 TRACKING ID:2620555
SYMPTOM:
During CVM reconfiguration, the RVG waits forever for the I/O count to go to 0 in order to start the RVG recovery and complete the reconfiguration.
DESCRIPTION:
In CVR, a node leave triggers a reconfiguration. The reconfiguration code path initiates the RVG recovery of all the shared diskgroups. The recovery is needed to flush the SRL (shared by all the nodes) to the data volumes, to avoid any writes by the leaving node going missing from the data volumes. This recovery involves reading the data from the SRL and copying it to the data volumes. The flush may take a long time, depending on the disk response time and the size of the SRL region that needs to be flushed. During the recovery, a flag is set on the RVG to avoid any new I/O. In this particular case, the recovery took 30 minutes. During this time there was another node leave, which triggered a second reconfiguration. Before triggering another recovery, the second reconfiguration set the RECOVER flag on the RVG and waited for the I/O count to go to zero. The first RVG recovery cleared the RECOVER flag after 30 minutes, once the SRL flush completed. Since this is the same flag that was set by the second reconfiguration, and the I/O resumed once the flag was unset, the second reconfiguration kept waiting indefinitely for the I/O count to go to zero and got stuck forever.
RESOLUTION:
If the RECOVER flag is already set, do not keep waiting for the I/O count to become zero in the reconfiguration code path. There is no need for another recovery if the second reconfiguration is started before the first recovery completes.
* INCIDENT NO:2626742 TRACKING ID:2626741
SYMPTOM:
vxassist, when used with the "-o ordered" and "mediatype:hdd" options during a striped volume make operation, does not maintain the disk order.
DESCRIPTION:
vxassist, when invoked with the "-o ordered" and "mediatype:hdd" options while creating a striped volume, does not maintain the disk order provided by the user. The first stripe of the volume should correspond to the first disk provided by the user.
RESOLUTION:
The code is rectified to use the disks as per the user-specified disk order.

* INCIDENT NO:2626745 TRACKING ID:2626199
SYMPTOM:
The "vxdmpadm list dmpnode" command shows the path-type value as "primary/secondary" for a LUN in an Active-Active array, as below, when it is supposed to be a NULL value.
dmpdev = c6t0d3
state = enabled
...
array-type = A/A
###path = name state type transport ctlr hwpath aportID aportWWN attr
path = c23t0d3 enabled(a) secondary FC c30 2/0/0/2/0/0/0.0x50060e8005c0bb00 - - -
DESCRIPTION:
For a LUN under an Active-Active array, the path-type value is supposed to be NULL. In this specific case, other commands like "vxdmpadm getsubpaths dmpnode=<>" were showing the correct (NULL) value for path-type.
RESOLUTION:
The "vxdmpadm list dmpnode" code path failed to initialize the path-type variable and by default set path-type to "primary or secondary" even for Active-Active array LUNs. This is fixed by initializing the path-type variable to NULL.

* INCIDENT NO:2626899 TRACKING ID:2612301
SYMPTOM:
Kernel upgrade on an encapsulated root disk does not work properly and may cause system boot failure. While upgrading the kernel on an encapsulated root disk you may experience any of the following issues:
- Incorrect entry for the VxVM (Veritas Volume Manager) root disk in menu.lst
- System boot failure
- The VxVM_root_backup entry is not removed from menu.lst when un-encapsulating the root disk
DESCRIPTION:
Kernel upgrade on an encapsulated root disk does not work properly for the following reasons:
- The command 'upgrade_encapped root' chooses incorrect VxVM kernel modules to generate the VxVM_initrd.img file, causing a boot failure on the next reboot.
- Upgrading the kernel disrupts existing menu entries in the menu.lst file, due to which the command 'upgrade_encapped root' generates an incorrect VxVM_root menu entry in menu.lst.
- While un-encapsulating the root disk, the VxVM_root_backup entry is not removed from the menu.lst file.
RESOLUTION:
The command 'upgrade_encapped root' and the associated scripts for the kernel upgrade process are rectified to choose the correct VxVM kernel module on the next reboot. While booting, an additional message is displayed to indicate which kernel module version is suitable for the upgraded kernel. Correct menu entries are generated for VxVM_root, and the previous VxVM_root entry is commented out as VxVM_root_backup. Also, the VxVM_root_backup entry is removed when un-encapsulating. The fix is applicable to RHEL5, RHEL6 and SLES10 only. Kernel upgrade on an encapsulated root disk is not supported on SLES11, as the 'mkinitrd' utility fails when the root disk is encapsulated.

* INCIDENT NO:2627000 TRACKING ID:2578336
SYMPTOM:
An I/O error is encountered while accessing the cdsdisk.
DESCRIPTION:
This issue is seen only on defective cdsdisks, where the s2 partition size in the sector 0 label is less than the sum of the public region offset and the public region length.
RESOLUTION:
A solution has been implemented to rectify the defective cdsdisk at the time the cdsdisk is onlined.
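A minimal sketch of the consistency condition involved, with illustrative values; the variable names are placeholders, not actual VxVM output fields, and a real check would read these values from the sector 0 label and the private region header:

s2_size=4194304        # s2 partition size from the sector 0 label (sectors)
pub_offset=2048        # public region offset (sectors)
pub_len=4193280        # public region length (sectors)

# The cdsdisk is defective when the s2 partition cannot contain the public region.
if [ "$s2_size" -lt $((pub_offset + pub_len)) ]; then
    echo "defective label: s2=$s2_size < public region end=$((pub_offset + pub_len))"
fi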
* INCIDENT NO:2627004 TRACKING ID:2413763
SYMPTOM:
vxconfigd, the VxVM daemon, dumps core with the following stack:
ddl_fill_dmp_info
ddl_init_dmp_tree
ddl_fetch_dmp_tree
ddl_find_devices_in_system
find_devices_in_system
mode_set
setup_mode
startup
main
__libc_start_main
_start
DESCRIPTION:
A Dynamic Multi-Pathing node buffer declared in the Device Discovery Layer was not initialized. Since the node buffer is local to the function, an explicit initialization is required before copying another buffer into it.
RESOLUTION:
The node buffer is appropriately initialized using memset() to address the coredump.

* INCIDENT NO:2627021 TRACKING ID:2561012
SYMPTOM:
In a cluster, if a disk is re-initialized and added to a DG on one node and the DG is then imported on another node, the disk region offsets (and/or the disk format type) are incorrect. This can result in disk listing errors, errors while doing I/O on the disks or the overlaying volumes, and FS-level hangs or errors.
DESCRIPTION:
VxVM caches disk-related information to enhance performance. In a cluster, if the disk layout is updated on disk for a disk which is not part of a shared DG, the update is not reflected on all the nodes of the cluster. When the DG is imported on another node, stale layout and format information about the disk is used, which leads to an incorrect offset of the data on the disk. This can lead to incorrect display of disk attributes and type, as well as I/O errors and subsequent issues in the FS and other applications using the disk.
RESOLUTION:
Necessary code changes have been done to ensure that when importing a non-shared DG, the on-disk information about the disk is read rather than using the in-memory information.

* INCIDENT NO:2641932 TRACKING ID:2348199
SYMPTOM:
vxconfigd dumps core during a Disk Group import, with the following function call stack:
strcmp+0x60 ()
da_find_diskid+0x300 ()
dm_get_da+0x250 ()
ssb_check_disks+0x8c0 ()
dg_import_start+0x4e50 ()
dg_reimport+0x6c0 ()
dg_recover_all+0x710 ()
mode_set+0x1770 ()
setup_mode+0x50 ()
startup+0xca0 ()
main+0x3ca0 ()
DESCRIPTION:
During a Disk Group import, vxconfigd performs certain validations on the disks. During one such validation, it iterates through the list of available disk access records to find a match with a given disk media record, doing a string comparison of the disk IDs in the two records. Under certain conditions, the disk ID of a disk access record may have a NULL value. vxconfigd dumps core when it passes this to the strcmp() function.
RESOLUTION:
The code was modified to check for disk access records with a NULL disk ID and skip them in the comparison.

* INCIDENT NO:2646417 TRACKING ID:2556781
SYMPTOM:
In a cluster environment, importing a disk group which is imported on another node results in a wrong error message like the one below:
VxVM vxdg ERROR V-5-1-10978 Disk group : import failed: Disk is in use by another host
DESCRIPTION:
When VxVM is translating a given disk group name to a disk group id during the disk group import process, an error return indicating that the disk group is in use by another host may be overwritten by a wrong error.
RESOLUTION:
The source code has been changed to handle the return value correctly.

* INCIDENT NO:2652161 TRACKING ID:2647975
SYMPTOM:
A Serial Split Brain (SSB) condition caused the Cluster Volume Manager (CVM) Master Takeover to fail.
The following vxconfigd debug output was seen when the issue occurred:
VxVM vxconfigd NOTICE V-5-1-7899 CVM_VOLD_CHANGE command received
V-5-1-0 Preempting CM NID 1
VxVM vxconfigd NOTICE V-5-1-9576 Split Brain. da id is 0.5, while dm id is 0.4 for dm cvmdgA-01
VxVM vxconfigd WARNING V-5-1-8060 master: could not delete shared disk groups
VxVM vxconfigd ERROR V-5-1-7934 Disk group cvmdgA: Disabled by errors
VxVM vxconfigd ERROR V-5-1-7934 Disk group cvmdgB: Disabled by errors
...
VxVM vxconfigd ERROR V-5-1-11467 kernel_fail_join() : Reconfiguration interrupted: Reason is transition to role failed (12, 1)
VxVM vxconfigd NOTICE V-5-1-7901 CVM_VOLD_STOP command received
DESCRIPTION:
When a Serial Split Brain (SSB) condition is detected by the new CVM master, on Veritas Volume Manager (VxVM) versions 5.0 and 5.1 the default CVM behaviour causes the new CVM master to leave the cluster, resulting in cluster-wide downtime.
RESOLUTION:
Necessary code changes have been done to ensure that when SSB is detected in a diskgroup, CVM only disables that particular diskgroup and keeps the other diskgroups imported during the CVM Master Takeover. With the fix applied, the new CVM master no longer leaves the cluster.

* INCIDENT NO:2663673 TRACKING ID:2656803
SYMPTOM:
VVR (Veritas Volume Replicator) panics when vxnetd start/stop operations are invoked in parallel. The panic stack trace might look like:
panicsys
vpanic_common
panic
mutex_enter()
vol_nm_heartbeat_free()
vol_sr_shutdown_netd()
volnet_ioctl()
volsioctl_real()
spec_ioctl()
DESCRIPTION:
The vxnetd start and stop operations are not serialized. A race condition, and the resulting panic, is hit when they run in parallel and access shared resources without locks. The panic stack varies depending on where the resource contention is seen.
RESOLUTION:
A synchronization primitive is incorporated to allow only one of the vxnetd start and stop processes to run at a time.

* INCIDENT NO:2690959 TRACKING ID:2688308
SYMPTOM:
When re-import of a disk group fails during master takeover, all the shared disk groups get disabled. It also results in the corresponding node (the new master) leaving the cluster.
DESCRIPTION:
In Cluster Volume Manager, when the master goes down, the upcoming master tries to re-import the disk groups. If some error occurs while re-importing a disk group, all the shared disk groups are disabled and the new master leaves the cluster. This may result in a cluster outage.
RESOLUTION:
Code changes are made to disable only the disk group on which the error occurred during re-import, and to continue importing the other shared disk groups.

* INCIDENT NO:2695226 TRACKING ID:2648176
SYMPTOM:
In a Clustered Volume Manager environment, additional data synchronization is noticed during the reattach of a detached plex on a mirrored volume, even when there was no I/O on the volume after the mirror was detached. This behavior is seen only on mirrored volumes that have a version 20 DCO attached and are part of a shared diskgroup.
DESCRIPTION:
In a Clustered Volume Manager environment, write I/Os issued on a mirrored volume from the CVM master node are tracked in a bitmap unnecessarily. The tracked bitmap is then used during detach to create the tracking map for the detached plex. This results in an additional delta between the active plex and the detached plex. So, even when there are no I/Os after the detach, the reattach does additional synchronization between the mirrors.
RESOLUTION:
The unnecessary bitmap tracking of write I/Os issued on a mirrored volume from the CVM master node is prevented.
So, the tracking map that gets created during detach always starts clean.

* INCIDENT NO:2695231 TRACKING ID:2689845
SYMPTOM:
Disks are seen in the error state:
hitachi_usp-vm0_11 auto - - error
DESCRIPTION:
When the data at the end of the first sector of a disk is the same as the MBR signature, Volume Manager misinterprets the data disk as an MBR disk. Accordingly, partitions are determined, but format determination fails for these fake partitions and the disk goes into the error state.
RESOLUTION:
Code changes are made to check the status field of the disk along with the MBR signature. Valid status fields for an MBR disk are 0x00 and 0x80.

* INCIDENT NO:2703035 TRACKING ID:925653
SYMPTOM:
Node join fails when CVMTimeout is set to a value higher than approximately 35 minutes.
DESCRIPTION:
Node join fails due to an integer overflow for higher CVMTimeout values.
RESOLUTION:
Code changes are done to handle higher CVMTimeout values.

* INCIDENT NO:2705101 TRACKING ID:2216951
SYMPTOM:
The vxconfigd daemon dumps core in the chosen_rlist_delete() function and the following stack trace is displayed:
chosen_rlist_delete()
req_dg_import_disk_names()
request_loop()
main()
DESCRIPTION:
The vxconfigd daemon dumps core when it accesses a NULL pointer in the chosen_rlist_delete() function.
RESOLUTION:
The code is modified to handle the NULL pointer in the chosen_rlist_delete() function.

* INCIDENT NO:2706010 TRACKING ID:1533134
SYMPTOM:
Warnings regarding a deprecated SCSI ioctl appear in syslog. The following type of message is logged in /var/log/messages:
kernel: program vxconfigd is using a deprecated SCSI ioctl, please convert it to SG_IO
DESCRIPTION:
SCSI_IOCTL_SEND_COMMAND is deprecated in Linux and is replaced with SG_IO. Using the deprecated SCSI_IOCTL_SEND_COMMAND logs warnings in syslog.
RESOLUTION:
Code changes are made to replace SCSI_IOCTL_SEND_COMMAND with the SG_IO interface.

* INCIDENT NO:2706024 TRACKING ID:2664825
SYMPTOM:
The following two issues are seen when a cloned disk group having a mixture of disks which are clones of disks initialized under VxVM versions 4.x and 5.x is imported.
(i) The following error is seen without the "-o useclonedev=on -o updateid" options on a 5.x environment, and the import fails:
# vxdg -Cf import
VxVM vxdg ERROR Disk group : import failed: Disk group has no valid configuration copies
(ii) The following warning is seen with the "-o useclonedev=on -o updateid" options on a 5.x environment, and the import succeeds:
# vxdg -Cf -o useclonedev=on -o updateid import
VxVM vxdg WARNING Disk : Not found, last known location: ...
DESCRIPTION:
vxconfigd, a VxVM daemon, imports a disk group whose disks are either all cloned or all standard (non-clone). If the disk group has a mixture of cloned and standard devices and the user attempts to import the disk group -
(i) without the "-o useclonedev=on" option, only the standard disks are considered for import. The import fails if none of the standard disks have a valid configuration copy.
(ii) with the "-o useclonedev=on" option, the import succeeds, but the standard disks go missing because only clone disks are considered for import.
A disk initialized under a VxVM version earlier than 5.x has no concept of a Unique Disk Identifier (UDID), which helps to identify a cloned disk. Such a disk cannot be flagged as a cloned disk even if it is indeed one. This results in issues (i) and (ii).
RESOLUTION:
The source code is modified to set the appropriate flags so that disks initialized under VxVM 4.x are recognized as "cloned", and both of the issues (i) and (ii) are avoided.

* INCIDENT NO:2706027 TRACKING ID:2657797
SYMPTOM:
Starting a RAID5 volume fails when one of the sub-disks in a RAID5 column starts at an offset greater than 1TB. Example:
# vxvol -f -g dg1 -o delayrecover start vol1
VxVM vxvol ERROR V-5-1-10128 Unexpected kernel error in configuration update
DESCRIPTION:
VxVM uses an integer variable to store the starting block offset of a sub-disk in a RAID5 column. This overflows when a sub-disk is located at an offset greater than 2147483647 blocks (1TB) and results in a failure to start the volume. Refer to "sdaj" in the following example:
v  RaidVol          -          DETACHED NEEDSYNC 64459747584 RAID        -     raid5
pl RaidVol-01       RaidVol    ENABLED  ACTIVE   64459747584 RAID        4/128 RW
[..]
SD NAME             PLEX       DISK         DISKOFFS LENGTH     [COL/]OFF    DEVICE MODE
sd DiskGroup101-01  RaidVol-01 DiskGroup101 0        1953325744 0/0          sdaa   ENA
sd DiskGroup106-01  RaidVol-01 DiskGroup106 0        1953325744 0/1953325744 sdaf   ENA
sd DiskGroup110-01  RaidVol-01 DiskGroup110 0        1953325744 0/3906651488 sdaj   ENA
RESOLUTION:
VxVM code is modified to handle integer overflow conditions for RAID5 volumes.

* INCIDENT NO:2706038 TRACKING ID:2516584
SYMPTOM:
Many random directories are left uncleaned in /tmp/, like vx.$RANDOM.$RANDOM.$RANDOM.$$, on system startup.
DESCRIPTION:
In general the startup scripts should call quit(), which does the cleanup when errors are detected. The scripts were calling exit() directly instead of quit(), leaving some randomly created directories uncleaned.
RESOLUTION:
These scripts are restored to call quit() instead of calling exit() directly.
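A minimal sketch of the pattern involved, assuming a startup script that creates a scratch directory under /tmp (the directory naming mirrors the symptom above; the script itself is illustrative, not the actual VxVM startup script):

tmpdir=/tmp/vx.$RANDOM.$RANDOM.$RANDOM.$$

quit() {
    rm -rf "$tmpdir"    # cleanup that a bare exit would skip
    exit "$1"
}

mkdir -p "$tmpdir" || quit 1
# ... work that may detect an error and bail out via quit, never via a bare exit ...
quit 0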
* INCIDENT NO:2726161 TRACKING ID:2726148
SYMPTOM:
The system becomes unbootable after the dmp_native_support tunable is set on and the system is rebooted.
DESCRIPTION:
When root is on a raw partition and other essential mounts (like /var, /usr, etc.) are on LVM partitions, enabling DMP native support makes the system unbootable, because the value of the filter in the lvm.conf file has been modified to reject raw partitions.
RESOLUTION:
The source is modified to change the device field of root in fstab to point to 'UUID='.

* INCIDENT NO:2730149 TRACKING ID:2515369
SYMPTOM:
vxconfigd(1M) can hang in the presence of EMC BCV devices in the established (bcv-nr) state, with a call stack similar to the following:
inline biowait_rp
biowait
dmp_indirect_io
gendmpioctl
dmpioctl
spec_ioctl
vno_ioctl
ioctl
syscall
Also, a message similar to the following can be seen in the syslog:
NOTICE: VxVM vxdmp V-5-3-0 gendmpstrategy: strategy call failed on bp , path devno 255/
DESCRIPTION:
The issue can happen during device discovery. While reading the device information, the device is expected to be opened in block mode, but the device was incorrectly being opened in character mode, causing the hang.
RESOLUTION:
The code was changed to open the block device from the DMP indirect I/O code path.

* INCIDENT NO:2737373 TRACKING ID:2556467
SYMPTOM:
When dmp_native_support is enabled, ASM (Automatic Storage Management) disks are disconnected from the host and the host is rebooted, the user-defined user-group ownership of the respective DMP (Dynamic Multipathing) devices is lost and the ownership is set to default values.
DESCRIPTION:
The user-group ownership records of DMP devices in the /etc/vx/.vxdmprawdev file are refreshed at boot time, and only the records of currently available devices are retained. As part of the refresh, the records of all the disconnected ASM disks are removed from /etc/vx/.vxdmprawdev and hence set to default values.
RESOLUTION:
Code changes are made so that the file /etc/vx/.vxdmprawdev is not refreshed at boot time.

* INCIDENT NO:2737374 TRACKING ID:2735951
SYMPTOM:
The following messages can be seen in syslog:
SCSI error: return code = 0x00070000
I/O error, dev , sector
VxVM vxdmp V-5-0-0 i/o error occurred (errno=0x0) on dmpnode /
DESCRIPTION:
When SCSI resets happen, the I/O fails with a PATH_OK or PATH_RETRY error. As time-bound recovery is the default recovery option, VxVM retries the I/O until the timeout. Because of a miscalculation of the time taken by each I/O retry, the total timeout value is reduced drastically. All retries fail with the same error within this small timeout value, and an uncorrectable error occurs.
RESOLUTION:
Code changes are made to calculate the timeout value properly.

* INCIDENT NO:2747340 TRACKING ID:2739709
SYMPTOM:
While rebuilding a disk group, the maker file generated by the "vxprint -dmvpshrcCx -D -" command does not have the links between volumes and vsets. Hence, the rebuild of the disk group fails.
DESCRIPTION:
The file generated by the "vxprint -dmvpshrcCx -D -" command does not have the links between volumes and vsets (volume sets), due to which the diskgroup rebuild fails.
RESOLUTION:
Code changes are done to maintain the links between volumes and vsets.

* INCIDENT NO:2750455 TRACKING ID:2560843
SYMPTOM:
In a cluster of 3 or more nodes, when one of the slaves is rebooted under heavy I/O load, the I/Os hang on the other slave. Example:
Node A (master and logowner)
Node B (slave 1)
Node C (slave 2)
If Node C is doing heavy I/O and Node B is rebooted, the I/Os on Node C hang.
DESCRIPTION:
When Node B leaves the cluster, its throttled I/Os are aborted and all the resources taken by these I/Os are freed. Along with these I/Os, the throttled I/Os of Node C also receive a response that resources are not available, to let Node C resend those I/Os. But during this process, the region locks held by these I/Os on the master are not freed.
RESOLUTION:
All the resources taken by the remote I/Os on the master are now freed properly.

* INCIDENT NO:2754704 TRACKING ID:2333540
SYMPTOM:
When the Dynamic LUN Expansion feature is used to resize a LUN using "vxdisk resize" from RedHat 5.6 (RHSA-2012:0107) or RedHat 6.1 (RHSA-2011:1849) onwards, a VxVM disk may get resized incorrectly, possibly resulting in data corruption.
DESCRIPTION:
As part of the LUN resize operation, VxVM issues an IOCTL on the LUN to gather disk geometry information such as the number of heads, cylinders and sectors.
Due to a change made in a security update to the operating system to block a possible privilege escalation, IOCTLs issued on device partitions are blocked. This prevents VxVM from correctly obtaining the disk geometry.
RESOLUTION:
The code was modified to issue the IOCTL on the whole disk rather than a partition, to gather the appropriate new geometry for the resized LUN.

* INCIDENT NO:2756069 TRACKING ID:2756059
SYMPTOM:
During the boot process, when VxVM starts a large cross-dg mirrored volume (>1.5TB), the system may panic with the following stack:
vxio:voldco_or_drl_to_pvm
vxio:voldco_write_pervol_maps_20
vxio:volfmr_write_pervol_maps
vxio:volfmr_copymaps_instant
vxio:volfmr_copymaps
vxio:vol_mv_precommit
vxio:vol_commit_iolock_objects
vxio:vol_ktrans_commit
vxio:volconfig_ioctl
vxio:volsioctl_real
DESCRIPTION:
During resync of a cross-dg mirrored volume, the DRL (dirty region logging) log is changed to a tracking map on the volume. While changing the map, the pointer calculation is not done properly. Because the pointer is advanced incorrectly, an array out-of-bounds access occurs for very large volumes, leading to the panic.
RESOLUTION:
The code changes are done to fix the wrong pointer increment.

* INCIDENT NO:2759895 TRACKING ID:2753954
SYMPTOM:
When a cable is disconnected from one port of a dual-port FC HBA, only the paths going through that port should be marked as SUSPECT. But paths going through the other port are also getting marked as SUSPECT.
DESCRIPTION:
Disconnection of a cable from an HBA port generates an FC event. When the event is generated, the paths of all ports of the corresponding HBA are marked as SUSPECT.
RESOLUTION:
The code changes are done to mark only the paths going through the port on which the FC event was generated.

* INCIDENT NO:2763211 TRACKING ID:2763206
SYMPTOM:
"vxdisk rm" dumps core with the following stack trace:
vfprintf
volumivpfmt
volpfmt
do_rm
DESCRIPTION:
While copying a disk name of very large length, array bound checking is not done, which causes a buffer overflow. A segmentation fault occurs while accessing the corrupted memory, terminating the "vxdisk rm" process.
RESOLUTION:
Code changes are done to perform array bound checking, to avoid such buffer overflow issues.

* INCIDENT NO:2768492 TRACKING ID:2277558
SYMPTOM:
vxassist outputs a misleading error message during snapshot-related operations. The message looks like:
"VxVM VVR vxassist ERROR V-5-1-10127 getting associations of rvg : Property not found in the list"
DESCRIPTION:
The error message is displayed incorrectly. There is an error condition which gets masked by a previously occurred error which vxassist chose to ignore before going ahead with the operation.
RESOLUTION:
A fix has been added to reset the previously occurred error that was ignored, so that the real error is displayed by vxassist.

* INCIDENT NO:2800774 TRACKING ID:2566174
SYMPTOM:
In a Clustered Volume Manager environment, the node taking over as MASTER panicked because of a NULL pointer dereference while releasing the ilocks. The stack is given below:
vxio:volcvm_msg_rel_gslock
vxio:volkmsg_obj_sio_start
vxio:voliod_iohandle
vxio:voliod_loop
DESCRIPTION:
The issue is seen due to offloading of glock messages to the I/O daemon threads. When the VxVM I/O daemon threads process the glock release messages, the interlock release and free happen after invoking the kernel message complete routine. This has the side effect that the reference count on the control block becomes zero, and if garbage collection is running at this stage, it ends up freeing the message from the garbage queue.
So, if there is a resend of the same message, there are two contexts processing the same interlock free request. The receiver thread then finds the interlock NULL, already freed from the other context, and a panic occurs.
RESOLUTION:
Code changes are done to offload glock messages to the VxVM I/O daemon threads after processing the control block. Also, the kernel message response routine is invoked after checking whether an interlock release is required, and releasing it if so.

* INCIDENT NO:2804911 TRACKING ID:2637217
SYMPTOM:
The storage allocation attributes 'pridiskname' and 'secdiskname' are not documented in the vradmin man page for resizevol/resizesrl.
DESCRIPTION:
The 'pridiskname' and 'secdiskname' attributes are optional arguments to the vradmin resizevol and vradmin resizesrl commands, which enable users to specify a comma-separated list of disk names for the resize operation on a VVR data volume and SRL, respectively. These arguments were introduced in 5.1SP1, but were not documented in the vradmin man page.
RESOLUTION:
The vradmin man page has been updated to document the storage allocation attributes 'pridiskname' and 'secdiskname' for the vradmin resizevol and vradmin resizesrl commands.

* INCIDENT NO:2821137 TRACKING ID:2774406
SYMPTOM:
The system may panic while referencing the DCM (Data Change Map) object attached to a volume, with the following stack:
vol_flush_srl_to_dv_start
voliod_iohandle
voliod_loop
DESCRIPTION:
When the volume tries to flush the DCM to track the I/O map, if the disk attached to the DCM is not available, the DCM state is set to aborting before it is marked inactive. Since the current state of the volume is still ACTIVE, trying to access the DCM object causes the panic.
RESOLUTION:
Code changes are done to check that the DCM is not in the aborting state before proceeding with the DCM flush.

* INCIDENT NO:2821143 TRACKING ID:1431223
SYMPTOM:
The "vradmin syncvol" and "vradmin syncrvg" commands do not work if the remote diskgroup and vset names are specified when synchronizing vsets.
DESCRIPTION:
When the "vradmin syncvol" or "vradmin syncrvg" command is executed for a vset, the vset is expanded to its component volumes and a path is generated for each component volume. But when remote vset names are specified on the command line, the command fails to expand the remote component volumes correctly. Synchronization fails because of the incorrect paths for the volumes.
RESOLUTION:
Code changes have been made to ensure that a remote vset is expanded correctly when specified on the command line.

* INCIDENT NO:2821176 TRACKING ID:2711312
SYMPTOM:
After pulling out an FC cable, a new symbolic link gets created for the null path in the root directory:
# ls -l
lrwxrwxrwx. 1 root root c -> /dev/vx/.dmp/c
DESCRIPTION:
Whenever an FC cable is added or removed, an event is sent to the OS and some udev rules are executed by VxVM. When the FC cable is pulled, the path id is removed and the hardware path information becomes null. Symbolic links were generated without checking whether the hardware path information had become null.
RESOLUTION:
Code changes are made to check whether the hardware path information is null and create symbolic links accordingly.
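A minimal sketch of this kind of guard, assuming a helper script invoked from a udev rule with the hardware path as its argument; the script, the argument convention, and the link target directory are all illustrative, not the actual VxVM udev helper:

hwpath="$1"    # hardware path passed in by the udev rule; empty after a cable pull

# Only create the link when the hardware path survived the FC event.
if [ -n "$hwpath" ]; then
    ln -sf "/dev/vx/.dmp/$hwpath" "/dev/vx/dmp-links/$hwpath"
fi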
* INCIDENT NO:2821452 TRACKING ID:2495332
SYMPTOM:
vxcdsconvert(1M) fails with the following error if the private region length is less than 1MB and there is a single sub-disk spanning the entire disk:
# vxcdsconvert -g alldisks evac_subdisks_ok=yes
VxVM vxprint ERROR V-5-1-924 Record not found
VxVM vxprint ERROR V-5-1-924 Record not found
VxVM vxcdsconvert ERROR V-5-2-3174 Internal Error
VxVM vxcdsconvert ERROR V-5-2-3120 Conversion process aborted
DESCRIPTION:
If a non-CDS disk's private region length is less than 1MB, vxcdsconvert internally tries to relocate sub-disks at the start of the disk to create room for a private region of 1MB. To make room for back-up labels, vxcdsconvert also tries to relocate sub-disks at the end of the disk. Two entries, one for relocation at the start and one at the end, are created during the analysis phase. Once the first sub-disk is relocated, the next vxprint operation fails, as the sub-disk has already been evacuated to another DM (disk media record).
RESOLUTION:
This problem is fixed by allowing the generation of multiple relocation entries for the same sub-disk. Later, if the sub-disk is found to be already evacuated to another DM, relocation is skipped for the sub-disk with the same name.

* INCIDENT NO:2821519 TRACKING ID:1765916
SYMPTOM:
VxVM socket files don't have write protection from other users. The following files are writable by all users:
srwxrwxrwx root root /etc/vx/vold_diag/socket
srwxrwxrwx root root /etc/vx/vold_inquiry/socket
srwxrwxrwx root root /etc/vx/vold_request/socket
DESCRIPTION:
These sockets are used by the admin/support commands to communicate with vxconfigd. They are created by vxconfigd during its startup process.
RESOLUTION:
Proper write permissions are given to the VxVM socket files. Only the vold_inquiry/socket file remains writable by all, because it is used by many VxVM commands, like vxprint, which are used by all users.

* INCIDENT NO:2821678 TRACKING ID:2389554
SYMPTOM:
The vxdg command of VxVM (Veritas Volume Manager), located in the /usr/sbin directory, shows incorrect messages for the ssb (Serial Split Brain) information of a disk group. The ssb information uses "DISK PRIVATE PATH" as an item, but the content is the public path of some disk. The ssb information also prints unknown characters to represent the config copy id of a disk if the disk's config copies are all disabled. Moreover, there is some redundant information in the output messages.
The "DISK PRIVATE PATH" error looks like this:
$ vxdisk list
...
pubpaths: block=/dev/vx/dmp/s4 char=/dev/vx/rdmp/s4
privpaths: block=/dev/vx/dmp/s3 char=/dev/vx/rdmp/s3
...
$ vxsplitlines -v -g mydg
...
Pool 0
DEVICE DISK DISK ID DISK PRIVATE PATH
/dev/vx/rdmp/s4
The unknown-character error message looks like this:
$ vxdg import
...
To import the diskgroup with config copy from the second pool issue the command
/usr/sbin/vxdg [-s] -o selectcp= import
...
DESCRIPTION:
VxVM uses an SSB data structure to maintain the ssb information displayed to the user. The SSB data structure contains members such as the pool id, config copy id, etc. After memory allocation for an SSB data structure, the newly allocated memory area is not initialized. If all the config copies of some disk are disabled, the config copy id member contains unknown data. The vxdg command tries to print this data, and unknown characters are displayed on stdout. The SSB data structure has a "disk public path" member, but no "disk private path" member, so the output message can only display the public path of a disk.
RESOLUTION:
The ssb structure has been changed to use the "disk private path" instead of the "disk public path". Moreover, after memory allocation for an ssb structure, the newly allocated memory is properly initialized.

* INCIDENT NO:2821695 TRACKING ID:2599526
SYMPTOM:
The SRL-to-DCM flush does not happen, resulting in an I/O hang.
DESCRIPTION:
After an SRL overflow, before the RU state machine phase could be changed to VOLRP_PHASE_SRL_FLUSH, the rlink connection thread sneaked in and changed the phase to VOLRP_PHASE_START_UPDATE. Once the phase is changed to VOLRP_PHASE_START_UPDATE, the state machine misses flushing the SRL into the DCM, goes into VOLRP_PHASE_DCM_WAIT, and gets stuck there.
RESOLUTION:
The RU state machine phases are handled correctly after the SRL overflows.

* INCIDENT NO:2826129 TRACKING ID:2826125
SYMPTOM:
VxVM script daemons are not up after being invoked with the vxvm-recover script.
DESCRIPTION:
When a VxVM script daemon is starting, it terminates any stale instance that exists. When the script daemon is invoked with exactly the same process id as its previous invocation, the daemon abnormally terminates itself through this false-positive detection.
RESOLUTION:
Code changes were made to handle the same-process-id situation correctly.

* INCIDENT NO:2826607 TRACKING ID:1675482
SYMPTOM:
The vxdg list command shows a configuration copy in the "new failed" state:
# vxdg list dgname
config disk 3PARDATA0_75 copy 1 len=48144 state=new failed
config-tid=0.1550 pending-tid=0.1551
Error: error=Volume error 0
DESCRIPTION:
When a configuration copy is initialized on a new disk in a diskgroup, an I/O error on the disk can prevent the on-disk update and leave the configuration copy inconsistent.
RESOLUTION:
In case of a failed initialization, the configuration copy is disabled. If required in the future, this disabled copy is reused for setting up a new configuration copy. If the current state of a configuration copy is "new failed", the next import of the diskgroup disables it.

* INCIDENT NO:2827791 TRACKING ID:2760181
SYMPTOM:
The secondary slave node hit a panic in vol_rv_change_sio_start() for an already active logowner operation.
DESCRIPTION:
The slave node panics during a logowner change. The logowner change and the reconfiguration recovery process happen at the same time, leading to a race in setting the ACTIVE flag: the reconfiguration recovery unsets the flag which was set by the logowner change operation. In the middle of the logowner change operation the ACTIVE flag goes missing, which leads to the system panic.
RESOLUTION:
The appropriate lock is taken in the logowner change code, and more debug log entries are added for better tracking of logowner issues.

* INCIDENT NO:2827794 TRACKING ID:2775960
SYMPTOM:
On a CVR secondary, after the SRL was disabled on one DG, an I/O hang was triggered on another DG.
DESCRIPTION:
The failure of the SRL LUNs caused the failure on both DGs; the I/O failure messages confirmed the LUN failure on the other DG as well. For every 1024 I/Os to the SRL, the SRL header is flushed. In the SRL flush code, in the error scenario, the flush I/O is queued but never started. If the flush I/O does not complete, the application I/O hangs forever.
RESOLUTION:
The fix is to start the flush I/O which is queued in the error scenario.

* INCIDENT NO:2827939 TRACKING ID:2088426
SYMPTOM:
Re-onlining of all disks, irrespective of whether they belong to the shared dg being destroyed/deported, is done on the master and the slaves.
* INCIDENT NO:2827939 TRACKING ID:2088426

SYMPTOM:
When a shared DG is destroyed or deported, all disks on the master and slaves are re-onlined, irrespective of whether they belong to that DG.

DESCRIPTION:
When a shared DG is destroyed/deported, all disks on all nodes in the cluster are re-onlined asynchronously when the next command is issued. This wastes resources and can delay the next command substantially, depending on how many disks/LUNs each node has.

RESOLUTION:
Re-onlining is now restricted to the LUNs/disks that belong to the DG being destroyed/deported. As a result, the resource usage and the delay of the next command are limited to the number of LUNs/disks in that DG.

* INCIDENT NO:2836910 TRACKING ID:2818840

SYMPTOM:
1. The file permissions set on ASM devices are not persistent across reboot.
2. The user is unable to set the desired permissions on ASM devices.
3. Files created with user ID "root" and a group other than "system" do not persist.

DESCRIPTION:
The vxdmpasm utility sets device permissions to "660", which does not persist across reboot because the devices are not recorded in the /etc/vx/.vxdmpasmdev file. There is currently no option that lets the user set the desired permissions. Files created with user ID root and a group other than "system" revert to "root:system" upon reboot.

RESOLUTION:
The code is modified to record the device entries in /etc/vx/.vxdmpasmdev so that the permissions persist across reboot. The code is also enhanced to provide an option to set the desired permissions and the desired user ID/group.

* INCIDENT NO:2845984 TRACKING ID:2739601

SYMPTOM:
vradmin repstatus output occasionally reports abnormal timestamp information.

DESCRIPTION:
Sometimes vradmin repstatus shows an abnormal timestamp in the "Timestamp Information" section of its output; the reported value can be very large, such as 100 hours. This condition occurs when no data has been replicated to the secondary for a long time. It does not necessarily mean the Rlinks were disconnected for a long time: even with the Rlinks connected, it is possible that no new data was written to the primary during that period, so no data was replicated to the secondary. If the Rlink is then paused and some writes are done, vradmin repstatus shows the abnormal timestamp.

RESOLUTION:
Whenever new data is written to the data volume and the Rlink is up to date, the timestamp is now recorded. This ensures that an abnormal timestamp is not reported.

* INCIDENT NO:2852270 TRACKING ID:2715129

SYMPTOM:
vxconfigd hangs during master takeover in a CVM (Clustered Volume Manager) environment. This results in vx command hangs.

DESCRIPTION:
During master takeover, the VxVM (Veritas Volume Manager) kernel signals vxconfigd with the information about the new master. vxconfigd then proceeds with a vxconfigd-level handshake with the nodes across the cluster. The handshake mechanism could start before the kernel had signalled vxconfigd, resulting in the hang.

RESOLUTION:
Code changes ensure that the vxconfigd handshake starts only upon receipt of the signal from the kernel.
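The ordering constraint in the last fix (do not start the handshake until the kernel's signal has arrived) can be sketched with a condition variable. This is a minimal C illustration, not VxVM source; all names are hypothetical:

    /* Minimal sketch: one thread must wait for an event before
     * proceeding. Compile with -lpthread. Names are hypothetical. */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
    static int master_info_ready;       /* set once the kernel signals */

    static void *handshake_thread(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);
        while (!master_info_ready)      /* wait instead of racing ahead */
            pthread_cond_wait(&cv, &lock);
        pthread_mutex_unlock(&lock);
        printf("handshake started after master info arrived\n");
        return NULL;
    }

    static void kernel_signals_new_master(void)
    {
        pthread_mutex_lock(&lock);
        master_info_ready = 1;
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&lock);
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, handshake_thread, NULL);
        kernel_signals_new_master();
        pthread_join(t, NULL);
        return 0;
    }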
* INCIDENT NO:2858859 TRACKING ID:2858853

SYMPTOM:
In a CVM (Cluster Volume Manager) environment, after a master switch, vxconfigd dumps core on the slave node (the old master) when a disk is removed from the disk group:

dbf_fmt_tbl()
voldbf_fmt_tbl()
voldbsup_format_record()
voldb_format_record()
format_write()
ddb_update()
dg_set_copy_state()
dg_offline_copy()
dasup_dg_unjoin()
dapriv_apply()
auto_apply()
da_client_commit()
client_apply()
commit()
dg_trans_commit()
slave_trans_commit()
slave_response()
fillnextreq()
vold_getrequest()
request_loop()
main()

DESCRIPTION:
During a master switch, the disk group configuration copy related flags are not cleared on the old master; hence, when a disk is removed from a disk group, vxconfigd dumps core.

RESOLUTION:
Necessary code changes have been made to clear the configuration copy related flags during a master switch.

* INCIDENT NO:2859390 TRACKING ID:2000585

SYMPTOM:
If 'vxrecover -sn' is run while a volume is simultaneously removed, vxrecover exits with the error 'Cannot refetch volume'; the exit status code is zero, but no volumes are started.

DESCRIPTION:
vxrecover assumes the volume is missing because the disk group must have been deported while vxrecover was in progress, so it exits without starting the remaining volumes. vxrecover should be able to start the other volumes if the DG has not been deported.

RESOLUTION:
The source is modified to skip the missing volume and proceed with the remaining volumes.

* INCIDENT NO:2860281 TRACKING ID:2838059

SYMPTOM:
The VVR secondary machine crashes with the following panic stack:

crash_kexec
__die
do_page_fault
error_exit
[exception RIP: vol_rv_update_expected_pos+337]
vol_rv_service_update
vol_rv_service_message_start
voliod_iohandle
voliod_loop
kernel_thread at ffffffff8005dfb1

DESCRIPTION:
If the VVR primary machine crashes without completing some of the write I/Os to the data volumes, it fills the incomplete write I/Os with "DUMMY" I/Os in order to maintain write-order fidelity on the secondary. While processing such dummy updates on the secondary, a logical error caused the secondary VVR code to dereference a NULL pointer, leading to the panic.

RESOLUTION:
Code changes are made in the VVR secondary's "DUMMY" update processing code path to correct the logic.

* INCIDENT NO:2860445 TRACKING ID:2627126

SYMPTOM:
An I/O hang is observed on the system, with many I/Os stuck in the DMP global queue.

DESCRIPTION:
Many I/Os and paths were stuck in dmp_delayq and dmp_path_delayq respectively, and the DMP daemon could not process them because of a race condition between processing the dmp_delayq and waking up the DMP daemon. A lock is held while processing the dmp_delayq and is released only for a very short duration; if any path is busy during that window, it returns an I/O error, leading to the I/O hang.

RESOLUTION:
The global delay queue pointers are copied to local variables, with the lock held only for that period; the I/Os in the queue are then processed using the local queue variables.

* INCIDENT NO:2860449 TRACKING ID:2836798

SYMPTOM:
'vxdisk resize' fails with the following error on a simple format EFI (Extensible Firmware Interface) disk expanded from the array side, and the system may panic/hang after a few minutes:

# vxdisk resize disk_10
VxVM vxdisk ERROR V-5-1-8643 Device disk_10: resize failed:
Configuration daemon error -1

DESCRIPTION:
Because VxVM does not support Dynamic LUN Expansion on simple/sliced EFI disks, the last usable LBA (Logical Block Address) in the EFI header is not updated when the LUN is expanded. Since the header is not updated, the partition end entry is regarded as illegal and is cleared as part of the partition range check. This inconsistency in partition information between the kernel and the disk causes the system panic/hang.

RESOLUTION:
Checks were added in the VxVM code to prevent DLE on simple/sliced EFI disks.
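The queue-splicing technique described in the fix for incident 2860445 (copy the global queue pointers to local variables under a brief lock hold, then process the local list without the lock) is a common pattern. A minimal C sketch, not VxVM source, with hypothetical names:

    /* Minimal sketch: splice a shared delay queue to a local list under
     * one short critical section. Compile with -lpthread. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct dio { struct dio *next; int id; };

    static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
    static struct dio *delay_q;          /* global delayed-I/O list */

    static void defer_io(int id)
    {
        struct dio *d = malloc(sizeof(*d));
        if (d == NULL)
            return;
        d->id = id;
        pthread_mutex_lock(&q_lock);
        d->next = delay_q;
        delay_q = d;
        pthread_mutex_unlock(&q_lock);
    }

    static void process_delayed(void)
    {
        /* Splice: one brief lock hold, after which the daemon owns the
         * local list outright, so new arrivals never race with
         * in-flight processing. */
        pthread_mutex_lock(&q_lock);
        struct dio *local = delay_q;
        delay_q = NULL;
        pthread_mutex_unlock(&q_lock);

        while (local) {
            struct dio *next = local->next;
            printf("restarting deferred I/O %d\n", local->id);
            free(local);
            local = next;
        }
    }

    int main(void)
    {
        defer_io(1);
        defer_io(2);
        process_delayed();
        return 0;
    }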
* INCIDENT NO:2860451 TRACKING ID:2815517

SYMPTOM:
vxdg adddisk succeeds in adding a clone disk to a non-clone disk group, or a non-clone disk to a clone disk group, resulting in a mixed disk group.

DESCRIPTION:
vxdg import fails for a disk group that has a mix of clone and non-clone disks, so vxdg adddisk should not allow the creation of a mixed disk group.

RESOLUTION:
The vxdg adddisk code is modified to return an error on an attempt to add a clone disk to a non-clone disk group or non-clone disks to a clone disk group, thus preventing the addition of disks that would lead to a mixed disk group.

* INCIDENT NO:2860812 TRACKING ID:2801962

SYMPTOM:
Operations that grow a volume, including 'vxresize' and 'vxassist growby/growto', take significantly longer if the volume has a version 20 DCO (Data Change Object) attached to it, compared to a volume without a DCO.

DESCRIPTION:
When a volume with a DCO is grown, the existing map in the DCO must be copied and updated to track the grown regions. The algorithm searched, for each region in the map, for the page containing that region in order to update the map. The number of regions and the number of pages containing them are both proportional to the volume size, so the search complexity is amplified; this is observed primarily when the volume size is on the order of terabytes. In the reported instance, it took more than 12 minutes to grow a 2.7TB volume by 50G.

RESOLUTION:
The code has been enhanced to find the regions contained within a page, avoiding the per-region page lookup.

* INCIDENT NO:2862024 TRACKING ID:2680343

SYMPTOM:
While manually disabling and enabling paths to an enclosure, the machine may panic with the following stack:

apauto_get_failover_path+0000CC()
gen_dmpnode_update_cur_pri+000828()
dmp_start_failover+000124()
gen_update_cur_pri+00012C()
dmp_update_cur_pri+000030()
dmp_reconfig_update_cur_pri+000010()
dmp_decipher_instructions+0006E8()
dmp_process_instruction_buffer+000308()
dmp_reconfigure_db+0000C4()
gendmpioctl+000ECC()
dmpioctl+00012C()

DESCRIPTION:
The Dynamic Multi-Pathing (DMP) driver internally tracks the number of active paths and failed paths. This computation can go wrong while manually disabling/enabling paths, which can lead to the machine panic.

RESOLUTION:
Code changes have been made to properly update the active path and failed path counts.

* INCIDENT NO:2863673 TRACKING ID:2783293

SYMPTOM:
After an upgrade to RHEL5.8 (2.6.18-308), all paths become disabled when deport/import operations are invoked on shared DGs in SCSI-3 mode.

DESCRIPTION:
Linux 2.6.18-308 adds an enhancement in the SCSI mid layer whereby RESERVATION_CONFLICT is converted to the DID_NEXUS_FAILURE error, a newly introduced error type not recognized by the DMP code. This causes DMP to convert the error to a transport failure by default and mark the paths disabled accordingly.

RESOLUTION:
Recognition of the DID_NEXUS_FAILURE error was added to DMP.
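The complexity fix for incident 2860812 above amounts to inverting the loop order: instead of searching for the containing page once per region, walk the pages once and handle all regions each page contains. A minimal C sketch, not VxVM source, with hypothetical constants and names:

    /* Minimal sketch: per-region page search (regions * pages work)
     * vs. one pass over pages. Constants are hypothetical. */
    #include <stdio.h>

    #define REGIONS_PER_PAGE 4
    #define NPAGES           3
    #define NREGIONS         (REGIONS_PER_PAGE * NPAGES)

    static char map[NREGIONS];

    static void mark_regions_slow(void)
    {
        for (int r = 0; r < NREGIONS; r++)
            for (int p = 0; p < NPAGES; p++)   /* per-region page search */
                if (r / REGIONS_PER_PAGE == p)
                    map[r] = 1;
    }

    static void mark_regions_fast(void)
    {
        for (int p = 0; p < NPAGES; p++)       /* one pass over pages */
            for (int i = 0; i < REGIONS_PER_PAGE; i++)
                map[p * REGIONS_PER_PAGE + i] = 1;  /* regions in page */
    }

    int main(void)
    {
        mark_regions_slow();
        mark_regions_fast();
        for (int r = 0; r < NREGIONS; r++)
            printf("%d", map[r]);
        printf("\n");
        return 0;
    }

With regions and pages both proportional to volume size, the slow shape is quadratic in volume size while the fast shape is linear, which matches the terabyte-scale slowdown described above.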
* INCIDENT NO:2867483 TRACKING ID:2886402

SYMPTOM:
When re-configuring DMP devices, typically with the 'vxdisk scandisks' command, a vxconfigd hang is observed. While vxconfigd is hung, no VxVM (Veritas Volume Manager) commands can respond. The following vxconfigd process state is observed with the ps command:

# ps -ajfx | grep vx
1 30988 29538 27857 pts/1 23406 S 0 0:01 vxnotify -m
1 10422 10422 10422 ? -1 Ssl 0 0:07 /sbin/vxesd
1 15520 15520 15520 ? -1 Dsl 0 0:06 vxconfigd -kr reset
#

The following vxconfigd process stack was observed:

sync_page+0x3d/0x50
wait_on_page_bit+0x73/0x80
wait_on_page_writeback_range+0xfb/0x190
filemap_fdatawait+0x2f/0x40
filemap_write_and_wait+0x44/0x60
__sync_blockdev+0x24/0x50
sync_blockdev+0x13/0x20
fsync_bdev+0x58/0x70
invalidate_partition+0x2e/0x60
del_gendisk+0x7a/0x150
dmp_unregister_disk+0xba/0xf0 [vxdmp]
dmp_decode_destroy_dmpnode+0x15f/0x1b0 [vxdmp]
dmp_decipher_instructions+0x2a5/0x2f0 [vxdmp]
dmp_process_instruction_buffer+0x1c6/0x1d0 [vxdmp]
dmp_reconfigure_db+0x6e/0xf0 [vxdmp]
gendmpioctl+0x27e/0x5e0 [vxdmp]
dmpioctl+0x35/0x70 [vxdmp]
dmp_ioctl+0x2b/0x40 [vxdmp]
dmp_compat_ioctl+0x56/0x70 [vxdmp]
compat_blkdev_ioctl+0x13d/0x1530
compat_sys_ioctl+0xed/0x510
cstar_dispatch+0x7/0x2e

DESCRIPTION:
When a DMP (Dynamic Multipathing) node is about to be destroyed, a flag is set to hold any I/O (read/write) on it. I/Os that arrive between the setting of the flag and the actual destruction of the DMP node are placed in the DMP queue and never served, so the hang is observed.

RESOLUTION:
The appropriate flag is now set on a node that is to be destroyed, so that any I/O arriving after the flag is set is rejected, avoiding the hang condition.

* INCIDENT NO:2871980 TRACKING ID:2868790

SYMPTOM:
The /etc/vx/ddlconfig.info file is not generated correctly, and "vxdmpadm getportids" fails to list correct pWWN information on RHEL 6.

DESCRIPTION:
When vxesd, the event source daemon, cannot get the HBA topology information through the HBA libraries (because of internal defects in an HBA library, or because no HBA library is available), it falls back to OS-specific methods. On Linux, sysfs and procfs are used as the supplementary methods: vxesd checks some hard-coded directories to determine whether the card type is supported. In RHEL 6, these hard-coded directories changed, causing the recognition to fail, so the /etc/vx/ddlconfig.info file is not created and "vxdmpadm getportids" shows null pWWN information.

RESOLUTION:
Necessary code changes were made in vxesd to recognize the new sysfs tree layout.

* INCIDENT NO:2876116 TRACKING ID:2729911

SYMPTOM:
During a controller or port failure, UDEV removes the associated path information from DMP. While the paths are being removed, I/O to the disk can still be redirected to a path after it has been deleted, leading to an I/O failure.

DESCRIPTION:
When a path is deleted from a DMP node, the data structures for that path need to be updated so that it is no longer available for I/O after deletion, which was not happening.

RESOLUTION:
The DMP code is modified not to select a deleted path for future I/Os.

* INCIDENT NO:2880411 TRACKING ID:2483265

SYMPTOM:
In a CVM environment with nodes connected to an Active-Passive (A/P) array, a path failure on any one node normally results in a cluster-wide failover to another available path. Instead, I/O failure can sometimes occur.

DESCRIPTION:
The path failover protocol is initiated by the vxio driver upon receiving a negotiated error for a path failure from DMP. Due to a bug, the maximum I/O size supported by the underlying disk driver may not be set correctly, so I/Os get broken up by vxio more frequently. DMP does not set the negotiated error on these broken-up I/Os on path failures, so vxio treats the I/O error as non-recoverable and fails the I/Os.

RESOLUTION:
The fix sets the maximum I/O size supported by the underlying disk driver correctly, making I/O break-up less likely. It also ensures that the appropriate negotiated error is set by DMP if I/Os do get broken up, so that vxio can initiate the path failover protocol instead of failing the I/O.
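The guard described in the fix for incident 2876116 (never hand out a deleted path for new I/O) reduces to a flag check in the path-selection loop. A minimal C sketch, not VxVM source; the flag and names are hypothetical:

    /* Minimal sketch: skip paths flagged as deleted when selecting a
     * path for I/O. All names are hypothetical. */
    #include <stdio.h>

    #define PATH_DELETED 0x1

    struct path { const char *name; unsigned flags; };

    static struct path *select_path(struct path *paths, int npaths)
    {
        for (int i = 0; i < npaths; i++)
            if (!(paths[i].flags & PATH_DELETED))  /* the fix: skip */
                return &paths[i];
        return NULL;                   /* no usable path: fail the I/O */
    }

    int main(void)
    {
        struct path paths[] = {
            { "sdc", PATH_DELETED },   /* removed by udev */
            { "sdd", 0 },
        };
        struct path *p = select_path(paths, 2);
        printf("selected: %s\n", p ? p->name : "(none)");
        return 0;
    }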
* INCIDENT NO:2882488 TRACKING ID:2754819

SYMPTOM:
A disk group rebuild through 'vxmake -d' gets stuck with the following stack trace:

buildlocks()
MakeEverybody()
main()

DESCRIPTION:
During a disk group rebuild for configurations having multiple objects on a single cache object, an internal list of cache objects gets incorrectly modified into a circular list, which causes infinite looping when the list is accessed.

RESOLUTION:
Code changes are done to correctly populate the cache object list.

* INCIDENT NO:2884231 TRACKING ID:2606978

SYMPTOM:
The on-disk configuration copy gets offlined when the array is rebooted. This may cause the disk group to be disabled if there is no valid configuration copy left. The following messages appear in the system logs when this happens:

vxvm:vxconfigd: V-5-1-768 Offlining config copy 1 on disk huawei-s5500t0_5:
vxvm:vxconfigd: ^IReason: Disk write failure
vxvm:vxconfigd: V-5-1-7935 Disk group dg_zmsnap: update failed: Disk group has no valid configuration copies
vxvm:vxconfigd: V-5-1-7934 Disk group dg_zmsnap: Disabled by errors

DESCRIPTION:
DMP returns a negotiated error (DMP_PATH_FAILED) so that path failover is triggered on A/P arrays. However, due to a bug in the private region I/O done function, the I/O is failed instead of triggering the path failover protocol. vxconfigd may disable the DG if the I/O failure is on a disk holding the last enabled config copy.

RESOLUTION:
The I/O done function has been updated to trigger the path failover protocol instead of failing the I/O.

* INCIDENT NO:2886083 TRACKING ID:2257850

SYMPTOM:
A memory leak is observed when information about an enclosure is accessed by vxdiskadm.

DESCRIPTION:
The memory allocated locally for a data structure holding array-specific attributes is not freed.

RESOLUTION:
Code changes are made to avoid such memory leaks.

* INCIDENT NO:2903216 TRACKING ID:2558261

SYMPTOM:
Initializing a PowerPath device using 'vxdisksetup' or 'vxdisk init' fails with the error "Disk is not usable". Un-initializing a PowerPath device using 'vxdiskunsetup' or 'vxdisk destroy' fails with the error "Disk destroy failed".

DESCRIPTION:
For disk names containing the character "p", e.g. emcpowerga, disk setup and un-setup fail to parse the disk name and return an error.

RESOLUTION:
Code changes are made in the parsing of disk names. The initialize/un-initialize operations now work for disks having 'p' in the name.

* INCIDENT NO:2907643 TRACKING ID:2924440

SYMPTOM:
On RHEL6, on a root-encapsulated machine, the following message is seen when booting through VxVM_initrd.img: "Syntax error: "fi" unexpected"

DESCRIPTION:
When booting VxVM_initrd.img, the 12-vxvm.sh script is executed; it contained an extra "fi" statement, so the shell printed the reported message.
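The parsing pitfall in incident 2903216 is that a disk name such as "emcpowerga" legitimately contains 'p', so scanning for the first 'p' misidentifies the partition suffix. One safer rule, shown in this minimal C sketch (not VxVM source; the helper is hypothetical), is to treat only a trailing "p<digits>" as a partition suffix:

    /* Minimal sketch: derive the base disk name, stripping only a
     * trailing "p<digits>" partition suffix. Names are hypothetical. */
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    static size_t disk_base_len(const char *name)
    {
        size_t len = strlen(name);
        size_t end = len;

        while (end > 0 && isdigit((unsigned char)name[end - 1]))
            end--;
        if (end > 0 && end < len && name[end - 1] == 'p')
            return end - 1;       /* "emcpowerap3" -> "emcpowera" */
        return len;               /* "emcpowerga" stays whole */
    }

    int main(void)
    {
        const char *names[] = { "emcpowerga", "emcpowerap3" };
        for (int i = 0; i < 2; i++)
            printf("%s -> base \"%.*s\"\n", names[i],
                   (int)disk_base_len(names[i]), names[i]);
        return 0;
    }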
RESOLUTION:
The extra "fi" statement is removed to resolve the issue.

* INCIDENT NO:2911010 TRACKING ID:2627056

SYMPTOM:
The vxmake(1M) command, when run with a very large description file, fails with a memory allocation error.

DESCRIPTION:
Due to a memory leak in the vxmake(1M) command, the data section limit for the process was reached. As a result, further memory allocations failed and the vxmake command failed with this error.

RESOLUTION:
Fixed the memory leak by freeing the memory after it has been used.

PATCH ID:5.1.132.000

* INCIDENT NO:2397062 TRACKING ID:2431423

SYMPTOM:
Panic in vol_mv_commit_check() while accessing a Data Change Map (DCM) object. Stack trace of the panic:

vol_mv_commit_check at ffffffffa0bef79e
vol_ktrans_commit at ffffffffa0be9b93
volconfig_ioctl at ffffffffa0c4a957
volsioctl_real at ffffffffa0c5395c
vols_ioctl at ffffffffa1161122
sys_ioctl at ffffffff801a2a0f
compat_sys_ioctl at ffffffff801ba4fb
sysenter_do_call at ffffffff80125039

DESCRIPTION:
In case of a DCM failure, the object pointer is set to NULL as part of the transaction. If the DCM is active, the transaction code path accesses the DCM object without checking it for NULL. The DCM object pointer can be NULL for a failed DCM, so accessing the pointer without a NULL check caused this panic.

RESOLUTION:
The fix adds a NULL check for the DCM object in the transaction code path.

* INCIDENT NO:2404928 TRACKING ID:2428170

SYMPTOM:
I/O hangs when reading or writing to a volume after a total storage failure in CVM environments with Active-Passive arrays.

DESCRIPTION:
In the event of a storage failure in Active-Passive environments, the CVM-DMP failover protocol is initiated. This protocol coordinates the failover from primary paths to secondary paths on all nodes in the cluster. In the event of a total storage failure, where both the primary and the secondary paths fail, the protocol in some situations fails to clean up some internal structures, leaving the devices quiesced.

RESOLUTION:
After a total storage failure, all devices should be un-quiesced, allowing the I/Os to fail. The CVM-DMP protocol has been changed to clean up devices even if all paths to a device have been removed.

* INCIDENT NO:2484420 TRACKING ID:2480600

SYMPTOM:
I/Os of large sizes such as 512K and 1024K hang in CVR (Clustered Volume Replicator).

DESCRIPTION:
When large I/Os, for example of 1MB, are performed on volumes under an RVG (Replicated Volume Group), only a limited number of I/Os can be accommodated based on the RVIOMEM pool limit, so the pool remains full for the majority of the time. If a CVM (Clustered Volume Manager) slave is rebooted or goes down at this time, the pending I/Os are aborted and the corresponding memory is freed. In one case the memory was not freed, leading to the hang.

RESOLUTION:
Code changes have been made to free the memory in all scenarios.

* INCIDENT NO:2497638 TRACKING ID:2489350

SYMPTOM:
In a Storage Foundation environment running Symantec Oracle Disk Manager (ODM), Veritas File System (VxFS), Cluster Volume Manager (CVM) and Veritas Volume Replicator (VVR), kernel memory is leaked under certain conditions.

DESCRIPTION:
In CVR (CVM + VVR), under certain conditions (for example, when I/O throttling gets enabled or the kernel messaging subsystem is overloaded), previously allocated I/O resources are freed and the I/Os are restarted afresh. While freeing the I/O resources, the VVR primary node does not free the kernel memory allocated for the FS-VM private information data structure, causing a kernel memory leak of 32 bytes for each restarted I/O.

RESOLUTION:
Code changes are made in VVR to free the kernel memory allocated for the FS-VM private information data structure before the I/O is restarted afresh.
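The leak pattern in the last incident, where per-I/O private data must be released before the I/O is rebuilt, can be shown in a minimal C sketch. This is not VxVM source; the structure, field, and sizes are hypothetical (the 32-byte size echoes the leak size reported above):

    /* Minimal sketch: free per-I/O private data before restarting the
     * I/O, or each restart leaks one allocation. Names hypothetical. */
    #include <stdio.h>
    #include <stdlib.h>

    struct io {
        void *fs_private;   /* per-I/O private info from the filesystem */
    };

    static void restart_io(struct io *io, int fixed)
    {
        if (fixed) {
            free(io->fs_private);  /* the fix: release before realloc */
            io->fs_private = NULL;
        }
        /* Restarting allocates fresh private data; without the free
         * above, the old 32-byte block is leaked on every restart. */
        io->fs_private = malloc(32);
    }

    int main(void)
    {
        struct io io = { .fs_private = malloc(32) };
        restart_io(&io, 1);        /* run with 0 to model the leak */
        printf("restarted without leaking\n");
        free(io.fs_private);
        return 0;
    }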
* INCIDENT NO:2519187 TRACKING ID:2419803

SYMPTOM:
The secondary site panics in VVR (Veritas Volume Replicator). The stack trace may look like:

kmsg_sys_snd+0xa8()
nmcom_send_tcp+0x800()
nmcom_do_send+0x290()
nmcom_throttle_send+0x178()
nmcom_sender+0x350()
thread_start+4()

DESCRIPTION:
While the secondary site is communicating with the primary site, if it encounters an "EAGAIN" (try again) error, it tries to send data on the next connection. If not all of the session connections have been established by this time, it panics because the connection is not initialized.

RESOLUTION:
Code changes have been made to check for a valid connection before sending data.

* INCIDENT NO:2565451 TRACKING ID:2483053

SYMPTOM:
The VVR primary system consumes very high kernel heap memory and appears to be hung.

DESCRIPTION:
There is a race between the REGION LOCK deletion thread, which runs as part of SLAVE-leave reconfiguration, and the thread that processes the DATA_DONE message coming from a log client to the logowner. Because of this race, the flags storing the status information about the I/Os were not correctly updated, causing many SIOs to be stuck in a queue and consume a large amount of kernel heap.

RESOLUTION:
The code changes take the proper locks while updating the SIOs' fields.

* INCIDENT NO:2566001 TRACKING ID:2440349

SYMPTOM:
A grow operation on a DCO volume may grow it into any 'site', not strictly honoring the allocation requirements.

DESCRIPTION:
When a DCO volume is grown, it may not strictly honor the allocation specification to use only a particular site, even when the site is specified explicitly.

RESOLUTION:
The Data Change Object code of Volume Manager is modified to honor the allocation specification strictly if it is provided explicitly.

* INCIDENT NO:2566010 TRACKING ID:2560835

SYMPTOM:
On the master, I/Os and vxconfigd hang when a slave is rebooted under heavy I/O load.

DESCRIPTION:
When a slave leaves the cluster without sending the DATA ack message to the master, the slave's I/Os get stuck on the master because their logend processing cannot be completed. At the same time, cluster reconfiguration takes place because the slave left the cluster. In the CVM reconfiguration code path, these I/Os are aborted in order to proceed with the reconfiguration and recovery. But if local I/Os on the master reach the logend queue after the logendq has been aborted, those local I/Os get stuck forever in the logend queue, leading to a permanent I/O hang.

RESOLUTION:
During CVM reconfiguration, and during the subsequent RVG recovery, no I/Os are put into the logendq.

* INCIDENT NO:2566037 TRACKING ID:2560843

SYMPTOM:
In a cluster of 3 or more nodes, when one of the slaves is rebooted under heavy I/O load, I/Os hang on another slave. For example, with Node A (master and logowner), Node B (slave 1) and Node C (slave 2): if Node C is doing heavy I/O and Node B is rebooted, the I/Os on Node C hang.

DESCRIPTION:
When Node B leaves the cluster, its throttled I/Os (which are in the mdship_throttleq) are aborted and all resources taken by these I/Os (for example, REGION locks) are freed. Along with these I/Os, the throttled I/Os of Node C are responded to with EAGAIN to let Node C resend them. But during this process, the REGION locks held by these I/Os on the master are not freed.

RESOLUTION:
All the resources taken by the remote I/Os on the master are now freed properly.
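The guard added for incident 2519187 above, validating a connection before sending on it, can be illustrated with a minimal C sketch. This is not VVR source; the structure and function names are hypothetical:

    /* Minimal sketch: on EAGAIN the sender moves to the next session
     * connection, but must verify it is established before using it. */
    #include <stdio.h>

    struct conn { int established; };

    static int send_update(struct conn *c)
    {
        if (c == NULL || !c->established)  /* the fix: validate first */
            return -1;                     /* caller retries later */
        /* ... transmit on c ... */
        return 0;
    }

    int main(void)
    {
        struct conn half_open = { 0 };     /* session not initialized */
        struct conn ready     = { 1 };

        if (send_update(&half_open) != 0)
            printf("skipped uninitialized connection\n");
        if (send_update(&ready) == 0)
            printf("sent on established connection\n");
        return 0;
    }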
* INCIDENT NO:2566041 TRACKING ID:2557984

SYMPTOM:
The VxVM "vxdisk resize" operation causes data corruption on VxVM disks of the "simple" format.

DESCRIPTION:
Whenever a LUN size is increased at the storage, its disk geometry and partition information change as well. However, VxVM failed to take these into account as part of its disk resize operation, issuing I/Os and updating VxVM metadata at wrong offsets and thereby causing the corruption.

RESOLUTION:
The VxVM disk resize code is modified to use the new disk geometry and partition information whenever a LUN size is increased.

INCIDENTS FROM OLD PATCHES:
---------------------------
NONE