README VERSION               : 1.1
README CREATION DATE         : 2012-03-09
PATCH-ID                     : 6.0.1.0
PATCH NAME                   : VRTSvxfs 6.0RP1
BASE PACKAGE NAME            : VRTSvxfs
BASE PACKAGE VERSION         : 6.0.0.0
OBSOLETE PATCHES             : NONE
SUPERSEDED PATCHES           : NONE
REQUIRED PATCHES             : NONE
INCOMPATIBLE PATCHES         : NONE
SUPPORTED PADV               : sles11_x86_64
(P-PLATFORM , A-ARCHITECTURE , D-DISTRIBUTION , V-VERSION)
PATCH CATEGORY               : CORE , HANG , PANIC
REBOOT REQUIRED              : YES

PATCH INSTALLATION INSTRUCTIONS:
--------------------------------
Please refer to the Release Notes for Installation and Uninstallation Instructions.

PATCH UNINSTALLATION INSTRUCTIONS:
----------------------------------
Please refer to the Release Notes for Installation and Uninstallation Instructions.

SPECIAL INSTALL INSTRUCTIONS:
-----------------------------
NONE

SUMMARY OF FIXED ISSUES:
-----------------------------------------
2608568 Abrupt messages are seen in engine log after complete storage failure in cvm resiliency scenario.
2627108 Expanding or shrinking a DLV5 file system using the fsadm(1M) command causes a system panic.
2639744 Network Customization screen doesn't show any NICs in I18N-level0 environment.
2645691 The following error message is displayed during the execution of the fsmap(1M) command: 'UX:vxfs fsmap: ERROR: V-3-27313'
2645821 The fscdsconv(1M) command which is used to convert corrupted or non-VxFS file systems generates core.
2645827 The system may panic while re-organizing the file system using the fsadm(1M) command.
2645831 On RHEL6U1 writing to VxFS /proc hidden interface fails with EINVAL.
2646915 The fsck(1M) command exits during an internal CFS stress reconfiguration testing.
2646919 Permission denied errors (EACCES) seen while doing I/O's on nfs shared filesystem.
2654504 The replication process dumps core when shared extents are present in the source file system.
2654506 Metadata corruption may be seen after recovery.
2654770 Upgrade of a file system from version 8 to 9 fails in the presence of partition directories and clones.
2654773 In certain cases, a write on a regular file which has a shared extent as the last allocated extent can fail with an EIO error.
2654783 A write operation on a regular file mapping to a shared compressed extent results in corruption.
2654790 In certain rare cases after a successful execution of the vxfilesnap command, if the source file gets deleted in a very short span of time after the filesnap operation, then the destination file can get corrupted and this could also lead to setting of the VX_FULLFSCK flag in the super block.
2661222 In a cluster mounted file system, memory corruption is seen during the execution of the SmartMove feature.
2661223 'Shared' extents are not transferred as 'shared' by the replication process.
2662259 The De-duplication session does not complete.
2663763 Kernel crashes in vx_fopen because of NULL pointer dereference.
2672151 vxdelestat(1M), when invoked with the -v option, goes into an infinite loop.
2672209 When the fsckptadm(1M) command is executed with the '-r' and '-R' options, the two mutually exclusive options get executed simultaneously.
2678196 The fiostat command dumps core when the count value is 0.
2684895 Deadlock because of wrong spin lock interrupt level at which delayed allocation list lock is taken.
2700895 Certain commands get blocked by kernel, causing EACCES (ERRNO = 13).

SUMMARY OF KNOWN ISSUES:
-----------------------------------------
2691657 On Linux system unable to select ext4 from the FileSystem
2654682 Full file system check using fsck_vxfs(1M) takes over a week.
2627081 vxfsd is taking a lot of CPU time after deleting some directories.
2627096 fsckptadm(1M) fails with ENXIO.
2627084 umount(1M) on a CFS filesystem panics the machine.
2679520 Performance degradation seen on a CFS filesystem while reading from a large directory.
2654673 A cluster mounted file-system may panic the system showing vx_tflush_map in the stack trace.
KNOWN ISSUES :
--------------

* INCIDENT NO::2691657   TRACKING ID ::2691654

SYMPTOM::
On Linux system unable to select ext4 from the FileSystem

WORKAROUND::
None

* INCIDENT NO::2654682   TRACKING ID ::2628207

SYMPTOM::
On a large file system with many checkpoints, the fsck operation may seem to be hung.

WORKAROUND::
None

* INCIDENT NO::2627081   TRACKING ID ::2129455

SYMPTOM::
vxfsd is taking a lot of CPU time after deleting some directories.

WORKAROUND::
None

* INCIDENT NO::2627096   TRACKING ID ::1956458

SYMPTOM::
fsckptadm(1M) fails with ENXIO and the file system is marked for full fsck.

WORKAROUND::
None

* INCIDENT NO::2627084   TRACKING ID ::2107152

SYMPTOM::
In rare corner cases, the system panics while unmounting a cluster mounted filesystem.

WORKAROUND::
None

* INCIDENT NO::2679520   TRACKING ID ::2644485

SYMPTOM::
Performance degradation seen on a CFS filesystem while reading from a large directory.

WORKAROUND::
None

* INCIDENT NO::2654673   TRACKING ID ::2558892

SYMPTOM::
VxFS causes a server to panic. The subroutine initiating the panic is vx_tflush_map.

WORKAROUND::
None

FIXED INCIDENTS:
----------------

PATCH ID:6.0.1.0

* INCIDENT NO:2608568   TRACKING ID:2663750

SYMPTOM:
Incorrect remote notifications are given on a local disk connectivity lost/restore event for disks with the same pattern as that of the actual disk for which the event has occurred.

DESCRIPTION:
When local disk connectivity is restored or lost for a disk, the cvmvoldg agent logs the correct notification for that disk, but it also logs wrong notifications in the VCS engine log for disks having a similar pattern, even though there is no change in connectivity on those disks.

RESOLUTION:
In the cvmvoldg agent, local disk connectivity events (restore/lost) are now checked with an exact match before reporting.

* INCIDENT NO:2627108   TRACKING ID:2599590

SYMPTOM:
Expansion of a 100% full file system may panic the machine with the following stack trace:
bad_kern_reference()
$cold_vfault()
vm_hndlr()
bubbledown()
vx_logflush()
vx_log_sync1()
vx_log_sync()
vx_worklist_thread()
kthread_daemon_startup()

DESCRIPTION:
When a 100% full file system is expanded, the intent log of the file system is truncated and the blocks freed up are used during the expansion. Due to a bug, the block map of the replica intent log inode was not updated, causing the block maps of the two inodes to differ. This caused some of the in-core structures of the intent log to become NULL. The machine panics while dereferencing one of these structures.

RESOLUTION:
The block map of the replica intent log inode is now updated correctly. A 100% full file system can now be expanded only if the last extent in the intent log contains more than 32 blocks; otherwise fsadm fails. To expand such a file system, delete some of the files manually and retry the resize.

* INCIDENT NO:2639744   TRACKING ID:2679361

SYMPTOM:
Network Customization screen doesn't show any NICs in I18N-level0 environment.

DESCRIPTION:
NICs were not queried from the vCenter, which made the NIC lookup locale-dependent and caused the above-mentioned error.

RESOLUTION:
The code is modified so that NICs are queried from the vCenter, making the lookup locale-independent.

* INCIDENT NO:2645691   TRACKING ID:2645435

SYMPTOM:
The fsmap command gives the error
'UX:vxfs fsmap: ERROR: V-3-27313: failed to open mount point: : I/O error'

DESCRIPTION:
fsmap queries the mount table, makes a list of all mounted file systems, and then opens the mount points in the list. If a file system gets unmounted in between, the open fails with the above error and the program exits.

RESOLUTION:
Instead of exiting on the error in the above scenario, fsmap now continues by skipping the file system that gave the error (because it got unmounted).
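The skip-and-continue behaviour described in the resolution for incident 2645691 can be sketched as follows. This is an illustrative sketch only, not the fsmap source: the function name open_mount_points and its return values are hypothetical, and the real command is not written in Python.

```python
import os


def open_mount_points(mount_points):
    """Open each mount point; skip any that can no longer be opened.

    Hypothetical helper for illustration. A path that fails to open
    (for example because it was unmounted after the list was built)
    is recorded and skipped instead of aborting the whole run.
    """
    opened = {}
    skipped = []
    for path in mount_points:
        try:
            opened[path] = os.open(path, os.O_RDONLY)
        except OSError as exc:
            # Record the failure and move on to the next mount point.
            skipped.append((path, exc.errno))
    return opened, skipped
```

The caller is responsible for closing the returned descriptors with os.close().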
* INCIDENT NO:2645821   TRACKING ID:2536130

SYMPTOM:
Issue 1: If the fscdsconv command is used to convert a) a corrupted or b) a non-VxFS file system, instead of exiting with a proper error message it crashes with the following message:
devops.c 1693: ASSERT(0) failed

DESCRIPTION:
Before starting to convert a file system, VxFS does a sanity check on the file system. During the sanity check on a corrupt file system, the command incorrectly tries to close the file system twice. The program crashes on the second close.

RESOLUTION:
The fix makes sure that once close has been called, it is not called again. If fscdsconv is now run on a corrupted or non-VxFS file system, it exits with the following error message:
UX:vxfs fscdsconv: ERROR: V-3-20318: file system dirty, run fsck first
UX:vxfs fscdsconv: ERROR: V-3-24426: fscdsconv: Failed to migrate.

Issue 2: If the partition directory feature is enabled in a file system and the file system is exported from Linux to the AIX platform, the mount command asserts with "f:VX_FCL_INIT_FAILED:ndebug".

* INCIDENT NO:2645827   TRACKING ID:2552095

SYMPTOM:
While re-organizing the file system using fsadm(1M), the system may panic with the following stack trace:
vx_iget
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
vx_ioctl_skey

DESCRIPTION:
Due to a race condition between the functions vx_inactive and vx_iget, an inode which is on the freelist can be handed out erroneously, leading to a panic similar to the one mentioned above.

RESOLUTION:
The code is modified to take the necessary protection while updating the inode pointer, thus avoiding the race.

* INCIDENT NO:2645831   TRACKING ID:2634483

SYMPTOM:
On RHEL6U1 writing to VxFS /proc hidden interface fails with EINVAL.

DESCRIPTION:
The buffer used for copying the user's string was not NULL-terminated, so verification of this string fails with EINVAL.

RESOLUTION:
The code is modified to NULL-terminate the user buffer.
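The double close described under incident 2645821 above is a common pattern, and one way to guard against it is to make close idempotent. The following is a minimal sketch under that assumption; the class name DeviceHandle is hypothetical and this is not the fscdsconv code.

```python
import os


class DeviceHandle:
    """Hypothetical wrapper whose close() is safe to call more than once."""

    def __init__(self, fd):
        self.fd = fd

    def close(self):
        """Close the descriptor once; later calls are no-ops.

        Returns True if the descriptor was closed by this call and
        False if it had already been closed.
        """
        if self.fd is None:
            return False
        os.close(self.fd)
        self.fd = None  # remember that the close already happened
        return True
```

With this guard, an error path that closes the handle and a cleanup path that closes it again cannot trigger a second close on the same descriptor.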
* INCIDENT NO:2646915   TRACKING ID:2630954

SYMPTOM:
During internal CFS stress reconfiguration testing, the fsck(1M) command exits by hitting an assert.

DESCRIPTION:
When the dexh_getblk() function is executed, if the extent size is greater than 256 Kbytes, the extent is divided into chunks of 256 Kbytes each. When the extent of the hash inodes is read in the dexh_getblk() function, a maximum of 256 Kbytes (MAXBUFSZ) is read at a time. Currently, the chunk size is assigned as 256 Kbytes every time. When the last chunk in the extent is less than 256 Kbytes, this causes the length of the buffer to be assigned incorrectly, and an aliased buffer ends up in the buffer cache. For the last chunk, the remaining size in the extent to be read should instead be assigned as the chunk size.

RESOLUTION:
The code is modified so that the buffer size is calculated correctly.

* INCIDENT NO:2646919   TRACKING ID:2646930

SYMPTOM:
Permission denied errors (EACCES) are seen while doing I/O on an NFS-shared file system.

DESCRIPTION:
The SELinux security attributes attached to the anonymous directory entries, which are created for an NFS-shared file system, were wrongly initialized. Because of this, SELinux checks failed with an EACCES (permission denied) error.

RESOLUTION:
The code is modified to correctly initialize the security attributes of the anonymous directory entries.

* INCIDENT NO:2654504   TRACKING ID:2646936

SYMPTOM:
The replication process sometimes dumps core if shared extents are present in the source file system. You will see
"UX:vxfs vfradmin: INFO: V-3-27703: ... 'replication process killed by signal'."
when you use the "vfradmin getjobstatus" command.

DESCRIPTION:
The replication process saves to disk information about shared extents present on the source side. This information is used to optimize data transfer as well as to achieve the same sharing at the destination files.
When a new chunk of main memory is allocated to save the shared extent structure, it has to be initialized to zero to avoid ambiguity. In one particular case, the replication process fails to initialize the main memory to 0, and aborts processing (dumps core) when it reads uninitialized memory.

RESOLUTION:
The replication process has been changed to initialize to 0 every newly allocated piece of memory in which shared extents are stored.

* INCIDENT NO:2654506   TRACKING ID:2613884

SYMPTOM:
Metadata corruption may be seen after recovery.

DESCRIPTION:
If the primary dies while a node is getting a delegation, a flag instructing the node to put the delegation is set even though the node did not get the delegation. As a result, when the operation is retried after recovery, the node may put a delegation which it no longer has (some other node may have it), leading to corruption.

RESOLUTION:
This case is handled by resetting the flag when the operation is retried after the primary dies.

* INCIDENT NO:2654770   TRACKING ID:2583197

SYMPTOM:
Upgrade of a file system from version 8 to 9 fails in the presence of partition directories and clones.

DESCRIPTION:
In version 9, the hash function for partitioned directories was changed. So, while upgrading from version 8 to 9, all partitioned directories need to be converted back to regular directories. The code has restrictions to avoid modifying read-only clones. These restrictions prevented changing partitioned directories to regular directories on read-only clones, resulting in the upgrade failing with EROFS.

RESOLUTION:
The code is fixed to allow read-only clones to be modified as well during an upgrade from version 8 to 9.

* INCIDENT NO:2654773   TRACKING ID:2645108

SYMPTOM:
A write to a regular file can fail with EIO in certain cases.

DESCRIPTION:
In certain cases, on a file which has a shared extent allocated as the last extent of the file, an extending write beyond EOF can fail with an EIO error because VxFS erroneously looks for allocated extents beyond the largest permissible file offset.
RESOLUTION:
Corrected the code to not look for extents beyond EOF when not necessary.

* INCIDENT NO:2654783   TRACKING ID:2645112

SYMPTOM:
A write operation on a shared compressed extent can result in corruption of the compressed data associated with that file.

DESCRIPTION:
In certain cases, a write operation on a shared compressed extent can lead to corruption of the compressed extent associated with a file. This can happen because the copy-on-write operation does not split a shared compressed extent at the right extent boundary.

RESOLUTION:
Corrected the code to handle this case and avoid breaking a compressed extent during the copy-on-write operation.

* INCIDENT NO:2654790   TRACKING ID:2645109

SYMPTOM:
In certain rare cases after a successful execution of the vxfilesnap command, if the source file gets deleted in a very short span of time after the filesnap operation, then the destination file can get corrupted and this could also lead to setting of the VX_FULLFSCK flag in the super block.

DESCRIPTION:
In certain rare cases, a successful execution of the vxfilesnap command can lead to allocation of a new indirect extent to the source and destination files. In such a case, if the source file gets deleted immediately after the filesnap operation, the file system may decide not to flush the data belonging to this indirect extent, and this may result in corruption of the destination file. Also, depending on what data previously existed in the indirect extent, the file system's VX_FULLFSCK flag can get set in the super block.

RESOLUTION:
In the above-mentioned case, the code now flushes any newly allocated indirect extent immediately after the successful execution of the filesnap operation.

* INCIDENT NO:2661222   TRACKING ID:2660761

SYMPTOM:
The system panics due to memory corruption. The panic stack could be anything.
With the postmark benchmark, the most common stack was:
page_fault+0x1f/0x30
vx_iupdat_cluster+0x6c/0x390 [vxfs]
vx_iupdat_local+0x32e/0x610 [vxfs]
vx_cfs_iupdat+0x4d/0x150 [vxfs]
vx_tflush_inode+0x11e/0x180 [vxfs]
vx_fsq_flush+0x2fe/0x7d0 [vxfs]
vx_fsflush_fsq+0x93/0xc0 [vxfs]
vx_workitem_process+0xb/0x20 [vxfs]
vx_worklist_process+0x115/0x260 [vxfs]
vx_worklist_thread+0x5d/0xa0 [vxfs]
vx_kthread_init+0x75/0x90 [vxfs]
child_rip+0xa/0x20

DESCRIPTION:
The code that handles smartmove requests for a cluster mounted file system had a bug that, under certain conditions, could write to memory it did not allocate.

RESOLUTION:
The code was changed to ensure that smartmove does not touch memory it did not allocate.

* INCIDENT NO:2661223   TRACKING ID:2655786

SYMPTOM:
Replication turns off shared extent processing after transferring a large number of shared extents successfully. In further iterations, extents that are shared at the source do not appear as shared at the destination.

DESCRIPTION:
There is an internal limit in the send side replication process on the size of one of the buffers used to store shared extents that start from the same physical block but have varying lengths. The varying length of a shared extent could be due to varying lengths of matching sections from different files. Once this limit of shared extent entries at the source side is exceeded, the replication process turns off shared extent processing in all future iterations to prevent data structure inconsistency.

RESOLUTION:
The replication process has been enhanced to remove the internal limit on the number of varying length shared extents at the source side that refer to the same starting block.

* INCIDENT NO:2662259   TRACKING ID:2609002

SYMPTOM:
The deduplication session does not complete. The /lost+found/dedupdb/log/spoold/storaged.log file contains the following message:
ERR [1543]: 25004: Queue processing failed five times in a row. Queue processing will be disabled and the CR will no longer accept new backup data. Please contact support.
DESCRIPTION:
The deduplication database engine did not handle simultaneous queue processing and querying from different threads.

RESOLUTION:
The code is modified to handle both operations correctly when issued by different threads.

* INCIDENT NO:2663763   TRACKING ID:2649367

SYMPTOM:
The system crashes in vx_fopen with a NULL pointer dereference.

DESCRIPTION:
This is a race condition between a force umount (on the Linux platform) and the vx_fopen call.

RESOLUTION:
vx_fopen now takes an active level before proceeding when it is entered through a system call (and not from VxFS itself).

* INCIDENT NO:2672151   TRACKING ID:2672148

SYMPTOM:
vxdelestat(1M), when invoked with the -v option, goes into an infinite loop and the following text is displayed:
1 2 3 4 5 6 7

DESCRIPTION:
In verbose mode, while printing the delegation list, if the number of delegations is more than 1024 (i.e. the maximum limit), the delegations are read in chunks and the offset up to which the delegations have been read is saved in a variable. This variable was not updated correctly, resulting in an infinite loop.

RESOLUTION:
The code is modified to update this offset correctly.

* INCIDENT NO:2672209   TRACKING ID:2653845

SYMPTOM:
Two mutually exclusive options (-r and -R) of fsckptadm were getting executed simultaneously.

DESCRIPTION:
If the tunable ckpt_removable is set to 1, a removable checkpoint is created by default. The -R option was introduced to override this and create a non-removable checkpoint, so it should be mutually exclusive with the -r option of fsckptadm.

RESOLUTION:
Changed the code so that the two options are mutually exclusive.

* INCIDENT NO:2678196   TRACKING ID:2678096

SYMPTOM:
fiostat core dumps with the message "Segmentation Fault (core dumped)" when the count value is <= 0.

DESCRIPTION:
When the count value is <= 0, fiostat tries to free memory which was not allocated, hence the core dump.

RESOLUTION:
The memory is now freed only when it has been allocated.
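The chunked read described under incident 2672151 above depends on advancing the saved offset after every chunk. The loop below is a minimal sketch of that idea, not the vxdelestat source: the names read_all and fetch are hypothetical, and only the 1024-entry limit is taken from the description.

```python
MAX_CHUNK = 1024  # per-read limit named in the incident description


def read_all(fetch, total):
    """Collect `total` entries by calling fetch(offset, count) in chunks.

    `fetch` is a hypothetical callback that returns up to `count`
    entries starting at `offset`.
    """
    entries = []
    offset = 0
    while offset < total:
        # The last chunk may be smaller than MAX_CHUNK.
        count = min(MAX_CHUNK, total - offset)
        chunk = fetch(offset, count)
        if not chunk:
            break  # defensive: stop if the source returns nothing
        entries.extend(chunk)
        # Advance by what was actually read. If this update is wrong,
        # the same chunk is read again and the loop never terminates.
        offset += len(chunk)
    return entries
```

Advancing by len(chunk) rather than by the requested count keeps the loop correct even if the source returns fewer entries than asked for.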
* INCIDENT NO:2684895   TRACKING ID:2655754

SYMPTOM:
Several vxfsd kernel threads are stuck trying to take a simple_lock. System responsiveness may drop drastically. One may observe vxfskd threads with the following stack trace in the kernel thread list:

(0)> f pvthread+010600
pvthread+010600 STACK:
Use current context [F000000032099600] of cpu 3
[0001151C]krlock_confer_norestart+000000 ()
[004B1820]krlock+0003A0 (??, ??)
[00534138]slock_krlock_acquire+000258 (??, ??, ??)
[005349B8]slock+000558 (??, ??)
[00009558].simple_lock+000058 ()
[0487613C].vx_idalloc_off+0002F8 ()
[048CDF64].vx_iflush_list+0009BC ()
[048CE1FC].vx_iflush+00009C ()
[048C8304].vx_workitem_process+00003C ()
[048D357C].vx_worklist_process+0000F4 ()
[048D37B4].vx_worklist_thread+000090 ()
[04855B28].vx_thread_base+000034 ()
[0035B6DC]threadentry+00005C (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF9600

DESCRIPTION:
This is a deadlock resulting from the wrong ipl (interrupt level) being used in a file system spinlock. The locks are related to the delayed allocation feature, so the issue doesn't exist if the feature is unused (the feature is ON by default). The bug doesn't have any effect if the file system is mounted with the cluster option.

RESOLUTION:
The spin lock initialization and use are modified to use the correct ipl.

* INCIDENT NO:2700895   TRACKING ID:2672201

SYMPTOM:
Certain commands get blocked by kernel, causing EACCES (ERRNO = 13).

DESCRIPTION:
The inode's i_security field was wrongly initialized. The i_security field is used by SELinux for granting permissions.

RESOLUTION:
The code is modified to initialize the i_security field correctly.

INCIDENTS FROM OLD PATCHES:
---------------------------
NONE