README VERSION               : 1.1
README CREATION DATE         : 2012-03-09
PATCH ID                     : 6.0.1.0
PATCH NAME                   : VRTSvxfs 6.0RP1
BASE PACKAGE NAME            : VRTSvxfs
BASE PACKAGE VERSION         : 6.0.0.0
OBSOLETE PATCHES             : NONE
SUPERSEDED PATCHES           : NONE
REQUIRED PATCHES             : NONE
INCOMPATIBLE PATCHES         : NONE
SUPPORTED PADV               : aix61,aix71
(P-PLATFORM , A-ARCHITECTURE , D-DISTRIBUTION , V-VERSION)
PATCH CATEGORY               : CORE , HANG , PANIC , PERFORMANCE
REBOOT REQUIRED              : YES

PATCH INSTALLATION INSTRUCTIONS:
--------------------------------
Please refer to the Release Notes for Installation and Uninstallation
Instructions.

PATCH UNINSTALLATION INSTRUCTIONS:
----------------------------------
Please refer to the Release Notes for Installation and Uninstallation
Instructions.

SPECIAL INSTRUCTIONS:
----------------------
NONE

SUMMARY OF FIXED ISSUES:
-----------------------------------------
2608568 Abrupt messages are seen in engine log after complete storage failure in cvm resiliency scenario.
2627108 Expanding or shrinking a DLV5 file system using the fsadm(1M) command causes a system panic.
2627116 Poor VxFS performance for application doing writes on a mmaped file
2645691 The following error message is displayed during the execution of the fsmap(1M) command: 'UX:vxfs fsmap: ERROR: V-3-27313'
2645695 Native filesystem migrated to vxfs disk layout 8 where layout version 9 is the default.
2645821 The fscdsconv(1M) command which is used to convert corrupted or non-VxFS file systems generates core.
2645827 The system may panic while re-organizing the file system using the fsadm(1M) command.
2646915 The fsck(1M) command exits during an internal CFS stress reconfiguration testing.
2654504 The replication process dumps core when shared extents are present in the source file system.
2654505 Listing of a partitioned directory using the DMAPI does not list all the entries.
2654506 Metadata corruption may be seen after recovery.
2654507 A hang may be seen because VxFS falsely detects a low pinnable memory scenario.
2654511 fsmigadm "commit/status" error messages should be clear.
2654681 New tunable - chunk_inval_size and a few more options with 'chunk_flush_size'.
2654684 Accessing a file with O_NSHARE mode by multiple processes concurrently on AIX could cause a file system hang.
2654770 Upgrade of a file system from version 8 to 9 fails in the presence of partition directories and clones.
2654773 In certain cases, a write on a regular file which has a shared extent as the last allocated extent can fail with an EIO error.
2654783 A write operation on a regular file mapping to a shared compressed extent results in corruption.
2654790 In certain rare cases after a successful execution of the vxfilesnap command, if the source file gets deleted in a very short span of time after the filesnap operation, then the destination file can get corrupted and this could also lead to setting of the VX_FULLFSCK flag in the super block.
2661222 In a cluster mounted file system, memory corruption is seen during the execution of the SmartMove feature.
2661223 'Shared' extents are not transferred as 'shared' by the replication process.
2662259 The De-duplication session does not complete.
2672209 When the fsckptadm(1M) command with the '-r' and '-R' options is executed, two mutually exclusive options get executed simultaneously.
2678196 The fiostat command dumps core when the count value is 0.
2684895 Deadlock because of the wrong spin lock interrupt level at which the delayed allocation list lock is taken.

SUMMARY OF KNOWN ISSUES:
-----------------------------------------
2654682 Full file system check using fsck_vxfs(1M) takes over a week
2627081 vxfsd is taking a lot of CPU time after deleting some directories
2627096 fsckptadm(1M) fails with ENXIO
2627084 umount(1M) on a CFS filesystem panics the machine.
2654673 A cluster mounted file-system may panic the system showing vx_tflush_map in the stack trace.
2684732 fsppadm(1M) dumps core with SIGSEGV while assigning a policy.

KNOWN ISSUES :
--------------

* INCIDENT NO::2654682   TRACKING ID ::2628207

SYMPTOM::
On a large file system with many checkpoints, the fsck operation may appear
to hang.

WORKAROUND::
None

* INCIDENT NO::2627081   TRACKING ID ::2129455

SYMPTOM::
vxfsd is taking a lot of CPU time after deleting some directories.

WORKAROUND::
None.

* INCIDENT NO::2627096   TRACKING ID ::1956458

SYMPTOM::
fsckptadm(1M) fails with ENXIO and the file system is marked for full fsck.

WORKAROUND::
None.

* INCIDENT NO::2627084   TRACKING ID ::2107152

SYMPTOM::
In rare corner cases, the system panics while unmounting a cluster mounted
file system.

WORKAROUND::
None

* INCIDENT NO::2654673   TRACKING ID ::2558892

SYMPTOM::
VxFS causes a server to panic. The subroutine initiating the panic is
vx_tflush_map.

WORKAROUND::
None

* INCIDENT NO::2684732   TRACKING ID ::2715414

SYMPTOM::
fsppadm(1M) dumps core with SIGSEGV while assigning a policy.

WORKAROUND::
Increase the pthread stack size using the following command:
export PTHREAD_DEFAULT_STACK_SIZE=2048000

FIXED INCIDENTS:

PATCH ID:6.0.1.0

* INCIDENT NO:2608568   TRACKING ID:2663750

SYMPTOM:
When local disk connectivity is lost or restored for a disk, incorrect remote
notifications are also given for other disks whose names match the same
pattern as the disk for which the event occurred.

DESCRIPTION:
When local disk connectivity is restored or lost for a disk, the cvmvoldg
agent logs the correct notification for that disk, but it also logs wrong
notifications in the VCS engine log for disks with a similar name pattern,
even though connectivity on those disks has not changed.

RESOLUTION:
In the cvmvoldg agent, local disk connectivity events (restore/lost) are now
checked with an exact match before reporting.

* INCIDENT NO:2627108   TRACKING ID:2599590

SYMPTOM:
Expansion of a 100% full file system may panic the machine with the following
stack trace.
bad_kern_reference()
$cold_vfault()
vm_hndlr()
bubbledown()
vx_logflush()
vx_log_sync1()
vx_log_sync()
vx_worklist_thread()
kthread_daemon_startup()

DESCRIPTION:
When a 100% full file system is expanded, the intent log of the file system is
truncated and the blocks that are freed up are used during the expansion. Due
to a bug, the block map of the replica intent log inode was not updated, which
caused the block maps of the two inodes to differ. This caused some of the
in-core structures of the intent log to go NULL. The machine panics while
dereferencing one of these structures.

RESOLUTION:
The block map of the replica intent log inode is now updated correctly. A 100%
full file system can now be expanded only if the last extent in the intent log
contains more than 32 blocks; otherwise fsadm fails. To expand such a file
system, delete some of the files manually and retry the resize.

* INCIDENT NO:2627116   TRACKING ID:2511432

SYMPTOM:
Poor VxFS performance for an application doing writes on an mmapped file which
has been written to before being mmapped, and which therefore already has all
the associated pages in memory. Large page-ins are observed as soon as the
writes begin. Large page-outs are also observed periodically afterward.

DESCRIPTION:
A buffered write to a file brings the associated page into memory, and such a
page can be written to. However, mmapping the file afterward marks the page
read-only. Any mmapped write to the same page then encounters a protection
fault, which leads to a page-in. The VxFS background flushing mechanism
periodically flushes the entire file, which leads to large page-outs. During
these page-outs, writes happening elsewhere on the file slow down because
various inode locks are held by the flushing thread.

RESOLUTION:
To prevent large page-ins, the extra overhead of the protection fault has been
removed by marking pages read-write in the mmap range wherever possible.
A new tunable, "vx_ctrl_flush", has been introduced. It can be tuned to control
the amount of flushing done by the background flushing thread, and therefore
the page-outs.

* INCIDENT NO:2645691   TRACKING ID:2645435

SYMPTOM:
The fsmap command gives the error
'UX:vxfs fsmap: ERROR: V-3-27313: failed to open mount point: : I/O error'

DESCRIPTION:
fsmap queries the mount table, makes a list of all mounted file systems, and
then opens the mount points in the list. If a file system gets unmounted in
between, the open fails with the above error and the program exits.

RESOLUTION:
Instead of exiting on the error in the above scenario, fsmap now continues by
skipping the file system that gave the error (because it was unmounted).

* INCIDENT NO:2645695   TRACKING ID:2645441

SYMPTOM:
A native file system is migrated to VxFS disk layout 8 using online migration.

DESCRIPTION:
Even though the default disk layout is version 9, online migration migrates
the native file system to VxFS disk layout version 8.

RESOLUTION:
The file system is now migrated to the default disk layout (version 9 in this
case).

* INCIDENT NO:2645821   TRACKING ID:2536130

SYMPTOM:
Issue 1:
If the fscdsconv command is used to convert a) a corrupted or b) a non-VxFS
file system, instead of exiting with a proper error message it crashes with the
following message:
devops.c 1693: ASSERT(0) failed

DESCRIPTION:
Before starting to convert a file system, VxFS does a sanity check on the file
system. While doing the sanity check on a corrupt file system, the command
incorrectly tries to close the file system twice. The program crashes on the
second close.

RESOLUTION:
The fix makes sure that if close has been called once, it is not called again.
If fscdsconv is now run on a corrupted or non-VxFS file system, it exits with
the following error message:
UX:vxfs fscdsconv: ERROR: V-3-20318: file system dirty, run fsck first
UX:vxfs fscdsconv: ERROR: V-3-24426: fscdsconv: Failed to migrate.
Issue 2:
If the partition directory feature is enabled in a file system and the file
system is exported from Linux to the AIX platform, the mount command asserts
with "f:VX_FCL_INIT_FAILED:ndebug".

* INCIDENT NO:2645827   TRACKING ID:2552095

SYMPTOM:
While re-organizing the file system using fsadm(1M), the system may panic with
the following stack trace:
vx_iget
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
vx_ioctl_skey

DESCRIPTION:
Due to a race condition between the functions vx_inactive and vx_iget, an
inode which is on the freelist can be handed out erroneously, leading to a
panic similar to the one shown above.

RESOLUTION:
The code is modified to take the necessary protection while updating the inode
pointer, thus avoiding the race.

* INCIDENT NO:2646915   TRACKING ID:2630954

SYMPTOM:
During internal CFS stress reconfiguration testing, the fsck(1M) command exits
by hitting an assert.

DESCRIPTION:
When the dexh_getblk() function is executed, if the extent size is greater
than 256 Kbytes, the extent is divided into chunks of 256 Kbytes each. When the
extents of the hash inodes are read in the dexh_getblk() function, a maximum of
256 Kbytes (MAXBUFSZ) is read at a time. Currently, the chunk size is assigned
as 256 Kbytes every time. This is a bug when the last chunk in the extent is
smaller than 256 Kbytes: the length of the buffer is assigned incorrectly and
an aliased buffer ends up in the buffer cache. For the last chunk, the
remaining size in the extent to be read should be assigned as the chunk size
instead.

RESOLUTION:
The code is modified so that the buffer size is calculated correctly.

* INCIDENT NO:2654504   TRACKING ID:2646936

SYMPTOM:
The replication process sometimes dumps core if shared extents are present in
the source file system. The message "UX:vxfs vfradmin: INFO: V-3-27703: ...
'replication process killed by signal'." is displayed when you use the
"vfradmin getjobstatus" command.
DESCRIPTION:
The replication process saves to disk information about shared extents present
on the source side. This information is used to optimize data transfer as well
as to achieve the same sharing in the destination files. When a new chunk of
main memory is allocated to save the shared extent structure, it has to be
initialized to zero to avoid ambiguity. In one particular case, the replication
process fails to initialize the main memory to 0, and aborts processing (dumps
core) when it reads uninitialized memory.

RESOLUTION:
The replication process has been changed to initialize to 0 every newly
allocated piece of memory in which shared extents are stored.

* INCIDENT NO:2654505   TRACKING ID:2624459

SYMPTOM:
Listing of a partitioned directory through DMAPI does not list all the entries
in certain scenarios.

DESCRIPTION:
A partitioned namespace-visible directory has multiple hidden directories under
it, and the entries are distributed across those hidden directories. A readdir
operation on the visible directory is expected to traverse all the hidden
directories and list their entries, but some of the hidden directories were
skipped due to wrong offset manipulation. As a result, the entries present in
those hidden directories were not reported.

RESOLUTION:
The offset manipulation problem is fixed to ensure that all hidden directories
are traversed and their contents are reported.

* INCIDENT NO:2654506   TRACKING ID:2613884

SYMPTOM:
Metadata corruption may be seen after recovery.

DESCRIPTION:
If the primary dies while a node is getting a delegation, a flag is set
instructing the node to put the delegation even though the node did not get
it. Because of this, when the operation is retried after recovery, the node
may put a delegation which it no longer has (some other node may have it),
leading to corruption.

RESOLUTION:
This case is handled by resetting the flag when the operation is retried after
the primary dies.
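The traversal described in incident 2654505 can be illustrated with a small Python sketch. It is not VxFS code: the `(dir_index, entry_index)` offset encoding, the function name `readdir_partitioned`, and the list-of-lists directory model are all assumptions made for illustration. It only shows the property the fix restores, namely that a resumable listing must advance through every hidden directory, including empty ones, without skipping any.

```python
from typing import List, Optional, Tuple

Offset = Tuple[int, int]  # hypothetical encoding: (hidden dir index, entry index)


def readdir_partitioned(hidden_dirs: List[List[str]],
                        offset: Offset = (0, 0),
                        count: Optional[int] = None) -> Tuple[List[str], Offset]:
    """Return up to `count` entries starting at `offset`, plus the next offset."""
    dir_idx, ent_idx = offset
    out: List[str] = []
    while dir_idx < len(hidden_dirs):
        entries = hidden_dirs[dir_idx]
        while ent_idx < len(entries):
            if count is not None and len(out) == count:
                # Buffer is full: remember exactly where to resume.
                return out, (dir_idx, ent_idx)
            out.append(entries[ent_idx])
            ent_idx += 1
        # Move to the next hidden directory and restart at its first entry.
        dir_idx += 1
        ent_idx = 0
    return out, (dir_idx, 0)


def list_all(hidden_dirs: List[List[str]], count: int) -> List[str]:
    """Drive readdir_partitioned in batches until nothing more is returned."""
    names: List[str] = []
    offset: Offset = (0, 0)
    while True:
        batch, offset = readdir_partitioned(hidden_dirs, offset, count)
        if not batch:
            return names
        names.extend(batch)
```

Calling `list_all` with any positive batch size should return the same entries as a single unbounded call, which is the behavior a caller of readdir expects regardless of how the entries are split across hidden directories.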
* INCIDENT NO:2654507   TRACKING ID:2536054

SYMPTOM:
A hang may be seen because VxFS falsely detects a low pinnable memory
scenario.

DESCRIPTION:
This issue may be seen if the kernel locking feature is enabled. This feature
is enabled by default (default value - 2) on AIX 7.1. The kernel locking
feature makes some pageable kernel pages less vulnerable to LRU page stealing.
These kernel pages are considered for page stealing only after all other
pageable pages have been considered. These kernel pages are not really pinned,
but to protect them they are excluded from certain OS counts. This gives the
impression that the system has fewer available pinnable pages. VxFS uses some
of these counts to detect a low memory scenario. Because these counts now
exclude pages which are not really pinned, VxFS falsely detects a low pinnable
memory scenario. If a low pinnable memory scenario is detected, VxFS tries to
avoid memory allocation in certain cases, such as the buffer cache. If no
memory is available to reuse and no new memory allocation is allowed because
of the low memory scenario, a hang may be noticed.

RESOLUTION:
The OS also maintains counts related to kernel locked pages. VxFS now also uses
some of these counts to correctly determine a low pinnable memory scenario.

* INCIDENT NO:2654511   TRACKING ID:2563251

SYMPTOM:
The fsmigadm commit/status command fails if migration parameters cannot be
obtained.

DESCRIPTION:
The fsmigadm command fails if it is not run on a migration file system. The
ioctl call to obtain migration parameters returns an error. Depending on that
error, one of the following two error messages is printed.
1. UX:vxfs fsmigadm: ERROR: V-3-27014: Migration getparam ioctl failed on
   /mnt1, I/O error
2. UX:vxfs fsmigadm: ERROR: V-3-27013: /mnt1 is not a migration file system

RESOLUTION:
The first error message has been removed, as it is not clear and
user-friendly. If the ioctl fails, only the second message is printed, which is
simple and clear.
* INCIDENT NO:2654681   TRACKING ID:2626390

SYMPTOM:
Freeing a large number of pages at once can induce a small I/O latency.

DESCRIPTION:
When an entire file is read into memory and the file is then removed, all of
its pages are freed. Pages are freed in chunks. Currently the smallest
configurable chunk size for freeing pages is 64 MB.

RESOLUTION:
The chunk invalidation size can now be tuned more finely, and smaller chunks
are allowed. The example below sets the chunk size for freeing pages to 8 MB.
The table shows the tuning values now available.

# vxtunefs -D chunk_inval_size=6

chunk_inval_size (MB) :  64  256  128   64   32   16    8    4
Tuning value          :   0    1    2    3    4    5    6    7

* INCIDENT NO:2654684   TRACKING ID:2650330

SYMPTOM:
Accessing a file with O_NSHARE mode by multiple processes concurrently on AIX
could cause a file system hang.

DESCRIPTION:
There are two different hang scenarios. First, a deadlock can be seen between
open threads and a freeze operation. For example, a freeze thread T1, issued
by commands such as umount or fsadm, stops new threads from getting the active
level while it waits for an old thread T2, which holds the active level but is
waiting for the ilock. However, thread T3, which holds the ilock, cannot get
the active level because of the freeze thread T1.

T1:
vx_async_waitmsg+00001C
vx_msg_broadcast+000118
vx_cwfa_step+0000A0
vx_cwfreeze_common+0000F8
vx_cwfreeze_all+0002E8
vx_freeze+000038
vx_detach_fset+000394
vx_unmount+0001AC
vx_unmount_skey+000034

T2:
simple_lock+000058
vx_ilock+000020
vx_close1+000720
vx_close+00006C
vx_close_skey+00003C
vnop_close+000094
vno_close+000050
closef+00005C

T3:
vx_delay+000010
vx_active_common_flush+000038
vx_open_modes+00058C
vx_open1+0001FC
vx_open+00007C
vx_open_skey+000044

Another

RESOLUTION:
The ilock is now given up prior to attempting to get the active level, and the
wakeup function treats the ilock as a simple lock instead of a complex lock.
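The tuning table in incident 2654681 can be read as a simple lookup from tuning value to chunk size. The following Python sketch only restates that table; the names `CHUNK_INVAL_SIZE_MB` and `chunk_inval_size_mb` are hypothetical helpers for illustration and are not part of vxtunefs.

```python
# Chunk sizes in MB, indexed by tuning value 0..7, copied from the table
# in incident 2654681. The names below are illustrative only.
CHUNK_INVAL_SIZE_MB = (64, 256, 128, 64, 32, 16, 8, 4)


def chunk_inval_size_mb(tuning_value: int) -> int:
    """Return the chunk size in MB that a given tuning value selects."""
    if not 0 <= tuning_value < len(CHUNK_INVAL_SIZE_MB):
        raise ValueError("tuning value must be between 0 and 7")
    return CHUNK_INVAL_SIZE_MB[tuning_value]


if __name__ == "__main__":
    # Matches the README example: a tuning value of 6 selects 8 MB chunks.
    print(chunk_inval_size_mb(6))
```

Note that tuning values 0 and 3 both select 64 MB in the table, so the mapping cannot be inverted unambiguously.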
* INCIDENT NO:2654770   TRACKING ID:2583197

SYMPTOM:
Upgrade of a file system from version 8 to 9 fails in the presence of
partition directories and clones.

DESCRIPTION:
In version 9, the hash function for partitioned directories was changed.
Therefore, while upgrading from version 8 to 9, all partitioned directories
need to be converted back to regular directories. The code has restrictions
that avoid modifying read-only clones. These restrictions prevented
partitioned directories on read-only clones from being changed to regular
directories, and the upgrade failed with EROFS.

RESOLUTION:
The code is fixed to allow read-only clones to be modified as well during an
upgrade from version 8 to 9.

* INCIDENT NO:2654773   TRACKING ID:2645108

SYMPTOM:
A write to a regular file can fail with EIO in certain cases.

DESCRIPTION:
In certain cases, on a file which has a shared extent allocated as the last
extent of the file, an extending write beyond EOF can fail with an EIO error
because VxFS erroneously looks for allocated extents beyond the largest
permissible file offset.

RESOLUTION:
The code is corrected to not look for extents beyond EOF when not necessary.

* INCIDENT NO:2654783   TRACKING ID:2645112

SYMPTOM:
A write operation on a shared compressed extent can result in corruption of
the compressed data associated with that file.

DESCRIPTION:
In certain cases, a write operation on a shared compressed extent can lead to
corruption of the compressed extent associated with a file. This can happen
when the copy-on-write operation does not split a shared compressed extent at
the right extent boundary.

RESOLUTION:
The code is corrected to handle this case and avoid breaking a compressed
extent during a copy-on-write operation.
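As an illustration of the boundary problem in incident 2654783, the Python sketch below shows one way a copy range could be kept on whole-extent boundaries. This is an assumption for illustration, not the VxFS implementation: the function name `cow_range`, the `(start, length)` extent tuples, and the policy of widening the range to cover every overlapped extent in full are all hypothetical.

```python
from typing import List, Tuple


def cow_range(write_off: int, write_len: int,
              extents: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Widen [write_off, write_off + write_len) to whole-extent boundaries.

    `extents` is a sorted, non-overlapping list of (start, length) pairs
    describing shared compressed extents. Returns (start, end) of the region
    to copy, with `end` exclusive.
    """
    start, end = write_off, write_off + write_len
    for ext_start, ext_len in extents:
        ext_end = ext_start + ext_len
        # An extent touched by the write must be copied in full, because a
        # compressed extent cannot be split at an arbitrary offset.
        if ext_start < end and start < ext_end:
            start = min(start, ext_start)
            end = max(end, ext_end)
    return start, end
```

For example, with two adjacent 64-block extents, a 10-block write at offset 70 widens to the second extent only, while a 10-block write at offset 60 straddles both and widens to cover both.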
* INCIDENT NO:2654790   TRACKING ID:2645109

SYMPTOM:
In certain rare cases after a successful execution of the vxfilesnap command,
if the source file gets deleted in a very short span of time after the
filesnap operation, the destination file can get corrupted. This could also
lead to the VX_FULLFSCK flag being set in the super block.

DESCRIPTION:
In certain rare cases, a successful execution of the vxfilesnap command can
lead to allocation of a new indirect extent to the source and destination
files. In such a case, if the source file gets deleted immediately after the
filesnap operation, the file system may decide not to flush the data belonging
to this indirect extent, and this may result in corruption of the destination
file. Also, depending on what data previously existed in the indirect extent,
the file system's VX_FULLFSCK flag can get set in the super block.

RESOLUTION:
In the above mentioned case, the code now flushes any newly allocated indirect
extent immediately after the successful execution of the filesnap operation.

* INCIDENT NO:2661222   TRACKING ID:2660761

SYMPTOM:
The system panics due to memory corruption. The panic stack could be anything.
With the postmark benchmark, the most common stack was:
page_fault+0x1f/0x30
vx_iupdat_cluster+0x6c/0x390 [vxfs]
vx_iupdat_local+0x32e/0x610 [vxfs]
vx_cfs_iupdat+0x4d/0x150 [vxfs]
vx_tflush_inode+0x11e/0x180 [vxfs]
vx_fsq_flush+0x2fe/0x7d0 [vxfs]
vx_fsflush_fsq+0x93/0xc0 [vxfs]
vx_workitem_process+0xb/0x20 [vxfs]
vx_worklist_process+0x115/0x260 [vxfs]
vx_worklist_thread+0x5d/0xa0 [vxfs]
vx_kthread_init+0x75/0x90 [vxfs]
child_rip+0xa/0x20

DESCRIPTION:
The code that handles SmartMove requests for a cluster mounted file system had
a bug that, under certain conditions, could write to memory it did not
allocate.

RESOLUTION:
The code was changed to ensure that SmartMove does not touch memory it did not
allocate.
* INCIDENT NO:2661223   TRACKING ID:2655786

SYMPTOM:
Replication turns off shared extent processing after transferring a large
number of shared extents successfully. In further iterations, extents that are
shared at the source do not appear as shared at the destination.

DESCRIPTION:
The send side of the replication process has an internal limit on the size of
one of the buffers used to store shared extents that start from the same
physical block but have varying lengths. The varying length of a shared extent
could be due to the varying length of matching sections from different files.
Once this limit of shared extent entries at the source side is exceeded, the
replication process turns off shared extent processing in all future
iterations to prevent data structure inconsistency.

RESOLUTION:
The replication process has been enhanced to remove the internal limit on the
number of varying length shared extents at the source side that refer to the
same starting block.

* INCIDENT NO:2662259   TRACKING ID:2609002

SYMPTOM:
The deduplication session does not complete. The
/lost+found/dedupdb/log/spoold/storaged.log file contains the following
message:
ERR [1543]: 25004: Queue processing failed five times in a row. Queue
processing will be disabled and the CR will no longer accept new backup data.
Please contact support.

DESCRIPTION:
The deduplication database engine did not handle simultaneous queue processing
and querying from different threads.

RESOLUTION:
The code is modified to handle both operations correctly when issued by
different threads.

* INCIDENT NO:2672209   TRACKING ID:2653845

SYMPTOM:
Two mutually exclusive options (-r and -R) of fsckptadm were getting executed
simultaneously.

DESCRIPTION:
If the tunable ckpt_removable is set to 1, a removable checkpoint is created
by default. The -R option was introduced to override this and create a
non-removable checkpoint. It should be mutually exclusive with the -r option
in fsckptadm.
RESOLUTION:
The code is changed so that the two options are mutually exclusive.

* INCIDENT NO:2678196   TRACKING ID:2678096

SYMPTOM:
fiostat dumps core with "Segmentation Fault (core dumped)" when the count
value is <= 0.

DESCRIPTION:
When the count value is <= 0, fiostat tries to free memory which was not
allocated, which causes the core dump.

RESOLUTION:
The memory is now freed only when it has been allocated.

* INCIDENT NO:2684895   TRACKING ID:2655754

SYMPTOM:
Several vxfsd kernel threads are stuck trying to take a simple_lock. System
responsiveness may drop drastically. One may observe vxfskd threads with the
following stack trace in the kernel thread list:
(0)> f pvthread+010600
pvthread+010600 STACK:
Use current context [F000000032099600] of cpu 3
[0001151C]krlock_confer_norestart+000000 ()
[004B1820]krlock+0003A0 (??, ??)
[00534138]slock_krlock_acquire+000258 (??, ??, ??)
[005349B8]slock+000558 (??, ??)
[00009558].simple_lock+000058 ()
[0487613C].vx_idalloc_off+0002F8 ()
[048CDF64].vx_iflush_list+0009BC ()
[048CE1FC].vx_iflush+00009C ()
[048C8304].vx_workitem_process+00003C ()
[048D357C].vx_worklist_process+0000F4 ()
[048D37B4].vx_worklist_thread+000090 ()
[04855B28].vx_thread_base+000034 ()
[0035B6DC]threadentry+00005C (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF9600

DESCRIPTION:
This is a deadlock resulting from the wrong ipl (interrupt level) being used
in a file system spin lock. The locks are related to the delayed allocation
feature, so the issue does not exist if the feature is unused (the feature is
ON by default). The bug does not have any effect if the file system is mounted
with the cluster option.

RESOLUTION:
The spin lock initialization and use are modified to use the correct ipl.

INCIDENTS FROM OLD PATCHES:
---------------------------
NONE