README VERSION               : 1.1
README CREATION DATE         : 2012-03-09
PATCH-ID                     : PVCO_03952
PATCH NAME                   : VRTSvxfs 6.0RP1
BASE PACKAGE NAME            : VRTSvxfs
BASE PACKAGE VERSION         : 6
OBSOLETE PATCHES             : NONE
SUPERSEDED PATCHES           : NONE
REQUIRED PATCHES             : PVKL_03951
INCOMPATIBLE PATCHES         : NONE
SUPPORTED PADV               : hpux1131
(P-PLATFORM , A-ARCHITECTURE , D-DISTRIBUTION , V-VERSION)
PATCH CATEGORY               : CORE , CORRUPTION , HANG , PANIC , PERFORMANCE
REBOOT REQUIRED              : NO

PATCH INSTALLATION INSTRUCTIONS:
--------------------------------
Please refer to the Release Notes for Installation and Uninstallation Instructions.

PATCH UNINSTALLATION INSTRUCTIONS:
----------------------------------
Please refer to the Release Notes for Installation and Uninstallation Instructions.

SPECIAL INSTRUCTIONS:
---------------------
NONE

SUMMARY OF FIXED ISSUES:
-----------------------------------------
2627108 Expanding or shrinking a DLV5 file system using the fsadm(1M) command causes a system panic.
2645821 The fscdsconv(1M) command dumps core when it is run on a corrupted or non-VxFS file system.
2645827 The system may panic while re-organizing the file system using the fsadm(1M) command.
2653418 The performance of vxbench degrades for buffered writes on the HP-UX platform.
2654505 Listing of a partitioned directory using the DMAPI does not list all the entries.
2654506 Metadata corruption may be seen after recovery.
2654770 Upgrade of a file system from version 8 to 9 fails in the presence of partitioned directories and clones.
2654773 In certain cases, a write to a regular file whose last allocated extent is a shared extent can fail with an EIO error.
2654783 A write operation on a regular file mapping to a shared compressed extent results in corruption.
2654790 In certain rare cases, if the source file is deleted very shortly after a successful vxfilesnap operation, the destination file can get corrupted, and the VX_FULLFSCK flag may be set in the super block.
2660094 On a cluster mounted file system, the mount(1M) command hangs.
2661222 In a cluster mounted file system, memory corruption is seen during the execution of the SmartMove feature.
2679504 A write(2) operation exceeding the quota limit fails with an EDQUOT error.
2679523 Duplicate file names can be seen in a directory.
2684895 A deadlock occurs because the delayed allocation list lock is taken at the wrong spin lock interrupt level.

SUMMARY OF KNOWN ISSUES:
-----------------------------------------
2654682 A full file system check using fsck_vxfs(1M) takes over a week.
2654685 On a cluster mounted file system, ls(1) with the -l option may take longer.
2627081 vxfsd takes a lot of CPU time after some directories are deleted.
2627096 fsckptadm(1M) fails with ENXIO.
2627101 On a file system having low free memory, commands such as ls and find may appear to hang.
2627084 umount(1M) on a CFS file system panics the machine.
2627089 mmap(2) performance is lower on VxFS 5.0.1 with HP-UX 11.31.
2654673 A cluster mounted file system may panic the system, showing vx_tflush_map in the stack trace.
2684732 fsppadm(1M) dumps core with SIGSEGV while assigning a policy.

KNOWN ISSUES :
--------------

* INCIDENT NO::2654682     TRACKING ID ::2628207

SYMPTOM::
On a large file system with many checkpoints, the fsck operation may appear to hang.

WORKAROUND::
None

* INCIDENT NO::2654685     TRACKING ID ::2651922

SYMPTOM::
On a cluster mounted file system, ls(1) with the -l option may take longer.

WORKAROUND::
None

* INCIDENT NO::2627081     TRACKING ID ::2129455

SYMPTOM::
vxfsd takes a lot of CPU time after some directories are deleted.

WORKAROUND::
None

* INCIDENT NO::2627096     TRACKING ID ::1956458

SYMPTOM::
fsckptadm(1M) fails with ENXIO and the file system is marked for a full fsck.

WORKAROUND::
None

* INCIDENT NO::2627101     TRACKING ID ::2359706

SYMPTOM::
On a file system having low free memory, commands such as ls and find may appear to hang.

WORKAROUND::
None

* INCIDENT NO::2627084     TRACKING ID ::2107152

SYMPTOM::
In rare corner cases, the system panics while unmounting a cluster mounted file system.

WORKAROUND::
None

* INCIDENT NO::2627089     TRACKING ID ::2183320

SYMPTOM::
mmap(2) performance is lower on VxFS 5.0.1 with HP-UX 11.31.

WORKAROUND::
None

* INCIDENT NO::2654673     TRACKING ID ::2558892

SYMPTOM::
VxFS causes a server to panic. The subroutine initiating the panic is vx_tflush_map.

WORKAROUND::
None

* INCIDENT NO::2684732     TRACKING ID ::2715414

SYMPTOM::
fsppadm(1M) dumps core with SIGSEGV while assigning a policy.

WORKAROUND::
Increase the pthread stack size using the following command:
export PTHREAD_DEFAULT_STACK_SIZE=2048000

FIXED INCIDENTS:
----------------

PATCH ID:PVCO_03952

* INCIDENT NO:2627108     TRACKING ID:2599590

SYMPTOM:
Expansion of a 100% full file system may panic the machine with the following stack trace:
bad_kern_reference()
$cold_vfault()
vm_hndlr()
bubbledown()
vx_logflush()
vx_log_sync1()
vx_log_sync()
vx_worklist_thread()
kthread_daemon_startup()

DESCRIPTION:
When a 100% full file system is expanded, the intent log of the file system is truncated and the blocks freed up are used during the expansion.
Due to a bug, the block map of the replica intent log inode was not updated, which caused the block maps of the two inodes to differ. This caused some of the in-core structures of the intent log to become NULL. The machine panics while dereferencing one of these structures.

RESOLUTION:
The block map of the replica intent log inode is now updated correctly. A 100% full file system can now be expanded only if the last extent in the intent log contains more than 32 blocks; otherwise fsadm fails. To expand such a file system, delete some of the files manually and retry the resize.

* INCIDENT NO:2645821     TRACKING ID:2536130

SYMPTOM:
Issue 1:
If the fscdsconv command is used to convert a) a corrupted or b) a non-VxFS file system, it crashes with the following message instead of exiting with a proper error message:
devops.c 1693: ASSERT(0) failed

DESCRIPTION:
Before starting to convert a file system, VxFS performs a sanity check on it. During this check on a corrupt file system, the command incorrectly tries to close the file system twice, and the program crashes on the second close.

RESOLUTION:
The fix makes sure that once close has been called, it is not called again. Now, if fscdsconv is run on a corrupted or non-VxFS file system, it exits with the following error message:
UX:vxfs fscdsconv: ERROR: V-3-20318: file system dirty, run fsck first
UX:vxfs fscdsconv: ERROR: V-3-24426: fscdsconv: Failed to migrate.

Issue 2:
If the partitioned directory feature is enabled in a file system and the file system is exported from the Linux platform to the AIX platform, the mount command asserts with "f:VX_FCL_INIT_FAILED:ndebug".

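The double-close fix described under Issue 1 of incident 2645821 can be illustrated with a minimal sketch. This is not the fscdsconv source: the FsHandle class and the sanity_check and convert functions are hypothetical names invented for illustration, and Python is used only to keep the example short. The sketch assumes the failure pattern is an error path that closes the file system early, followed by a common cleanup path that closes it again.

```python
class FsHandle:
    """Hypothetical stand-in for an open file system device; not a VxFS API."""

    def __init__(self, name):
        self.name = name
        self.is_open = True
        self.close_calls = 0  # counts how often the underlying close really ran

    def close(self):
        # Guard: a second close is a no-op instead of an assertion failure.
        if not self.is_open:
            return False
        self.is_open = False
        self.close_calls += 1
        return True


def sanity_check(handle, is_clean):
    """Return an error string for a dirty file system, or None if it is clean."""
    if not is_clean:
        handle.close()  # the error path closes the handle early
        return "file system dirty, run fsck first"
    return None


def convert(name, is_clean):
    """Run the sanity check, then the common cleanup that closes the handle."""
    handle = FsHandle(name)
    error = sanity_check(handle, is_clean)
    handle.close()  # harmless if the error path already closed the handle
    return error, handle.close_calls
```

In this sketch, convert("fs1", False) returns the error string and reports that the underlying close ran exactly once, even though close() was called on both the error path and the cleanup path.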

* INCIDENT NO:2645827     TRACKING ID:2552095

SYMPTOM:
While re-organizing the file system using fsadm(1M), the system may panic with the following stack trace:
vx_iget
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
vx_ioctl_skey

DESCRIPTION:
Due to a race condition between the functions vx_inactive and vx_iget, an inode that is on the freelist can be handed out erroneously, leading to a panic similar to the one shown above.

RESOLUTION:
The code is modified to take the necessary protection while updating the inode pointer, thus avoiding the race.

* INCIDENT NO:2653418     TRACKING ID:2646933

SYMPTOM:
Large sequential writes take extra time to complete compared to the previous version of the software. The degradation in write performance is observed for large sequential writes on the order of gigabytes.

DESCRIPTION:
This issue was introduced by the omission of an optimization flag in the call to the operating system virtual memory interfaces. The flag, FVF_FLUSH_BHND, is used in the virtual memory API call that flushes a file. This issue has no effect when direct I/O is used for writes. The performance degradation is observed only when the delayed allocation feature is used (the feature is ON by default). The bug has no effect if the file system is mounted with the cluster option.

RESOLUTION:
The correct flag is now passed when flushing a file.

* INCIDENT NO:2654505     TRACKING ID:2624459

SYMPTOM:
Listing of a partitioned directory through DMAPI does not list all the entries in certain scenarios.

DESCRIPTION:
A partitioned directory that is visible in the namespace has multiple hidden directories under it, and the entries are distributed across those hidden directories. A readdir operation on the visible directory is expected to traverse all the hidden directories and list their entries. However, some of the hidden directories were skipped due to wrong offset manipulation, so the entries present in those hidden directories were not reported.

RESOLUTION:
The offset manipulation problem is fixed to ensure that all hidden directories are traversed and their contents are reported.

* INCIDENT NO:2654506     TRACKING ID:2613884

SYMPTOM:
Metadata corruption may be seen after recovery.

DESCRIPTION:
If the primary dies while a node is getting a delegation, a flag is set instructing the node to put the delegation, even though the node did not get it. Because of this, when the operation is retried after recovery, the node may put a delegation that it no longer holds (some other node may hold it), leading to corruption.

RESOLUTION:
This case is handled by resetting the flag when the operation is retried after the primary dies.

* INCIDENT NO:2654770     TRACKING ID:2583197

SYMPTOM:
Upgrade of a file system from version 8 to 9 fails in the presence of partitioned directories and clones.

DESCRIPTION:
In version 9, the hash function for partitioned directories was changed. So, while upgrading from version 8 to 9, all partitioned directories need to be converted back to regular directories. The code has restrictions to avoid modifying read-only clones, which prevented partitioned directories on read-only clones from being changed to regular directories and caused the upgrade to fail with EROFS.

RESOLUTION:
The code is fixed to allow read-only clones to be modified as well during an upgrade from version 8 to 9.

* INCIDENT NO:2654773     TRACKING ID:2645108

SYMPTOM:
A write to a regular file can fail with EIO in certain cases.

DESCRIPTION:
In certain cases, on a file that has a shared extent allocated as its last extent, an extending write beyond EOF can fail with an EIO error because VxFS erroneously looks for allocated extents beyond the largest permissible file offset.

RESOLUTION:
The code is corrected to not look for extents beyond EOF when not necessary.

* INCIDENT NO:2654783     TRACKING ID:2645112

SYMPTOM:
A write operation on a shared compressed extent can result in corruption of the compressed data associated with that file.

DESCRIPTION:
In certain cases, a write operation on a shared compressed extent can lead to corruption of the compressed extent associated with a file. This can happen when the copy-on-write operation does not split a shared compressed extent at the right extent boundary.

RESOLUTION:
The code is corrected to handle this case and avoid breaking a compressed extent during a copy-on-write operation.

* INCIDENT NO:2654790     TRACKING ID:2645109

SYMPTOM:
In certain rare cases, if the source file is deleted very shortly after a successful execution of the vxfilesnap command, the destination file can get corrupted. This can also lead to the VX_FULLFSCK flag being set in the super block.

DESCRIPTION:
In certain rare cases, a successful execution of the vxfilesnap command can lead to the allocation of a new indirect extent to the source and destination files. In such a case, if the source file is deleted immediately after the filesnap operation, the file system may decide not to flush the data belonging to this indirect extent, which may result in corruption of the destination file. Also, depending on what data previously existed in the indirect extent, the file system's VX_FULLFSCK flag can get set in the super block.

RESOLUTION:
In the case described above, the code now flushes any newly allocated indirect extent immediately after the successful execution of the filesnap operation.

* INCIDENT NO:2660094     TRACKING ID:2580905

SYMPTOM:
On a cluster mounted file system, the mount(1M) command hangs.

DESCRIPTION:
While mounting the file system, the re-org status and the device information are queried unnecessarily. These queries race with other updates, causing the hang.

RESOLUTION:
The code is modified to remove the extra re-org status and device information queries, thus reducing the possibility of a hang.

* INCIDENT NO:2661222     TRACKING ID:2660761

SYMPTOM:
The system panics due to memory corruption. The panic stack could be anything.
With the postmark benchmark, the most common stack was:
page_fault+0x1f/0x30
vx_iupdat_cluster+0x6c/0x390 [vxfs]
vx_iupdat_local+0x32e/0x610 [vxfs]
vx_cfs_iupdat+0x4d/0x150 [vxfs]
vx_tflush_inode+0x11e/0x180 [vxfs]
vx_fsq_flush+0x2fe/0x7d0 [vxfs]
vx_fsflush_fsq+0x93/0xc0 [vxfs]
vx_workitem_process+0xb/0x20 [vxfs]
vx_worklist_process+0x115/0x260 [vxfs]
vx_worklist_thread+0x5d/0xa0 [vxfs]
vx_kthread_init+0x75/0x90 [vxfs]
child_rip+0xa/0x20

DESCRIPTION:
The code that handles SmartMove requests for a cluster mounted file system had a bug that, under certain conditions, could write to memory it did not allocate.

RESOLUTION:
The code was changed to ensure that SmartMove does not touch memory it did not allocate.

* INCIDENT NO:2679504     TRACKING ID:2566875

SYMPTOM:
A write(2) exceeding a quota limit fails with an EDQUOT error (i.e. "Disc quota exceeded") before the user quota limit is reached. As a result, no data is written, not even the portion that fits below the quota limit. A partial write of the length that fits within the quota limit should be performed instead of just returning the EDQUOT error.

DESCRIPTION:
When a write request exceeds a quota limit, the EDQUOT error should be handled so that VxFS can allocate space up to the hard quota limit and proceed with a partial write. However, VxFS does not handle this error, and the error is returned without performing a partial write.

RESOLUTION:
The EDQUOT error from the extent allocation routine is now handled by retrying the write with a length that fits within the quota limit.

* INCIDENT NO:2679523     TRACKING ID:2670022

SYMPTOM:
Duplicate file names can be seen in a directory.

DESCRIPTION:
VxFS maintains an internal directory name lookup cache (DNLC) to improve the performance of directory lookups. A race condition arises in the DNLC list manipulation code during lookup or creation of file names longer than 32 characters (which further affects other file creations). This causes the DNLC to hold a stale entry for an existing file in the directory.
A lookup of such a file through the DNLC reports the file as non-existent, which allows a duplicate file name to be created in the directory.

RESOLUTION:
The race condition is fixed by protecting the DNLC lists with proper locks.

* INCIDENT NO:2684895     TRACKING ID:2655754

SYMPTOM:
Several vxfsd kernel threads are stuck trying to take a simple_lock. System responsiveness may drop drastically. One may observe vxfskd threads with the following stack trace in the kernel thread list:
(0)> f pvthread+010600
pvthread+010600 STACK:
Use current context [F000000032099600] of cpu 3
[0001151C]krlock_confer_norestart+000000 ()
[004B1820]krlock+0003A0 (??, ??)
[00534138]slock_krlock_acquire+000258 (??, ??, ??)
[005349B8]slock+000558 (??, ??)
[00009558].simple_lock+000058 ()
[0487613C].vx_idalloc_off+0002F8 ()
[048CDF64].vx_iflush_list+0009BC ()
[048CE1FC].vx_iflush+00009C ()
[048C8304].vx_workitem_process+00003C ()
[048D357C].vx_worklist_process+0000F4 ()
[048D37B4].vx_worklist_thread+000090 ()
[04855B28].vx_thread_base+000034 ()
[0035B6DC]threadentry+00005C (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF9600

DESCRIPTION:
This is a deadlock resulting from the wrong ipl (interrupt level) being used in a file system spin lock. The locks are related to the delayed allocation feature, so the issue does not exist if the feature is unused (the feature is ON by default). The bug has no effect if the file system is mounted with the cluster option.

RESOLUTION:
The spin lock initialization and use are modified to use the correct ipl.

INCIDENTS FROM OLD PATCHES:
---------------------------
NONE