README VERSION : 1.1
README CREATION DATE : 2014-04-08
PATCH ID : 6.0.500.000
PATCH NAME : VRTSvxfs 6.0.500.000
BASE PACKAGE NAME : VRTSvxfs
BASE PACKAGE VERSION : 6.0.100.000
SUPERSEDED PATCHES : 6.0.300.000
REQUIRED PATCHES : NONE
INCOMPATIBLE PATCHES : NONE
SUPPORTED PADV : aix61,aix71 (P-PLATFORM , A-ARCHITECTURE , D-DISTRIBUTION , V-VERSION)
PATCH CATEGORY : CORE , CORRUPTION , HANG , PANIC , PERFORMANCE
PATCH CRITICALITY : CRITICAL
HAS KERNEL COMPONENT : YES
ID : NONE
REBOOT REQUIRED : YES
REQUIRE APPLICATION DOWNTIME : YES

PATCH INSTALLATION INSTRUCTIONS:
--------------------------------
Please refer to the Install Guide for installation instructions.

PATCH UNINSTALLATION INSTRUCTIONS:
----------------------------------
Please refer to the Install Guide for uninstallation instructions.

SPECIAL INSTRUCTIONS:
----------------------
NONE

SUMMARY OF FIXED ISSUES:
-----------------------------------------
PATCH ID:6.0.500.000
2705336 (2059611) The system panics due to a NULL pointer dereference while flushing bitmaps to the disk.
2933290 (2756779) The code is modified to improve the fix for the read and write performance concerns on Cluster File System (CFS) when it runs applications that rely on POSIX file-record locking using fcntl.
2933301 (2908391) It takes a long time to remove checkpoints from the VxFS file system when there are a large number of files present.
2947029 (2926684) In rare cases, the system may panic while performing a logged write.
2959557 (2834192) You are unable to mount the file system after the full fsck(1M) utility is run.
2978234 (2972183) The fsppadm(1M) enforce command takes a long time on the secondary nodes compared to the primary nodes.
2978236 (2977828) The file system is marked bad after an inode table overflow error.
2982161 (2982157) During internal testing, the "f:vx_trancommit:4" debug assert was hit when the available transaction space is less than required.
2983249 (2983248) The vxrepquota(1M) command dumps core.
2999566 (2999560) The 'fsvoladm'(1M) command fails to clear the 'metadataok' flag on a volume.
3027250 (3031901) The vxtunefs(1M) command accepts garbage values for the 'max_buf_data_size' tunable.
3056103 (3197901) Prevent a duplicate symbol in the VxFS libvxfspriv.a and vxfspriv.so libraries.
3059000 (3046983) An invalid CFS node number in ".__fsppadm_fclextract" causes the DST policy enforcement failure.
3108176 (2667658) The fscdsconv(1M) endian conversion operation fails because of a macro overflow.
3131798 (2839871) On a system with DELICACHE enabled, several file system operations may hang.
3131826 (2966277) Systems with high file system activity like read/write/open/lookup may panic.
3131905 (3022673) Veritas File System (VxFS) is unresponsive when it changes the memory using DLPAR.
3239226 (3023855) Enable the 'noreuserd' option of the mount(1M) command on CFS.
3248029 (2439261) When the vx_fiostats_tunable value is changed from zero to non-zero, the system panics.
3248042 (3072036) Read operations from the secondary node in CFS can sometimes fail with the ENXIO error code.
3248046 (3092114) The information displayed by the "df -i" command may be inaccurate for cluster-mounted file systems.
3248051 (3121933) The pwrite(2) function fails with the EOPNOTSUPP error.
3248054 (3153919) The fsadm(1M) command may hang when the structural file set reorganization is in progress.
3248089 (3003679) When the fsppadm(1M) command runs while a file with named stream attributes (nattr) is removed at the same time, the file system does not respond.
3248090 (2963763) When the thin_friendly_alloc and delicache_enable functionality is enabled, VxFS may enter a deadlock.
3248094 (3192985) Checkpoint quota usage on Cluster File System (CFS) can be negative.
3248096 (3214816) With the DELICACHE feature enabled, frequent creation and deletion of the inodes of a user may result in corruption of the user quota file.
3248099 (3189562) Oracle daemons hang in the vx_growfile() kernel function.
3284764 (3042485) During internal stress testing, the f:vx_purge_nattr:1 assert fails.
3296988 (2977035) A debug assert issue was encountered in the vx_dircompact() function while running an internal noise test in the Cluster File System (CFS) environment.
3299685 (2999493) The file system check validation fails with an error message after a successful full fsck operation during internal testing.
3306410 (2495673) Mismatch of concurrent I/O-related data in an inode is observed during communication between the nodes in a cluster.
3310758 (3310755) Internal testing hits a debug assert "vx_rcq_badrecord:9:corruptfs".
3321730 (3214328) A mismatch is observed between the states for the Global Lock Manager (GLM) grant level and the Global Lock Manager (GLM) data in a Cluster File System (CFS) inode.
3323912 (3259634) A Cluster File System (CFS) with blocks larger than 4GB may become corrupt.
3338024 (3297840) A metadata corruption is found during the file removal process.
3338026 (3331419) The system panics because of a kernel stack overflow.
3338030 (3335272) The mkfs (make file system) command dumps core when the log size provided is not aligned.
3338063 (3332902) While shutting down, the system running the fsclustadm(1M) command panics.
3338758 (3089314) The Workload Partitions (WPARs) become unresponsive while running the pwdck command.
3338759 (3089211) When adding or removing CPUs, Veritas File System (VxFS) may crash with a Data Storage Interrupt (DSI) stack trace.
3338762 (3096834) Intermittent vx_disable messages are displayed in the system log.
3338764 (3131360) The vxfsconvert(1M) command fails with error messages indicating that unallocated inodes cannot be found.
3338770 (3152313) With the Partitioned Directories feature enabled, removing a file may panic the system.
3338776 (3224101) After you enable the optimization for updating the i_size across the cluster nodes lazily, the system panics.
3338779 (3252983) On a high-end system with 48 or more CPUs, some file system operations may hang.
3338780 (3253210) The file system hangs when it reaches the space limitation.
3338785 (3265538) The system panics because Veritas File System (VxFS) calls the lock_done kernel service at intpri=A instead of intpri=B.
3338787 (3261462) A file system with size greater than 16TB gets corrupted with vx_mapbad messages in the system log.
3338790 (3233284) The FSCK binary hangs while checking the Reference Count Table (RCT).
3339230 (3308673) A fragmented file system is disabled when the delayed allocation feature is enabled.
3339884 (1949445) The system is unresponsive when files are created in a large directory.
3340029 (3298041) With the delayed allocation feature enabled on a locally mounted file system, observable performance degradation might be experienced when writing to a file and extending the file size.
3340144 (3237204) The vxfsstat(1M) statistical reporting tool displays inaccurate memory usage information.
3351946 (3194635) The internal stress test on a locally mounted file system exited with an error message.
3351947 (3164418) An internal stress test on a locally mounted VxFS file system results in data corruption in a no-space-on-device scenario while doing a split on a Zero Fill-On-Demand (ZFOD) extent.
3351977 (3340286) After a file system is resized, the tunable setting of dalloc_enable gets reset to a default value.
3359278 (3364290) The kernel may panic in Veritas File System (VxFS) when it is internally working on a reference count queue (RCQ) record.
3364285 (3364282) The fsck(1M) command fails to correct the inode list file.
3364289 (3364287) A debug assert may be hit in the vx_real_unshare() function in the cluster environment.
3364302 (3364301) An assert failure occurs because of improper handling of the inode lock while truncating a reorg inode.
3364307 (3364306) A stack overflow is seen in the extent allocation code path.
3364317 (3364312) The fsadm(1M) command is unresponsive while processing the VX_FSADM_REORGLK_MSG message.
3364333 (3312897) The system can hang when the Cluster File System (CFS) primary node is disabled.
3364335 (3331109) The full fsck does not repair the corrupted reference count queue (RCQ) record.
3364338 (3331045) A kernel oops occurs in the map unlock code while referring to a freed mlink, due to a race with the iodone routine for delayed writes.
3364349 (3359200) An internal test on the Veritas File System (VxFS) fsdedup(1M) feature in a cluster file system environment results in a hang.
3370650 (2735912) The performance of tier relocation using the fsppadm(1M) enforce command degrades while migrating a large number of files.
3372909 (3274592) An internal noise test on a cluster file system is unresponsive while executing the fsadm(1M) command.
3380905 (3291635) Internal testing found the debug assert "vx_freeze_block_threads_all:7c" on locally mounted file systems while processing preambles for transactions.
3396539 (3331093) Issue with the MountAgent process for VxFS: while doing repeated switchovers on HP-UX, MountAgent got stuck.
3402484 (3394803) A panic is observed in the VxFS routine vx_upgrade7() while running the vxupgrade(1M) command.
3405172 (3436699) An assert failure occurs because of a race condition between the clone mount thread and the directory removal thread while pushing data on a clone.
3409619 (3395692) Double deletion of a pointer in the vx_do_getacl() function causes abend_trap().
3409691 (3417076) The vxtunefs(1M) command fails to set tunables.
3417311 (3417321) The vxtunefs(1M) tunable man page gives an incorrect
3418754 (3418997) The vxtunefs(1M) command accepts garbage values for the 'write_throttle' tunable.
3430687 (3444775) Internal noise testing on a cluster file system results in a kernel panic in the vx_fsadm_query() function with an error message.
3436393 (3462694) The fsdedupadm(1M) command fails with error code 9 when it tries to mount checkpoints on a cluster.
3448567 (3448627) The vxtunefs(1M) command accepts garbage values for the discovered_direct_iosz tunable.
3448758 (3449150) The vxtunefs(1M) command accepts garbage values for certain tunables.
3448818 (3449152) Failed to set the 'thin_friendly_alloc' tunable in the case of a cluster file system (CFS).
3451355 (3463717) Information that Cluster File System (CFS) does not support the 'thin_friendly_alloc' tunable is missing from the vxtunefs(1M) command man page.
PATCH ID:6.0.300.000
2928921 (2843635) Internal testing is having some failures.
2933290 (2756779) The code is modified to improve the fix for the read and write performance concerns on Cluster File System (CFS) when it runs applications that rely on POSIX file-record locking using fcntl.
2933291 (2806466) A reclaim operation on a file system that is mounted on a Logical Volume Manager (LVM) volume may panic the system.
2933292 (2895743) Accessing named attributes for some files stored in CFS seems to be slow.
2933294 (2750860) Performance of the write operation with a small request size may degrade on a large file system.
2933296 (2923105) Removal of the VxFS module from the kernel takes a long time.
2933309 (2858683) Reserve extent attributes changed after vxrestore for files greater than 8192 bytes.
2933313 (2841059) A full fsck fails to clear the corruption in attribute inode 15.
2933315 (2887423) Severe lock contention in vx_sched.
2933319 (2848948) VxFS buffer cache consumption increased significantly after running over 248 days.
2933321 (2878164) VxFS consumes too much pinned heap.
2933322 (2857568) Performance issues are seen during backup operations reading large files sequentially.
2933324 (2874054) The vxconvert(1M) command fails to convert Logical Volume Manager (LVM) disk groups to VxVM disk groups with V-3-21784.
2933751 (2916691) The customer experiences hangs when doing dedup operations.
2933822 (2624262) FileStore:Dedup: fsdedup.bin hit an oops at vx_bc_do_brelse.
2937367 (2923867) An internal test hits the assert "f:xted_set_msg_pri1:1".
2975645 (2978326) On a cluster-mounted file system, changing the value of the dalloc_enable/dalloc_limit tunables fails.
2976664 (2906018) The vx_iread errors are displayed after a successful log replay and mount of the file system.
2978227 (2857751) Internal testing hits the assert "f:vx_cbdnlc_enter:1a".
2983739 (2857731) Internal testing hits the assert "f:vx_mapdeinit:1".
2984589 (2977697) A core dump is generated while you are removing the clone.
2987373 (2881211) File ACLs are not preserved properly in checkpoints if the file has a hardlink.
PATCH ID:6.0.100.200
2912412 (2857629) File system corruption can occur, requiring a full fsck of the system.
2912435 (2885592) vxdump to the vxcompress file system is aborted.
2923805 (2590918) Delay in freeing unshared extents upon primary switchover.

SUMMARY OF KNOWN ISSUES:
-----------------------------------------
3338052 (3064877) VxFS performance issue: reading a block with a big block size into memory is very slow on AIX.
3448593 (3449606) Setting a zero value for the tunables 'max_buf_data_size' and 'discovered_direct_iosz' changes their values to the default.

KNOWN ISSUES :
--------------
* INCIDENT NO::3338052 TRACKING ID ::3064877

SYMPTOM:: If an application performs sequential unaligned large reads, it may encounter a performance issue on AIX.

WORKAROUND:: None

* INCIDENT NO::3448593 TRACKING ID ::3449606

SYMPTOM:: Setting a zero value for the tunables 'max_buf_data_size' and 'discovered_direct_iosz' changes their values to the default.

WORKAROUND:: Run the vxtunefs command again to set the tunable value to the previous one.
# vxtunefs -o discovered_direct_iosz=64k

FIXED INCIDENTS:
----------------
PATCH ID:6.0.500.000

* INCIDENT NO:2705336 TRACKING ID:2059611

SYMPTOM: The system panics due to a NULL pointer dereference while flushing the bitmaps to the disk, and the following stack trace is displayed:
...
vx_unlockmap+0x10c
vx_tflush_map+0x51c
vx_fsq_flush+0x504
vx_fsflush_fsq+0x190
vx_workitem_process+0x1c
vx_worklist_process+0x2b0
vx_worklist_thread+0x78

DESCRIPTION: The vx_unlockmap() function unlocks a map structure of the file system. If the map is being used, the hold count is incremented. The vx_unlockmap() function attempts to check whether this is an empty mlink doubly linked list. The asynchronous vx_mapiodone routine can change the link at random even though the hold count is zero.

RESOLUTION: The code is modified to change the evaluation rule inside the vx_unlockmap() function, so that further evaluation can be skipped when the map hold count is zero.

* INCIDENT NO:2933290 TRACKING ID:2756779

SYMPTOM: Write and read performance concerns on Cluster File System (CFS) when running applications that rely on POSIX file-record locking (fcntl).

DESCRIPTION: The usage of fcntl on CFS leads to high messaging traffic across nodes, thereby reducing the performance of readers and writers.

RESOLUTION: The code is modified to cache the ranges that are being file-record locked on the node. This is tried whenever possible to avoid broadcasting of messages across the nodes in the cluster.

* INCIDENT NO:2933301 TRACKING ID:2908391

SYMPTOM: Checkpoint removal takes too long if Veritas File System (VxFS) has a large number of files. The cfsumount(1M) command could hang if removal of multiple checkpoints is in progress for such a file system.

DESCRIPTION: When removing a checkpoint, VxFS traverses every inode to determine whether a pull/push is needed for the upstream/downstream checkpoint in its chain. This is time consuming if the file system has a large number of files, and results in slow checkpoint removal.
The command "cfsumount -c fsname" forces the umount operation on a VxFS file system if there is any asynchronous checkpoint removal job in progress, by checking whether the value of the vxfs stat "vxi_clonerm_jobs" is larger than zero. However, the stat does not count the jobs in the checkpoint removal working queue, and the jobs are entered into the working queue. The "force umount" operation does not happen even if there are pending checkpoint removal jobs, because of the incorrect (zero) value of "vxi_clonerm_jobs".

RESOLUTION: For the slow checkpoint removal issue: the code is modified to create multiple threads that work on different Inode Allocation Units (IAUs) in parallel, to reduce the inode push work by sorting the checkpoint removal jobs by creation time in ascending order, and to enlarge the checkpoint push size. For the cfsumount(1M) command hang issue: the code is modified to add the count of jobs in the working queue to the "vxi_clonerm_jobs" stat.

* INCIDENT NO:2947029 TRACKING ID:2926684

SYMPTOM: On systems with a heavy transaction workload, such as creation and deletion of files, the system may panic with the following stack trace:
...
vxfs:vx_traninit+0x10
vxfs:vx_dircreate_tran+0x420
vxfs:vx_pd_create+0x980
vxfs:vx_create1_pd+0x1d0
vxfs:vx_do_create+0x80
vxfs:vx_create1+0xd4
vxfs:vx_create+0x158
...

DESCRIPTION: In the case of a delayed log, a transaction commit can complete before the log write completes. The memory for the transaction is freed before the transaction is logged, which corrupts the transaction freelist and causes the system to panic.

RESOLUTION: The code is modified such that the transaction is not freed until the log is written.

* INCIDENT NO:2959557 TRACKING ID:2834192

SYMPTOM: The mount operation fails after the full fsck(1M) utility is run and displays the following error message on the console: 'UX:vxfs mount.vxfs: ERROR: V-3-26881 : Cannot be mounted until it has been cleaned by fsck. Please run "fsck -t vxfs -y MNTPNT" before mounting'.
DESCRIPTION: When a CFS is mounted, VxFS validates the in-core per-node-cut (PNCUT) entries against their counterparts on the disk. This validation failure makes the mount unsuccessful after the full fsck. The full fsck, in its fourth pass, checks the free inode/extent maps, merges the dirty in-core PNCUT files, and validates them against the corresponding on-disk values. However, if any PNCUT entry is corrupted, the fsck(1M) utility simply ignores it. This results in the mount failure.

RESOLUTION: The code is modified to enhance the fsck(1M) utility to handle any delinquent PNCUT entries and rebuild them as required.

* INCIDENT NO:2978234 TRACKING ID:2972183

SYMPTOM: The "fsppadm enforce" command takes longer than usual to force update the secondary nodes than it takes on the primary nodes.

DESCRIPTION: The ilist is force updated on the secondary node. As a result, performance on the secondary becomes low.

RESOLUTION: The ilist file on secondary nodes is force updated only on an error condition.

* INCIDENT NO:2978236 TRACKING ID:2977828

SYMPTOM: The file system is marked bad after an inode table overflow error, with the following error messages:
kernel: vxfs: msgcnt 7911 mesg 014: V-2-14: vx_iget - inode table overflow
kernel: vxfs: msgcnt 7912 mesg 063: V-2-63: vx_fset_markbad - file system fileset (index ) marked bad
kernel: V-2-96: vx_setfsflags - file system fullfsck flag set - vx_fset_markbad

DESCRIPTION: To remove a checkpoint, the system truncates every file that is consumed by the checkpoint. When the number of files is too large, the inode cache may become full, leading to an ENFILE error (inode table full). The ENFILE error inappropriately sets the full fsck flag on the file system.

RESOLUTION: The code is modified to convert the ENFILE error to the ENOSPC error to fix the issue.
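The ENFILE-to-ENOSPC conversion described in the resolution above can be sketched as a small error-remapping helper. This is illustrative only, not VxFS source; the function name is invented for the sketch:

```c
#include <errno.h>

/* Hypothetical sketch of the fix described above: the checkpoint-removal
 * error path remaps ENFILE (inode cache full, a transient resource limit)
 * to ENOSPC, so the caller treats it as a recoverable out-of-space
 * condition instead of marking the file system for a full fsck. */
static int remap_clone_remove_error(int err)
{
    if (err == ENFILE)
        return ENOSPC;  /* transient inode-table-full, not corruption */
    return err;         /* all other errors pass through unchanged */
}
```

The point of the remap is that ENOSPC is already handled as a benign, retryable condition by the callers, whereas ENFILE was being treated as structural damage.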
* INCIDENT NO:2982161 TRACKING ID:2982157

SYMPTOM: During internal testing, the "f:vx_trancommit:4" debug assert was hit when the available transaction space is less than the required space.

DESCRIPTION: The "f:vx_trancommit:4" assert is hit when the available transaction space is less than required. During file truncate operations, when VxFS calculates the transaction space, it doesn't consider the transaction space required in case the file has shared extents. As a result, the "f:vx_trancommit:4" debug assert is hit.

RESOLUTION: The code is modified to take into account the extra transaction buffer space required when the file being truncated has shared extents.

* INCIDENT NO:2983249 TRACKING ID:2983248

SYMPTOM: The vxrepquota(1M) command dumps core on systems with more than 50 file systems mounted with the quota option. The stack trace is as follows:
/opt/VRTS/bin/vxrepquota
strlen+0x50()
sprintf+0x40()
..
..
main+0x6d4()

DESCRIPTION: In vxrepquota(1M) and vxquotaon(1M), VxFS allocates an array of 50 pointers for the vfstab entries, so it can hold a maximum of 50 entries. If there are more than 50 VxFS file system entries in the /etc/vfstab file, the array overflows.

RESOLUTION: The code is modified to extend the size of the listbuf array to 1024, so that an overflow occurs only if there are more than 1024 VxFS file system entries in the /etc/vfstab file.

* INCIDENT NO:2999566 TRACKING ID:2999560

SYMPTOM: While trying to clear the 'metadataok' flag on a volume of the volume set, the 'fsvoladm'(1M) command gives an error.

DESCRIPTION: The 'fsvoladm'(1M) command sets and clears the 'dataonly' and 'metadataok' flags on a volume in a vset on which VxFS is mounted. The 'fsvoladm'(1M) command fails while clearing the 'metadataok' flag and reports an EINVAL (invalid argument) error for certain volumes. This failure occurs because while clearing the flag, VxFS reinitializes the reorg structure for some volumes.
During re-initialization, VxFS frees the existing FS structures. However, it still refers to the stale device structure, resulting in an EINVAL error.

RESOLUTION: The code is modified to let the in-core device structure point to the updated and correct data.

* INCIDENT NO:3027250 TRACKING ID:3031901

SYMPTOM: The vxtunefs(1M) command accepts garbage values for the 'max_buf_data_size' tunable.

DESCRIPTION: When a garbage value is specified for the 'max_buf_data_size' tunable using vxtunefs(1M), the tunable accepts the value and reports a successful update, but the value does not actually get reflected in the system. The error is not identified when parsing the command-line value of the 'max_buf_data_size' tunable; hence the garbage value for this tunable is also accepted.

RESOLUTION: The code is modified to handle the error returned from parsing the command-line value of the 'max_buf_data_size' tunable.

* INCIDENT NO:3056103 TRACKING ID:3197901

SYMPTOM: fset_get fails for the mentioned configuration.

DESCRIPTION: There is a duplicate symbol, fs_bmap, in the VxFS libvxfspriv.a and vxfspriv.so libraries.

RESOLUTION: The duplicate symbol fs_bmap in the VxFS libvxfspriv.a and vxfspriv.so libraries has been fixed by renaming it to fs_bmap_priv in libvxfspriv.a.

* INCIDENT NO:3059000 TRACKING ID:3046983

SYMPTOM: There is an invalid CFS node number () in ".__fsppadm_fclextract". This causes the Dynamic Storage Tiering (DST) policy enforcement to fail.

DESCRIPTION: DST policy enforcement sometimes depends on the extraction of the File Change Log (FCL). When the FCL change log is processed, the FCL records are read from the change log into a buffer. If the buffer is not big enough to hold the records, some rollback is done and the needed buffer size is passed out. However, the rollback is not complete, which results in the problem.

RESOLUTION: The code is modified to also roll back the content of "fh_bp1->fb_addr" and "fh_bp2->fb_addr".
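The incomplete-rollback pattern behind the FCL incident above is a general one: a reader copies records into a caller buffer and, on a buffer-too-small condition, must undo every piece of cursor state before reporting the required size. A minimal illustrative sketch (not VxFS source; all names are invented):

```c
#include <stddef.h>
#include <string.h>

/* Records are copied from a backing log into a caller buffer. When the
 * buffer cannot hold the next record, the cursor must be rolled back
 * completely and the needed size passed out; rolling back only part of
 * the state (the bug described above) makes later reads resume from the
 * wrong offset. */
struct log_cursor {
    const char *log;     /* backing record stream                      */
    size_t      log_len; /* bytes available in the stream              */
    size_t      offset;  /* read position, rolled back on failure      */
};

/* Returns bytes copied, or 0 with *needed set when buf is too small. */
static size_t read_records(struct log_cursor *c, size_t rec_size,
                           char *buf, size_t buf_len, size_t *needed)
{
    size_t start = c->offset, copied = 0;
    while (c->offset + rec_size <= c->log_len) {
        if (copied + rec_size > buf_len) {
            c->offset = start;            /* complete rollback          */
            *needed = c->log_len - start; /* tell caller what to allocate */
            return 0;
        }
        memcpy(buf + copied, c->log + c->offset, rec_size);
        c->offset += rec_size;
        copied += rec_size;
    }
    return copied;
}
```

The fix in the incident amounts to making the rollback cover the two remaining pieces of state ("fh_bp1->fb_addr" and "fh_bp2->fb_addr"), so that the retry with a larger buffer starts from a consistent position.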
* INCIDENT NO:3108176 TRACKING ID:2667658

SYMPTOM: An attempt to perform an fscdsconv endian conversion from the SPARC (big-endian) byte order to the x86 (little-endian) byte order fails because of a macro overflow.

DESCRIPTION: Using the fscdsconv(1M) command to perform endian conversion from the SPARC big-endian byte order (any SPARC architecture machine) to the x86 little-endian byte order (any x86 architecture machine) fails. The write operation for the recovery file results in an overflow of the control data offset (a macro hard-coded to 500MB).

RESOLUTION: The code is modified to estimate the control-data offset explicitly and dynamically while creating and writing the recovery file.

* INCIDENT NO:3131798 TRACKING ID:2839871

SYMPTOM: On a system with DELICACHE enabled, several file system operations may hang with the following stack trace:
vx_delicache_inactive
vx_delicache_inactive_wp
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init

DESCRIPTION: The DELICACHE lock is used to synchronize access to the DELICACHE list, and it is held only while updating this list. However, in some cases it is held longer and is released only after the issued I/O is completed, causing other threads to hang.

RESOLUTION: The code is modified to release the spinlock before issuing a blocking I/O request.

* INCIDENT NO:3131826 TRACKING ID:2966277

SYMPTOM: Systems with high file-system activity like read/write/open/lookup may panic with the following stack trace due to a rare race condition:
spinlock+0x21 ( ) -> vx_rwsleep_unlock()
vx_ipunlock+0x40()
vx_inactive_remove+0x530()
vx_inactive_tran+0x450()
vx_local_inactive_list+0x30()
vx_inactive_list+0x420() -> vx_workitem_process() -> vx_worklist_process()
vx_worklist_thread+0x2f0()
kthread_daemon_startup+0x90()

DESCRIPTION: The ILOCK is released before the IPUNLOCK is done, which causes a race condition. This results in a panic when an inode that has been set free is accessed.
RESOLUTION: The code is modified so that the ILOCK is used to protect the inode's memory from being set free while the memory is being accessed.

* INCIDENT NO:3131905 TRACKING ID:3022673

SYMPTOM: Veritas File System (VxFS) is unresponsive when it changes the memory using Dynamic Logical Partition (DLPAR), with the following stack:
vx_delay
vx_get_ownership_try
vx_ihlock
vx_cfs_iread
vx_iget
vx_ap_read_clone_default
vx_load_fsap
vx_refresh_fsap@AF19_10
vx_recv_cwfa_loadfs
vx_recv_cwfa
vx_msg_recvreq
vx_msg_process_thread
vx_thread_base

DESCRIPTION: DLPAR operations require VxFS to re-tune a variety of data structures according to the new CPU/memory size. Prior to re-initialization, if a buffer reinit is needed, VxFS moves all its in-use inodes to a temporary list, and then moves them back after the DLPAR is complete. However, the buffer reinit check is missed, and the move-out is done regardless of whether a buffer reinit will be performed. As a result, those inodes are missed in some circumstances after the DLPAR operation. Accesses to them point to wrong inodes and loop infinitely trying to obtain the correct inode information.

RESOLUTION: The code is modified to move the inodes to the temporary inode list only after the buffers are re-initialized.

* INCIDENT NO:3239226 TRACKING ID:3023855

SYMPTOM: An enhancement is made to enable the 'noreuserd' option of the mount(1M) command on CFS.

DESCRIPTION: During a backup, the files are read sequentially and the entire file is brought into memory (into the page cache). This displaces the other file data held in the page cache. If the files are removed after the backup, all the file pages are set free during the removal. The Virtual Machine Manager (VMM) chunking mechanism needs to 'chunk' down the page list during the removal. If the files are large, and the entire file is held in memory, the large chunking operation can impose small I/O delays on the system.
As a result, a need was identified to avoid performing a large amount of VMM chunking when files are removed immediately after their backup, along with a requirement for an option to remove pages during the backup, rather than during the file removal.

RESOLUTION: An enhancement is made to introduce a new mount option called "noreuserd". This mount option initially works for locally mounted file systems (not for CFS). When mounted with this option, VxFS frees the pages behind the "read pointer" when sequential buffered read I/O is performed.

* INCIDENT NO:3248029 TRACKING ID:2439261

SYMPTOM: When the vx_fiostats_tunable is changed from zero to non-zero, the system panics with the following stack trace:
vx_fiostats_do_update
vx_fiostats_update
vx_read1
vx_rdwr
vno_rw
rwuio
pread

DESCRIPTION: When vx_fiostats_tunable is changed from zero to non-zero, all the in-core inode fiostats attributes are set to NULL. When these attributes are accessed, the system panics due to a NULL pointer dereference.

RESOLUTION: The code has been modified to check that the file I/O stat attributes are present before dereferencing the pointers.

* INCIDENT NO:3248042 TRACKING ID:3072036

SYMPTOM: Reads from the secondary node in CFS can sometimes fail with ENXIO (No such device or address).

DESCRIPTION: The in-core attribute ilist on the secondary node is out of sync with that of the primary.

RESOLUTION: The code is modified such that the in-core attribute ilist on the secondary node is force updated with data from the primary node.

* INCIDENT NO:3248046 TRACKING ID:3092114

SYMPTOM: The information output by the "df -i" command can often be inaccurate for cluster-mounted file systems.

DESCRIPTION: In the Cluster File System 5.0 release, a concept of delegating metadata to nodes in the cluster was introduced. This delegation of metadata allows CFS secondary nodes to update metadata without having to ask the CFS primary to do it. This provides greater node scalability.
However, the "df -i" information is still collected by the CFS primary regardless of which node (primary or secondary) the "df -i" command is executed on.

For inodes, the granularity of each delegation is an Inode Allocation Unit (IAU); thus IAUs can be delegated to nodes in the cluster.

When using a VxFS 1Kb file system block size, each IAU represents 8192 inodes.
When using a VxFS 2Kb file system block size, each IAU represents 16384 inodes.
When using a VxFS 4Kb file system block size, each IAU represents 32768 inodes.
When using a VxFS 8Kb file system block size, each IAU represents 65536 inodes.

Each IAU contains a bitmap that determines whether each inode it represents is allocated or free; the IAU also contains a summary count of the number of inodes that are currently free in the IAU. The "df -i" information can be considered a simple sum of all the IAU summary counts.

Using a 1Kb block size, IAU-0 represents inode numbers 0 - 8191.
Using a 1Kb block size, IAU-1 represents inode numbers 8192 - 16383.
Using a 1Kb block size, IAU-2 represents inode numbers 16384 - 24575, etc.

The inaccurate "df -i" count occurs because the CFS primary has no visibility of the current IAU summary information for IAUs that are delegated to secondary nodes. Therefore the number of allocated inodes within an IAU that is currently delegated to a CFS secondary node is not known to the CFS primary. As a result, the "df -i" count information for the currently delegated IAUs is collected from the primary's copy of the IAU summaries. Since the primary's copy of the IAU is stale, the "df -i" count is only accurate when no IAUs are currently delegated to CFS secondary nodes. In other words, the IAUs currently delegated to CFS secondary nodes cause the "df -i" count to be inaccurate.

Once an IAU is delegated to a node, it can "timeout" after 3 minutes of inactivity. However, not all IAU delegations will time out.
One IAU will always remain delegated to each node for performance reasons. Also, an IAU whose inodes are all allocated (so no free inodes remain in the IAU) will not time out either. The issue can be best summarized as: the more IAUs that remain delegated to CFS secondary nodes, the greater the inaccuracy of the "df -i" count.

RESOLUTION: Allow the delegations for IAUs whose inodes are all allocated (so no free inodes remain in the IAU) to "timeout" after 3 minutes of inactivity.

* INCIDENT NO:3248051 TRACKING ID:3121933

SYMPTOM: The pwrite() function fails with EOPNOTSUPP when the write range is in two indirect extents.

DESCRIPTION: When the range of pwrite() falls in two indirect extents (one ZFOD extent belonging to DB2 pre-allocated files created with the setext( , VX_GROWFILE, ) ioctl, and another DATA extent belonging to an adjacent INDIR), the write fails with EOPNOTSUPP. The reason is that VxFS tries to coalesce extents that belong to different indirect address extents as part of this transaction; such a metadata change consumes more transaction resources than the VxFS transaction engine can support in the current implementation.

RESOLUTION: The code is modified to retry the transaction without coalescing the extents, as the latter is an optimization and should not fail the write.

* INCIDENT NO:3248054 TRACKING ID:3153919

SYMPTOM: The fsadm(1M) command may hang when the structural file set reorganization is in progress. The following stack trace is observed:
vx_event_wait
vx_icache_process
vx_switch_ilocks_list
vx_cfs_icache_process
vx_switch_ilocks
vx_fs_reinit
vx_reorg_dostruct
vx_extmap_reorg
vx_struct_reorg
vx_aioctl_full
vx_aioctl_common
vx_aioctl
vx_ioctl
vx_compat_ioctl
compat_sys_ioctl

DESCRIPTION: During the structural file set reorganization, due to a race condition, the VX_CFS_IOWN_TRANSIT flag is set on the inode. At the final stage of the structural file set reorganization, all the inodes are re-initialized.
Since the VX_CFS_IOWN_TRANSIT flag is set improperly, the re-initialization fails to proceed. This causes the hang.
RESOLUTION: The code is modified such that the VX_CFS_IOWN_TRANSIT flag is cleared.
* INCIDENT NO:3248089 TRACKING ID:3003679
SYMPTOM: The file system hangs when fsppadm is run and a file with named stream attributes (nattr) is removed at the same time. The following two typical threads are involved: T1: COMMAND: "fsppadm" schedule at vxg_svar_sleep_unlock vxg_grant_sleep vxg_cmn_lock vxg_api_lock vx_glm_lock vx_ihlock vx_cfs_iread vx_iget vx_traverse_tree vx_dir_lookup vx_rev_namelookup vx_aioctl_common vx_ioctl vx_compat_ioctl compat_sys_ioctl T2: COMMAND: "vx_worklist_thr" schedule vxg_svar_sleep_unlock vxg_grant_sleep vxg_cmn_lock vxg_api_lock vx_glm_lock vx_genglm_lock vx_dirlock vx_do_remove vx_purge_nattr vx_nattr_dirremove vx_inactive_tran vx_cfs_inactive_list vx_inactive_list vx_workitem_process vx_worklist_process vx_worklist_thread vx_kthread_init kernel_thread
DESCRIPTION: The file system hangs due to a deadlock between the threads. T1, initiated by fsppadm, calls vx_traverse_tree to obtain the path name for a given inode number. T2 removes the inode as well as its affiliated nattr inodes. The reverse name lookup (T1) holds the global dirlock in vx_dir_lookup during the lookup process. It traverses the entire path from bottom to top to resolve the inode number inversely in vx_traverse_tree. During the lookup, VxFS needs to hold the hlock of each inode to read it, and drops it after reading. The file removal (T2) is processed via vx_inactive_tran, which takes the "hlock" of the inode being removed. After that, it removes all its named attribute inodes in vx_do_remove, where sometimes the global dirlock is needed. Eventually, each thread waits for the lock that is held by the other thread, and this results in the deadlock.
RESOLUTION: The code is modified so that the dirlock is not acquired during the reverse name lookup.
* INCIDENT NO:3248090 TRACKING ID:2963763
SYMPTOM: When the thin_friendly_alloc and delicache_enable parameters are enabled, Veritas File System (VxFS) may hit a deadlock. The thread involved in the deadlock can have the following stack trace: vx_rwsleep_lock() vx_tflush_inode() vx_fsq_flush() vx_tranflush() vx_traninit() vx_remove_tran() vx_pd_remove() vx_remove1_pd() vx_do_remove() vx_remove1() vx_remove_vp() vx_remove() vfs_unlink() do_unlinkat The threads waiting in vx_traninit() for transaction space display the following stack trace: vx_delay2() vx_traninit() vx_idelxwri_done() vx_idelxwri_flush() vx_common_inactive_tran() vx_inactive_tran() vx_local_inactive_list() vx_inactive_list+0x530() vx_worklist_process() vx_worklist_thread()
DESCRIPTION: In the extent allocation code paths, VxFS sets the IEXTALLOC flag on the inode without taking the ILOCK. Overlapping transactions picking up this same inode off the delicache list cause the transaction-done code paths to miss the IUNLOCK call.
RESOLUTION: The code is modified to change the corresponding code paths to set the IEXTALLOC flag under proper protection.
* INCIDENT NO:3248094 TRACKING ID:3192985
SYMPTOM: Checkpoints quota usage on CFS can be negative. An example is as follows: Filesystem hardlimit softlimit usage action_flag /sofs1 51200 51200 18446744073709490176 << negative
DESCRIPTION: In CFS, to manage the intent logs and the other extra objects required for CFS, a holding object referred to as a per-node-object-location table (PNOLT) is created. In CFS, the quota usage is calculated by reading the per-node cut (current usage table) files (members of the PNOLT) and summing up the quota usage for each clone chain. However, when the quotaoff and quotaon operations are fired on a CFS checkpoint, the usage shows "0" after these two operations are executed. This happens because the quota usage calculation is skipped.
Subsequently, if a delete operation is performed, the usage becomes negative since the blocks allocated for the deleted file are subtracted from zero.
RESOLUTION: The code is modified such that when the quotaon operation is performed, the quota usage calculation is not skipped.
* INCIDENT NO:3248096 TRACKING ID:3214816
SYMPTOM: When you create and delete the inodes of a user frequently with the DELICACHE feature enabled, the user quota file becomes corrupt.
DESCRIPTION: The inode DELICACHE feature causes this issue. This feature optimizes the updates on the inode map during the file creation and deletion operations. It is enabled by default. You can disable this feature with the vxtunefs(1M) command. When DELICACHE is enabled and the quota is set for Veritas File System (VxFS), VxFS updates the quota for the inodes before the inodes are on the DELICACHE list and again after they are on the inactive list during the removal process. As a result, VxFS decrements the current number of user files twice. This causes the quota file corruption.
RESOLUTION: The code is modified to flag the inodes moved to the inactive list from the DELICACHE list. This flag prevents the quota from being decremented again during the removal process.
* INCIDENT NO:3248099 TRACKING ID:3189562
SYMPTOM: Oracle daemons hang in the vx_growfile() kernel function. A stack trace similar to the following may be observed: vx_growfile+0004D4 () vx_doreserve+000118 () vx_tran_extset+0005DC () vx_extset_msg+0006E8 () vx_cfs_extset+000040 () vx_extset+0002D4 () vx_setext+000190 () vx_uioctl+0004AC () vx_ioctl+0000D0 () vx_ioctl_skey+00004C () vnop_ioctl+000050 (??, ??, ??, ??, ??, ??) kernel_add_gate_cstack+000030 () vx_vop_ioctl+00001C () vx_odm_resize@AF15_6+00015C () vx_odm_resize+000030 () odm_vx_resize+000040 () odm_resize+0000E8 () vxodmioctl+00018C () hkey_legacy_gate+00004C () vnop_ioctl+000050 (??, ??, ??, ??, ??, ??) vno_ioctl+000178 (??, ??, ??, ??, ??)
DESCRIPTION: The vx_growfile() kernel function may run into a loop on a highly fragmented file system, which causes multiple processes to hang. The vx_growfile() routine is invoked through the setext(1) command or its Application Programming Interface (API). When the vx_growfile() function requires more extents than the typed extent buffer can spare, a VX_EBMAPLOCK error may occur. To handle the error, VxFS cancels the transaction and repeats the same operation again, which creates the loop.
RESOLUTION: The code is modified to make VxFS commit the available extents to proceed with the growfile transaction, and retry until the transaction is completed.
* INCIDENT NO:3284764 TRACKING ID:3042485
SYMPTOM: During internal stress testing, the f:vx_purge_nattr:1 assert fails.
DESCRIPTION: In case of corruption, the file-system check utility is run, and the inodes to be checked or fixed are picked up serially. However, in some cases the order in which they are processed changes, which causes inconsistent metadata, resulting in the assert failure.
RESOLUTION: The code is modified to handle named attribute inodes in an earlier pass during the full fsck operation.
* INCIDENT NO:3296988 TRACKING ID:2977035
SYMPTOM: While running an internal noise test in a Cluster File System (CFS) environment, a debug assert issue was observed in the vx_dircompact() function.
DESCRIPTION: Compacting of directory blocks is avoided if the inode has 'extop' (extended operation) flags set, such as deferred inode removal and pass-through truncation. The issue occurs when the inode has extended pass-through truncation set and is considered for compacting.
RESOLUTION: The code is modified to avoid compacting the directory blocks of the inode if it has an extended operation of pass-through truncation set.
* INCIDENT NO:3299685 TRACKING ID:2999493
SYMPTOM: During internal testing, the file system check validation fails after a successful full fsck operation and displays the following error message: run_fsck : First full fsck pass failed, exiting
DESCRIPTION: Even after a successful full fsck completion, the fsck validation fails due to incorrect entries in a structural file (IFRCT) which maintains the reference count of shared extents. While processing information for indirect extents, the modified data does not get flushed to the disk because the buffer is not marked dirty after its contents are modified.
RESOLUTION: The code is modified to mark the buffer dirty when its contents are modified.
* INCIDENT NO:3306410 TRACKING ID:2495673
SYMPTOM: During communication between the nodes in a cluster, the incore inode gets marked 'bad' and an internal test assertion fails.
DESCRIPTION: In a Cluster File System (CFS) environment, when two nodes communicate for a grant on an inode, some data is also piggybacked to the initiating node. If there is any discrepancy in the data that is piggybacked between these two nodes within the cluster, the incore inode gets marked 'bad'. During communication, the file system gets disabled, causing a stale concurrent I/O data transfer to the initiating node, resulting in a mismatch.
RESOLUTION: The code is modified such that if the file system gets disabled, it invalidates its concurrent I/O count state from other nodes and does not delegate false information when asked for the concurrent I/O count by other nodes.
* INCIDENT NO:3310758 TRACKING ID:3310755
SYMPTOM: When the system processes an indirect extent, if it finds that the first record is a Zero Fill-On-Demand (ZFOD) extent (or the first n records are ZFOD records), then it hits the assert.
DESCRIPTION: In the case of indirect extents, the reference count mechanism (shared block count) for files having shared ZFOD extents does not behave correctly.
RESOLUTION: The code for the reference count queue (RCQ) handling of shared indirect ZFOD extents is modified, and the fsck(1M) issues with a snapshot of a file when there are ZFOD extents have been fixed.
* INCIDENT NO:3321730 TRACKING ID:3214328
SYMPTOM: A mismatch is observed between the states for the Global Lock Manager (GLM) grant level and the Global Lock Manager (GLM) data in a Cluster File System (CFS) inode.
DESCRIPTION: When a file system is disabled during some error situation, and if any thread starts its execution before the file system is disabled, then the execution is completed in spite of the file system being disabled in between. The Global Lock Manager (GLM) state of an inode changes without updating other flags like inode->i_cflags, which causes a mismatch between the states.
RESOLUTION: The code is modified to skip updating the Global Lock Manager (GLM) state when a specific flag is set in inode->i_cflags and also when the file system is disabled.
* INCIDENT NO:3323912 TRACKING ID:3259634
SYMPTOM: In CFS, each node that mounts the cluster file system has its own intent log in the file system. A CFS with more than 4,294,967,296 file system blocks can zero out an incorrect location resulting from an incorrect typecast. For example, such a CFS can incorrectly zero out 65536 file system blocks at the block offset of 1,537,474,560 (file system blocks) with an 8-KB file system block size and an intent log with the size of 65536 file system blocks. This issue can only occur if an intent log is located above an offset of 4,294,967,296 file system blocks. This situation can occur when you add a new node to the cluster and mount an additional CFS secondary for the first time, which needs to create and zero a new intent log.
This situation can also happen if you resize a file system or intent log and clear an intent log. The problem occurs only with the following file system size and FS block size combinations:
1KB block size and FS size > 4TB
2KB block size and FS size > 8TB
4KB block size and FS size > 16TB
8KB block size and FS size > 32TB
For example, the message log can contain the following messages: The full fsck flag is set on a file system with the following type of messages:
2013 Apr 17 14:52:22 sfsys kernel: vxfs: msgcnt 5 mesg 096: V-2-96: vx_setfsflags - /dev/vx/dsk/sfsdg/vol1 file system fullfsck flag set - vx_ierror
2013 Apr 17 14:52:22 sfsys kernel: vxfs: msgcnt 6 mesg 017: V-2-17: vx_attr_iget - /dev/vx/dsk/sfsdg/vol1 file system inode 13675215 marked bad incore
2013 Jul 17 07:41:22 sfsys kernel: vxfs: msgcnt 47 mesg 096: V-2-96: vx_setfsflags - /dev/vx/dsk/sfsdg/vol1 file system fullfsck flag set - vx_ierror
2013 Jul 17 07:41:22 sfsys kernel: vxfs: msgcnt 48 mesg 017: V-2-17: vx_dirbread - /dev/vx/dsk/sfsdg/vol1 file system inode 55010476 marked bad incore
DESCRIPTION: In CFS, each node that mounts the cluster file system has its own intent log in the file system. When an additional node mounts the file system as a CFS secondary, the CFS creates an intent log. Note that intent logs are never removed; they are reused. When you clear an intent log, Veritas File System (VxFS) passes an incorrect block number to the log clearing routine, which zeros out an incorrect location. The incorrect location might point to the file data or file system metadata, or it might be part of the file system's available free space. This is silent corruption. If the file system metadata corrupts, VxFS can detect the corruption when it subsequently accesses the corrupt metadata and marks the file system for full fsck.
RESOLUTION: The code is modified so that VxFS passes the correct block number to the log clearing routine.
* INCIDENT NO:3338024 TRACKING ID:3297840
SYMPTOM: A metadata corruption is found during the file removal process, with the inode block count becoming negative.
DESCRIPTION: When the user removes or truncates a file having shared indirect blocks, there can be an instance where the block count is updated to reflect the removal of the shared indirect blocks even though the blocks are not yet removed from the file. The next iteration of the loop updates the block count again while removing these blocks. This eventually leads to the block count being a negative value after all the blocks are removed from the file. The removal code expects the block count to be zero before updating the rest of the metadata.
RESOLUTION: The code is modified to update the block count and other tracking metadata in the same transaction as the blocks are removed from the file.
* INCIDENT NO:3338026 TRACKING ID:3331419
SYMPTOM: The machine panics with the following stack trace. #0 [ffff883ff8fdc110] machine_kexec at ffffffff81035c0b #1 [ffff883ff8fdc170] crash_kexec at ffffffff810c0dd2 #2 [ffff883ff8fdc240] oops_end at ffffffff81511680 #3 [ffff883ff8fdc270] no_context at ffffffff81046bfb #4 [ffff883ff8fdc2c0] __bad_area_nosemaphore at ffffffff81046e85 #5 [ffff883ff8fdc310] bad_area at ffffffff81046fae #6 [ffff883ff8fdc340] __do_page_fault at ffffffff81047760 #7 [ffff883ff8fdc460] do_page_fault at ffffffff815135ce #8 [ffff883ff8fdc490] page_fault at ffffffff81510985 [exception RIP: print_context_stack+173] RIP: ffffffff8100f4dd RSP: ffff883ff8fdc548 RFLAGS: 00010006 RAX: 00000010ffffffff RBX: ffff883ff8fdc6d0 RCX: 0000000000002755 RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046 RBP: ffff883ff8fdc5a8 R8: 000000000002072c R9: 00000000fffffffb R10: 0000000000000001 R11: 000000000000000c R12: ffff883ff8fdc648 R13: ffff883ff8fdc000 R14: ffffffff81600460 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff883ff8fdc540] print_context_stack at ffffffff8100f4d1 #10
[ffff883ff8fdc5b0] dump_trace at ffffffff8100e4a0 #11 [ffff883ff8fdc650] show_trace_log_lvl at ffffffff8100f245 #12 [ffff883ff8fdc680] show_trace at ffffffff8100f275 #13 [ffff883ff8fdc690] dump_stack at ffffffff8150d3ca #14 [ffff883ff8fdc6d0] warn_slowpath_common at ffffffff8106e2e7 #15 [ffff883ff8fdc710] warn_slowpath_null at ffffffff8106e33a #16 [ffff883ff8fdc720] hrtick_start_fair at ffffffff810575eb #17 [ffff883ff8fdc750] pick_next_task_fair at ffffffff81064a00 #18 [ffff883ff8fdc7a0] schedule at ffffffff8150d908 #19 [ffff883ff8fdc860] __cond_resched at ffffffff81064d6a #20 [ffff883ff8fdc880] _cond_resched at ffffffff8150e550 #21 [ffff883ff8fdc890] vx_nalloc_getpage_lnx at ffffffffa041afd5 [vxfs] #22 [ffff883ff8fdca80] vx_nalloc_getpage at ffffffffa03467a3 [vxfs] #23 [ffff883ff8fdcbf0] vx_do_getpage at ffffffffa034816b [vxfs] #24 [ffff883ff8fdcdd0] vx_do_read_ahead at ffffffffa03f705e [vxfs] #25 [ffff883ff8fdceb0] vx_read_ahead at ffffffffa038ed8a [vxfs] #26 [ffff883ff8fdcfc0] vx_do_getpage at ffffffffa0347732 [vxfs] #27 [ffff883ff8fdd1a0] vx_getpage1 at ffffffffa034865d [vxfs] #28 [ffff883ff8fdd2f0] vx_fault at ffffffffa03d4788 [vxfs] #29 [ffff883ff8fdd400] __do_fault at ffffffff81143194 #30 [ffff883ff8fdd490] handle_pte_fault at ffffffff81143767 #31 [ffff883ff8fdd570] handle_mm_fault at ffffffff811443fa #32 [ffff883ff8fdd5e0] __get_user_pages at ffffffff811445fa #33 [ffff883ff8fdd670] get_user_pages at ffffffff81144999 #34 [ffff883ff8fdd690] vx_dio_physio at ffffffffa041d812 [vxfs] #35 [ffff883ff8fdd800] vx_dio_rdwri at ffffffffa02ed08e [vxfs] #36 [ffff883ff8fdda20] vx_write_direct at ffffffffa044f490 [vxfs] #37 [ffff883ff8fddaf0] vx_write1 at ffffffffa04524bf [vxfs] #38 [ffff883ff8fddc30] vx_write_common_slow at ffffffffa0453e4b [vxfs] #39 [ffff883ff8fddd30] vx_write_common at ffffffffa0454ea8 [vxfs] #40 [ffff883ff8fdde00] vx_write at ffffffffa03dc3ac [vxfs] #41 [ffff883ff8fddef0] vfs_write at ffffffff81181078 #42 [ffff883ff8fddf30] sys_pwrite64 at 
ffffffff81181a32 #43 [ffff883ff8fddf80] system_call_fastpath at ffffffff8100b072
DESCRIPTION: The panic occurs because the kernel refers to a corrupted thread_info structure from the scheduler; the thread_info structure was corrupted by a stack overflow. While doing a direct I/O write, user-space pages need to be pre-faulted using the __get_user_pages() code path. This code path is very deep and can end up consuming a lot of stack space.
RESOLUTION: The kernel stack consumption in this code path is reduced by ~400-500 bytes by making various changes in the way pre-faulting is done.
* INCIDENT NO:3338030 TRACKING ID:3335272
SYMPTOM: The mkfs (make file system) command dumps core when the log size provided is not aligned. The following stack trace is displayed: (gdb) bt #0 find_space () #1 place_extents () #2 fill_fset () #3 main () (gdb)
DESCRIPTION: While creating the VxFS file system using the mkfs command, if the log size provided is not aligned properly, the placement calculations for the RCQ extents can go wrong and fail to find a place. This leads to an illegal memory access of the AU bitmap and results in a core dump.
RESOLUTION: The code is modified to place the RCQ extents in the same AU where the log extents are allocated.
* INCIDENT NO:3338063 TRACKING ID:3332902
SYMPTOM: The system running the fsclustadm(1M) command panics while shutting down. The following stack trace is logged along with the panic: machine_kexec crash_kexec oops_end page_fault [exception RIP: vx_glm_unlock] vx_cfs_frlpause_leave [vxfs] vx_cfsaioctl [vxfs] vxportalkioctl [vxportal] vfs_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath
DESCRIPTION: There exists a race condition between "fsclustadm(1M) cfsdeinit" and "fsclustadm(1M) frlpause_disable". The "fsclustadm(1M) cfsdeinit" fails after cleaning the Group Lock Manager (GLM), without downgrading the CFS state. Under the false CFS state, the "fsclustadm(1M) frlpause_disable" command enters and accesses the GLM lock, which "fsclustadm(1M) cfsdeinit" frees, resulting in a panic.
Another race exists between the code in vx_cfs_deinit() and the code in fsck: although fsck holds a reservation, this does not prevent vx_cfs_deinit() from freeing vx_cvmres_list, because there is no check of vx_cfs_keepcount.
RESOLUTION: The code is modified to add appropriate checks in "fsclustadm(1M) cfsdeinit" and "fsclustadm(1M) frlpause_disable" to avoid the race condition.
* INCIDENT NO:3338758 TRACKING ID:3089314
SYMPTOM: The WPARs are unresponsive while running the pwdck command, and the following stack trace is observed: e_block_thread() common_reclock() kernel_add_gate() vx_do_ftrunc2() vx_ftrunc2() vx_ftrunc_skey() vnop_ftrunc() ftrunc_common()
DESCRIPTION: The variable that uniquely identifies the host for a given virtual file system was not correctly updated before calling the kernel locking function, leading to the hang mentioned above.
RESOLUTION: The unique host identifier is correctly initialized before making a call to the kernel record locking function.
* INCIDENT NO:3338759 TRACKING ID:3089211
SYMPTOM: When adding or removing CPUs, Veritas File System (VxFS) can crash with a Data Storage Interrupt (DSI) with the following stack: ... vx_ilock_try+000010 () vx_iflush_list+0004B4 () vx_workitem_process+000050 () vx_worklist_process+0001D4 () vx_worklist_thread+000090 () vx_thread_base+000048 () ...
DESCRIPTION: When changing the memory or number of CPUs (Dynamic Reconfiguration (DR)), VxFS adjusts its private buffer cache and inode cache. When worker threads walking these cache lists are not synchronized with the DR thread, the cache lists can be moved underneath a worker thread, causing the latter to refer to stale data, resulting in a DSI. Here, sync initiated a worker thread which operated on the inode cache list without ensuring that no DR thread was in progress.
RESOLUTION: The code is modified to acquire the DRB lock while performing a sync operation.
* INCIDENT NO:3338762 TRACKING ID:3096834
SYMPTOM: Intermittent vx_disable messages are displayed in the system log.
DESCRIPTION: VxFS displays intermittent vx_disable messages. The file system is not corrupt and the fsck(1M) command does not indicate any problem with the file system. However, the file system gets disabled.
RESOLUTION: The code is modified to make the vx_disable message verbose with stack trace information to facilitate further debugging.
* INCIDENT NO:3338764 TRACKING ID:3131360
SYMPTOM: The vxfsconvert(1M) command fails on a Journaled File System (JFS) with the following messages: UX:vxfs vxfsconvert: ERROR: V-3-27051: Failed to read 4096, -1: No such device or address UX:vxfs vxfsconvert: ERROR: V-3-27049: Failed read of block XXXX: No such device or address
DESCRIPTION: Before converting a file system, vxfsconvert(1M) first validates that the inode being read is a JFS2 inode. In some cases, the validation does not fail even if the inodes are a set of junk inodes. Later, when vxfsconvert(1M) tries to read the data associated with the junk inodes, it fails with the error messages.
RESOLUTION: An additional check has been added to the validation code so that it can detect the junk inodes.
* INCIDENT NO:3338770 TRACKING ID:3152313
SYMPTOM: On a VxFS file system with the partitioned directories feature enabled, while removing a file the system may panic with the following stack: simple_lock_try +00003C () vx_ilock_try+000010 () vx_rwlock_range_revoke+000774 () vx_rwlock_range_revoke_skey+000094 () vxg_range_start_revoke_callback+0000F8 () vxg_range_start_revoke+0000E0 () vxg_range_start_revokes+0000F4 () vxg_range_unlock_body+000158 () vxg_api_range_unlockwf+0000A0 () vx_glm_range_unlock+000094 () vx_glmrange_rangeunlock+000010 () vx_irwunlock+000154 () vx_pd_remove+0003FC () vx_remove1_pd@AF49_38+0000F4 () vx_do_remove@AF50_39+0000BC () vx_remove1@AF51_40+0000C8 () vx_remove_vp+00006C () vx_remove+000010 () vx_remove_skey+00003C ()
DESCRIPTION: During the GLM unlock routine processing, VxFS tries to acquire a lock which is already owned by the same thread, causing the panic mentioned above.
RESOLUTION: The code is modified to check the lock status before trying to acquire the lock.
* INCIDENT NO:3338776 TRACKING ID:3224101
SYMPTOM: On a file system that is mounted by a cluster, the system panics after you enable the lazy optimization for updating the i_size across the cluster nodes. The stack trace may look as follows: vxg_free() vxg_cache_free4() vxg_cache_free() vxg_free_rreq() vxg_range_unlock_body() vxg_api_range_unlock() vx_get_inodedata() vx_getattr() vx_linux_getattr() vxg_range_unlock_body() vxg_api_range_unlock() vx_get_inodedata() vx_getattr() vx_linux_getattr()
DESCRIPTION: On a file system that is mounted by a cluster with the -o cluster option, read operations or write operations take a range lock to synchronize updates across the different nodes. The lazy optimization incorrectly enables a node to release a range lock which it has not acquired, and this panics the node.
RESOLUTION: The code has been modified to release only those range locks which are acquired.
* INCIDENT NO:3338779 TRACKING ID:3252983
SYMPTOM: On a high-end system with greater than or equal to 48 CPUs, some file-system operations may hang with the following stack trace: vx_ilock() vx_tflush_inode() vx_fsq_flush() vx_tranflush() vx_traninit() vx_tran_iupdat() vx_idelxwri_done() vx_idelxwri_flush() vx_delxwri_flush() vx_workitem_process() vx_worklist_process() vx_worklist_thread()
DESCRIPTION: The function that gets an inode returns an incorrect error value if there are no free incore inodes available; this error value causes an inode to be allocated on disk instead of in core. As a result, the same function is called again, resulting in a continuous loop.
RESOLUTION: The code is modified to return the correct error code.
* INCIDENT NO:3338780 TRACKING ID:3253210
SYMPTOM: When the file system reaches the space limitation, it hangs with the following stack trace: vx_svar_sleep_unlock() default_wake_function() wake_up() vx_event_wait() vx_extentalloc_handoff() vx_te_bmap_alloc() vx_bmap_alloc_typed() vx_bmap_alloc() vx_bmap() vx_exh_allocblk() vx_exh_splitbucket() vx_exh_split() vx_dopreamble() vx_rename_tran() vx_pd_rename()
DESCRIPTION: When the large directory hash is enabled through the vx_dexh_sz(5M) tunable, Veritas File System (VxFS) uses the large directory hash for directories. When you rename a file, a new directory entry is inserted into the hash table, which results in a hash split. The hash split fails the current transaction and retries after some housekeeping jobs complete. These jobs include allocating more space for the hash table. However, VxFS does not check the return value of the preamble job. Thus, when VxFS runs out of space, the rename transaction is re-entered permanently without knowing whether more space was allocated by the preamble jobs.
RESOLUTION: The code is modified to enable VxFS to exit the loop when ENOSPC is returned from the preamble job.
* INCIDENT NO:3338785 TRACKING ID:3265538
SYMPTOM: The system panics because VxFS calls the lock_done kernel service at intpri=A instead of intpri=B.
DESCRIPTION: This issue happens because VX_RECONFIGLK_PL is defined as VX_SPIN_PL(INTIODONE) on AIX, but as VX_SPLBASE on Linux. In VX_RECONNECTWAIT(), VxFS calls VX_RECONF_UNLOCK() without restoring the interrupt priority to INTBASE. The current interrupt priority is INTIODONE due to the previous VX_FSEXT_RECONFIG_LOCK(). VX_RECONF_UNLOCK() unlocks the complex lock taken at VX_RECONF_LOCK. It traps and raises an exception because of the incorrect interrupt priority. The correct priority should be INTBASE.
RESOLUTION: VX_RECONFIGLK_PL on AIX is set to VX_SPLBASE. This issue no longer exists.
* INCIDENT NO:3338787 TRACKING ID:3261462
SYMPTOM: A file system with a size greater than 16TB corrupts, with vx_mapbad messages in the system log.
DESCRIPTION: The corruption results from the combination of the following two conditions: a. Two or more threads race against each other to allocate around the same offset range. As a result, VxFS returns the buffer locked only in shared mode for all the threads which fail in allocating the extent. b. Since the allocated extent is from a region beyond 16TB, threads need to convert the buffer to a different type to accommodate the new extent's start value. The buffer overrun happens because VxFS erroneously tries to unconditionally convert the buffer to the new type even though the buffer might not be able to accommodate the converted data.
RESOLUTION: When the race condition is detected, VxFS returns proper retry errors to the caller, so that the whole operation is retried from the beginning. Also, the code is modified to ensure that VxFS doesn't try to convert the buffer to the new type when it cannot accommodate the new data. In case this check fails, VxFS performs the proper split logic, so that a buffer overrun doesn't happen when the operation is retried.
* INCIDENT NO:3338790 TRACKING ID:3233284
SYMPTOM: The FSCK binary hangs while checking the Reference Count Table (RCT), with the following stack trace: bmap_search_typed_raw() bmap_check_typed_raw() rct_check() process_device() main()
DESCRIPTION: The FSCK binary hangs due to looping in the bmap_search_typed_raw() function. This function searches for an extent entry in the indirect buffer for a given offset. In this case, the given offset is less than the start offset of the first extent entry. This unhandled corner case causes the infinite loop.
RESOLUTION: The code is modified to handle the following cases: 1. Searching in an empty indirect block. 2. Searching for an offset which is less than the start offset of the first entry in the indirect block.
* INCIDENT NO:3339230 TRACKING ID:3308673
SYMPTOM: With the delayed allocations feature enabled for a locally mounted file system having highly fragmented available free space, the file system is disabled with the following message seen in the system log: WARNING: msgcnt 1 mesg 031: V-2-31: vx_disable - /dev/vx/dsk/testdg/testvol file system disabled
DESCRIPTION: A VxFS transaction provides multiple extent allocations to fulfill one allocation request for a file system that has high free space fragmentation. Thus, the allocation transaction becomes large and fails to commit. After retrying the transaction a defined number of times, the file system is disabled with the above mentioned error.
RESOLUTION: The code is modified to commit the part of the transaction which is committable and retry the remaining part.
* INCIDENT NO:3339884 TRACKING ID:1949445
SYMPTOM: The system is unresponsive when files are created in a large directory.
The following stack is logged: vxg_grant_sleep() vxg_cmn_lock() vxg_api_lock() vx_glm_lock() vx_get_ownership() vx_exh_coverblk() vx_exh_split() vx_dexh_setup() vx_dexh_create() vx_dexh_init() vx_do_create()
DESCRIPTION: For large directories, the large directory hash (LDH) is enabled to improve lookups. When the system takes ownership of the LDH inode twice in the same thread context (while building the hash for a directory), it becomes unresponsive.
RESOLUTION: The code is modified to avoid taking ownership again if the ownership of the LDH inode is already held.
* INCIDENT NO:3340029 TRACKING ID:3298041
SYMPTOM: While performing "delayed extent allocations" by writing to a file sequentially and extending the file's size, or performing a mixture of sequential write I/O and random write I/O which extends a file's size, the write I/O performance to the file can suddenly degrade significantly.
DESCRIPTION: The 'dalloc' feature allows VxFS to allocate extents (file system blocks) to a file in a delayed fashion when extending the file size. Asynchronous writes that extend a file's size create and dirty memory pages; new extents can therefore be allocated when the dirty pages are flushed to disk (via background processing) rather than allocating the extents in the same context as the write I/O. However, in some cases, with delayed allocation on, the flushing of dirty pages may occur synchronously in the foreground in the same context as the write I/O; when triggered, the foreground flushing can significantly slow the write I/O performance.
RESOLUTION: The code is modified to avoid the foreground flushing of data in the same write context.
* INCIDENT NO:3340144 TRACKING ID:3237204
SYMPTOM: The vxfsstat(1M) statistical reporting tool displays inaccurate pinned memory usage information for VxFS. Additionally, the glmstat(1M) utility fails to display the counts for the Group Lock Manager (GLM) pinned memory usage.
DESCRIPTION: The VxFS pinned memory usage counters are maintained in a per-CPU array of structures. These counters accumulate from the time the driver is loaded in the kernel. The pinned memory usage counters change frequently, making them more exposed to races, which results in inaccurate counts. Protecting the counters with the fetch_and_addlp atomic kernel service would prevent the races and the inaccuracies, but results in an unacceptable performance overhead.
RESOLUTION: The code is modified such that vxfsstat(1M) and glmstat(1M) display accurate counts of the pinned memory usage for VxFS and GLM respectively. The vxfsstat(1M) command is improved to generate a detailed report of the VxFS pinned memory usage.
* INCIDENT NO:3351946 TRACKING ID:3194635
SYMPTOM: An internal stress test on a locally mounted file system exited with an error message.
DESCRIPTION: With a file having Zero Fill-On-Demand (ZFOD) extents, a write operation in the ZFOD extent area may lead to the coalescing of an extent of type SHARED or COMPRESSED, or both, with a new extent of type DATA. The new DATA extent may be coalesced with the adjacent extent, if possible. If this happens without unsharing (for the shared extent case) or uncompressing (for the compressed extent case), data or metadata corruption may occur.
RESOLUTION: The code is modified such that an adjacent shared, compressed or pseudo-compressed extent is not coalesced.
* INCIDENT NO:3351947 TRACKING ID:3164418
SYMPTOM: An internal stress test on a locally mounted VxFS filesystem results in data corruption in a no-space-on-device scenario while doing a split on a Zero Fill-On-Demand (ZFOD) extent.
DESCRIPTION: When the split operation on a Zero Fill-On-Demand (ZFOD) extent fails because of the ENOSPC (no space on device) error, it erroneously processes the original ZFOD extent and returns no error. This may result in data corruption.
RESOLUTION: The code is modified to return the ZFOD extent to its original state if the ZFOD split operation fails due to the ENOSPC error.
* INCIDENT NO:3351977 TRACKING ID:3340286
SYMPTOM: The tunable setting of dalloc_enable gets reset to its default value after a file system is resized.
DESCRIPTION: The file system resize operation triggers the file system re-initialization process. During this process, the tunable value of dalloc_enable gets reset to the default value instead of retaining the old value.
RESOLUTION: The code is fixed so that the old tunable value of dalloc_enable is retained.
* INCIDENT NO:3359278 TRACKING ID:3364290
SYMPTOM: The kernel may panic in Veritas File System (VxFS) while it is internally working on a reference count queue (RCQ) record.
DESCRIPTION: The work item spawned by VxFS in the kernel to process RCQ records when the RCQ is full is passed a file system pointer as an argument. Since no active level is held, this file system pointer is not guaranteed to be valid by the time the work item starts processing. This may result in the panic.
RESOLUTION: The code is modified to pass the externally visible file system structure instead. This structure is guaranteed to be valid because the creator of the work item takes a reference on it, which is released only after the work item exits.
* INCIDENT NO:3364285 TRACKING ID:3364282
SYMPTOM: The fsck(1M) command fails to correct the inode list file.
DESCRIPTION: After writing an extent for the inode list file to disk, the fsck(1M) command fails to write the corresponding metadata for the inode list file, even when the write operation is successful.
RESOLUTION: The fsck(1M) command is modified to write metadata for the inode list file after successful write operations of an extent for the inode list file.
* INCIDENT NO:3364289 TRACKING ID:3364287
SYMPTOM: A debug assert may be hit in the vx_real_unshare() function in the cluster environment.
DESCRIPTION: The vx_extend_unshare() function wrongly looks at the offset immediately after the current unshare length boundary.
Instead, it should look at the offset of the last byte within the current unshare length. This may result in hitting debug asserts in the vx_real_unshare() function.
RESOLUTION: The code is modified for the shared compressed extent. When the vx_extend_unshare() function tries to extend the unshared region, it no longer looks up the first byte immediately after the unshared region; instead, it looks up the last byte that was unshared.
* INCIDENT NO:3364302 TRACKING ID:3364301
SYMPTOM: An assert failure occurs because of improper handling of the inode lock while truncating a reorg inode.
DESCRIPTION: While truncating the reorg extent, there may be a case where unlock is called on an inode even though the lock was never taken. While truncating a reorg inode, the locks held are released, and before reacquiring them the code checks whether the inode is a cluster inode. If it is, it tries to take the delegation hold lock. If taking the delegation hold lock fails, the code enters the error path. There it checks whether a transaction exists and has a committable error; it commits the transaction and, on success, calls unlock to release locks that were not actually held.
RESOLUTION: The code is modified to check whether the lock is taken before unlocking.
* INCIDENT NO:3364307 TRACKING ID:3364306
SYMPTOM: A stack overflow is seen in the extent allocation code path.
DESCRIPTION: The stack overflow appears in the vx_extprevfind() code path.
RESOLUTION: The code is modified to hand off the extent allocation to a worker thread when stack consumption reaches 4K.
* INCIDENT NO:3364317 TRACKING ID:3364312
SYMPTOM: The fsadm(1M) command is unresponsive while processing the VX_FSADM_REORGLK_MSG message.
The following stack trace may be seen while processing VX_FSADM_REORGLK_MSG:
vx_tranundo()
vx_do_rct_gc()
vx_rct_setup_gc()
vx_reorg_complete_gc()
vx_reorg_complete()
vx_reorg_clear_rct()
vx_reorg_clear()
vx_reorg_clear()
vx_recv_fsadm_reorglk()
vx_recv_fsadm()
vx_msg_recvreq()
vx_msg_process_thread()
vx_thread_base()
DESCRIPTION: In the vx_do_rct_gc() function, the in-directory cleanup flag is set for a shared indirect extent (SHR_IADDR_EXT). If the truncation fails, the vx_do_rct_gc() function does not clear the in-directory cleanup flag. As a result, the caller ends up calling the vx_do_rct_gc() function repeatedly, leading to a never-ending loop.
RESOLUTION: The code is modified to reset the in-directory cleanup flag on a truncation error inside the vx_do_rct_gc() function.
* INCIDENT NO:3364333 TRACKING ID:3312897
SYMPTOM: In Cluster File System (CFS), the system can hang while trying to perform any administrative operation when the primary node is disabled.
DESCRIPTION: In CFS, when node 1 tries to perform an administrative operation that freezes and thaws the file system (e.g. turning fcl on or off), a deadlock can occur between the thaw thread and the recovery thread (started because the CFS primary was disabled). The thread on node 1 trying to thaw is blocked while waiting for node 2 to reply to the loadfs message. The thread processing the loadfs message is waiting for the recovery operation to complete. The recovery thread on node 2 is waiting for a lock on an extent map (emap) buffer. This lock is held on node 1 as part of a transaction that was committed during the freeze, which results in a deadlock.
RESOLUTION: The code is modified to flush any transactions that were committed during a freeze before starting the thaw process.
* INCIDENT NO:3364335 TRACKING ID:3331109
SYMPTOM: The full fsck does not repair a corrupted reference count queue (RCQ) record.
DESCRIPTION: When an RCQ record is corrupted due to an I/O error or a log error, there is no code in full fsck that handles the corruption. As a result, some further operations related to the RCQ might fail.
RESOLUTION: The code is modified to repair the corrupt RCQ entry during a full fsck.
* INCIDENT NO:3364338 TRACKING ID:3331045
SYMPTOM: A kernel oops occurs in the map unlock code when a freed mlink is referenced, due to a race with the iodone routine for delayed writes.
DESCRIPTION: After async I/O is issued on a map buffer, there is a possible race between the vx_unlockmap() function and the vx_mapiodone() function. Due to the race, the vx_unlockmap() function references an mlink after it has been freed.
RESOLUTION: The code is modified to handle this race condition.
* INCIDENT NO:3364349 TRACKING ID:3359200
SYMPTOM: An internal test of the Veritas File System (VxFS) fsdedup(1M) feature in a cluster file system environment results in a hang.
DESCRIPTION: The thread that processes the fsdedup(1M) request takes the delegation lock on the extent map and then waits to acquire a lock on the cluster-wide reference count queue (RCQ) buffer. Meanwhile, another internal VxFS thread working on the RCQ takes the lock on the cluster-wide RCQ buffer and waits to acquire the delegation lock on the extent map, causing a deadlock.
RESOLUTION: The code is modified to correct the lock hierarchy so that the delegation lock on the extent map is taken before the lock on the cluster-wide RCQ buffer.
* INCIDENT NO:3370650 TRACKING ID:2735912
SYMPTOM: The performance of tier relocation for moving a large number of files is poor when the `fsppadm enforce' command is used.
When looking at the fsppadm(1M) command in the kernel, the following stack trace is observed:
vx_cfs_inofindau
vx_findino
vx_ialloc
vx_reorg_ialloc
vx_reorg_isetup
vx_extmap_reorg
vx_reorg
vx_allocpolicy_enforce
vx_aioctl_allocpolicy
vx_aioctl_common
vx_ioctl
vx_compat_ioctl
DESCRIPTION: For each file to be relocated from Tier 1 to Tier 2, Veritas File System (VxFS) allocates a new reorg inode and all of its extents in Tier 2. VxFS then swaps the contents of the two files and deletes the original file. This new inode allocation involves a lot of processing and can result in poor performance when a large number of files are moved.
RESOLUTION: The code is modified to use a pool or cache of reorg inodes instead of allocating one each time.
* INCIDENT NO:3372909 TRACKING ID:3274592
SYMPTOM: An internal noise test on Cluster File System (CFS) becomes unresponsive while executing the fsadm(1M) command.
DESCRIPTION: In CFS, the fsadm(1M) command hangs in the kernel while processing the fsadm-reorganization message on a secondary node. The hang results from a race with the thread processing the fsadm-query message for mounting the primary fileset on the secondary node, where the thread processing the fsadm-query message wins the race.
RESOLUTION: The code is modified to synchronize the processing of the fsadm-query message and the fsadm-reorganization message on the primary node. This synchronization ensures that they are processed in the order in which they were received.
* INCIDENT NO:3380905 TRACKING ID:3291635
SYMPTOM: Internal testing found the "vx_freeze_block_threads_all:7c" debug assert on locally mounted file systems while processing preambles for transactions.
DESCRIPTION: While processing preambles for transactions, if the reference count queue (RCQ) is full, VxFS may trigger processing of the RCQ to free some records. This may result in hitting the debug assert.
RESOLUTION: The code is modified to ignore reference count queue (RCQ) full errors when VxFS processes preambles for transactions.
* INCIDENT NO:3396539 TRACKING ID:3331093
SYMPTOM: The MountAgent gets stuck during repeated switchovers because of the current VxFS-AMF notification/unregistration design, with the following stack trace:
sleep_spinunlock+0x61 ()
vx_delay2+0x1f0 ()
vx_unreg_callback_funcs_impl+0xd0 ()
disable_vxfs_api+0x190 ()
text+0x280 ()
amf_event_release+0x230 ()
amf_fs_event_lookup_notify_multi+0x2f0 ()
amf_vxfs_mount_opt_change_callback+0x190 ()
vx_aioctl_unsetmntlock+0x390 ()
cold_vx_aioctl_common+0x7c0 ()
vx_aioctl+0x300 ()
vx_admin_ioctl+0x610 ()
vxportal_ioctl+0x690 ()
spec_ioctl+0xf0 ()
vno_ioctl+0x350 ()
ioctl+0x410 ()
syscall+0x5b0 ()
DESCRIPTION: This issue is related to the VxFS-AMF interface. VxFS provides notifications to AMF for certain events, such as a file system being disabled or a mount option change. While VxFS is calling into AMF, the AMF event handling mechanism can trigger an unregistration of VxFS in the same context, if the VxFS notification was the last event notification registered with AMF. Before VxFS calls into AMF, the variable vx_fsamf_busy is set to 1, and it is reset when the callback returns. The unregistration loops while it finds vx_fsamf_busy set to 1. Since the unregistration was called from the context of the notification callback itself, vx_fsamf_busy was never set back to 0, so the loop runs endlessly and the command that triggered the notification hangs.
RESOLUTION: A delayed unregistration mechanism is employed. The fix addresses the case of an unregistration requested by AMF from within the context of a callback from VxFS to AMF. In such a scenario, the unregistration is marked for a later time. When all notifications return, if a delayed unregistration is marked, the unregistration routine is explicitly called.
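The delayed-unregistration pattern used for incident 3331093 can be illustrated with a small user-space sketch. All names below are illustrative stand-ins for the VxFS/AMF internals, not the real symbols:

```c
/* Re-entrancy-safe unregistration: if unregistration is requested
 * while a notification callback is in flight, record it and perform
 * it after the callback returns, instead of spinning on the busy
 * flag from inside the callback's own context (the old hang). */
static int amf_busy = 0;       /* notification call in progress */
static int unreg_pending = 0;  /* unregistration was deferred   */
static int registered = 1;     /* are we registered with AMF?   */

static void do_unregister(void)
{
    registered = 0;
}

void request_unregister(void)
{
    if (amf_busy) {
        unreg_pending = 1;     /* defer: we are inside the callback */
        return;                /* the old code looped here -> hang  */
    }
    do_unregister();
}

void notify_event(void)
{
    amf_busy = 1;
    request_unregister();      /* callback ends up unregistering us */
    amf_busy = 0;
    if (unreg_pending) {       /* run the deferred unregistration   */
        unreg_pending = 0;
        do_unregister();
    }
}
```

In the kernel the flags would be protected by a lock; this sketch only shows the deferral logic that breaks the endless loop.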
* INCIDENT NO:3402484 TRACKING ID:3394803
SYMPTOM: The vxupgrade(1M) command causes VxFS to panic with the following stack trace:
panic_save_regs_switchstack()
panic
bad_kern_reference()
$cold_pfault()
vm_hndlr()
bubbleup()
vx_fs_upgrade()
vx_upgrade()
$cold_vx_aioctl_common()
vx_aioctl()
vx_ioctl()
vno_ioctl()
ioctl()
syscall()
DESCRIPTION: The panic is caused by dereferencing a NULL device (one of the devices in the DEVLIST is NULL).
RESOLUTION: The code is modified to skip NULL devices when the devices in the DEVLIST are processed.
* INCIDENT NO:3405172 TRACKING ID:3436699
SYMPTOM: An assert failure occurs because of a race condition between the clone mount thread and a directory removal thread while pushing data onto the clone.
DESCRIPTION: There is a race condition between the clone mount thread and a directory removal thread (while pushing modified directory data onto the clone). On AIX, vnodes are added to the VFS vnode list (a linked list of vnodes). The first entry in this vnode list must be the root vnode, which is added during the mount process. While mounting a clone, the mount thread is scheduled out before adding the root vnode to this list. During this time, the second thread takes the VFS lock on the same VFS list and tries to add the directory's vnode to the vnode list. Because no root vnode is present at the head of the list, the directory vnode is assumed to be the root vnode, and when this is cross-checked against the VROOT flag, the assert fails.
RESOLUTION: The code is modified to handle the race condition by attaching the root vnode to the VFS vnode list before setting the VFS pointer in the fileset.
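The fix for the clone mount race is an instance of the initialize-before-publish ordering rule: the root vnode must be on the VFS vnode list before any other thread can see the VFS. A minimal sketch with hypothetical structures (not the real AIX/VxFS symbols):

```c
#include <stddef.h>

/* Hypothetical, simplified structures for illustration only. */
struct vnode { int is_root; struct vnode *next; };
struct vfs   { struct vnode *vnode_list; };

/* Stand-in for "setting the VFS pointer into the file set":
 * once this is non-NULL, other threads may walk the vnode list. */
static struct vfs *published_vfs = NULL;

static void vnode_list_add(struct vfs *vfsp, struct vnode *vp)
{
    vp->next = vfsp->vnode_list;
    vfsp->vnode_list = vp;
}

/* Fixed mount order: link the root vnode first, publish second.
 * The buggy order (publish first) let another thread insert a
 * directory vnode at the head of an empty list, tripping the
 * VROOT assert. */
void mount_clone(struct vfs *vfsp, struct vnode *root)
{
    root->is_root = 1;
    vnode_list_add(vfsp, root);   /* step 1: root enters the list    */
    published_vfs = vfsp;         /* step 2: now others may see vfs  */
}

/* The invariant the assert checked: the head of a published
 * vnode list is the root vnode. */
int head_is_root(void)
{
    return published_vfs && published_vfs->vnode_list
        && published_vfs->vnode_list->is_root;
}
```

In kernel code the publish step would also need the VFS lock or a memory barrier; the sketch only shows the ordering.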
* INCIDENT NO:3409619 TRACKING ID:3395692
SYMPTOM: Double deletion of a pointer in the vx_do_getacl() function causes abend_trap() with the following stack trace:
abend_trap
xmfree()
xm_bad_free()
xmdbg_do_xmfree_record()
xmfree()
vx_free()
vx_getxacl_vxfs()
vx_getxacl()
vx_getxacl()
vx_getxacl_skey()
DESCRIPTION: The vx_do_getacl() function frees the memory referenced by a pointer twice, which results in the above-mentioned trap.
RESOLUTION: The code is modified to nullify the pointer after it is freed.
* INCIDENT NO:3409691 TRACKING ID:3417076
SYMPTOM: The vxtunefs(1M) command reports an error when it attempts to set the tunables from the tunefstab file.
DESCRIPTION: The vxtunefs(1M) command sets the tunable values that are listed in the tunefstab file (/etc/vx/tunefstab by default). When the file contains blank lines or white space (which are unusual in the tunefstab file), the device name buffer contains the name of the previous (stale) device and the tunable values are null. Consequently, the command fails to set tunables for that device.
RESOLUTION: The code is modified to handle the unusual text (newlines or white space) in the tunefstab file.
* INCIDENT NO:3417311 TRACKING ID:3417321
SYMPTOM: The vxtunefs(1M) man page gives an incorrect description.
DESCRIPTION: According to the current design, the 'delicache_enable' tunable is enabled by default for both local mounts and cluster mounts. However, the man page is not updated accordingly; it still states that this tunable is enabled by default only for a local mount. The man page needs to be updated to correct the description.
RESOLUTION: The man page of the vxtunefs(1M) command is updated to display the correct contents for the 'delicache_enable' tunable. Additional information is provided on the performance benefits, which are limited in the CFS case compared to the local mount case.
Also, in the case of CFS, unlike the other CFS tunable parameters, this tunable must be explicitly turned on or off on each node.
* INCIDENT NO:3418754 TRACKING ID:3418997
SYMPTOM: The vxtunefs(1M) command accepts garbage values for the 'write_throttle' tunable.
DESCRIPTION: When a garbage value is given for the write_throttle tunable, the vxtunefs(1M) command accepts the value and displays a successful update message, but the value does not take effect in the system. Consequently, the garbage value for the tunable is silently accepted.
RESOLUTION: The code is modified to handle the error from parsing the command-line value of the write_throttle tunable.
* INCIDENT NO:3430687 TRACKING ID:3444775
SYMPTOM: Internal noise testing on Cluster File System (CFS) results in a kernel panic in the vx_fsadm_query() function with the following error message: "Unable to handle kernel paging request".
DESCRIPTION: The issue occurs due to simultaneous asynchronous access or modification of the inode list extent array by two threads. As a result, memory freed by one thread is accessed by the other thread, resulting in the panic.
RESOLUTION: The code is modified to add relevant locks to synchronize access to and modification of the inode list extent array.
* INCIDENT NO:3436393 TRACKING ID:3462694
SYMPTOM: The fsdedupadm(1M) command fails with error code 9 when it tries to mount checkpoints on a cluster.
DESCRIPTION: While mounting checkpoints, the fsdedupadm(1M) command fails to parse the cluster mount option correctly, resulting in the mount failure.
RESOLUTION: The code is modified to parse cluster mount options correctly in the fsdedupadm(1M) operation.
* INCIDENT NO:3448567 TRACKING ID:3448627
SYMPTOM: The vxtunefs(1M) command accepts garbage values for the discovered_direct_iosz tunable.
DESCRIPTION: When a garbage value is given for the discovered_direct_iosz tunable, the vxtunefs(1M) command accepts the value and displays a successful update message.
However, the value does not take effect in the system. Consequently, the garbage value for the tunable is silently accepted.
RESOLUTION: The code is modified to handle the error from parsing the command-line value of the discovered_direct_iosz tunable.
* INCIDENT NO:3448758 TRACKING ID:3449150
SYMPTOM: The vxtunefs(1M) command accepts garbage values for the following tunables: max_direct_iosz, default_indir_size, qio_cache_enable, odm_cache_enable, max_diskq, initial_extent_size, max_seqio_extent_size, hsm_write_prealloc, read_ahead, inode_aging_size, oltp_load, inode_aging_count.
DESCRIPTION: When a garbage value is given for any of the above listed tunables, the vxtunefs(1M) command accepts the value and displays a successful update message. However, the value does not take effect in the system. Consequently, the garbage value for the tunable is silently accepted.
RESOLUTION: The code is modified to handle the error from parsing the command-line values of these tunables.
* INCIDENT NO:3448818 TRACKING ID:3449152
SYMPTOM: The vxtunefs(1M) command fails to set the thin_friendly_alloc tunable in CFS.
DESCRIPTION: The thin_friendly_alloc tunable is not supported on CFS. But when the vxtunefs(1M) command is used to set it on CFS, a false success message is displayed.
RESOLUTION: The code is modified to report an error for any attempt to set the thin_friendly_alloc tunable in CFS.
* INCIDENT NO:3451355 TRACKING ID:3463717
SYMPTOM: CFS does not support the 'thin_friendly_alloc' tunable, and the vxtunefs(1M) command man page is not updated with this information.
DESCRIPTION: Since the man page does not explicitly mention that the 'thin_friendly_alloc' tunable is not supported, it is assumed that CFS supports this feature.
RESOLUTION: The man page pertinent to the vxtunefs(1M) command is updated to denote that CFS does not support the 'thin_friendly_alloc' tunable.
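The several "garbage value accepted" fixes above all come down to validating numeric command-line input before applying it. A hedged sketch of that kind of validation, using only standard C (the real vxtunefs parsing code is not shown in this README):

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a tunable value strictly: succeed (return 0 and store the
 * value) only when the whole string is a decimal number within
 * [minv, maxv]; reject empty strings, trailing garbage such as
 * "12abc", and out-of-range or overflowing values. */
int parse_tunable(const char *s, long minv, long maxv, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')      /* empty, or trailing garbage */
        return -1;
    if (errno == ERANGE || v < minv || v > maxv)
        return -1;                     /* overflow or out of range   */
    *out = v;
    return 0;
}
```

Reporting the parse failure to the user, instead of printing a success message and dropping the value, is exactly the behavior change these incidents describe.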
PATCH ID:6.0.300.000
* INCIDENT NO:2928921 TRACKING ID:2843635
SYMPTOM: During VxFS internal testing, there are some failures during the reorg operation of structural files.
DESCRIPTION: While the reorg is in progress, certain ioctls overwrite the error value that is to be returned, which results in an incorrect error value and test failures.
RESOLUTION: The code is modified so that the error value is not overwritten.
* INCIDENT NO:2933290 TRACKING ID:2756779
SYMPTOM: Write and read performance concerns on Cluster File System (CFS) when running applications that rely on POSIX file-record locking (fcntl).
DESCRIPTION: The usage of fcntl on CFS leads to high messaging traffic across nodes, thereby reducing the performance of readers and writers.
RESOLUTION: The code is modified to cache the ranges that are file-record locked on the node whenever possible, to avoid broadcasting messages across the nodes in the cluster.
* INCIDENT NO:2933291 TRACKING ID:2806466
SYMPTOM: A reclaim operation on a file system that is mounted on an LVM volume using the fsadm(1M) command with the -R option may panic the system, and the following stack trace is displayed:
vx_dev_strategy+0xc0()
vx_dummy_fsvm_strategy+0x30()
vx_ts_reclaim+0x2c0()
vx_aioctl_common+0xfd0()
vx_aioctl+0x2d0()
vx_ioctl+0x180()
DESCRIPTION: Thin reclamation supports only file systems mounted on a VxVM volume.
RESOLUTION: The code is modified to return errors without panicking the system if the underlying volume is LVM.
* INCIDENT NO:2933292 TRACKING ID:2895743
SYMPTOM: It takes longer than usual for many Windows 7 clients to log off in parallel if the user profile is stored in Cluster File System (CFS).
DESCRIPTION: Veritas File System (VxFS) keeps file creation time/full ACL data for Samba clients in an extended attribute, which is implemented via named streams. VxFS reads the named stream for each of the ACL objects.
Reading a named stream is a costly operation, as it results in an open, an opendir, a lookup, and another open to get the fd. The VxFS function vx_nattr_open() holds the exclusive rwlock to read an ACL object that is stored as an extended attribute. This may cause heavy lock contention when many threads want the same lock; they can be blocked until one of the nattr_open calls releases it, which takes time since nattr_open is very slow.
RESOLUTION: The code is modified to take the rwlock in shared mode instead of exclusive mode.
* INCIDENT NO:2933294 TRACKING ID:2750860
SYMPTOM: The performance of write operations with small request sizes may degrade on a large Veritas File System (VxFS) file system. Many threads may be found sleeping with the following stack trace:
vx_sleep_lock
vx_lockmap
vx_getemap
vx_extfind
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_downlevel
vx_searchau_uplevel
vx_searchau+0x600
vx_extentalloc_device
vx_extentalloc
vx_te_bmap_alloc
vx_bmap_alloc_typed
vx_bmap_alloc
vx_write_alloc3
vx_recv_prealloc
vx_recv_rpc
vx_msg_recvreq
vx_msg_process_thread
kthread_daemon_startup
DESCRIPTION: A VxFS allocation unit (AU) is composed of 32768 disk blocks. An AU can be expanded when it is partially allocated, or non-expanded when it is fully occupied or completely unused. The extent map for a large file system with a 1K block size is organized as a big tree; for example, a 4-TB file system with a 1-KB file system block size can have up to 128K AUs. To find an appropriate extent, the VxFS extent allocation algorithm first searches the expanded AUs, traversing the free extent map tree, to avoid causing free space fragmentation. If that fails, it does the same with the non-expanded AUs. When there are too many requests for small extents (less than 32768 blocks), all the small free extents are used up but a large number of AU-size extents (32768 blocks) are still available, and the file system can run into this hang.
Because there are no small extents available in the expanded AUs, VxFS looks at larger non-expanded extents, namely AU-size extents, which are not what it wants (an expanded AU is expected). As a result, each request walks the big extent map tree past every AU-size extent and ultimately fails. The requested extent is eventually obtained during the second pass over the non-expanded AUs, but the unnecessary work consumes a lot of CPU resource.
RESOLUTION: The code is modified to optimize the free-extent-search algorithm by skipping certain AU-size extents, reducing the overall search time.
* INCIDENT NO:2933296 TRACKING ID:2923105
SYMPTOM: Removing the Veritas File System (VxFS) module using rmmod(8) on a system with heavy buffer cache usage may hang.
DESCRIPTION: When a large number of buffers are allocated from the buffer cache, the process of freeing the buffers at the time of removing the VxFS module takes a long time.
RESOLUTION: The code is modified to use an improved algorithm that stops traversing the free lists once a free chunk is found; instead, it breaks out of the search and frees that buffer.
* INCIDENT NO:2933309 TRACKING ID:2858683
SYMPTOM: The reserve-extent attributes are changed after the vxrestore(1M) operation, for files that are greater than 8192 bytes.
DESCRIPTION: A local variable is used to hold the number of reserve bytes. It is reused during the vxrestore(1M) operation for the subsequent VX_SETEXT ioctl call for files that are greater than 8K. As a result, the attribute information is changed.
RESOLUTION: The code is modified to preserve the original variable value until the end of the function.
* INCIDENT NO:2933313 TRACKING ID:2841059
SYMPTOM: The file system gets marked for a full fsck operation and the following message is displayed in the system log:
V-2-96: vx_setfsflags file system fullfsck flag set - vx_ierror
vx_setfsflags+0xee/0x120
vx_ierror+0x64/0x1d0 [vxfs]
vx_iremove+0x14d/0xce0
vx_attr_iremove+0x11f/0x3e0
vx_fset_pnlct_merge+0x482/0x930
vx_lct_merge_fs+0xd1/0x120
vx_lct_merge_fs+0x0/0x120
vx_walk_fslist+0x11e/0x1d0
vx_lct_merge+0x24/0x30
vx_workitem_process+0x18/0x30
vx_worklist_process+0x125/0x290
vx_worklist_thread+0x0/0xc0
vx_worklist_thread+0x6d/0xc0
vx_kthread_init+0x9b/0xb0
V-2-17: vx_iremove_2 : file system inode 15 marked bad incore
DESCRIPTION: Due to a race condition, a thread tries to remove an attribute inode that has already been removed by another thread. Hence, the file system is marked for a full fsck operation and the attribute inode is marked as bad on disk.
RESOLUTION: The code is modified to check whether the attribute inode that a thread is trying to remove has already been removed.
* INCIDENT NO:2933315 TRACKING ID:2887423
SYMPTOM: Spin-lock contention on vx_sched_lk can result in slow I/O.
DESCRIPTION: The vx_sched process takes the spin-lock vx_sched_lk at INTIODONE priority, thereby disabling all other interrupts at this priority level. Contention on this spin-lock can hold off iodone processing for a longer time and can result in slow I/O.
RESOLUTION: The vx_sched process now takes the spin-lock at INTBASE priority.
* INCIDENT NO:2933319 TRACKING ID:2848948
SYMPTOM: VxFS buffer cache consumption increases significantly after the system has been running for over 248 days. The problem is specific to the AIX platform.
DESCRIPTION: As a fundamental concept, the buffer cache holds copies of data blocks containing file system metadata (directory blocks, indirect blocks, raw inode structures, and many other data types).
These buffers are held in memory until they are explicitly invalidated for some reason, the memory is needed for other purposes, or they have not been accessed recently. Every UNIX system provides a time counter for storing the system time value; on AIX it is a 64-bit variable named lbolt. The problem is that the data type the VxFS code presumed was correct for saving this value (clock_t) is only a 32-bit type on AIX. Due to this oversight, on any AIX system that has been running long enough, the value in lbolt is improperly truncated when cast to a variable of type clock_t, and the value saved for the age of a buffer can be negative. The code that compares the various saved and generated timestamps can calculate wrong time differences because the higher bits of the clock are lost, and can conclude that buffers are effectively newer than the current time. Because of these issues, any AIX system running VxFS may encounter a hang after the value in lbolt exceeds the maximum signed 32-bit integer.
RESOLUTION: The data types of lbolt and all the variables where time is stored are corrected to be consistently 64-bit.
* INCIDENT NO:2933321 TRACKING ID:2878164
SYMPTOM: VxFS consumes too much pinned heap.
DESCRIPTION: While freeing memory and pre-translations from the heap, if the call happens to be in interrupt context, a kernel extension is not allowed to call services like xmfree() and xlate_remove(). Hence, VxFS hangs these structures off per-CPU data to be freed later (during subsequent allocations). This deferral piles up the heap consumption of VxFS.
RESOLUTION: VxFS now has separate worker threads that release the consumed heap. While freeing, if the call happens to be in interrupt context, the allocated structure is placed on work-item queues to be picked up and released by the corresponding worker thread.
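The 32-bit truncation behind the 248-day incident (2848948) can be reproduced in a few lines of standard C. This is a demonstration of the narrowing cast, not VxFS code; the negative result of the cast assumes the usual two's-complement conversion that AIX and other mainstream platforms use:

```c
#include <stdint.h>

/* Correct arithmetic: keep the full 64-bit tick counter. */
int64_t buf_age_ok(int64_t lbolt_now, int64_t lbolt_then)
{
    return lbolt_now - lbolt_then;
}

/* The bug: the 64-bit lbolt value was stored in a 32-bit clock_t.
 * At 100 ticks per second, lbolt passes INT32_MAX after roughly
 * 248 days, the stored value wraps negative, and age comparisons
 * conclude that old buffers are "newer" than the current time. */
int32_t buf_age_truncated(int64_t lbolt_now, int64_t lbolt_then)
{
    int32_t now32  = (int32_t)lbolt_now;   /* narrowing cast, as the   */
    int32_t then32 = (int32_t)lbolt_then;  /* old code effectively did */
    return now32 - then32;
}
```

The fix described above is exactly the move from the second function to the first: store and compare the counter in a 64-bit type everywhere.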
* INCIDENT NO:2933322 TRACKING ID:2857568
SYMPTOM: During backup, files are read sequentially and the entire file is brought into memory (into the page cache). This displaces other file data held in the page cache. If files are removed after their backup, all of the file pages are freed during the remove. The VMM chunking mechanism needs to 'chunk' down the page list during the remove; if the files are large and the entire file is held in memory, this large chunking operation can impose small I/O delays on the system.
DESCRIPTION: Symantec identified a need to avoid having to perform a large amount of VMM chunking when removing files immediately after their backup. We identified a requirement for an option to remove pages during the backup rather than during the file remove.
RESOLUTION: Symantec has introduced a new mount option called "noreuserd". This mount option initially works only for locally mounted file systems (not for cluster mounted file systems). When mounted with this option, VxFS frees pages behind the "read pointer" when performing sequential buffered read I/O. The term 'noreuse' means we do not need to use the pages again: we only need to use them once, so we can throw them away immediately after use. The term 'rd' means it only works for reads (sequential reads). Hence the new mount option name "noreuserd".
* INCIDENT NO:2933324 TRACKING ID:2874054
SYMPTOM: The vxconvert(1M) command fails to convert LVM disk groups to VxVM disk groups with the following error: VxFS ERROR V-3-21784: can't malloc
DESCRIPTION: While converting an LVM volume to a VxVM volume, vxconvert(1M) allocates memory with the malloc() function to do the conversion. The amount of memory allocated is based on the extent size. When the extent size is big, malloc is not able to find a large enough chunk of free memory. As a result, the conversion fails with the above-mentioned error.
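Application code on POSIX systems can approximate the "noreuserd" idea without any mount option by using posix_fadvise(2) with POSIX_FADV_DONTNEED to drop cached pages behind the read pointer during a one-pass sequential read. This is only an analogy to the mount option, not the VxFS implementation; the helper name is ours:

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Read a file once, sequentially, telling the kernel after each
 * chunk that the pages just read will not be reused, so they can
 * be reclaimed immediately instead of displacing other cached data.
 * Returns the total bytes read, or -1 on open failure. */
long read_once_and_drop(const char *path)
{
    char buf[65536];
    long total = 0;
    ssize_t n;
    off_t done = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        total += n;
        /* advise: pages in [done, done + n) will not be reused */
        posix_fadvise(fd, done, n, POSIX_FADV_DONTNEED);
        done += n;
    }
    close(fd);
    return total;
}
```

Backup tools use this same advise-behind pattern for the reason the incident describes: a single sequential pass should not evict the rest of the page cache.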
RESOLUTION: The code is modified to break the big contiguous malloc request down into smaller sizes, so that it can be fulfilled using non-contiguous memory chunks.
* INCIDENT NO:2933751 TRACKING ID:2916691
SYMPTOM: fsdedup enters an infinite loop with the following stack:
#5 [ffff88011a24b650] vx_dioread_compare at ffffffffa05416c4
#6 [ffff88011a24b720] vx_read_compare at ffffffffa05437a2
#7 [ffff88011a24b760] vx_dedup_extents at ffffffffa03e9e9b
#11 [ffff88011a24bb90] vx_do_dedup at ffffffffa03f5a41
#12 [ffff88011a24bc40] vx_aioctl_dedup at ffffffffa03b5163
DESCRIPTION: vx_dedup_extents() does the following to dedup two files:
1. Compare the data extents of the two files that need to be deduped.
2. Split both files' bmaps to make them share the first file's common data extent.
3. Free the duplicate data extent of the second file.
In step 2, during the bmap split, vx_bmap_split() might need to allocate space for the inode's bmap to add new bmap entries, which adds an emap to this transaction. (This condition is more likely to hit if the dedup is run on two large files that have interleaved duplicate/different data extents, because the files' bmaps need to be split more in this case.) In step 3, vx_extfree1() does not support a multi-AU extent free if there is already an emap in the same transaction; in this case it returns VX_ETRUNCMAX. (See incident e569695 for the history of this limitation.) VX_ETRUNCMAX is a retriable error, so vx_dedup_extents() undoes everything in the transaction and retries from the beginning, then hits the same error again, resulting in an infinite loop.
RESOLUTION: vx_te_bmap_split() is modified to always register a transaction preamble for the bmap split operation in dedup, and vx_dedup_extents() performs the preamble in a separate transaction before it retries the dedup operation.
* INCIDENT NO:2933822 TRACKING ID:2624262
SYMPTOM: A panic is hit in the vx_bc_do_brelse() function while executing dedup functionality, with the following backtrace:
vx_bc_do_brelse()
vx_mixread_compare()
vx_dedup_extents()
enqueue_entity()
__alloc_pages_slowpath()
__get_free_pages()
vx_getpages()
vx_do_dedup()
vx_aioctl_dedup()
vx_aioctl_common()
vx_rwunlock()
vx_aioctl()
vx_ioctl()
vfs_ioctl()
do_vfs_ioctl()
sys_ioctl()
DESCRIPTION: While executing the vx_mixread_compare() function in the dedup code path, an error is hit, after which an allocated data structure remains uninitialized. The panic occurs when this uninitialized data structure is written to in the vx_mixread_compare() function.
RESOLUTION: The code is changed to free the memory allocated to the data structure when exiting on the error path.
* INCIDENT NO:2937367 TRACKING ID:2923867
SYMPTOM: An assert is hit because VX_RCQ_PROCESS_MSG has a numerically lower priority than VX_IUPDATE_MSG.
DESCRIPTION: When the primary is about to send a VX_IUPDATE_MSG message to the owner of an inode about an update to the inode's non-transactional fields, it compares the current messaging priority (for VX_RCQ_PROCESS_MSG) with the priority of the message being sent (VX_IUPDATE_MSG) to avoid a possible deadlock. In this case the VX_RCQ_PROCESS_MSG priority was numerically lower than VX_IUPDATE_MSG, so the assert was hit.
RESOLUTION: The VX_RCQ_PROCESS_MSG priority is made numerically higher than VX_IUPDATE_MSG, thus avoiding the assert.
* INCIDENT NO:2975645 TRACKING ID:2978326
SYMPTOM: The values of the dalloc_enable and dalloc_limit tunables cannot be changed on cluster file systems.
DESCRIPTION: On cluster file systems, the dalloc_enable and dalloc_limit tunables cannot be set or changed.
RESOLUTION: The vxtunefs man page now notes that the dalloc_enable and dalloc_limit tunables are not supported for cluster file systems.
* INCIDENT NO:2976664 TRACKING ID:2906018
SYMPTOM: In the event of a system crash, the fsck intent log is not replayed and the file system is marked clean.
As a result, extended operations are not completed when the file system is subsequently mounted.
DESCRIPTION: Only file systems that contain PNOLTs and are mounted locally (without 'mount -o cluster') are potentially exposed to this issue. fsck silently skips the intent-log replay because each PNOLT has a flag that identifies whether its intent log is dirty; after a system crash this flag signifies whether intent-log replay is required. When the system crashes while the file system is mounted locally, the PNOLTs are not in use; however, the fsck intent-log replay still checks the flags in the PNOLTs, which are the wrong flags to check for a locally mounted file system. The replay therefore assumes that the intent logs are clean (because the PNOLTs are not marked dirty) and skips the intent-log replay altogether.
RESOLUTION: The code is modified so that when PNOLTs exist in the file system, VxFS sets the dirty flag in the CFS primary PNOLT while mounting locally. With this change, after a system crash while a file system is locally mounted, the subsequent fsck intent-log replay correctly consults the PNOLT structures and successfully replays the intent log.
* INCIDENT NO:2978227 TRACKING ID:2857751
SYMPTOM: Internal testing hits the assert "f:vx_cbdnlc_enter:1a" while an upgrade is in progress.
DESCRIPTION: A clone/fileset must be mounted before an entry for it can be added to the DNLC. Attempting to add an entry for an unmounted clone/fileset is invalid.
RESOLUTION: A check is added to verify that the fileset is mounted before adding an entry to the DNLC.
* INCIDENT NO:2983739 TRACKING ID:2857731
SYMPTOM: Internal testing hits the assert "f:vx_mapdeinit:1" while the file system is frozen but not disabled.
DESCRIPTION: While performing deinit for a free inode map, the delegation state should not be set. This is a race between the freeze/release-delegation/fs-reinit sequence and the processing of extended operations (extops) during reconfiguration.
RESOLUTION: Appropriate locks are taken during extop processing so that the file system structures remain quiesced during the switch.
* INCIDENT NO:2984589 TRACKING ID:2977697
SYMPTOM: Deleting checkpoints of file systems that contain character special device files (such as /dev/null) using fsckptadm may panic the machine with the following stack trace:
vx_idetach
vx_inode_deinit
vx_idrop
vx_inull_list
vx_workitem_process
vx_worklist_process
vx_worklist_thread
vx_kthread_init
DESCRIPTION: During the checkpoint removal operation, the inodes are converted to pass-through inodes. During the conversion, the device reference for the special file is accessed, which is invalid in the clone context and leads to a panic.
RESOLUTION: The code is modified to remove the device reference of character special files during the clone removal operation, preventing the panic.
* INCIDENT NO:2987373 TRACKING ID:2881211
SYMPTOM: File ACLs are not preserved properly in checkpoints when the file has a hardlink. ACLs on files without hardlinks work correctly.
DESCRIPTION: The issue involves the attribute inode. When an ACL entry is added and resides in the immediate area, it is propagated to the clone. However, when an attribute inode is created, it is not propagated to the checkpoint, because the push is missing in the attribute-inode context.
RESOLUTION: The code is modified to propagate ACL entries to the clone in the attribute inode case.
PATCH ID:6.0.100.200
* INCIDENT NO:2912412 TRACKING ID:2857629
SYMPTOM: When a new node takes over as primary for the file system, it may process stale shared extent records in a per-node queue. The primary detects a bad record and sets the full fsck flag. It also disables the file system to prevent further corruption.
DESCRIPTION: Every node in the cluster that adds or removes references to shared extents appends the shared extent records to a per-node queue. The primary node processes the records in the per-node queues and maintains reference counts in a global shared extent device. In certain cases the primary node may process bad or stale records in a per-node queue. Two situations in which this can happen are:
1. A clone creation is initiated from a secondary node immediately after the primary migrates to a different node.
2. The queue wraps around on any node and a new node takes over as primary immediately afterwards.
A full fsck may not be able to rectify the resulting file system corruption.
RESOLUTION: The per-node shared extent queue head and tail pointers are updated to correct values on the primary before it starts processing shared extent records.
* INCIDENT NO:2912435 TRACKING ID:2885592
SYMPTOM: vxdump of a file system that was compressed using vxcompress aborts.
DESCRIPTION: vxdump aborts because of a malloc() failure, which is caused by a memory leak in the vxdump command code while handling compressed extents.
RESOLUTION: The memory leak is fixed.
* INCIDENT NO:2923805 TRACKING ID:2590918
SYMPTOM: When a new node in the cluster takes over as primary of the file system, there may be a significant delay in freeing up unshared extents. This problem occurs only when shared extent additions or deletions occur immediately after the primary switches over to a different node in the cluster.
DESCRIPTION: When a new node in the cluster takes over as primary for the file system, a file system thread on the new primary performs a full scan of the shared extent device file to free any shared extents that have become completely unshared. If heavy shared-extent activity, such as additional sharing or unsharing of extents, occurs anywhere in the cluster while the full scan is in progress, the scan can be interrupted.
Due to a bug, the interrupted full scan is marked as completed, and subsequently scheduled scans of the shared extent device are only partial scans. This causes a substantial delay in freeing some of the unshared extents in the device file.
RESOLUTION: If the first full scan of the shared extent device after primary takeover is interrupted, the scan is no longer marked as complete.
INCIDENTS FROM OLD PATCHES:
---------------------------
NONE