vxtunefs (1M)

NAME

vxtunefs - tune a Veritas File System

SYNOPSIS

vxtunefs [-b value] [-ps] [-D {print | drefund_enable=value
| lowmem_disable=value | num_pdt=value}] [-f tunefstab]
[-o parameter=value] [{mount_point | special}]...

AVAILABILITY

VRTSvxfs

DESCRIPTION

The vxtunefs command sets or prints tunable I/O parameters of mounted file systems. The vxtunefs command can set parameters describing the I/O properties of the underlying device, parameters to indicate when to treat an I/O as direct I/O, or parameters to control the extent allocation policy for the specified file system.

With no options specified, vxtunefs prints the existing VxFS parameters for the specified file systems.

The vxtunefs command works on a list of mount points specified on the command line, or all the mounted file systems listed in the tunefstab file. The default tunefstab file is /etc/vx/tunefstab. You can change the default by setting the VXTUNEFSTAB environment variable.

The vxtunefs command can be run at any time on a mounted file system, and all parameter changes take immediate effect. Parameters specified on the command line override parameters listed in the tunefstab file.
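For example, assuming a VxFS file system mounted at /mnt1 (a placeholder mount point used throughout these examples), the following commands print the current tunables for that file system and then set a single parameter from the command line:
vxtunefs -p /mnt1
vxtunefs -o read_pref_io=64k /mnt1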

If /etc/vx/tunefstab exists, the VxFS-specific mount command invokes vxtunefs to set device parameters from /etc/vx/tunefstab. The VxFS-specific mount command interacts with VxVM to obtain default values for the tunables, so you need to specify tunables for VxVM devices only to change the defaults.
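As an illustration only (see the tunefstab(4) manual page for the authoritative format), a tunefstab entry typically pairs a device with a comma-separated list of parameter=value settings. For a hypothetical VxVM volume, such an entry might look like:
/dev/vx/dsk/mydg/vol1 read_pref_io=64k,read_nstream=4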

Only a privileged user can run vxtunefs.

NOTES

The vxtunefs command works with Storage Checkpoints; however, VxFS tunables apply to an entire file system, so they affect not only the primary fileset but also any Storage Checkpoint filesets within that file system. Retuning a parameter while the system is under heavy load is not recommended: any change to a tunable parameter requires freezing the file system, which can slow down or stall some file system activities, so retuning may take a long time to complete.

Cluster File System Issues

Whether specified by the command line or the tunefstab file, tunable parameters are propagated to all nodes in the cluster.

SFCFS requires more memory to manage distributed operations, so you may want to adjust related tunables for a cluster-mounted file system.

Cached Quick I/O does not function on CFS, so the qio_cache_enable parameter is not supported.

The max_seqio_extent_size and initial_extent_size parameters are in-memory values that take effect only when the invoking node is the primary, or later becomes the primary, node in the cluster.

Multiple Volume Set Considerations

For a file system that resides on a multiple-volume set, VxFS sets the tunables based on the geometry of the first component volume (volume 0) in the volume set.

OPTIONS

-b value Sets the virtual memory manager (VMM) buffer count. There is a default value for the VMM buffer count, based on the amount of physical memory, and a current value. You can display these two values by entering vxtunefs -b. When VxFS is installed, the default value and the current value are the same. The -b value option specifies an increase, from zero to 100 percent, in the VMM buffer count over its default. The specified value is saved in the file /etc/vx/vxfssystem to make it persistent across VxFS module loads or system reboots.
The system can be tuned by adjusting the VMM buffer count. In most instances, the default value gives good performance. Kernel counters can be monitored to determine whether there are delays caused by waiting for VMM buffers. If a performance issue appears to be related to VMM, increase the buffer count; if response time then improves, it is a good indication that the VMM buffers were a bottleneck.
The vxportal driver must be loaded for this option to work. Run the following command to determine whether the vxportal driver is available:
/etc/methods/vxkextadm portal status
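For example (illustrative value; as described under DIAGNOSTICS, all VxFS file systems must be unmounted before the VMM buffer count can be tuned), the following displays the default and current VMM buffer count values and then raises the count by 10 percent over its default:
vxtunefs -b
vxtunefs -b 10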
-D suboption You can specify one of the following suboptions with the -D option:
print Prints the current values of the parameters, including the value of the drefund supported internal variable. The value (0 or 1) of drefund supported indicates whether, from the perspective of VxFS, the system supports the D_REFUND mode.
drefund_enable=value
  Specifying a value of 1 enables the D_REFUND mode, while a value of 0 disables the D_REFUND mode. In D_REFUND mode, PDT buffers are dynamically grown and shrunk as required.
The D_REFUND mode is supported only on AIX 6.1 TL2 and later releases. The default drefund_enable value is 0 on AIX 5.3 and 1 on AIX 6.1. Correct D_REFUND mode operation requires certain APARs to be installed; see the Veritas Storage Foundation Release Notes. VxFS tracks this D_REFUND support internally and operates in D_REFUND mode only if the drefund_enable value is 1 and the AIX operating system supports the D_REFUND mode, that is, if the drefund supported value is 1. You can check the value of drefund supported by specifying the -D print option.
num_pdt=value
  Sets the number of PDT buffers used by VxFS. The value specified must be between 1 and 64 and should be a power of 2.
If D_REFUND is enabled (drefund_enable=1), the default number of PDT buffers set by VxFS auto-tuning is sufficient, so manual tuning should not be required. If D_REFUND is disabled (drefund_enable=0), the default number of PDT buffers is also auto-tuned, but with different values: the default is 16 PDT buffers when the number of CPUs is between 4 and 128, and 32 PDT buffers when the number of CPUs is greater than 128.
chunk_flush_size=value
  Sets the chunk size used when flushing a large file to disk. To flush an entire file, a single VMM call may be issued. However, if the file is large, issuing a single VMM call can result in scanning and flushing a large number of pages at one time, which can impact buffered I/O performance. To avoid this problem, you can issue multiple VMM calls on successive smaller chunks of the large file. A value of 0 disables this chunked flushing. Values of 1, 2, and 3 set chunk sizes of 256 MB, 128 MB, and 64 MB respectively.
If D_REFUND is enabled, the default chunk size is 256 MB. If D_REFUND is disabled, the default chunk size is 128 MB.
sync_time=value
  Sets the periodic interval for the flushing of asynchronous file data writes to disk. The value ranges from 60 to 180 seconds. With lower values, VxFS flushes more frequently.
The default value is 60 seconds.
read_flush_disable=value
  Enables or disables the flushing and invalidating of pages while performing sequential read I/O under memory pressure.
Specifying a value of 0 enables read flushing, while specifying a value of 1 disables read flushing. Default value is 0.
write_flush_disable=value
  Enables or disables the invalidating of pages during the write flush behind operation under memory pressure.
Specifying a value of 0 enables write flushing, while specifying a value of 1 disables write flushing. Default value is 0 (write flushing enabled).
lowmem_disable=value
  Enables or disables the actions taken by VxFS under memory pressure. The read_flush_disable option and write_flush_disable option form a subset of this option. Symantec recommends that you use the lowmem_disable option to change VxFS behavior under memory pressure instead of the read_flush_disable and write_flush_disable options.
Specifying a value of 0 enables low memory flushing, while specifying a value of 1 disables low memory flushing. Default value is 0.
lm_dirtypage_track_enable=value
  Enables or disables dirty page tracking at the AIX VMM level for locally-mounted VxFS file systems.
Specifying a value of 1 enables local mount dirty page tracking, while specifying a value of 0 disables local mount dirty page tracking. Default value is 1.
cfs_dirtypage_track_enable=value
  Enables or disables dirty page tracking at the AIX VMM level for cluster-mounted VxFS file systems.
Specifying a value of 1 enables cluster mount dirty page tracking, while specifying a value of 0 disables cluster mount dirty page tracking. Default value is 1.
fsync_async_flush_enable=value
  Enables or disables the initial asynchronous flush when VxFS performs an fsync() operation on a file.
Specifying a value of 1 enables the initial asynchronous flush, while specifying a value of 0 disables the initial asynchronous flush. Default value is 0. Symantec recommends that you do not alter the default value of this tunable.
bkgrnd_fsync_enable=value
  Enables or disables the call to the internal VxFS fsync() function to mark a file as having clean pages only, thus avoiding the need to perform repeated flushing of the same file every sync_time seconds.
Specifying a value of 1 enables the call to fsync(), while specifying a value of 0 disables the call to fsync(). Default value is 1. Symantec recommends that you do not alter the default value of this tunable.
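For example (illustrative values), the following prints the current values of these parameters and then enables the D_REFUND mode:
vxtunefs -D print
vxtunefs -D drefund_enable=1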
-f filename Use filename instead of /etc/vx/tunefstab as the file containing tuning parameters.
-o parameter=value
  Specifies parameters for the file systems listed on the command line.
-p Prints the tuning parameters for all the file systems specified on the command line.
-s Sets the new tuning parameters for the Veritas File Systems specified on the command line or in the tunefstab file.
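For example (illustrative; /home/admin/mytunefstab is a placeholder file in tunefstab(4) format), the following applies the tunables listed in an alternate tunefstab file to the file systems named in that file:
vxtunefs -s -f /home/admin/mytunefstab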

VxFS Tuning Parameters and Guidelines

The values for all the following parameters except fcl_keeptime, fcl_winterval, inode_aging_count, inode_aging_size, read_nstream, write_nstream, and qio_cache_enable can be specified in bytes, kilobytes, megabytes, gigabytes, terabytes, or sectors (512 bytes) by appending k, K, m, M, g, G, t, T, s, or S. There is no need for a suffix for the value in bytes.

If the file system is being used with VxVM, it is advisable to let the parameters use the default values based on the volume geometry.

If the file system is being used with a hardware disk array, align the parameters to match the geometry of the logical disk. For disk striping and RAID-5 configurations, set read_pref_io to the stripe unit size or interleave factor and set read_nstream to the number of columns. For disk striping configurations, set write_pref_io and write_nstream to the same values as read_pref_io and read_nstream; for RAID-5 configurations, set write_pref_io to the full stripe size and set write_nstream to 1.
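For example, assuming a striped volume with a 64K stripe unit and 4 columns (illustrative geometry) mounted at /mnt1, the alignment described above could be applied as follows; for a RAID-5 volume with the same geometry, write_pref_io would instead be set to the 256K full stripe size and write_nstream to 1:
vxtunefs -o read_pref_io=64k /mnt1
vxtunefs -o read_nstream=4 /mnt1
vxtunefs -o write_pref_io=64k /mnt1
vxtunefs -o write_nstream=4 /mnt1
With these settings, application reads of 256K (read_nstream multiplied by read_pref_io) line up with the file system read ahead size.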

For an application to do efficient direct I/O or discovered direct I/O, it should issue read requests that are equal to the product of read_nstream and read_pref_io. In general, any multiple or factor of read_nstream multiplied by read_pref_io is a good size for performance. For writing, the same general rule applies to the write_pref_io and write_nstream parameters. When tuning a file system, the best approach is to evaluate the tuning parameters under a real workload.

If an application is doing sequential I/O to large files, the application should issue requests larger than the discovered_direct_iosz. This performs the I/O requests as discovered direct I/O requests which are unbuffered like direct I/O, but which do not require synchronous inode updates when extending the file. If the file is too large to fit in the cache, using unbuffered I/O avoids losing useful data out of the cache and lowers CPU overhead.

delicache_enable
  Specifies whether the performance optimization of inode allocation and reuse during new file creation is enabled. These optimizations are not supported for cluster file systems. You can specify the following values for delicache_enable:
 
0 Disables delicache optimization
1 Enables delicache optimization
The default value of delicache_enable is 1 for local mounts and 0 for cluster file systems.
discovered_direct_iosz
  Any file I/O requests larger than the discovered_direct_iosz are handled as discovered direct I/O. A discovered direct I/O is unbuffered like direct I/O, but it does not require a synchronous commit of the inode when the file is extended or blocks are allocated. For larger I/O requests, the CPU time for copying the data into the page cache and the cost of using memory to buffer the I/O becomes more expensive than the cost of doing the disk I/O. For these I/O requests, using discovered direct I/O is more efficient than regular I/O. The default value of this parameter is 256K.
fcl_keeptime
  Specifies the minimum amount of time, in seconds, that the VxFS File Change Log (FCL) keeps records in the log. When the oldest 8K block of FCL records have been kept longer than the value of fcl_keeptime, they are purged from the FCL file and the extents nearest to the beginning of the FCL file are freed. This process is referred to as "punching a hole." Holes are punched in the FCL file in 8K chunks.
If the fcl_maxalloc parameter is set, records are purged from the FCL file when the amount of space allocated to the FCL file exceeds fcl_maxalloc. This purge occurs even if the elapsed time that the records have been in the log is less than the value of fcl_keeptime. If the file system runs out of space before fcl_keeptime is reached, the FCL file is deactivated.
Either or both of the fcl_keeptime or fcl_maxalloc parameters must be set before the File Change Log can be activated. fcl_keeptime operates only on Version 6 or higher disk layout file systems.
fcl_maxalloc
  Specifies the maximum amount of space that can be allocated to the VxFS File Change Log. The FCL file is a sparse file that grows as changes occur in the file system. When the space allocated to the FCL file reaches the fcl_maxalloc value, the oldest FCL records are purged from the FCL file and the extents nearest to the beginning of the FCL file are freed. This process is referred to as "punching a hole." Holes are punched in the FCL file in 8K chunks. If the file system runs out of space before fcl_maxalloc is reached, the FCL file is deactivated.
Either or both of the fcl_maxalloc or fcl_keeptime parameters must be set before the File Change Log can be activated. fcl_maxalloc operates only on Version 6 or higher disk layout file systems.
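For example (illustrative values), the following keeps FCL records for at least 24 hours and caps the space allocated to the FCL file at 64 MB; either or both parameters must be set before the File Change Log can be activated:
vxtunefs -o fcl_keeptime=86400 /mnt1
vxtunefs -o fcl_maxalloc=64m /mnt1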
fcl_ointerval
  Specifies the time interval in seconds within which subsequent opens of a file do not produce an additional FCL record. This helps to reduce the number of repetitive file-open records logged in the FCL file, especially in the case of frequent accesses through NFS. If the tracking of access information is also enabled, a subsequent file open event within fcl_ointerval might produce a record if the later open is by a different user. Similarly, if an inode goes out of cache and returns, or if there is an FCL sync, there might be more than one file open record within the same open interval. The default value is 600 seconds.
fcl_winterval
  Specifies the time, in seconds, that must elapse before the VxFS File Change Log records a data overwrite, data extending write, or data truncate for a file. The ability to limit the number of repetitive FCL records for continuous writes to the same file is important for file system performance and for applications processing the FCL file. fcl_winterval is best set to an interval less than the shortest interval between reads of the FCL file by any application. This way all applications using the FCL file can be assured of finding at least one FCL record for any file experiencing continuous data changes.
fcl_winterval is enforced for all files in the file system. Each file maintains its own time stamps, and the elapsed time between FCL records is per file. This elapsed time can be overridden using the VxFS FCL sync public API (see the vxfs_fcl_sync(3) manual page).
fcl_winterval operates only on Version 6 or higher disk layout file systems. The default value of fcl_winterval is 3600 seconds.
initial_extent_size
  Changes the default size of the initial extent.
VxFS determines, based on the first write to a new file, the size of the first extent to allocate to the file. Typically the first extent is the smallest power of 2 that is larger than the size of the first write. If that power of 2 is less than 8K, the first extent allocated is 8K. After the initial extent, the file system increases the size of subsequent extents with each allocation. See max_seqio_extent_size.
Because most applications write to files using a buffer size of 8K or less, the increasing extents start doubling from a small initial extent. initial_extent_size changes the default initial extent size to a larger value, so the doubling policy starts from a much larger initial size, and the file system will not allocate a set of small extents at the start of a file.
Use this parameter only on file systems that have a very large average file size. On such file systems, there are fewer extents per file and less fragmentation.
initial_extent_size is measured in file system blocks.
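For example, on a file system with a 1K block size (an assumed block size for illustration), setting initial_extent_size to 8192 blocks makes the first extent of a new large file 8 MB, and the doubling policy starts from that size:
vxtunefs -o initial_extent_size=8192 /mnt1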
inode_aging_count
  Specifies the maximum number of inodes to place on an inode aging list. Inode aging is used in conjunction with file system Storage Checkpoints to allow quick restoration of large, recently deleted files. The aging list is maintained in first-in-first-out (FIFO) order, up to the maximum number of inodes specified by inode_aging_count. As newer inodes are placed on the list, older inodes are removed to complete their aging process. For best performance, it is advisable to age only a limited number of larger files before completion of the removal process. The default maximum number of inodes to age is 2048.
inode_aging_size
  Specifies the minimum size to qualify a deleted inode for inode aging. Inode aging is used in conjunction with file system Storage Checkpoints to allow quick restoration of large, recently deleted files. For best performance, age only a limited number of larger files before completion of the removal process. Setting the size too low can push larger file inodes out of the aging queue to make room for newly removed smaller file inodes.
lazy_copyonwrite
  Shared extents must be replaced with newly allocated extents before modification. Under normal circumstances, the data from the shared extent is copied to the new extent before it is inserted into the file, which causes some performance impact due to the extra disk operations. Postponing the update of the new extent until normal write processing flushes the new data provides a performance benefit. However, if the system performing the write fails before completing the operation, the data that was previously on the disk at the new location may appear inside the file after file system recovery. You can specify the following values for lazy_copyonwrite:
 
0 Disables lazy_copyonwrite optimization
1 Enables lazy_copyonwrite optimization
The default value of lazy_copyonwrite is 0.
This tunable is not supported on disk layouts prior to Version 8.
max_buf_data_size
  Not available.
max_direct_iosz
  Maximum size of a direct I/O request issued by the file system. If there is a larger I/O request, it is broken up into max_direct_iosz chunks. This parameter defines how much memory an I/O request can lock at once; do not set it to more than 20% of the system’s memory.
max_diskq
  Specifies the maximum disk queue generated by a single file. If the number of dirty pages in the disk queue exceeds this limit, the file system prevents writing more data to disk until the amount of data decreases. The default value is 1 megabyte.
Although it does not limit the actual disk queue, max_diskq prevents processes that flush data to disk, such as fsync, from making the system unresponsive.
See the write_throttle description for more information on pages and system memory.
max_seqio_extent_size
  Increases or decreases the maximum size of an extent. When the file system is following its default allocation policy for sequential writes to a file, it allocates an initial extent that is large enough for the first write to the file. When additional extents are allocated, the extents are progressively larger since the algorithm tries to double the size of the file with each new extent. Thus, each extent can hold several writes worth of data. This reduces the total number of extents in anticipation of continued sequential writes. When there are no more writes to the file, unused space is freed for other files to use.
In general, this allocation stops increasing the size of extents at 2048 blocks, which prevents one file from holding too much unused space.
max_seqio_extent_size is measured in file system blocks. The default value for this tunable is 2048 blocks. Setting max_seqio_extent_size to a value less than 2048 automatically resets this tunable to the default value.
oltp_load
  Improves file system cache performance when storing database files in a file system mounted with the default mount options. Some AIX system tunable parameters must be tuned to obtain better performance before using oltp_load. For example, the virtual memory manager (VMM) maxclient tunable, plus the database and application memory usage, must not exceed the total system memory size. To enable oltp_load, set the value to 1. The default value is 0.
qio_cache_enable
  Enables or disables caching on Quick I/O for Databases files. The default behavior is to disable caching. To enable caching, set qio_cache_enable to 1.
On systems with large amounts of memory, the database cannot always use all of the memory as a cache. By enabling file system caching as a second level cache, performance can improve. However, if the database supports large cache sizes, it is advisable to use database caching. See the Veritas Storage Foundation for databases product documentation for more information.
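For example (illustrative), the following enables caching on Quick I/O files for the file system mounted at /mnt1:
vxtunefs -o qio_cache_enable=1 /mnt1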
read_ahead
  In the absence of a specific caching advisory, the default for all VxFS read operations is to perform sequential read ahead. The enhanced read ahead functionality implements an algorithm that allows read aheads to detect more elaborate patterns (such as increasing or decreasing read offsets, or multithreaded file accesses) in addition to simple sequential reads. You can specify the following values for read_ahead:
 
0 Disables read ahead functionality
1 Retains traditional sequential read ahead behavior
2 Enables enhanced read ahead for all reads
By default, read_ahead is set to 1, that is, VxFS detects only sequential patterns.
read_ahead detects patterns on a per-thread basis, up to a maximum of vx_era_nthreads. The default number of threads is 5.
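For example (illustrative), the following enables enhanced read ahead so that non-sequential but patterned reads are also detected:
vxtunefs -o read_ahead=2 /mnt1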
read_nstream
  The number of parallel read requests of size read_pref_io that can be outstanding at one time. The file system uses the product of read_nstream and read_pref_io to determine its read ahead size. The default value for read_nstream is 1.
read_pref_io
  The preferred read request size. The file system uses this in conjunction with the read_nstream value to determine how much data to read ahead. The default value is 64K.
thin_friendly_alloc
  Enables or disables thin friendly allocations. Specifying a value of 1 enables thin friendly allocations, while specifying a value of 0 disables thin friendly allocations. The default value is 1 for thinrclm volumes, and 0 for all other volume types. You must turn on delicache_enable before you can activate this feature.
write_nstream
  The number of parallel write requests of size write_pref_io that can be outstanding at one time. The file system uses the product of write_nstream and write_pref_io to determine when to do flush behind on writes. The default value for write_nstream is 1.
write_pref_io
  The preferred write request size. The file system uses this in conjunction with the write_nstream value to determine how to do flush behind on writes. The default value is 64K.
write_throttle
  When data is written to a file through buffered writes, the file system updates only the in-memory image of the file, creating what are referred to as dirty pages. Dirty pages are cleaned when the file system later writes the data in these pages to disk. Note that data can be lost if the system crashes before dirty pages are written to disk.
Newer model computer systems typically have more memory. The more physical memory a system has, the more dirty pages the file system can generate before having to write the pages to disk to free up memory. So more dirty pages can potentially lead to longer return times for operations that write dirty pages to disk such as sync and fsync. If your system has a combination of a slow storage device and a large amount of memory, the sync operations may take long enough to complete that it gives the appearance of a hung system.
If your system is exhibiting this behavior, you can change the value of write_throttle. write_throttle lets you lower the number of dirty pages per file that the file system will generate before writing them to disk. After the number of dirty pages for a file reaches the write_throttle threshold, the file system starts flushing pages to disk even if free memory is still available. Depending on the speed of the storage device, user write performance may suffer, but the number of dirty pages is limited, so sync operations will complete much faster.
The default value of write_throttle is zero, which places no limit on the number of dirty pages per file. This typically generates a large number of dirty pages, but maintains fast writes. If write_throttle is non-zero, VxFS limits the number of dirty pages per file to write_throttle pages. In some cases, write_throttle may delay write requests. For example, lowering the value of write_throttle may increase the file disk queue to the max_diskq value, delaying user writes until the disk queue decreases. So unless the system has a combination of large physical memory and slow storage devices, it is advisable not to change the value of write_throttle.
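For example, to limit each file to roughly 16 MB of dirty pages on a system with a 4K page size (an assumed page size for illustration), write_throttle could be set to 4096 pages:
vxtunefs -o write_throttle=4096 /mnt1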

DIAGNOSTICS

The vxtunefs command returns the following errors:
EBUSY There are still mounted file systems. Unmount all file systems before tuning the VMM buffer count.
E2BIG There is not enough physical memory to handle the specified percent increase in the VMM buffer count.
EINVAL The specified value is not in the 0-100 percent range.

FILES

/etc/vx/tunefstab VxFS tuning parameters table.
/etc/vx/vxfssystem Contains the value of the VMM buffer count increase, to make it persistent across VxFS module loads or system reboots.

SEE ALSO

mount, mkfs, sync, vxfs_fcl_sync(3), tunefstab(4), vxfsio(7)

Veritas File System Administrator’s Guide,
Veritas Volume Manager Administrator’s Guide,
Veritas Storage Foundation for databases documentation.


VxFS 5.1 SP1 vxtunefs (1M)