If the file system is being used with VxVM, it is advisable to let the parameters use the default values based on the volume geometry. If the file system is being used with a hardware disk array, align the parameters to match the geometry of the logical disk.

For disk striping and RAID-5 configurations, set read_pref_io to the stripe unit size or interleave factor and set read_nstream to the number of columns. For disk striping configurations, set write_pref_io and write_nstream to the same values as read_pref_io and read_nstream; for RAID-5 configurations, set write_pref_io to the full stripe size and set write_nstream to 1.

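As an illustration, the following Python sketch derives these recommendations from the volume geometry. The 64K stripe unit and four columns are made-up example values, and whether a RAID-5 "full stripe" includes the parity column depends on your layout:

    # Sketch: suggested VxFS I/O tunables from disk geometry.
    # The geometry below is a hypothetical example, not a default.
    STRIPE_UNIT = 64 * 1024   # stripe unit size / interleave factor, in bytes
    COLUMNS = 4               # number of columns

    def suggested_tunables(layout):
        """Return suggested tunable values for a 'stripe' or 'raid5' layout."""
        t = {"read_pref_io": STRIPE_UNIT, "read_nstream": COLUMNS}
        if layout == "stripe":
            # Striping: write settings mirror the read settings.
            t["write_pref_io"] = STRIPE_UNIT
            t["write_nstream"] = COLUMNS
        elif layout == "raid5":
            # RAID-5: full-stripe writes, one stream. Taken here as
            # stripe unit times columns; adjust if parity is excluded.
            t["write_pref_io"] = STRIPE_UNIT * COLUMNS
            t["write_nstream"] = 1
        return t

    print(suggested_tunables("raid5"))
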
For an application to do efficient direct I/O or discovered direct I/O, it should issue read requests equal to the product of read_nstream and read_pref_io. In general, any multiple or factor of read_nstream multiplied by read_pref_io is a good request size for performance. For writing, the same rule applies to the write_pref_io and write_nstream parameters.

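For example, with a hypothetical setting of read_nstream = 4 and the default read_pref_io of 64K, good application request sizes fall out directly:

    # Sketch: candidate application I/O sizes from the read tunables.
    read_pref_io = 64 * 1024   # documented default
    read_nstream = 4           # hypothetical value for a 4-column stripe

    ideal = read_nstream * read_pref_io           # 256K in this example
    candidates = [ideal // 2, ideal, ideal * 2]   # factors and multiples
    print([size // 1024 for size in candidates])  # -> [128, 256, 512] (KB)
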
When tuning a file system, the best approach is to evaluate the tuning parameters under a real workload.

If an application is doing sequential I/O to large files, it should issue requests larger than discovered_direct_iosz. Such requests are performed as discovered direct I/O, which is unbuffered like direct I/O but does not require synchronous inode updates when extending the file. If the file is too large to fit in the cache, using unbuffered I/O avoids pushing useful data out of the cache and lowers CPU overhead.

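A minimal sketch of such a reader follows; the path is hypothetical, and the 1 MB request size is arbitrary as long as it exceeds discovered_direct_iosz (256K by default). Opening with buffering=0 keeps the runtime from splitting the requests:

    # Sketch: sequential reads sized above discovered_direct_iosz so that
    # VxFS can service them as discovered direct I/O.
    CHUNK = 1024 * 1024  # 1 MB per request; any size > 256K qualifies by default

    with open("/mnt/vxfs/bigfile", "rb", buffering=0) as f:  # hypothetical path
        while True:
            buf = f.read(CHUNK)
            if not buf:
                break
            # ... process buf ...
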
dalloc_enable

Enables or disables delayed allocations. You can specify the following values for dalloc_enable:

  0    Disables delayed allocations
  1    Enables delayed allocations

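Tunables such as this one are set with the vxtunefs command. The sketch below drives it from Python; the -o name=value syntax and the mount point are assumptions to check against your platform's vxtunefs manual page:

    import subprocess

    def set_vxfs_tunable(mount_point, name, value):
        # Assumed invocation: vxtunefs -o tunable=value /mount/point
        subprocess.run(["vxtunefs", "-o", f"{name}={value}", mount_point],
                       check=True)

    set_vxfs_tunable("/mnt/vxfs", "dalloc_enable", 1)  # hypothetical mount point
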
delicache_enable

Specifies whether the performance optimization of inode allocation and reuse during new file creation is turned on. You can specify the following values for delicache_enable:

  0    Disables delicache optimization
  1    Enables delicache optimization

The default value of delicache_enable is 1. However, the performance benefits on a cluster file system are limited compared to a local mount.

discovered_direct_iosz

Any file I/O request larger than discovered_direct_iosz is handled as discovered direct I/O. A discovered direct I/O is unbuffered like direct I/O, but it does not require a synchronous commit of the inode when the file is extended or blocks are allocated. For larger I/O requests, the CPU time for copying the data into the page cache and the cost of using memory to buffer the I/O become more expensive than the cost of doing the disk I/O. For these I/O requests, using discovered direct I/O is more efficient than regular I/O. The default value of this parameter is 256K.

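The handoff can be pictured as a simple size test; this is a sketch of the decision, not the actual VxFS code path:

    DISCOVERED_DIRECT_IOSZ = 256 * 1024  # documented default

    def io_strategy(request_bytes, direct_requested=False):
        # Sketch only; real VxFS also honors caching advisories and alignment.
        if direct_requested:
            return "direct"
        if request_bytes > DISCOVERED_DIRECT_IOSZ:
            return "discovered direct"  # unbuffered, no synchronous inode commit
        return "buffered"               # copied through the page cache

    print(io_strategy(512 * 1024))  # -> discovered direct
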
fcl_keeptime

Specifies the minimum amount of time, in seconds, that the VxFS File Change Log (FCL) keeps records in the log. When the oldest 8K block of FCL records has been kept longer than the value of fcl_keeptime, it is purged from the FCL file and the extents nearest to the beginning of the FCL file are freed. This process is referred to as "punching a hole." Holes are punched in the FCL file in 8K chunks.

If the fcl_maxalloc parameter is set, records are purged from the FCL file when the amount of space allocated to the FCL file exceeds fcl_maxalloc. This purge occurs even if the elapsed time that the records have been in the log is less than the value of fcl_keeptime. If the file system runs out of space before fcl_keeptime is reached, the FCL file is deactivated.

Either or both of the fcl_keeptime or fcl_maxalloc parameters must be set before the File Change Log can be activated. fcl_keeptime operates only on Version 6 or higher disk layout file systems.

fcl_maxalloc

Specifies the maximum amount of space that can be allocated to the VxFS File Change Log. The FCL file is a sparse file that grows as changes occur in the file system. When the space allocated to the FCL file reaches the fcl_maxalloc value, the oldest FCL records are purged from the FCL file and the extents nearest to the beginning of the FCL file are freed. This process is referred to as "punching a hole." Holes are punched in the FCL file in 8K chunks. If the file system runs out of space before fcl_maxalloc is reached, the FCL file is deactivated.

Either or both of the fcl_maxalloc or fcl_keeptime parameters must be set before the File Change Log can be activated. fcl_maxalloc operates only on Version 6 or higher disk layout file systems.

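A toy model of how the two limits interact when purging; this is illustrative only, not VxFS code, and fcl_keeptime here simply marks when a block becomes eligible for age-based purging:

    import time

    def purge_fcl(blocks, fcl_keeptime, fcl_maxalloc, now=None):
        """blocks: (birth_time, size) tuples for 8K FCL blocks, oldest first.
        Returns the blocks that survive purging. Toy model, not VxFS code."""
        now = time.time() if now is None else now
        allocated = sum(size for _, size in blocks)
        kept = list(blocks)
        while kept:
            birth, size = kept[0]
            too_old = now - birth > fcl_keeptime
            too_big = fcl_maxalloc and allocated > fcl_maxalloc
            if not (too_old or too_big):
                break
            kept.pop(0)        # punch a hole at the start of the FCL file
            allocated -= size
        return kept
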
fcl_ointerval

Specifies the time interval, in seconds, within which subsequent opens of a file do not produce an additional FCL record. This helps to reduce the number of repetitive file-open records logged in the FCL file, especially in the case of frequent accesses through NFS. If the tracking of access information is also enabled, a subsequent file open event within fcl_ointerval might produce a record if the latter open is by a different user. Similarly, if an inode goes out of cache and returns, or if there is an FCL sync, there might be more than one file open record within the same open interval. The default value is 600 seconds.

fcl_winterval

Specifies the time, in seconds, that must elapse before the VxFS File Change Log records a data overwrite, data extending write, or data truncate for a file. The ability to limit the number of repetitive FCL records for continuous writes to the same file is important for file system performance and for applications processing the FCL file. fcl_winterval is best set to an interval less than the shortest interval between reads of the FCL file by any application. This way, all applications using the FCL file can be assured of finding at least one FCL record for any file experiencing continuous data changes.

fcl_winterval is enforced for all files in the file system. Each file maintains its own time stamps, and the elapsed time between FCL records is per file. This elapsed time can be overridden using the VxFS FCL sync public API (see the vxfs_fcl_sync(3) manual page).

fcl_winterval operates only on Version 6 or higher disk layout file systems. The default value of fcl_winterval is 3600 seconds.

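The effect is a per-file rate limit on write records, as in this illustrative sketch:

    # Sketch: suppress repeated FCL write records within fcl_winterval.
    last_write_record = {}  # per-file timestamp of the last logged write

    def should_log_write(inode, now, fcl_winterval=3600):
        """Return True if a write record should be logged for this inode."""
        last = last_write_record.get(inode)
        if last is None or now - last >= fcl_winterval:
            last_write_record[inode] = now
            return True
        return False

    # Continuous writes to one file yield one record per interval:
    print([should_log_write(42, t) for t in (0, 10, 3599, 3600)])
    # -> [True, False, False, True]
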
fiostats_enable

Specifies whether file I/O statistics collection is turned on. You can specify the following values for fiostats_enable:

  0    Disables file I/O statistics collection
  1    Enables file I/O statistics collection

The default value of fiostats_enable is 0.

lazy_isize_enable

Enables or disables a performance optimization in cluster file systems. When one node in a cluster is extending a file, the optimization is to not reflect the updated file size immediately on the other nodes. Note that if this tunable is enabled, the file size reported by stat might be stale, but this has no impact on other file operations. You can specify the following values for lazy_isize_enable:

  0    Disables the performance optimization
  1    Enables the performance optimization

The default value of lazy_isize_enable is 0.

initial_extent_size

Changes the default size of the initial extent.

VxFS determines, based on the first write to a new file, the size of the first extent to allocate to the file. Typically, the first extent is the smallest power of 2 that is larger than the size of the first write. If that power of 2 is less than 8K, the first extent allocated is 8K. After the initial extent, the file system increases the size of subsequent extents with each allocation. See max_seqio_extent_size.

Because most applications write to files using a buffer size of 8K or less, the increasing extents start doubling from a small initial extent. initial_extent_size changes the default initial extent size to a larger value, so the doubling policy starts from a much larger initial size, and the file system does not allocate a set of small extents at the start of a file.

Use this parameter only on file systems that have a very large average file size. On such file systems, there are fewer extents per file and less fragmentation.

initial_extent_size is measured in file system blocks.

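The default rule can be written out as follows; this sketch works in bytes for simplicity, whereas the tunable itself is given in file system blocks:

    def first_extent_bytes(first_write_bytes, initial_extent_size=None):
        """Sketch: smallest power of 2 >= the first write, with an 8K floor.
        A non-None initial_extent_size (bytes here) overrides the default."""
        if initial_extent_size:
            return initial_extent_size
        size = 8 * 1024
        while size < first_write_bytes:
            size *= 2
        return size

    print(first_extent_bytes(5000))    # -> 8192
    print(first_extent_bytes(100000))  # -> 131072
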
inode_aging_count

Specifies the maximum number of inodes to place on an inode aging list. Inode aging is used in conjunction with file system Storage Checkpoints to allow quick restoration of large, recently deleted files. The aging list is maintained in first-in-first-out (FIFO) order, up to the maximum number of inodes specified by inode_aging_count. As newer inodes are placed on the list, older inodes are removed to complete their aging process. For best performance, it is advisable to age only a limited number of larger files before completion of the removal process. The default maximum number of inodes to age is 2048.

inode_aging_size

Specifies the minimum size to qualify a deleted inode for inode aging. Inode aging is used in conjunction with file system Storage Checkpoints to allow quick restoration of large, recently deleted files. For best performance, age only a limited number of larger files before completion of the removal process. Setting the size too low can push larger file inodes out of the aging queue to make room for newly removed smaller file inodes.

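Together, inode_aging_count and inode_aging_size describe a bounded FIFO; a toy model (the 1 MB threshold is a hypothetical setting):

    from collections import deque

    INODE_AGING_COUNT = 2048    # default maximum list length
    INODE_AGING_SIZE = 1 << 20  # hypothetical 1 MB qualification threshold

    aging_list = deque(maxlen=INODE_AGING_COUNT)  # appends evict the oldest

    def on_delete(inode, file_size):
        """Toy model: queue large deleted files for deferred removal."""
        if file_size >= INODE_AGING_SIZE:
            aging_list.append(inode)  # may push the oldest inode off the list
            return "aged"
        return "removed immediately"
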
lazy_copyonwrite

Shared extents must be replaced with newly allocated extents before modification. Under normal circumstances, the data from the shared extent is copied to the new extent before it is inserted into the file, which causes some performance impact due to the extra disk operations. Postponing the update of the new extent until normal write processing flushes the new data provides a performance benefit. However, if the system performing the write fails before completing the operation, the data that was previously on the disk at the new location may appear inside the file after file system recovery. You can specify the following values for lazy_copyonwrite:

  0    Disables lazy_copyonwrite optimization
  1    Enables lazy_copyonwrite optimization

The default value of lazy_copyonwrite is 0.

This tunable is not supported on disk layouts prior to Version 8.

max_buf_data_size

Not available.

max_direct_iosz

The maximum size of a direct I/O request issued by the file system. A larger I/O request is broken up into max_direct_iosz chunks. This parameter defines how much memory an I/O request can lock at once; do not set it to more than 20% of the system's memory.

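The splitting is straightforward, as in this sketch (the 1 MB setting is a hypothetical value):

    def split_direct_io(offset, length, max_direct_iosz=1024 * 1024):
        """Sketch: break one large direct I/O into max_direct_iosz chunks."""
        chunks = []
        while length > 0:
            n = min(length, max_direct_iosz)
            chunks.append((offset, n))
            offset += n
            length -= n
        return chunks

    print(split_direct_io(0, 2_500_000))
    # -> [(0, 1048576), (1048576, 1048576), (2097152, 402848)]
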
max_diskq

Specifies the maximum disk queue generated by a single file. If the number of dirty pages in the disk queue exceeds this limit, the file system prevents more data from being written to disk until the amount of queued data decreases. The default value is 1 megabyte.

Although it does not limit the actual disk queue, max_diskq prevents processes that flush data to disk, such as fsync, from making the system unresponsive.

See the write_throttle description for more information on pages and system memory.

max_seqio_extent_size

Increases or decreases the maximum size of an extent. When the file system is following its default allocation policy for sequential writes to a file, it allocates an initial extent that is large enough for the first write to the file. When additional extents are allocated, they are progressively larger because the algorithm tries to double the size of the file with each new extent. Thus, each extent can hold several writes' worth of data, which reduces the total number of extents in anticipation of continued sequential writes. When there are no more writes to the file, unused space is freed for other files to use.

In general, this allocation stops increasing the size of extents at max_seqio_extent_size blocks, which prevents one file from holding too much unused space.

max_seqio_extent_size is measured in file system blocks. The default value for this tunable is 32768 blocks. Setting max_seqio_extent_size to a value less than 2048 blocks automatically resets this tunable to 2048 blocks (the minimum value).

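A sketch of the resulting progression, simplified to doubling the extent size itself (sizes in file system blocks):

    def extent_progression(first_extent, allocations, max_seqio_extent_size=32768):
        """Sketch: extent sizes under the doubling policy, capped at
        max_seqio_extent_size blocks (values below 2048 reset to 2048)."""
        cap = max(max_seqio_extent_size, 2048)
        sizes, size = [], first_extent
        for _ in range(allocations):
            sizes.append(size)
            size = min(size * 2, cap)
        return sizes

    print(extent_progression(8, 6))  # -> [8, 16, 32, 64, 128, 256]
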
odm_cache_enable

Enables or disables caching for database files accessed via ODM. To enable caching, set odm_cache_enable to 1. This enables caching for the file system, but you must also configure which files or I/O types are to be cached by using the odmadm command.

See the odmadm(1) manual page.

odm_cache_enable is synonymous with qio_cache_enable; activating one activates the other.

pdir_enable

Enables or disables partitioned directories. Specifying a value of 1 enables partitioned directories, while specifying a value of 0 disables them. The default value is 0.

The pdir_enable tunable operates only on Version 8 or later disk layout file systems.

pdir_threshold

Sets the threshold value, in terms of directory size in bytes, beyond which a directory is partitioned if the pdir_enable tunable is set to 1. The default value is 32768. The pdir_threshold tunable operates only on Version 8 or later disk layout file systems.

read_ahead

In the absence of a specific caching advisory, the default for all VxFS read operations is to perform sequential read ahead. The enhanced read ahead functionality implements an algorithm that allows read aheads to detect more elaborate patterns (such as increasing or decreasing read offsets, or multithreaded file accesses) in addition to simple sequential reads. You can specify the following values for read_ahead:

  0    Disables read ahead functionality
  1    Retains traditional sequential read ahead behavior
  2    Enables enhanced read ahead for all reads

By default, read_ahead is set to 1; that is, VxFS detects only sequential patterns.

read_ahead detects patterns on a per-thread basis, up to a maximum of vx_era_nthreads. The default number of threads is 5.

read_nstream

The number of parallel read requests of size read_pref_io that can be outstanding at one time. The file system uses the product of read_nstream and read_pref_io to determine its read ahead size. The default value for read_nstream is 1.

read_pref_io

The preferred read request size. The file system uses this in conjunction with the read_nstream value to determine how much data to read ahead. The default value is 64K.

thin_friendly_alloc

Enables or disables thin friendly allocations. Specifying a value of 1 enables thin friendly allocations, while specifying a value of 0 disables them. The default value is 1 for thinrclm volumes and 0 for all other volume types. You must turn on delicache_enable before you can activate this feature. This tunable is not supported for cluster file systems.

write_nstream

The number of parallel write requests of size write_pref_io that can be outstanding at one time. The file system uses the product of write_nstream and write_pref_io to determine when to do flush behind on writes. The default value for write_nstream is 1.

write_pref_io

The preferred write request size. The file system uses this in conjunction with the write_nstream value to determine how to do flush behind on writes. The default value is 64K.

write_throttle

When data is written to a file through buffered writes, the file system updates only the in-memory image of the file, creating what are referred to as dirty pages. Dirty pages are cleaned when the file system later writes the data in these pages to disk. Note that data can be lost if the system crashes before dirty pages are written to disk.

Newer model computer systems typically have more memory. The more physical memory a system has, the more dirty pages the file system can generate before having to write the pages to disk to free up memory. So more dirty pages can potentially lead to longer return times for operations that write dirty pages to disk, such as sync and fsync. If your system has a combination of a slow storage device and a large amount of memory, the sync operations may take long enough to complete that the system appears to hang.

If your system is exhibiting this behavior, you can change the value of write_throttle. write_throttle lets you lower the number of dirty pages per file that the file system generates before writing them to disk. After the number of dirty pages for a file reaches the write_throttle threshold, the file system starts flushing pages to disk even if free memory is still available. Depending on the speed of the storage device, user write performance may suffer, but the number of dirty pages is limited, so sync operations complete much faster.

The default value of write_throttle is zero, which places no limit on the number of dirty pages per file. This typically generates a large number of dirty pages but maintains fast writes. If write_throttle is non-zero, VxFS limits the number of dirty pages per file to write_throttle pages. In some cases, write_throttle may delay write requests. For example, lowering the value of write_throttle may increase the file disk queue to the max_diskq value, delaying user writes until the disk queue decreases. So unless the system has a combination of large physical memory and slow storage devices, it is advisable not to change the value of write_throttle.

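Its effect can be modeled as a per-file cap on dirty pages; a toy model, not VxFS internals:

    def buffered_write(dirty_pages, new_pages, write_throttle=0):
        """Toy model: a non-zero write_throttle caps dirty pages per file,
        forcing a flush once the cap is reached."""
        dirty_pages += new_pages
        flushed = 0
        if write_throttle and dirty_pages >= write_throttle:
            flushed = dirty_pages  # flush even though free memory remains
            dirty_pages = 0
        return dirty_pages, flushed

    print(buffered_write(900, 200, write_throttle=1024))  # -> (0, 1100)
    print(buffered_write(900, 200))  # default of zero: no limit -> (1100, 0)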