vxfsio - VxFS file system control functions
cc -I /opt/VRTSvxfs/include -L /opt/VRTSvxfs/lib -l vxfsutil -ldl
int ioctl (int fildes, int cmd, arg);
The VxFS ioctl(2) enhancements provide extended control for open files.
The requirements for direct I/O are as follows:
o The starting file offset must be aligned to a 512-byte boundary.
o The ending file offset must be aligned to a 512-byte boundary, or the length must be a multiple of 512 bytes.
o The memory buffer must start on a 512-byte boundary.
If the I/O is performed using the readv(2) and writev(2) system calls, these restrictions apply to each element of the array of struct iovec.
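The alignment rules above can be checked in user space before a request is issued. The following is a minimal sketch, not part of VxFS: the names DIO_ALIGN, dio_request_aligned, and dio_iovec_aligned are hypothetical helpers introduced here for illustration, and the 512-byte unit is the portable value quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Portable alignment unit quoted in the text (hypothetical macro name). */
#define DIO_ALIGN 512

/* True if the buffer address, starting offset, and length are all
 * multiples of 512. With an aligned start, an aligned length also
 * implies an aligned ending offset. */
static bool dio_request_aligned(const void *buf, off_t offset, size_t len)
{
    return offset >= 0
        && (offset % DIO_ALIGN) == 0
        && (len % DIO_ALIGN) == 0
        && ((uintptr_t)buf % DIO_ALIGN) == 0;
}

/* For readv(2)/writev(2): every element of the iovec array must meet
 * the same rules. The file offset advances by each element's length. */
static bool dio_iovec_aligned(const struct iovec *iov, int iovcnt, off_t offset)
{
    for (int i = 0; i < iovcnt; i++) {
        if (!dio_request_aligned(iov[i].iov_base, offset, iov[i].iov_len))
            return false;
        offset += (off_t)iov[i].iov_len;
    }
    return true;
}
```

A buffer that satisfies the address requirement can be obtained with posix_memalign(3) using an alignment of 512.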
The argument fildes is an open file descriptor. The data type and value of arg are specific to the type of command specified by cmd.
The symbolic names for commands and file status flags are defined by the sys/fs/vx_ioctl.h header file. The available VxFS ioctls are:
o VX_FREEZE
o VX_FREEZE_ALL
o VX_THAW
o VX_GETCACHE
o VX_SETCACHE
o VX_GETEXT
o VX_SETEXT
o VX_GETFSOPT
o VX_GET_IOPARAMETERS
VX_FREEZE Sync then freeze the file system. After it is frozen, all operations on the file system are blocked until a VX_THAW operation is received. The argument arg is a timeout value expressed in seconds. If a VX_THAW operation is not received within the specified timeout interval, the file system performs a VX_THAW operation automatically. Only privileged users can run this command on the root directory of the file system. The VX_FREEZE ioctl returns a zero if the file system is successfully frozen. If the operation fails, the return value is -1 and the external variable errno is a general DIAGNOSTIC.

VX_FREEZE_ALL Sync then freeze multiple file systems. This ioctl is identical to VX_FREEZE except that multiple file systems can be specified.

VX_THAW Unblocks a file system that was frozen by a VX_FREEZE operation. The process that is to issue a VX_THAW operation must have the root directory of the file system open, and must ensure that it does not access the file system after the file system is frozen so that the process itself does not block. Only privileged users can run this command on the root directory of the file system. The VX_THAW ioctl returns a zero if the file system is successfully thawed. If the operation fails, the return value is -1 and the external variable errno is a general DIAGNOSTIC or one of the diagnostics listed in the DIAGNOSTICS section.

VX_GETCACHE Gets caching advisories in effect for the file. The argument arg is a pointer to an int. The VX_GETCACHE ioctl returns a zero if the caching advisories are successfully obtained and the advisories are returned in arg. If the operation fails, the return value is -1 and the external variable errno is a general DIAGNOSTIC.

VX_SETCACHE Sets caching advisories. These advisories allow an application to indicate to the file system which forms of caching are most advantageous. The values for arg are such that multiple advisories may be set by combining values with bitwise OR operations.
The possible values for arg are VX_DIRECT, VX_DSYNC, VX_NOREUSE, VX_RANDOM, VX_SEQ, and VX_UNBUFFERED:
The VX_RANDOM and VX_SEQ caching advisories are mutually exclusive. Similarly, only one of the VX_DIRECT, VX_DSYNC, or VX_UNBUFFERED caching advisories may be set.
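The mutual-exclusion rules can be checked on a combined mask before it is passed to VX_SETCACHE. The sketch below is illustrative only: the DEMO_* bit values are placeholders invented for this example (the real VX_* constants are defined in sys/fs/vx_ioctl.h and may have different values), and advisories_compatible is a hypothetical helper, not a VxFS function.

```c
#include <stdbool.h>

/* PLACEHOLDER bit values for illustration only. On a VxFS host, use the
 * real VX_* constants from sys/fs/vx_ioctl.h instead. */
enum {
    DEMO_RANDOM     = 1 << 0,
    DEMO_SEQ        = 1 << 1,
    DEMO_DIRECT     = 1 << 2,
    DEMO_NOREUSE    = 1 << 3,
    DEMO_DSYNC      = 1 << 4,
    DEMO_UNBUFFERED = 1 << 5
};

/* Returns true if the OR-ed advisory mask respects both rules:
 *   - RANDOM and SEQ are not set together;
 *   - at most one of DIRECT, DSYNC, UNBUFFERED is set. */
static bool advisories_compatible(unsigned mask)
{
    if ((mask & DEMO_RANDOM) && (mask & DEMO_SEQ))
        return false;

    unsigned modes = mask & (DEMO_DIRECT | DEMO_DSYNC | DEMO_UNBUFFERED);

    /* (x & (x - 1)) == 0 holds exactly when x has zero or one bit set. */
    return (modes & (modes - 1)) == 0;
}
```

For example, a sequential direct I/O mask (DEMO_SEQ | DEMO_DIRECT) passes the check, while DEMO_DIRECT | DEMO_DSYNC does not.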
VX_DIRECT Indicates that data associated with read and write operations is transferred directly to or from the user-supplied buffer without being cached. Write I/O operations complete as defined by synchronized I/O data integrity completion. That is, the data is stored persistently on disk before the system call returns. I/O operations must be on a file offset that is a multiple of 512 bytes. The length of the I/O must also be a multiple of 512 bytes. In addition, the address of the buffer must be 512-byte aligned. If the I/O is performed using the readv(2) and writev(2) system calls, these restrictions apply to each element of the array of struct iovec. If an I/O request fails to meet alignment criteria or the file is currently mapped, the I/O request is performed as a data synchronous I/O operation. See the mmap(2) manual page. The alignment required to perform direct I/O on a given platform and operating system release may be less restrictive than above, but meeting these requirements allows direct I/O to work on any platform. On HP-UX, direct I/O will be more efficient if the starting and ending file offsets are aligned on file system block boundaries as reported in the field f_frsize of statvfs(2).

VX_DSYNC Indicates data synchronous I/O mode. In this mode, write I/O operations on the file descriptor complete as defined by synchronized I/O data integrity completion. That is, the system call does not return until the file data and any metadata necessary to retrieve the file data has been transferred to stable storage. Inode time stamp updates are not necessary to retrieve the file data.

VX_ERA Enables enhanced read ahead functionality for a specific file. Enhanced read ahead functionality implements read aheads that detect more elaborate read patterns, such as multithreaded file access, in addition to simple sequential reads. It is not advisable to combine VX_ERA with VX_SEQ or VX_RANDOM. See the vxtunefs(1M) manual page for information on the read_ahead tunable.

VX_NOREUSE Indicates that buffered data does not need to be retained for further use by the application.

VX_RANDOM Indicates that the file is being accessed randomly. Read ahead is not performed.

VX_SEQ Indicates that the file is being accessed sequentially. Maximum read ahead is performed.

VX_UNBUFFERED Indicates that data associated with read and write operations is transferred directly to or from the user-supplied buffer without being cached. The alignment constraints are identical to those associated with the VX_DIRECT caching advisory. If these requirements are not met, the I/O request is performed as buffered I/O. In contrast to VX_DIRECT, the VX_UNBUFFERED advisory does not guarantee synchronized I/O data integrity completion, but it may offer better performance for appending writes.
The VX_RANDOM, VX_SEQ, and VX_NOREUSE caching advisories are maintained on a per-file basis. Changes made to these advisories by a process affect I/O operations by all processes currently accessing the file.
The VX_DIRECT, VX_DSYNC, and VX_UNBUFFERED caching advisories are maintained on a per-open instance of a file, so changes made to these advisories by a process do not affect the setting of these advisories, and therefore I/O operations, by another process.
The VX_SETCACHE ioctl returns a zero if the caching advisories are successfully set. If the operation fails, the return value is -1 and the external variable errno is a general DIAGNOSTIC.
VX_GETEXT Gets extent information. Returns the extent information associated with fildes. The argument arg points to a structure of type vx_ext as defined in sys/fs/vx_ioctl.h. Only persistent extent attributes are visible. The VX_GETEXT ioctl returns a zero if the extent information is successfully obtained. If the operation fails, the return value is -1 and the external variable errno is a general DIAGNOSTIC.
VX_SETEXT Sets extent information. The extent information is set according to the parameters specified by arg. The argument arg points to a structure of type vx_ext defined in sys/fs/vx_ioctl.h. This structure contains the following members:
off_t ext_size;   /* Extent size in fs blocks */
off_t reserve;    /* Space reservation in fs blocks */
int   a_flags;    /* Allocation flags */
The ext_size element requests a fixed extent size, in blocks, for the file. If a fixed extent size is not required, use zero to allow the default allocation policy to be used. Changes to the fixed extent size made after the file contains indirect blocks have no effect unless all current indirect blocks are freed via file truncation or reservation deallocation.

The reserve element sets the amount of space preallocated to the file (in blocks). If the reserve amount is greater than the current reservation, the allocation for the file is increased to match the reserve amount. If the reserve amount is less than the current reservation, the allocation is decreased. The allocation is not reduced to less than the current file size. File reservation cannot be increased beyond the ulimit(2) of the requesting process. However, an existing reservation will not be trimmed to the requesting process's ulimit(2). Reservation of space for existing sparse files only allocates blocks at the end of the file, not to fill holes. Thus, it is possible to have a larger reservation for a file than blocks in the file. The reservation amount is independent of file size since reservation is used to preallocate space for a file.

The a_flags element is used to indicate the type of reservation required. The possible values for a_flags are VX_ALIGN, VX_CHGSIZE, VX_CONTIGUOUS, VX_NOEXTEND, VX_NORESERVE, VX_TRIM, and VX_GROWFILE:
VX_ALIGN Aligns all new extents on an ext_size boundary relative to the starting block of an allocation unit. If VX_CONTIGUOUS is also set, the single extent allocated during this invocation is not subject to the alignment restriction.

VX_CHGSIZE The reservation is immediately incorporated into the file. The file's on-disk inode is updated with the size and block count information that is increased to include the reserved space. Unlike an fcntl F_FREESP operation, which truncates up, the space included in the file is not initialized. See the fcntl(2) manual page. This operation is restricted to users with appropriate privileges.

VX_CONTIGUOUS The reservation is allocated contiguously (as a single extent). ext_size becomes the fixed extent size for subsequent allocations, but has no effect on this allocation. The reservation fails if the file has gone into indirect extents unless the amount of space requested is the same as the indirect extent size. If the contiguous allocation request is done on an empty file, this does not happen.

VX_NOEXTEND The file is not extended after the current reservation is exceeded. The reservation may be increased, if necessary, by another invocation of the ioctl, but the file is not automatically extended.

VX_NORESERVE The reservation is a non-persistent allocation to the file. The on-disk inode is not updated with the reservation information, so the reservation cannot survive a system crash. The reservation is associated with the file until the close of the file. The reservation is trimmed to the current file size on close.

VX_TRIM The reservation for the file is trimmed to the current file size upon last close by all processes that have the file open.

VX_GROWFILE The size of the file is changed to include the reservation. This operation does not physically clear the file and is relatively fast. Reads of the grown part of the file (between the current size of the file and the size once the operation succeeds) return zeros after the operation succeeds. This operation can be carried out by all users. Writes to the grown part should never fail with an ENOSPC error.

Write permission to a file is required to set extent information, but any process that can open the file can get the extent information. Extent information only applies to regular files. Only one set of extent information is kept per file. Only the VX_ALIGN and VX_NOEXTEND allocation flags are persistent attributes of the file. Other allocation flags may have persistent effects, but are not visible as allocation flags. VX_ALIGN, VX_NOEXTEND, and VX_GROWFILE are the only flags visible through the VX_GETEXT ioctl. The VX_SETEXT ioctl returns a zero if the extent information is successfully set. If the operation fails, the return value is -1 and the external variable errno is a general DIAGNOSTIC.

VX_GETFSOPT Gets file system options. The argument arg is a pointer to an int. This command may be used by any user who can open the root inode on the file system. The options returned in arg are:
The VX_GETFSOPT ioctl returns a zero if the file system options are successfully obtained. If the operation fails, the return value is -1 and the external variable errno is a general DIAGNOSTIC.
VX_FSO_BLKCLEAR Indicates that all newly allocated blocks are guaranteed to contain all zeros.

VX_FSO_CACHE_CLOSESYNC Indicates that any non-logged changes to the inode or data are flushed to disk when the file is closed.

VX_FSO_CACHE_DIRECT Indicates that any non-synchronous I/O is handled as if the VX_DIRECT cache advisory is set on the file. Also, any non-logged changes to the inode or data are flushed to disk when the file is closed.

VX_FSO_CACHE_DSYNC Indicates that any writes that do not have either O_SYNC or the VX_DIRECT advisory set are handled as if the VX_DSYNC advisory is set on the file. Also, any non-logged changes to the inode or data are flushed to disk when the file is closed.

VX_FSO_CACHE_TMPCACHE Indicates that delayed extending writes are disabled. Non-logged changes to the inode or data are not flushed to disk when the file is closed.

VX_FSO_CACHE_UNBUFFERED Indicates that any non-synchronous I/O is handled as if the VX_UNBUFFERED cache advisory is set on the file. Also, any non-logged changes to the inode or data are flushed to disk when the file is closed.

VX_FSO_DELAYLOG Indicates that some system calls may return before the intent log is written.

VX_FSO_NODATAINLOG Indicates that intent logging of user data for synchronous writes is disabled.

VX_FSO_OSYNC_CLOSESYNC Indicates that any non-logged changes to the inode or data are flushed to disk when a file accessed with O_SYNC is closed.

VX_FSO_OSYNC_DELAY Indicates that any O_SYNC writes are delayed instead of taking effect immediately. No special action is taken when a file is closed.

VX_FSO_OSYNC_DIRECT Indicates that any O_SYNC I/O is handled as if the VX_DIRECT cache advisory is set on the file instead. Also, any non-logged changes to the inode or data are flushed to disk when a file accessed with O_SYNC is closed.

VX_FSO_OSYNC_DSYNC Indicates that any O_SYNC writes are handled as if the VX_DSYNC cache advisory is set on the file instead. Also, any non-logged changes to the inode or data are flushed to disk when a file accessed with O_SYNC is closed.

VX_FSO_OSYNC_UNBUFFERED Indicates that any O_SYNC I/O is handled as if the VX_UNBUFFERED cache advisory is set on the file. Also, any non-logged changes to the inode or data are flushed to disk when a file accessed with O_SYNC is closed.

VX_FSO_SNAPPED Indicates that a snapshot backup is in progress on the file system.

VX_FSO_SNAPSHOT Indicates that this file system is a snapshot backup of another file system.

VX_FSO_TMPLOG Indicates that the intent log is almost always delayed.

VX_GET_IOPARAMETERS Gets the I/O parameters for optimized application I/O. The argument arg points to a structure of type vx_ioparameters as defined in sys/fs/vxio.h. The optimal I/O request sizes for applications using direct or discovered direct I/O are returned in this structure. Applications using buffered I/O must use a multiple of the st_blksize value returned by stat for their I/O requests. The VX_GET_IOPARAMETERS ioctl returns a zero if the parameters are successfully obtained. If the operation fails, the return value is -1 and the external variable errno is a general DIAGNOSTIC. The fields in the vx_ioparameters structure are:
unsigned vi_read_preferred_io;   /* preferred read size in bytes */
unsigned vi_read_nstream;        /* num of preferred reads to stream */
unsigned vi_read_unit_io;        /* less preferred read size in bytes */
unsigned vi_write_preferred_io;  /* preferred write size in bytes */
unsigned vi_write_nstream;       /* num of preferred writes to stream */
unsigned vi_write_unit_io;       /* less preferred write size in bytes */
unsigned vi_pref_strength;       /* strength of preferences */
unsigned vi_breakup_size;        /* I/O breakup size in bytes */
unsigned vi_align_offset;        /* adj for alignment calculations */
dev_t    vi_block_device;        /* bdev number for this cdev */
For an application to do the most efficient direct I/O or discovered direct I/O, read requests should be equal to the product of vi_read_nstream and vi_read_preferred_io. In general, any multiple or factor of that product is a size that gives good performance. For writing, the same formula applies to the vi_write_preferred_io and vi_write_nstream parameters.

If an application is doing sequential I/O to large files, it should issue requests larger than the discovered direct I/O size for the file system. This causes the I/O requests to be performed as discovered direct I/O requests (which are unbuffered like direct I/O but do not require synchronous inode updates when extending the file). If the file is larger than the cache size, use unbuffered I/O to reduce CPU overhead and to avoid removing useful data from the cache. See the vxtunefs(1M) manual page for more information on discovered direct I/O.
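The request-size formula can be sketched as plain arithmetic. This example assumes a local stand-in structure, struct demo_ioparams, that mirrors only the four fields used by the formula; it is not the real vx_ioparameters structure, and the helper names are hypothetical. On a VxFS host the values would come from a VX_GET_IOPARAMETERS call.

```c
#include <stddef.h>

/* Local stand-in for illustration; mirrors vi_read_preferred_io,
 * vi_read_nstream, vi_write_preferred_io, and vi_write_nstream. */
struct demo_ioparams {
    unsigned read_preferred_io;   /* preferred read size in bytes */
    unsigned read_nstream;        /* preferred reads to stream */
    unsigned write_preferred_io;  /* preferred write size in bytes */
    unsigned write_nstream;       /* preferred writes to stream */
};

/* Most efficient read request size: nstream * preferred_io. */
static size_t optimal_read_size(const struct demo_ioparams *p)
{
    return (size_t)p->read_preferred_io * p->read_nstream;
}

/* The same formula applied to the write parameters. */
static size_t optimal_write_size(const struct demo_ioparams *p)
{
    return (size_t)p->write_preferred_io * p->write_nstream;
}

/* Round a wanted size down to a whole multiple of the optimal size,
 * but never below one optimal unit. If optimal is 0, return wanted
 * unchanged. */
static size_t round_to_optimal(size_t wanted, size_t optimal)
{
    if (optimal == 0)
        return wanted;
    size_t n = wanted / optimal;
    return (n == 0 ? 1 : n) * optimal;
}
```

With a preferred read size of 65536 bytes and 4 streams, the optimal read request is 262144 bytes, and a wanted size of 1000000 bytes rounds down to 786432 bytes (three optimal units).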
Operation failures can return any of the following values in errno:
EACCES The calling process does not have write access to the file specified by fildes.

EAGAIN The file system is not currently frozen.

EBADF The fildes argument is not a valid file descriptor open for writing.

EFAULT An address specified by an argument is invalid.

EFBIG An attempt was made to reserve space larger than the maximum file size limit for this process.

EINVAL The command or argument is invalid.

EIO An I/O error occurred while attempting to perform the operation.

ENODEV The file specified by fildes is not the root directory of a VxFS file system.

ENOSPC Requested space could not be obtained.

EPERM The process does not have appropriate privilege.

EROFS The file system is mounted read-only.

ETIMEDOUT The VX_FREEZE timeout expired before this call.
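A caller that receives -1 from one of these ioctls can translate errno into the meanings listed above. The function below is a hypothetical helper written for this page, not part of any VxFS library; it uses the standard errno names from errno.h (note the standard spelling EACCES) and falls back to strerror(3) for values not in the list.

```c
#include <errno.h>
#include <string.h>

/* Map an errno value from a failed VxFS ioctl to the meaning given in
 * the list above. Values not in that list fall back to strerror(). */
static const char *vxfsio_errno_hint(int err)
{
    switch (err) {
    case EACCES:    return "no write access to the file";
    case EAGAIN:    return "file system is not currently frozen";
    case EBADF:     return "descriptor is not valid or not open for writing";
    case EFAULT:    return "an argument address is invalid";
    case EFBIG:     return "reservation exceeds the process file size limit";
    case EINVAL:    return "command or argument is invalid";
    case EIO:       return "I/O error during the operation";
    case ENODEV:    return "not the root directory of a VxFS file system";
    case ENOSPC:    return "requested space could not be obtained";
    case EPERM:     return "process lacks appropriate privilege";
    case EROFS:     return "file system is mounted read-only";
    case ETIMEDOUT: return "VX_FREEZE timeout expired before this call";
    default:        return strerror(err);
    }
}
```

A typical call site would pass errno immediately after the failed ioctl, before any other library call can overwrite it.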
In some cases, fsadm may reorganize the extent map of a file in such a way as to make it less contiguous. However, it does not change the geometry of a file that has a fixed extent size.
ioctl(2), fcntl(2), ulimit(2), fsadm_vxfs(1M), vxtunefs(1M)
VxFS 5.1 SP1    vxfsio(7)