vxintro - introduction to the Veritas Volume Manager utilities
The Veritas Volume Manager (VxVM) utilities provide a shell-level interface for use by system administrators and high-level applications and scripts to query and manipulate objects managed through VxVM.
boot disk group
If the root disk is under VxVM control, this disk group contains the root disk and the volumes on it that are used to boot the system. By default, the name of the boot disk group is aliased to the reserved disk group name, bootdg, when the root disk is put under VxVM control.

concatenated plex
A plex whose subdisks are associated at specific offsets within the address range of the plex, and extend into the plex address range for the length of the subdisk. This layout allows regions of one or more disks to create a plex, rather than a single big region.

data change map
The data change map (DCM) is a bit-map which represents regions in the volume. When a write to a region occurs, the corresponding bit in the DCM is turned on. It is used by the Volume Replicator (VVR) for both log overflow protection and automatic secondary synchronization.

data volume
A volume being used as a child of a replicated volume group (RVG). For a primary RVG, a data volume contains the primary copy of that volume data. For a secondary RVG, a data volume contains a copy of the corresponding remote primary data volume. Secondary data volumes are only writable with updates from the primary. The secondary RVG must contain a data volume corresponding to each primary data volume.

default disk group
Each system may have one special disk group, aliased to the reserved name, defaultdg, which is the default disk group for most utilities. See the vxdg(1M) manual page for more information.

disk
Disks exist as two entities. One is the physical disk on which all data is ultimately stored and which exhibits all the behaviors of the underlying technology. The other is the Veritas Volume Manager presentation of disks which, while mapping one-to-one with the physical disks, are just presentations of units from which allocations of storage are made.
As an example, a physical disk presents the image of a device with a definable geometry with a definable number of cylinders, heads, and so forth, whereas a Veritas Volume Manager disk (VM disk) is simply a unit of allocation with a name and a size.

disk access record
A configuration record that defines a pathway to a disk. The list of all disk access records stored in a system is used to find all disks attached to the system. Disk access records do not identify particular physical disks. Disk access records are identified by their disk access names (also known as DA names).
Through the use of disk IDs, VxVM allows disks to be moved between controllers, or to different locations on a controller. When a disk is moved, a different disk access record is used when accessing the disk, although the disk media record continues to track the actual physical disk.
On some systems, VxVM builds a list of disk access records automatically, based on the list of all devices attached to the system. On these systems, it is not necessary to define disk access records explicitly. On other systems, disk access records must be defined explicitly with the vxdisk define operation. Specialty disks (such as RAM disks or floppy disks) are likely to require explicit vxdisk define operations on all systems.

disk group
A group of disks that share a common configuration. A configuration consists of a set of records describing objects (including disks, volumes, plexes, and subdisks) that are associated with one particular disk group. Each disk group has an administrator-assigned name that can be used by the administrator to reference that disk group. Each disk group has an internally defined unique disk group ID, which is used to differentiate two disk groups with the same administrator-assigned name.
Disk groups provide a method to partition the configuration database, so that the database size is not too large and so that database modifications do not affect too many drives. They also allow VxVM to operate with groups of physical disk media that can be moved between systems.
Disks and disk groups have a circular relationship: disk groups are formed from disks, and disk group configurations are stored on disks. All disks in a disk group are stamped with a disk group ID, which is a unique identifier for naming disk groups. Some or all disks in a disk group also store copies of the configuration database of the disk group.

disk group configuration
A disk group configuration is a small database that contains all volume, plex, subdisk, and disk media records. These configurations are replicated onto some or all disks in the disk group, usually with one copy on each disk. Because these databases are stored within disk groups, record associations cannot span disk groups. Thus, a subdisk defined on a disk in one disk group cannot be associated with a volume in another disk group.

disk header
A block stored in a private region of a disk and that defines several properties of the disk. The disk header defines the size of the private region, the location and size of the public region, the unique disk ID for the disk, the disk group ID and disk group name (if the disk is currently associated with a disk group), and the host ID for a host that has exclusive use of the disk.

disk ID
A 64-byte universally unique identifier that is assigned to a physical disk when its private region is initialized with the vxdisk init operation. The disk ID is stored in the disk media record so that the physical disk can be related to the disk media record at system startup.

disk media record
A reference to a physical disk or a disk partition. This record can be thought of as a physical disk identifier for the disk or partition. Disk media records are configuration records that provide a name with up to 31 characters (known as the disk media name or DM name), which you can use to reference a particular disk, independent of its location on the system's various disk controllers. Disk media records reference particular physical disks through a disk ID, which is a unique identifier that is assigned to a disk when it is initialized for use with VxVM.
Operations are provided to set or remove the disk ID stored in a disk media record. Such operations have the effect of removing or replacing disks, with any associated subdisks being removed or replaced along with the disk.

host ID
A name, usually assigned by the administrator, that identifies a particular host. Host IDs are used to assign ownership to particular physical disks. When a disk is part of a disk group that is in active use by a particular host, the disk is stamped with that host's host ID. If another system attempts to access the disk, it detects that the disk has a non-matching host ID and disallows access until the first system discontinues use of the disk. To allow for system failures that do not clear the host ID, the vxdisk clearimport operation can be used to clear the host ID stored on a disk.
If a disk is a member of a disk group and has a host ID that matches a particular host, then that host imports the disk group as part of system startup.

kernel log
A log kept in the private region on the disk and that is written by the Veritas Volume Manager kernel. The log contains records describing the state of volumes in the disk group. This log provides a mechanism for the kernel to persistently register state changes so that vxconfigd can be guaranteed to detect the state changes even in the event of a system failure.

layered volume
A virtual volume that is used and managed directly by VxVM, not by a user. Note that currently a layered volume provides storage for only one upper-level volume. Layered volumes allow VxVM to implement the following features:
o Striped mirrors and concatenated mirrors are new types of volume layouts that are less likely to fail and that provide faster recovery times because there is less data to recover (one column or part of a column versus the whole volume).
o Online relayout lets you change the volume layout while the volume is online. See the vxassist(1M) and vxrelayout(1M) manual pages for more information.
o RAID-5 subdisk moves and RAID-5 snapshots (see vxassist(1M)).

plex
A copy of a volume's logical data address space, also called a mirror. A volume can have up to 32 plexes associated with it. Each plex is, at least conceptually, a copy of the volume that is maintained consistently in the presence of volume I/O and reconfigurations. Plexes represent the primary means of configuring storage for a volume. Plexes can have a striped, concatenated, or RAID-5 organization (layout).

plex consistency
If the plexes of a volume contain different data, then the plexes are said to be inconsistent. This is only a problem if VxVM is unaware of the inconsistencies, as the volume can return differing results for consecutive reads.
Plex inconsistency is a serious compromise of data integrity. This inconsistency can be caused by write operations that start around the time of a system failure, if parts of the write complete on one plex but not the other. Plexes can also be inconsistent after creation of a mirrored volume, if the plexes are not first synchronized to contain the same data. An important part of Veritas Volume Manager operation is ensuring that consistent data is returned to any application that reads a volume. This may require that plex consistency of a volume be recovered by copying data between plexes so that they have the same contents. Alternatively, the volume can be put into a state such that reads from one plex are automatically written back to the other plexes, thus making the data consistent for that volume offset.

private region
Disks used by VxVM contain two special regions: a private region and a public region. Usually, each region is formed from a complete partition of the disk; however, the private and public regions can be allocated from the same partition.
The private region of a disk contains various on-disk structures that are used by VxVM for various internal purposes. Each private region begins with a disk header which identifies the disk and its disk group. Private regions can also contain copies of a disk group's configuration, and copies of the disk group's kernel log.

public region
The public region of a disk is the space reserved for allocating subdisks. Subdisks are defined with offsets that are relative to the beginning of the public region of a particular disk. Only one contiguous region of disk can form the public region for a particular disk.

RLINK (remote link)
A representation of a communications link to an RVG hierarchy at a remote replication site, which contains information about the link. An RLINK is a primary RLINK if its parent RVG is the primary RVG containing writable primary data volumes. Alternatively, an RLINK is a secondary RLINK if its parent RVG contains read-only data volumes (one for each data volume that the primary RVG contains).
The vxrlink(1M) command is used to replicate a volume or volumes to any number of remote sites. The primary RVG has one RLINK record associated with it for each secondary site. A secondary RVG has only one RLINK record associated with it, which represents and contains information about the connection with the corresponding primary RLINK.

RVG (replicated volume group)
A virtual device that hierarchically contains one or more data volume records, a log volume record called a Storage Replicator Log (SRL), and one (or more for primary RVGs) RLINK records. It must be associated as a parent of the data volume records in order for those data volumes to be replicated to other sites which may be located across a LAN or WAN.
In order to actively replicate a data volume, an RVG must have at least one data volume child (the volume to be replicated), exactly one SRL volume child (a volume used to log data volume writes before they are transmitted to RLINKs), and at least one RLINK child (which represents a connection to an RVG hierarchy at a remote replication site).
An RVG can be either a primary or a secondary, depending on whether its data volumes are considered to contain the primary copies of data or whether the data volumes are considered to be read-only copies of the data. Secondary RVGs contain one SRL volume, one RLINK, and the same number of data volumes as the primary RVG contains.

SRL volume (storage replicator log volume)
A volume being used as a child of an RVG. The SRL volume contains temporary copies of data intended to be written from (primary) or to (secondary) sibling data volumes. The SRL volume also contains metadata, such as connection information, about the RVG and RLINK with which the SRL volume is associated. The primary SRL volume is used to log SRL data while either in asynchronous mode, or during network outages while in synchronous mode. A secondary RVG must also have an SRL volume associated with it.

striped plex
A plex that distributes data evenly across each of its associated subdisks. A plex has a characteristic number of stripe columns (consisting of associated subdisks) and a characteristic stripe unit size. The stripe unit size defines how data with a particular address is allocated to one of the associated subdisks. Given a stripe unit size of 128 blocks and two stripe columns, the first group of 128 blocks is allocated to the first subdisk, the second group of 128 blocks is allocated to the second subdisk, the third group to the first subdisk, and so on.

subdisk
A region of storage allocated on a disk for use with a volume. Subdisks are associated to volumes through plexes. One or more subdisks are laid out to form plexes based on the plex layout (striped, concatenated, or RAID-5). Subdisks are defined relative to disk media records.

volboot file
The volboot file is a special file (usually stored in /etc/vx/volboot) that is used to define a system's host ID and a system-wide default disk group. If the root disk is under VxVM control, the volboot file is also used to define the name of the boot disk group for the system.

volume
A virtual disk device that looks to applications and file systems like a regular disk device. Volumes present block and raw device interfaces that are compatible in their use with disk devices. However, a volume is a virtual device that can be mirrored, spanned across disk drives, moved to use different storage, and striped using administrative commands. The configuration of a volume can be changed, using Veritas Volume Manager utilities, without causing disruption to applications or file systems that are using the volume.
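The striped plex entry in the glossary above describes how the stripe unit size decides which column receives a given block. The following Python sketch illustrates that arithmetic only; it is not VxVM code. The function name stripe_location is hypothetical, each column is assumed to be backed by a single subdisk, and the offset computed within the column follows the usual round-robin striping layout, which the text above implies but does not state.

```python
def stripe_location(block, stripe_unit=128, columns=2):
    """Map a plex block address to (column index, block offset in that column).

    Illustrative only: assumes round-robin placement of stripe units and
    one subdisk per column. Column indexes start at 0.
    """
    # Which stripe unit the block falls in, and where inside that unit.
    unit_index, within_unit = divmod(block, stripe_unit)
    # Stripe units are dealt out to the columns in turn.
    stripe_row, column = divmod(unit_index, columns)
    return column, stripe_row * stripe_unit + within_unit


# Glossary example: 128-block stripe units across two columns.
for block in (0, 127, 128, 256):
    print(block, stripe_location(block))
```

With the default arguments, blocks 0 through 127 map to the first column, blocks 128 through 255 to the second, and block 256 returns to the first column, matching the description in the striped plex entry.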
VxVM employs certain conventions to provide a degree of similarity between various operations. The conventions are used in the following areas:
o Disk Group Selection
o Standard Length Numbers
o Command Syntax
Most commands operate upon only one disk group per invocation. Each disk group has a separate configuration from every other disk group and it is possible for two disk groups to contain two objects that have the same name. This can happen, in particular, if a disk group is moved from one system to another. However, most utilities make no attempt to ensure that names between disk groups are unique, so name collisions can occur.
System administrators who try to avoid name collisions should be able to use most of the utilities without having to specify disk groups except when creating objects. Administrators cannot use single-command invocations that reference objects in more than one disk group, but disk groups are selected automatically, based on objects specified in the command.
The rules that commands use to select the disk group when this is not specified are described on the vxdg(1M) manual page.
Many basic properties of objects that are managed by VxVM require specification of lengths, either as a pure object length or as an offset relative to some other object.
VxVM supports a volume length up to 256 terabytes (256TB) for consistency with Veritas File System (VxFS) limits. However, 32-bit legacy applications that use system calls such as seek, lseek, read and write are limited to a maximum offset that is determined by the operating system. This value is usually 2^31-1 bytes (1 byte less than 2 gigabytes).
VxVM provides a uniform syntax for representing large numbers, and uses suffixes to provide convenient multipliers. Numbers can be specified in decimal, octal, or hexadecimal. For convenience, numbers can be specified as a sum of several numbers.
A hexadecimal (base 16) number is introduced using a prefix of 0x. For example, 0xfff is the same as decimal 4095. An octal (base 8) number is introduced using a prefix of 0. For example, 0177777 is the same as decimal 65535.
A number can be followed by a single-character suffix to indicate a multiplier for the number. A length number with no suffix character represents a count of standard disk sectors. The length of a standard disk sector can vary between systems; it is typically 512 bytes or 1024 bytes. On systems where disks can have different sector sizes, one of the sector sizes is chosen as the standard size. Supported suffix characters are:
b (Blocks) Multiply the length by 512 bytes.
g (Gigabytes) Multiply the length by 1,073,741,824 (1024M) bytes.
k (Kilobytes) Multiply the length by 1024 bytes.
m (Megabytes) Multiply the length by 1,048,576 (1024K) bytes.
s (Sectors) Multiply the length by the standard sector size. This is the default unit if no suffix is specified.
t (Terabytes) Multiply the length by 1,099,511,627,776 (1024G) bytes.
Numbers are represented internally as an integer number of sectors. As a result, if the standard disk sector size is larger than 512 bytes, numbers can be specified that need to be rounded to a sector. Rounding is always done to the next lowest, not the nearest, multiple of the sector size.
Case is ignored in length specification. Hexadecimal numbers and suffix characters can be specified using any reasonable combination of uppercase and lowercase letters.
Because the letter b is a valid hexadecimal character, there is a special case for the b suffix where a single blank character can separate a number from the b suffix character. Use of a blank within a number, when invoking commands from the shell, usually requires quoting the number. For example:
vxassist make vol01 "0x1000 b"
Numbers can be added or subtracted by separating two or more numbers by a plus or minus sign, respectively. A plus sign is optional. For example, 10m+5k specifies 10 megabytes plus 5 kilobytes.
In output, VxVM reports lengths in sectors, with no suffix character.
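The length conventions above can be summarized in a short Python sketch. It is an illustration of the rules as described here, not the parser that VxVM uses. The names parse_length and SECTOR_BYTES are hypothetical, the standard sector size is assumed to be 512 bytes unless overridden, an explicit plus or minus sign is required between terms, and rounding down is applied once to the final total.

```python
import re

SECTOR_BYTES = 512  # assumed standard sector size; varies between systems

# Byte multipliers for the suffixes; "s" is resolved per call.
MULTIPLIERS = {"b": 512, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

# A hexadecimal or decimal/octal number with an optional suffix. Only the
# b suffix may be separated from the number by a single blank.
_TERM = re.compile(r"(0x[0-9a-f]+|[0-9]+)( b|[bkmgst])?")


def _term_bytes(term, sector_bytes):
    match = _TERM.fullmatch(term)
    if match is None:
        raise ValueError(f"invalid length term: {term!r}")
    digits, suffix = match.groups()
    if digits.startswith("0x"):
        value = int(digits, 16)
    elif len(digits) > 1 and digits.startswith("0"):
        value = int(digits, 8)  # a leading 0 introduces an octal number
    else:
        value = int(digits, 10)
    suffix = (suffix or "s").strip()  # no suffix means sectors
    unit = sector_bytes if suffix == "s" else MULTIPLIERS[suffix]
    return value * unit


def parse_length(spec, sector_bytes=SECTOR_BYTES):
    """Return the length in whole sectors, rounded down."""
    total, sign = 0, 1
    for token in re.split(r"([+-])", spec.strip().lower()):
        token = token.strip()
        if token == "+":
            sign = 1
        elif token == "-":
            sign = -1
        elif token:
            total += sign * _term_bytes(token, sector_bytes)
            sign = 1
    return total // sector_bytes


print(parse_length("0x1000 b"))   # b suffix: 0x1000 blocks of 512 bytes
print(parse_length("0x1000b"))    # no blank: b is read as a hexadecimal digit
print(parse_length("10m+5k-10"))  # sum of several numbers
```

The second call shows why the blank before the b suffix matters: without it, the trailing b is consumed as part of the hexadecimal number.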
Most utilities in VxVM provide more than one operation, with operations grouped into utilities primarily by object type. Utilities that provide multiple operations are typically invoked with the following form:
utility_name [ options ] keyword [ operands ]
utility_name is the name of the Veritas Volume Manager command and keyword specifies the specific operation to perform. Any options that are introduced in the standard -letter form precede the operation keyword. This is in keeping with standard System V UNIX utility syntax, which provides for a set of options as the first arguments to a command, followed by any non-option operands. The keyword is considered an operand under standard System V syntax.
All VM utilities provide an extended usage message that lists all the options and operation keywords supported by the utility. Most VM utilities also display a usage message if you enter an invalid option. For utilities that are keyword-based, you can display this extended usage message by specifying the keyword help or a question mark (?). Utilities that use operands for something other than operation selection provide a reserved option of -H to display the extended usage information.
Disk group configurations contain eight types of records:
o Disk Access Records
o Disk Group Records
o Disk Media Records
o Plex Records
o RLINK Records
o RVG Records
o Subdisk Records
o Volume Records
Disk access records define an address, or access path, that can be used to access a disk. The list of all disk access records defines the list of all disk addresses that VxVM can use to locate physical disks. Disk access records do not define specific physical disks, since physical disks can be moved on a system. When a physical disk is moved, a different disk access record may be necessary to locate it.
Usually VxVM can use the information provided by the operating system to configure most, if not all, disk access records automatically. Such auto-configured disk access records are not stored persistently on disk, but are instead regenerated every time that VxVM starts up.
Disk access records that cannot be configured by scanning the disks are stored in an ordinary file, /etc/vx/darecs, which is located in the root file system. (In releases prior to VxVM 4.0, such records were stored in the volboot file.)
Unlike all other record types, the names of disk access records can conflict with the names of other records. For example, a specialty disk (such as a RAM disk) can use the same name for both the disk access record and the disk media record that points to it. It is typically advisable to use different names for the access and media records, to avoid additional confusion if disks are moved.
Disk access records have the following fundamental attributes:

disk access name
The name of the disk access record is typically a disk address. VxVM 3.2 Update 1 introduced an alternative enclosure-based naming scheme, which uses disk access names of the form enclosurename_disk# where enclosurename is the logical enclosure name, and disk# is the number of the disk in the enclosure. For example, enc0_0 refers to the first disk in the enclosure enc0. Other systems are likely to have different conventions for disk access name.

type
Each disk access record has a type, which identifies key characteristics of VxVM's interaction with the disk. The available types are auto, sliced, simple, and nopriv. Typically, most disks are configured with the type auto and format cdsdisk (auto:cdsdisk) to support the Cross-platform Data Sharing (CDS) feature. However, disks that are used to boot the system typically have the disk type auto and format sliced (auto:sliced). See the vxdisk(1M) manual page for more information on disk types.

If the physical disk represented by the disk access record is associated with a disk media record, then the following fields are defined:

disk group name
The name of the disk group containing the disk media record.

disk media name
The name of the disk media record that points to the physical disk.

Additional attributes can be added, arbitrarily, by disk types. See vxdisk(1M) for a list of additional attributes defined by the standard disk types.
Disk group records define several different types of names for a disk group. The different types of names are:
alias name
This is the standard name that the system uses when referencing the disk group. References to the disk group name usually mean the alias name. Volume directories are structured into subdirectories based on the disk group alias name. Typically, the disk group's alias name and real name are identical. A local alias can be useful for gaining access to a disk group with a name that conflicts with other disk groups in the system. The names bootdg, defaultdg and nodg are reserved.

disk group ID
A 64-byte identifier that represents the unique ID of the disk group. All disk groups on all systems should have a different disk group ID, even if they have the same real name. This identifier is stored in the disk headers of all disks in the disk group. It is used to ensure that VxVM does not confuse two disk groups which were created with the same name.

real name
This is the name of the disk group, as defined on disk. This name is stored in the disk group configuration, and is also stored in the disk headers of all disks in the disk group.
Disk media records define a specific disk within a disk group. The name of a disk media record (the disk media name) is assigned when a disk is first added to a disk group (using the vxdg adddisk operation). Disk media records can be assigned to specific physical disks by associating the disk media record with the current disk access record for the physical disk.
Disk media records have the following fundamental attributes:
disk access name
The disk access name that is currently used to access the physical disk referenced by the disk ID. If the disk ID is defined, but no physical disk with that ID could be found, the disk access name is clear. A disk where the physical disk could not be found is considered to be in the NODAREC, or inaccessible, state. A disk can become inaccessible either because the indicated disk is not currently attached to the system, or because I/O failures on the physical disk prevented VxVM from identifying or using the physical disk.

disk ID
A 64-byte unique identifier representing the physical disk to which the media record is associated. This can be cleared to indicate that the disk is considered in the removed state. A removed disk has no current association with any physical disk.

A disk media record that has an active association with a physical disk (both the disk ID and the disk access name attributes are defined), inherits several properties from the underlying physical disk. These attributes are taken from the disk header, which is stored in the private region of the disk. These inherited attributes are:

atomic I/O size
This is the fundamental I/O size for the disk, in bytes, also known as the sector size. All I/O operations to this disk must be multiples of this size. VxVM requires that all disks have the same sector size. On most systems, this size is 512 bytes.

private length
The length of the region of the physical disk that is reserved for storing private Veritas Volume Manager information.

public length
The length of the region of the physical disk that is available for subdisk allocations.
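The relationship between the disk ID and disk access name attributes described above can be sketched in Python. This is a conceptual model only, not how VxVM is implemented. The function name disk_media_state, the state label "ACTIVE", and the sample disk IDs are hypothetical; NODAREC and the removed state are taken from the text above.

```python
def disk_media_state(dm_disk_id, attached):
    """Resolve a disk media record against the disks currently visible.

    dm_disk_id -- disk ID stored in the disk media record, or None if cleared
    attached   -- dict mapping each disk ID found in a disk header to the
                  disk access name through which that disk was reached
    Returns (state, disk access name or None).
    """
    if dm_disk_id is None:
        # A cleared disk ID means the disk is in the removed state.
        return "REMOVED", None
    da_name = attached.get(dm_disk_id)
    if da_name is None:
        # Disk ID is defined, but no physical disk with that ID was found.
        return "NODAREC", None
    # "ACTIVE" is a label chosen for this sketch.
    return "ACTIVE", da_name


# The same physical disk keeps its disk ID when it moves to another
# enclosure, so the disk media record follows it to a new access name.
before = {"id-0001": "enc0_0"}
after_move = {"id-0001": "enc1_3"}
print(disk_media_state("id-0001", before))
print(disk_media_state("id-0001", after_move))
print(disk_media_state("id-0002", after_move))
print(disk_media_state(None, after_move))
```

The lookup is keyed on the disk ID rather than the access path, which is why moving a disk changes only the disk access name reported for the record.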
Plex records define the characteristics of a particular plex of a volume. A plex can be in either an associated state or a dissociated state. In the dissociated state, the plex is not a part of a volume. A dissociated plex cannot be accessed in any way. An associated plex can be accessed through the volume.
Plexes have the following fundamental attributes:
comment
An administrator-assigned string of up to 40 characters that can be set and changed using the vxedit utility. VxVM does not interpret the comment field. The comment cannot contain newline characters.

condition flags
Various condition flags are defined for the plex that define state which is recognized automatically, rather than managed by the volume usage type. Defined flags are:

    IOFAIL
    The plex was detached as a result of an I/O failure detected during normal volume I/O. The plex is out-of-date with respect to the volume, and in need of complete recovery. However, this condition also indicates a likelihood that one of the disks in the system should be replaced.

    NODAREC
    No physical disk was found for one of the subdisks in the plex. This implies either that the physical disk failed, making it unrecognizable, or that the physical disk is no longer attached through a known access path.

    NODEVICE
    A physical device could not be found corresponding to the disk ID in the disk media record for one of the subdisks associated with the plex. The plex cannot be used until this condition is fixed, or the affected subdisk is dissociated.

    RECOVER
    A disk for one of the disk media records was replaced or was reattached too late to prevent the plex from becoming out-of-date with respect to the volume. The plex requires complete recovery from another plex in the volume to synchronize the plex with the correct contents of the volume.

    REMOVED
    One of the disk media records was put into the removed state through explicit administrative action. The plex cannot be used until the disk is replaced or the affected subdisk is dissociated.

contiguous length
The offset of the first block in the plex address space that is not backed by a subdisk. If the plex has no holes, the contiguous length matches the plex length. If the contiguous length is equal to or greater than the length of the associated volume, the plex is considered complete, otherwise it is sparse.

I/O mode
Each plex is in read-write, read-only, or write-only mode. This mode affects read and write operations directed to the volume, if the plex is enabled. For read-write and read-only modes, volume read operations can be directed to the plex. For read-write and write-only modes, volume write operations are directed to the plex. Plexes are normally in read-write mode.
Write-only mode is used to recover a plex that failed, and whose contents have thus become out-of-date with respect to the volume. It is also used when attaching a new plex to a volume. In write-only mode, writes to the volume update the plex, causing written regions to be up-to-date. Typically, a set of special copy operations is used to update the remainder of the plex.

layout
The organization of associated subdisks with respect to the plex address space. The layout is striped, concatenated, or RAID-5.

length
The length of a plex is the offset of the last subdisk in the plex plus the length of that subdisk. In other words, the length of the plex is defined by the last block in the plex address space that is backed by a subdisk. This value may or may not relate to the length of the volume, depending on whether the plex is completely contiguously allocated.

log subdisk
Each plex can have at most one associated log subdisk. A log subdisk is used with the dirty region logging feature to improve the time required to recover consistency of a volume after a system failure.

plex kernel state
Each plex is either enabled, disabled, or detached. When enabled, normal read and write operations from the volume can be directed to the plex. When disabled, no I/O operations are applied to the plex. When detached, normal volume I/O is not directed to the plex. I/O failures encountered during normal volume I/O may move the enabled state for a plex directly from enabled to detached. See the description of volume exception policies (earlier in this manual page) for more information.

subdisks
Each plex has zero or more associated subdisks. Subdisks are associated at offsets relative to the beginning of the plex address space. Subdisks for concatenated plexes may not cover the entire length of the plex, in which case they leave holes in the plex. A plex that is not as long as the volume to which it is associated is considered to have a hole extending from the end of the plex to the end of the volume. A plex with a hole is considered incomplete, and is sometimes called sparse.

usage-type state
Volume usage types maintain a private state field related to the operations that have been performed on the plex, or to failure conditions that have been encountered. This state field contains a string of up to 14 characters.

volatile state
A plex is considered to have volatile contents if the disk for any of the plex's subdisks is considered to be volatile. The contents of a volatile disk are not presumed to survive a system reboot. The contents of a volatile plex are always considered out-of-date after a recovery and in need of complete recovery from another plex.
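The length, contiguous length, and complete or sparse attributes defined above are related by simple arithmetic over the subdisk offsets. The following Python sketch shows one way to compute them for a concatenated plex; it is illustrative only. The function name plex_geometry and the representation of subdisks as (offset, length) pairs in sectors are assumptions of this example.

```python
def plex_geometry(subdisks, volume_length):
    """Derive plex attributes from subdisk placement.

    subdisks      -- iterable of (plex offset, length) pairs, in sectors
    volume_length -- length of the associated volume, in sectors
    """
    extents = sorted(subdisks)
    # Plex length: end of the last block that is backed by a subdisk.
    length = max((offset + size for offset, size in extents), default=0)
    # Contiguous length: offset of the first block not backed by a subdisk.
    contiguous = 0
    for offset, size in extents:
        if offset > contiguous:
            break  # found a hole
        contiguous = max(contiguous, offset + size)
    return {
        "length": length,
        "contiguous_length": contiguous,
        # Complete if the contiguous part covers the volume, else sparse.
        "complete": contiguous >= volume_length,
    }


print(plex_geometry([(0, 100), (100, 50)], volume_length=150))
print(plex_geometry([(0, 100), (200, 50)], volume_length=250))
```

In the second call the subdisks leave a hole between offsets 100 and 200, so the contiguous length stops at 100 while the plex length is 250, and the plex is reported as sparse.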
RLINK records define the characteristics of a particular RLINK of an RVG. An RLINK can be in either an associated or dissociated state. In the dissociated state, the RLINK is not part of any RVG. An associated RLINK can be accessed through the RVG.
RLINK have the following fundamental attributes:
usage type RLINK are treated internally as if they have a usage-type of gen, but this distinction is largely irrelevant and exists merely to be compatible with pre-existing Veritas Volume Manager code. rlink kernel state Each primary RLINK is either enabled, disabled or detached. When enabled, normal write operations are forwarded to the remote RLINK. When disabled, no I/O operations on the RLINK are allowed. When detached, normal RVG I/O are not directed to the RLINK, but certain ioctls can be issued against the RLINK. An enabled RLINK can also be either connected or disconnected, depending on whether the network connection between the RLINKs on the primary and secondary nodes is currently open or not. usage-type state RLINKs are treated as having a gen usage-type, and its usage-type state is a private state field indicating the state of operations that have been performed on the RLINK, or failure conditions that have been encountered. This state field contains a string of up to 14 characters. synchronous A field that indicates whether the RLINK should operate in synchronous or asynchronous mode. In synchronous mode, a write request to a replicated volume does not complete until the data has been recorded on the SRL and reached the secondary node. In asynchronous mode, a write request completes as soon as the data is recorded on the SRL. The field may have one of three values:
off mode is asynchronous override mode is synchronous, but automatically switches to asynchronous if the RLINK becomes inactive due to a disconnection or administrative action fail mode is synchronous. If synchronous=fail is set and an administrator detaches the Primary RLINK, writes to the RVG are not failed. However, if an RLINK becomes inactive for any other reason, including an administrative detach of the Secondary RLINK, subsequent write requests are failed with an EIO error. local_host RLINKs have a local_host attribute that contains the name of the local host machine. This attribute is needed to support private networks. remote_host RLINKs have a remote_host attribute that contains the name of the remote host machine from (primary) or to (secondary) which replication takes place. remote_dg RLINKs have a remote_dg attribute that contains the name of the disk group on the remote machine. remote_rlink RLINKs have a remote_rlink attribute that contains the name of the RLINK counterpart on the remote machine. comment An administrator-assigned string of up to 40 characters that can be set and changed using the vxedit utility. VxVM does not interpret the comment field. The comment cannot contain newline characters. latencyprot A field that indicates whether latency protection is enabled for the RLINK. Latency protection prevents an RLINK from having more than a preset number of outstanding requests. All requests that have not been written to the remote data volume are counted as outstanding. If latency protection is enabled, then when the number of outstanding requests reaches latency_high_mark, throttling is enabled. This causes all new write requests to stall until throttling is disabled. Throttling is not disabled until the number of outstanding requests is reduced to latency_low_mark. The field may have one of three values:
off latency protection is disabled override latency protection is enabled, but is automatically disabled if the RLINK becomes inactive due to a disconnection or administrative action fail latency protection is enabled. If the RLINK becomes inactive for any reason, and the latency_high_mark is reached, subsequent write requests are failed with an EIO error. srlprot A field that indicates whether SRL protection is enabled for the RLINK. SRL protection prevents an RLINK from overflowing the SRL, which causes the RLINK to disconnect. If the RLINK has SRL protection enabled, and the next write request would cause the RLINK to overflow the SRL, then throttling is enabled. This causes all new write requests to stall until throttling is disabled. Throttling is not disabled until a predetermined amount of space is available on the SRL. The exception to this is when the autodcm mode is set, in which case, DCM protection is enabled as soon as the SRL overflows and incoming write requests are not stalled. The field may have one of five values:
Note: When DCM protection is activated, the DCMs are used to record the regions that change on the data volumes. The vxrvg command can be used to resynchronize the images when the RLINKs are connected. Using the DCM to resynchronize an image makes the image inconsistent until the resynchronization completes.
off SRL protection is disabled override SRL protection is enabled, but will automatically be disabled if the RLINK becomes inactive due to a disconnection or administrative action fail SRL protection is enabled. If the RLINK becomes inactive for any reason, and SRL overflow is imminent, subsequent write requests are failed with an EIO error. autodcm SRL protection is enabled. This is the default option for srlprot. If an RLINK begins to overflow the SRL, DCM protection is activated. dcm SRL protection is enabled. This differs from the "autodcm" protection in that the DCM protection is activated only when the RLINKs disconnect. When the RLINKs are connected, incoming writes are throttled as described above.
latency_high_mark Maximum number of outstanding requests when latency protection is enabled.
latency_low_mark After latency throttling is enabled, the number of outstanding requests must drop to this before it is disabled.
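The interaction between latency_high_mark and latency_low_mark described under latencyprot is a form of hysteresis. The following is a minimal Python sketch of that behavior, not VxVM code: the class name LatencyThrottle and its methods are hypothetical, and the model ignores the override and fail modes and only tracks when new writes would stall.

```python
from dataclasses import dataclass


@dataclass
class LatencyThrottle:
    """Toy model of latency-protection throttling (illustrative only)."""
    latency_high_mark: int   # throttling starts when outstanding reaches this
    latency_low_mark: int    # throttling stops when outstanding drops to this
    outstanding: int = 0     # requests not yet written to the remote data volume
    throttled: bool = False

    def submit_write(self) -> bool:
        """Return True if the write is accepted, False if it would stall."""
        if self.throttled:
            return False
        self.outstanding += 1
        if self.outstanding >= self.latency_high_mark:
            self.throttled = True
        return True

    def complete_write(self) -> None:
        """Record that one outstanding request reached the remote data volume."""
        if self.outstanding > 0:
            self.outstanding -= 1
        if self.throttled and self.outstanding <= self.latency_low_mark:
            self.throttled = False
```

With latency_high_mark=3 and latency_low_mark=1, the third accepted write turns throttling on, and new writes stall until two of the outstanding requests have completed.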
RVG records define the characteristics of particular RVG devices. The name of an RVG record defines the node name used for files in the /dev/vx/dsk and /dev/vx/rdsk directories. The block device for a particular RVG (which is unused in the current implementation) has the path:
where groupname is the name assigned by the administrator to the disk group containing the RVG. Note that the block device for a child data volume, not the RVG itself, can be used as an argument to the mount command (see mount(2)). The raw device for an RVG, typically used for issuing I/O control operations (see ioctl(2)), has the path:
Accesses to a data volume associated with a primary RVG device are internally directed to the RVG so that replication can take place. The RVG device nodes do not support the read(2) and write(2) system calls.
RVGs have the following fundamental attributes:
usage type RVGs are treated internally as if they have a usage-type of gen, but this distinction is largely irrelevant and exists merely to be compatible with pre-existing Veritas Volume Manager code. rvg kernel state Each primary RVG is either enabled, disabled, or detached. When enabled, normal read and write operations are allowed on the child data volumes (assuming the underlying data volumes themselves are enabled) and those accesses are reflected to the data volumes as well as to any active RLINKs. When disabled, no access to the data volumes or to any of the RVG's associated RLINKs is allowed. When detached, some ioctls can be used by utilities to operate on the RVG. usage-type state RVGs are treated as having a gen usage-type, and their usage-type state is a private state field indicating the state of operations that have been performed on the RVG, or failure conditions that have been encountered. This state field contains a string of up to 14 characters. datavols List of data volumes associated with the RVG. srl Each RVG has one associated SRL volume. rlink Each primary RVG has between zero and thirty-two associated RLINKs. Each secondary RVG can have no more than one associated RLINK. comment An administrator-assigned string of up to 40 characters that can be set and changed using the vxedit utility. VxVM does not interpret the comment field. The comment cannot contain newline characters. user|group|mode These attributes are the user, group, and file permission modes used for the RVG device nodes. The user and group are normally root. The mode usually allows read and write permission to the owner, and no access by other users.
Subdisk records define a region of disk, allocated from a disk's public region. Subdisks have very little state associated with them, other than the configuration state that defines which region of disk the subdisk occupies. Subdisks cannot overlap each other, either in their associations with plexes, or in their arrangement on disk public regions.
Subdisks have the following fundamental attributes:
comment disk media name The name of the disk media record that the subdisk is defined on. disk offset The offset, from the beginning of the disk's public region, to the start of the subdisk. length The length of the subdisk. plex offset For associated subdisks, this is the offset (from the beginning of the plex) of the subdisk association. For subdisks associated with striped plexes, the plex offset defines relative ordering of subdisks in the plex, rather than actual offsets within the plex address space.
Volume records define the characteristics of particular volume devices. The name of a volume record defines the node name used for files in the /dev/vx/dsk and /dev/vx/rdsk directories. The block device for a particular volume (which can be used as an argument to the mount command; see mount(1M)) has the path:
where groupname is the name assigned by the administrator to the disk group containing the volume. The raw device for a volume, typically used for application I/O and for issuing I/O control operations (see ioctl(2)), has the path:
Reads to a volume device are directed to one of the read-write or read-only plexes associated with the volume. Writes to the volume are directed to all of the enabled read-write and write-only plexes associated with the volume.
During a write operation, two plexes of a volume may become out of sync with each other, due to the fact that writes directed to two disks can complete at different times. This is not normally a problem. However, if the system were to crash or lose power during a write operation, the two plexes could have different contents.
Most applications and file systems are not written with the presumption that two separate reads of a device can return different contents without an intervening write operation. Since plexes with different contents could cause such a situation where two read operations of a block return different contents, VxVM expends considerable effort to ensure that this is avoided.
Volumes have the following fundamental attributes:
comment exception policy Utilities can set several modes on the volume, according to the usage type of the volume. These modes affect operation of a volume in the presence of I/O failures. Currently only one of these policies, called GEN_DET_SPARSE, is ever used. This policy tracks complete and incomplete plexes in a volume (an incomplete plex does not have a backing subdisk for all blocks in the volume). If an unrecoverable error occurs on an incomplete plex, the plex is detached (disabled from receiving regular volume I/O requests). If an unrecoverable error occurs on a complete plex, the plex is detached unless it is the last complete plex. If the plex is the last complete read-write plex, any incomplete plexes that overlap with the error are detached but the plex with the error remains attached. This default policy is chosen to ensure that an I/O that fails on one plex is not directed to that plex again unless that plex is the last complete plex remaining attached to the volume. In that case, the policy ensures that the volume returns the error consistently, even in the presence of incomplete plexes. length Each volume has a length, which defines the limiting offset of read and write operations. The length is assigned by the administrator, and may or may not match the lengths of the associated plexes. log type A policy to use for logging changes to the volume, which can be assigned by the administrator. Policies that can be specified are:
none Do not perform any special actions when writing to the volume. Just write the requested data to all read-write or write-only plexes. dirty-region-log A volume is divided into regions. A bitmap where each bit corresponds to a region is maintained. When a write to a particular region occurs, the respective bit is set to on. When the system is restarted after a crash, this region bitmap is used to limit the amount of data copying that is required to recover plex consistency for the volume. The region changes are logged to special log subdisks associated with each of the plexes associated with the volume. Use of dirty region logging can greatly speed recovery of a volume, but it also degrades performance of the volume under normal operation. dcm A bitmap representing regions of a volume that is used to track changes to that volume. It is used by VVR for SRL overflow protection and automatic secondary synchronization. plexes Each volume has between zero and 32 associated plexes. primary_datavol A name field. This field is only used with secondary volumes. The value indicates the name of the data volume on the primary to which this data volume corresponds. read policy A configurable policy for switching between plexes for volume reads. When a volume has more than one enabled associated plex, VxVM can distribute reads between the plexes to distribute the I/O load and thus increase total possible bandwidth of reads through the volume. The read policy can be set by the administrator. Possible policies are:
prefer Reads first from a plex that is named as the preferred plex. Specify one preferred plex when you set the prefer policy. If VxVM cannot find a preferred plex that can service the read, VxVM refers to the plex order in the select policy. round Reads each plex in turn in a round-robin fashion for each nonsequential I/O detected. Sequential access causes only one plex to be accessed. This approach takes advantage of the drive or controller read-ahead caching policies. select Chooses a plex based on the preference criteria (site, connectivity, media type, and layout). VxVM uses this method by default, unless site consistency is enabled. The select policy has the following preference order:
o Locally connected striped SSD plexes
o Locally connected SSD plexes
o Locally connected striped plexes
o Locally connected plexes
o Remotely connected striped SSD plexes
o Remotely connected SSD plexes
If VxVM cannot find a local plex or a remote SSD plex that can service the read, VxVM uses the round-robin policy. siteread Reads preferentially from plexes at the locally defined site. This method is the default policy for volumes in disk groups where site consistency is enabled. The siteread policy has the following preference order:
o Locally connected striped SSD plexes
o Local site, locally connected striped SSD plexes
o Local site, locally connected SSD plexes
o Local site, locally connected striped plexes
o Local site, locally connected plexes
o Local site, remotely connected striped SSD plexes
o Local site, remotely connected SSD plexes
o Local site, remotely connected striped plexes
o Local site, remotely connected plexes
If VxVM cannot find a plex on the local site that can service the read, VxVM refers to the plex order in the select policy. split Divides the read requests and distributes them across all the available plexes. read/write-back recover mode This is a mode that applies to the volume, which is managed by utilities as part of plex consistency recovery.
When this mode is enabled, each read operation recovers plex consistency for the region covered by the read. Plex consistency is recovered by reading data from blocks of one plex and writing that data to all other writable plexes. This ensures that a future read operation covering the same range of blocks reads the same data. start options This is a string that is organized as a set of usage-type options to apply when starting (enabling) a volume. See vxvol(1M) for details. usage type Each volume has a usage type that defines a particular class of rules for operating on the volume. The usage type is typically based on the expected content of the volume. Several Veritas Volume Manager utilities can apply extensions or limitations that apply to volumes with a particular usage type. Several basic usage types are included with VxVM: fsgen, for use with volumes that contain file systems; gen, for use with volumes that are used as swap devices or for other applications that do not use file systems; raid5, for use with RAID-5 volumes; and special root and swap usage types, which are specifically for use with the root file system volume and the primary swap device. usage-type state Usage types maintain a private state field for the volume that relates to operations that have been performed on the volume, or to failure conditions that have been encountered. This state field contains a string of up to 14 characters. user|group|mode These attributes are the user, group, and file permission modes used for the volume device nodes. The user and group are normally root. The mode usually allows read and write permission to the owner, and no access by other users. volume kernel state Each volume is either enabled, disabled, or detached. When enabled, normal read and write operations are allowed on the volume, and any file system residing on the volume can be mounted, or used in the usual way. When disabled, no access to the volume or any of its associated plexes is allowed.
write-back-on-read-failure mode This is a mode that applies to the volume, which can be enabled or disabled by the administrator using vxedit. If this mode is enabled, then a read failure for a plex causes data to be read from an alternate plex and then written back to the plex that got the read failure. This usually fixes the error. Only if the writeback fails is the plex detached for having an unrecoverable I/O failure. writecopy mode This is a mode that applies to the volume, which can be enabled or disabled by the administrator using vxedit. This mode takes effect only if dirty region logging or RAID-5 logging is in effect. When the operating system hands off a write request to the volume driver, the operating system may continue to change the memory that is being written to disk. VxVM cannot detect that the memory is changing, so it can inadvertently leave plexes with inconsistent contents. This is not normally a problem, because the operating system ensures that any such modified memory is rewritten to the volume before the volume is closed (such as by a clean system shutdown). However, if the system crashes, plexes may be inconsistent. Since the logging feature prevents recovery of the entire volume, it may not ensure that plexes are entirely consistent. Turning on the writecopy mode (which is normally set by default) often causes VxVM to copy the data for a write request to a new section of memory before writing it to disk. Because the write is done from the copied memory, it cannot change and so the data written to each plex is guaranteed to be the same if the write completes. See the vxedit(1M) manual page for more information.
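The decision rules of the GEN_DET_SPARSE exception policy, described above under the exception policy attribute, can be sketched in Python as follows. This is an illustration under stated assumptions, not VxVM code: the Plex class, its overlaps_error flag, and the function apply_gen_det_sparse are hypothetical names, and the model ignores plex access modes (it treats every complete plex as read-write).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plex:
    """Hypothetical, simplified plex record for illustration."""
    name: str
    complete: bool                # backing subdisk exists for every volume block
    attached: bool = True
    overlaps_error: bool = False  # incomplete plex covers the failed region


def apply_gen_det_sparse(plexes: List[Plex], failed_name: str) -> List[str]:
    """Apply the policy after an unrecoverable error on the named plex.

    Returns the names of the plexes that were detached.
    """
    failed = next(p for p in plexes if p.name == failed_name)

    # An incomplete plex with an unrecoverable error is always detached.
    if not failed.complete:
        failed.attached = False
        return [failed.name]

    # A complete plex is detached unless it is the last complete plex.
    others = [p for p in plexes
              if p is not failed and p.complete and p.attached]
    if others:
        failed.attached = False
        return [failed.name]

    # Last complete plex: it stays attached, and incomplete plexes that
    # overlap the error are detached so the error is returned consistently.
    detached = []
    for p in plexes:
        if p.attached and not p.complete and p.overlaps_error:
            p.attached = False
            detached.append(p.name)
    return detached
```

In this model, a volume with two complete plexes and one overlapping sparse plex loses the first complete plex on its error; a later error on the remaining complete plex detaches the sparse plex instead.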
The usage type of a volume represents a class of rules for operating on a volume. Each usage type is defined by a set of executables under the directory /etc/vx/type/usage_type, where usage_type is the name given to the usage type. The required executables are: vxinfo, vxmake, vxmend, vxplex, vxsd, and vxvol. These executables are invoked by Veritas Volume Manager administrative utilities with the same names. The executables under /etc/vx/type should not, normally, be executed directly.
Five usage types are provided with VxVM: gen, fsgen, root, swap, and raid5. It is possible for third-party products to install additional usage types.
The usage types provided with VxVM store state information in the RVG, RLINK, volume, and plex usage-type state fields.
The state fields defined for RVGs are:
EMPTY The RVG is not yet initialized. This is the initial state for RVGs created by vxmake. CLEAN The RVG has been stopped and contains consistent data. ACTIVE The RVG has been started and is running normally, or was running when the system was stopped. If the system stops in this state, then the RVG may require RLINK recovery. FAIL Some problem has been detected with the RVG. It is disabled. The state fields defined for RLINKs are: UNASSOC The RLINK is not fully initialized. This is the initial state for RLINKs created by vxmake, or which are not currently associated with an RVG. STALE The RLINK is associated with an RVG, but it is not actively taking part in replication. RLINK requires a full resync before it is up to date, unless the RVG is EMPTY. ACTIVE The RLINK is associated and actively replicating normally, or was when the system was stopped. PAUSING The RLINK is transitioning to the PAUSE state, or was doing so when the system was stopped. If the system stops in this state, then the RLINK must be reattached to put it back into the proper PAUSE state. PAUSE The RLINK is paused. If it is a primary (secondary) RLINK and was paused by the pause command, vxrlink pause, then it is said to be primary-paused (secondary-paused). A secondary RLINK can also enter the secondary-paused state if an error is detected with a secondary log or data volume. RESTORING Temporary state while RLINK is being restored (see vxrlink(1M) restore sub-command). RECOVER Temporary state while RLINK is being recovered with the vxrecover utility, which calls vxrlink recover. This state indicates that recovery is still needed. FAIL Some problem has been detected with the RLINK. It is disabled. The state fields defined for volumes are: ACTIVE The volume has been started and is running normally, or was running normally when the system was stopped. If the system crashes in this state, then the volume may require plex consistency recovery. 
CLEAN The volume has been stopped and the contents for all plexes are consistent. EMPTY The volume is not yet initialized. This is the initial state for volumes created by vxmake. INVALID The contents of an instant snapshot volume no longer represent a true point-in-time image of the original volume. NEEDSYNC The volume requires recovery. This is typically set after a system failure to indicate that the plexes in the volume may be inconsistent, so that they require recovery (see the resync operation in vxvol(1M)). SYNC Plex consistency recovery is currently being done on the volume. vxvol resync sets this state when it starts to recover plex consistency on a volume that was in the NEEDSYNC state. The state fields defined for plexes are: ACTIVE The plex is running normally on a started volume. The plex condition flags (NODAREC, REMOVED, RECOVER, and IOFAIL) may apply if the system is rebooted and the volume restarted. CLEAN The plex was running normally when the volume was stopped. The plex is enabled without requiring recovery when the volume is started. DCOSNP This state indicates that a data change object (DCO) plex attached to a volume can be used by a snapshot plex to create a DCO volume during a snapshot operation. EMPTY The plex is not yet initialized. This state is set when the volume state is also EMPTY. LOG A dirty region logging (DRL) or RAID-5 log plex. OFFLINE The plex was disabled by the vxmend off operation. See vxmend(1M) for more information. SNAPATT This is a snapshot plex that is being attached by the vxassist snapstart operation. When the attach is complete, the state for the plex is changed to SNAPDONE. If the system fails before the attach completes, the plex and all of its subdisks are removed. SNAPDIS This is a snapshot plex created by vxplex snapstart that is fully attached. A plex in this state can be turned into a snapshot volume with vxplex snapshot. See vxplex(1M) for more information.
If the system fails before the attach completes, the plex is dissociated from the volume. SNAPDONE This is a snapshot plex created by vxassist snapstart that is fully attached. A plex in this state can be turned into a snapshot volume with vxassist snapshot. See vxassist(1M) for more information. If the system fails before the attach completes, the plex and all of its subdisks are removed. SNAPTMP This is a snapshot plex being attached by the vxplex snapstart operation. When the attach is complete, the state for the plex is changed to SNAPDIS. If the system fails before the attach completes, the plex is dissociated from the volume. STALE The plex was detached, either by vxplex det or by an I/O failure. vxvol start changes the state for a plex to STALE if any of the plex condition flags are set. STALE plexes are reattached automatically, when starting a volume, by calling vxplex att. TEMP This is a plex that is being associated and attached to a volume with vxplex att. If the system fails before the attach completes, the plex is dissociated from the volume. TEMPRM This is a plex that is being associated and attached to a volume with vxplex att. If the system fails before the attach completes, the plex is dissociated from the volume and removed. Any subdisks in the plex are kept. TEMPRMSD This is a plex that is being associated and attached to a volume with vxplex att. If the system fails before the attach completes, the plex and its subdisks are dissociated from the volume and removed.
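The transitional plex states above differ mainly in what happens when the attach completes and what happens if the system fails first. The following Python table is a minimal sketch that restates those rules; the names ATTACH_STATES, state_after_attach, and outcome_on_failure are hypothetical. The text does not name the state a TEMP, TEMPRM, or TEMPRMSD plex enters when its attach completes, so the sketch records None for those rather than guessing.

```python
from typing import Optional

# state -> (state once the attach completes, outcome if the system fails first)
# None means the text above does not name the resulting state.
ATTACH_STATES = {
    "SNAPATT": ("SNAPDONE", "plex and all of its subdisks are removed"),
    "SNAPTMP": ("SNAPDIS", "plex is dissociated from the volume"),
    "TEMP": (None, "plex is dissociated from the volume"),
    "TEMPRM": (None, "plex is dissociated and removed; subdisks are kept"),
    "TEMPRMSD": (None, "plex and its subdisks are dissociated and removed"),
}


def state_after_attach(state: str) -> Optional[str]:
    """Return the documented state after a successful attach, if named."""
    return ATTACH_STATES[state][0]


def outcome_on_failure(state: str) -> str:
    """Return the documented outcome if the system fails during the attach."""
    return ATTACH_STATES[state][1]
```

A lookup such as state_after_attach("SNAPTMP") returns "SNAPDIS", matching the vxplex snapstart description.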
The majority of Veritas Volume Manager utilities use a common set of exit codes, which can be used by shell scripts or other types of programs to react to specific problems detected by the utilities. The number for each distinct exit code is described below.
Exit codes greater than 32 are reserved for use by usage types. Codes greater than 64 are reserved for use by specific utilities.
0 The utility is not reporting any error through the exit code. 1 Some command line arguments to the utility were invalid. 2 A syntax error occurred in a command or description, or a specified record name is too long or contains invalid characters. This code is returned only by utilities that implement a command or description language. This code may also be returned for errors in search patterns. 3 The volume daemon does not appear to be running. 4 An unexpected error was encountered while communicating with the volume daemon. 5 An unexpected error was returned by a system call or by the C library. This can also indicate that the utility ran out of memory. 6 The status for a commit was lost because the volume daemon was killed and restarted during the commit of a transaction, but after restart the volume daemon did not know whether the commit succeeded or failed. 7 The utility encountered an error that it should not have encountered. This generally implies a condition that the utility should have tested for but did not, or a condition that results from the volume daemon returning a value that did not make sense. 8 The time required to complete a transaction exceeded 60 seconds, causing the transaction locks to be lost. As most utilities reattempt the transaction at least once if a timeout occurs, this usually implies that a transaction timed out two or more times. 9 No disk group could be identified for an operation. This results either from naming a disk group that does not exist, or from supplying names on a command line that are in different disk groups or in multiple disk groups. 10 A change made to the database by another process caused the utility to stop. This code is also returned by a usage-type-dependent utility if it is given a record that is associated with a different usage type. 
If this situation occurs when the usage-type-dependent utility is called from a switchout utility, then the database was changed after the switchout utility determined the proper usage type to invoke. 11 A requested subdisk, plex, or volume record was not found in the configuration database. This may also mean that a record was an inappropriate type. 12 A name used to create a new configuration record matches the name of an existing record. 13 A subdisk, plex, or volume is locked against concurrent access. This code is used for inter-transaction locks associated with usage type utilities. The code is also used for the dissociated plex or subdisk lock convention, which writes a non-blank string to the tutil field in a plex or subdisk structure to indicate that the record is being used. 14 No usage type could be determined for a utility that requires a usage type. 15 An unknown or invalid usage type was specified. 16 A plex or subdisk is associated, but the operation requires a dissociated record. 17 A plex or subdisk is dissociated, but the operation requires an associated record. This code can also be used to indicate that a subdisk or plex is not associated with a specific plex or volume. 18 A plex or subdisk was not dissociated because it was the last record associated with a volume or plex. 19 Association of a plex or subdisk would surpass the maximum number that can be associated to a volume or plex. 20 A specified operation is invalid within the parameters specified. For example, this code is returned when an attempt is made to split a subdisk on a striped plex, or to use a split size that is greater than the size of the plex. 21 An I/O error was encountered that caused the utility to abort an operation. 22 A volume involved in an operation did not have any associated plexes, although at least one was required. 23 A plex involved in an operation did not have any associated subdisks, although at least one was required. 
24 A volume could not be started by the vxvol start operation, because the configuration of the volume and its plexes prevented the operation. 25 A specified volume was already started. 26 A specified volume was not started. For example, this code is returned by the vxvol stop operation if the operation is given a volume that is not started. 27 A volume or plex involved in an operation is in the detached state, thus preventing a successful operation. 28 A volume or plex involved in an operation is in the disabled state, thus preventing a successful operation. 29 A volume or plex involved in an operation is in the enabled state, thus preventing a successful operation. 30 An unknown error condition was encountered. This code may be used, for example, when the volume daemon returns an unrecognized error number. 31 An operation failed because a volume device was open or mounted, or because a subdisk was associated with an open or mounted volume or plex. 32 A problem occurred with multipath coordination. 33 An object was already reserved for use by the operating system. 35 An error occurred while communicating with a remote host. 39 An error occurred because the offset or length of a VxVM object is not an integral multiple of the alignment value for the disk group. 65 An array-related API failed. 66 An array-specific guideline was violated. 67 Inconsistent configuration copies are present in a disk group. This may indicate a serial split brain condition. 68 FastResync is not enabled on a volume. 69 A volume has a bad Data Change Object (DCO). 70 A snapshot volume is invalid. 71 A Replicated Volume Group (RVG) has not been started.
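Scripts that wrap the utilities can branch on the exit code ranges given above. The following Python sketch shows one way to do that; the function names and the short descriptions are assumptions made for illustration (the descriptions paraphrase a few entries from the list, and the dictionary is deliberately incomplete). The range boundaries follow the reservation rule stated above literally, so codes such as 33, 35, and 39 fall in the usage-type range even though they appear in the common list.

```python
# A few descriptions paraphrased from the exit code list (not exhaustive).
KNOWN_EXIT_CODES = {
    0: "no error reported",
    1: "invalid command line arguments",
    3: "volume daemon does not appear to be running",
    8: "transaction timed out",
    11: "record not found in the configuration database",
    12: "record name already exists",
    31: "volume device open or mounted",
}


def classify_exit_code(code: int) -> str:
    """Classify an exit code by the ranges reserved in the text."""
    if code < 0:
        raise ValueError("exit codes are non-negative")
    if code == 0:
        return "success"
    if code <= 32:
        return "common"
    if code <= 64:
        return "usage-type"
    return "utility-specific"


def describe_exit_code(code: int) -> str:
    """Return a short description, falling back to the code's class."""
    fallback = f"{classify_exit_code(code)} code {code}"
    return KNOWN_EXIT_CODES.get(code, fallback)
```

A wrapper script might, for example, retry an operation when the description indicates a transaction timeout (code 8) and abort on anything in the utility-specific range.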
mount(1M), vxassist(1M), vxattached(1M), vxcache(1M), vxcached(1M), vxcdsconvert(1M), vxclustadm(1M), vxcmdlog(1M), vxconfigbackup(1M), vxconfigbackupd(1M), vxconfigd(1M), vxconfigrestore(1M), vxdctl(1M), vxdco(1M), vxddladm(1M), vxdg(1M), vxdisk(1M), vxdiskadd(1M), vxdiskadm(1M), vxdisksetup(1M), vxdiskunsetup(1M), vxdmp(1M), vxedit(1M), vxencap(1M), vxevac(1M), vxinfo(1M), vxiod(1M), vxlicinst(1), vxlicrep(1), vxmake(1M), vxmend(1M), vxmirror(1M), vxnotify(1M), vxplex(1M), vxprint(1M), vxr5check(1M), vxreattach(1M), vxrecover(1M), vxrelayout(1M), vxrelocd(1M), vxresize(1M), vxsd(1M), vxsnap(1M), vxsplitlines(1M), vxstat(1M), vxtask(1M), vxtrace(1M), vxtranslog(1M), vxvol(1M), vxvset(1M)
For information about new features of VxVM, see the Release Notes.
For information about administering VxVM, see the Storage Foundation Administrator's Guide.