Business Continuity

This paper describes the Dynamic Multi-pathing (DMP) feature of Veritas Storage Foundation™. The product architecture described herein was introduced with DMP release 5.0. It was subsequently backported to three earlier code bases: the Solaris 4.1 code base in SxRT 4.1 MP2, the AIX 4.0 code base in AxRT 4.0 MP4, and the Linux 4.1 code base in LxRT 4.1 MP4. Throughout this document, these three releases and their later updates are collectively referred to as DMP Backport releases.

The paper should be used as a guide to understanding Dynamic Multi-pathing. For up-to-date information on features and coverage, readers are advised to consult Symantec documentation and support sources.

1 The Importance of Multiple Storage I/O Paths

The basic techniques for keeping business-critical computer applications and digital data available to users despite hardware and software failures are well-known:

  • Applications. Applications can be protected against server failures by interconnecting two or more servers to form a cooperative cluster controlled by software that enables an application running on any of the servers to fail over and restart on another, should its own server fail.
  • Data. Data can be preserved despite storage device failures by techniques such as mirroring identical copies on two or more disks¹ and writing all updates to both simultaneously. Mirroring, sometimes called RAID-1, keeps data available if a disk fails, and also improves I/O performance by making two or more disks available to satisfy each application read request.
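The mirroring behavior in the second bullet can be sketched in a few lines of Python. This is a hypothetical in-memory model. The `Disk` and `Mirror` classes are illustrative and are not part of any Storage Foundation interface. Every write goes to all surviving copies, and reads rotate among the surviving copies so that either disk can satisfy them.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Hypothetical in-memory stand-in for a disk or LUN."""
    name: str
    failed: bool = False
    blocks: dict = field(default_factory=dict)


class Mirror:
    """Minimal RAID-1 sketch: write every update to all surviving copies,
    satisfy each read from any surviving copy."""

    def __init__(self, disks):
        self.disks = list(disks)
        self._next = 0  # round-robin cursor used to spread reads

    def write(self, block, data):
        survivors = [d for d in self.disks if not d.failed]
        if not survivors:
            raise OSError("all mirror copies have failed")
        # Written one after another here; a real volume manager would
        # issue these writes to the copies in parallel.
        for d in survivors:
            d.blocks[block] = data

    def read(self, block):
        # Try each copy at most once, starting at the cursor, so that
        # consecutive reads alternate among healthy copies.
        n = len(self.disks)
        for i in range(n):
            d = self.disks[(self._next + i) % n]
            if not d.failed:
                self._next = (self._next + i + 1) % n
                return d.blocks[block], d.name
        raise OSError("all mirror copies have failed")
```

With two healthy disks, successive reads of the same block are served by alternating copies. After one disk fails, every read is served by the survivor and the data stays available.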

In enterprise data centers, there is another increasingly important link in the information access chain—the I/O path that connects servers with the data they process. The I/O path, represented in Figure 1, is a complex chain consisting of host bus adapter, cables, storage network switch, storage device adapter port, and, in disk arrays, a disk controller.

The I/O path shown in Figure 1 begins at a host bus adapter (HBA)² that connects an I/O cable to a server’s internal memory access bus. The cable connects the HBA to a corresponding port in a storage network switch. As Figure 1 suggests, the switch manages logical connections between HBAs and ports within disk array controllers, or between HBAs and disk drives. Disk array controllers, which typically have more than one port, virtualize disks within the array and present them to the storage network as logical units, or LUNs.³


1.1 Multiple I/O Paths Enhance Availability

With increasing deployment of storage networks, IT managers are becoming conscious of the important role that I/O paths play in keeping data available. For example, two disks mirrored by a host-based volume manager may be connected to their hosting server either by the same I/O path, as shown on the left side of Figure 2, or by different paths, as shown on the right. If multiple paths are available, mirroring not only protects against data loss due to disk failure, it also protects against loss of access to data if an I/O path element fails, as Figure 2 illustrates.

The server on the left in Figure 2 cannot access its data when the cable between its HBA and the network switch port fails, even though the storage itself remains completely functional, because the cable is a single point of failure. The server on the right, on the other hand, can continue to access data if one of its HBAs fails, if a cable fails, or even if one of the disk array’s controllers fails, because in each case there is an alternate path that does not include the failed element.
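The availability argument above can be checked mechanically by modeling each path as the chain of components it passes through. The sketch below is a hypothetical model and is not part of DMP. Component names such as `hba0` and `ctlrA` are illustrative. For illustration only, it assumes that both paths on the right-hand server pass through one shared switch. Under that assumption, the switch remains the single point of failure.

```python
def reachable(paths, failed):
    """Return True if at least one path avoids every failed component.

    `paths` is a list of paths, each a sequence of component names
    (HBA, cable, switch, array controller, ...); `failed` is a set of
    failed component names.
    """
    return any(not failed.intersection(path) for path in paths)


def single_points_of_failure(paths):
    """Components whose failure alone cuts off every path to the storage."""
    components = set().union(*paths)
    return {c for c in components if not reachable(paths, {c})}


# Left server in Figure 2: one path, so every component is critical.
left = [("hba0", "cable0", "switch", "ctlrA")]
# Right server (assumed topology): two paths sharing one switch.
right = [("hba0", "cable0", "switch", "ctlrA"),
         ("hba1", "cable1", "switch", "ctlrB")]
```

Here `single_points_of_failure(left)` returns all four components. For the right-hand topology it returns only `{"switch"}`, because the server survives the loss of either HBA, either cable or either array controller.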

Thus, a second independent path between server and storage increases the number of component failures an I/O subsystem can withstand without loss of function. But even with an alternate path, I/O path failure can still be tantamount to storage device failure unless the system recognizes that it has an alternate path and reroutes I/O requests to it. If a server does not recognize an alternate path to a storage device, the device may as well have failed. Even with failure-tolerant mirrored devices, for example, only the devices on still-functioning paths are updated after a path failure. Data redundancy is diminished, even though the unreachable device is still functional. Moreover, I/O performance decreases because one fewer device is available to satisfy read requests.

A server that can recognize and use alternate I/O paths to its storage devices is therefore clearly preferable. If a path failed, I/O requests would be rerouted to the alternate path. Mirrored data would remain fully protected, and the effect on I/O performance would be smaller.
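The rerouting described above can be sketched as a small wrapper that tries each known path in turn. This is a minimal illustration and does not describe DMP's internal design. The `PathFailedError` exception and `MultipathDevice` class are hypothetical. Each path is modeled as a plain callable standing in for the HBA and driver stack.

```python
class PathFailedError(Exception):
    """Raised by a (hypothetical) path when it cannot deliver an I/O."""


class MultipathDevice:
    """Sketch of path failover: send each request down the first usable
    path; if that path fails, mark it unusable and retry on another."""

    def __init__(self, paths):
        # `paths` maps a path name to a callable that performs the I/O.
        self.paths = dict(paths)
        self.failed = set()

    def submit(self, request):
        for name, send in self.paths.items():
            if name in self.failed:
                continue
            try:
                return name, send(request)
            except PathFailedError:
                # Remember the failure and reroute to the next alternate.
                self.failed.add(name)
        raise OSError("no usable path to device")
```

If the first path raises `PathFailedError`, the request completes on the second path and the caller never sees the failure. Only when every path has failed does the device report an error, which corresponds to losing access to the storage itself.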

1.2 Multiple I/O Paths Enhance I/O Performance

Multiple I/O paths between server and storage device can also improve I/O performance. In many applications, disk arrays satisfy a significant percentage of I/O requests from cache. For example, most disk arrays recognize sequential read patterns and begin to read data into cache before the host requests it. In this scenario, the bandwidth of a single I/O path can limit LUN performance. With multiple I/O paths to a LUN, however, cached data can be delivered over all of them concurrently, as fast as applications request it. Similarly, if an I/O path that provides access to multiple LUNs becomes momentarily overloaded due to activity on one LUN, I/O requests for the other LUNs can be routed to less-busy paths.
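Routing requests away from a momentarily overloaded path can be illustrated with a "least outstanding I/Os" rule. This is one simple balancing policy chosen for illustration. It is not necessarily one of the I/O policies DMP provides. The `LeastBusyBalancer` class is hypothetical.

```python
class LeastBusyBalancer:
    """Route each request to the path with the fewest outstanding I/Os."""

    def __init__(self, path_names):
        self.outstanding = {name: 0 for name in path_names}

    def start(self):
        # min() returns the first path among equals, so ties are broken
        # by the order in which the paths were listed.
        path = min(self.outstanding, key=self.outstanding.get)
        self.outstanding[path] += 1
        return path

    def finish(self, path):
        self.outstanding[path] -= 1
```

While one path has a backlog, new requests go to the other path. As soon as the backlog drains, the first path starts receiving requests again. A round-robin policy would be simpler but would not react to the momentary overload described above.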

2 Different Forms of Multi-Path Access

Disks and disk arrays support multi-path access to LUNs in several different ways. Fundamentally, there is a distinction between:

  • Active-active (A/A). If a disk array accepts and executes I/O requests to a single LUN on any port simultaneously, it is called an active-active (A/A) array. If a path to an active-active array fails, I/O requests can simply be re-routed to other paths, maintaining continuous access to data stored on the array’s LUNs.
    EMC’s Symmetrix and DMX arrays, Hitachi Data Systems’ 9900 Series (Lightning), and IBM’s ESS series (Shark) are active-active arrays.
  • Active-passive (A/P). If a disk array accepts and executes I/O requests to a LUN on one or more ports of one array controller (the primary), but is able to switch, or “fail over,” access to the LUN to alternate ports on another array controller (the secondary), it is called active-passive (A/P). A simple A/P disk array triggers failover for a LUN based on where I/O for that LUN is received. Because a LUN failover (also called a trespass) is a slow operation that impacts performance, all I/O for a LUN should flow to only one of its available controllers at any given time. Efficiently managing LUN trespass on an A/P array is critical to providing high-performance data access.
    EMC’s CLARiiON CX600 and CX700, Hitachi Data Systems’ 95xx and 9200 series, IBM FAStT, and Sun’s T3 and T4 are active-passive arrays.
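A toy counter shows why I/O to an A/P LUN should flow through only one controller at a time. The function below assumes the simple implicit-failover behavior described above, in which an I/O arriving on the non-owning controller moves LUN ownership there. The controller names `A` and `B` are hypothetical.

```python
def count_trespasses(controller_sequence, initial_owner):
    """Count LUN ownership changes on a simple (implicit-failover) A/P array.

    Each element of `controller_sequence` is the controller on which one
    I/O for the LUN arrives; an I/O arriving on the non-owning controller
    moves ownership there (a trespass).
    """
    owner = initial_owner
    trespasses = 0
    for ctlr in controller_sequence:
        if ctlr != owner:
            owner = ctlr
            trespasses += 1
    return trespasses


round_robin = ["A", "B"] * 4  # alternate I/Os between both controllers
same_owner = ["A"] * 8        # keep all I/Os on the owning controller
```

Alternating eight I/Os between the controllers causes seven trespasses, while sending them all to the owner causes none. An active-active array has no per-LUN owner, so the same round-robin pattern would cost nothing there.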

In addition to this broad classification, active-passive disk arrays differ in other capabilities that affect availability and I/O performance:

  • Multiple primary and secondary paths (A/P-C). If an active-passive array accepts and executes simultaneous I/O requests to a LUN on two or more ports of the same array controller, it is called an active-passive concurrent (A/P-C) array. Active-passive concurrent array LUNs fail over to secondary paths on alternate array controllers only when all primary paths have failed. Note that DMP’s ability to balance load over multiple primary or secondary paths to a LUN in an active-passive array is fully governed by the I/O policy configured for that enclosure (see Section 4.6.1 for details).
  • Explicit failover (A/PF). A basic active-passive array fails over from primary I/O paths to secondary ones automatically when it receives an I/O request to a LUN on a secondary path. An explicit failover active-passive (A/PF) array fails over only when it receives special, array-model-specific SCSI commands from its hosts. Explicit failover provides the control required to achieve high performance with active-passive arrays in clusters, where multiple hosts can issue I/O requests directly to LUNs. Without explicit failover capability, cluster software must carefully synchronize all hosts’ access to a LUN before initiating implicit failover, so that I/O requests from multiple hosts do not cause continuous failovers.
    Note: In cluster configurations, it is generally recommended to configure arrays that are capable of A/PF operation in that mode, to ensure optimum performance and minimize system boot times. A good example is the EMC CLARiiON, which should be set to ‘Failovermode 1 (explicit)’ when used with DMP.
    EMC’s CLARiiON and Sun Microsystems’ T3 and T4 arrays are A/PF arrays.
  • LUN group failover (A/PG). In general, LUNs fail over from one array controller to another individually. Some active-passive arrays, however, can fail administratively defined groups of LUNs over together. Arrays with this capability are called active-passive with group failover capability (A/PG). If all primary paths to a LUN in an A/PG array fail, all the LUNs in its group fail over to secondary paths. LUN group failover is faster than failover of individual LUNs, and can therefore reduce the application impact of array controller failure, particularly in disk arrays that present large numbers of LUNs.
    Hitachi Data Systems 9200 series arrays and Fujitsu ETERNUS 3000 are A/PG arrays.
  • Active-active asymmetric (A/A-A). A/A-A arrays increasingly comply with the Asymmetric Logical Unit Access (ALUA) method specified in SCSI-3 standards. A LUN in an active-passive array can be accessed only through the controller that owns it at a given point in time. Accessing that LUN through the other controller results in either a LUN trespass or an I/O failure. By contrast, a LUN in an active-active asymmetric array can be accessed through both controllers without triggering a trespass or an I/O failure. The only limitation is that I/O serviced through a LUN’s secondary controller is slower than I/O serviced through the primary controller. One can think of ALUA arrays as a more forgiving version of active-passive arrays that use standard SCSI-3 commands to control LUN/array controller ownership.
    HP EVA and Hitachi Data Systems TagmaStore AMS/WMS series are examples of A/A-A arrays.
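The failover rules for A/P-C and A/PG arrays can be expressed as a short decision function. This sketch is illustrative and does not show how DMP or any array implements failover. The function name, the path names and the group identifiers are all hypothetical. A LUN triggers failover only when all of its primary paths are down, following the A/P-C rule. If groups are given, every LUN that shares a group with a triggered LUN moves as well, following the A/PG rule.

```python
def luns_to_fail_over(primary_paths, failed_paths, groups=None):
    """Return the set of LUNs that must move to their secondary controller.

    primary_paths: dict LUN -> set of primary path names
    failed_paths:  set of failed path names
    groups:        optional dict LUN -> group id (A/PG arrays); None for
                   arrays that fail LUNs over individually.
    """
    # A/P-C rule: a LUN fails over only when *all* its primary paths are down.
    triggered = {lun for lun, paths in primary_paths.items()
                 if paths <= failed_paths}
    if groups is None:
        return triggered
    # A/PG rule: every LUN in the same group as a triggered LUN moves too.
    hit_groups = {groups[lun] for lun in triggered}
    return {lun for lun, g in groups.items() if g in hit_groups}


primary = {"lun0": {"a0", "a1"}, "lun1": {"a0"}, "lun2": {"a1"}}
groups = {"lun0": "g1", "lun1": "g1", "lun2": "g2"}
```

When path `a0` fails, only `lun1` loses all of its primary paths. With individual failover, only `lun1` moves. With group failover, `lun0` moves as well because it belongs to group `g1`, even though it still has a working primary path.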

As discussed in later sections, the Dynamic Multi-pathing (DMP) feature of Storage Foundation has a modular architecture that allows it to support new and different types of multi-path access control quickly and easily.

2.1 Discovering Multiple I/O Paths

UNIX operating systems “discover” the storage devices that are accessible to them automatically when they start up. Operating system device discovery consists of:

  • Scanning I/O buses or querying storage network fabrics to determine which bus or network addresses connect to actual disks or LUNs
  • Creating in-memory data structures in the operating system device tree that identify and describe discovered devices
  • Loading any specialized drivers required to utilize the devices

At the end of device discovery, an operating system has an in-memory database, or device tree, that represents the storage devices with which it can communicate, and has loaded the drivers required to control them.

To an operating system, a storage device is an address on a network that responds appropriately to SCSI storage device commands. UNIX operating systems are not inherently multi-path aware. They view a storage device that is accessible on two or more paths as two or more devices at different network addresses. Path management software, such as Storage Foundation’s Dynamic Multi-pathing, is required to analyze the device tree and identify multi-path devices. DMP’s discovery process and the modifications it makes to the operating system device tree are described in Section 4.3.
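One common way to recognize that several operating system device entries are really one LUN is to group the entries by a unique identifier that each LUN reports. SCSI devices can report such an identifier in the Device Identification VPD page (0x83). This sketch assumes such an identifier is available. It does not describe DMP's actual discovery method, which is covered in Section 4.3. The device names and identifiers below are hypothetical.

```python
from collections import defaultdict


def group_paths(device_tree):
    """Collapse OS device entries into multi-path devices.

    `device_tree` maps an OS device name (one entry per path) to the
    unique identifier the LUN reported during discovery. Entries that
    report the same identifier are treated as paths to the same LUN.
    """
    luns = defaultdict(list)
    for os_name, lun_id in device_tree.items():
        luns[lun_id].append(os_name)
    return {lun_id: sorted(names) for lun_id, names in luns.items()}


# Hypothetical device tree: one LUN seen on two paths, another on one path.
tree = {"c1t0d0": "id-A", "c2t0d0": "id-A", "c1t1d0": "id-B"}
```

`group_paths(tree)` maps `id-A` to two OS devices, which marks it as a multi-path device, and maps `id-B` to one. Path management software would then present `id-A` to upper layers as a single device with two paths.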

1 In this paper, the term disk refers both to actual disk drives and to the logical units (LUNs) presented to storage network ports by disk arrays.
2 Some HBAs have multiple ports, each of which is the starting point of a separate path through the storage network. Since each port is effectively a separate HBA, the model is simplified by treating an HBA as a port.
3 In addition to disk array virtualization, both disks and LUNs are sometimes virtualized by appliances or switches within the storage network, and by host-based volume managers such as VxVM. The virtual devices that result from disk array virtualization are universally referred to as LUNs. Virtual devices that result from switch and host-based virtualization are called virtual disks or volumes.