ccarrero
Employee

One of the new features introduced in Storage Foundation Cluster File System HA 6.0.1 is I/O Shipping. It improves resiliency across Cluster File System nodes and reduces downtime. I/O Shipping lets a cluster node ship its I/O to another node when a problem occurs in its own I/O path. Once enabled, it works transparently for the user, reacting automatically to failures.

Think of a backup job on your media server that is writing to disk, a batch process that has been running for a few hours, or a database transaction that should not be interrupted. Any outage in the path down to the storage may affect those jobs and therefore impact your business. When you work in a cluster there is no need to suffer that outage, because the other nodes in the cluster can still perform all the I/O.

To show how this feature works, we will use a four-node cluster. Each node has read and write access to the same file system (/data01) through Cluster File System. A write workload will be generated on the file system from node cfs02, and disk1 will then be abruptly unplugged from that node. All the I/O that cfs02 issues to the missing disk will be shipped to the other nodes in the cluster.
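The routing decision described above can be sketched as a small Bash model. This is a toy illustration, not how SFCFS HA implements I/O Shipping internally: the node and disk names come from the test setup, while `route_write`, the `LOCAL_FAILED` table and the round-robin choice of peer are hypothetical (round-robin is only an assumed policy). A write to a disk the node can still reach stays local; if the node has lost every path to that disk and ioship is on, the write goes to a peer that still sees the disk.

```shell
#!/usr/bin/env bash
# Toy model of the I/O Shipping routing decision (not the product's code).
# Hypothetical: route_write, LOCAL_FAILED, and the round-robin peer choice.

NODES=(cfs01 cfs02 cfs03 cfs04)   # the four-node test cluster
declare -A LOCAL_FAILED=()        # "node:disk" -> 1 when that node lost all paths
IOSHIP=on                         # models: vxdg -g dg01 set ioship=on

# route_write NODE DISK N
# Prints where the N-th write issued by NODE to DISK is serviced.
route_write() {
  local node=$1 disk=$2 n_write=$3

  # Disk still reachable from this node: the write is done locally.
  if [[ -z ${LOCAL_FAILED["$node:$disk"]:-} ]]; then
    echo "local:$node"
    return 0
  fi

  # Toy assumption: with shipping off, the write simply fails on this node.
  if [[ $IOSHIP != on ]]; then
    echo "error"
    return 1
  fi

  # Collect the peers that still have a path to the disk.
  local peers=() peer
  for peer in "${NODES[@]}"; do
    if [[ $peer != "$node" && -z ${LOCAL_FAILED["$peer:$disk"]:-} ]]; then
      peers+=("$peer")
    fi
  done

  if (( ${#peers[@]} == 0 )); then
    echo "error"   # no node in the cluster can reach the disk
    return 1
  fi

  # Hypothetical policy: spread shipped writes round-robin over the peers.
  echo "shipped:${peers[n_write % ${#peers[@]}]}"
}
```

For example, after `LOCAL_FAILED["cfs02:disk1"]=1`, the calls `route_write cfs02 disk1 0`, `1` and `2` print `shipped:cfs01`, `shipped:cfs03` and `shipped:cfs04`, while writes from cfs02 to disk2 stay local. Running `unset 'LOCAL_FAILED[cfs02:disk1]'` models the path being restored, after which the same call prints `local:cfs02` again.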

I/O Shipping is not enabled by default, so the first step is to enable it with the vxdg command:

# vxdg -g dg01 set ioship=on

Once I/O Shipping is enabled, let's generate some write workload on the directory. Because the volume has a RAID0 (striped) layout, all five disks are used.
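All five disks are busy because of the striped layout: consecutive stripe units of the volume rotate across the columns. The post does not show the volume's stripe parameters, so the Bash sketch below is only an illustration. The five columns match the test volume, but the 64 KiB stripe unit and the `stripe_disk` and `disks_touched` helpers are assumptions.

```shell
#!/usr/bin/env bash
# Toy RAID0 (striped) address map: which disk serves a given byte offset.
# Assumptions: 5 columns (disk1..disk5, as in the test volume) and a
# hypothetical 64 KiB stripe unit; real values come from the volume layout.

NCOLS=5
STRIPE_UNIT=$((64 * 1024))

# stripe_disk OFFSET: print the disk that holds byte OFFSET of the volume.
stripe_disk() {
  local offset=$1
  local unit=$(( offset / STRIPE_UNIT ))   # stripe unit the byte falls in
  echo "disk$(( unit % NCOLS + 1 ))"      # units rotate across the columns
}

# disks_touched START LEN: print (sorted) the disks hit by a sequential
# write of LEN bytes starting at byte START.
disks_touched() {
  local start=$1 len=$2 off
  local -A seen=()
  # Walk the write one stripe unit at a time, from the unit containing START.
  for (( off = start - start % STRIPE_UNIT; off < start + len; off += STRIPE_UNIT )); do
    seen[$(stripe_disk "$off")]=1
  done
  printf '%s\n' "${!seen[@]}" | sort
}
```

A 1 MiB sequential write (`disks_touched 0 $((1024 * 1024))`) spans 16 stripe units and therefore lists disk1 through disk5. This is why the workload keeps every disk active, and why losing disk1 on cfs02 only affects the stripe units that map to that disk while the writes to the other four continue.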

Next, we simulate a path failure by removing the first disk (disk1) from node cfs02.

The disk is now reported as locally failed.

The disk has also disappeared from the OS output, but notice that the writes continue on the other disks.

Looking at the disk activity on the other nodes (cfs01, cfs03 and cfs04), we can see that all of them are writing to the disk that failed locally on node cfs02.

To return to the original situation, we just need to reattach the storage (or fix whatever caused the path to fail). Once the path is recovered, the disk is presented to the OS again and the I/O is once more performed locally.

All disks are healthy again.

I/O Shipping therefore enhances service availability: no transaction is lost and no running job is broken. The application keeps running and no recovery is needed. Once the issue has been fixed, the I/O is transparently sent through the local node again, without the application noticing.

You may have noticed a small drop in performance, because the shipped I/O goes through the private links. We are already working on a new release that will bring an exciting technology to avoid that performance impact. Stay tuned!

Carlos Carrero.-

Comments
Armando_Crisafo
Employee Accredited Certified

Great article, Carlos: the example you described makes it very clear.

Many thanks indeed.

Douglas_Snyder
Employee Accredited Certified
Excellent explanation of I/O shipping and the benefits to the customer. Great post, thanks for the time and effort!
Ashish_Yajnik

Very informative, Carlos. This is also useful in scenarios where, with CFSHA, we want to avoid failing the application over to a standby node, avoid application recovery, and maximize SLAs.

mikebounds
Partner Accredited

Can someone confirm that my understanding of this feature is correct?

I/O Shipping does not help with server or storage failures; it helps with disk path failures such as an HBA card or FC switch failure. But that is what DMP provides, so the use cases of I/O Shipping are:

  1. You only have 1 path to the storage
  2. You have multiple paths, but they all fail.

As both of these are not that common, I am wondering if I am misunderstanding what I/O Shipping provides.

Mike

ccarrero
Employee

Mike,

You are correct about the use cases. The feature comes into action any time all the paths to your storage are lost, no matter what the failure is. It adds an extra layer of resiliency and, why not, it may open the door to new, more cost-effective architectures.

Cheers,

Carlos.- 

AnsariN

Hi Carlos,

Is there any reason to keep ioship=off as the default? See case 06099038 for at least one issue that could be avoided by changing the default to on.

Last update: 01-28-2013 02:09 PM