Scale compute and storage independently with InfoScale Storage 7.1

InfoScale storage's flexible storage sharing feature (FSS) allows pooling and sharing of storage resources among various compute entities whether they are physical machines, virtual machine or a mix of both.

The new version 7.1 of InfoScale takes FSS to the next level by providing the ability to isolate and on demand plumb the required amount of storage of desired charactistics to the compute entities where the application has been chosen to run thus improving the scalability of the cluster. Some of the key benefits are:

  • Improved view of the storage in a large cluster
  • Simplified storage provisioning
  • Easier device management

This blog would provide insight into the thought process that went into designing this.

Applications require a desired amount of compute, network and storage to run. Typically they are deployed on resource pools so as to not only improve utilization as well as provide high availability and resilience against component failures or scheduled maintenance activities.

In order to meet the storage needs of the applications in such an environment, the access to the storage needs to be dynamic and flexible. One way of doing this is to use a Storage Area Network (SAN) but this cumbersome as it requires the storage array and SAN switches be reconfigured for providing access on the fly. Network attached storage (NAS) provides the required flexiblity, we just need to connect to the same IP portal, but it cannot meet the performance demands of enterprise class applications. InfoScale flexible storage sharing fits the bill by allowing compute instances to dynamically share any type of block storage (local, directly attached - DAS or SAN) with one another.

Prior to InfoScale 7.1 this was achieved by mimicing locally available storage as a shared SAN storage by exporting them to all compute entities in the Infoscale cluster. For e.g as shown below, in a 16 node VMware virtual machine cluster (sal1 .. sal16) a VMDK on one VM can be shared with all other VMs. Notice how one VM even has SAN storage attached that can also be shared. Once created, the FSS diskgroup appears on all the cluster nodes (compute instances).

---------- Export the local disk sal1_vmdk0_4 on sal1 to all nodes in cluster ----------
[root@sal1 ~]# vxdisk export sal1_vmdk0_4
[root@sal1 ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
emc0_019b    auto:cdsdisk    -            -            online thinrclm
emc0_019c    auto:cdsdisk    -            -            online thinrclm
emc0_019d    auto:cdsdisk    -            -            online thinrclm
emc0_019e    auto:cdsdisk    -            -            online thinrclm
sal1_vmdk0_0 auto:cdsdisk    disk01       testdg       online
sal1_vmdk0_1 auto:cdsdisk    -            -            online
sal1_vmdk0_2 auto:cdsdisk    -            -            online
sal1_vmdk0_3 auto:LVM        -            -            LVM
sal1_vmdk0_4 auto:cdsdisk    -            -            online exported
[root@sal1 ~]#

---------- Remote node sal16 sees the disk sal1_vmdk0_4 ----------
[root@sal16 ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
sal1_vmdk0_4 auto:cdsdisk    -            -            online remote
sal16_vmdk0_0 auto:cdsdisk    -            -            online
sal16_vmdk0_1 auto:cdsdisk    -            -            online
sal16_vmdk0_2 auto:cdsdisk    -            -            online
sal16_vmdk0_3 auto:LVM        -            -            LVM
sal16_vmdk0_4 auto:cdsdisk    -            -            online
[root@sal16 ~]#

---------- Create a FSS disk group fssdg on remote disk sal1_vmdk0_4 ----------
[root@sal16 ~]# vxdg -o fss -s init fssdg sal1_vmdk0_4
[root@sal16 ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
sal1_vmdk0_4 auto:cdsdisk    sal1_vmdk0_4  fssdg        online shared remote
sal16_vmdk0_0 auto:cdsdisk    -            -            online
sal16_vmdk0_1 auto:cdsdisk    -            -            online
sal16_vmdk0_2 auto:cdsdisk    -            -            online
sal16_vmdk0_3 auto:LVM        -            -            LVM
sal16_vmdk0_4 auto:cdsdisk    -            -            online
[root@sal16 ~]#

---------- All other nodes see the disk and fssdg ----------
[root@sal9 ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
sal1_vmdk0_4 auto:cdsdisk    sal1_vmdk0_4  fssdg        online shared remote
sal9_vmdk0_0 auto:cdsdisk    -            -            online
sal9_vmdk0_1 auto:cdsdisk    -            -            online
sal9_vmdk0_2 auto:cdsdisk    -            -            online
sal9_vmdk0_3 auto:cdsdisk    -            -            online
sal9_vmdk0_4 auto:LVM        -            -            LVM
[root@sal9 ~]#

With InfoScale 7.1 this flexiblity in sharing storage is taken to the next level. There is no need to create a shared SAN using locally connected storage prior to using the storage to provision for applications. Instead which ever compute instance(s) the application needs to be run the storage is automatically mapped behind the scenes. If the application moves for high availabilty reasons, the storage is replumbed to the new compute instance. If more instances of the application need to be created on different compute instances then the storage is mapped additionally to only those nodes.

StorageAccessLayer.png

This is achieved by the 'storage access layer' that manages access to storage devices in InfoScale Storage cluster. It continously keeps track of all the storage devices along with their access state (cluster nodes or compute instances that currently can access it), device status (in-use, free, error) and makes sure this information is readily available with any node in the cluster, like a replicated state machine. This information can be viewed from any node in the cluster using the below CLI. Notice the node count information (last but one column) which provides the number of cluster nodes to which the storage is accessible.

---------- Clusterwide disk list that can be run on any node ----------
[root@sal9 ~]# vxdisk -o cluster list
DEVICE                           MEDIA      SIZE(MB)   GROUP        NODES      STATE     
emc0_019b                        hdd        30720      docdg        8          online    
emc0_019c                        hdd        30720      -            8          online    
emc0_019d                        hdd        30720      -            8          online    
emc0_019e                        hdd        30720      -            8          online    
sal1_vmdk0_0                     hdd        1024       testdg       1          online    
sal1_vmdk0_1                     hdd        1024       -            1          online    
sal1_vmdk0_2                     hdd        1024       -            1          online    
sal1_vmdk0_4                     hdd        1024       fssdg        1          online    
sal2_vmdk0_0                     hdd        1024       -            1          online    
sal2_vmdk0_2                     hdd        1024       -            1          online    
sal2_vmdk0_3                     hdd        1024       -            1          online    
sal2_vmdk0_4                     hdd        1024       -            1          online    
sal3_vmdk0_0                     hdd        1024       -            1          online    
sal3_vmdk0_2                     hdd        1024       -            1          online
. . .

---------- Details of a disk seen the cluster ----------
[root@sal9 ~]# vxdisk -o cluster list sal1_vmdk0_4
device      :  sal1_vmdk0_4
dg          :  fssdg
udid        :  VMware%5FVirtual%20disk%5Fvmdk%5F6000C29EC814CF8EF6A79BD81CF5D8E1
dgid        :  1463041712.50.sal1
guid        :  {51f6c334-e061-11e5-8b7f-89362c795deb}
mediatype   :  hdd
site        :  -
status      :  online
size        :  1024 MB
block size  :  512
max iosize  :  1024
connectivity:  
 node[14]    :  sal1

[root@sal9 ~]#

This list can be pretty huge when we have a large number of cluster nodes and each with its own local storage. So the CLI supports filter options based on node name, media type, free etc which can come in handy.

vxdisk -o cluster [-o udid] [-o dg] [-o dgid] [-o node] [-o free] [ -o mediatype={ssd|hdd}] list 

With this information available, any compute instance that requires to provision storage needs to just pick free storage available in the cluster and start using it. the other cluster nodes immediately get to know that the storage is no longer free.

---------- Initial disk list ----------
[root@sal9 ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
sal1_vmdk0_4 auto:cdsdisk    sal1_vmdk0_4  fssdg        online shared remote
sal9_vmdk0_0 auto:cdsdisk    -            -            online
sal9_vmdk0_1 auto:cdsdisk    -            -            online
sal9_vmdk0_2 auto:cdsdisk    -            -            online
sal9_vmdk0_3 auto:cdsdisk    -            -            online
sal9_vmdk0_4 auto:LVM        -            -            LVM
[root@sal9 ~]#

---------- Choose a free disk sal7_vmdk0_0 from cluster list ----------
[root@sal7 ~]# vxdisk -o cluster -o free list
DEVICE                           MEDIA      SIZE(MB)   GROUP        NODES      STATE     
emc0_019c                        hdd        30720      -            8          online    
emc0_019d                        hdd        30720      -            8          online    
emc0_019e                        hdd        30720      -            8          online    
sal1_vmdk0_1                     hdd        1024       -            1          online    
sal1_vmdk0_2                     hdd        1024       -            1          online
...
sal6_vmdk0_2                     hdd        1024       -            1          online    
sal6_vmdk0_4                     hdd        1024       -            1          online    
sal7_vmdk0_0                     hdd        1024       -            1          online
sal7_vmdk0_1                     hdd        1024       -            1          online
...

---------- Directly use it to initialize a disk group saldg ----------
[root@sal9 ~]# vxdg init saldg sal7_vmdk0_0
[root@sal9 ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
sal1_vmdk0_4 auto:cdsdisk    sal1_vmdk0_4  fssdg        online shared remote
sal7_vmdk0_0 auto:cdsdisk    sal7_vmdk0_0  saldg        online remote
sal9_vmdk0_0 auto:cdsdisk    -            -            online
sal9_vmdk0_1 auto:cdsdisk    -            -            online
sal9_vmdk0_2 auto:cdsdisk    -            -            online
sal9_vmdk0_3 auto:cdsdisk    -            -            online
sal9_vmdk0_4 auto:LVM        -            -            LVM
[root@sal9 ~]#

---------- The disk is shown to be in use from any node in cluster ----------
[root@sal16 ~]# vxdisk -o cluster list | grep sal7_vmdk0_0
DEVICE                           MEDIA      SIZE(MB)   GROUP        NODES      STATE
sal7_vmdk0_0                     hdd        1024       saldg        1          online    
[root@sal16 ~]#

In order to achieve migration of storage when the application consuming it migrates, all we need to do is deport the diskgroup from the source compute instance (cluster node) and reimport it on the destination. This just simplifies the operation and can be easily automated using high availabilty solutions like InfoScale Availability

---------- Deport the diskgroup saldg from sal 9 ----------
[root@sal9 ~]# vxdg deport saldg
[root@sal9 ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
sal1_vmdk0_4 auto:cdsdisk    sal1_vmdk0_4  fssdg        online shared remote
sal9_vmdk0_0 auto:cdsdisk    -            -            online
sal9_vmdk0_1 auto:cdsdisk    -            -            online
sal9_vmdk0_2 auto:cdsdisk    -            -            online
sal9_vmdk0_3 auto:cdsdisk    -            -            online
sal9_vmdk0_4 auto:LVM        -            -            LVM
[root@sal9 ~]#

---------- Disk is automatically unmapped from sal9 ----------
[root@sal16 ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
sal1_vmdk0_4 auto:cdsdisk    sal1_vmdk0_4  fssdg        online shared remote
sal16_vmdk0_0 auto:cdsdisk    -            -            online
sal16_vmdk0_1 auto:cdsdisk    -            -            online
sal16_vmdk0_2 auto:cdsdisk    -            -            online
sal16_vmdk0_3 auto:LVM        -            -            LVM
sal16_vmdk0_4 auto:cdsdisk    -            -            online

---------- Now import the disk group on another node sal16 ----------
[root@sal16 ~]# vxdg import saldg
[root@sal16 ~]# vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
sal1_vmdk0_4 auto:cdsdisk    sal1_vmdk0_4  fssdg        online shared remote
sal7_vmdk0_0 auto:cdsdisk    sal7_vmdk0_0  saldg        online remote
sal16_vmdk0_0 auto:cdsdisk    -            -            online
sal16_vmdk0_1 auto:cdsdisk    -            -            online
sal16_vmdk0_2 auto:cdsdisk    -            -            online
sal16_vmdk0_3 auto:LVM        -            -            LVM
sal16_vmdk0_4 auto:cdsdisk    -            -            online
[root@sal16 ~]#

As an extension if there is a requirement to have parallel/shared access to application instances running on more than one compute instances (cluster nodes) then there is the application isolation feature, released as a 'tech preview' in InfoScale 7.1 which can provide this. Look out for a blog on it...

This way we can easily scale the number of application instances without worrying about the storage connectivity. At the same time we can scale up the storage requirements of the application by plugging in new nodes to the cluster (that can provide storage) or pluging-in new storage on any node in the cluster. An extension of the latter is to plug-in a SAN to only one node and use it from rest of the nodes in the cluster - something very unique with InfoScale Storage.

There is also a helper utility that provides a summary of the storage available in the cluster and on each node of the cluster. This can be run from any node in the cluster

---------- Obtain the total and free storage capacity of cluster from any node ----------
[root@sal7 ~]# vxsan list
nodes:            total=16	storage=16
diskgroups:       total=5	imported=0
devices:
  hdd:            total=68	capacity=188416 MB   free=153600 MB
  ssd:            total=0	capacity=0 MB	     free=0 MB
[root@sal7 ~]#

---------- Obtain a node wise listing of HDD & SDD storage capacity ----------
[root@sal7 ~]# vxsan list nodes
			HDD					SSD
		--------------------------------	---------------------------------
NODE		COUNT	TOTAL (MB)	FREE (MB)	COUNT	TOTAL (MB)	FREE (MB)
sal1	        8	126976	        94208	        0	0	        0
sal2	        8	126976	        96256	        0	0	        0
sal3	        8	126976	        96256	        0	0	        0
sal4	        8	126976	        95232	        0	0	        0
sal5	        8	126976	        96256	        0	0	        0
sal6	        8	126976	        96256	        0	0	        0
sal7	        8	126976	        95232	        0	0	        0
sal8	        8	126976	        96256	        0	0	        0
sal9	        4	4096	        4096	        0	0	        0
sal10	        4	4096	        4096	        0	0	        0
sal11	        4	4096	        4096	        0	0	        0
sal12	        4	4096	        4096	        0	0	        0
sal13	        4	4096	        4096	        0	0	        0
sal14	        4	4096	        4096	        0	0	        0
sal15	        4	4096	        4096	        0	0	        0
sal16	        4	4096	        4096	        0	0	        0
[root@sal7 ~]#

When resources are pooled/shared to improve utilization a typical problem encountered is the noisy neighbour situation. The storage access layer in conjuction with application isolation feature provides the access isolation while the performance isolation can be achieved by restricting the maximum number of IOPS a particular application can issue to the storage system. Be on a look out for a blog about this InfoScale 7.1 feature as well..

InfoScale storage can be used as the storage backbone not only to host enterprise applications like Oracle RAC, but also

All these get the same flexibility for meeting the storage needs while ensuring enterprise class storage performance and availability.

1 Comment

Excellent article!

Great to know the new cutting edge capabilities of Infoscale Storage. yes

And thanks for describing it in so easy to understand manner.