Need for advanced storage QoS in OpenStack

Prasanna_Wakhar
Not applicable
Employee

Cloud services are claimed to cost less because resources are shared extensively. However, a higher degree of sharing means fewer performance guarantees, since one user's workload can affect the performance that other users see. Quality of service (QoS) therefore becomes a crucial factor for the success of OpenStack storage providers.

 

Storage QoS in OpenStack

OpenStack supports three types of persistent storage:

1. Cinder for block storage

Cinder manages block storage. Volumes are presented to the compute (Nova) layer using iSCSI, Fibre Channel or NFS, as well as a number of proprietary protocols that deliver back-end connectivity.

2. Swift for object storage

Object storage within OpenStack is delivered via Swift, which implements a scale-out object store distributed across the nodes of an OpenStack cluster. Object stores hold data as binary objects, with no specific reference to a format. Objects are stored and retrieved with simple HTTP operations such as PUT and GET, exposed through a RESTful API.

3. Manila for file shares

Manila bridges the gap between block and object storage by mapping external storage systems to Nova hosts and guests using file-based protocols (NFS/CIFS). File shares can be distributed between hosts and guests, because the NAS protocol manages the locking and data integrity processes required for multiple concurrent accesses to the data.
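To make the object storage case above concrete, here is a minimal sketch of what a Swift PUT and GET look like at the HTTP level. The endpoint, account, container, object name and token are all hypothetical placeholders, and the requests are only built, not sent, so the snippet runs without a cluster.

```python
from urllib.request import Request

# Hypothetical values: replace with the endpoint and token from your deployment.
SWIFT_URL = "http://swift.example.com:8080/v1/AUTH_demo"
TOKEN = "example-token"


def object_request(method, container, name, data=None):
    """Build (but do not send) an HTTP request for one Swift object."""
    return Request(
        url=f"{SWIFT_URL}/{container}/{name}",
        data=data,
        method=method,
        headers={"X-Auth-Token": TOKEN},
    )


# Store an object, then fetch it back: same URL, different HTTP verb.
put_req = object_request("PUT", "backups", "vm1.img", data=b"binary payload")
get_req = object_request("GET", "backups", "vm1.img")

print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

Passing either request to urllib.request.urlopen would send it to a real cluster, assuming the URL and token are valid.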

Typically, OpenStack Nova compute hosts are commodity servers with powerful, cost-effective hardware: NUMA nodes, at least 128 GB of RAM, a RAID controller, and hybrid storage such as SSDs and HDDs. As the hardware becomes more powerful, a few tens of diverse applications can easily be hosted on the same compute host. However, interference between co-located, competing workloads can degrade performance and violate the quality of service (QoS) guarantees that many cloud workloads require.

 

Why does storage QoS become more significant?

 

A lack of fine-grained, sophisticated QoS results in the following issues:

 

1.  Priority inversion: a high-priority, important application is throttled for the benefit of a low-priority application.

2.  Noisy neighbor: a noisy application can easily consume an unfair share of the resources, leaving nothing for others in a typical multi-tenant infrastructure.

3.  Starvation of a low-priority VM, or of a low-demand VM among VMs of equal priority.

4.  High and unpredictable latencies due to storage congestion.

5.  In a hybrid storage environment, the slowest storage brings down the performance of the entire system.

6.  A bursty workload can overwhelm the storage.

7.  Internal I/O activity, such as periodic data offload from the Nova compute host to Swift, Ceph, etc. for replication, or copying data to a peer node for fault tolerance, can overload the underlying storage. It also degrades the performance of applications running on the same compute host.

8.  Adding capacity to the system (for example, more SSD cards) rarely yields a proportional increase in performance. This hinders the overall scalability of the cluster.

   

How would advanced storage QoS help address the problems above?

 

1.  Assigning a priority to each workload would solve the priority inversion issue.

2.  The noisy neighbor problem would be solved by temporarily capping the IOPS of the noisy VM.

3.  A minimum IOPS reservation (miniops) dedicated to each workload would take care of starvation.

4.  Latency would be guaranteed by supporting a workload deadline as an SLA.

Together, priority, miniops, maxiops and deadline SLAs would provide performance guarantees and performance isolation in terms of both throughput and latency.
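As an illustration of how priority, miniops and maxiops could interact, here is a minimal sketch of one possible allocation policy: every workload first receives its reserved minimum, and the spare capacity is then divided in proportion to priority, never exceeding a workload's cap or its actual demand. The Workload fields, the allocate function and the sample numbers are assumptions made for this example, not the behavior of any particular product, and deadline SLAs are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    priority: int  # higher value = more important; used as a sharing weight
    min_iops: int  # reserved floor (miniops)
    max_iops: int  # hard cap (maxiops)
    demand: int    # IOPS the workload is currently trying to issue


def allocate(capacity, workloads):
    """Split `capacity` IOPS among workloads (hypothetical policy)."""
    # A workload never gets more than it asks for or more than its cap.
    ceiling = {w.name: min(w.demand, w.max_iops) for w in workloads}
    # Step 1: hand out the reserved minimums to prevent starvation.
    alloc = {w.name: float(min(w.min_iops, ceiling[w.name])) for w in workloads}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("reserved minimums exceed system capacity")
    # Step 2: share what is left in proportion to priority. Workloads that
    # reach their ceiling drop out and their unused share is redistributed.
    active = [w for w in workloads if alloc[w.name] < ceiling[w.name]]
    while remaining > 1e-9 and active:
        total_weight = sum(w.priority for w in active)
        spent = 0.0
        for w in active:
            share = remaining * w.priority / total_weight
            grant = min(share, ceiling[w.name] - alloc[w.name])
            alloc[w.name] += grant
            spent += grant
        remaining -= spent
        active = [w for w in active if ceiling[w.name] - alloc[w.name] > 1e-9]
    return alloc


result = allocate(10_000, [
    Workload("db", priority=3, min_iops=2000, max_iops=8000, demand=6000),
    Workload("batch", priority=1, min_iops=500, max_iops=3000, demand=20000),
    Workload("web", priority=2, min_iops=1000, max_iops=4000, demand=1500),
])
for name, iops in result.items():
    print(f"{name}: {iops:.0f} IOPS")
```

With these sample numbers, the high-priority db workload and the low-demand web workload both receive their full demand, while the noisy batch workload is held to the leftover 2500 IOPS instead of crowding the others out.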

5.  QoS will understand the various storage tiers and caching layers. It will present a combined capacity based on the usage trends of the different performance tiers. This ensures that the system capacity is neither limited to that of the slowest storage nor overestimated from the fastest performance tier.

6.  QoS will identify bursts, and a bursty application will be allowed to satisfy its burst without delay.

7.  Internal I/O such as replication and data offload will be prioritized (or deprioritized) so that it does not affect application SLAs.

8.  True scale-out: as capacity is added, QoS will ensure that performance increases linearly.
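Capping a noisy VM (item 2) while still letting a short burst through without delay (item 6) is commonly done with a token bucket. The sketch below shows that standard technique; it is not necessarily the mechanism used by the offering discussed here. The TokenBucket class, its parameters and the fake clock are assumptions made for this example.

```python
import time


class TokenBucket:
    """IOPS limiter: sustained rate `rate_iops`, bursts of up to `burst` I/Os."""

    def __init__(self, rate_iops, burst, clock=time.monotonic):
        self.rate = rate_iops        # tokens (I/Os) added per second
        self.capacity = burst        # maximum tokens that can be saved up
        self.tokens = float(burst)   # start full so an initial burst passes
        self.clock = clock           # injectable so the demo is deterministic
        self.last = clock()

    def try_acquire(self, n=1):
        """Return True if `n` I/Os may be issued now, False if throttled."""
        now = self.clock()
        # Refill for the elapsed time, but never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# Demo with a fake clock: a VM limited to 100 IOPS with a burst of 50 I/Os.
fake_now = [0.0]
bucket = TokenBucket(rate_iops=100, burst=50, clock=lambda: fake_now[0])

burst_admitted = sum(bucket.try_acquire() for _ in range(80))
fake_now[0] = 0.25  # a quarter of a second later, 25 tokens have been refilled
later_admitted = sum(bucket.try_acquire() for _ in range(80))
print(burst_admitted, later_admitted)
```

The first 50 I/Os of the burst are admitted immediately; after that the VM is held to the sustained rate, so only the 25 tokens refilled during the quarter second are available on the second attempt.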

 

In summary, effective use of the commodity server resource pool in a hyper-converged architecture requires sophisticated, fine-grained storage QoS on direct-attached storage (DAS).

 

Looking forward to your comments and experiences with performance SLAs. Your feedback will help in further fine-tuning the offering…