Cloud outages and how to avoid them

S_Deshpande · ‎03-15-2018

Did you know that one out of three organizations experiences between 31 and 90 minutes of downtime per month? Stay informed on the Truth in Cloud by reading the latest announcement and downloading the report here.

Don’t get me wrong: public clouds are great harbingers of digital transformation, and they can really help your organization up its competitive game by reducing precious CapEx. In today’s economy cloud adoption is a must, but you need to understand that even clouds can fail, and in turn, so can your business. Did you know that most cloud service provider contracts state that the provider isn’t responsible for service interruptions and isn’t liable for any resulting loss of business, revenue or profits?

We’ve seen some major cloud outages over the last year, and these have cost businesses up to millions of dollars. A simple human error, a power outage or even a cyber-attack in the cloud leaves you powerless: you don’t control the infrastructure, you don’t control the services and you don’t even control the personnel. These events are outside of your purview, yet they can have disastrous implications for your business.

We’re also seeing a rise in multi-cloud adoption; on average, organizations worldwide are using up to three cloud service providers. In a multi-cloud world, with a minimal on-premises infrastructure footprint under your direct control, it’s even more important that your organization takes responsibility for business uptime rather than relying solely on the service levels that your cloud provider offers.

Anatomy of a Cloud Outage

It's important to understand the anatomy of a cloud outage. Regardless of whether a cloud is down due to network issues, power outages or even human error, it can take a while for the cloud service provider to troubleshoot and rectify the error. In the meantime, customer applications running in that cloud or cloud region will take a hit. In the example below, it takes 5 hours for the cloud service provider to get back to normal operations.

[Figure: anatomy of a cloud outage]

But depending on the number of applications you have affected by the outage, and the complexity of these applications (start and stop dependencies across multitier applications), it can take much longer than 5 hours to get your business applications back up and running in the cloud.
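To make the effect of start dependencies concrete, here is a minimal Python sketch. The tier names, start times and the `recovery_minutes` helper are hypothetical and chosen purely for illustration; they are not taken from the example above or from any product. It assumes that tiers start in parallel once everything they depend on is up, so the extra delay after the provider recovers is the longest dependency chain.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def recovery_minutes(start_minutes, depends_on):
    """Minutes until every tier is up, counted from the moment the cloud
    provider is healthy again. A tier can only begin starting once all of
    the tiers it depends on are already running."""
    graph = {tier: depends_on.get(tier, ()) for tier in start_minutes}
    ready_at = {}
    # static_order() yields each tier after all of its dependencies.
    for tier in TopologicalSorter(graph).static_order():
        deps_ready = max((ready_at[d] for d in graph[tier]), default=0)
        ready_at[tier] = deps_ready + start_minutes[tier]
    return max(ready_at.values(), default=0)


# Hypothetical multitier application (all numbers are made up).
start_minutes = {"database": 20, "cache": 10, "app": 15, "web": 5}
depends_on = {"app": ["database", "cache"], "web": ["app"]}

provider_outage = 5 * 60  # the 5-hour provider outage, in minutes
app_restart = recovery_minutes(start_minutes, depends_on)
print(app_restart)                    # 40
print(provider_outage + app_restart)  # 340
```

Even in this small, made-up case the application is back 40 minutes after the provider is; with many applications and deeper dependency chains the gap grows accordingly.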

Best Practices for Recovery

Your business needs a way to fail over business services to your on-premises environment as soon as you detect an outage and possible system downtime. An automated risk monitoring solution that alerts you to any delays against your application Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) is essential. Just as importantly, you need to get your business applications started on-premises, which means that you should already have been replicating data from the cloud to your on-premises data center for disaster recovery (DR) purposes. You can also use another cloud region, or better yet, another cloud provider as your DR target if you don’t wish to pursue an on-premises DR strategy for cloud-hosted applications. Either way, a fully automated solution that lets you perform all DR tasks simply, without relying on an entire team of people, is best.
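As a rough illustration of the kind of check such risk monitoring performs, the Python sketch below flags an application whose objectives are at risk. The `AppStatus` fields, the `objectives_at_risk` function and every value shown are assumptions made for this example only; they do not describe any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AppStatus:
    # All fields are hypothetical inputs a monitoring tool might collect.
    name: str
    last_replicated: datetime      # last successful copy to the DR target
    estimated_recovery: timedelta  # current estimate to restart at the DR site
    rpo: timedelta                 # maximum tolerable data loss
    rto: timedelta                 # maximum tolerable downtime


def objectives_at_risk(app, now):
    """Return which objectives ("RPO", "RTO") the app would currently miss."""
    alerts = []
    if now - app.last_replicated > app.rpo:
        alerts.append("RPO")  # replication lag exceeds tolerable data loss
    if app.estimated_recovery > app.rto:
        alerts.append("RTO")  # restart would take longer than allowed
    return alerts


now = datetime(2018, 3, 15, 12, 0)
orders = AppStatus(
    name="orders",
    last_replicated=now - timedelta(minutes=45),
    estimated_recovery=timedelta(minutes=40),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=1),
)
print(objectives_at_risk(orders, now))  # ['RPO']
```

Here the restart estimate fits inside the one-hour RTO, but the last replication is 45 minutes old against a 15-minute RPO, so the check raises an RPO alert before any failover is attempted.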

To learn more about the anatomy of a cloud outage and how you can ensure maximum uptime for your business applications, watch the “Disaster Recovery for the Multi-Cloud” webcast, delivered by our cloud and continuity experts.
