Time Machine: The Future of IT Resilience

Level 3


If you are a CIO responsible for an enterprise data center, your strategy needs to include planning for a resilient data center environment, especially with the move to next-generation hybrid architectures. Historically, the IT community has viewed data center reliability through the lens of preventive defense, often measured by redundancy levels such as 2N and 2N+1. However, as the definition of the data center expands beyond internally managed hardware and software to include modular platforms and cloud services, simple redundancy calculations become only one factor in defining resilience. Organizations must also continually monitor ongoing changes to legislation, relevant security standards and other regulations. Such requirements are generally established to ensure the resilience of the organization’s own information assets, or of information assets it holds on behalf of others in the course of business. Compliance requirements may also be industry- and/or location-specific, with key sectors such as banking and finance, telecommunications and utilities subject to their own regulations.
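To illustrate why a redundancy calculation is only one factor, here is a minimal sketch of the arithmetic. It assumes independent failures, and the availability figures and function names are hypothetical values chosen for illustration, not measurements or vendor commitments.

```python
def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components, assuming independent
    failures: the group is down only when every copy is down."""
    return 1 - (1 - single) ** copies


def series_availability(*parts: float) -> float:
    """Availability of a chain of dependencies: every part must be up."""
    result = 1.0
    for part in parts:
        result *= part
    return result


# Hypothetical figures, for illustration only.
power_path = 0.999                                        # one power path
redundant_power = parallel_availability(power_path, 2)    # two full paths
cloud_service = 0.9995                                    # external dependency

end_to_end = series_availability(redundant_power, cloud_service)

print(f"redundant power: {redundant_power:.6f}")
print(f"end to end:      {end_to_end:.6f}")
```

Under these assumptions, duplicating the power path lifts that layer well above the availability of the external service, so the end-to-end figure is bounded by the dependency that was never part of the redundancy calculation.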

A number of new practices need to be implemented to ensure the resilience of key IT systems and business services. These include fully automating the continuous monitoring of critical services, and designing systems with persistent storage that can withstand the disruption of entire systems. IT organizations should also look to augment their portfolios with new tools that can help boost resilience, and adopt novel practices for testing how hardened their systems are. For example, Netflix has earned recognition for its novel use of resilience tools to test the company’s ability to survive failures and operating abnormalities. The company’s Simian Army, a set of services (monkeys), unleashes failures on its systems to test how adaptive the environment actually is. The data center community needs to challenge itself to find similar means of testing adaptability in modern hybrid architectures if it is to rise to the challenge of ultra-reliability.
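The idea of pairing failure injection with an automated availability check can be sketched in a few lines. This is a toy in-memory simulation under stated assumptions, not Netflix's tooling: `ServicePool`, `inject_failure` and `run_experiment` are hypothetical names, and a real experiment would act on actual instances and real health checks.

```python
import random


class ServicePool:
    """A simulated service with several replicas (hypothetical model)."""

    def __init__(self, name: str, replicas: int):
        self.name = name
        # Map instance name -> True if the instance is running.
        self.instances = {f"{name}-{i}": True for i in range(replicas)}

    def healthy_count(self) -> int:
        return sum(self.instances.values())

    def is_available(self, minimum: int = 1) -> bool:
        """Monitoring check: at least `minimum` replicas are running."""
        return self.healthy_count() >= minimum


def inject_failure(pool: ServicePool, rng: random.Random):
    """Stop one randomly chosen running instance; return its name or None."""
    running = [name for name, up in pool.instances.items() if up]
    if not running:
        return None
    victim = rng.choice(running)
    pool.instances[victim] = False
    return victim


def run_experiment(pool, failures, rng, minimum=1):
    """Inject failures one at a time, recording availability after each."""
    log = []
    for _ in range(failures):
        victim = inject_failure(pool, rng)
        log.append((victim, pool.is_available(minimum)))
    return log


rng = random.Random(42)  # seeded so the run is repeatable
pool = ServicePool("checkout", replicas=3)
results = run_experiment(pool, failures=2, rng=rng)

for victim, available in results:
    print(f"stopped {victim}: service available = {available}")
```

With three replicas and a minimum of one, the simulated service survives two injected failures. Raising `minimum` or `failures` shows where the design stops tolerating loss, which is the kind of question such tests are meant to answer.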

With data center infrastructure management (DCIM) tools becoming more sophisticated, and the need for integration across multiple ecosystems increasing, robust data to inform ongoing decision-making in the data center is a must-have for ensuring IT resiliency. Resilient data center architecture is no longer just about the building and infrastructure. It’s about designing the right architecture and including tools that can help reduce the probability of failure and ensure better predictability of system availability.