The Technical Services team for Backup and Recovery have produced a number of documents we call "Blueprints".
These Blueprints are designed to show backup and recovery challenges around specific technologies or functions and how Backup Exec solves these challenges.
Each Blueprint consists of:
This document explores and explains:
Causes of Server Failure and Downtime
The danger of server failure is a reality for all IT professionals. There are a variety of events that can cause server failure—and natural disasters are only one example. The list of possible causes of server failure includes the following:
• User Error - The most common cause of server failure is user error. Users are people, and people make mistakes. Whether it’s the end user downloading and installing the wrong application or visiting the wrong websites, or the IT administrator setting down a cup of coffee in the wrong place at the wrong time, the human element consistently leads the way among causes of server failure.
• Planned Downtime - Planned downtime is another common cause of server downtime. Servers require maintenance in order to perform at an optimal level over a long period of time. Sometimes planned maintenance events can inadvertently lead to server failure when maintenance tasks, for whatever reason, prevent a server from coming back online and operating correctly, or coming back online at all.
• Hardware Failures - When it comes to hardware failures, it’s not a question of if, but how often. Hardware failures happen on a frequent basis. This can be due to defective hardware, equipment maintenance problems, power-related issues, accidents, and other causes. The risk of hardware failure becomes greater as the size and complexity of a data center increases.
• Viruses and Malware - Other potential causes of system failure include malicious code designed specifically to exploit security vulnerabilities in IT infrastructure. Both viruses and malware can put servers at risk, even if security software is present and up to date. Some malicious code is designed to destroy data, some to steal data, and some to secretly take control of systems and compromise security over a long period of time.
• Natural Disasters - Natural disasters are also among the threats that can cause system failure, although they are among the least likely. Hurricanes, floods, fires, tornadoes, and other natural events can certainly bring servers down and cause them to fail, and perhaps even physically destroy them.
Cost of Server Downtime
The cost of server downtime includes tangible, direct costs such as lost transaction revenue, lost wages, lost inventory, remedial labor costs, marketing costs, bank fees, and legal penalties from failing to meet regulatory compliance requirements or from not delivering on service level agreements. It also includes intangible, indirect costs such as lost business opportunities, loss of employees and/or employee morale, a decrease in stock value, loss of customer and partner goodwill, brand damage, business driven to competitors, and bad publicity.
The cost of server downtime can be very significant to an organization, and perhaps even fatal. The longer the server downtime persists, the greater the damage, and the more likely it is that the blow to the organization proves fatal. This is also true for partners and service providers with responsibility for the business continuity of end user customers. The ability to recover quickly from server failure is a key element of any service provider’s portfolio.
The simple equation is easy, and most of us can do it in our heads. For a six-hour outage on a 24x7 direct fulfilment system that generates €36 million per annum, divide €36 million by the online hours per year (365 x 24 = 8,760) and multiply by 6 hours to get approximately €24,657. But that is merely a superficial formula. Lost revenue is simply the most obvious, most visible, and most easily identified cost of downtime, and the calculation above is a reasonable ballpark figure for that loss (notwithstanding fluctuations in sales by calendar date and time of day). This simple calculation is also grossly inadequate, however, and only touches lightly on the real costs to the organisation.
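The lost-revenue arithmetic above can be sketched in a few lines of Python. This is only an illustration of the simple formula, assuming revenue is spread evenly across every online hour of the year; the function name `lost_revenue` and its parameters are hypothetical and are not part of Backup Exec or any Symantec tool.

```python
# Hypothetical helper illustrating the simple lost-revenue formula.
# Assumption: revenue accrues evenly over all online hours in the year.

HOURS_PER_YEAR = 365 * 24  # 8,760 online hours for a 24x7 system


def lost_revenue(annual_revenue, outage_hours, online_hours_per_year=HOURS_PER_YEAR):
    """Return the direct revenue lost during an outage of the given length."""
    if online_hours_per_year <= 0:
        raise ValueError("online_hours_per_year must be positive")
    revenue_per_hour = annual_revenue / online_hours_per_year
    return revenue_per_hour * outage_hours


if __name__ == "__main__":
    # The example from the text: 36 million per annum, six-hour outage.
    loss = lost_revenue(36_000_000, 6)
    print(f"Estimated lost revenue: EUR {loss:,.2f}")
```

As the text notes, this figure covers only direct lost revenue. It does not capture the indirect costs of an outage, so it is best read as a lower bound on the real cost.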
As organisations become more interdependent across business units and extend their supply chains, the impact of downtime escalates rapidly. Simply, the whole business is affected. To truly assess the financial impact of any outage an organisation must consider all the aspects that are in use at any time within that business. The list is horribly exhaustive and includes:
In fact, more or less any aspect of the organisation’s operation, production, and development, together with its support functions, will be affected by an outage, as will the outward appearance of the organisation to its existing and potential customers. Unreliability is one of the most detrimental characteristics an organisation can display, and that is exactly how an outage appears to the outside world. Mitigating circumstances are inadmissible from the customer’s point of view, and this applies to internal as well as external customers, potential customers, suppliers, government agencies, and the competition within the vertical market sector. Any weakness seen by the competition may well be used by other organisations as a competitive advantage.
Server Recovery Problems and Obstacles
In light of the problem of server failure and downtime, it is critical that businesses equip themselves with tools and solutions to recover from such an event. Solutions that enable quick server recovery in the event of a disaster can mitigate both the server downtime itself as well as the associated costs.
Of course, several obstacles make server recovery difficult, both with older recovery methods and with newer aspects of the problem. These include the complexity of manual server recovery processes and the challenge of recovering to dissimilar hardware configurations.
Complexity of Manual Server Recovery
Manual server recovery can be a time-consuming and tedious process. Typically, manual recovery includes rebuilding a server by reinstalling the operating system, rebooting several times throughout the recovery process, reconfiguring the system, loading backup software, and hoping that no errors have occurred along the way. This process, which can take hours or even days, generally exceeds the capabilities of the average small business.
For larger organizations, the complexity of the server recovery problem can be exacerbated when an organization has one or more remote sites at which servers are located.
The Dissimilar Hardware Problem
Recovering to dissimilar hardware is also essential to effective server protection. It is cost-prohibitive for companies to maintain standby replicas of production server configurations for recovery purposes. Even in situations where standby hardware is available, small variations in hardware builds can cause problems for full server recovery solutions that are not equipped to deal with dissimilar hardware.
Bare Metal and Dissimilar Hardware Recovery with Backup Exec™ 2014
To help businesses prepare for and overcome the problem of server failure and downtime, Symantec has introduced Backup Exec™ 2014 with integrated bare metal recovery and dissimilar hardware recovery (also known as hardware discovery) capabilities. These features make full server recovery easy, and offer it as a built-in element of Backup Exec™ 2014 data and application protection practices.
For step-by-step instructions on installing and managing Backup Exec™ 2014’s bare metal and dissimilar hardware recovery features, please refer to the Backup Exec™ 2014 Administrator’s Guide available here: TECH205797.
You can use this Blueprint to better understand Backup Exec's DLM technologies - please download from the link below.