Enhanced Instant Recovery resiliency for VMware in Backup Exec 16 FP 1

BillWaldrum

The Backup Exec Product Team has been committed to quarterly releases that add real customer value since the release of Backup Exec 15. Our product development philosophy has changed from trying to release a large number of features in a single yearly release to quarterly, limited-feature releases that are targeted at changing customer needs. This gives our users a chance to see features much earlier and, more importantly, gives our product team a chance to incorporate early customer feedback and respond to rapidly changing market requirements.

Since starting the quarterly release cadence, we have released a number of features that have improved our customers' experience protecting virtual workloads. In Backup Exec 15 FP5 we added support for Instant Recovery (IR) of virtual workloads. In keeping with our new philosophy, we gave customers the flexibility to use IR for a wide variety of use cases. These included:

  1. IR a VM for test/dev purposes
  2. IR a VM for ad-hoc validation of the backup
  3. IR a VM for disaster recovery purposes
  4. IR a VM for simple redirected restore

After releasing IR as a general-purpose tool, we have had great feedback from customers on some of the ways that they use it. For example, customers have told us that they want to use IR to do VM validation. So in Backup Exec 16 FP1, we are releasing our new Recovery Ready feature based on IR technology. Recovery Ready validates a VM backup by ensuring that the VM can power on and boot to a login prompt. This validation gives our customers greater assurance that their most important VMs are protected and recoverable.

Customers have also indicated that they want to use IR for longer-term applications. This means that the IR VM will be running longer than the short period of time required to do validation or to copy to production storage. Because of this feedback, we have made IR VMs more robust and resilient in Backup Exec 16 FP1. To understand this new enhanced resiliency feature, it is important to know how Backup Exec implements Instant Recovery. The implementation differs between VMware and Hyper-V, and this post covers only the VMware implementation.

When a VMware VM is instantly recovered, the Backup Exec media server executes the following steps behind the scenes:

  1. Creates a virtual, read-only copy of the original backup set of VMDKs in a unique directory.
  2. Loads a unique instance of the VxLatServer.exe process to handle all the I/O for the virtual directory. VxLatServer.exe is a critical process that translates the VMDKs in a given virtualized backup directory into a format that can be understood by VMware ESXi hosts.
  3. Creates an NFS share on the virtualized directory.
  4. Defines an ESXi datastore based on the NFS share.
  5. Creates a new VM based on the original VM's settings along with the IR job configuration.
  6. Attaches the VMDKs exposed in step 1 to the new IR VM.
  7. Takes a VM snapshot so that any writes to the new IR VM go to production storage instead of the Backup Exec media server's backup storage.
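
The effect of the snapshot in the last step can be pictured with a small model. The sketch below is only an illustration of the general redirect-on-write idea, not Backup Exec or VMware code: the class names `ReadOnlyBackupDisk` and `SnapshotOverlayDisk` are hypothetical, and a disk is reduced to a dictionary of numbered blocks. Writes land in a delta that stands in for production storage, while reads of untouched blocks fall through to the read-only backup copy.

```python
class ReadOnlyBackupDisk:
    """Stands in for a VMDK served read-only from backup storage (hypothetical)."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)

    def read(self, block_no):
        # Blocks that were never backed up read as a zero byte in this toy model.
        return self._blocks.get(block_no, b"\x00")


class SnapshotOverlayDisk:
    """Redirect-on-write view: writes go to a delta, never to the base disk."""

    def __init__(self, base):
        self._base = base
        self._delta = {}  # block number -> data written after the snapshot

    def write(self, block_no, data):
        # The base disk is left untouched; only the delta grows.
        self._delta[block_no] = data

    def read(self, block_no):
        # Prefer blocks written since the snapshot, else fall back to the backup copy.
        if block_no in self._delta:
            return self._delta[block_no]
        return self._base.read(block_no)

    def delta_blocks(self):
        return sorted(self._delta)


base = ReadOnlyBackupDisk({0: b"boot", 1: b"data"})
disk = SnapshotOverlayDisk(base)
disk.write(1, b"new!")
```

In this model, block 0 is still served from the backup copy, block 1 is served from the delta, and the backup copy itself is unchanged.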

There are two key points with regard to resiliency that customers should understand about the above steps. The first is that all the VM read requests for data that has not changed since recovery are ultimately served from the Backup Exec media server storage. If the media server becomes unavailable for any reason, then the IR VM will not be able to read from its VMDKs. From a guest perspective, it will look like someone unplugged a disk drive. This is why the Backup Exec UI presents so many warning dialogs when creating an IR VM that emphasize the need to keep the media server available for the life of the IR VM.

The second key resiliency point is that a given IR VM's VxLatServer process must be running to service the read requests from that VM. If the VxLatServer process is stopped or killed for any reason, the VM will not be able to read its data. This will appear to the guest as if its disk drive has stopped responding. In the original release of IR in Backup Exec 15 FP 5 there was no way to restart this critical process. Once a VxLatServer process stopped running, an un-migrated IR VM permanently lost access to its disk drives with no way to recover.

It is this second point that is being addressed with the Backup Exec 16 FP 1 Enhanced Instant Recovery Resiliency feature.  Now when the Backup Exec Engine service is restarted, it will check to make sure that there is a host VxLatServer process running for every IR VM that is supposed to be active.  If it finds any IR VMs that don’t have a host process, it will start a new VxLatServer process.  From the IR VM perspective, it will look like its local drive was suddenly plugged in again.
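
The check described above is essentially a reconciliation pass: compare the IR VMs that should be active against the host processes that are actually running, and start whatever is missing. The following is a minimal sketch of that idea under stated assumptions. The function `reconcile_helpers`, its parameters, and the `vm-a` style identifiers are all hypothetical, and the launcher is a stand-in that only records requests; none of this reflects Backup Exec's actual internals.

```python
def reconcile_helpers(active_ir_vms, running_helpers, start_helper):
    """Start a helper for every active IR VM that lacks one.

    active_ir_vms:   iterable of VM identifiers that are supposed to be active
    running_helpers: set of VM identifiers that already have a helper process
    start_helper:    callable that launches a helper for one VM (hypothetical)
    Returns the list of VMs for which a helper was started.
    """
    restarted = []
    for vm in sorted(active_ir_vms):
        if vm not in running_helpers:
            start_helper(vm)
            running_helpers.add(vm)
            restarted.append(vm)
    return restarted


# Example run with a stand-in launcher that only records what it was asked to start.
launched = []
helpers = {"vm-a"}
restarted = reconcile_helpers({"vm-a", "vm-b", "vm-c"}, helpers, launched.append)
```

Running the pass a second time with the same inputs starts nothing, which is the property that makes it safe to perform on every service restart.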

Of course, if the media server is shut down unexpectedly or if the VxLatServer host processes are terminated, the IR VMs will not be functional for as long as the media server is unavailable. Depending on what the IR VMs are doing, they may not react well to having their disk drives suddenly become unavailable. However, once the media server restarts and the VxLatServer host processes are automatically restarted, the storage once again becomes available and the IR VMs can pick up where they left off.

If a media server outage is planned, the IR VMs can either be shut down or suspended prior to the outage. With enhanced resiliency, the media server will restart all the necessary processes to support the existing IR VMs. Once the media server is available again, the IR VMs can be powered on without losing any data saved prior to shutdown.

The Enhanced Instant Recovery Resiliency feature in Backup Exec 16 FP1 allows customers to use IR technology to create longer-lived IR VMs without fear of losing their IR VM data in the event of an unexpected media server outage. It also gives customers more flexibility in how they handle planned outages of their media server. Finally, it represents our continued commitment to quickly bringing incremental improvements, based on customer feedback, to the new features that we release.