
Active-Active Single Domain Site Resiliency in NetBackup Flex Scale 2.1

bkrieger

This blog is co-authored by @Ravindra Walde.  

NetBackup Flex Scale's site-level resiliency and replication capabilities have become easier to use and more flexible: 

  • Active/active operation mode with each site replicating the other’s data 
  • Dual-site, simultaneous, rolling upgrade of NetBackup Flex Scale
  • Automated disaster recovery relationship creation using either GUI or API
  • Volume-based, automated replication of NetBackup catalog
  • Independent and non-disruptive scaling and node replacement of the cluster segments 

For NetBackup, disaster recovery depends on work done before the disaster: both the data backed up at the original "primary" site and the metadata created about that data must already have been replicated to the remote "secondary" site, so that both exist in the right location when it is time to recover.
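
The dependency can be sketched in a few lines of Python. This is a minimal model, assuming hypothetical backup IDs and two sets that stand in for what reached the secondary site in time; it is not how NetBackup tracks images internally. A backup is restorable at the secondary site only if both its image data and its catalog metadata arrived before the disaster.

```python
# Hypothetical model: which backups could be restored at the secondary site
# after the primary site is lost. Not a NetBackup API.


def restorable_at_secondary(backups, replicated_data, replicated_catalog):
    """Return the backups whose image data AND catalog metadata both
    reached the secondary site before the disaster."""
    return sorted(b for b in backups
                  if b in replicated_data and b in replicated_catalog)


backups = ["bk-001", "bk-002", "bk-003"]
replicated_data = {"bk-001", "bk-002"}     # image data duplicated in time
replicated_catalog = {"bk-001", "bk-003"}  # catalog metadata replicated in time

print(restorable_at_secondary(backups, replicated_data, replicated_catalog))
# ['bk-001']
```

Here bk-002 has data but no catalog entry and bk-003 has a catalog entry but no data, so neither could be restored.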

NetBackup Flex Scale provides the option for a dual-site, single-domain DR configuration. It can be configured in one of two ways: 

  • active-passive configuration, in which the primary site runs the NetBackup primary service, all clients connect to the media and storage services running there, and the primary site replicates the data to the secondary site.
  • active-active configuration, in which both sites receive backups from their local clients and duplicate them to the remote site, so that all client data is stored twice: once by the backup and once by the duplication. Clients at the secondary site use the NetBackup primary service at the primary site for job control, but the media service and storage pool local to the secondary site for their data.

[Figure: dual-site DR configuration]
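
The "stored twice" point in the active-active configuration is simple arithmetic. Writing D_P and D_S for the client data backed up at the primary and secondary sites (logical sizes before any deduplication or compression, which this sketch ignores), each site holds its own backups plus the other site's duplicates:

```latex
\begin{aligned}
\text{stored at primary site}   &= D_P + D_S \\
\text{stored at secondary site} &= D_S + D_P \\
\text{total stored}             &= 2\,(D_P + D_S)
\end{aligned}
```

For instance, with a hypothetical 40 TB backed up at the primary site and 25 TB at the secondary site, each site would hold 65 TB and the pair 130 TB, before deduplication.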

In either configuration, the clusters are deployed separately at first, with the secondary site as a waiting “empty” cluster until the disaster recovery relationship is created using the appliance UI or API.  

When the DR relationship is operational: 

  • Only one instance of the NetBackup primary service runs, hosted by the nodes at the primary site.
  • At each site, the media service containers are treated collectively as the local storage unit (STU) for client data, which is stored there as the first copy before it is duplicated to the STU at the remote site.
  • Storage lifecycle policies manage the initial backup and then the duplication to the remote site's STU.
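
The relationship between STUs and storage lifecycle policies can be sketched as a small data model. The `StorageUnit` and `LifecyclePolicy` classes, `make_policy`, and the STU names are hypothetical illustrations, not NetBackup objects or APIs; the sketch only shows that each policy drives a local backup followed by a duplication to the other site.

```python
from dataclasses import dataclass

# Hypothetical model for illustration; these classes are not part of any
# NetBackup or Flex Scale API.


@dataclass(frozen=True)
class StorageUnit:
    name: str
    site: str  # site whose media service containers back this STU


@dataclass(frozen=True)
class LifecyclePolicy:
    name: str
    backup_to: StorageUnit     # first copy, written at the local site
    duplicate_to: StorageUnit  # second copy, on the remote site

    def operations(self):
        """Ordered operations the policy drives for each backup image."""
        return [("backup", self.backup_to.name),
                ("duplication", self.duplicate_to.name)]


def make_policy(local: StorageUnit, remote: StorageUnit) -> LifecyclePolicy:
    # The duplication target must be on the other site, otherwise a site
    # loss would take out both copies.
    if local.site == remote.site:
        raise ValueError("duplication target must be on the remote site")
    return LifecyclePolicy(f"slp-{local.site}-to-{remote.site}", local, remote)


stu_primary = StorageUnit("stu-primary", "primary")
stu_secondary = StorageUnit("stu-secondary", "secondary")
slp = make_policy(stu_primary, stu_secondary)
print(slp.operations())
# [('backup', 'stu-primary'), ('duplication', 'stu-secondary')]
```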

With the relationship in place, the replication relationships between STUs can be configured from the NetBackup primary service. 

For example, in the configuration shown in the figure above, data is replicated between the sites as follows: 

  • Optimized duplication will replicate client backup data:
    • Primary site clients (client 1 – client n) will use the local/primary site storage unit for backups and the remote/secondary storage unit for duplication.
    • Secondary site clients (client A – client Z) will use the local secondary site storage unit for backups and the remote/primary storage unit for duplication. 
  • Veritas Volume Replicator (VVR) replicates catalog data from the site hosting the primary service to the corresponding volume at the remote site. 
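
The routing rules in this list can be expressed as a small lookup. The site labels, the `stu-primary`/`stu-secondary` names, and the helper functions are hypothetical; the sketch only encodes the two rules stated above: clients back up to their local STU and duplicate to the remote one, and the catalog replicates away from the site hosting the primary service.

```python
# Hypothetical routing table for the active-active layout described above.
# Site and STU names are illustrative, not product defaults.

STU_BY_SITE = {"primary": "stu-primary", "secondary": "stu-secondary"}


def remote_of(site: str) -> str:
    """Return the other site of the two-site pair."""
    if site not in STU_BY_SITE:
        raise ValueError(f"unknown site: {site}")
    return "secondary" if site == "primary" else "primary"


def client_targets(client_site: str) -> tuple:
    """(backup STU, duplication STU) for a client at client_site."""
    remote = remote_of(client_site)
    return STU_BY_SITE[client_site], STU_BY_SITE[remote]


def catalog_replication(primary_service_site: str = "primary") -> tuple:
    """Catalog volumes replicate from the site hosting the primary service."""
    return primary_service_site, remote_of(primary_service_site)


clients = {"client1": "primary", "clientA": "secondary"}
for name, site in clients.items():
    backup, dup = client_targets(site)
    print(f"{name}: backup -> {backup}, duplicate -> {dup}")
print("catalog: {} -> {}".format(*catalog_replication()))
# client1: backup -> stu-primary, duplicate -> stu-secondary
# clientA: backup -> stu-secondary, duplicate -> stu-primary
# catalog: primary -> secondary
```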

[Figure: replication between sites]


A disaster recovery scenario could involve the loss of either the primary or the secondary site. If the primary site were lost, the following sequence would make the replicated data available: 

  1. The secondary site detects that the primary site's heartbeat is missing and creates an event on the cluster. An alert appears in the appliance GUI, and the takeover operation becomes available. (In a test scenario, the appliance UI's “Switch Roles” option can be used instead.)
  2. An administrator performs the takeover operation, and the secondary site takes over as the primary site using a predesignated IP address.
  3. An administrator updates DNS so that the NetBackup primary server's FQDN resolves to the IP address assigned at the secondary site in step 2.
  4. Optionally, an administrator changes the storage lifecycle policies that previously wrote their first copy to the primary site, reversing the direction of duplication: new backups go to the secondary site first and are duplicated to the primary site once it is available again. After the primary site is restored, the backup administrator can reverse the duplication direction at an appropriate time, so that clients once more send data to the primary site and NetBackup Flex Scale duplicates it to the remote site, which again only receives duplicated data.
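
The takeover sequence can be sketched as a small state model to show what each step changes. Everything in it is hypothetical: the `SitePair` class, the 30-second `HEARTBEAT_TIMEOUT_S` threshold, the site names, and the example FQDN and documentation-range IP addresses are illustrative assumptions, not Flex Scale defaults or interfaces.

```python
from dataclasses import dataclass, field

# Hypothetical model of the takeover sequence; it does not call any
# Flex Scale or NetBackup interface. The timeout is an assumed value.
HEARTBEAT_TIMEOUT_S = 30.0


@dataclass
class SitePair:
    roles: dict           # site -> "primary" or "secondary"
    dns: dict             # FQDN -> IP address
    slp_direction: tuple  # (site receiving first copy, duplication site)
    alerts: list = field(default_factory=list)
    takeover_available: bool = False

    def check_heartbeat(self, last_seen: float, now: float) -> None:
        # Step 1: a missing primary-site heartbeat raises an alert and
        # makes the takeover operation available.
        if now - last_seen > HEARTBEAT_TIMEOUT_S:
            self.alerts.append("primary site heartbeat missing")
            self.takeover_available = True

    def takeover(self, surviving_site: str) -> None:
        # Step 2: the surviving site assumes the primary role.
        if not self.takeover_available:
            raise RuntimeError("no takeover available")
        for site in self.roles:
            self.roles[site] = "primary" if site == surviving_site else "secondary"

    def repoint_dns(self, fqdn: str, new_ip: str) -> None:
        # Step 3: the primary server's FQDN now resolves to the
        # predesignated IP address at the surviving site.
        self.dns[fqdn] = new_ip

    def reverse_slp(self) -> None:
        # Step 4 (optional): swap the first-copy and duplication sites.
        first, dup = self.slp_direction
        self.slp_direction = (dup, first)


pair = SitePair(roles={"site-a": "primary", "site-b": "secondary"},
                dns={"nbu-primary.example.com": "192.0.2.10"},
                slp_direction=("site-a", "site-b"))
pair.check_heartbeat(last_seen=0.0, now=45.0)
pair.takeover("site-b")
pair.repoint_dns("nbu-primary.example.com", "198.51.100.10")
pair.reverse_slp()
print(pair.roles, pair.dns, pair.slp_direction)
# {'site-a': 'secondary', 'site-b': 'primary'} {'nbu-primary.example.com': '198.51.100.10'} ('site-b', 'site-a')
```

The model refuses a takeover until the heartbeat check has raised an alert, mirroring step 1; a planned “Switch Roles” test is not modelled.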

If instead the secondary site's cluster segment were lost, only step 4 would be required, to redirect the secondary site's clients to the primary site's media services; the primary server would remain operational throughout. 
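
Step 4 applied to several policies can be sketched as a pure function. Policies are modelled as a hypothetical mapping from policy name to (first-copy site, duplication site), and the policy names are illustrative. Only policies whose first copy lands on the failed site are reversed, which for a secondary-site loss means only the secondary-site clients' policies change. That duplication back to the failed site catches up once it returns is an assumption of the sketch.

```python
# Hypothetical sketch of step 4 applied to a set of lifecycle policies.
# Policies are modelled as name -> (first-copy site, duplication site).


def redirect_policies(policies: dict, failed_site: str) -> dict:
    """Reverse every policy whose first copy lands on the failed site;
    return a new mapping and leave the input unchanged."""
    redirected = {}
    for name, (first, dup) in policies.items():
        if first == failed_site:
            # New backups go to the surviving site; duplication back to the
            # failed site is assumed to resume once it is available again.
            redirected[name] = (dup, first)
        else:
            redirected[name] = (first, dup)
    return redirected


policies = {
    "slp-clients-1-to-n": ("primary", "secondary"),
    "slp-clients-A-to-Z": ("secondary", "primary"),
}
print(redirect_policies(policies, failed_site="secondary"))
# {'slp-clients-1-to-n': ('primary', 'secondary'), 'slp-clients-A-to-Z': ('primary', 'secondary')}
```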

Except for upgrades, which are performed in parallel on both sites, each site's cluster segment is managed independently. For instance: 

  • node maintenance tasks, such as updating the network configuration or replacing a failed disk, are performed from the management UI for that site's nodes.
  • scaling or replacing a node in either of the cluster segments can be done independently, and ongoing backup or duplication jobs won’t be affected.  

In short, NetBackup Flex Scale provides both choice and flexibility when implementing a disaster recovery solution, so that it can be matched to a customer's business continuity requirements.