Do you have a Hadoop cluster running with High Availability, multiple nodes, or SSL & Kerberos authentication enabled?
Do you want to protect your Hadoop environment with the storage, deduplication, and performance benefits of NetBackup?
Then you are in the right place. NetBackup supports protection for all of these Hadoop environments and helps with disaster recovery.
How it works
NetBackup uses a parallel streaming framework to protect scale-out environments like Hadoop.
The NetBackup Hadoop plugin uses an agentless architecture; the administrator doesn’t need to install NetBackup clients on the Hadoop nodes. This suits the scale-out nature of Hadoop, where customers add data nodes as their data grows. The NetBackup Hadoop plugin, which is installed on the backup hosts (typically NetBackup media servers), is used to discover, back up, and recover Hadoop data. Administrators can add backup hosts to distribute the backup load and improve performance.
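To illustrate why adding backup hosts can help, here is a minimal Python sketch that spreads HDFS directories across backup hosts so that no single host carries most of the data. This is an illustration of the general idea only, not NetBackup's actual distribution logic; the function names, host names, paths, and sizes are all hypothetical.

```python
import heapq

def distribute(path_sizes, backup_hosts):
    """Greedy split: give the largest remaining path to the least-loaded host.

    Illustrative only. This is NOT how NetBackup assigns work internally.
    """
    heap = [(0, host) for host in backup_hosts]  # (current load, host)
    heapq.heapify(heap)
    assignment = {host: [] for host in backup_hosts}
    for path, size in sorted(path_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, host = heapq.heappop(heap)
        assignment[host].append(path)
        heapq.heappush(heap, (load + size, host))
    return assignment

def max_load(assignment, path_sizes):
    """Largest amount of data any single backup host has to stream."""
    return max(sum(path_sizes[p] for p in paths) for paths in assignment.values())

# Hypothetical directory sizes in GB.
sizes = {"/data/a": 40, "/data/b": 30, "/data/c": 20, "/data/d": 10}
one_host = distribute(sizes, ["backup-host-1"])
two_hosts = distribute(sizes, ["backup-host-1", "backup-host-2"])
print(max_load(one_host, sizes), max_load(two_hosts, sizes))
```

In this toy example the busiest host streams 100 GB with one backup host and 50 GB with two, which is the intuition behind adding backup hosts as the cluster grows.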
Administrators can create point-in-time copies of Hadoop data using Full and Incremental backup schedules and recover the data to the same or an alternate Hadoop cluster. Where replication is enabled, the NetBackup Hadoop plugin backs up only one copy of the data, which reduces storage costs. If NetBackup deduplication (MSDP) or NetBackup Appliances are part of the environment, built-in deduplication can reduce storage even further.
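As a rough worked example of the single-copy saving, the sketch below compares the raw space a dataset occupies on a replicated cluster with the amount that needs to be backed up. The function name and the 100 TB figure are made up for illustration; a replication factor of 3 is the usual HDFS default, and deduplication savings are deliberately left out because they depend on the data.

```python
def backup_footprint_tb(logical_tb, replication_factor):
    """Compare on-cluster raw usage with a single-copy backup (before dedupe)."""
    raw_tb = logical_tb * replication_factor
    return {
        "raw_on_cluster_tb": raw_tb,          # all replicas stored by HDFS
        "backed_up_tb": logical_tb,           # only one copy is backed up
        "avoided_tb": raw_tb - logical_tb,    # replicas that are not backed up
    }

# Hypothetical cluster: 100 TB of logical data, replication factor 3.
print(backup_footprint_tb(100, 3))
```

With these assumed numbers, the cluster holds 300 TB of raw data while the backup only has to cover 100 TB.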
You can protect Apache Hadoop and popular HDFS distributions such as Cloudera. The NetBackup Hadoop plugin supports file & folder level recovery from a Hadoop cluster.
Administrators can recover the root directory, or individual folders and files, to the same or an alternate Hadoop cluster.
How to configure a Hadoop cluster using a NetBackup primary server
Backup:
a. On the NetBackup primary server, create a BigData policy
b. Under the Clients tab, enter the Hadoop NameNode manually
c. Under the Backup Selections tab, enter the backup selection strings manually, as seen in the screenshot
d. Select the storage location on the media server, then run the backup job with the desired backup schedule
How to configure an HDFS cluster enabled with Kerberos authentication
Here you need to distribute Kerberos tokens to all the backup hosts. To do this, please follow these steps
How to configure an HDFS cluster enabled with SSL (HTTPS)
To enable access to SSL clusters for backup and restore, you need to get the root CA certificate from the Certificate authority and copy this certificate onto the Backup hosts. The NetBackup Hadoop plugin also supports protection for HDFS clusters enabled with Certificate Revocation Lists (CRL).
In environments such as the Cloudera distribution, the root CA certificate can be obtained from the Cloudera administrator. The Hadoop cluster may use a manual TLS configuration or have Auto-TLS enabled. In both cases, NetBackup needs a root CA certificate from the administrator.
The root CA certificate from the Hadoop cluster validates the certificates of all nodes and allows NetBackup to run the backup and restore process against the secure (SSL) cluster. This root CA certificate would be a bundle of the certificates that have been issued to all nodes.
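Before pointing NetBackup at the bundle, it can be useful to confirm independently that the root CA file really validates a node's certificate. The sketch below does this with Python's standard ssl module; it is a generic TLS check, not part of NetBackup, and the function names, host name, and port are placeholders you would replace with your own NameNode HTTPS endpoint.

```python
import socket
import ssl

def build_tls_context(ca_bundle_path=None):
    """Create a client context that trusts the given CA bundle (PEM file).

    With ca_bundle_path=None the system default trust store is used instead.
    """
    return ssl.create_default_context(cafile=ca_bundle_path)

def node_cert_is_trusted(host, port, ca_bundle_path, timeout=5.0):
    """Return True if the TLS handshake with host:port verifies against the bundle."""
    context = build_tls_context(ca_bundle_path)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # includes ssl.SSLError and connection failures
        return False

# Hypothetical usage; host, port, and bundle path are placeholders:
# node_cert_is_trusted("namenode.example.com", 9871, "/path/to/root-ca-bundle.pem")
```

A False result only says the handshake did not succeed from this host; it could be a trust problem, a host name mismatch, or simply an unreachable port, so treat it as a starting point for troubleshooting.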
The certificate from the root CA should be configured with the ECA_TRUST_STORE_PATH option in bp.conf on the backup host, whether the environment uses a self-signed, third-party CA, or local/intermediate CA setup. (For example, in Auto-TLS-enabled Cloudera environments, the root CA file is typically named "cm-auto-global_cacerts.pem" and located at "/var/lib/cloudera-scm-agent/agent-cert".)
For protecting secure HDFS with the NetBackup Hadoop plugin, you must configure the following conf files on all backup hosts:
/usr/openv/var/global/hadoop.conf
/usr/openv/netbackup/bp.conf
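As a quick sanity check on a backup host, the following Python sketch reads bp.conf text and confirms that ECA_TRUST_STORE_PATH is set and points at an existing file. It assumes the usual "KEY = value" layout of bp.conf entries; the helper names and the sample values are hypothetical, and the script only reads the file, it does not change any NetBackup configuration.

```python
import os

def read_bp_conf_value(conf_text, key):
    """Return the value of the first 'KEY = value' entry matching key, or None."""
    for line in conf_text.splitlines():
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            return value.strip()
    return None

def check_trust_store(conf_text):
    """Report whether ECA_TRUST_STORE_PATH is set and the file exists."""
    path = read_bp_conf_value(conf_text, "ECA_TRUST_STORE_PATH")
    if path is None:
        return "ECA_TRUST_STORE_PATH is not set"
    if not os.path.isfile(path):
        return f"trust store not found: {path}"
    return "ok"

# Hypothetical bp.conf content for illustration.
sample = "SERVER = primary.example.com\nECA_TRUST_STORE_PATH = /nonexistent/ca.pem\n"
print(check_trust_store(sample))
```

On a real backup host you would pass in the contents of /usr/openv/netbackup/bp.conf instead of the sample string.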
Recovering Hadoop data:
If you need to restore HDFS file/folder data, you can recover using the Backup, Archive and Restore window in the NetBackup UI. Please specify the following values under NetBackup machines and Policy Type window:
Once these are specified, you can proceed to recover the HDFS folders or files using the Backup, Archive and Restore window.
For more information on configuration and support, please refer to the Veritas NetBackup™ for Hadoop Administrator's Guide and the Software Compatibility List (SCL) for Hadoop.