Everything you need to know about protecting your Hadoop data using NetBackup
Yes, the elephant is out of the room :) You can now effectively protect your Hadoop data simply by deploying the NetBackup plug-in for Hadoop. The following links have all the information you will need:

- Introduction
- Installing and deploying the Hadoop plug-in for NetBackup
- Configuring NetBackup for Hadoop
- Performing backups and restores of Hadoop
- Troubleshooting

Also, see the About the NetBackup plug-ins and agents: Download, install, and availability information article on the Support site.

Using the NetBackup Parallel Streaming Framework (PSF), Hadoop data can now be protected with NetBackup. The following diagram provides an overview of how NetBackup protects Hadoop data.

As illustrated in the diagram:

- The data is backed up in parallel streams: the DataNodes stream data blocks simultaneously to multiple backup hosts. Multiple backup hosts and parallel streams accelerate job processing.
- Communication between the Hadoop cluster and NetBackup is enabled by the NetBackup plug-in for Hadoop. The plug-in is available separately and must be installed on all the backup hosts.
- For NetBackup communication, you configure a Big Data policy and add the related backup hosts.
- You can configure a NetBackup media server, client, or master server as a backup host. Depending on the number of DataNodes, you can add or remove backup hosts, which lets you scale up your environment easily.
- The NetBackup Parallel Streaming Framework enables agentless backup: the backup and restore operations run on the backup hosts, so there is no agent footprint on the cluster nodes. NetBackup is also unaffected by Hadoop cluster upgrades or maintenance.
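As a concrete illustration of the plug-in side of this setup, backup hosts are typically pointed at the Hadoop cluster through a small JSON configuration file (commonly named hadoop.conf). The file below is a hedged sketch only: the hostnames are placeholders, and the exact field names, ports, and file location should be confirmed against the Configuring NetBackup for Hadoop documentation linked above.

```json
{
  "application_servers": {
    "namenode1.example.com": {
      "port": 8020,
      "failover_namenodes": [
        { "hostname": "namenode2.example.com", "port": 8020 }
      ]
    }
  },
  "number_of_threads": 5
}
```

In this sketch, `application_servers` identifies the NameNode(s) the backup hosts talk to (with an optional failover entry for highly available clusters), and `number_of_threads` caps the parallel streams per backup host. A file like this would live on each backup host, which is consistent with the article's point that the plug-in must be installed and configured on every backup host rather than on the cluster nodes themselves.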