Veritas NetBackup™ for Hadoop Administrator's Guide

Last Published:
Product(s): NetBackup (8.1.2)
Platform: Linux, UNIX, Windows

Protecting Hadoop data using NetBackup

NetBackup can now protect Hadoop data by using the NetBackup Parallel Streaming Framework (PSF).

The following diagram provides an overview of how Hadoop data is protected by NetBackup.

Also, review the definitions of the terms used in this guide. See NetBackup for Hadoop terminologies.

Figure: Architectural overview

As illustrated in the diagram:

  • The data is backed up in parallel streams: the DataNodes stream data blocks simultaneously to multiple backup hosts. Using multiple backup hosts and parallel streams accelerates job processing.

  • Communication between the Hadoop cluster and NetBackup is enabled by the NetBackup plug-in for Hadoop.

    The plug-in is installed as part of the NetBackup installation.

  • For NetBackup communication, you need to configure a BigData policy and add the related backup hosts.

  • You can configure a NetBackup media server, client, or master server as a backup host. Also, depending on the number of DataNodes, you can add or remove backup hosts. You can scale up your environment easily by adding more backup hosts.

  • The NetBackup Parallel Streaming Framework enables agentless backup, in which the backup and restore operations run on the backup hosts. There is no agent footprint on the cluster nodes. Also, NetBackup is not affected by Hadoop cluster upgrades or maintenance.
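The idea of spreading work across several backup hosts and running the streams in parallel can be illustrated with a short conceptual sketch. This is not NetBackup code and does not reflect how the Parallel Streaming Framework actually distributes work, which this section does not describe. The round-robin assignment, the function names, the host names, and the paths are all assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_round_robin(paths, backup_hosts):
    """Spread the paths across the backup hosts, one at a time in turn.

    Round-robin is an assumption for illustration only; it is not
    taken from the NetBackup documentation.
    """
    if not backup_hosts:
        raise ValueError("at least one backup host is required")
    plan = {host: [] for host in backup_hosts}
    for index, path in enumerate(paths):
        plan[backup_hosts[index % len(backup_hosts)]].append(path)
    return plan


def run_stream(host, paths):
    """Stand-in for one backup stream; a real stream would move data."""
    return host, len(paths)


def run_parallel(plan):
    """Run one stream per backup host concurrently and collect the counts."""
    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        futures = [
            pool.submit(run_stream, host, paths)
            for host, paths in plan.items()
        ]
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    hosts = ["backuphost1", "backuphost2"]             # hypothetical host names
    paths = [f"/data/part-{n:05d}" for n in range(5)]  # hypothetical HDFS paths
    plan = assign_round_robin(paths, hosts)
    print(run_parallel(plan))
```

In this sketch, adding a host name to the list reduces the share of paths that each host handles, which mirrors the scale-up behavior described above for adding backup hosts.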

For more information: