Veritas NetBackup for Hadoop Administrator's Guide
- Introduction
- Installing and deploying Hadoop plug-in for NetBackup
- Configuring NetBackup for Hadoop
- About configuring NetBackup for Hadoop
- Managing backup hosts
- Adding Hadoop credentials in NetBackup
- Configuring the Hadoop plug-in using the Hadoop configuration file
- Configuration for a Hadoop cluster that uses Kerberos
- Configuring NetBackup policies for Hadoop plug-in
- Disaster recovery of a Hadoop cluster
- Performing backups and restores of Hadoop
- Troubleshooting
- About troubleshooting NetBackup for Hadoop issues
- About NetBackup for Hadoop debug logging
- Troubleshooting backup issues for Hadoop data
- Backup operation for Hadoop fails with error code 6599
- Backup operation fails with error 6609
- Backup operation fails with error 6618
- Backup operation fails with error 6647
- Extended attributes (xattrs) and Access Control Lists (ACLs) are not backed up or restored for Hadoop
- Backup operation fails with error 6654
- Backup operation fails with bpbrm error 8857
- Backup operation fails with error 6617
- Backup operation fails with error 6616
- Troubleshooting restore issues for Hadoop data
- Restore fails with error code 2850
- NetBackup restore job for Hadoop completes partially
- Extended attributes (xattrs) and Access Control Lists (ACLs) are not backed up or restored for Hadoop
- Restore operation fails when Hadoop plug-in files are missing on the backup host
- Restore fails with bpbrm error 54932
- Restore operation fails with bpbrm error 21296
Backing up Hadoop data
Hadoop data is backed up in parallel streams wherein Hadoop DataNodes stream data blocks simultaneously to multiple backup hosts.
Note:
All the directories specified in the Hadoop backup selection must be snapshot-enabled before the backup is run.
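Snapshots are enabled per directory with the standard HDFS administration commands. A minimal sketch, assuming the paths /data/warehouse and /data/logs are hypothetical stand-ins for the directories in your policy's backup selection, and that the commands run as the HDFS superuser on a node with the Hadoop client installed:

```shell
# Enable HDFS snapshots on each directory in the backup selection.
# The paths below are hypothetical examples; replace them with the
# directories listed in your NetBackup policy's backup selection.
for dir in /data/warehouse /data/logs; do
    hdfs dfsadmin -allowSnapshot "$dir"
done

# Verify which directories are snapshottable
hdfs lsSnapshottableDir
```

If a directory in the backup selection is not snapshot-enabled, the discovery phase of the backup fails for that directory.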
The following steps provide an overview of the backup flow:
A scheduled backup job is triggered from the master server.
The backup job for Hadoop data is a compound job. When the backup job is triggered, a discovery job is run first.
During discovery, the first backup host connects with the NameNode and performs a discovery to get details of data that needs to be backed up.
A workload discovery file is created on the backup host. The workload discovery file contains the details of the data that needs to be backed up from the different DataNodes.
The backup host uses the workload discovery file to decide how the workload is distributed among the backup hosts. Workload distribution files are created for each backup host.
An individual child job is run for each backup host, and data is backed up as specified in that host's workload distribution file.
Data blocks are streamed simultaneously from different DataNodes to multiple backup hosts.
The compound backup job is not complete until all the child jobs are completed. After the child jobs finish, NetBackup cleans up all the snapshots from the NameNode. Only after this cleanup activity is done is the compound backup job marked complete.
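The discovery and distribution steps above can be sketched as follows. This is an illustrative sketch only, not NetBackup's actual algorithm or file format: a greedy balancer that assigns each discovered (path, size) entry from a hypothetical workload discovery list to the currently least-loaded backup host. The paths, sizes, and host names are all assumptions for the example.

```python
def distribute(workload, hosts):
    """Assign each (path, size) entry to the least-loaded backup host.

    workload: list of (hdfs_path, size_in_bytes) tuples, as a discovery
              phase might report them.
    hosts:    list of backup host names.
    Returns a dict mapping each host to its list of assigned paths,
    i.e. the content of one per-host workload distribution file.
    """
    assignments = {h: [] for h in hosts}
    load = {h: 0 for h in hosts}
    # Place the largest entries first so the greedy choice balances well
    for path, size in sorted(workload, key=lambda e: -e[1]):
        host = min(hosts, key=lambda h: load[h])
        assignments[host].append(path)
        load[host] += size
    return assignments

# Hypothetical discovery output: (HDFS path, size in bytes)
workload = [("/data/a", 400), ("/data/b", 300),
            ("/data/c", 200), ("/data/d", 100)]
hosts = ["backuphost1", "backuphost2"]
print(distribute(workload, hosts))
```

Each host's list corresponds to one workload distribution file; the child jobs then stream the assigned data blocks from the DataNodes in parallel.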