NetBackup™ for Hadoop Administrator's Guide
Best practice for improving performance during backup and restore
Performance issues such as slow throughput and high CPU usage can occur during the backup and recovery of Hadoop in an SSL environment (HTTPS). The issue is caused if the internal communications in Hadoop are not encrypted. Tune the HDFS configuration in the HDFS cluster correctly to improve internal communication and performance in Hadoop, which can also improve backup and recovery performance.
For better backup and restore performance, it is recommended that you follow the Hadoop configuration recommendations from Apache or from the Hadoop distribution in use.
If Hadoop encryption is turned on within the cluster, follow the recommendations from Apache or from the Hadoop distribution in use to select the right cipher and bit length for data transfer within the Hadoop cluster.
NetBackup performs better during backup and recovery when AES 128 is used for data encryption during the block data transfer.
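As a rough illustration of the AES 128 recommendation, the following sketch checks whether an hdfs-site.xml file enables block data transfer encryption with an AES cipher suite and a 128-bit key. The property names are the ones used by Apache Hadoop for data transfer encryption; the sample XML and the helper names `read_properties` and `uses_aes_128` are hypothetical and exist only for this example. Confirm the exact settings against the documentation for your Hadoop distribution.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content used only for this illustration.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.encrypt.data.transfer.cipher.suites</name>
    <value>AES/CTR/NoPadding</value>
  </property>
  <property>
    <name>dfs.encrypt.data.transfer.cipher.key.bitlength</name>
    <value>128</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return the name/value pairs of a Hadoop configuration file as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def uses_aes_128(xml_text):
    """Return True if data transfer encryption is on with AES and a 128-bit key."""
    props = read_properties(xml_text)
    return (
        props.get("dfs.encrypt.data.transfer", "false").lower() == "true"
        and props.get("dfs.encrypt.data.transfer.cipher.suites") == "AES/CTR/NoPadding"
        and props.get("dfs.encrypt.data.transfer.cipher.key.bitlength") == "128"
    )


if __name__ == "__main__":
    print("AES 128 data transfer encryption:", uses_aes_128(SAMPLE_HDFS_SITE))
```

The check requires the key bit length to be set explicitly, so a file that relies on a default value is reported as not matching.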
When you have more than one folder to back up in the Hadoop cluster, you can also increase the number of backup hosts to improve backup performance. For the maximum benefit, use at most one backup host per folder in the Hadoop cluster.
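The "at most one backup host per folder" guideline can be pictured with a small planning sketch. It assumes a simple round-robin assignment, which is an illustration only and not a description of how NetBackup distributes work internally; the function name `plan_backup_hosts`, the folder paths, and the host names are hypothetical.

```python
def plan_backup_hosts(folders, hosts):
    """Assign folders to backup hosts round-robin (illustrative assumption).

    Only min(len(hosts), len(folders)) hosts are used, because a host
    without a folder of its own adds no benefit.
    """
    if not folders or not hosts:
        return {}
    useful_hosts = hosts[: min(len(hosts), len(folders))]
    plan = {host: [] for host in useful_hosts}
    for index, folder in enumerate(folders):
        plan[useful_hosts[index % len(useful_hosts)]].append(folder)
    return plan


if __name__ == "__main__":
    # Hypothetical folders and backup hosts.
    folders = ["/data/logs", "/data/sales"]
    hosts = ["backuphost1", "backuphost2", "backuphost3"]
    for host, assigned in plan_backup_hosts(folders, hosts).items():
        print(host, assigned)
```

With two folders and three candidate hosts, the plan uses only two hosts, which reflects why adding backup hosts beyond the number of folders gives no further gain.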
You can also increase the number of threads per backup host that NetBackup uses to fetch data from the Hadoop cluster during a backup operation. If you have files with sizes in the range of tens of GBs, increasing the number of threads can improve performance. The default number of threads is 4.
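The thread count is typically set in the Hadoop plug-in configuration file. The sketch below shows one way to add such a setting to a JSON configuration. The key name `number_of_threads`, the placeholder `application_servers` entry, and the helper `with_thread_count` are assumptions made for this example; verify the actual key name and the allowed range in the configuration section of this guide before you change a real file.

```python
import json


def with_thread_count(config, threads):
    """Return a copy of the configuration with the thread count set."""
    if not isinstance(threads, int) or threads < 1:
        raise ValueError("threads must be a positive integer")
    updated = dict(config)
    # Assumed key name; check it against the plug-in configuration reference.
    updated["number_of_threads"] = threads
    return updated


if __name__ == "__main__":
    # Placeholder configuration; a real file contains the cluster details.
    config = {"application_servers": {}}
    print(json.dumps(with_thread_count(config, 8), indent=2))
```

The function returns a new dictionary instead of changing the input, so the original configuration can be kept for comparison or rollback.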
For more details, refer to the Apache Hadoop documentation for secure mode.