NetBackup™ for Cassandra Administrator's Guide
Protecting Cassandra data using NetBackup
NetBackup enables you to protect Cassandra clusters that are deployed on-premises.
The following table describes the purpose of different components of the Cassandra backup and recovery solution.
Table:
| Components | Purpose | 
|---|---|
| Cassandra cluster | Represents the Cassandra production cluster that you want to protect. | 
| Data staging servers | During a backup or restore, Cassandra keyspaces are streamed in parallel between the Cassandra cluster and the data staging servers. The data staging servers represent a staging cluster. Provision the number of staging server nodes based on the size of the data that needs to be backed up or restored. | 
| Backup host | The Cassandra Backup Recovery (CBR) solution uses the BigData policy with the application type cassandra. The BigData policy uses this backup host. The media server that is used to configure the storage server for the CBR solution must be used as the backup host. Note: You can also use a NetBackup client as a backup host. | 
| NetBackup primary server | All the jobs are executed from the NetBackup primary server. | 
| Data reduction | As part of data reduction, the following tasks are performed: |
- Data is backed up in parallel streams: the data nodes stream data blocks simultaneously to multiple data staging servers, and from there to multiple backup hosts. Multiple backup hosts and parallel streams accelerate job processing. The data staging servers help optimize the data being backed up, thus achieving data deduplication. 
- The communication between the Cassandra cluster and NetBackup is enabled by the Cassandra backup and recovery component, which is deployed on the data staging servers and the Cassandra cluster. 
- For NetBackup communication, you need to configure a BigData policy and add the related backup hosts. 
- You can configure a NetBackup media server, client, or primary server as a backup host. Also, based on Cassandra data size, you can add or remove backup hosts and data staging servers. You can scale up your environment easily by adding more backup hosts. 
- The communication between the Cassandra cluster, data staging servers, and backup hosts happens over SSH. 
- The NetBackup Parallel Streaming Framework enables a thin client-based, agentless backup in which backup and restore operations are performed on the backup hosts. The NetBackup thin client binary (Cassandra backup and recovery component) is automatically pushed to the Cassandra cluster during backup and recovery operations, and is automatically removed after those operations complete. 
Note:
Agent management is not required on the Cassandra cluster nodes.
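To make the data flow described above more concrete, the following is a minimal conceptual sketch of a two-stage fan-out: data nodes stream blocks in parallel to data staging servers, duplicate blocks are dropped, and the remaining unique blocks are distributed to backup hosts. It is a hypothetical model only, not the NetBackup Parallel Streaming Framework or any NetBackup or Cassandra API. The names `DedupIndex`, `stream_node`, and `parallel_backup` are illustrative. In-process threads stand in for the SSH-based streaming, and a SHA-256 content digest stands in for whatever deduplication technique NetBackup actually uses.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model only: no NetBackup or Cassandra APIs are used here.


class DedupIndex:
    """Thread-safe record of block digests already staged (hypothetical)."""

    def __init__(self) -> None:
        self._digests: set[str] = set()
        self._lock = threading.Lock()

    def add_if_new(self, block: bytes) -> bool:
        """Record the block's SHA-256 digest; return True only the first time."""
        digest = hashlib.sha256(block).hexdigest()
        with self._lock:
            if digest in self._digests:
                return False
            self._digests.add(digest)
            return True


def stream_node(blocks: list[bytes], staging_server: str,
                index: DedupIndex) -> list[tuple[str, bytes]]:
    """Stream one data node's blocks to its staging server, dropping duplicates."""
    return [(staging_server, b) for b in blocks if index.add_if_new(b)]


def parallel_backup(data_nodes: dict[str, list[bytes]],
                    staging_servers: list[str],
                    backup_hosts: list[str]) -> dict[str, list[tuple[str, bytes]]]:
    index = DedupIndex()
    # Stage 1: each data node streams in parallel (one thread per node)
    # to a staging server chosen round-robin.
    with ThreadPoolExecutor(max_workers=max(1, len(data_nodes))) as pool:
        futures = [
            pool.submit(stream_node, blocks,
                        staging_servers[i % len(staging_servers)], index)
            for i, blocks in enumerate(data_nodes.values())
        ]
        staged = [item for f in futures for item in f.result()]
    # Stage 2: unique staged blocks fan out round-robin to the backup hosts.
    per_host: dict[str, list[tuple[str, bytes]]] = {h: [] for h in backup_hosts}
    for i, item in enumerate(staged):
        per_host[backup_hosts[i % len(backup_hosts)]].append(item)
    return per_host


if __name__ == "__main__":
    nodes = {
        "cassandra-node-1": [b"block-a", b"block-b"],
        "cassandra-node-2": [b"block-b", b"block-c"],  # block-b is a duplicate
        "cassandra-node-3": [b"block-d"],
    }
    result = parallel_backup(nodes, ["staging-1", "staging-2"],
                             ["backup-host-1", "backup-host-2"])
    for host, items in result.items():
        print(host, [(staging, block.decode()) for staging, block in items])
```

In this model, adding entries to `staging_servers` or `backup_hosts` spreads the same unique blocks across more targets, which mirrors the scale-out behavior described above. Which data node's copy of a shared block is kept depends on thread scheduling, but the set of unique blocks that reaches the backup hosts does not.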