NetBackup™ for Cassandra Administrator's Guide

Product(s): NetBackup (10.0)

Protecting Cassandra data using NetBackup

NetBackup enables you to protect Cassandra clusters that are deployed on-premises.

Figure: Architectural overview

The following table describes the purpose of the different components of the Cassandra backup and recovery solution.

Cassandra cluster

Represents the Cassandra production cluster that you want to protect.

Data staging servers

During a backup or restore, Cassandra keyspaces are streamed in parallel between the Cassandra cluster and the data staging servers.

The data staging servers form a staging cluster. Provision the number of staging nodes according to the size of the data that needs to be backed up or restored.

Backup host

The Cassandra Backup Recovery (CBR) solution uses a BigData policy with the application type cassandra.

The BigData policy uses this backup host.

The media server that is used to configure the storage server for the CBR solution must be used as a backup host.

You can also use a NetBackup client as a backup host.

NetBackup primary server

All the jobs are executed from the NetBackup primary server.

Data reduction

As part of data reduction, the following tasks are performed:

  • Efficient reconciliation

    Data for the same keys from different Cassandra nodes is transferred to the same data staging server.

    Reconciliation happens in parallel within each data staging server, without any inter-node communication.

  • Record synthesis

    While iterating over the records, columns of the same key from different SSTables are merged.

  • Semantic deduplication

    Stale and duplicate records (replicas) are identified and removed.
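The three data-reduction tasks can be illustrated with a minimal Python sketch. It is not the CBR implementation. The `Cell` record model, the CRC32-based `route` function, and the newest-timestamp-wins rule are assumptions chosen for illustration; the timestamp rule mirrors Cassandra's own last-write-wins convention for cells. Routing every copy of a key to one staging server by hash is what allows each server to reconcile its share of the data without communicating with the other servers.

```python
from collections import defaultdict
from dataclasses import dataclass
import zlib


@dataclass(frozen=True)
class Cell:
    """One column value of a partition key, as read from an SSTable (hypothetical model)."""
    key: str
    column: str
    value: str
    timestamp: int  # write timestamp; the newest version wins


def route(key: str, num_staging: int) -> int:
    """Deterministically map a key to a staging server index (assumed CRC32 hashing)."""
    return zlib.crc32(key.encode("utf-8")) % num_staging


def distribute(cells, num_staging):
    """Efficient reconciliation: send every copy of a key to the same staging server."""
    buckets = [[] for _ in range(num_staging)]
    for cell in cells:
        buckets[route(cell.key, num_staging)].append(cell)
    return buckets


def reduce_bucket(bucket):
    """Record synthesis and semantic deduplication on one staging server.

    Keeps only the newest cell per (key, column), which drops replica copies
    and stale versions, then merges the surviving columns of each key into a
    single record. On a timestamp tie, the first cell seen is kept.
    """
    latest = {}
    for cell in bucket:
        slot = (cell.key, cell.column)
        if slot not in latest or cell.timestamp > latest[slot].timestamp:
            latest[slot] = cell
    records = defaultdict(dict)
    for (key, column), cell in latest.items():
        records[key][column] = cell.value
    return dict(records)


def reduce_all(cells, num_staging):
    """Reduce each bucket independently; no data moves between staging servers."""
    return [reduce_bucket(bucket) for bucket in distribute(cells, num_staging)]


if __name__ == "__main__":
    cells = [
        Cell("user:1", "name", "Ann", 10),      # replica from one node
        Cell("user:1", "name", "Ann", 10),      # same replica from another node
        Cell("user:1", "email", "a@x.io", 12),  # column from another SSTable
        Cell("user:1", "name", "Anne", 15),     # newer version of "name"
        Cell("user:2", "name", "Bob", 5),
    ]
    merged = {}
    for partial in reduce_all(cells, num_staging=3):
        merged.update(partial)
    assert merged == {"user:1": {"name": "Anne", "email": "a@x.io"},
                      "user:2": {"name": "Bob"}}
```

Five input cells collapse to three surviving values in two records. The duplicate replica and the stale "Ann" value are removed, and the two columns of `user:1` are merged into one record.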

  • Data is backed up in parallel streams. The Cassandra data nodes stream data blocks simultaneously to multiple data staging servers, which in turn stream to multiple backup hosts. Using multiple backup hosts and parallel streams accelerates job processing. The data staging servers optimize the data before it is backed up, which achieves data deduplication.

  • Communication between the Cassandra cluster and NetBackup is enabled by the Cassandra backup and recovery component, which is deployed on the data staging servers and the Cassandra cluster.

  • To enable NetBackup communication, configure a BigData policy and add the related backup hosts.

  • You can configure a NetBackup media server, client, or primary server as a backup host. You can also add or remove backup hosts and data staging servers based on the Cassandra data size. To scale up your environment, add more backup hosts.

  • The communication between the Cassandra cluster, data staging servers, and backup hosts happens over SSH.

  • The NetBackup Parallel Streaming Framework enables a thin-client-based, agentless backup in which the backup and restore operations are performed on the backup hosts. The NetBackup thin client binary (the Cassandra backup and recovery component) is automatically pushed to the Cassandra cluster during backup and recovery operations. It is automatically removed after those operations complete.

Agent management is not required on the Cassandra cluster nodes.
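The parallel streaming and scale-out behavior described in the list above can be pictured with a small Python simulation. The simulation rests on two hypothetical assumptions, and neither describes how NetBackup actually schedules streams:

- Staging-server output streams are assigned to backup hosts in round-robin order.
- The real data transfer is replaced by a placeholder `send_stream` function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def assign_streams(staging_servers: List[str], backup_hosts: List[str]) -> Dict[str, List[str]]:
    """Assign each staging server's output stream to a backup host, round-robin (assumed policy)."""
    if not backup_hosts:
        raise ValueError("at least one backup host is required")
    plan: Dict[str, List[str]] = {host: [] for host in backup_hosts}
    for i, server in enumerate(staging_servers):
        plan[backup_hosts[i % len(backup_hosts)]].append(server)
    return plan


def send_stream(host: str, server: str, blocks: List[bytes]) -> int:
    """Placeholder for one transfer from a staging server to a backup host; returns bytes sent."""
    return sum(len(block) for block in blocks)


def run_backup(staged: Dict[str, List[bytes]], backup_hosts: List[str]) -> Dict[str, int]:
    """Run every staging-server stream concurrently and total the bytes each backup host received."""
    plan = assign_streams(list(staged), backup_hosts)
    totals = {host: 0 for host in backup_hosts}
    with ThreadPoolExecutor(max_workers=max(1, len(staged))) as pool:
        futures = [
            (host, pool.submit(send_stream, host, server, staged[server]))
            for host, servers in plan.items()
            for server in servers
        ]
        # Results are collected in the calling thread, so no locking is needed.
        for host, future in futures:
            totals[host] += future.result()
    return totals


if __name__ == "__main__":
    staged = {"stage1": [b"ab", b"c"], "stage2": [b"defg"], "stage3": [b"h"]}
    print(run_backup(staged, ["host1", "host2"]))  # {'host1': 4, 'host2': 4}
```

With two backup hosts, one host receives two of the three streams. Adding a third host to the list gives each stream its own host. This is the scale-out effect that adding backup hosts is meant to provide.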