NetBackup™ Web UI Apache Cassandra Administrator's Guide

Product(s): NetBackup & Alta Data Protection (10.2)

NetBackup Apache Cassandra support overview

Apache Cassandra is a popular scale-out NoSQL database. It runs on commodity hardware with direct-attached storage. A typical Cassandra cluster consists of nodes that store data. Cassandra replicates data among the nodes to provide resiliency against node downtime. There is no notion of a primary copy of the data, and any node may hold a more recent version of a data record than its replicas. One of the important characteristics of Cassandra is that it favors availability over consistency: the database remains available even when the replicas of the data are not up to date.
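To make the replica and stale-data problem concrete, the following Python sketch reconciles copies of the same record read from different nodes by keeping the copy with the newest write timestamp, which mirrors Cassandra's last-write-wins rule. It is an illustration under simplifying assumptions, not NetBackup's implementation: the `ReplicaRecord` type and `reconcile` function are hypothetical, each record carries a single timestamp (Cassandra actually tracks timestamps per cell), and deletions (tombstones) are ignored.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ReplicaRecord:
    """One copy of a record as read from one node (hypothetical, simplified)."""
    node: str       # node the copy was read from
    key: str        # record key; a flat string stands in for Cassandra's primary key
    value: str
    write_ts: int   # write timestamp; one per record here, per cell in real Cassandra


def reconcile(records: Iterable[ReplicaRecord]) -> Dict[str, ReplicaRecord]:
    """Keep one copy per key: the copy with the newest write timestamp.

    Duplicate replicas collapse into a single entry, and older (overwritten)
    versions are discarded. Tombstones are not modeled in this sketch.
    """
    latest: Dict[str, ReplicaRecord] = {}
    for rec in records:
        current = latest.get(rec.key)
        # On equal timestamps this sketch keeps the first copy it saw.
        if current is None or rec.write_ts > current.write_ts:
            latest[rec.key] = rec
    return latest


if __name__ == "__main__":
    copies = [
        ReplicaRecord("node1", "user:42", "old@example.com", 1_000),  # stale copy
        ReplicaRecord("node2", "user:42", "new@example.com", 2_000),  # newest copy
        ReplicaRecord("node3", "user:42", "new@example.com", 2_000),  # duplicate replica
        ReplicaRecord("node1", "user:7", "bob@example.com", 1_500),
    ]
    for key, rec in sorted(reconcile(copies).items()):
        print(key, rec.value, "from", rec.node)
```

Four copies from three nodes reduce to two records: the duplicate replica and the overwritten version of `user:42` are both dropped.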

NetBackup Cassandra Protection

NetBackup provides an advanced solution for protecting Cassandra clusters. The solution has the following characteristics:

  1. Agentless: No backup agents need to be installed on Cassandra cluster nodes, so no NetBackup code runs on the nodes to slow down a high-performance Cassandra cluster.

  2. Single pass data copy: During backup, a thin client makes a single pass over the Cassandra data files (called SSTables) to minimize the I/O footprint.

  3. Off-host data optimization: Cassandra replicates data for resiliency, while backups are kept for longer retention and do not need every replica. The NetBackup Cassandra solution processes the data to:

    • Determine a cluster-consistent point-in-time.

    • Remove replica records.

    • Remove stale data caused by record overwrites.

      All of this processing happens off-host on Data Staging Servers so that backup processes do not affect your high-performance Cassandra clusters.

  4. Incremental backups: NetBackup supports incremental backups of Cassandra to reduce backup times after a full backup. It automatically detects new keyspaces or column families and takes a full backup of these new structures, while previously existing structures receive incremental backups.

  5. Scalable Backup: Cassandra lets you scale your cluster by adding nodes whenever required, and it redistributes the existing data to the new nodes while the cluster stays online. NetBackup Cassandra protection scales in a similar way: you can add more Data Staging Servers to meet your backup requirements.

  6. Data Center Identification: NetBackup Cassandra protection can be configured to back up data from a specific data center. It queries the Cassandra cluster, automatically identifies the nodes in each data center, and then engages only the nodes in the selected data center to back up the data.

  7. Data Center aware restore: At restore time, NetBackup connects to the target cluster and determines its current topology. It reconciles this topology with the topology recorded at backup time, so the restore accounts for any topology changes in between and is performed against the current topology. Additional restore options let you change the data centers, the number of replicas in each data center, and keyspace and column family names to meet your restore requirements.

  8. Granular restore: The NetBackup Cassandra solution allows you to restore part of the backup data set, for example only a few of the keyspaces or only some of the column families.

  9. Repair-less Restore: The restore process ensures that no further recovery steps (such as a Cassandra repair) are needed after the data is restored. The data is available in your high-performance Cassandra cluster immediately after the restore.
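The data center selection (item 6) and the full-versus-incremental decision for new keyspaces and column families (item 4) can be sketched as follows. This is a hypothetical Python illustration, not NetBackup code: `Node`, `nodes_in_datacenter`, and `plan_backup_levels` are invented names, the node list is assumed to come from a query of the cluster topology, and tables are identified by (keyspace, column family) pairs.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical table identifier: (keyspace, column family)
TableId = Tuple[str, str]


@dataclass(frozen=True)
class Node:
    address: str
    datacenter: str


def nodes_in_datacenter(nodes: List[Node], datacenter: str) -> List[Node]:
    """Return only the nodes that belong to the selected data center."""
    return [n for n in nodes if n.datacenter == datacenter]


def plan_backup_levels(
    current: Set[TableId], previously_backed_up: Set[TableId]
) -> Dict[TableId, str]:
    """New tables get a full backup; tables already in a prior backup get an incremental."""
    return {
        table: ("incremental" if table in previously_backed_up else "full")
        for table in sorted(current)
    }


if __name__ == "__main__":
    cluster = [
        Node("10.0.0.1", "dc1"),
        Node("10.0.0.2", "dc2"),
        Node("10.0.0.3", "dc1"),
    ]
    # Only dc1 nodes would take part in this backup.
    print([n.address for n in nodes_in_datacenter(cluster, "dc1")])

    tables_now = {("sales", "orders"), ("sales", "refunds")}
    tables_before = {("sales", "orders")}
    print(plan_backup_levels(tables_now, tables_before))
```

In this sketch, `("sales", "refunds")` is absent from the earlier backup and is therefore planned as a full backup, while `("sales", "orders")` is planned as an incremental one.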