Cohesity Cloud Scale Technology Manual Deployment Guide for Kubernetes Clusters
- Introduction
- Section I. Configurations
- Prerequisites
- Preparing the environment for NetBackup installation on Kubernetes cluster
- Prerequisites for Snapshot Manager (AKS/EKS)
- Prerequisites for Kubernetes cluster configuration
- Prerequisites for Cloud Scale configuration
- Prerequisites for deploying environment operators
- Prerequisites for using private registry
- Recommendations and Limitations
- Configurations
- Configuration of key parameters in Cloud Scale deployments
- Tuning touch files
- Setting maximum jobs per client
- Setting maximum jobs per media server
- Enabling intelligent catalog archiving
- Enabling security settings
- Configuring email server
- Reducing catalog storage management
- Configuring zone redundancy
- Enabling client-side deduplication capabilities
- Parameters for logging (fluentbit)
- Managing media server configurations in Web UI
- Prerequisites
- Section II. Deployment
- Section III. Monitoring and Management
- Monitoring NetBackup
- Monitoring Snapshot Manager
- Monitoring fluentbit
- Monitoring MSDP Scaleout
- Managing NetBackup
- Managing the Load Balancer service
- Managing PostgreSQL DBaaS
- Managing logging
- Performing catalog backup and recovery
- Section IV. Maintenance
- PostgreSQL DBaaS Maintenance
- Patching mechanism for primary, media servers, fluentbit pods, and postgres pods
- Upgrading
- Cloud Scale Disaster Recovery
- Uninstalling
- Troubleshooting
- Troubleshooting AKS and EKS issues
- View the list of operator resources
- View the list of product resources
- View operator logs
- View primary logs
- Socket connection failure
- Resolving an issue where external IP address is not assigned to a NetBackup server's load balancer services
- Resolving the issue where the NetBackup server pod is not scheduled for a long time
- Resolving an issue where the Storage class does not exist
- Resolving an issue where the primary server or media server deployment does not proceed
- Resolving an issue of failed probes
- Resolving issues when media server PVs are deleted
- Resolving an issue related to insufficient storage
- Resolving an issue related to invalid nodepool
- Resolve an issue related to KMS database
- Resolve an issue related to pulling an image from the container registry
- Resolving an issue related to recovery of data
- Check primary server status
- Pod status field shows as pending
- Ensure that the container is running the patched image
- Getting EEB information from an image, a running container, or persistent data
- Resolving the certificate error issue in NetBackup operator pod logs
- Pod restart failure due to liveness probe time-out
- NetBackup messaging queue broker takes more time to start
- Host mapping conflict in NetBackup
- Issue with capacity licensing reporting taking a long time
- Local connection is getting treated as insecure connection
- Backing up data from Primary server's /mnt/nbdata/ directory fails with primary server as a client
- Storage server not supporting Instant Access capability on Web UI after upgrading NetBackup
- Taint, Toleration, and Node affinity related issues in cpServer
- Operations performed on cpServer in environment.yaml file are not reflected
- Elastic media server related issues
- Failed to register Snapshot Manager with NetBackup
- Post Kubernetes cluster restart, flexsnap-listener pod went into CrashLoopBackoff state or pods were unable to connect to flexsnap-rabbitmq
- Post Kubernetes cluster restart, issues observed in case of containerized Postgres deployment
- Request router logs
- Issues with NBPEM/NBJM
- Issues with logging feature for Cloud Scale
- The flexsnap-listener pod is unable to communicate with RabbitMQ
- Job remains in queue for long time
- Extracting logs if the nbwsapp or log-viewer pods are down
- Helm installation failed with bundle error
- Deployment fails with private container registry and Postgres fails to pull the images
- Troubleshooting AKS-specific issues
- Troubleshooting EKS-specific issues
- Resolving the primary server connection issue
- NetBackup Snapshot Manager deployment on EKS fails
- Wrong EFS ID is provided in cloudscale-values.yaml file
- Primary pod is in ContainerCreating state
- Webhook displays an error for PV not found
- Cluster Autoscaler initialization issue
- Catalog backup job fails with an error (Status 9202)
- Troubleshooting issue for bootstrapper pod
- Troubleshooting issues for kubectl plugin
- Appendix A. CR template
- Appendix B. MSDP Scaleout
- About MSDP Scaleout
- Prerequisites for MSDP Scaleout (AKS/EKS)
- Limitations in MSDP Scaleout
- MSDP Scaleout configuration
- Installing the docker images and binaries for MSDP Scaleout (without environment operators or Helm charts)
- Deploying MSDP Scaleout
- Managing MSDP Scaleout
- MSDP Scaleout maintenance
MSDP-X and Primary server corrupted
- Note the storage server, cloud LSU, and cloud bucket name, as well as the DR passphrase.
- Copy the DRPackages files (packages) from the pod to the local VM, if they were not received over email, using the following command:
kubectl cp <primary-pod-namespace>/<primary-pod-name>:/mnt/nbdb/usr/openv/drpackage_<storageservername> <Path_where_to_copy_on_host_machine>
- Delete the corrupted MSDP and Primary server by running the following command:
kubectl delete -f environment.yaml -n <namespace>
Note:
Perform this step carefully, as it deletes NetBackup.
- Clean the PV and PVCs of primary and MSDP server as follows:
Get names of PV attached to primary and MSDP server PVC (catalog, log and data) using the kubectl get pvc -n <namespace> -o wide command.
Delete primary and MSDP server PVC (catalog, log and data) using the kubectl delete pvc <pvc-name> -n <namespace> command.
Delete the PV linked to primary server PVC using the kubectl delete pv <pv-name> command.
- (EKS-specific) Navigate to mounted EFS directory and delete the content from primary_catalog folder by running the rm -rf /efs/* command.
- Modify the environment.yaml file by setting the paused: true field in the MSDP and Media sections: change the CR spec from paused: false to paused: true for MSDP Scaleout and the media servers, then save the file.
Note:
Ensure that only the primary server is deployed.
Apply the modified environment.yaml file using the following command:
kubectl apply -f environment.yaml -n <namespace>
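As an illustration, the paused fields in environment.yaml might look like the following fragment; the exact section and field layout depends on your Environment CR schema, so treat this as a hedged sketch rather than an authoritative template:

```yaml
# Illustrative fragment only -- section names depend on your Environment CR schema.
spec:
  msdpScaleouts:
    - paused: true     # changed from paused: false
  mediaServers:
    - paused: true     # changed from paused: false
```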
- After the primary server is up and running, perform the following:
Execute the kubectl exec -it -n <namespace> <primary-pod-name> -- /bin/bash command in the primary server pod.
Increase the debug log level on the primary server.
Create a DRPackages directory at a persisted location using the mkdir /mnt/nblogs/DRPackages command.
- Copy the DR files copied earlier to the primary pod at /mnt/nblogs/DRPackages using the kubectl cp <Path_of_DRPackages_on_host_machine> <primary-pod-namespace>/<primary-pod-name>:/mnt/nblogs/DRPackages command.
- Execute the following steps (after exec) in the primary server pod:
Change ownership of the files in /mnt/nblogs/DRPackages using the chown nbsvcusr:nbsvcusr <file-name> command.
Deactivate the NetBackup health probes using the /opt/veritas/vxapp-manage/nb-health deactivate command.
Stop the NetBackup services using the /usr/openv/netbackup/bin/bp.kill_all command.
Execute the /usr/openv/netbackup/bin/admincmd/nbhostidentity -import -infile /mnt/nblogs/DRPackages/<filename>.drpkg command.
Clear NetBackup host cache by running the bpclntcmd -clear_host_cache command.
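The in-pod sequence above can be collected into one sketch. The function below is illustrative (not a shipped script): it defaults to a dry run that only prints the commands; set RUN to empty to execute them inside the primary pod.

```shell
# Sketch of the in-pod recovery sequence (run inside the primary server pod).
# RUN=echo (the default) gives a dry run; set RUN= (empty) to execute for real.
RUN="${RUN:-echo}"

recover_host_identity() {
  local pkg_dir="/mnt/nblogs/DRPackages"
  local pkg="$1"   # the DR package file name, e.g. <filename>.drpkg
  $RUN chown nbsvcusr:nbsvcusr "$pkg_dir/$pkg"
  # Stop the health probes so they do not restart the pod mid-recovery.
  $RUN /opt/veritas/vxapp-manage/nb-health deactivate
  # Stop the NetBackup services before importing the host identity.
  $RUN /usr/openv/netbackup/bin/bp.kill_all
  $RUN /usr/openv/netbackup/bin/admincmd/nbhostidentity -import -infile "$pkg_dir/$pkg"
  # Clear cached host mappings so the imported identity takes effect.
  $RUN bpclntcmd -clear_host_cache
}
```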
Restart the pods as follows:
Navigate to the VRTSk8s-netbackup-<version>/scripts folder.
Run the cloudscale_restart.sh script as follows:
./cloudscale_restart.sh <action> <namespace>
Provide the namespace and the required action:
stop: Stops all the services under primary server (waits until all the services are stopped).
start: Starts all the services and waits until the services are up and running under primary server.
restart: Stops the services and waits until all the services are down. Then starts all the services and waits until the services are up and running.
Note:
Ignore the policy job pod if it does not come up in the running state; it starts once the primary services start.
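As an illustrative sketch (not the shipped script), the action validation that a wrapper around cloudscale_restart.sh might perform looks like this; the function only builds and prints the invocation:

```shell
# Sketch: validate the action argument before invoking cloudscale_restart.sh.
# Hypothetical helper; the shipped script in VRTSk8s-netbackup-<version>/scripts
# is the authority on accepted arguments.
restart_usage() {
  local action="$1" namespace="$2"
  case "$action" in
    stop|start|restart)
      echo "./cloudscale_restart.sh $action $namespace" ;;
    *)
      echo "usage: cloudscale_restart.sh {stop|start|restart} <namespace>" >&2
      return 1 ;;
  esac
}
```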
Refresh the certificate revocation list using the /usr/openv/netbackup/bin/nbcertcmd -getcrl command.
- Update the SHA fingerprint in the primary CR's status by running the primary server reconciler as follows:
Run the following command to pause the primary server CR:
helm upgrade cloudscale cloudscale-<version>.tgz -n netbackup --reuse-values --set environment.primary.paused=true
Run the following command to un-pause the primary server CR:
helm upgrade cloudscale cloudscale-<version>.tgz -n netbackup --reuse-values --set environment.primary.paused=false
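The pause/un-pause cycle above can be wrapped in a single helper; this is a sketch, with the chart file name and namespace as placeholders, and --reuse-values preserving all other release settings:

```shell
# Sketch: bounce the primary server reconciler by pausing, then un-pausing, the CR.
bounce_primary_reconciler() {
  local chart="$1" namespace="$2"   # e.g. cloudscale-<version>.tgz, netbackup
  helm upgrade cloudscale "$chart" -n "$namespace" --reuse-values \
    --set environment.primary.paused=true
  helm upgrade cloudscale "$chart" -n "$namespace" --reuse-values \
    --set environment.primary.paused=false
}
```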
- From Web UI, allow reissue of token from primary server for MSDP, media and Snapshot Manager server as follows:
Navigate to Security > Host Mappings for the MSDP storage server and select Allow Auto reissue Certificate.
Repeat this for media and Snapshot Manager server entries.
- Edit the environment using kubectl edit environment -n <namespace> command and change paused field to false for MSDP.
- Perform the procedure from step 2 in the following section:
- Edit the environment CR and change paused: false for the media server.
- Once the media server pods are ready, perform a full catalog recovery using one of the following options:
Trigger a catalog recovery from the Web UI.
Or
Exec into the primary pod and run the bprecover -wizard command.
- Once recovery is completed, restart the NetBackup services:
Stop NetBackup services using the /usr/openv/netbackup/bin/bp.kill_all command.
Start NetBackup services using the /usr/openv/netbackup/bin/bp.start_all command.
- Activate NetBackup health probes using the /opt/veritas/vxapp-manage/nb-health activate command.
- Verify, back up, and restore backup images on the NetBackup server to confirm that the MSDP-X cluster has recovered.
- Verify that the Primary, Media, MSDP, and Snapshot Manager servers are up and running.