NetBackup™ Deployment Guide for Kubernetes Clusters
Primary server corrupted
When catalog backup is taken on external media server
When catalog backup is taken on MSDP-X
- Copy DRPackages files (packages) located at /mnt/nblogs/DRPackages/ from the pod to the host machine from where the Kubernetes Service cluster is accessed. Run the kubectl cp <primary-pod-namespace>/<primary-pod-name>:/mnt/nblogs/DRPackages <Path_where_to_copy_on_host_machine> command.
- Preserve the data of /mnt/nbdata and /mnt/nblogs on the host machine by creating a tar file and copying it using the kubectl cp <primary-pod-namespace>/<primary-pod-name>:<tar_file_name> <path_on_host_machine_where_to_preserve_the_data> command.
- Change CR spec from paused: false to paused: true in primary, mediaServers, and msdpScaleouts sections in environment object using the following command:
kubectl edit environment <environment_CR_name> -n <namespace>
- Change replica count to 0 in primary server's statefulset using the kubectl edit statefulset <primary-server-statefulset-name> -n <namespace> command.
- Clean the PV and PVCs of primary server as follows:
Get names of PV attached to primary server PVC (catalog, log and data) using the kubectl get pvc -n <namespace> -o wide command.
Delete primary server PVC (catalog, log and data) using the kubectl delete pvc <pvc-name> -n <namespace> command.
Delete the PV linked to primary server PVC using the kubectl delete pv <pv-name> command.
- (EKS-specific) Navigate to mounted EFS directory and delete the content from primary_catalog folder by running the rm -rf /efs/* command.
- Change the CR spec from paused: true to paused: false in the primary server section and reapply the YAML with the kubectl apply -f environment.yaml -n <namespace> command.
- Once the primary pod is in the ready state, run the following command to open a shell in the primary server pod:
kubectl exec -it -n <namespace> <primary-pod-name> -- /bin/bash
Increase the debug logs level on primary server.
Create a directory DRPackages at the persisted location using the mkdir /mnt/nblogs/DRPackages command.
Change ownership of the DRPackages folder to the service user using the chown nbsvcusr:nbsvcusr /mnt/nblogs/DRPackages command.
- Copy earlier copied DR files to primary pod at /mnt/nblogs/DRPackages using the kubectl cp <Path_of_DRPackages_on_host_machine> <primary-pod-namespace>/<primary-pod-name>:/mnt/nblogs/DRPackages command.
- (Applicable for catalog backup taken on external media server)
Execute the following steps in the primary server pod:
Change ownership of files in /mnt/nblogs/DRPackages using the chown nbsvcusr:nbsvcusr <file-name> command.
Deactivate NetBackup health probes using the /opt/veritas/vxapp-manage/nb-health deactivate command.
Stop the NetBackup services using the /usr/openv/netbackup/bin/bp.kill_all command.
Execute the nbhostidentity -import -infile /mnt/nblogs/DRPackages/<filename>.drpkg command.
Restart all the NetBackup services using the /usr/openv/netbackup/bin/bp.start_all command.
Verify that the security settings are restored.
Add the respective media server entry in host properties using the NetBackup Administration Console as follows:
Navigate to NetBackup Management > Host properties > Master Server > Add Additional server and add the media server.
Restart the NetBackup services in the primary server pod and on the external media server:
Open a shell in the primary server pod using the kubectl exec -it -n <namespace> <primary-pod-name> -- /bin/bash command.
In the primary server pod, run the /usr/openv/netbackup/bin/bp.kill_all command. After all services stop, restart them using the /usr/openv/netbackup/bin/bp.start_all command.
On the external media server, run the /usr/openv/netbackup/bin/bp.kill_all command. After all services stop, restart them using the /usr/openv/netbackup/bin/bp.start_all command.
Perform catalog recovery from NetBackup Administration Console.
For more information, refer to the NetBackup Troubleshooting Guide.
Open a shell in the primary server pod using the kubectl exec -it -n <namespace> <primary-pod-name> -- /bin/bash command.
Stop the NetBackup services using the /usr/openv/netbackup/bin/bp.kill_all command.
Start NetBackup services using the /usr/openv/netbackup/bin/bp.start_all command.
Activate NetBackup health probes using the /opt/veritas/vxapp-manage/nb-health activate command.
Delete the currently running request router pod using the following command:
kubectl delete pod <request-router-pod-name> -n <PrimaryServer-namespace>
Change CR spec from paused: true to paused: false in primary, mediaServers, and msdpScaleouts sections in environment object using the following command:
kubectl edit environment <environment_CR_name> -n <namespace>
To configure NetBackup IT Analytics refer to the following topic:
See Configuring NetBackup IT Analytics for NetBackup deployment.
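As a rough sketch, the first in-pod commands of the external media server case (change ownership, deactivate health probes, stop services, import the host identity, start services) could be driven from the host machine as shown here. The variables NAMESPACE, PRIMARY_POD, DRPKG, and DRY_RUN are assumptions made for this example and are not part of NetBackup. The full path to nbhostidentity is the one used elsewhere in this procedure. Some of these commands may prompt for input, so the sketch keeps the -it flags and, by default, only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only. NAMESPACE, PRIMARY_POD, DRPKG, and DRY_RUN are hypothetical
# variables for this example; the in-pod commands come from the procedure.
NAMESPACE="${NAMESPACE:-<namespace>}"
PRIMARY_POD="${PRIMARY_POD:-<primary-pod-name>}"
DRPKG="${DRPKG:-/mnt/nblogs/DRPackages/<filename>.drpkg}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

# Run one command inside the primary server pod, or print it in dry-run mode.
in_pod() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "kubectl exec -it -n $NAMESPACE $PRIMARY_POD -- $*"
    else
        kubectl exec -it -n "$NAMESPACE" "$PRIMARY_POD" -- "$@"
    fi
}

# Same order as the steps in the text; the && chain stops at the first failure.
import_host_identity() {
    in_pod chown nbsvcusr:nbsvcusr "$DRPKG" &&
    in_pod /opt/veritas/vxapp-manage/nb-health deactivate &&
    in_pod /usr/openv/netbackup/bin/bp.kill_all &&
    in_pod /usr/openv/netbackup/bin/admincmd/nbhostidentity -import -infile "$DRPKG" &&
    in_pod /usr/openv/netbackup/bin/bp.start_all
}

import_host_identity
```

With the default DRY_RUN=1 the script prints the five kubectl exec commands so they can be reviewed first. Setting DRY_RUN=0 and exporting real values for the other three variables runs them in the same order from a terminal.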
- (Applicable for catalog backup taken on MSDP-X)
Exec into the primary server pod and execute the following steps:
Change ownership of files in /mnt/nblogs/DRPackages using the chown nbsvcusr:nbsvcusr <file-name> command.
Deactivate NetBackup health probes using the /opt/veritas/vxapp-manage/nb-health deactivate command.
Stop the NetBackup services using the /usr/openv/netbackup/bin/bp.kill_all command.
Execute the /usr/openv/netbackup/bin/admincmd/nbhostidentity -import -infile /mnt/ndbdb/usr/openv/drpackage/<filename>.drpkg command.
Clear the NetBackup host cache by running the bpclntcmd -clear_host_cache command.
Start NetBackup services using the /usr/openv/netbackup/bin/bp.start_all command.
Refresh the certificate revocation list using the /usr/openv/netbackup/bin/nbcertcmd -getcrl command.
From Web UI, allow reissue of token from primary server for MSDP only as follows:
Navigate to Security > Host Mappings for the MSDP storage server and select Allow Auto reissue Certificate.
Run the primary server reconciler as follows:
Edit the environment (using the kubectl edit environment -n <namespace> command), change the paused field in the primary spec to true, and save it.
To enable the reconciler to run, the environment must be edited again and the primary's paused field must be set to false.
The SHA fingerprint is updated in the primary CR's status.
Edit the environment using the kubectl edit environment -n <namespace> command and change the paused field to false for MSDP.
Verify that the MSDP installation is successful and that the default MSDP storage server, STU, and disk pool are created with the old names. This takes some time, so wait until the STU and disk pool are displayed on the Web UI before proceeding to the next step.
Perform from step 2 in the following section:
Edit the environment CR and set paused: false for the media server.
Perform full catalog recovery using one of the following options:
Trigger a catalog recovery from the Web UI.
Or
Exec into the primary pod and run the bprecover -wizard command.
Once recovery is completed, restart the NetBackup services:
Stop NetBackup services using the /usr/openv/netbackup/bin/bp.kill_all command.
Start NetBackup services using the /usr/openv/netbackup/bin/bp.start_all command.
Activate NetBackup health probes using the /opt/veritas/vxapp-manage/nb-health activate command.
Verify, back up, and restore backup images in the NetBackup server to check whether the MSDP-X cluster has recovered.
Verify that the Primary, Media, MSDP, and Snapshot Manager servers are up and running.
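One possible way to perform the final check from the host machine is sketched here. It assumes that all of the server pods run in a single namespace, which this procedure does not state. NAMESPACE, TIMEOUT, and DRY_RUN are hypothetical variables introduced for the example. By default the script only prints the kubectl commands.

```shell
#!/bin/sh
# Sketch only. NAMESPACE, TIMEOUT, and DRY_RUN are hypothetical variables.
# Assumption: every pod to be checked lives in the same namespace.
NAMESPACE="${NAMESPACE:-<namespace>}"
TIMEOUT="${TIMEOUT:-600s}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

# Execute a command, or print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

verify_pods() {
    # List the pods, then block until every pod reports the Ready condition
    # or the timeout expires.
    run kubectl get pods -n "$NAMESPACE" -o wide &&
    run kubectl wait --for=condition=Ready pod --all -n "$NAMESPACE" --timeout="$TIMEOUT"
}

verify_pods
```

If the namespace also contains pods that have run to completion, kubectl wait does not see them become Ready and reports a timeout, so the pod list from the first command should still be reviewed by hand.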