NetBackup™ Deployment Guide for Kubernetes Clusters
- Introduction
- Section I. Configurations
- Prerequisites
- Preparing the environment for NetBackup installation on Kubernetes cluster
- Prerequisites for Snapshot Manager (AKS/EKS)
- Prerequisites for Kubernetes cluster configuration
- Prerequisites for Cloud Scale configuration
- Prerequisites for deploying environment operators
- Prerequisites for using private registry
- Recommendations and Limitations
- Configurations
- Configuration of key parameters in Cloud Scale deployments
- Tuning touch files
- Setting maximum jobs per client
- Setting maximum jobs per media server
- Enabling intelligent catalog archiving
- Enabling security settings
- Configuring email server
- Reducing catalog storage management
- Configuring zone redundancy
- Enabling client-side deduplication capabilities
- Parameters for logging (fluentbit)
- Managing media server configurations in Web UI
- Prerequisites
- Section II. Deployment
- Section III. Monitoring and Management
- Monitoring NetBackup
- Monitoring Snapshot Manager
- Monitoring fluentbit
- Monitoring MSDP Scaleout
- Managing NetBackup
- Managing the Load Balancer service
- Managing PostgreSQL DBaaS
- Managing logging
- Performing catalog backup and recovery
- Section IV. Maintenance
- PostgreSQL DBaaS Maintenance
- Patching mechanism for primary, media servers, fluentbit pods, and postgres pods
- Upgrading
- Cloud Scale Disaster Recovery
- Uninstalling
- Troubleshooting
- Troubleshooting AKS and EKS issues
- View the list of operator resources
- View the list of product resources
- View operator logs
- View primary logs
- Socket connection failure
- Resolving an issue where external IP address is not assigned to a NetBackup server's load balancer services
- Resolving the issue where the NetBackup server pod is not scheduled for a long time
- Resolving an issue where the Storage class does not exist
- Resolving an issue where the primary server or media server deployment does not proceed
- Resolving an issue of failed probes
- Resolving issues when media server PVs are deleted
- Resolving an issue related to insufficient storage
- Resolving an issue related to invalid nodepool
- Resolve an issue related to KMS database
- Resolve an issue related to pulling an image from the container registry
- Resolving an issue related to recovery of data
- Check primary server status
- Pod status field shows as pending
- Ensure that the container is running the patched image
- Getting EEB information from an image, a running container, or persistent data
- Resolving the certificate error issue in NetBackup operator pod logs
- Pod restart failure due to liveness probe time-out
- NetBackup messaging queue broker takes more time to start
- Host mapping conflict in NetBackup
- Issue with capacity licensing reporting which takes longer time
- Local connection is getting treated as insecure connection
- Backing up data from Primary server's /mnt/nbdata/ directory fails with primary server as a client
- Storage server not supporting Instant Access capability on Web UI after upgrading NetBackup
- Taint, Toleration, and Node affinity related issues in cpServer
- Operations performed on cpServer in environment.yaml file are not reflected
- Elastic media server related issues
- Failed to register Snapshot Manager with NetBackup
- Post Kubernetes cluster restart, flexsnap-listener pod went into CrashLoopBackoff state or pods were unable to connect to flexsnap-rabbitmq
- Post Kubernetes cluster restart, issues observed in case of containerized Postgres deployment
- Request router logs
- Issues with NBPEM/NBJM
- Issues with logging feature for Cloud Scale
- The flexsnap-listener pod is unable to communicate with RabbitMQ
- Job remains in queue for a long time
- Extracting logs if the nbwsapp or log-viewer pods are down
- Troubleshooting AKS-specific issues
- Troubleshooting EKS-specific issues
- Troubleshooting issue for bootstrapper pod
- Appendix A. CR template
- Appendix B. MSDP Scaleout
- About MSDP Scaleout
- Prerequisites for MSDP Scaleout (AKS/EKS)
- Limitations in MSDP Scaleout
- MSDP Scaleout configuration
- Installing the docker images and binaries for MSDP Scaleout (without environment operators or Helm charts)
- Deploying MSDP Scaleout
- Managing MSDP Scaleout
- MSDP Scaleout maintenance
Steps for upgrading Cloud Scale from multiple media load balancers to none
For version 10.5 and later, load balancers are not required for media servers. This section describes the post-upgrade procedure to be performed for converting from multiple load balancers to none.
Steps to convert from multiple load balancers to none
- After successfully upgrading the Cloud Scale Technology to 10.5 or later, all the media servers associated with load balancers are upgraded to version 10.5 or later.
- Deactivate the policy and the schedules. Wait for the running jobs to complete.
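A minimal sketch of deactivating a policy and checking for running jobs from the primary server pod, assuming a hypothetical policy name <policy name>:
bpplinfo <policy name> -modify -inactive
bpdbjobs -summary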
- Add the media2 object in the mediaServers section, without the network load balancers. Edit the environment, copy the media1 section (remove the load balancer section), rename it to media2, and save the environment. A sketch of opening the environment CR for editing follows the example below.
Add media2:
mediaServers:
  - minimumReplicas: 2
    name: media1
    networkLoadBalancer:
      ipList:
        - fqdn: nbux-10-244-33-122.vxindia.veritas.com
          ipAddr: 10.244.33.122
        - fqdn: nbux-10-244-33-123.vxindia.veritas.com
          ipAddr: 10.244.33.123
      type: Private
    nodeSelector:
      labelKey: agentpool
      labelValue: nbuxpool
    paused: false
    replicas: 2
    storage:
      data:
        autoVolumeExpansion: false
        capacity: 50Gi
        storageClassName: managed-csi-hdd
      log:
        autoVolumeExpansion: false
        capacity: 30Gi
        storageClassName: managed-csi-hdd
  - minimumReplicas: 2
    name: media2
    nodeSelector:
      labelKey: agentpool
      labelValue: nbuxpool
    paused: false
    replicas: 2
    storage:
      data:
        autoVolumeExpansion: false
        capacity: 50Gi
        storageClassName: managed-csi-hdd
      log:
        autoVolumeExpansion: false
        capacity: 30Gi
        storageClassName: managed-csi-hdd
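One way to open the environment CR for this edit, a sketch assuming the same resource and namespace names used in the patch commands later in this procedure:
kubectl edit environments <environment name> -n <namespace>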
- Once media2 is successfully added, check the status of the media servers, media2 pods, and services. Wait for all the pods to come up with the media server status as Success (a pod and service check sketch follows this step).
Get media servers:
kubectl get mediaserver -n <namespace>
media1   11.0.x-xx   79m   nbux-10-244-33-120.vxindia.veritas.com   Success
media2   11.0.x-xx   79m   nbux-10-244-33-120.vxindia.veritas.com   Success
- Ensure that all the user settings that were present for the media1 pods are also added manually to the media2 pods. For example, LogLevel, FIPS mode, DNAT, and so on.
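To verify the media2 pods and their services from the status check above, a minimal sketch, assuming the pods and services carry the media2 name prefix:
kubectl get pods -n <namespace> | grep media2
kubectl get svc -n <namespace> | grep media2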
- Pause both the media reconcilers using the commands:
Pause the mediaServers objects:
kubectl patch -n <namespace> environments <environment name> --type='json' -p='[{"op": "add", "path": "/spec/mediaServers/0/paused", "value": true}]'
kubectl patch -n <namespace> environments <environment name> --type='json' -p='[{"op": "add", "path": "/spec/mediaServers/1/paused", "value": true}]'
Paused media servers:
kubectl get mediaserver -n <namespace>
media1   11.0.x-xx   79m   nbux-10-244-33-120.vxindia.veritas.com   Paused
media2   11.0.x-xx   79m   nbux-10-244-33-120.vxindia.veritas.com   Paused
- Navigate to the NetBackup web UI and select Storage. On the Disk storage tab, select the storage server. There are multiple entries for the media servers, with and without the load balancers.
- Delete the entry for media1. You must remove all the media server entries containing an FQDN.
- Remove media1 from the CR using the following command:
kubectl patch -n <namespace> environments <environment name> --type='json' -p='[{"op": "remove", "path": "/spec/mediaServers/0"}]'
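To confirm that media1 was removed, a quick check of the remaining mediaServers entries, a sketch assuming the CR structure shown in the example above:
kubectl get environments <environment name> -n <namespace> -o jsonpath='{.spec.mediaServers[*].name}'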
- Check the bp.conf file.
- Delete the media1 entries (the FQDN entries of media1) from the bp.conf file. Exec into the primary pod, open /usr/openv/netbackup/bp.conf, delete the entries, and save the file. A sketch of this edit follows the resume command below.
- Resume media2 using the following command:
kubectl patch -n <namespace> environments <environment name> --type='json' -p='[{"op": "add", "path": "/spec/mediaServers/0/paused", "value": false}]'
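A minimal sketch of the bp.conf cleanup described above, assuming a hypothetical primary pod name; the exact entries to remove depend on your deployment:
kubectl exec -it <primary pod name> -n <namespace> -- /bin/bash
grep -n 'nbux-10-244-33-122.vxindia.veritas.com' /usr/openv/netbackup/bp.conf   # locate the media1 FQDN entries
vi /usr/openv/netbackup/bp.conf   # delete those lines and save the file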
- Note the minimum replica value of media2. Change the minimum replica value to match the number of media1 data PVCs, which can be counted using the following command:
kubectl get pvc -n nbu | grep -i data-media1-media | wc -l
2
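A sketch of setting the minimum replica value, assuming media2 is now the only mediaServers entry (index 0) and following the same patch pattern used earlier in this procedure:
kubectl patch -n <namespace> environments <environment name> --type='json' -p='[{"op": "replace", "path": "/spec/mediaServers/0/minimumReplicas", "value": 2}]'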
Wait for the pods to be in the running state; the status of the media server must be displayed as Success.
Note:
Although a minimum replica value of 0 for the media server is supported in Cloud Scale, for PSF workloads (for example, MongoDB) at least one elastic media server must be running (that is, the minimum replica value of the media server must be >= 1). The name of the media server can be obtained from Host mappings. For an elastic media server, this name is the same as that of the media server pods.
- Log in to the primary server pod and move the database from the previous media server to the new media server using the following command:
bpmedia -movedb -allvolumes -oldserver <old mediaserver name> -newserver <new mediaserver name>
Example: bpmedia -movedb -allvolumes -oldserver nbux-10-244-33-122.vxindia.veritas.com -newserver media2-media-0
Repeat this step for the other media servers.
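For example, a minimal loop sketch for the remaining moves, using hypothetical old-to-new server pairs (substitute your own FQDNs and media2 pod names):
for pair in "nbux-10-244-33-122.vxindia.veritas.com:media2-media-0" "nbux-10-244-33-123.vxindia.veritas.com:media2-media-1"; do
  old="${pair%%:*}"; new="${pair##*:}"
  bpmedia -movedb -allvolumes -oldserver "$old" -newserver "$new"
done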
- Delete the alias or hostname using the following command, and repeat this for each media server:
nbemmcmd -deletehost -machinename <old mediaserver name> -machinetype media
Example: nbemmcmd -deletehost -machinename nbux-10-244-33-122.vxindia.veritas.com -machinetype media
- Modify the minimum replica value of media2 back to the original value noted earlier.
- Add the previous media server's alias as an entry for the new media server in Host mappings.
Note:
When migrating from multiple load balancers to no load balancers in NetBackup version 11.0 or later, delete the stale host mapping entries as follows:
Navigate to NetBackup Web UI ==> Security ==> Host mappings ==> Select the Actions menu next to the primary host name ==> Manage mappings ==> Delete the old load balancer media name mapping and click Save.
- Restore the backup, check the status in the Activity monitor, and verify the location where the backup is restored.